Table of Contents
- Theoretical Foundations
- System Overview
- Crate Structure
- Data Types
- Compilation Pipeline
- Execution Pipeline
- CUDA Kernels
- Memory Management
- Key Algorithms
- Public APIs
- Dataflow Diagram
- Error Handling
- Configuration
- Testing
- See Also
Glossary
For release and version context around technical terms used throughout this document (SCC, RIR, PIR, CNF, XGCF, WMC, Decision-DNNF, DLPack, Arrow IPC, Arrow C Data Interface, Semi-Naive, HISA, etc.), see the Roadmap.Theoretical Foundations
This section describes the formal semantics, design principles, and research foundations underlying XLOG.Design Goals
XLOG is designed around these core principles:| Goal | Description |
|---|---|
| GPU-Resident Execution | All semantic evaluation data structures (facts, derived relations, solver state, circuit values) remain GPU-resident during execution. Host involvement is limited to orchestration, I/O, and compilation. |
| Formal Semantics with Explicit Tiers | XLOG provides explicit semantics choices and “tiers” (exact vs approximate) per subsystem, with machine-checkable boundaries. |
| Practical Implementability | A staged implementation plan delivered value early (GPU Datalog + probabilistic facts) and now ships GPU-native epistemic execution, with bounded honest limits at the genuine semantic boundaries. |
| Robustness and Verifiability | Where “exactness” is claimed, XLOG includes proof/certificate artifacts or cross-check capability. |
Declarative Programming Paradigms
XLOG is a unified platform spanning four closely-related reasoning paradigms:| Subsystem | Purpose | Primary Inspirations |
|---|---|---|
| xlog-logic | Deterministic Datalog-style recursion and stratified negation | GPUlog (HISA indexing), VFLog (columnar GPU Datalog) |
| xlog-prob | Probabilistic + differentiable reasoning | ProbLog knowledge compilation (d-DNNF/SDD/BDD), WMC |
| EIR/GPU epistemic | Accepted epistemic execution surface with explicit EIR, FAEEL default semantics, Gelfond 1991 compatibility mode, split/joint planning, and GPU-backed runtime paths | eclingo (Gelfond 1991 context), FAEEL founded world views |
| xlog-solve | SAT/MaxSAT solver services (GPU CDCL verifier + CLS) | Deterministic CDCL (watched literals + 1-UIP + on-GPU model/proof validation), FastFourierSAT-inspired CLS |
Intermediate Representations
XLOG uses a layered IR stack to keep the platform coherent across paradigms:| IR | Purpose | Key Structures |
|---|---|---|
| RIR | Relational operations for deterministic logic | Unit (identity), Scan, Filter, Project, Join, GroupBy, Union, Distinct, Diff, Fixpoint, TensorMaskedJoin |
| PIR | Provenance tracking for probabilistic inference | PIR_Lit, PIR_NegLit, PIR_And, PIR_Or, PIR_Decision — weighted Boolean formula / circuit terms |
| EIR | Epistemic reasoning structures for the accepted GPU-backed surface | epistemic literals/modes, executable plans, split plans, world-view candidates, GPU runtime traces |
| SIR | Boolean satisfiability and optimization | SIR_CNF, SIR_Cardinality, SIR_Weights, SIR_Objective, SIR_ProofPolicy |
| XGCF | GPU-evaluable circuit format | Levelized DAG with node_type[], value[], adj[] for forward/backward evaluation |
- Estimated cardinality ranges
- Memory peak estimates
- Skew hints (for join partitioning)
- Incremental update semantics (delta/full)
- Tier requirements (exact vs approximate)
Formal Semantics
Deterministic Logic (xlog-logic)
XLOG implements Datalog with stratified negation:
- Evaluation model: Semi-naive fixpoint iteration
- Negation: Must be stratifiable (no cycles through negation)
- Aggregation: Stratified (no recursion through aggregates)
- Exactness: Exact for the supported fragment
Probabilistic Logic (xlog-prob)
XLOG implements ProbLog-style distribution semantics:
- Base: Each probabilistic fact
p::f.independently holds with probabilityp - Semantics: The probability of a query is the sum of probabilities of all worlds where the query holds
- Evaluation: Weighted Model Counting (WMC) via knowledge compilation
| Tier | Name | Description | Requirements |
|---|---|---|---|
| Exact circuit tier | Exact (circuit-evaluable) | Programs compiled into decomposable circuits evaluated exactly on GPU | Compilation succeeds; positive rule bodies only |
| Exact structural tier | Exact (restricted structure) | Acyclic probabilistic dependencies or bounded-treewidth fragments | Structural restrictions satisfied |
| Approximate sampling tier | Approximate | Monte Carlo sampling with GPU-parallel estimators and calibrated uncertainty | Any program; results include confidence intervals |
- LIT node:
v = log(w_lit)with evidence masking - AND node:
v = v_a + v_b + … - OR node:
v = logsumexp(v_a, v_b, …) - DECISION node:
v = logsumexp(log(p(var=1)) + v_true, log(p(var=0)) + v_false)
- Initialize
adj[root] = dL/dv_rootfrom the loss - AND: distribute
adjunchanged to children - OR:
adj[child_j] += adj[parent] * exp(v_child_j - v_parent)(softmax weights) - Gradients flow back to neural predicates via PyTorch custom autograd
Epistemic Logic (EIR/GPU Accepted Surface)
XLOG implements an accepted epistemic logic programming surface with world views through explicit EIR and GPU-backed execution paths:- Modal operators:
know atom(...),possible atom(...),not know atom(...),not possible atom(...), and finite nested modal chains normalized by parity/duality. - Default semantics: FAEEL-style founded world views, including foundedness gates for self-support.
- Compatibility mode: Gelfond 1991 semantics for explicit interoperability, including accepted self-supported
possiblecases in compatibility mode. - Execution surface: EIR-derived candidate generation, value-level tuple-key membership, safe split/joint solving, same-name multi-arity disambiguation, determined-head stratification, and GPU-backed runtime/WCOJ dispatch for accepted epistemic programs.
| Tier | Name | Description | Feasibility Bounds |
|---|---|---|---|
| Exact structural tier | Exact (structural) | Epistemically stratified programs or successful splitting | k <= 24 epistemic atoms per component |
| Exact bounded-enumeration tier | Exact (bounded enumeration) | Bounded candidate enumeration with propagation | k <= 16 or |candidates| <= 50,000 after propagation |
| Approximate sampling tier | Approximate | Sampling-based world-view approximation | Any program; explicit UNKNOWN/approx labeling |
- Normalize and split: Apply epistemic splitting to decompose into components
- Generate: Represent guesses as bitvectors over epistemic atoms (GPU-friendly)
- Propagate: Structural propagation (modal consistency) + semantic propagation (batched solver calls)
- Test: Verify candidates against brave/cautious consequences
XGCF: GPU Circuit Format
The XLOG GPU Circuit Format (XGCF) is a levelized DAG representation optimized for parallel GPU evaluation:circuit_offsets for batch evaluation. Neural batch dimensions are expressed as multiple leaf-weight tables with fused kernels for leaf-gather.
Knowledge Compilation Pipeline
The probabilistic exact inference path uses knowledge compilation:Memory Residency Contract
XLOG enforces strict GPU residency by default: Strict GPU-Resident Mode (default):- All runtime semantic state (relations, deltas, circuit values, solver state) must fit in device memory
- If the plan cannot execute within GPU memory budget, XLOG returns a deterministic
RESOURCE_EXHAUSTEDerror with diagnostics
--allow-ooc):
- XLOG may spill immutable intermediates to host-pinned memory
- Never used silently; performance not guaranteed
Research Foundations
XLOG builds on established research in GPU-accelerated databases and probabilistic logic programming:| System/Paper | Contribution to XLOG |
|---|---|
| GPUlog | HISA (Hash-Indexed Sorted Array) for efficient range queries, lock-free deduplication, parallel iteration. Reports up to 45× speedups. |
| VFLog | Column-oriented GPU Datalog runtime. Demonstrates 200× gains over CPU column engines. |
| mnmgDatalog | Multi-node multi-GPU Datalog. Radix-hash partitioning, GPU-aware all-to-all. |
| ProbLog | Knowledge compilation approach to probabilistic logic. Compile-once evaluate-many pattern. |
| Neural-symbolic AI | Neural predicates integrated with probabilistic logic. End-to-end differentiable inference. |
| D4 | State-of-the-art Decision-DNNF compiler for weighted model counting. |
| eclingo | Epistemic logic solver implementing Gelfond 1991 semantics via guess-and-check. |
| FAEEL | Founded Autoepistemic Equilibrium Logic. Avoids self-supported world views. |
| FastFourierSAT | Massively parallel continuous local search for GPU SAT solving. |
| ParaFROST | Certified GPU-accelerated SAT inprocessing with proof generation. |
System Overview
- Parsing: PEG grammar with Pest in
crates/xlog-logic - Stratification: SCC analysis over the predicate dependency graph
- Lowering: AST to backend-specific IR paths (
RIR,PIR, planned solver IR) - Optimization: predicate pushdown, join planning, and execution-shape rewrites
- Execution: backend interpreters dispatch against the shared CUDA provider
Crate Structure
Dependency Graph
Production[dependencies] only (dev-dependencies omitted):
xlog-logic → xlog-runtime (moved to dev-deps),
removed xlog-stats → xlog-cuda (was unused), added xlog-neural → xlog-core.
Epistemic WFS boundary: xlog-gpu intentionally has no xlog-prob
dependency. Accepted cyclic negated-modal WFS execution must route through the
GPU-backed WFS plan in xlog-gpu; the host HashMap/HashSet WFS implementation
in xlog-prob remains a probabilistic/provenance subsystem and is not an accepted
fallback for production epistemic execution. This boundary is not a
device-resident/no-host-interaction WFS residency guarantee: host orchestration
and metadata row-count reads may still participate in WFS convergence.
Crate Responsibilities
| Crate | Purpose |
|---|---|
xlog-core | Shared types (ScalarType, Schema, AggOp), traits (KernelProvider), errors |
xlog-ir | Relational IR nodes (RirNode), expressions (Expr), execution plans |
xlog-logic | Parser, stratification, lowering (AST → RIR), optimizer (predicate pushdown + join planning), neural predicate syntax (nn/4) |
xlog-runtime | Executor, versioned RelationStore, profiling, incremental maintenance, adaptive join index cache |
xlog-cuda | CudaKernelProvider, GpuMemoryManager, CudaBuffer/CudaColumn, PTX embedding, Arrow IPC/C Data + DLPack interop |
xlog-stats | StatsManager + StatsSnapshot (compiler feedback + runtime tracking) |
xlog-prob | Probabilistic tier: provenance → CNF → GPU Decision-DNNF compiler → XGCF; exact inference + Monte Carlo sampling + circuit caching; includes GPU-native PIR→CNF encoder and GPU Decision-DNNF/CDCL compilation utilities |
xlog-neural | Neural-symbolic integration: NetworkRegistry, NetworkHandle, TensorSourceRegistry, NeuralBridge |
xlog-solve | Solver services: GPU CDCL verifier (complete SAT/UNSAT, on-GPU validation) + CLS SAT/MaxSAT (heuristic) |
xlog-gpu | High-level GPU API: deterministic execution + input/output buffers for integration layers |
xlog-induce | Bounded exact-induction engine — scores all (left, right) candidate pairs across four fixed 2-body topologies (chain/star/fanout/fanin) in one batched GPU pass via ilp_exact kernel; top-K per topology with structured candidate metadata. See Bounded Exact Induction |
xlog-cli | xlog CLI for deterministic and probabilistic execution with Arrow IPC I/O |
pyxlog | PyO3 extension (pyxlog Python module) exposing DLPack-first deterministic + probabilistic evaluation + neural-symbolic training API |
xlog-cuda-tests | CUDA/PTX certification suite (release gating; publish = false) |
Data Types
Scalar Types (xlog-core)
Symbol values are stored as u32 IDs that map bidirectionally to strings via a global string table. Symbols are reversible: the original string can be recovered for display in query output.
Schema
Relational IR Nodes (xlog-ir)
Expressions
Aggregation Operations
Compilation Pipeline
Phase 1: Parsing
File:crates/xlog-logic/src/parser.rs
Uses Pest PEG grammar to parse Datalog source into AST:
Program AST with rules, facts, predicate declarations, constraints, and queries
Queries and Constraints (Desugaring)
XLOG supports:- Constraints:
:- body.(must have no solutions) - Queries:
?- atom.(what to print)
:- body.→__xlog_constraint_N(1) :- body.?- p(1, X).→__xlog_query_N(X) :- p(1, X).
Phase 2: Stratification
File:crates/xlog-logic/src/stratify.rs
Ensures safe negation ordering using dependency analysis:
- Build dependency graph between predicates
- Classify edges: Positive, Negative, Aggregate
- Find SCCs using Tarjan’s algorithm
- Detect cycles through negation (stratification failure)
- Topologically sort strata
Vec<Stratum> ordered by dependencies
Phase 3: Lowering
File:crates/xlog-logic/src/lower.rs
Transforms AST to Relational IR:
- Schema inference: Derive column types from facts
- Join planning: Build join trees for positive atoms (bushy dynamic programming for small bodies, greedy for large bodies) using a cost model; can be seeded from runtime
StatsSnapshot - Variable tracking: Map variables to column indices for join keys and projections
- Comparisons + arithmetic: Lower comparisons to
Filterandisexpressions to computed projections (ProjectExpr::Computed) - Negation handling: Lower stratified negation via
Diff+Semijoin (anti-semi pattern over shared variables) - Recursion: Mark recursive predicate groups as SCCs in the plan; runtime executes recursive SCCs with semi-naive deltas
- Projection: Project result columns to match rule head (and lower aggregates to
GroupBy)
ExecutionPlan with SCCs, strata, compiled rules
Compiler Orchestration
File:crates/xlog-logic/src/compile.rs
Execution Pipeline
Executor
File:crates/xlog-runtime/src/executor.rs
Execution Flow
- Execute strata in order (dependency-first)
- For each stratum: Execute all rules in SCCs
- For recursive SCCs: Use semi-naive fixpoint iteration
- For each rule: Recursively evaluate RIR node tree
- Store results: Put computed relations in
RelationStore
RIR Node Execution
GPU-Resident Execution
Filter execution is fully GPU-resident. Predicate trees are lowered to a mask DAG: typed compare kernels generate masks, boolean operators (mask_and, mask_or, mask_not) compose them, and stream compaction selects rows without CPU round-trips. Arithmetic expressions referenced by predicates are materialized on the GPU using the arithmetic kernels.
GroupBy finalization is also GPU-resident: group boundaries are detected over packed key bytes, a GPU prefix-sum generates group IDs and group start indices, and packed rows are gathered + unpacked on-device to extract group keys. Aggregation outputs remain on the GPU until final output conversion.
Recursive SCC Evaluation (Semi-Naive)
Recursive programs are executed at the SCC level using semi-naive deltas. High-level algorithm (mirrorsExecutor::execute_recursive_scc):
- Seed each recursive predicate
P:full[P] = dedup(⋃ base-rule outputs for P)delta[P] = full[P]
- Iterate up to
MAX_SCC_ITERATIONS:- For each recursive rule, evaluate it semi-naively by rewriting exactly one recursive scan occurrence in the rule body to use
delta[...], then union all such variants for the head predicate. delta_new[P] = dedup(delta_raw[P] - full[P])- Stop if all
delta_new[P]are empty. full[P] = dedup(full[P] ∪ delta_new[P]); setdelta[P] = delta_new[P].
- For each recursive rule, evaluate it semi-naively by rewriting exactly one recursive scan occurrence in the rule body to use
RirNode::Fixpoint exists and is interpreted by Executor::execute_fixpoint, but the current compiler emits recursion via SCC metadata + delta rewriting rather than explicit Fixpoint nodes.
CUDA Kernels
Kernel Files
| File | Kernels | Purpose |
|---|---|---|
arith.cu | arith_add_*, arith_sub_*, arith_mul_*, arith_div_*, arith_mod_*, arith_abs_*, arith_neg_* | Arithmetic operations for is expressions |
join.cu | hash_join_bucket_count_v2, hash_join_scatter_v2, hash_join_probe_v2, hash_join_semi, hash_join_anti, compute_composite_hash | Hash joins (v2 with bucketed layout) + composite hashing |
pack.cu | pack_keys, pack_and_hash_keys, hash_packed_keys, gather_packed_rows, compare_packed_keys | Key packing/hashing + packed-row utilities |
dedup.cu | mark_unique_*, compact_rows | Sort-based deduplication |
filter.cu | filter_compare_*, compact_*_by_mask, mask_{and,or,not} | Filtering and stream compaction |
sort.cu | radix_histogram, radix_scatter_*, init_indices, apply_permutation_*, gather_keys_* | Stable radix sort + permutation apply |
groupby.cu | detect_group_boundaries, extract_group_keys, groupby_*, groupby_logsumexp_* | Sorted aggregation |
scan.cu | exclusive_scan_mask, count_mask, multiblock_scan_* | Prefix sum operations |
set_ops.cu | concat_{u32,bytes}, sorted_diff_mark | Union/difference operations |
circuit.cu | xgcf_forward_level, xgcf_backward_level_* | XGCF circuit eval + reverse-mode gradients (probabilistic) |
sat.cu | sat_cdcl_solve, sat_check_model, sat_proof_check, sat_assert_*, sat_xgcf_cnf_*, sat_emit_not_phi | GPU CDCL verifier + equivalence query construction helpers |
mc_sample.cu | mc_sample_bernoulli | Bernoulli sampling (Monte Carlo inference) |
Hash Join Implementation (v2)
The runtime uses a bucketed “CSR buckets” layout for joins:- Compute a 64-bit composite hash for the join key columns (via packed key bytes).
- Build (right side):
hash_join_bucket_count_v2: count build rows per bucket (bucket = low bits of hash)- GPU exclusive scan →
bucket_offsets hash_join_scatter_v2: scatter build row indices contiguously per bucket and store aligned hashes
- Probe (left side):
hash_join_probe_v2: probe the bucket range and compare hashes- optional key verification compares packed key bytes to eliminate hash-collision false positives
Multi-Column Composite Hash
Radix Sort (4-bit, stable)
GroupBy Aggregation
- Sort by
key_colson GPU - Detect group boundaries over packed key bytes
- Compute group IDs and start indices via prefix-sum
- Gather group key columns on GPU (packed-row gather + unpack)
- Run aggregation kernels
count: any value type (counts rows)sum/min/max:u32values (outputu64forsum,u32formin/max)logsumexp:f64values (outputf64)
Memory Management
CudaBuffer
File:crates/xlog-cuda/src/memory.rs
GPU Memory Manager
File:crates/xlog-cuda/src/memory.rs
Device Runtime + Recorded Launch Discipline
The device runtime adds an opt-in stack atop theGpuMemoryManager shown above:
AsyncCudaResource → LoggingResource → GlobalDeviceBudget → XlogDeviceRuntime + StreamPool. It tracks per-allocation
events for cross-stream lifetime safety and exposes an
access-aware prepare_block_use / finish_block_use /
prepare_first_use / finish_first_use API that the
LaunchRecorder uses to fence kernels safely on
non-default streams. Activated via
CudaKernelProvider::with_runtime /
GpuMemoryManager::with_runtime; XLOG_USE_DEVICE_RUNTIME=1
flips integration-test fixtures onto the runtime stack and
XLOG_USE_RECORDED_OPS=1 (or per-operator
XLOG_USE_RECORDED_*) routes operator dispatch through the
recorded variants. Default new constructors are unchanged.
- Architecture: Device Runtime.
- An operator-author migration checklist for recorded launches is maintained alongside the runtime code.
Memory Budget Configuration
Key Algorithms
Hash Join
- Key prep: Pack join key columns into row-major bytes and compute a 64-bit composite hash (FNV-1a) on GPU.
- Build phase (v2 buckets): Bucket build rows by low hash bits (count → scan offsets → scatter contiguous bucket entries).
- Probe phase: For each probe row, scan the bucket range, compare hashes, and (optionally) verify key bytes to eliminate hash collisions.
- Materialization: Gather left/right columns into the output schema; for left-outer joins, unmatched right columns are zero-filled.
- Inner: Output matching pairs
- Semi: Output left rows that have any match (existence check)
- Anti: Output left rows with no matches
- LeftOuter: Output all left rows; unmatched right-side columns are zero-filled
Sort-Based Deduplication
- Sort rows by all columns (or key columns)
- Mark duplicates (compare adjacent rows)
- Stream compaction to remove marked rows
Fixpoint Iteration (Semi-Naive)
- Seed
fullanddeltarelations for each recursive predicate in an SCC. - Iterate:
- Evaluate rule variants where exactly one recursive scan uses
delta(semi-naive). delta_new = dedup(delta_raw - full)full = dedup(full ∪ delta_new); setdelta = delta_new
- Evaluate rule variants where exactly one recursive scan uses
- Stop when all
delta_neware empty (or iteration limit reached).
Stream Compaction
- Create mask array (1 = keep, 0 = discard)
- Compute exclusive prefix sum of mask
- Scatter elements to positions indicated by prefix sum
Radix Sort
Radix sort uses 4-bit digits overu32 segments; the number of passes depends on key width (e.g., 8 passes per 32 bits):
- Compute per-block histograms for current digit
- Global prefix sum of histograms
- Scatter elements to sorted positions
- Swap input/output buffers, move to next digit
Public APIs
Compilation
Execution
High-Level API (xlog-gpu)
CLI
Thexlog CLI is a production entry point for deterministic and probabilistic programs. It reads .xlog sources, accepts Arrow IPC inputs for EDB relations, and emits query results as pretty tables, CSV, or Arrow IPC streams.
Python API
Profiling
Dataflow Diagram
The diagram below depicts the deterministic Datalog evaluation path end-to-end. The probabilistic, solver, and neural-symbolic paths share the parser, stratifier, and kernel provider but branch into their own IRs and executors after lowering.Error Handling
Configuration
Memory Budget
Execution Limits
Testing
Workspace Tests
CUDA Certification Suite
The certification suite validates all CUDA kernel operations:See Also
Core Documentation
- Roadmap — Feature status, planned development, glossary
Architecture Documents
| Document | Description |
|---|---|
| GPU Execution | GPU-resident filter, groupby, and arithmetic evaluation |
| Query Optimizer | Cost-based join ordering, predicate pushdown, statistics |
| Arithmetic Expressions | is syntax, type inference, GPU evaluation |
| Probabilistic Engines | Exact inference (Decision-DNNF/XGCF) and Monte Carlo sampling |
| Rule Learning | Differentiable ILP trainer architecture and the RFC-backed execution model |
| Solver Services | GPU CDCL verifier (zero host reads) + CLS SAT/MaxSAT services |
| Adaptive Indexing | HISA-based heat tracking and index selection |
| Multi-GPU Joins | Distributed join design (planned) |
| Data Interoperability | Arrow IPC/C Data and DLPack integration |
| CUDA Certification | PTX kernel test suite (140 tests, 24 categories) |
| CLI Reference | xlog run and xlog prob commands |
| Python Bindings | PyO3 + DLPack API (pyxlog module) |