Rule Learning (Differentiable ILP)

This document describes the dILP training subsystem: a GPU-accelerated differentiable Inductive Logic Programming engine that learns Datalog rules from positive/negative examples via gradient descent.

Design Goals

Learn rules, not weights — discover symbolic Datalog clauses (e.g., reach(X,Y) :- edge(X,Z), edge(Z,Y).) from data
GPU-resident hot loop — no semantic column downloads in the training step loop
Sparse by default — candidate-indexed soft-probs instead of materializing N³ tensors
Transactional promotion — learned rules pass gate checks before entering the knowledge base
Auditable transfer evidence — learned rules carry fold, held-out-domain, gate, and base-kernel checksum metadata

Core Idea: Tensorized Super-Graph Masking

Traditional ILP systems compile each candidate rule into an executable program, which is far too slow to repeat inside a training step that must finish in milliseconds. XLOG instead pre-compiles a “super-graph” containing all candidate rules and activates them through continuous mask tensors optimized with Gumbel-Softmax:

Candidate rules          Logit tensor W (C floats)
  ┌──────────┐              ┌───────────┐
  │ r1: A←B,C│              │ w1  w2 .. │ ─── Gumbel-Softmax(τ) ──►  soft mask
  │ r2: A←B,D│              │           │                               │
  │ ...       │              └───────────┘                               ▼
  └──────────┘                                              set_rule_mask_sparse()
                                                                        │
                                                                        ▼
                                                            xlog evaluate (GPU)
                                                                        │
                                                                        ▼
                                                              BCE loss + ∇W

At convergence, argmax(W) picks the winning rule. Temperature annealing (τ → τ_floor) drives the soft mask toward a one-hot selection.
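
The relaxation described above can be illustrated with a minimal, pure-Python sketch. It is not the XLOG implementation (which runs on the GPU through PyTorch); the logit values and the helper name `gumbel_softmax` are assumptions chosen for illustration. It shows how the same logits and the same noise yield a sharper mask as τ drops, and how argmax(W) decodes the winner.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Return a soft mask: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for w in logits:
        # Gumbel(0, 1) noise via inverse CDF: -log(-log(U)), U kept strictly inside (0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((w - math.log(-math.log(u))) / tau)
    # Numerically stable softmax.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

W = [2.0, 0.5, -1.0]  # hypothetical logits for candidates r1, r2, r3

# Re-seeding gives both calls identical noise, so only the temperature differs.
warm = gumbel_softmax(W, tau=5.0, rng=random.Random(0))
cold = gumbel_softmax(W, tau=0.05, rng=random.Random(0))

# Decode: the candidate with the largest logit wins.
winner = max(range(len(W)), key=W.__getitem__)
```

With identical noise, the largest entry of `cold` is never smaller than the largest entry of `warm`, which is the sharpening effect that temperature annealing relies on.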

Architecture Overview

                           Python (pyxlog.ilp)
     ┌─────────────────────────────────────────────────────┐
     │  train_only()                                       │
     │    ├─ valid_candidates()  → candidate map           │
     │    ├─ MaskBackend.init_weights()                    │
     │    ├─ AdaptiveTempController                        │
     │    └─ step loop ──────────────────────────┐         │
     │         ├─ MaskBackend.apply_mask()       │         │
     │         ├─ program.evaluate_device()      │ GPU     │
     │         ├─ BCE loss (torch)               │ only    │
     │         ├─ loss.backward()                │         │
     │         └─ optimizer.step()               │         │
     │                                           │         │
     │  train_and_promote()                      │         │
     │    ├─ train_only()                        │         │
     │    ├─ trial compile (Rust)                │         │
     │    └─ promotion gates                     │         │
     └───────────────────────────────────────────┘         │
                        │                                   │
                        ▼                                   │
     ┌────────────────────────────────────┐                │
     │  Rust (xlog-runtime, xlog-cuda)   │                │
     │    ├─ set_rule_mask_sparse()       │◄───────────────┘
     │    ├─ IlpRegistry (mask storage)  │
     │    ├─ TensorMaskedJoin (executor) │
     │    ├─ batch_fact_membership()     │
     │    ├─ batch_fact_membership_device() │
     │    ├─ batch_tagged_credit()       │
     │    └─ batch_tagged_credit_device()│
     └────────────────────────────────────┘
                        │
                        ▼
     ┌────────────────────────────────────┐
     │  CUDA (kernels/ilp.cu)            │
     │    └─ extract_nonzero_indices()   │
     └────────────────────────────────────┘

Key Entry Points

Python (`pyxlog.ilp`)

File	Purpose
`pyxlog/ilp/trainer.py`	`train_only()` — multi-start training loop
`pyxlog/ilp/promoter.py`	`train_and_promote()` — training + gate pipeline
`pyxlog/ilp/neurosymbolic.py`	`train_neurosymbolic_program()` — joint `nn/4` and symbolic rule-weight training
`pyxlog/ilp/inventory.py`	`build_rule_inventory()` — selected/rejected clause inventory with transfer metadata
`pyxlog/ilp/backend.py`	`MaskBackend` protocol, `SparseMaskBackend`, `DenseMaskBackend`
`pyxlog/ilp/temperature.py`	`AdaptiveTempController` — cosine-annealed τ schedule
`pyxlog/ilp/entropy.py`	Entropy regularization helpers
`pyxlog/ilp/holdout.py`	`holdout_f1_and_variance()` — LOO (`<=20`) and k-fold (`>20`) F1 scoring
`pyxlog/ilp/types.py`	`TrainConfig`, `TrainResult`, `PromotionResult`, `LearnedArtifact`, etc.
`pyxlog/ilp/exceptions.py`	`IlpConfigError`, `IlpCandidateError`, `IlpTrainingError`

Rust (`xlog-runtime`, `xlog-cuda`)

File	Purpose
`crates/xlog-runtime/src/ilp_registry.rs`	`IlpRegistry` — mask storage, `IlpTaggedResult` metadata
`crates/xlog-runtime/tests/ilp_integration_tests.rs`	Rust-side integration tests for mask round-trips
`crates/xlog-cuda/tests/ilp_kernel_tests.rs`	CUDA kernel unit tests (`extract_nonzero_indices`)

CUDA Kernels

File	Purpose
`kernels/ilp.cu`	`extract_nonzero_indices()` — N³ mask → sparse index extraction
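
As a rough mental model of what the kernel computes, the sketch below is a CPU reference in Python. The row-major layout, the (i, j, k) output format, and the function name `extract_nonzero_indices_reference` are assumptions made for illustration; the actual CUDA kernel's signature and output layout are not specified in this document.

```python
def extract_nonzero_indices_reference(flat_mask, n):
    """CPU reference: return (i, j, k) for every nonzero entry of a row-major N^3 mask."""
    if len(flat_mask) != n ** 3:
        raise ValueError("mask length must equal n**3")
    triples = []
    for flat, value in enumerate(flat_mask):
        if value != 0:
            # Undo the row-major flattening: flat = i*n*n + j*n + k.
            i, rem = divmod(flat, n * n)
            j, k = divmod(rem, n)
            triples.append((i, j, k))
    return triples

n = 3
mask = [0.0] * (n ** 3)
mask[0 * 9 + 1 * 3 + 2] = 0.7  # entry (0, 1, 2)
mask[2 * 9 + 0 * 3 + 1] = 0.3  # entry (2, 0, 1)
active = extract_nonzero_indices_reference(mask, n)
```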

Mask Backends

The MaskBackend protocol abstracts how the learnable tensor W is applied to the XLOG executor:

SparseMaskBackend (default)

W (C logits) → Gumbel-Softmax(τ) → candidate_soft_probs (C,)
                                          │
                    argsort/top-k on CUDA in Python/Torch
                                          │
                    set_rule_mask_sparse_selected(selected_ids, selected_soft_probs)
                                          │
                                          ▼
                              Rust builds executor mask internally
                              (no N³ tensor materialized, no full soft-vector device-to-host transfer)

Learnable params: C floats (one per candidate rule)
Memory: O(C) — typically C < 100
Preferred hot-loop path calls set_rule_mask_sparse_selected() on the compiled program
Legacy compatibility path set_rule_mask_sparse() remains available when Rust-side ranking is desired
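
The selection step on the sparse path can be sketched as follows. This is a CPU stand-in written in plain Python; in the real hot loop the ranking runs on CUDA in Torch, and the helper name `select_top_k`, the value of k, and the choice not to renormalize the selected probabilities are assumptions for illustration only.

```python
def select_top_k(soft_probs, k):
    """Return (selected_ids, selected_soft_probs) for the k largest probabilities."""
    order = sorted(range(len(soft_probs)), key=lambda i: soft_probs[i], reverse=True)
    ids = order[:k]
    return ids, [soft_probs[i] for i in ids]

# Hypothetical soft probabilities for C = 4 candidates.
candidate_soft_probs = [0.05, 0.60, 0.10, 0.25]
selected_ids, selected_soft_probs = select_top_k(candidate_soft_probs, k=2)

# Only these two short lists would be handed to the runtime, for example via
# set_rule_mask_sparse_selected(selected_ids, selected_soft_probs).
```

Passing only the selected ids and their probabilities is what lets the hot loop avoid transferring the full soft vector to the host.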

DenseMaskBackend (alpha-compatible, debug)

W (N×N×N logits) → Gumbel-Softmax(τ) → 3D soft mask
                                              │
                           flatten → DLPack → set_rule_mask()
                                              │
                                              ▼
                                    Rust imports N³ tensor

Learnable params: N³ floats (N = schema size)
Memory: O(N³) — expensive for large schemas
Enabled via TrainConfig(debug_dense_mask=True) for parity testing
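
To make the O(C) versus O(N³) gap concrete, the following sketch counts learnable parameters for both backends. The 4-byte float size and the example values C = 64 and N = 50 are assumptions, not figures from the XLOG codebase.

```python
def mask_param_counts(num_candidates, schema_size, bytes_per_float=4):
    """Compare learnable-parameter counts of the sparse and dense mask backends."""
    sparse = num_candidates      # one logit per candidate rule
    dense = schema_size ** 3     # one logit per (head, body1, body2) slot
    return {
        "sparse_params": sparse,
        "dense_params": dense,
        "sparse_bytes": sparse * bytes_per_float,
        "dense_bytes": dense * bytes_per_float,
    }

counts = mask_param_counts(num_candidates=64, schema_size=50)
```

Under these assumptions the dense backend holds 125,000 logits against 64 for the sparse backend, which is why the dense path is reserved for parity testing.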

Training Pipeline

`train_only()`

1. Candidate enumeration: valid_candidates(source, mask_name) returns all syntactically legal body-pair assignments
2. Multi-start: up to max_attempts independent restarts with fresh logits
3. Step loop (per attempt, up to step_budget_per_attempt):
   - Apply mask via backend
   - Forward pass: program.evaluate_device() (GPU-only, no host reads)
   - BCE loss between predicted and target fact membership
   - Backward pass: loss.backward() through PyTorch autograd
   - Optimizer step on W
   - Temperature anneal: τ_start → τ_floor (cosine schedule)
   - Optional deterministic controls (deterministic=True) for reproducible attempt seeding
   - Early stopping: when argmax is stable and loss < threshold
4. Decode: argmax(W) maps to winning candidate → discovered rule string
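
The step loop above can be mimicked on a toy problem without a GPU. The sketch below is a simplification under stated assumptions: it uses a plain softmax (no Gumbel noise) for determinism, hand-derived gradients instead of autograd, plain SGD, a single attempt instead of multi-start, and a hypothetical `derives` table that says which example facts each candidate rule would derive. None of the names or default values are taken from pyxlog.

```python
import math

def cosine_tau(step, total_steps, tau_start, tau_floor):
    """Cosine anneal from tau_start (first step) to tau_floor (last step)."""
    frac = step / max(total_steps - 1, 1)
    return tau_floor + 0.5 * (tau_start - tau_floor) * (1.0 + math.cos(math.pi * frac))

def softmax(logits, tau):
    peak = max(logits)
    exps = [math.exp((w - peak) / tau) for w in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_toy(derives, targets, steps=200, lr=0.5, tau_start=1.0, tau_floor=0.1,
              loss_threshold=0.05, patience=5, eps=1e-6):
    n_cand, n_ex = len(derives), len(targets)
    W = [0.0] * n_cand  # fresh logits for this attempt
    stable, last_winner, loss = 0, None, float("inf")
    for step in range(steps):
        tau = cosine_tau(step, steps, tau_start, tau_floor)
        mask = softmax(W, tau)
        # Forward: soft membership of each example fact under the masked rule mix.
        preds = [min(max(sum(mask[i] * derives[i][j] for i in range(n_cand)), eps), 1 - eps)
                 for j in range(n_ex)]
        # Mean binary cross-entropy against the target memberships.
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(preds, targets)) / n_ex
        # Backward by hand: dL/dp, then the chain rule through the softmax.
        dl_dp = [(p - y) / (p * (1 - p)) / n_ex for p, y in zip(preds, targets)]
        g = [sum(dl_dp[j] * derives[i][j] for j in range(n_ex)) for i in range(n_cand)]
        g_bar = sum(mask[i] * g[i] for i in range(n_cand))
        # SGD step on W.
        W = [W[k] - lr * mask[k] * (g[k] - g_bar) / tau for k in range(n_cand)]
        # Early stopping: argmax unchanged for `patience` steps and loss below threshold.
        winner = max(range(n_cand), key=W.__getitem__)
        stable = stable + 1 if winner == last_winner else 0
        last_winner = winner
        if stable >= patience and loss < loss_threshold:
            return {"winner": winner, "loss": loss, "steps": step + 1, "converged": True}
    return {"winner": last_winner, "loss": loss, "steps": steps, "converged": False}

# Candidate 1 derives exactly the positive examples; the others do not.
derives = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
targets = [1, 1, 0, 0]
result = train_toy(derives, targets)
```

In this toy setup the gradient always pushes the logit of the candidate that matches the targets upward, so the decode step selects index 1.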

`train_and_promote()`

1. Call train_only() to get a TrainResult
2. If not converged → PromotionStatus.NOT_CONVERGED
3. Trial compile: substitute discovered rule into source, compile via Rust
4. Promotion gates (all must pass for PROMOTED):
   - Convergence gate: training converged (already checked)
   - Novel-rate gate: fraction of non-example derivations ≤ max_novel_rate
   - Protected-relation gate: no unwanted relation side-effects
   - Holdout F1 gate: F1 on held-out examples ≥ threshold
   - Ambiguity gate: top-M scan (or exhaustive mode) detects no alternative winning candidates
   - Typed-schema gate: optional hard gate requiring relation type metadata (or waiver-driven manual review)
5. All pass → PromotionStatus.PROMOTED with committed_source
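
The gate sequence above amounts to an all-must-pass check. The sketch below is a hypothetical model of that logic, not the pyxlog implementation: the `SketchStatus` enum, its `GATE_FAILED` member, the `GateInputs` fields, and the threshold values are all assumptions, and the optional typed-schema gate is omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum

class SketchStatus(Enum):
    """Hypothetical stand-in; the text only names NOT_CONVERGED and PROMOTED."""
    NOT_CONVERGED = "not_converged"
    GATE_FAILED = "gate_failed"  # assumed name for any non-promoted gate outcome
    PROMOTED = "promoted"

@dataclass
class GateInputs:
    converged: bool
    novel_rate: float
    max_novel_rate: float
    protected_relations_changed: bool
    holdout_f1: float
    min_holdout_f1: float
    ambiguous_alternatives: int

def evaluate_gates(x):
    """Return (status, failed_gate_names); promotion requires every gate to pass."""
    if not x.converged:
        return SketchStatus.NOT_CONVERGED, ["convergence"]
    checks = {
        "novel_rate": x.novel_rate <= x.max_novel_rate,
        "protected_relation": not x.protected_relations_changed,
        "holdout_f1": x.holdout_f1 >= x.min_holdout_f1,
        "ambiguity": x.ambiguous_alternatives == 0,
    }
    failed = [name for name, ok in checks.items() if not ok]
    status = SketchStatus.PROMOTED if not failed else SketchStatus.GATE_FAILED
    return status, failed

good = GateInputs(True, 0.02, 0.05, False, 0.95, 0.90, 0)
weak = GateInputs(True, 0.02, 0.05, False, 0.80, 0.90, 1)
promoted_status, _ = evaluate_gates(good)
weak_status, weak_failed = evaluate_gates(weak)
```

Collecting every failed gate name, rather than stopping at the first failure, keeps the outcome useful for audit records.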

Higher-Level Neuro-Symbolic Training Surface

A higher-level training entry point handles sources that mix neural predicates and trainable symbolic clauses:

from pyxlog.ilp.neurosymbolic import NeuroSymbolicTrainingConfig, train_neurosymbolic_program

result = train_neurosymbolic_program(
    source,
    networks={"score": torch_module},
    examples=training_rows,
    config=NeuroSymbolicTrainingConfig(steps=16, learning_rate=0.05),
)

The source owns declarative nn(...), trainable_rule(...), and train(...) declarations. The result reports neural gradient norms, symbolic gradients, final symbolic weights, and a RuleInventory suitable for transfer audits.

Existential-join trainable bodies (Stage B)

A trainable_rule body may join a neural predicate to an ordinary relation on an existential (non-head) variable — the neural predicate is grounded over the real join domain inside the circuit and OR-aggregated at the head:

plastic(Edge) :- saliency(Event, strengthen), pre_before_post(Event, Edge).

Here Event appears only in the body. The engine materializes the join domain from pre_before_post’s ground facts, emits one neural leaf per joined event, and the differentiable provenance OR-aggregates the per-event contributions per head binding, yielding

P(plastic(Edge)) = σ(w) · (1 − ∏_{e : pre_before_post(e,Edge)} (1 − p_saliency(e)))

Gradient flows into the neural predicate (all joined events) and the rule guard, but never into the deterministic join relation. The per-event features arrive through a domain_inputs={"net": features} channel (row i = the i-th join-domain constant in sorted order), and examples carry only per-head-binding targets. Because saliency is learned as a function of the event feature (not an id lookup), the trained predicate generalizes to unseen events.

Constraints:
- the join domain must be ground facts (a derived relation is rejected, since its extension is not materialized)
- head-binding ids must be 0..N-1 row-aligned with targets
- a single join network is supported
- the exact d-DNNF compiler builds one circuit over all head-binding queries, so the planted graph must stay within the compiler’s fixed buffer (empirically ~6–7 events)

A worked example lives in examples/plasticity_incircuit/ with a CUDA-gated recovery test in python/tests/test_plasticity_incircuit.py. Head-variable (“hard filter”) joins remain supported as pre-filters; only the existential-join case is new.

train_and_promote(...) also accepts training_fold, held_out_domains, base_kernel_checksum_before, and base_kernel_checksum_after. These fields are recorded on PromotionResult.rule_inventory, along with selected and rejected candidate clauses and gate outcomes.
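
The head probability for the existential-join case, P(plastic(Edge)) = σ(w) · (1 − ∏ (1 − p_saliency(e))), can be checked numerically with a small sketch. The function names and input values below are illustrative assumptions; the real engine evaluates this inside a compiled circuit rather than in Python. The analytic partial derivatives show why gradient reaches both the rule guard w and every joined event's neural output.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def head_probability(w, saliency_probs):
    """P(head) = sigmoid(w) * (1 - prod_e (1 - p_e)) over the joined events."""
    none_fire = math.prod(1.0 - p for p in saliency_probs)
    return sigmoid(w) * (1.0 - none_fire)

def head_gradients(w, saliency_probs):
    """Analytic partials: dP/dw and dP/dp_e for each joined event."""
    s = sigmoid(w)
    none_fire = math.prod(1.0 - p for p in saliency_probs)
    d_w = s * (1.0 - s) * (1.0 - none_fire)
    # dP/dp_i = sigmoid(w) * prod over the other events of (1 - p_j).
    d_p = [s * math.prod(1.0 - q for j, q in enumerate(saliency_probs) if j != i)
           for i in range(len(saliency_probs))]
    return d_w, d_p

# Two events joined to one Edge, each with saliency probability 0.5, guard w = 0.
p_head = head_probability(0.0, [0.5, 0.5])
d_w, d_p = head_gradients(0.0, [0.5, 0.5])
```

With these inputs the OR-aggregate is 1 − 0.25 = 0.75 and σ(0) = 0.5, so the head probability is 0.375. An Edge with no joined events gets probability 0, since the empty product is 1.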

Artifact Persistence

LearnedArtifact captures the full training result for reproducibility:

artifact.save("artifact.json")   # JSON with SHA-256 candidate-map hash
loaded = LearnedArtifact.load("artifact.json", verify_hash=True)

Schema version: beta-v1. Fields: discovered rule, logits, candidate map, config, telemetry, precision/recall, metadata (timestamp, schema version, candidate map hash).
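
The hash-verified round trip can be sketched with the standard library alone. The JSON layout, the canonicalization (sorted keys, compact separators), and the helper names below are assumptions for illustration; the actual LearnedArtifact file format may differ.

```python
import hashlib
import json

def candidate_map_hash(candidate_map):
    """SHA-256 over a canonical JSON encoding of the candidate map."""
    canonical = json.dumps(candidate_map, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dump_artifact(discovered_rule, candidate_map):
    """Serialize a minimal artifact with the hash stored in its metadata."""
    return json.dumps({
        "discovered_rule": discovered_rule,
        "candidate_map": candidate_map,
        "metadata": {
            "schema_version": "beta-v1",
            "candidate_map_hash": candidate_map_hash(candidate_map),
        },
    }, sort_keys=True)

def load_artifact(text, verify_hash=True):
    """Parse an artifact and optionally reject it if the candidate map was altered."""
    data = json.loads(text)
    if verify_hash:
        expected = data["metadata"]["candidate_map_hash"]
        if candidate_map_hash(data["candidate_map"]) != expected:
            raise ValueError("candidate map hash mismatch")
    return data

# Hypothetical candidate map: string ids mapped to rule bodies.
cmap = {"0": "edge(X,Z), edge(Z,Y)", "1": "edge(X,Y), edge(Y,X)"}
text = dump_artifact("reach(X,Y) :- edge(X,Z), edge(Z,Y).", cmap)
loaded = load_artifact(text)
```

Hashing a canonical encoding means that key order in the file does not affect verification, while any change to a candidate body does.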

GPU Contract

The training step loop obeys XLOG’s GPU-resident contract:

evaluate_device() — no host reads for semantic results
batch_fact_membership_device() — returns a CUDA bool mask via DLPack with zero semantic-loop device-to-host transfer
batch_tagged_credit_device() — returns CSR-style CUDA credit data via DLPack with zero semantic-loop device-to-host transfer
batch_fact_membership() / batch_tagged_credit() remain available when host materialization is desired
AtomicU64 device-to-host counter on CudaKernelProvider: a hard gate raises if any download_column_* call is observed during the step loop
host_transfer_stats() / reset_host_transfer_stats() expose broader host transfer accounting for profiling
Legacy set_rule_mask_sparse() still performs a control-plane soft-probability download; the selected-candidate sparse path avoids it
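
The counter-based hard gate can be modeled in a few lines of Python. This is a conceptual analogue only: the real counter is an AtomicU64 inside the Rust CudaKernelProvider, and the `TransferCounter` class and `no_host_downloads` context manager below are hypothetical names invented for this sketch.

```python
from contextlib import contextmanager

class TransferCounter:
    """Pure-Python stand-in for a device-to-host download counter."""
    def __init__(self):
        self.downloads = 0

    def record_download(self):
        self.downloads += 1

@contextmanager
def no_host_downloads(counter):
    """Raise if the counter advanced while the guarded block was running."""
    before = counter.downloads
    yield
    observed = counter.downloads - before
    if observed:
        raise RuntimeError(f"{observed} device-to-host download(s) observed in step loop")

counter = TransferCounter()

# A clean step: nothing is downloaded, so the gate stays silent.
with no_host_downloads(counter):
    pass

# A violating step: one download inside the guarded region trips the gate.
try:
    with no_host_downloads(counter):
        counter.record_download()
    gate_tripped = False
except RuntimeError:
    gate_tripped = True
```

Comparing the counter before and after the guarded region, instead of resetting it, leaves the running total intact for broader profiling.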

Testing

86+ static test functions across ILP Python test files (expanded by parametrized GA/beta gates)
Reliability gate: 20 consecutive train_only() runs must all converge (20/20 pass)
GA reliability gate: default 50-seed statistical run (test_ilp_ga_reliability.py)
GA performance/transfer tests: forward_p95_us + host transfer accounting (test_ilp_performance.py)
Dense/sparse parity: every sparse-path test has a debug_dense_mask=True variant
Rust-side: ilp_integration_tests.rs, ilp_kernel_tests.rs
CUDA certification: extract_nonzero_indices covered by kernel test suite
