main. The routes are unreleased beyond 0.9.2.
Design Pattern
Every factorized route follows the same pattern:- recognize a rule shape that can avoid a large intermediate;
- prove the needed key, width, and variable-layout constraints;
- choose a compact representation;
- run a CUDA route over that representation;
- install the result only after row-set or aggregate parity is preserved;
- decline to fallback when the contract is not met.
Aggregate-Fused WCOJ
Aggregate-fused WCOJ routes grouped aggregate shapes through kernels that reduce by a planned root variable. Instead of:- materialize all joined tuples;
- group by the root key;
- reduce values;
Free Join Frontiers
Free Join represents multiway execution as a frontier over sorted range tries:- relation columns are laid out so prefix probes can advance level by level;
- each frontier level binds another variable or proves that a branch has no compatible tuples;
- identity groups and probe filters avoid unnecessary expansion;
- count-by-root can multiply remaining trie-range lengths for private variables instead of enumerating each full binding.
Recursive Delta Factorization
Factorized recursive deltas target rules such as transitive closure, where a delta join can produce many witnesses for the same novel tuple. The dispatcher chooses among:| Route | Representation | Decline reason |
|---|---|---|
| Dense | Root-indexed bitvectors over a bounded domain. | Domain too large or unsupported shape. |
| Sparse | GPU open-addressing hash set over distinct candidates. | Table estimate or byte budget exceeds limits. |
| Legacy | Hash join followed by diff. | Fallback for unsupported or declined cases. |
Loss Vetoes
Factorized execution is not always better. The cost model can veto a route when available statistics show that the ordinary binary path should be cheaper or no prefix-key-compatible Free Join order is viable. Missing stats should not manufacture confidence. The route either uses a safe default, keeps the existing order, or declines.Device And Host Boundaries
The executor remains host-side control. It chooses routes, owns counters, and installs resulting relation buffers. CUDA kernels own the data-plane work: frontier expansion, grouped accumulation, dense bitvector novel-set computation, and sparse hash-set novel-set computation. Use transfer telemetry when a route claims no tracked host data movement in its hot loop.Verification Obligations
A factorized route is not complete until evidence shows:- fallback parity with the route disabled;
- dispatch counter increments with the route enabled;
- unsupported shapes decline cleanly;
- budget and overflow conditions fail closed;
- recursive routes converge to the same full relation as the legacy path;
- aggregate routes match the materialize-plus-groupby result.