Route Families
| Route | Status | Notes |
|---|---|---|
| Triangle WCOJ | Released in the 0.9.2 line | Dedicated route for recognized triangle bodies over supported key widths. |
| 4-cycle WCOJ | Released in the 0.9.2 line | Dedicated route for recognized 4-cycle bodies. |
| K-clique WCOJ | Released in the 0.9.2 line | Planned clique routes with variable-order metadata and helper-split support. |
| Aggregate-fused WCOJ | On main, unreleased beyond 0.9.2 | Computes selected grouped aggregates without materializing the full join output. |
| Free Join | On main, unreleased beyond 0.9.2 | Generalized GPU multiway route for eligible bodies that do not match a dedicated shape. |
| Factorized recursive deltas | On main, unreleased beyond 0.9.2 | Computes novel recursive tuples without materializing witness-multiplied delta joins. |
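
To make the aggregate-fused row concrete, the following is a minimal CPU sketch in Rust of counting triangles per root variable without storing any joined tuple. It is not the CUDA kernel. The function names, the adjacency-list layout, and the choice of `count` as the aggregate are assumptions made for illustration.

```rust
/// Size of the intersection of two sorted, duplicate-free slices,
/// computed with a two-pointer merge.
fn intersect_count(a: &[u32], b: &[u32]) -> u64 {
    let (mut i, mut j, mut n) = (0, 0, 0u64);
    while i < a.len() && j < b.len() {
        if a[i] < b[j] {
            i += 1;
        } else if a[i] > b[j] {
            j += 1;
        } else {
            n += 1;
            i += 1;
            j += 1;
        }
    }
    n
}

/// Hypothetical fused aggregate for the body `e(x, y), e(y, z), e(x, z)`
/// grouped by the root `x`. `adj[v]` lists the out-neighbours of `v`,
/// sorted and duplicate-free. Returns `counts[x]`, the number of (y, z)
/// pairs that close a triangle at `x`; no (x, y, z) tuple is materialized.
pub fn triangle_count_by_root(adj: &[Vec<u32>]) -> Vec<u64> {
    adj.iter()
        .map(|out_x| {
            out_x
                .iter()
                .map(|&y| intersect_count(&adj[y as usize], out_x))
                .sum::<u64>()
        })
        .collect()
}
```

A materialize-plus-groupby plan would first emit every `(x, y, z)` row and then group by `x`; the sketch keeps only one running count per root.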
Planning Pipeline
WCOJ starts in the compiler and finishes in the runtime:
- The lowerer emits ordinary RIR for the rule body.
- Optimizer passes preserve semantics and may reorder recognized shapes when statistics make a better inner pair clear.
- `promote_multiway` converts eligible bodies to `RirNode::MultiWayJoin`.
- The executor dispatches the multiway node through `wcoj_dispatch`.
- The CUDA provider runs the dedicated WCOJ, Free Join, aggregate-fused, or factorized-delta kernel if the final runtime gate accepts it.
- If a gate declines, the executor uses the embedded fallback route.
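
The last two steps amount to a gate that either accepts a dedicated route or declines to the embedded fallback. A minimal Rust sketch of that control flow follows. Every name here (`Route`, `GateDecision`, `MultiwayNode`, `runtime_gate`, `dispatch`) is hypothetical, and the rule that only 32-bit and 64-bit keys are accepted is an assumption, not the project's actual gate.

```rust
/// Hypothetical route labels; the real executor's types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Triangle,
    FourCycle,
    Clique,
    Fallback,
}

/// Outcome of the assumed final runtime gate.
#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    Accept(Route),
    Decline(&'static str),
}

/// Stand-in for a promoted multiway node: the shape recognized by the
/// compiler plus the key width observed at runtime.
pub struct MultiwayNode {
    pub recognized: Route,
    pub key_width_bits: u32,
}

/// Assumed gate rule, for illustration only: a dedicated shape is accepted
/// when its keys are 32 or 64 bits wide.
pub fn runtime_gate(node: &MultiwayNode) -> GateDecision {
    match (node.recognized, node.key_width_bits) {
        (Route::Fallback, _) => GateDecision::Decline("no dedicated shape"),
        (route, 32) | (route, 64) => GateDecision::Accept(route),
        _ => GateDecision::Decline("unsupported key width"),
    }
}

/// Returns the route that actually runs. A decline is not an error:
/// the embedded fallback plan is used instead.
pub fn dispatch(node: &MultiwayNode) -> Route {
    match runtime_gate(node) {
        GateDecision::Accept(route) => route,
        GateDecision::Decline(_reason) => Route::Fallback,
    }
}
```

The point of the design is that a decline still yields a valid plan, so callers never need to special-case it.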
Dispatch Counters
The executor exposes counters so you can distinguish route eligibility from route execution:
- triangle, 4-cycle, and clique WCOJ dispatch counts;
- WCOJ error-decline count;
- aggregate-fused groupby dispatch count;
- Free Join dispatch count on main;
- factorized recursive-delta dispatch count on main.
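
One way to use such counters is to snapshot them before and after a query and check that an optimized route actually ran. The Rust sketch below mirrors the list above with hypothetical field and function names; it does not reproduce the executor's real counter API.

```rust
/// Hypothetical counter set; field names are illustrative only.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct DispatchCounters {
    pub triangle: u64,
    pub four_cycle: u64,
    pub clique: u64,
    pub error_declines: u64,
    pub aggregate_fused_groupby: u64,
    pub free_join: u64,
    pub factorized_delta: u64,
}

impl DispatchCounters {
    /// Number of optimized dispatches that executed. Declines are excluded
    /// because a declined node ran on the fallback route.
    pub fn optimized_total(&self) -> u64 {
        self.triangle
            + self.four_cycle
            + self.clique
            + self.aggregate_fused_groupby
            + self.free_join
            + self.factorized_delta
    }
}

/// True only if at least one optimized route ran between two snapshots.
/// Matching query results cannot establish this, since the fallback
/// produces the same output.
pub fn optimized_route_ran(before: &DispatchCounters, after: &DispatchCounters) -> bool {
    after.optimized_total() > before.optimized_total()
}
```

A test that asserts on `optimized_route_ran` checks execution; a test that only compares results checks correctness.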
Aggregate Fusion
Aggregate-fused WCOJ exists only on main; it is not part of the 0.9.2 release line. It routes selected grouped aggregate shapes through kernels that reduce by a root variable directly instead of first materializing all joined tuples. The route is intentionally narrow: it accepts only the shapes, widths, and aggregate operators implemented by the CUDA provider. Other aggregates decline to the materialize-plus-groupby path or return the same error the fallback would produce.
Free Join
Free Join exists only on main; it is not part of the 0.9.2 release line. It handles broader multiway bodies with a frontier-based GPU algorithm instead of requiring a dedicated triangle or 4-cycle kernel. The planner can reorder inputs when the prefix-key constraints and statistics make a better route available. It can also decline when a candidate order would lose the factorized benefit or when required statistics are absent. Dedicated WCOJ shapes remain dedicated; Free Join is the general route, not a replacement for the specialized kernels.
Factorized Recursive Deltas
Factorized recursive deltas exist only on main; they are not part of the 0.9.2 release line. The route targets transitive-closure-shaped recursive rules where the delta step can compute novel tuples by root rather than materializing every witness. The dispatcher chooses between:
- a dense-domain bitvector route;
- a sparse-domain hash-set route;
- the legacy hash-join and diff path.
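
The three options can be sketched in Rust as follows. This is a CPU illustration under stated assumptions: `choose_route`, its `DENSE_LIMIT` threshold, `BitSet`, `novel_dense`, and `novel_sparse` are all hypothetical names, and the source does not specify the real selection criteria. The sketch shows why a per-root membership structure avoids witness multiplication: a target reached through several frontier vertices is emitted once.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum DeltaRoute {
    DenseBitvector,
    SparseHashSet,
    LegacyHashJoinDiff,
}

/// Assumed selection rule; the cutoff is invented for illustration.
pub fn choose_route(domain_size: usize, is_tc_shaped: bool) -> DeltaRoute {
    const DENSE_LIMIT: usize = 1 << 20; // hypothetical threshold
    if !is_tc_shaped {
        DeltaRoute::LegacyHashJoinDiff
    } else if domain_size <= DENSE_LIMIT {
        DeltaRoute::DenseBitvector
    } else {
        DeltaRoute::SparseHashSet
    }
}

/// Minimal fixed-size bitvector over the domain `0..n`.
pub struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    pub fn new(n: usize) -> Self {
        BitSet { words: vec![0; (n + 63) / 64] }
    }

    /// Sets bit `i` and returns true if it was previously clear.
    pub fn insert(&mut self, i: usize) -> bool {
        let (word, mask) = (i / 64, 1u64 << (i % 64));
        let was_clear = (self.words[word] & mask) == 0;
        self.words[word] |= mask;
        was_clear
    }
}

/// Dense route for one root: `reached` holds every vertex already known
/// reachable from the root. Returns targets that are new in this step.
pub fn novel_dense(adj: &[Vec<usize>], reached: &mut BitSet, frontier: &[usize]) -> Vec<usize> {
    let mut novel = Vec::new();
    for &v in frontier {
        for &w in &adj[v] {
            if reached.insert(w) {
                novel.push(w);
            }
        }
    }
    novel
}

/// Sparse route: the same step with a hash set as the membership structure.
pub fn novel_sparse(
    adj: &[Vec<usize>],
    reached: &mut HashSet<usize>,
    frontier: &[usize],
) -> Vec<usize> {
    let mut novel = Vec::new();
    for &v in frontier {
        for &w in &adj[v] {
            if reached.insert(w) {
                novel.push(w);
            }
        }
    }
    novel
}
```

A hash-join-and-diff plan would instead produce one joined row per (frontier vertex, target) pair and remove duplicates and known tuples afterwards.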
What Not To Claim
- Do not describe WCOJ as the universal join engine.
- Do not describe Free Join or factorized deltas as released in 0.9.2.
- Do not infer optimized dispatch from result equality.
- Do not mark a fallback as failure when the route contract says to decline cleanly.