Worst-case optimal join (WCOJ) support is XLOG’s route family for multiway join shapes where a binary join chain can create avoidable intermediate blow-up. The runtime recognizes eligible RIR bodies, dispatches shape-specific CUDA kernels, and falls back to ordinary execution when a route is not applicable. The contract is opportunistic: a WCOJ decline must preserve the same row set through the fallback path. A successful answer does not prove WCOJ fired; use the executor counters.

Route Families

RouteStatusNotes
Triangle WCOJReleased in the 0.9.2 lineDedicated route for recognized triangle bodies over supported key widths.
4-cycle WCOJReleased in the 0.9.2 lineDedicated route for recognized 4-cycle bodies.
K-clique WCOJReleased in the 0.9.2 linePlanned clique routes with variable-order metadata and helper-split support.
Aggregate-fused WCOJOn main, unreleased beyond 0.9.2Computes selected grouped aggregates without materializing the full join output.
Free JoinOn main, unreleased beyond 0.9.2Generalized GPU multiway route for eligible bodies that do not match a dedicated shape.
Factorized recursive deltasOn main, unreleased beyond 0.9.2Computes novel recursive tuples without materializing witness-multiplied delta joins.

Planning Pipeline

WCOJ starts in the compiler and finishes in the runtime:
  1. The lowerer emits ordinary RIR for the rule body.
  2. Optimizer passes preserve semantics and may reorder recognized shapes when statistics make a better inner pair clear.
  3. promote_multiway converts eligible bodies to RirNode::MultiWayJoin.
  4. The executor dispatches the multiway node through wcoj_dispatch.
  5. The CUDA provider runs the dedicated WCOJ, Free Join, aggregate-fused, or factorized-delta kernel if the final runtime gate accepts it.
  6. If a gate declines, the executor uses the embedded fallback route.
Runtime checks include body shape, key width, variable layout, relation availability, stream/runtime support, memory budget, and kill switches.

Dispatch Counters

The executor exposes counters so you can distinguish route eligibility from route execution:
  • triangle, 4-cycle, and clique WCOJ dispatch counts;
  • WCOJ error-decline count;
  • aggregate-fused groupby dispatch count;
  • Free Join dispatch count on main;
  • factorized recursive-delta dispatch count on main.
Counters should be paired with kill-switch parity checks. If a kill switch changes only which route fires and not the result, the optimized route is preserving semantics for that workload.

Aggregate Fusion

Aggregate-fused WCOJ is main-only beyond 0.9.2. It routes selected grouped aggregate shapes through kernels that reduce by a root variable directly instead of first materializing all joined tuples. The route is intentionally narrow. It accepts only the shapes, widths, and aggregate operators implemented by the CUDA provider. Other aggregates decline to the materialize-plus-groupby path or return the same error the fallback would produce.

Free Join

Free Join is main-only beyond 0.9.2. It handles broader multiway bodies with a frontier-based GPU algorithm instead of requiring a dedicated triangle or 4-cycle kernel. The planner can reorder inputs when the prefix-key constraints and statistics make a better route available. It can also decline when a candidate order would lose the factorized benefit or when required statistics are absent. Dedicated WCOJ shapes remain dedicated; Free Join is the general route, not a replacement for the specialized kernels.

Factorized Recursive Deltas

Factorized recursive deltas are main-only beyond 0.9.2. The route targets transitive-closure-shaped recursive rules where the delta step can compute novel tuples by root rather than materializing every witness. The dispatcher chooses between:
  • a dense-domain bitvector route;
  • a sparse-domain hash-set route;
  • the legacy hash-join and diff path.
Domain size, table budget, arity, key width, and rule shape decide whether the factorized route fires. Unsupported cases decline to the existing recursive evaluation path.

What Not To Claim

  • Do not describe WCOJ as the universal join engine.
  • Do not describe Free Join or factorized deltas as released in 0.9.2.
  • Do not infer optimized dispatch from result equality.
  • Do not mark a fallback as failure when the route contract says to decline cleanly.
Use Factorized Execution for the user guide and Factorized Execution Internals for the main-only implementation details.