v0.6.2 minimal env-gated GPU 3-way WCOJ triangle dispatch.
Single public entry in two forms: try_wcoj_triangle_u32_dispatch (env-driven) and try_wcoj_triangle_u32_dispatch_with_gate (boolean-driven, for tests). The slice is intentionally narrow:
- Env flag only. XLOG_USE_WCOJ_TRIANGLE_U32=1 (or true/TRUE) opts in. Anything else (unset, 0, false, etc.) means the helper returns Ok(None) unconditionally, and the caller takes the existing binary-join path.
- Recognizes exactly one shape. A rule of the form tri(X, Y, Z) :- e1(X, Y), e2(Y, Z), e3(X, Z) over 2-column WCOJ-eligible relations (U32, Symbol, or U64 keys; see WcojKeyWidth): three positive 2-arity body atoms covering the head’s three distinct variables in head-position order. No negation, no comparison filters, no recursion (head predicate not in body), no reversed-axis atoms (e.g. e1(Y, X)), no constants in atom args. The planner must also return [xlog_logic::hypergraph::RulePlan::MultiwayCandidate].
- Width uniformity. All three slots must share a key width. A mixed-width triangle (e.g. e1 U32, e2 U64) is rejected at this dispatch level; the binary-join chain handles it.
- Silent fallback. Any mismatch (gate off, shape mismatch, planner verdict not multiway, missing input buffer, unsupported scalar type, mixed-width slots) returns Ok(None) without an error or log line. The caller is expected to silently route to the existing binary-join path. This keeps the env flag truly opt-in and prevents the helper from accidentally diverting work it can’t handle.
- Strict GPU pipeline on dispatch. When all checks pass, the helper builds three sorted+deduped layouts and runs the matching WCOJ triangle kernel on the configured launch_stream: wcoj_layout_u32_recorded / wcoj_triangle_u32_recorded for 4-byte keys, the _u64_recorded siblings for 8-byte keys. All [xlog_cuda::launch::LaunchRecorder] discipline carries through unchanged.
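The shape rule in the list above can be illustrated with a small standalone check. This is only a sketch under stated assumptions: `Atom` and `is_triangle_shape` are hypothetical names invented for illustration, not types or functions from this crate, and the sketch assumes the body atoms appear in the order (X, Y), (Y, Z), (X, Z). Negation, comparison filters, constants, and the planner verdict are not modelled here.

```rust
/// Hypothetical stand-in for a rule atom: a predicate name plus variable
/// names. Not the crate's real rule representation.
#[derive(Debug, Clone, PartialEq)]
pub struct Atom {
    pub pred: String,
    pub args: Vec<String>,
}

impl Atom {
    pub fn new(pred: &str, args: &[&str]) -> Self {
        Atom {
            pred: pred.to_string(),
            args: args.iter().map(|a| a.to_string()).collect(),
        }
    }
}

/// Returns true only for `h(X, Y, Z) :- a(X, Y), b(Y, Z), c(X, Z)`
/// with three distinct head variables and no recursion.
pub fn is_triangle_shape(head: &Atom, body: &[Atom]) -> bool {
    if head.args.len() != 3 || body.len() != 3 {
        return false;
    }
    let (x, y, z) = (&head.args[0], &head.args[1], &head.args[2]);
    // The head must bind three distinct variables.
    if x == y || y == z || x == z {
        return false;
    }
    // No recursion: the head predicate must not occur in the body.
    if body.iter().any(|a| a.pred == head.pred) {
        return false;
    }
    // Each body atom must be 2-ary and use its variables in head-position
    // order, so a reversed-axis atom such as e1(Y, X) is rejected.
    let expected = [[x, y], [y, z], [x, z]];
    body.iter().zip(expected.iter()).all(|(atom, want)| {
        atom.args.len() == 2 && &atom.args[0] == want[0] && &atom.args[1] == want[1]
    })
}
```

A caller following the silent-fallback contract would treat a `false` result as "not handled" and stay on the binary-join path rather than raise an error.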
What this slice deliberately does NOT do:
- No automatic detection at the executor level: callers pass the rule + input buffers explicitly. Executor wiring lives in xlog-runtime.
- No recursion / SCC mixed execution.
- No cost model.
- No mixed-width admission (U32+U64 triangle stays on the binary-join path).
- No histogram-guided block dispatch.
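The mixed-width exclusion amounts to a uniformity check over the three slots. A minimal sketch, assuming a hypothetical `KeyWidth` enum with one variant per key size (the real `WcojKeyWidth` type and its variants are not shown on this page, and the mapping of Symbol keys to a width is deliberately left out):

```rust
/// Hypothetical key-width tag; a stand-in for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyWidth {
    Bytes4, // e.g. U32 keys
    Bytes8, // e.g. U64 keys
}

/// Returns the shared width if all three slots agree, otherwise `None`,
/// which a dispatcher would translate into the binary-join fallback.
pub fn uniform_width(slots: [KeyWidth; 3]) -> Option<KeyWidth> {
    if slots[0] == slots[1] && slots[1] == slots[2] {
        Some(slots[0])
    } else {
        None
    }
}
```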
Constants
- ENV_USE_WCOJ_TRIANGLE_U32: Env variable controlling the dispatch gate. Treated as ON when set to "1" or case-insensitive "true"; anything else (unset, "0", "false", empty string, …) means OFF.
Functions
- try_wcoj_triangle_u32_dispatch: Env-driven entry. Reads XLOG_USE_WCOJ_TRIANGLE_U32 and delegates to try_wcoj_triangle_u32_dispatch_with_gate.
- try_wcoj_triangle_u32_dispatch_with_gate: Test-friendly form that takes the gate as an explicit boolean. Production callers use try_wcoj_triangle_u32_dispatch, which reads the env var.
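The relationship between the two forms, and the way a caller is expected to treat Ok(None), can be sketched as follows. Everything here is a stand-in: the real function signatures, argument types, and result payload are not shown on this page, so `TriangleRows`, `DispatchError`, `try_dispatch`, `try_dispatch_with_gate`, `evaluate`, and the `shape_ok` flag are hypothetical and only mirror the documented pattern (env-driven wrapper delegating to a boolean-gated core, with Ok(None) meaning "not handled").

```rust
/// Hypothetical result payload (the real output type is not documented here).
#[derive(Debug, PartialEq)]
pub struct TriangleRows(pub Vec<(u32, u32, u32)>);

/// Hypothetical error type.
#[derive(Debug, PartialEq)]
pub struct DispatchError(pub String);

/// Boolean-gated core. `shape_ok` stands in for all eligibility checks.
/// Any mismatch yields `Ok(None)`, never an error.
pub fn try_dispatch_with_gate(
    gate: bool,
    shape_ok: bool,
) -> Result<Option<TriangleRows>, DispatchError> {
    if !gate || !shape_ok {
        return Ok(None);
    }
    // Placeholder result; a real implementation would launch the GPU kernel.
    Ok(Some(TriangleRows(vec![(1, 2, 3)])))
}

/// Env-driven wrapper: reads the gate variable, then delegates.
pub fn try_dispatch(shape_ok: bool) -> Result<Option<TriangleRows>, DispatchError> {
    let gate = matches!(
        std::env::var("XLOG_USE_WCOJ_TRIANGLE_U32"),
        Ok(v) if v == "1" || v.eq_ignore_ascii_case("true")
    );
    try_dispatch_with_gate(gate, shape_ok)
}

/// Caller side: `Ok(None)` routes to the existing binary-join path.
pub fn evaluate(gate: bool, shape_ok: bool) -> Result<&'static str, DispatchError> {
    match try_dispatch_with_gate(gate, shape_ok)? {
        Some(_rows) => Ok("wcoj"),
        None => Ok("binary-join"),
    }
}
```

Tests can exercise the boolean-gated form directly, which avoids setting process environment variables from test code.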