Skip to main content

Executor

Struct Executor 

Source
pub struct Executor { /* private fields */ }
Expand description

Query executor that interprets RIR nodes using GPU kernels

The executor processes execution plans by iterating through strata and executing RIR node trees. It maintains a relation store for intermediate and final results.

§Example

use std::sync::Arc;
use xlog_runtime::Executor;
use xlog_cuda::CudaKernelProvider;

let provider = Arc::new(CudaKernelProvider::new(device, memory)?);
let mut executor = Executor::new(provider);

// Execute a plan
let result = executor.execute_plan(&plan)?;

Implementations§

Source§

impl Executor

Source

pub fn epistemic_gpu_runtime_counters(&self) -> EpistemicGpuRuntimeCounters

Snapshot runtime counters used by epistemic GPU certification.

Source

pub fn allocate_epistemic_gpu_workspace( &self, plan: &EpistemicGpuPlan, capacities: EpistemicGpuWorkspaceCapacities, ) -> Result<EpistemicGpuWorkspace>

Allocate GPU-resident buffers required by an epistemic GPU plan.

Source

pub fn reset_epistemic_gpu_workspace( &self, workspace: &mut EpistemicGpuWorkspace, ) -> Result<EpistemicGpuWorkspaceResetTrace>

Zero every epistemic workspace buffer on device before hot-path use.

Source

pub fn generate_epistemic_gpu_candidates( &self, workspace: &mut EpistemicGpuWorkspace, literal_count: usize, candidate_count: usize, ) -> Result<EpistemicGpuCandidateGenerationTrace>

Generate candidate-assumption bitsets directly into the GPU workspace.

Source

pub fn propagate_epistemic_gpu_candidates( &self, workspace: &mut EpistemicGpuWorkspace, literal_count: usize, candidate_count: usize, ) -> Result<EpistemicGpuPropagationTrace>

Propagate generated candidates into GPU-resident world-view staging buffers.

Source

pub fn validate_epistemic_gpu_candidates( &self, workspace: &mut EpistemicGpuWorkspace, literal_count: usize, candidate_count: usize, ) -> Result<EpistemicGpuCandidateValidationTrace>

Validate staged candidate bitsets and world-view activity on device.

Source

pub fn populate_epistemic_gpu_model_membership( &self, workspace: &mut EpistemicGpuWorkspace, output: &CudaBuffer, literal_count: usize, candidate_count: usize, reduction_count: usize, models_per_reduction: usize, ) -> Result<EpistemicGpuModelMembershipTrace>

Populate candidate-scoped model-membership staging buffers on device.

Source

pub fn populate_epistemic_gpu_model_membership_from_tuple_sources( &self, workspace: &mut EpistemicGpuWorkspace, output: &CudaBuffer, gpu_plan: &EpistemicGpuPlan, candidate_count: usize, models_per_reduction: usize, ) -> Result<EpistemicGpuModelMembershipTrace>

Populate model-membership bytes from reduced stable-model tuple sources.

Source

pub fn validate_epistemic_gpu_world_views( &self, workspace: &mut EpistemicGpuWorkspace, gpu_plan: &EpistemicGpuPlan, candidate_count: usize, models_per_reduction: usize, ) -> Result<EpistemicGpuWorldViewValidationTrace>

Validate staged model memberships against candidate world views on device.

Source

pub fn validate_epistemic_gpu_world_view_constraints( &self, workspace: &mut EpistemicGpuWorkspace, gpu_plan: &EpistemicGpuPlan, candidate_count: usize, ) -> Result<EpistemicGpuConstraintWorldViewValidationTrace>

Prune accepted candidate world views that satisfy an epistemic integrity constraint body.

Runs after Self::validate_epistemic_gpu_world_views: each surviving candidate’s assumption bit equals the negation-folded observed modal value of its literal, so a constraint body holds in this accepted world view exactly when every referenced literal’s assumption bit is set. Such candidates are pruned on device with the world-view constraint-violation rejection code; no accepted world is read back to the host.

Source

pub fn materialize_epistemic_gpu_candidates( &self, workspace: &mut EpistemicGpuWorkspace, candidate_count: usize, ) -> Result<EpistemicGpuMaterializationTrace>

Materialize accepted candidate flags into the GPU world-view buffer.

Source

pub fn materialize_epistemic_gpu_final_results( &self, workspace: &mut EpistemicGpuWorkspace, output: &CudaBuffer, candidate_count: usize, ) -> Result<EpistemicGpuFinalResultMaterializationTrace>

Materialize final result flags from the reduced runtime output row count.

Source

pub fn materialize_epistemic_gpu_final_tuples( &self, workspace: &mut EpistemicGpuWorkspace, output: &CudaBuffer, gpu_plan: &EpistemicGpuPlan, literal_count: usize, candidate_count: usize, reduction_count: usize, models_per_reduction: usize, ) -> Result<(CudaBuffer, EpistemicGpuFinalTupleMaterializationTrace)>

Materialize final query tuples into a device-resident output buffer.

Source

pub fn prepare_epistemic_gpu_execution( &self, executable: &EpistemicExecutablePlan, capacities: EpistemicGpuWorkspaceCapacities, ) -> Result<EpistemicGpuPreparedExecution>

Prepare runtime-owned GPU buffers for an epistemic executable plan.

Source

pub fn materialize_epistemic_head_relation( &mut self, name: &str, gated_output: &CudaBuffer, ) -> Result<()>

Materialize a stratum’s GATED epistemic head output into the relation store as a base relation, for stratified epistemic execution.

After a lower stratum computes its modal-gated head extension (the final_output/additional-head buffer), the higher stratum’s know/ possible over that head must read the GATED extension — not the ungated reduced relation the reduced runtime plan leaves in the store. This OVERWRITES the store relation under name with a device-side clone of the gated buffer, so the existing tuple-membership filter (which reads the source relation from the store by predicate name) gates the higher stratum against the correct extension. No resolve-into-body is performed, so there is no double-gating against the GPU world-view filter.

Source

pub fn clone_store_relation(&self, buffer: &CudaBuffer) -> Result<CudaBuffer>

Device-side clone of a store-resident relation buffer, for surfacing a stratified ordinary stratum’s output as a query result without moving it out of the store.

Source

pub fn execute_epistemic_gpu_execution( &mut self, executable: &EpistemicExecutablePlan, capacities: EpistemicGpuWorkspaceCapacities, ) -> Result<EpistemicGpuExecutionResult>

Execute the reduced production runtime plan and capture epistemic GPU evidence.

Source

pub fn execute_epistemic_gpu_execution_batch( &mut self, executables: &[&EpistemicExecutablePlan], capacities: EpistemicGpuWorkspaceCapacities, ) -> Result<Vec<EpistemicGpuExecutionResult>>

Execute multiple accepted epistemic GPU executable plans in order.

This is the runtime adapter used by split execution evidence: each component is still dispatched through Self::execute_epistemic_gpu_execution, so candidate generation, model-membership, world-view validation, materialization, transfer-budget, and production runtime counters are recorded by the existing single-plan path.

Source

pub fn execute_epistemic_gpu_execution_batch_with_trace( &mut self, executables: &[&EpistemicExecutablePlan], capacities: EpistemicGpuWorkspaceCapacities, ) -> Result<EpistemicGpuBatchExecutionResult>

Execute multiple epistemic GPU executable plans and return an aggregate trace.

This is used by split-execution certification: every component still routes through the existing single-plan GPU runtime path, and the batch trace only aggregates those component traces. It does not perform CPU recomposition.

Source§

impl Executor

Source

pub fn execute_filter( &self, input: &CudaBuffer, predicate: &Expr, ) -> Result<CudaBuffer>

Execute a Filter node using GPU predicate evaluation.

Source§

impl Executor

Source

pub fn execute_node(&mut self, node: &RirNode) -> Result<CudaBuffer>

Execute a single RIR node tree

Recursively evaluates the node and its children, returning the result as a GPU buffer.

§Arguments
  • node - The RIR node to execute
§Returns

A CudaBuffer containing the result of the node execution

§Errors

Returns an error if the node execution fails

Source§

impl Executor

Source

pub fn execute_stratum(&mut self, _stratum: &Stratum) -> Result<()>

Stub: always returns an error directing callers to use execute_plan instead.

Source

pub fn execute_non_recursive_scc( &mut self, rules: &[CompiledRule], ) -> Result<()>

Execute all rules in a non-recursive strongly connected component once.

Source

pub fn execute_recursive_scc(&mut self, rules: &[CompiledRule]) -> Result<()>

Execute a recursive SCC using semi-naive fixpoint iteration

The algorithm:

  1. Execute all rules once to get initial result
  2. Track which relations changed (delta)
  3. Re-execute rules, using delta from previous iteration
  4. Repeat until no changes (fixpoint reached)
Source§

impl Executor

Source

pub fn apply_deltas_and_recompute( &mut self, plan: &ExecutionPlan, deltas: &HashMap<String, RelationDelta>, ) -> Result<DeltaRecomputeStats>

Apply base-relation deltas and recompute affected SCCs (no recompilation).

This provides correctness for both insertions and deletions by recomputing any SCCs that depend (directly or transitively) on the changed relations.

Source§

impl Executor

Source

pub fn wcoj_triangle_dispatch_count(&self) -> u64

Number of times the WCOJ triangle hook produced a result and the executor installed it. Used by tests to assert that the WCOJ path actually ran (vs. silently falling back to the existing binary-join path with the same answer).

Source

pub fn wcoj_error_decline_count(&self) -> u64

Number of WCOJ pipeline errors (layout or kernel failures, across triangle / 4-cycle / k-clique / chain hooks) that were converted into binary-join declines. Healthy dispatch keeps this at 0; a nonzero value is the signature of a regressed WCOJ pipeline hiding behind the silent-fallback contract. Set XLOG_WCOJ_STRICT=1 to propagate such errors instead of declining.

Source

pub fn free_join_dispatch_count(&self) -> u64

Count of times the generalized Free Join dispatch produced the installed result (vs. the embedded binary fallback).

Source

pub fn factorized_delta_dispatch_count(&self) -> u64

Count of times the factorized recursive-delta dispatch produced the installed novel set (vs. the legacy hash-join -> diff path).

Source

pub fn wcoj_groupby_fusion_dispatch_count(&self) -> u64

Count of times the fused group-by-root count hook produced a result and the executor installed it (vs. silently falling back to the materialize+groupby path with the same answer).

Source

pub fn wcoj_4cycle_dispatch_count(&self) -> u64

Count of times the WCOJ 4-cycle hook produced a result and the executor installed it. Tracked separately from triangle so tests can pin which shape dispatched.

Source

pub fn chain_dispatch_count(&self) -> u64

Count of times a two-atom ChainJoin routed through the chain dispatcher instead of the embedded binary fallback.

Source

pub fn nested_loop_dispatch_count(&self) -> u64

Count of times execute_join routed an inner-join to the nested-loop provider entry point because the eligibility predicate + Cartesian-product threshold both held. Tests use this counter to assert that the nested-loop path actually fired vs. silently falling back to hash with the same answer.

Source

pub fn prepare_leader_inputs( &self, canonical: &[&CudaBuffer], var_order: &VariableOrder, launch_stream: StreamId, ) -> Result<Vec<CudaBuffer>>

Produce owned, materialized kernel slot inputs from a canonical-order input array and a VariableOrder.

Public runtime helper. Production callers are run_wcoj_*_pipeline_with_leader_order (this module); runtime tests in crates/xlog-runtime/tests/test_leader_input_permutation_tables.rs invoke it directly to assert per-slot schema + content against a CPU reference. Public visibility is intentional: there is no other reasonable seam for tests to inspect rotation + col-swap behavior, and the helper has well-defined owned-buffer semantics that external callers can rely on.

Returns a Vec<CudaBuffer> of length canonical.len() (3 for triangle, 4 for 4-cycle). Slot 0 is the leader; slots 1.. follow var_order.lookup_perms[i].input_idx mapping. Triangle non-default leaders may col-swap selected slots per the locked permutation table; 4-cycle is rotation-only and rejects swap requests with a kernel error.

Each returned CudaBuffer is owned: swapped slots are DtoD-copied via wcoj_project_2col_swap_recorded; non- swapped slots use the double-swap clone path below to give every slot a uniform owned-buffer return type.

Lifetime contract: returned buffers are independent of canonical[*]. Callers may pass references through to wcoj_layout_*_recorded without aliasing concerns.

Source

pub fn wcoj_dispatch_stream_or_init(&self) -> Option<StreamId>

Resolve the cached WCOJ launch stream, lazily initializing it on first call by acquiring one stream from the runtime pool. Subsequent calls reuse the same stream — mirrors xlog_cuda::CudaKernelProvider::recorded_op_stream (provider/mod.rs).

Shared across WCOJ shapes: triangle and 4-cycle dispatch both go through this resolver and reuse the same stream. Renamed from wcoj_triangle_stream_or_init when 4-cycle dispatch landed.

Returns None only when (a) the manager has no runtime, or (b) the very first acquisition fails (pool already at cap from other consumers). After that first success the cached id keeps resolving for the executor’s lifetime.

Source§

impl Executor

Source

pub fn wcoj_clique5_dispatch_count(&self) -> u64

Number of times the WCOJ k=5-clique hook produced a result and the executor installed it. Counter does NOT advance on dispatcher decline / kernel-launch failure (silent fallback to MultiWayJoin.fallback).

Source

pub fn wcoj_clique6_dispatch_count(&self) -> u64

Number of times the WCOJ k=6-clique hook produced a result. Same observability contract as wcoj_clique5_dispatch_count.

Source

pub fn wcoj_clique7_dispatch_count(&self) -> u64

Number of times the WCOJ k=7-clique hook produced a result. Same observability contract as wcoj_clique5_dispatch_count.

Source

pub fn wcoj_clique8_dispatch_count(&self) -> u64

Number of times the WCOJ k=8-clique hook produced a result. Same observability contract as wcoj_clique5_dispatch_count.

Source

pub fn kclique_histogram_refresh_count(&self) -> u64

Number of recursive merge boundaries where K-clique metadata was marked for refresh.

Source

pub fn kclique_histogram_refresh_nanos(&self) -> u128

Cumulative recursive K-clique metadata refresh accounting time in nanoseconds.

Source§

impl Executor

Source

pub fn new(provider: Arc<CudaKernelProvider>) -> Self

Create a new executor with the given kernel provider

§Arguments
  • provider - The CUDA kernel provider for GPU operations
Source

pub fn new_with_config( provider: Arc<CudaKernelProvider>, config: RuntimeConfig, ) -> Self

Create a new executor with the given kernel provider and runtime config

Source

pub fn set_profiling(&mut self, enabled: bool)

Enable or disable the performance profiler

When enabled, execution statistics will be collected for –stats output.

Source

pub fn is_profiling(&self) -> bool

Check if profiling is enabled

Source

pub fn execution_stats(&self, total_output_rows: u64) -> ExecutionStats

Get execution statistics

Returns collected statistics if profiling was enabled.

Source

pub fn store(&self) -> &RelationStore

Get a reference to the relation store

Source

pub fn store_mut(&mut self) -> &mut RelationStore

Get a mutable reference to the relation store

Source

pub fn ilp_registry_mut(&mut self) -> &mut IlpRegistry

Get a mutable reference to the ILP registry.

Source

pub fn ilp_registry(&self) -> &IlpRegistry

Get a shared reference to the ILP registry.

Source

pub fn ilp_last_result(&self) -> Option<&IlpTaggedResult>

Get the last ILP tagged result.

Source

pub fn put_relation(&mut self, name: &str, buffer: CudaBuffer)

Store a relation buffer and invalidate join indices.

Source

pub fn stats(&self) -> &StatsManager

Get a reference to the runtime statistics manager

Source

pub fn join_index_cache_stats(&self) -> JoinIndexCacheStats

Return persistent join-index manager telemetry.

Source

pub fn reset_for_mc(&mut self)

Reset executor state for Monte Carlo sampling.

Clears relation storage and join index cache while preserving relation registrations.

Source

pub fn reset_for_mc_relations( &mut self, preserve: &[&str], clear_to_empty: &[(&str, Schema)], ) -> Result<()>

Targeted MC reset: preserve base/static relations and clear dynamic ones.

Unlike Self::reset_for_mc which drops all relations, this method keeps the relations listed in preserve untouched, removes every other relation, then re-creates the relations specified in clear_to_empty as empty GPU buffers with the given schemas. The join-index cache is fully invalidated because dynamic relations have changed.

§Arguments
  • preserve - Relation names to keep as-is (base/static facts).
  • clear_to_empty - (name, schema) pairs for dynamic relations that should be present but empty after the reset.
Source

pub fn reset_for_ilp(&mut self)

Reset executor state for ILP attempt reuse.

Clears ILP registry (masks + tagged results), relation storage, join index cache, stats, and profiler. Preserves relation name registrations (rel_names, name_to_rel) since those are immutable compile artifacts.

Source

pub fn stats_mut(&mut self) -> &mut StatsManager

Get a mutable reference to the runtime statistics manager

Source

pub fn stats_snapshot(&self) -> StatsSnapshot

Capture a runtime statistics snapshot, including predicate name mappings.

Use this snapshot to seed the compiler/optimizer on subsequent compilations.

Source

pub fn common_subexpression_stats(&self) -> &CommonSubexpressionStats

Return runtime CSE telemetry for evidence and diagnostics.

Source

pub fn adaptive_reoptimization_stats(&self) -> &AdaptiveReoptimizationStats

Return adaptive re-optimization telemetry for evidence and diagnostics.

Source

pub fn replay_adaptive_reoptimization_decision( &self, observations: &[AdaptiveJoinObservation], ) -> AdaptiveReoptimizationDecision

Replay the deterministic adaptive decision against captured telemetry.

Source

pub fn register_relation(&mut self, rel_id: RelId, name: &str)

Register a relation name for a RelId

This mapping is used when executing Scan nodes to look up relations by their RelId.

§Arguments
  • rel_id - The relation identifier
  • name - The name to associate with the relation
Source

pub fn execute_plan_with_adaptive_candidate( &mut self, baseline_plan: &ExecutionPlan, candidate_plan: &ExecutionPlan, ) -> Result<CudaBuffer>

Execute a baseline plan and conditionally adopt a compiler-supplied re-optimized candidate plan.

The baseline runs through the normal Self::execute_plan path first, producing runtime join telemetry. If deterministic mis-plan thresholds fire and adaptive re-optimization is enabled, the candidate also runs through Self::execute_plan. Candidate outputs are compared on the GPU with deterministic full-row set difference; divergent or failing candidates roll back to the baseline relation/statistics snapshot.

Source

pub fn execute_plan(&mut self, plan: &ExecutionPlan) -> Result<CudaBuffer>

Execute a complete execution plan

Iterates through strata in order, executing each one. Returns the result of the final query if present, or an empty buffer.

§Arguments
  • plan - The execution plan to execute
§Returns

The result buffer from executing the plan

§Errors

Returns an error if any stratum or query execution fails

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,