Optimization Passes

Optimization passes operate on the linear kernel IR. They are divided into always-on passes (run before every compilation) and autotuned passes (searched at runtime for the best variant).

The design goal of the small opset is that new passes are easy to write. There is no fixed number zyx aims to have — passes will be added over time as performance opportunities are identified.

Always-On Optimizations

These run in a fixed pipeline before every kernel compilation:

pub fn run_always_on_optimizations(&mut self) {
    self.constant_folding();
    self.move_constants_to_beginning();
    self.loop_invariant_code_motion();
    self.common_subexpression_elimination();
    self.fold_accs();
    self.delete_empty_loops();
    self.dead_code_elimination();
}

Constant Folding

Evaluates expressions where all inputs are constants. For example, 2 + 3 becomes 5 and sin(0.0) becomes 0.0.

Motion of Constants to Beginning

Moves all constant definitions to the start of the kernel. This creates better fusion opportunities and simplifies register allocation.

Loop-Invariant Code Motion (LICM)

Hoists operations that produce the same value on every iteration to before the loop.

Common Subexpression Elimination (CSE)

Reuses results of identical subexpressions by detecting them through hashing.

Accumulator Folding

Simplifies accumulator update patterns for more efficient reduction code generation.

Dead Code Elimination

Removes ops whose results are never used. This is always the final pass — it ensures backends never receive unreferenced ops.

The autotune system clones the kernel, applies optimization variants, and evaluates each separately. No egraphs — just clone, transform, hash, evaluate. The cost function can evaluate thousands of variants per second.

const AVAILABLE_OPTIMIZATIONS: [OptConfigFn; 6] = [
    Kernel::opt_reassociate_commutative,
    Kernel::opt_split_global_to_local,
    Kernel::opt_upcast,
    Kernel::opt_register_blocking,
    Kernel::opt_tiled_reduce,
    Kernel::opt_split_loop,
];

Start with the initial kernel and run always-on optimizations
Apply ONE optimization variant and run always-on optimizations
Hash the kernel — skip if already visited
Evaluate with cost function (or launch and time)
Repeat by combining with existing optimization sequences
Select the best variant based on actual timing or cost estimate

Correctness Guarantee

No optimization is needed for tests to pass. All tests must pass no matter which sequence of optimizations (including empty) is applied.

If an optimization breaks a test, the optimization is buggy — not the sequence ordering. The verify pass (verify.rs) checks internal IR consistency in debug mode.

Optimization Passes

Always-On Optimizations

Constant Folding

Motion of Constants to Beginning

Loop-Invariant Code Motion (LICM)

Common Subexpression Elimination (CSE)

Accumulator Folding

Delete Empty Loops

Dead Code Elimination

Autotuned Passes

Reassociation

Split Global to Local

Upcast Vectorization

Register Blocking

Tiled Reduction

Loop Splitting

How Autotuning Searches

Correctness Guarantee

Keyboard shortcuts