The Kernel IR

The kernel IR is the intermediate representation used for all computation kernels. It is a doubly-linked list of 32-byte OpNodes stored in an arena (the Slab allocator).

Data Structure

pub struct Kernel {
    pub ops: Slab<OpId, OpNode>,
    pub head: OpId,
    pub tail: OpId,
    // ...
}

pub struct OpNode {
    pub prev: OpId,  // u32
    pub next: OpId,  // u32
    pub op: Op,      // 24 bytes (enum + payload)
}

The Slab<OpId, OpNode> is a Vec<OpNode> with a free-list. OpId is a u32 index — random access is O(1).
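
To make the "Vec with a free-list" idea concrete, here is a minimal sketch of such an arena. It is not the project's actual Slab: the single type parameter, the Slot enum, and the insert/remove/get method names are assumptions chosen for illustration. It only shows how freed slots can be recycled while lookups stay O(1).

```rust
// Illustrative sketch only: names and signatures are assumptions,
// not the real Slab<OpId, OpNode> API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OpId(pub u32);

enum Slot<T> {
    Occupied(T),
    // A free slot stores the index of the next free slot, if any.
    Free(Option<u32>),
}

pub struct Slab<T> {
    slots: Vec<Slot<T>>,
    free_head: Option<u32>,
}

impl<T> Slab<T> {
    pub fn new() -> Self {
        Slab { slots: Vec::new(), free_head: None }
    }

    /// Reuse a freed slot if one exists, otherwise grow the Vec.
    pub fn insert(&mut self, value: T) -> OpId {
        match self.free_head {
            Some(i) => {
                let next = match &self.slots[i as usize] {
                    Slot::Free(next) => *next,
                    Slot::Occupied(_) => unreachable!("free list points at a live slot"),
                };
                self.free_head = next;
                self.slots[i as usize] = Slot::Occupied(value);
                OpId(i)
            }
            None => {
                self.slots.push(Slot::Occupied(value));
                OpId((self.slots.len() - 1) as u32)
            }
        }
    }

    /// Free a slot and push it onto the front of the free list.
    pub fn remove(&mut self, id: OpId) -> Option<T> {
        let slot = self.slots.get_mut(id.0 as usize)?;
        if let Slot::Free(_) = slot {
            return None;
        }
        let old = std::mem::replace(slot, Slot::Free(self.free_head));
        self.free_head = Some(id.0);
        match old {
            Slot::Occupied(v) => Some(v),
            Slot::Free(_) => None,
        }
    }

    /// O(1) random access: a bounds check plus one Vec index.
    pub fn get(&self, id: OpId) -> Option<&T> {
        match self.slots.get(id.0 as usize)? {
            Slot::Occupied(v) => Some(v),
            Slot::Free(_) => None,
        }
    }
}
```

In this sketch, removing an op and then inserting a new one hands back the same index, so the backing Vec does not grow while ops are being rewritten.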

Before any optimization passes run, the kernel IR is unfolded — LoadView, StoreView, and Move ops are converted to direct index arithmetic (Load, Store with computed indices). After unfolding, all ops are fixed-size inline entries in the arena — no Box, no vtables, no per-op indirection.
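
As a rough illustration of what "direct index arithmetic" can mean, the sketch below flattens a multi-dimensional index through a strided view into a single buffer index. The View struct, its fields, and linear_index are hypothetical; this document does not show how views are actually represented, and unfolding itself emits IR ops rather than computing numbers at compile time.

```rust
// Hypothetical strided view, assumed for illustration only.
pub struct View {
    pub shape: Vec<usize>,
    pub strides: Vec<usize>,
    pub offset: usize,
}

impl View {
    /// Flatten a multi-dimensional index into a linear buffer index:
    /// offset + sum(idx[k] * strides[k]).
    /// Returns None on a rank mismatch or an out-of-bounds coordinate.
    pub fn linear_index(&self, idx: &[usize]) -> Option<usize> {
        if idx.len() != self.shape.len() || self.shape.len() != self.strides.len() {
            return None;
        }
        let mut linear = self.offset;
        for ((&i, &dim), &stride) in idx.iter().zip(&self.shape).zip(&self.strides) {
            if i >= dim {
                return None;
            }
            linear += i * stride;
        }
        Some(linear)
    }
}
```

Under these assumptions, a transposing move only swaps entries in shape and strides, so after unfolding it costs nothing beyond the multiply-adds that compute the index fed to Load or Store.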

The IR is in SSA form, except for Loop, If, and Define ops (which can carry mutable state).

Op Variants

Arithmetic

Op::Cast { x: OpId, dtype: DType }
Op::Unary { x: OpId, uop: UOp }
Op::Binary { x: OpId, y: OpId, bop: BOp }
Op::Mad { x: OpId, y: OpId, z: OpId }

Memory

Op::Define { dtype, scope, ro, len }
Op::Load { src, index, layout }
Op::Store { dst, x, index, layout }
Op::Const(Constant)

Control Flow

Op::Loop { len: Dim }
Op::EndLoop
Op::If { condition: OpId }
Op::EndIf
Op::Barrier { scope }

Indexing

Op::Index { len, scope, axis }

Hardware Accelerators

Op::Wmma { dims, layout, dtype, a, b, c }

Vectorization

Op::Vectorize { ops: Vec<OpId> }
Op::Devectorize { vec: OpId, idx }

View (before unfolding)

Op::Move { x: OpId, mop: Box<MoveOp> }
Op::Reduce { x: OpId, rop, n_axes }

Memory Layouts and Scopes

pub enum MemLayout {
    Scalar,
    Vector(u8),
    Tile { x, y, stride },
}

pub enum Scope {
    Global,
    Local,
    Register,
}

Backend Codegen

Because the IR is already flat and fixed-size by the time it reaches a backend, codegen consists of two steps:

1. deSSA: resolve SSA references to physical registers/memory
2. Linear pass: walk the op linked list once, emitting instructions

No further optimizations, no complex lowering.
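
The linear pass described above can be sketched as a single loop that follows next pointers from head. This is a simplified illustration, not the real backend: the Op enum is cut down to four variants, Loop takes a usize instead of Dim, a plain slice stands in for the Slab, the NIL end-of-list sentinel is an assumption, and the emitted C-like strings are invented for the example.

```rust
// Sketch only: a cut-down Op set and a plain slice standing in for the arena.
type OpId = u32;
const NIL: OpId = u32::MAX; // assumed end-of-list sentinel, not from the source

pub enum Op {
    Const(i64),
    Binary { x: OpId, y: OpId, bop: char },
    Loop { len: usize },
    EndLoop,
}

pub struct OpNode {
    pub prev: OpId,
    pub next: OpId,
    pub op: Op,
}

/// Walk the list once in link order (not storage order), one output line per op.
pub fn emit(ops: &[OpNode], head: OpId) -> Vec<String> {
    let mut out = Vec::new();
    let mut indent = 0usize;
    let mut id = head;
    while id != NIL {
        let node = &ops[id as usize];
        // Close the loop body before printing its closing brace.
        if let Op::EndLoop = node.op {
            indent = indent.saturating_sub(1);
        }
        let pad = "  ".repeat(indent);
        let line = match &node.op {
            Op::Const(c) => format!("r{id} = {c};"),
            Op::Binary { x, y, bop } => format!("r{id} = r{x} {bop} r{y};"),
            Op::Loop { len } => format!("for (i{id} = 0; i{id} < {len}; i{id}++) {{"),
            Op::EndLoop => "}".to_string(),
        };
        out.push(format!("{pad}{line}"));
        // Ops after a Loop belong to its body.
        if let Op::Loop { .. } = node.op {
            indent += 1;
        }
        id = node.next;
    }
    out
}
```

Each op is visited exactly once and needs no lookahead, which is why a backend built this way stays small. The order of emission comes from the links, so ops may sit anywhere in the arena.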

Debugging

Set ZYX_DEBUG=8 to print the kernel IR:

r18: i32 = def global, len=4
r44: u32 = gidx0    // 0..=0
r19: i32 = r18[r1]  // 0..=3 load
