Backend System

Zyx supports multiple hardware backends through enum dispatch. Backends are compiled into the library (a few only when their Cargo feature is enabled) and selected at runtime.

Enum Dispatch

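Backends are represented as enums rather than trait objects (dyn Backend). With trait objects, reaching backend-specific functionality would require downcasting (typically through std::any::Any), which is awkward in Rust. With an enum, a match on the variant gives direct access to the concrete backend type.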

pub enum Device {
    C(CDevice),
    CUDA(CUDADevice),
    OpenCL(OpenCLDevice),
    Vulkan(VulkanDevice),
    WGPU(WGPUDevice),
    HIP(HIPDevice),
    Dummy(DummyDevice),
}

pub enum MemoryPool {
    Host(HostMemoryPool),
    Disk(DiskMemoryPool),
    C(CMemoryPool),
    CUDA(CUDAMemoryPool),
    OpenCL(OpenCLMemoryPool),
    Vulkan(VulkanMemoryPool),
    WGPU(WGPUMemoryPool),
    HIP(HIPMemoryPool),
    Dummy(DummyMemoryPool),
}

Each method matches on the variant and delegates:

impl Device {
    pub fn compile(&self, kernel: &Kernel, debug: DebugMask) -> Result<ProgramId, ZyxError> {
        match self {
            Device::C(dev) => dev.compile(kernel, debug),
            Device::CUDA(dev) => dev.compile(kernel, debug),
            // ...
        }
    }
}
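The trade-off can be made concrete with a small, self-contained sketch. CDevice and CudaDevice below are hypothetical stand-ins, not Zyx's types, and compute_capability is an illustrative backend-specific property. Reaching it through a trait object requires an as_any() helper and a runtime downcast via std::any::Any. Through an enum, a single match suffices and the compiler checks that every variant is handled.

```rust
use std::any::Any;

// Hypothetical, heavily reduced backends used only for this comparison.
struct CDevice;
struct CudaDevice {
    compute_capability: (u32, u32),
}

// Trait-object style: shared methods live on the trait, but anything
// CUDA-specific has to be recovered with a runtime downcast.
trait Backend {
    fn name(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

impl Backend for CDevice {
    fn name(&self) -> &'static str {
        "c"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Backend for CudaDevice {
    fn name(&self) -> &'static str {
        "cuda"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn cuda_cc_dyn(dev: &dyn Backend) -> Option<(u32, u32)> {
    dev.as_any()
        .downcast_ref::<CudaDevice>()
        .map(|d| d.compute_capability)
}

// Enum style: one `match` delegates shared calls and exposes the concrete type.
enum Device {
    C(CDevice),
    Cuda(CudaDevice),
}

impl Device {
    fn name(&self) -> &'static str {
        // Exhaustive: adding a variant makes this fail to compile until handled.
        match self {
            Device::C(dev) => dev.name(),
            Device::Cuda(dev) => dev.name(),
        }
    }

    fn cuda_cc(&self) -> Option<(u32, u32)> {
        match self {
            Device::Cuda(dev) => Some(dev.compute_capability),
            _ => None,
        }
    }
}

fn main() {
    let dynamic: Box<dyn Backend> = Box::new(CudaDevice { compute_capability: (8, 6) });
    println!("{} {:?}", dynamic.name(), cuda_cc_dyn(&*dynamic));

    let device = Device::Cuda(CudaDevice { compute_capability: (8, 6) });
    println!("{} {:?}", device.name(), device.cuda_cc());
}
```

A failed downcast only shows up at runtime as None, whereas adding a variant to the enum turns every match without a wildcard arm (such as Device::name here) into a compile error until the new backend is handled.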

Backend Codegen is Trivial

The optimization passes do the hard work. Backend codegen is:

deSSA — resolve SSA references to physical registers or memory locations
Linear pass — walk the op linked list once, emitting target instructions

There are no further optimizations and no complex backend-specific lowering: the IR is emitted directly as code in the target language.
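A toy version of the two steps, under clearly hypothetical assumptions: the Op and Node types, the x/y buffers, and the gid index in the generated C are invented for illustration and are not Zyx's IR. Ops are chained through next indices like a linked list. The deSSA step maps each SSA value to a C register and recycles a register once its value has been read for the last time (a simple linear scan). The emission pass then visits the ops in list order and prints one statement per op, with no further optimization.

```rust
use std::collections::HashMap;

// Hypothetical toy IR (not Zyx's): operands are SSA ids, i.e. node indices.
#[derive(Clone, Copy)]
enum Op {
    Const(f32),
    Load(usize), // read x[gid + offset]
    Add(usize, usize),
    Mul(usize, usize),
    Store(usize), // write the value to y[gid]
}

// Ops are chained through `next`, mimicking an op linked list.
struct Node {
    op: Op,
    next: Option<usize>,
}

fn chain(ops: Vec<Op>) -> Vec<Node> {
    let n = ops.len();
    ops.into_iter().enumerate().map(|(i, op)| Node { op, next: (i + 1 < n).then(|| i + 1) }).collect()
}

// Distinct SSA values read by an op.
fn operands(op: Op) -> Vec<usize> {
    match op {
        Op::Add(a, b) | Op::Mul(a, b) if a != b => vec![a, b],
        Op::Add(a, _) | Op::Mul(a, _) | Op::Store(a) => vec![a],
        Op::Const(_) | Op::Load(_) => vec![],
    }
}

// deSSA: give each value-producing op a register, recycling a register
// once the value it holds has been read for the last time (linear scan).
fn dessa(nodes: &[Node], order: &[usize]) -> HashMap<usize, usize> {
    let mut last_use: HashMap<usize, usize> = HashMap::new();
    for (pos, &i) in order.iter().enumerate() {
        for v in operands(nodes[i].op) {
            last_use.insert(v, pos);
        }
    }
    let mut regs: HashMap<usize, usize> = HashMap::new();
    let (mut free, mut next_reg): (Vec<usize>, usize) = (Vec::new(), 0);
    for (pos, &i) in order.iter().enumerate() {
        for v in operands(nodes[i].op) {
            if last_use[&v] == pos {
                free.push(regs[&v]);
            }
        }
        if !matches!(nodes[i].op, Op::Store(_)) {
            let reg = free.pop().unwrap_or_else(|| {
                next_reg += 1;
                next_reg - 1
            });
            regs.insert(i, reg);
        }
    }
    regs
}

// Linear pass: emit one C statement per op, in linked-list order.
fn emit(nodes: &[Node], head: usize) -> String {
    let mut order = Vec::new();
    let mut cur = Some(head);
    while let Some(i) = cur {
        order.push(i);
        cur = nodes[i].next;
    }
    let regs = dessa(nodes, &order);
    let mut out = String::new();
    for r in 0..regs.values().max().map_or(0, |m| m + 1) {
        out.push_str(&format!("float r{r};\n"));
    }
    for &i in &order {
        let stmt = match nodes[i].op {
            Op::Const(c) => format!("r{} = {:?}f;", regs[&i], c),
            Op::Load(off) => format!("r{} = x[gid + {}];", regs[&i], off),
            Op::Add(a, b) => format!("r{} = r{} + r{};", regs[&i], regs[&a], regs[&b]),
            Op::Mul(a, b) => format!("r{} = r{} * r{};", regs[&i], regs[&a], regs[&b]),
            Op::Store(v) => format!("y[gid] = r{};", regs[&v]),
        };
        out.push_str(&stmt);
        out.push('\n');
    }
    out
}

fn main() {
    // y[gid] = x[gid] * 2.0 + x[gid + 1]
    let ops = vec![Op::Load(0), Op::Const(2.0), Op::Mul(0, 1), Op::Load(1), Op::Add(2, 3), Op::Store(4)];
    print!("{}", emit(&chain(ops), 0));
}
```

For y[gid] = x[gid] * 2.0 + x[gid + 1], the six ops need only two registers, because r0 and r1 are reused as soon as their previous values are dead.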

Initialization

Backends are initialized at startup via initialize_backends():

pub fn initialize_backends(config, memory_pools, devices, debug) {
    host::initialize_pool(memory_pools, debug);
    disk::initialize_pool(memory_pools, debug);
    dummy::initialize_device(&config.dummy, ...);
    c::initialize_device(&config.c, ...);
    cuda::initialize_device(&config.cuda, ...);
    opencl::initialize_device(&config.opencl, ...);
    hip::initialize_device(&config.hip, ...);
    vulkan::initialize_device(&config.vulkan, ...);
    wgpu::initialize_device(&config.wgpu, ...);
    #[cfg(feature = "tenstorrent")]
    tenstorrent::initialize_device(&config.tenstorrent, ...);
}

Each backend attempts to initialize independently. If a backend fails (for example, its driver is missing or no supported hardware is present), it is skipped silently. If every backend fails, the program exits with an error.
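The skip-on-failure rule can be sketched as follows. The InitFn signature and the Device and InitError types are hypothetical simplifications (the real initializers take a config section, the memory-pool and device lists, and a debug mask); the point is that individual failures are discarded and only an empty overall result becomes an error.

```rust
// Hypothetical, simplified types; not Zyx's real initializer signatures.
#[derive(Debug)]
struct InitError(String);

#[derive(Debug, PartialEq)]
struct Device {
    backend: &'static str,
}

type InitFn = fn() -> Result<Vec<Device>, InitError>;

// Try every backend; a failure only means that backend contributes nothing.
fn initialize_all(backends: &[InitFn]) -> Result<Vec<Device>, InitError> {
    let mut devices = Vec::new();
    for init in backends {
        // A failing backend (missing driver, no hardware, ...) is skipped silently.
        if let Ok(mut found) = init() {
            devices.append(&mut found);
        }
    }
    if devices.is_empty() {
        return Err(InitError("no backend could be initialized".to_string()));
    }
    Ok(devices)
}

// Illustrative stand-ins for two backends.
fn c_backend() -> Result<Vec<Device>, InitError> {
    Ok(vec![Device { backend: "c" }])
}

fn cuda_backend_without_driver() -> Result<Vec<Device>, InitError> {
    Err(InitError("CUDA driver library not found".to_string()))
}

fn main() {
    match initialize_all(&[cuda_backend_without_driver as InitFn, c_backend]) {
        Ok(devices) => println!("usable devices: {devices:?}"),
        Err(e) => {
            // Nothing initialized: report and exit with a non-zero status.
            eprintln!("{e:?}");
            std::process::exit(1);
        }
    }
}
```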

Current Backends

Backend	Source	Target	Runtime
C	`c.rs`	C99 (compiled to .so)	Clang/GCC
CUDA	`cuda.rs`	CUDA C (compiled to SASS)	CUDA driver via `libloading`
HIP	`hip.rs`	HIP	ROCm via `libloading`
OpenCL	`opencl.rs`	OpenCL C	OpenCL runtime via `libloading`
Vulkan	`vulkan.rs`	SPIR-V	Vulkan via `ash` crate
WGPU	`wgpu.rs`	SPIR-V	WGPU (feature: `wgpu`)
Dummy	`dummy.rs`	—	No hardware needed (fake device)

All backends except WGPU and Tenstorrent are compiled in by default. WGPU requires `--features wgpu` and Tenstorrent requires `--features tenstorrent`.

Device Configuration in Config File

Each backend can be enabled/disabled and configured:

{
    "c": { "enabled": true },
    "cuda": { "device_ids": [0] },
    "opencl": { "platform_ids": [] },
    "dummy": { "enabled": false }
}

If a section is missing, or the config file does not exist, defaults are used (most backends are enabled).
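One way to get this fallback behaviour is to treat every section as optional and substitute a default when it is absent, with a missing file behaving like an empty one. The sketch is hypothetical: RawConfig stands for whatever the file parser produces, only an enabled flag is modelled, and the chosen defaults (C and CUDA on, dummy off) are assumptions rather than Zyx's actual values.

```rust
// Hypothetical config model: only an `enabled` flag per backend is shown,
// and the default values below are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct BackendConfig {
    enabled: bool,
}

// What a parser might produce: a section is `None` when absent from the file.
#[derive(Default)]
struct RawConfig {
    c: Option<BackendConfig>,
    cuda: Option<BackendConfig>,
    dummy: Option<BackendConfig>,
}

// Fully resolved configuration with every section present.
#[derive(Debug, PartialEq)]
struct Config {
    c: BackendConfig,
    cuda: BackendConfig,
    dummy: BackendConfig,
}

// `raw` is `None` when the config file does not exist.
fn resolve(raw: Option<RawConfig>) -> Config {
    let raw = raw.unwrap_or_default();
    let on = BackendConfig { enabled: true };
    Config {
        c: raw.c.unwrap_or(on.clone()),
        cuda: raw.cuda.unwrap_or(on),
        // Assumed default: the fake device stays off unless requested.
        dummy: raw.dummy.unwrap_or(BackendConfig { enabled: false }),
    }
}

fn main() {
    let from_file = RawConfig {
        cuda: Some(BackendConfig { enabled: false }),
        ..Default::default()
    };
    println!("{:?}", resolve(Some(from_file)));
    println!("{:?}", resolve(None));
}
```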

Device Selection

The scheduler picks a device at realize time:

If DeviceId::AUTO is requested, sort devices by free compute capacity (descending)
If a specific device is requested, try it first
Pick the first device with enough free memory for all required tensors
If no device has enough memory, return an allocation error
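The selection steps can be sketched as a pure function. DeviceInfo, SelectError, and modelling DeviceId::AUTO as a u32::MAX sentinel are assumptions for illustration. For a specific request, the sketch moves that device to the front and keeps the remaining devices, in their original order, as fallbacks; the text above does not say whether those fallbacks are also sorted by compute capacity.

```rust
// Hypothetical types: DeviceId wraps an index and AUTO is modelled as a
// sentinel value, which is an assumption rather than Zyx's definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DeviceId(u32);

impl DeviceId {
    const AUTO: DeviceId = DeviceId(u32::MAX);
}

#[derive(Debug, Clone)]
struct DeviceInfo {
    id: DeviceId,
    free_compute: u64, // abstract "free compute capacity" score
    free_memory: u64,  // bytes
}

#[derive(Debug, PartialEq)]
enum SelectError {
    OutOfMemory { required: u64 },
}

fn select_device(
    devices: &[DeviceInfo],
    requested: DeviceId,
    required_bytes: u64,
) -> Result<DeviceId, SelectError> {
    let mut candidates: Vec<&DeviceInfo> = devices.iter().collect();
    if requested == DeviceId::AUTO {
        // Most free compute first; the sort is stable, so ties keep their order.
        candidates.sort_by(|a, b| b.free_compute.cmp(&a.free_compute));
    } else {
        // Requested device first (false < true), the rest stay as fallbacks.
        candidates.sort_by_key(|d| d.id != requested);
    }
    candidates
        .into_iter()
        .find(|d| d.free_memory >= required_bytes)
        .map(|d| d.id)
        .ok_or(SelectError::OutOfMemory { required: required_bytes })
}

fn main() {
    let devices = [
        DeviceInfo { id: DeviceId(0), free_compute: 10, free_memory: 1_000 },
        DeviceInfo { id: DeviceId(1), free_compute: 50, free_memory: 100 },
        DeviceInfo { id: DeviceId(2), free_compute: 30, free_memory: 5_000 },
    ];
    println!("{:?}", select_device(&devices, DeviceId::AUTO, 500));
}
```

Sorting a vector of references leaves the device list itself untouched, so the ranking can be recomputed cheaply on every realize as free capacity changes.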