Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Autograd

Zyx implements automatic differentiation through an explicit GradientTape. Unlike other frameworks, there is no separate autograd graph — it uses the same graph as computation.

The Key Insight

In most frameworks, autograd requires a separate graph because the eager execution engine discards intermediate results. Zyx’s lazy graph keeps all nodes alive until realize() is called, and the gradient tape simply prevents their deletion.

GradientTape API

extern crate zyx;
use zyx::{DType, GradientTape, Tensor, ZyxError};
fn main() -> Result<(), ZyxError> {
let tape = GradientTape::new();
let x = Tensor::randn([2, 3], DType::F32)?;
let y = Tensor::randn([2, 3], DType::F32)?;
let z = x.relu() * y.tanh();

let grads = tape.gradient(&z, vec![&x, &y]);
// grads[0] = gradient of z w.r.t. x
// grads[1] = gradient of z w.r.t. y
Ok(())
}

The gradient() method consumes the tape; use gradient_persistent() to keep it alive for higher-order derivatives.

No “requires_grad”

There’s no requires_grad flag on tensors. The tape records the entire graph; when you call gradient(), you specify which tensors you want gradients for:

extern crate zyx;
use zyx::{DType, GradientTape, Tensor, ZyxError};
fn main() -> Result<(), ZyxError> {
let tape = GradientTape::new();
let x = Tensor::randn([2, 3], DType::F32)?;
let y = Tensor::randn([2, 3], DType::F32)?;
let z = y.exp();

let grads = tape.gradient(&z, vec![&x]);  // None — z doesn't depend on x
Ok(())
}

This is more flexible — you don’t need to decide at tensor creation time which tensors will be differentiated.

Higher-Order Derivatives

Using gradient_persistent(), the tape stays alive and you can compute higher-order derivatives:

// Higher-order derivatives example: currently blocked by
// an autograd subtraction overflow bug (autograd.rs:94).

Memory Efficiency

Since the graph is lazy, intermediate tensors needed for backpropagation are not held in memory until realize() is called for the gradient computation. The gradient tape stores only TensorId values — not the actual data. When the tape is dropped, all tape-preserved nodes are released.