# ADR 001 — Burn Backend Selection for vox-tensor
Status: Accepted (note 2026-04-06: Mens QLoRA on HF weights uses Candle + qlora-rs in vox-populi, not this Burn stack — see ADR 003, ADR 006, mens-training.md)
Date: 2026-03-02
Author: Bert Brainerd
## Context
We needed a native Rust ML training framework for the Mens model. The options were:
- PyTorch via PyO3 — keep Python, use Rust bindings
- Candle (Hugging Face) — Rust ML framework, CUDA-first
- Burn 0.19 — pure-Rust framework with pluggable backends
- ONNX Runtime — inference-only, not useful for training
The goal: train Mens without requiring Python at all, allow CPU and GPU training, and compile on all major platforms including Windows.
## Decision
Use Burn 0.19 with Wgpu backend (primary) and NdArray backend (CPU fallback).
```toml
# Feature-gated in vox-tensor/Cargo.toml
[features]
default = []
gpu = ["burn/wgpu", "burn/ndarray"]
```
The `gpu` feature gates all Burn code, keeping `cargo check --workspace` fast: no GPU dependencies are compiled during CI checks.
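On the Rust side, the same gate applies at the module level. A minimal sketch of the pattern (the module and function names here are illustrative, not taken from the repo):

```rust
// Hypothetical sketch of the feature gate: Burn-dependent code compiles only
// with `--features gpu`, so a plain `cargo check` never touches wgpu.
#[cfg(feature = "gpu")]
mod native_training {
    // `use burn::...` imports and the real training loop would live here.
}

/// Stub used when the crate is built without the `gpu` feature.
#[cfg(not(feature = "gpu"))]
pub fn train() -> Result<(), String> {
    Err("built without the `gpu` feature; rebuild with `--features gpu`".to_string())
}
```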
## Consequences
Positive:
- Zero Python dependency for the training loop
- Runs on any hardware: CPU via the NdArray backend; NVIDIA/AMD/Intel/Apple GPUs via the Wgpu backend (Vulkan, Metal, DX12)
- Clean Rust type system for tensor shapes prevents shape bugs at compile time
- `cargo build -p vox-cli --features native-train` gives a self-contained training binary
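The compile-time shape point can be made concrete: Burn carries tensor rank in the type (`Tensor<B, const D: usize>`). A stripped-down sketch of the idea; this toy `Tensor` is illustrative, not Burn's actual API:

```rust
// Toy rank-tracked tensor: the rank D is part of the type, so passing a
// 3-D tensor where a matrix is expected fails to compile.
struct Tensor<const D: usize> {
    shape: [usize; D],
}

// Only accepts rank-2 tensors; calling it with a Tensor<3> is a type error.
fn matmul(a: &Tensor<2>, b: &Tensor<2>) -> Tensor<2> {
    // Inner dimension sizes still need a runtime check.
    assert_eq!(a.shape[1], b.shape[0], "inner dimensions must match");
    Tensor { shape: [a.shape[0], b.shape[1]] }
}
```

Rank mismatches are caught at compile time; only dimension sizes within a rank remain runtime checks.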
Negative:
- Burn 0.19 API breaks frequently between minor releases (must pin exact versions)
- The Burn `VoxTransformer` scratch path does not load full HF base weights the way the Candle QLoRA pipeline does (HF hub + safetensors for Mens is `vox mens train --backend qlora`, not Burn)
- First cold build takes 10–15 min due to Wgpu and SPIR-V shader compilation
Mitigations:
- Pin `burn = "0.19"` everywhere; add a `[workspace.dependencies]` entry
- Large-model QLoRA: use native Candle + qlora-rs via `vox mens train` (ADR 006, mens-training.md); use Burn for smaller scratch-LoRA and legacy merge-weights + `vox mens serve` flows where still applicable
- Move Wgpu behind a feature flag so CI check builds skip it
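A sketch of the pinning mitigation in the root Cargo.toml; the inheritance lines shown in comments are the standard Cargo pattern, not copied from the repo:

```toml
# Root Cargo.toml: one place to bump Burn for the whole workspace.
[workspace.dependencies]
burn = "0.19"

# Each crate (e.g. crates/vox-tensor/Cargo.toml) then inherits the pin:
# [dependencies]
# burn = { workspace = true, optional = true }
```

With the pin centralized, a Burn minor-version bump is a single-line change reviewed once, rather than a hunt across crate manifests.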
## Alternatives Considered
### Candle (evaluation at the time of picking Burn for vox-tensor)

We chose Burn for the small scratch transformer + wgpu loop in vox-tensor; Candle was not selected for that slice.

- Then: Pro: Hugging Face–maintained, strong CUDA story. Con: we prioritized wgpu portability and kept Candle out of the initial `vox-tensor` trainer.
- Now: Candle is the Mens HF QLoRA execution kernel (`vox-populi`, qlora-rs, optional `mens-candle-cuda`/`mens-candle-metal`). MSVC/CUDA build notes live in the workspace build policy (`.cursor/rules`, `AGENTS.md`). This ADR's alternatives section records the original decision, not the full 2026 Mens stack.
### PyTorch via tch-rs
- Pro: Mature ecosystem, full model zoo access
- Con: Requires LibTorch binary (400MB+), defeats "zero Python" goal
### ONNX Runtime
- Pro: Inference is fast
- Con: No training support
## References
- Burn framework
- `crates/vox-tensor/src/vox_nn.rs`: VoxTransformer implementation (`gpu` feature)
- `crates/vox-cli/src/training/native.rs`: training loop