# ADR 003 — Native Rust Training Over Python
Status: Accepted; amended 2026-04-06
Date: 2026-03-02 (original decision)
Author: Bert Brainerd
Current product path: Large-model QLoRA fine-tuning runs entirely in Rust — Candle, qlora-rs, and `vox mens train` (`--backend qlora`, `--tokenizer hf` by default). The Python/Unsloth path described below is historical context only, not an operator requirement.
## Historical context (why we left Python)

The original Mens training pipeline used `mens/training/train.py` (Python, Unsloth, QLoRA). That caused:
- Environment friction: Python version conflicts, uv/pip pinning, CUDA version mismatches
- Slow iteration: Python-based tokenizer was ~10× slower than native Rust for our dogfood path
- Philosophical mismatch: Vox could not dogfood training if the loop lived in another language
- CI complexity: Separate Python setup and heavy deps on every CI run
Original decision (March 2026): Move the bulk of the pipeline to native Rust (Burn 0.19 for scratch LoRA / experimentation), and initially assumed Python might remain for some large-model QLoRA work.
Amendment: Native Candle + qlora-rs now covers HF-weight QLoRA in-tree. See ADR 006 — Mens full-graph Candle QLoRA with qlora-rs, ADR 007 — qlora-rs multi-layer training API, and the SSOT Mens native training.
## Current architecture (summary)
| Concern | Historical (pre–native QLoRA) | Current |
|---|---|---|
| Tokenizer (dogfood / VoxTokenizer JSONL) | Python | Rust (`VoxTokenizer` in `vox-tensor`) |
| Data loading (JSONL) | Python loop | Rust `JsonlDataLoader` |
| Synthetic / CLI data generation | `scripts/datagen.py` | `vox generate-data` (Rust) |
| Scratch / Burn LoRA (small model, wgpu) | Python training loop | `vox training native` / Burn paths in `vox-tensor` (legacy vs `vox mens train` dispatch — see SSOT) |
| HF QLoRA (large models) | Python (Unsloth) | Rust: `vox mens train` → `CandleQlora` + qlora-rs; weights via Rust `hf-hub` |
| Corpus extraction | Python | `vox mens corpus extract` (Rust) |
| Training validation | Python | `vox mens corpus eval` (Rust via `vox-eval`) |
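To make the "Rust JsonlDataLoader" row concrete, here is a minimal sketch of the first step any JSONL data path performs: reading one JSON record per non-empty line. This is an illustrative, std-only assumption, not the actual `vox-tensor` API.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical sketch of a JSONL line reader (NOT the real JsonlDataLoader):
// each non-empty line is one JSON record; blank lines are skipped.
fn read_jsonl_lines<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .lines()
        .map_while(Result::ok)          // stop at the first I/O error
        .map(|l| l.trim().to_string())
        .filter(|l| !l.is_empty())      // tolerate blank separator lines
        .collect()
}

fn main() {
    let data = "{\"prompt\":\"hi\",\"completion\":\"hello\"}\n\n{\"prompt\":\"bye\",\"completion\":\"later\"}\n";
    let records = read_jsonl_lines(Cursor::new(data));
    assert_eq!(records.len(), 2);
    println!("read {} records", records.len());
}
```

Doing this in Rust rather than a Python loop is what removes the tokenizer/loader speed gap cited in the historical context above.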
Dispatch note: `vox mens train` is the canonical operator CLI. `PopuliTrainBackend::BurnLora` is rejected at runtime; the supported in-dispatch trainer for Mens fine-tuning is `CandleQlora`. Burn remains relevant for legacy checkpoints, `vox mens merge-weights`, and `vox mens serve` on merged `.bin` — not as the primary QLoRA path. Details: mens-training.md.
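The dispatch rule above can be sketched as a plain enum match: the BurnLora variant exists but is refused at runtime, and only the Candle QLoRA trainer is routable. The variant names come from the ADR; the function shape and error text are assumptions for illustration.

```rust
// Hypothetical sketch of the runtime dispatch rule described above
// (not the actual vox-populi code): BurnLora is rejected, CandleQlora
// is the only supported in-dispatch Mens fine-tuning trainer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PopuliTrainBackend {
    BurnLora,
    CandleQlora,
}

fn dispatch(backend: PopuliTrainBackend) -> Result<&'static str, String> {
    match backend {
        // Burn stays available for legacy checkpoints, merge-weights, and
        // serving merged .bin files, but not as a fine-tuning trainer.
        PopuliTrainBackend::BurnLora => Err(
            "BurnLora is not a supported Mens fine-tuning backend; use CandleQlora".to_string(),
        ),
        PopuliTrainBackend::CandleQlora => Ok("candle-qlora"),
    }
}

fn main() {
    assert!(dispatch(PopuliTrainBackend::BurnLora).is_err());
    assert_eq!(dispatch(PopuliTrainBackend::CandleQlora), Ok("candle-qlora"));
    println!("dispatch ok");
}
```

Rejecting the variant at runtime (rather than deleting it) keeps old configs parseable while making the supported path unambiguous.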
## Implementation pointers
- Candle QLoRA / contract / preflight: `crates/vox-populi/src/mens/tensor/` (`run_mens_training`, `lora_train.rs`, `finetune_contract.rs`, `preflight_train.rs`)
- Tokenizer + JSONL loader: `crates/vox-tensor/src/data.rs`
- Burn model / optim (feature-gated): `crates/vox-tensor/src/vox_nn.rs`, `optim.rs`, `train.rs`
- CLI: `crates/vox-cli` — `vox mens train`, corpus and eval subcommands; `training/native.rs`, `training/datagen.rs` where applicable
## Consequences

### Positive
- No Python required for HF QLoRA fine-tuning in the default product path.
- Native tokenizer remains fast for VoxTokenizer-shaped JSONL.
- Single `vox` binary for data gen, corpus, eval, and Mens train.
- Stronger Windows story than a Python+CUDA training stack.
- Training data schema enforced in Rust (`TrainingPair`, contracts, preflight).
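The schema-enforcement point can be illustrated with a minimal preflight check: reject malformed rows before any GPU time is spent. The `TrainingPair` name is from the ADR; the field names and validation rules here are assumptions, not the real contract.

```rust
// Hypothetical sketch of Rust-side schema enforcement (NOT the real
// TrainingPair contract): a preflight pass that fails fast on bad rows.
#[derive(Debug)]
struct TrainingPair {
    prompt: String,     // field names are illustrative assumptions
    completion: String,
}

fn preflight(pairs: &[TrainingPair]) -> Result<(), String> {
    for (i, p) in pairs.iter().enumerate() {
        if p.prompt.trim().is_empty() {
            return Err(format!("row {i}: empty prompt"));
        }
        if p.completion.trim().is_empty() {
            return Err(format!("row {i}: empty completion"));
        }
    }
    Ok(())
}

fn main() {
    let good = vec![TrainingPair { prompt: "Q".into(), completion: "A".into() }];
    assert!(preflight(&good).is_ok());
    let bad = vec![TrainingPair { prompt: "".into(), completion: "A".into() }];
    assert!(preflight(&bad).is_err());
    println!("preflight checks pass");
}
```

Catching schema errors at preflight time, in the same binary and type system as the trainer, is what the Python pipeline could only approximate with ad-hoc checks.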
### Negative / limits (see SSOT, not “use Python”)
- Execution kernel gaps: Full causal NF4 blocks and other limits are documented in candle-full-graph-feasibility.md and mens-training.md.
- Serving: Merged QLoRA artifacts are aimed at external runtimes (vLLM, Ollama, HF, OpenAI-compatible); `vox mens serve` today targets the Burn merged-weights lane.
- Burn ecosystem (where still used): fewer optimizers than PyTorch; cold wgpu builds can be heavy — mitigated by feature flags.
- Optional legacy: Old Python scripts may still exist in trees or forks for one-off experiments; they are not the documented or dispatched path for Mens QLoRA.
## References
- Mens native training SSOT
- ADR 006 — Mens full-graph Candle QLoRA with qlora-rs
- ADR 007 — qlora-rs multi-layer training API
- ADR 001 — Burn backend selection (Burn rationale; amended for QLoRA)
- Native ML training pipeline: `crates/vox-tensor/src/data.rs`, `crates/vox-cli/src/training/`
- Burn ML framework