"HF fine-tuning capability matrix (code-grounded)"

HF fine-tuning capability matrix (code-grounded)

Single control plane: crates/vox-populi/src/mens/tensor/finetune_contract.rs (FineTuneContract) + execution_planner.rs (ExecutionPlanner). Execution kernels: Burn (wgpu LoRA) vs Candle (qlora-rs NF4).
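
The contract/planner split is easiest to read as a small dispatch step: the contract records what a run asks for, and the planner resolves it to one of the two kernels. Below is a minimal sketch of that shape, assuming hypothetical `Contract` fields and a `plan` function; the real FineTuneContract and ExecutionPlanner carry far more state and validation.

```rust
/// Which execution kernel the planner selects (mirrors PopuliTrainBackend).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    BurnLora,    // Burn wgpu LoRA, f32 bases
    CandleQlora, // Candle + qlora-rs, NF4 frozen bases
}

/// Hypothetical contract fields that matter for backend selection.
struct Contract {
    wants_nf4_base: bool,
    tokenizer_is_hf: bool,
}

/// Hypothetical planner: an NF4 base forces the Candle/qlora-rs kernel
/// (which only accepts an HF tokenizer); everything else stays on Burn LoRA.
fn plan(c: &Contract) -> Result<Backend, String> {
    if c.wants_nf4_base {
        if !c.tokenizer_is_hf {
            return Err("Candle QLoRA path requires an HF tokenizer.json".into());
        }
        return Ok(Backend::CandleQlora);
    }
    Ok(Backend::BurnLora)
}

fn main() {
    let contract = Contract { wants_nf4_base: true, tokenizer_is_hf: true };
    assert_eq!(plan(&contract).unwrap(), Backend::CandleQlora);
}
```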

| Capability | Burn kernel (PopuliTrainBackend::BurnLora) | Candle kernel (PopuliTrainBackend::CandleQlora) |
| --- | --- | --- |
| Training graph depth | Full causal stack: LoraVoxTransformer → blocks → LM head (tensor/lora.rs). | Proxy stack: optional per-layer o_proj / GPT-2 c_proj as sequential QuantizedLinear + tied LM head; not full MHA/FFN blocks (candle_qlora_train.rs). |
| Base quantization | None in the production path (f32 LoRA bases); an NF4 base is not implemented (lora.rs module docs). | NF4 frozen bases via qlora-rs on stacked linears + LM head. |
| Tokenizer | Vox (VoxTokenizer, ChatML) by default; HF tokenizer.json with --tokenizer hf + a GPT-2 HF layout (contract-gated). | HF only (tokenizer.json); enforced in qlora_preflight.rs. |
| Weight loading | HF warm-start: token embeddings + GPT-2 decoder blocks (Q/K/V split from c_attn, MLP, norms, wpe, ln_f, optional lm_head) when shapes match (burn_hf_load.rs). | mmap f32 embedding table + selected projection keys from shards. |
| Artifacts | Burn *.bin checkpoints (Checkpoint); merge-weights → merged VoxTransformer. | candle_qlora_adapter*.safetensors v2 + sidecar meta; v3 unified schema (adapter_schema_v3.rs); merge-qlora subset merge. |
| Merge fidelity | LoraAttention::merge() → Burn MultiHeadAttention with merged Q/K/V when use_rope == false; RoPE stacks cannot merge to static linears (see lora.rs). | Deterministic f32 delta merge for exported keys (candle_qlora_merge.rs). |
| Cross-stack logits parity | Not asserted end-to-end (NF4 vs f32 LoRA, different graphs). Touchpoints: tests/candle_burn_f32_matmul_parity.rs (matmul); tests/candle_burn_f32_linear_lm_logits_parity.rs (biased linear / LM-head-shaped f32 logits); tests/candle_burn_nf4_dequant_lm_reference_parity.rs (Tier B: qlora-rs NF4 round-trip → shared f32 W → Burn vs Candle LM-shaped linear); tests/candle_burn_cross_entropy_parity.rs (CE on shared logits). | Same integration tests. |
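
Both merge rows above come down to the same arithmetic: fold the low-rank update into the frozen base weight as W' = W + (alpha/r) · B·A. The sketch below shows that delta merge on plain f32 matrices under the standard LoRA scaling convention (an assumption here, not read from the merge code); it uses no Burn or Candle types.

```rust
/// Fold a LoRA adapter (A: r x in, B: out x r, scale alpha/r) into a frozen
/// base weight: W'[o][i] = W[o][i] + (alpha/r) * sum_k B[o][k] * A[k][i].
fn merge_lora(
    base: &[Vec<f32>], // out x in, frozen base weight
    a: &[Vec<f32>],    // r x in
    b: &[Vec<f32>],    // out x r
    alpha: f32,
) -> Vec<Vec<f32>> {
    let r = a.len();
    let scale = alpha / r as f32;
    base.iter()
        .enumerate()
        .map(|(o, row)| {
            row.iter()
                .enumerate()
                .map(|(i, &w)| {
                    // delta[o][i] = sum_k B[o][k] * A[k][i]
                    let delta: f32 = (0..r).map(|k| b[o][k] * a[k][i]).sum();
                    w + scale * delta
                })
                .collect::<Vec<f32>>()
        })
        .collect()
}

fn main() {
    let base = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let a = vec![vec![0.5, 0.5]];       // r = 1
    let b = vec![vec![1.0], vec![2.0]]; // out = 2
    let merged = merge_lora(&base, &a, &b, 2.0);
    // scale = 2.0 / 1 = 2.0; merged[1][0] = 0.0 + 2.0 * (2.0 * 0.5) = 2.0
    assert!((merged[1][0] - 2.0).abs() < 1e-6);
}
```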

Token / label policy

  • Shared helpers live in tensor/training_text.rs: plain_system_prompt_response (Candle) plus the ChatML supervision strings and hf_tokenize_chatml_supervised (Burn + HF).
  • Candle objective: last-token LM loss on the concatenated plain text (see candle_qlora_train.rs).
  • Burn objective: token-level cross-entropy with prompt tokens masked to -100 at the ChatML boundary, using either the Vox or the HF tokenizer; both label policies are sketched after this list.
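
A minimal sketch of the two label policies, operating on plain token-id slices. The helper names and the IGNORE_INDEX constant are illustrative, not the training_text.rs API; only the -100 sentinel and the last-token objective are taken from the bullets above.

```rust
const IGNORE_INDEX: i64 = -100;

/// Burn-style labels: prompt tokens masked to -100 (the loss ignores them),
/// response tokens kept as supervision targets.
fn masked_labels(prompt_len: usize, input_ids: &[i64]) -> Vec<i64> {
    input_ids
        .iter()
        .enumerate()
        .map(|(i, &id)| if i < prompt_len { IGNORE_INDEX } else { id })
        .collect()
}

/// Candle-style proxy objective: supervise only the final position of the
/// concatenated prompt + response text.
fn last_token_label(input_ids: &[i64]) -> (usize, i64) {
    let last = input_ids.len() - 1;
    (last, input_ids[last])
}

fn main() {
    let ids = vec![10, 11, 12, 20, 21]; // 3 prompt tokens, 2 response tokens
    assert_eq!(masked_labels(3, &ids), vec![-100, -100, -100, 20, 21]);
    assert_eq!(last_token_label(&ids), (4, 21));
}
```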

Feature flags

| Build | Notes |
| --- | --- |
| vox-populi/mens-gpu | Burn + tokenizers + safetensors for the HF-aware Burn path. |
| vox-populi/mens-train | mens-gpu + candle-qlora + qlora-rs (the CLI gpu feature pulls in this chain). |
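
For orientation, a minimal sketch of how a feature chain like this typically gates code at compile time via #[cfg(feature = ...)]. The module and function names are placeholders, not the real crate layout; only the feature names come from the table above.

```rust
#[cfg(feature = "mens-gpu")]
mod burn_path {
    /// Placeholder for the Burn + tokenizers + safetensors warm-start path.
    pub fn warm_start_from_hf() -> &'static str {
        "burn: HF warm-start available"
    }
}

#[cfg(feature = "mens-train")]
mod candle_path {
    /// Placeholder for the candle-qlora + qlora-rs NF4 training path.
    pub fn train_qlora_nf4() -> &'static str {
        "candle: NF4 QLoRA available"
    }
}

fn main() {
    #[cfg(feature = "mens-gpu")]
    println!("{}", burn_path::warm_start_from_hf());

    #[cfg(feature = "mens-train")]
    println!("{}", candle_path::train_qlora_nf4());

    #[cfg(not(any(feature = "mens-gpu", feature = "mens-train")))]
    println!("no GPU training features enabled");
}
```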

Burn production policy

Burn training is held as an opt-in research lane. Promotion to production requires scorecard evidence with explicit backend comparisons (backend=burn vs backend=qlora) over at least two benchmark cycles, including syntax + semantic KPI deltas and runtime repair KPIs.