# Mens / HF fine-tune — LLM PR checklist
Use this when agents or humans touch vox-populi Mens training (mens-train), merge commands, LoRA/QLoRA, or parity tests. Goal: avoid typical context-blind mistakes (wrong crate, wrong layout, doc drift).
## Duplication and ownership

- Two `lora.rs` trees: `crates/vox-tensor/src/lora.rs` (primitives) vs `crates/vox-populi/src/mens/tensor/lora.rs` (transformer + merge). Fixes to linear LoRA math may need both, or a deliberate consolidation. Canonical split: `mens-lora-ownership.md`.
- CLI / operator strings: user-facing merge errors should stay aligned with `MERGE_QLORA_REJECTS_BURN_BIN` in `tensor/artifact_bridge.rs`; grep SSOT markdown when changing wording. Planner / QLoRA preflight gates share `tensor/operator_messages.rs` — update there when changing tokenizer or weight-path errors.
## Feature flags and API

- `cfg(feature = "mens-train")` on `vox-populi` exports (e.g. `MERGE_QLORA_REJECTS_BURN_BIN`): every binary that needs them must enable `vox-populi/mens-train` (see the `vox-cli` `gpu` feature wiring).
- Format strings: `anyhow!`/`bail!` messages that contain a literal `{` must escape it as `{{`/`}}` where needed.
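`anyhow!` and `bail!` reuse the standard `format!` machinery, so the escaping rule can be checked with the stdlib alone; a minimal sketch (message text is illustrative):

```rust
fn main() {
    let path = "merge-qlora.bin";
    // A literal `{` in a format template must be doubled, otherwise the
    // compiler parses it as the start of an argument capture and errors out.
    let msg = format!("rejecting {path}: expected {{out, in}} layout");
    assert_eq!(msg, "rejecting merge-qlora.bin: expected {out, in} layout");
    println!("{msg}");
}
```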
## Tensor layout (Burn vs Candle)

- Matmul orientation: state it explicitly, e.g. `x [batch, in] @ W [in, out]`; qlora-rs stores the base weight as `[out_features, in_features]` and uses `input.matmul(&weight.t())`.
- Bias broadcast: Burn often needs `bias.reshape([1, out])`; Candle uses `broadcast_add` — confirm ranks.
- Tolerances: tight for shared f32 primitives; loose / statistical for end-to-end training — never one global epsilon for everything.
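A dependency-free sketch of that orientation, with plain `Vec`s standing in for Burn/Candle tensors (shapes and values are illustrative): the weight is stored `[out, in]`, so the forward pass is `x @ W.t()`, with the bias broadcast across batch rows.

```rust
// Shapes: x is [batch, in], w is [out, in], bias is [out],
// result is [batch, out] — i.e. x @ W^T + bias.
fn linear(x: &[Vec<f32>], w: &[Vec<f32>], bias: &[f32]) -> Vec<Vec<f32>> {
    x.iter()
        .map(|row| {
            w.iter()
                .zip(bias)
                .map(|(wrow, b)| {
                    // Dot product of an input row with a weight *row*
                    // is exactly x @ W^T for a [out, in] weight.
                    row.iter().zip(wrow).map(|(xi, wi)| xi * wi).sum::<f32>() + b
                })
                .collect()
        })
        .collect()
}

fn main() {
    let x = vec![vec![1.0, 2.0]];                  // [batch=1, in=2]
    let w = vec![vec![3.0, 4.0], vec![5.0, 6.0]];  // [out=2, in=2]
    let bias = vec![0.5, -0.5];                    // broadcast over batch rows
    assert_eq!(linear(&x, &w, &bias), vec![vec![11.5, 16.5]]);
}
```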
## Tests and CI

- CI job names vs runbook: `.github/workflows/ci.yml` Mens steps should stay aligned with `mens-finetune-acceptance-runbook.md` (same `cargo test` filters, e.g. `execution_planner`, not multiple filters on one line).
- Strict QLoRA proxy stack: the `preflight_strict_rejects_missing_o_proj` regression must stay green when changing `qlora_preflight` / planner middle-key inventory.
- CI job vs test binary: `.github/workflows/ci.yml` `--test <name>` must match `crates/vox-populi/tests/<name>.rs` (or `src/…` integration tests as wired).
- GPU-only tests: must not be the only coverage for logic that also runs on CPU / NdArray.
- Path edge cases: e.g. `merge-qlora*.bin` detection — consider double extensions and Windows paths when adding guards.
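A hedged sketch of such a guard — `looks_like_merge_qlora_bin` is a hypothetical helper, not the repo's real detection code:

```rust
use std::path::Path;

// Handles double extensions (merge-qlora.v2.bin) and Windows-style
// separators, which Path won't split when the test runs on Unix.
fn looks_like_merge_qlora_bin(raw: &str) -> bool {
    // Normalize Windows separators before taking the file name.
    let normalized = raw.replace('\\', "/");
    let name = Path::new(&normalized)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("");
    let lower = name.to_ascii_lowercase();
    // Match on prefix + final extension rather than Path::extension(),
    // so "merge-qlora.backup.bin" still trips the guard.
    lower.starts_with("merge-qlora") && lower.ends_with(".bin")
}

fn main() {
    assert!(looks_like_merge_qlora_bin("out/merge-qlora-7b.bin"));
    assert!(looks_like_merge_qlora_bin(r"C:\artifacts\merge-qlora.v2.BIN"));
    assert!(!looks_like_merge_qlora_bin("out/merge-qlora.safetensors"));
}
```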
## Documentation

- Same change, two docs: behavior visible to users should match `AGENTS.md` (Mens subsection) and `docs/src/reference/mens-training.md` where applicable.
- NF4 wording: the Burn path is f32 LoRA; Candle `--backend qlora` is qlora-rs NF4 — do not conflate the two in CLI blurbs.
## Vox web / training corpus

- Express / `server.ts`: treat `VOX_EMIT_EXPRESS_SERVER=1` as legacy / opt-in in training text; the default story is Axum + `api.ts` (see `vox-fullstack-artifacts.md`).
- Examples: prefer golden `examples/*.vox` from `examples/README.md`; avoid ingesting `examples/archive/**` unless the pipeline explicitly opts in.
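One way the opt-in could look — `ingest` is a hypothetical filter, and treating any `archive` path component as archived is a simplification of the `examples/archive/**` rule:

```rust
use std::path::Path;

// Keep golden examples/*.vox; skip anything under an archive directory
// unless the caller explicitly opts in.
fn ingest(path: &str, include_archive: bool) -> bool {
    let p = Path::new(path);
    let is_vox = p.extension().map_or(false, |e| e == "vox");
    let in_archive = p.components().any(|c| c.as_os_str() == "archive");
    is_vox && (include_archive || !in_archive)
}

fn main() {
    assert!(ingest("examples/todo-app.vox", false));
    assert!(!ingest("examples/archive/old-api.vox", false));
    assert!(ingest("examples/archive/old-api.vox", true));  // explicit opt-in
    assert!(!ingest("examples/README.md", false));          // not a .vox file
}
```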
## Merge / attention

- RoPE: no silent merge to static `MultiHeadAttention`; `use_rope` stacks need an explicit unmerged serve path or a documented limitation (see the `LoraAttention::merge` rustdoc).
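The rule reduces to a tiny guard; the struct and signature below are hypothetical stand-ins, not the real `LoraAttention` API:

```rust
// Minimal stand-in for a LoRA attention block: merging into a static
// MultiHeadAttention is only valid when RoPE is off.
struct LoraAttention {
    use_rope: bool,
}

impl LoraAttention {
    fn merge(&self) -> Result<&'static str, String> {
        if self.use_rope {
            // An explicit, documented failure beats a silently wrong merge.
            return Err("use_rope stacks must serve unmerged; see merge rustdoc".into());
        }
        Ok("merged into static attention")
    }
}

fn main() {
    assert!(LoraAttention { use_rope: true }.merge().is_err());
    assert!(LoraAttention { use_rope: false }.merge().is_ok());
}
```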
## Parity strategy (reminder)
| Tier | What it proves |
|---|---|
| A | Shared f32 ops: matmul, biased linear, CE (`candle_burn_*_parity` tests). |
| B | NF4 round-trip → same f32 tensor → Burn vs Candle matmul (`candle_burn_nf4_dequant_lm_reference_parity`). |
| C | Avoid: a single tight tolerance on a full NF4 proxy vs a full Burn LM without an identical graph and reference path. |
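The tier split implies at least two epsilons rather than one global constant; a sketch with illustrative thresholds (the real values live in the parity tests, not here):

```rust
// Tier A (shared f32 primitives) gets a tight epsilon; end-to-end
// comparisons get a loose one. Constants are illustrative only.
const TIER_A_EPS: f32 = 1e-6;
const TIER_B_EPS: f32 = 1e-3;

fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}

fn main() {
    let burn = [0.100_000_1_f32, 0.2];
    let candle = [0.1_f32, 0.2];
    let diff = max_abs_diff(&burn, &candle);
    // A drift of ~1e-7 passes both tiers; ~1e-4 would pass only Tier B.
    assert!(diff <= TIER_A_EPS && diff <= TIER_B_EPS);
}
```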