Mens Qwen family migration and native stack (research 2026)

Executive summary

  • The product default in this repository is already a Qwen3.5-class text base (DEFAULT_MODEL_ID in vox-populi mens/mod.rs, the nightly workflow qwen35-native-nightly.yml, and the Mens training reference).
  • Qwen2 remains in-tree as HfArchitecture::Qwen2, InferenceModel::Qwen2, HF keymap tables, and unit test fixtures using "model_type":"qwen2" JSON snippets. That is intentional compatibility and regression surface, not legacy neglect.
  • Public ecosystem still ships many Qwen2-named weights and LoRA adapters; “delete Qwen2 from Candle” is a semver-scale decision, not a documentation tweak.

This document defines deprecation tiers, a migration story split (runbook vs weight surgery vs code removal), and external references to re-check before any removal milestone.

External references (April 2026 snapshot)

Re-verify URLs and claims before release-blocking decisions.

| Source | Use |
| --- | --- |
| QwenLM: Qwen3 — Think Deeper, Act Faster | Product positioning: thinking vs non-thinking modes, multi-size lineup. |
| QwenLM: Qwen2.5-Coder family | Code-specialized line; still a credible baseline for comparisons. |
| airank.dev: Qwen2.5-Coder-32B vs Qwen3 Coder Next | Third-party benchmark/cost framing (non-authoritative). |
| Hugging Face Transformers: Qwen3_5 model doc | text_config / vision_config, multimodal token ids; upstream pages may still contain scaffolding — treat as evolving. |

Migration story: three layers of difficulty

| Layer | Meaning | Effort band |
| --- | --- | --- |
| A — Operator runbook | New work uses Qwen/Qwen3.5-*; refresh tokenizer.json; train or merge QLoRA; serve via Schola path in Mens serving SSOT; re-run eval on fixed JSONL. | Small (documentation + checklist + one dry run). |
| B — Adapter continuity | Same LoRA directory must run on a new base without retrain — may require out-of-tree conversion or may be unsupported; document honestly. | Medium to large if promised automatically. |
| C — Code removal | Delete Qwen2 branches in Candle and tests. | Large; requires audit, CI matrix, release notes. |

Narrative for contributors: default new recipes to Qwen3.5; keep Qwen2 paths until an explicit audit shows zero product dependency; prefer “retrain recommended” over silent weight conversion.

Deprecation tiers (proposal)

| Tier | Qwen2 native path | Qwen3.5 |
| --- | --- | --- |
| Supported | Load + inference + tests maintained | Default for new training and docs. |
| Frozen | Bugfixes only; no new Qwen2-only features | Active development. |
| Removed | Delete after migration guide + major boundary | Single text architecture path (names TBD). |
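The tier policy above can be encoded so that it is asserted in tests rather than only stated in prose. A minimal sketch follows; the names (Qwen2Tier, allows_new_features, load_path_maintained) are invented for illustration and do not exist in the repo.

```rust
// Hypothetical encoding of the proposed deprecation tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Qwen2Tier {
    Supported, // load + inference + tests maintained
    Frozen,    // bugfixes only; no new Qwen2-only features
    Removed,   // deleted after migration guide + major version boundary
}

impl Qwen2Tier {
    /// May new Qwen2-only features still land at this tier?
    fn allows_new_features(self) -> bool {
        matches!(self, Qwen2Tier::Supported)
    }

    /// Must the Qwen2 load path still compile and pass tests?
    fn load_path_maintained(self) -> bool {
        !matches!(self, Qwen2Tier::Removed)
    }
}

fn main() {
    let tier = Qwen2Tier::Frozen;
    println!(
        "{:?}: new features = {}, load path maintained = {}",
        tier,
        tier.allows_new_features(),
        tier.load_path_maintained()
    );
}
```

A guard like this makes tier movement an explicit code change that shows up in review, rather than a drift in documentation.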

Repository audit checklist (for tier movement)

Execute before Frozen or Removed:

  1. rg / search: Qwen2, qwen2, HfArchitecture::Qwen2, InferenceModel::Qwen2 across crates/vox-populi, crates/vox-cli, workflows, contracts/mens/.
  2. Confirm no operator-facing doc promises Qwen2 as default.
  3. Confirm training-presets and DEFAULT_MODEL_ID stay aligned (vox-populi test training_presets_yaml_contract.rs in the workspace crate).
  4. Update Mens training reference cross-links if serve or merge matrix changes.

Qwen3.5-specific technical notes (native stack)

  • Linear / hybrid attention blocks — hf_keymap.rs branches on HfArchitecture::Qwen35 and layer type (linear_attention vs full attention). Changes to upstream config.json naming must be reflected here.
  • RoPE and preflight — qlora_preflight.rs includes Qwen3.5-specific rope key warnings; keep tests when touching layout discovery.
  • Thinking-mode tokens — If training data includes chain-of-thought, define whether Mens supervised spans strip them for vox_codegen lanes (Mens training data contract lane policy).
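The keymap branching described in the first bullet can be illustrated as follows. This is a sketch of the pattern only: LayerKind, parse_layer_type, key_for, and the weight-key strings are all invented for this example and are not the actual hf_keymap.rs API or the actual upstream tensor names.

```rust
// Illustrative sketch of per-layer branching on the upstream layer type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LayerKind {
    LinearAttention,
    FullAttention,
}

/// Map an upstream config.json layer-type string to an internal kind.
/// If upstream renames these strings, this mapping must change with it.
fn parse_layer_type(s: &str) -> Option<LayerKind> {
    match s {
        "linear_attention" => Some(LayerKind::LinearAttention),
        "full_attention" => Some(LayerKind::FullAttention),
        _ => None, // unknown layer type: surface an error, don't guess
    }
}

/// Choose a hypothetical HF weight key for a layer's attention projection.
fn key_for(layer_idx: usize, kind: LayerKind) -> String {
    match kind {
        LayerKind::LinearAttention => {
            format!("model.layers.{layer_idx}.linear_attn.in_proj.weight")
        }
        LayerKind::FullAttention => {
            format!("model.layers.{layer_idx}.self_attn.q_proj.weight")
        }
    }
}

fn main() {
    let kind = parse_layer_type("linear_attention").expect("known layer type");
    println!("{}", key_for(3, kind));
}
```

The important property is the `None` arm: an unrecognized layer-type string should fail loudly rather than silently fall back to one key scheme, which is exactly the failure mode an upstream rename would otherwise cause.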

Multimodal (HF) vs native Candle

Hugging Face Qwen3_5Config documents vision_config and image placeholder token ids. Native Candle QLoRA in this repo remains text-only until a separate ADR and execution planner workstream adds a vision encoder and training contract. Until then, multimodal serving belongs in external runtimes (vLLM, Ollama, HF) as already described in Mens training reference external serving section.

Open questions

  1. Minimum Qwen2 fixture set to keep permanently in vox-populi tests after tier Frozen.
  2. Whether to publish a single external_serving_handoff extension field for base_family when VL is used only for eval, not training.
  3. Official policy on community weight migration scripts (license, no vendoring without review).