Mens Qwen family migration and native stack (research 2026)

Executive summary

  • The product default in this repository is already a Qwen3.5-class text base (DEFAULT_MODEL_ID in vox-populi mens/mod.rs, the nightly workflow qwen35-native-nightly.yml, and the Mens training reference).
  • Qwen2 remains in-tree as HfArchitecture::Qwen2, InferenceModel::Qwen2, HF keymap tables, and unit test fixtures using "model_type":"qwen2" JSON snippets. That is intentional compatibility and regression surface, not legacy neglect.
  • Public ecosystem still ships many Qwen2-named weights and LoRA adapters; “delete Qwen2 from Candle” is a semver-scale decision, not a documentation tweak.

This document defines deprecation tiers, a migration story split (runbook vs weight surgery vs code removal), and external references to re-check before any removal milestone.

External references (April 2026 snapshot)

Re-verify URLs and claims before release-blocking decisions.

| Source | Use |
| --- | --- |
| QwenLM: Qwen3 — Think Deeper, Act Faster | Product positioning: thinking vs non-thinking modes, multi-size lineup. |
| QwenLM: Qwen2.5-Coder family | Code-specialized line; still a credible baseline for comparisons. |
| airank.dev: Qwen2.5-Coder-32B vs Qwen3 Coder Next | Third-party benchmark/cost framing (non-authoritative). |
| Hugging Face Transformers: Qwen3_5 model doc | text_config / vision_config, multimodal token ids; upstream pages may still contain scaffolding — treat as evolving. |

Migration story: three layers of difficulty

| Layer | Meaning | Effort band |
| --- | --- | --- |
| A — Operator runbook | New work uses Qwen/Qwen3.5-*; refresh tokenizer.json; train or merge QLoRA; serve via Schola path in Mens serving SSOT; re-run eval on fixed JSONL. | Small (documentation + checklist + one dry run). |
| B — Adapter continuity | Same LoRA directory must run on a new base without retrain — may require out-of-tree conversion or may be unsupported; document honestly. | Medium to large if promised automatically. |
| C — Code removal | Delete Qwen2 branches in Candle and tests. | Large; requires audit, CI matrix, release notes. |

Narrative for contributors: default new recipes to Qwen3.5; keep Qwen2 paths until an explicit audit shows zero product dependency; prefer “retrain recommended” over silent weight conversion.

Deprecation tiers (proposal)

| Tier | Qwen2 native path | Qwen3.5 |
| --- | --- | --- |
| Supported | Load + inference + tests maintained | Default for new training and docs. |
| Frozen | Bugfixes only; no new Qwen2-only features | Active development. |
| Removed | Delete after migration guide + major boundary | Single text architecture path (names TBD). |
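The tier policy above can be encoded so that it is asserted in tests rather than only stated in prose. A minimal sketch follows; the names (Qwen2Tier, allows_new_features, load_path_maintained) are invented for illustration and do not exist in the repo.

```rust
// Hypothetical encoding of the proposed deprecation tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Qwen2Tier {
    Supported, // load + inference + tests maintained
    Frozen,    // bugfixes only; no new Qwen2-only features
    Removed,   // deleted after migration guide + major version boundary
}

impl Qwen2Tier {
    /// May new Qwen2-only features still land at this tier?
    fn allows_new_features(self) -> bool {
        matches!(self, Qwen2Tier::Supported)
    }

    /// Must the Qwen2 load path still compile and pass tests?
    fn load_path_maintained(self) -> bool {
        !matches!(self, Qwen2Tier::Removed)
    }
}

fn main() {
    let tier = Qwen2Tier::Frozen;
    println!(
        "{:?}: new features = {}, load path maintained = {}",
        tier,
        tier.allows_new_features(),
        tier.load_path_maintained()
    );
}
```

A guard like this makes tier movement an explicit code change that shows up in review, rather than a drift in documentation.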

Repository audit checklist (for tier movement)

Execute before Frozen or Removed:

  1. rg / search: Qwen2, qwen2, HfArchitecture::Qwen2, InferenceModel::Qwen2 across crates/vox-populi, crates/vox-cli, workflows, contracts/mens/.
  2. Confirm no operator-facing doc promises Qwen2 as default.
  3. Confirm training-presets and DEFAULT_MODEL_ID stay aligned (vox-populi test training_presets_yaml_contract.rs in the workspace crate).
  4. Update Mens training reference cross-links if serve or merge matrix changes.

Qwen3.5-specific technical notes (native stack)

  • Linear / hybrid attention blocks — hf_keymap.rs branches on HfArchitecture::Qwen35 and layer type (linear_attention vs full attention). Changes to upstream config.json naming must be reflected here.
  • RoPE and preflight — qlora_preflight.rs includes Qwen3.5-specific rope key warnings; keep tests when touching layout discovery.
  • Thinking-mode tokens — If training data includes chain-of-thought, define whether Mens supervised spans strip them for vox_codegen lanes (Mens training data contract lane policy).
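The keymap branching described in the first bullet can be illustrated as follows. This is a sketch of the pattern only: LayerKind, parse_layer_type, key_for, and the weight-key strings are all invented for this example and are not the actual hf_keymap.rs API or the actual upstream tensor names.

```rust
// Illustrative sketch of per-layer branching on the upstream layer type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LayerKind {
    LinearAttention,
    FullAttention,
}

/// Map an upstream config.json layer-type string to an internal kind.
/// If upstream renames these strings, this mapping must change with it.
fn parse_layer_type(s: &str) -> Option<LayerKind> {
    match s {
        "linear_attention" => Some(LayerKind::LinearAttention),
        "full_attention" => Some(LayerKind::FullAttention),
        _ => None, // unknown layer type: surface an error, don't guess
    }
}

/// Choose a hypothetical HF weight key for a layer's attention projection.
fn key_for(layer_idx: usize, kind: LayerKind) -> String {
    match kind {
        LayerKind::LinearAttention => {
            format!("model.layers.{layer_idx}.linear_attn.in_proj.weight")
        }
        LayerKind::FullAttention => {
            format!("model.layers.{layer_idx}.self_attn.q_proj.weight")
        }
    }
}

fn main() {
    let kind = parse_layer_type("linear_attention").expect("known layer type");
    println!("{}", key_for(3, kind));
}
```

The important property is the `None` arm: an unrecognized layer-type string should fail loudly rather than silently fall back to one key scheme, which is exactly the failure mode an upstream rename would otherwise cause.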

Multimodal (HF) vs native Candle

Hugging Face Qwen3_5Config documents vision_config and image placeholder token ids. Native Candle QLoRA in this repo remains text-only until a separate ADR and execution planner workstream adds a vision encoder and training contract. Until then, multimodal serving belongs in external runtimes (vLLM, Ollama, HF) as already described in Mens training reference external serving section.

Open questions

  1. Minimum Qwen2 fixture set to keep permanently in vox-populi tests after tier Frozen.
  2. Whether to publish a single external_serving_handoff extension field for base_family when VL is used only for eval, not training.
  3. Official policy on community weight migration scripts (license, no vendoring without review).