"Qwen 3.6 integration research (groundwork)"

Qwen 3.6 integration research (groundwork)

This note is planning and verification only. It does not claim shipped Qwen 3.6 behavior in Vox. Third-party summaries (blogs, aggregators, model-router copy) often lag or misstate open-weight availability and config details—treat them as hypotheses until pinned to primary artifacts below.

Current Vox SSOT for native Candle QLoRA remains Qwen 3.5 (Qwen/Qwen3.5-4B and related tiers); see mens-training.md.

1. Source-of-truth checklist (before any code)

Verify and record links + revision dates for:

| Item | Why it matters for Vox |
| --- | --- |
| Official Qwen / Alibaba model card or release post | License, context limits, modality claims, “thinking” / reasoning behavior |
| Hugging Face model hub entries (if any) | Whether weights exist for local train/merge/serve; config.json, tokenizer_config.json, chat template |
| model_type and key layout in config.json | Drives hf_load.rs and hf_keymap.rs |
| Attention layout (dense, hybrid linear/full, MoE) | Whether 3.6 reuses Qwen 3.5 hybrid patterns or needs a new HfArchitecture variant |
| Special tokens (tool, vision, reasoning, EOS) | Tokenization, masking for SFT, completion boundaries in Schola / orchestrator |
| Context length (advertised vs practical) | VRAM, sequence packing, checkpointing policy for local QLoRA |

If no Hugging Face–compatible weights appear for a given SKU, native Mens paths in this repo remain out of scope for that SKU until that changes.
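To make that fail-closed stance concrete, here is a minimal sketch of how a pinned model_type string could be classified before any loader work begins. Everything here is an assumption for illustration: the enum, the function name, and the "qwen3_5" / "qwen3_6" strings are placeholders, not values confirmed by any official config.json.

```rust
/// Hypothetical classification of a pinned `model_type` (illustrative only;
/// real parsing lives in hf_load.rs, and the string values are assumed).
#[derive(Debug, PartialEq)]
enum ArchDecision {
    /// Keys are known-compatible; reuse the existing Qwen 3.5 path.
    AliasQwen35,
    /// Layout diverges from 3.5; a dedicated variant would be needed.
    NewVariant(&'static str),
    /// Unknown: native Mens paths stay out of scope for this SKU.
    OutOfScope(String),
}

fn classify_model_type(model_type: &str) -> ArchDecision {
    match model_type {
        // Current SSOT tier (string assumed, not verified).
        "qwen3_5" => ArchDecision::AliasQwen35,
        // Placeholder: only valid once an official config.json confirms it.
        "qwen3_6" => ArchDecision::NewVariant("Qwen36"),
        other => ArchDecision::OutOfScope(other.to_string()),
    }
}
```

The point of the OutOfScope arm is that an unrecognized SKU is recorded and refused, never silently mapped to the nearest known architecture.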

2. Vox integration matrix (planning)

| Surface | When 3.6 is in scope | Preconditions |
| --- | --- | --- |
| vox mens train / Candle QLoRA | HF (or compatible) safetensors + config that match or extend existing Qwen 3.5 parsing | Successful qlora_preflight; possible new HfArchitecture::Qwen36 or mapped alias to Qwen35 if keys are compatible |
| vox-schola serve / merged adapters | Same as above + merge manifest parity | Adapter schema and candle_qlora_merge family detection |
| Orchestrator / remote inference (BYOK, HTTP) | API-only or OpenRouter-style ids are fine without local weights | Provider prefix handling (see provider_family_strengths in spec.rs); tokenizer + tool schema documented by provider |
| Multimodal | Not a separate stack from 3.5 | Extends the same contracts as qwen35-multimodal-phase2-backlog.md (vision/video tokens, corpus, trainer, serve) |
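For the remote-only routing row, a toy sketch of provider-prefix handling. The real logic is provider_family_strengths in spec.rs; the helper below is an illustrative stand-in, and the "vendor/model" id shape is the OpenRouter-style convention the matrix refers to, not a claim about any specific Qwen 3.6 id.

```rust
/// Hypothetical prefix routing for remote model ids (illustrative stand-in
/// for the provider_family_strengths logic in spec.rs).
/// OpenRouter-style ids look like "<vendor>/<model>"; an id without a
/// prefix is treated as local/native and returns None here.
fn provider_family(model_id: &str) -> Option<&str> {
    model_id.split_once('/').map(|(vendor, _model)| vendor)
}
```

A remote-only 3.6 SKU would then route entirely on the vendor prefix, with no local weights, tokenizer, or merge path involved.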

3. Risks and vagaries (confirm against official docs)

  • Long context: advertised context lengths (potentially millions of tokens) far exceed what local QLoRA can train at a given seq_len and batch size; optimizer state and activation memory are the binding constraints.
  • Reasoning / chain-of-thought: Extra tokens or template segments affect supervised fine-tuning masks and logprob boundaries; may differ from Qwen 3.5 “thinking” toggles.
  • Tool calling: JSON schema or special tokens may drift from 3.5 Instruct; orchestrator and eval gates need explicit fixtures per model id.
  • Closed-weight or hosted-only SKUs: No local merge of adapters without a compatible open base; plan for remote-only routing and cost/quotas.
  • MoE or new block types: May invalidate assumptions in proxy-stack or full-graph QLoRA preflight; strict preflight should fail closed with a clear operator message.
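The last bullet is the one worth pinning down now: strict preflight should refuse anything it cannot positively identify. A minimal fail-closed sketch follows; the function name, the KNOWN set, and the message wording are all assumptions for illustration, not the actual qlora_preflight.rs contract.

```rust
/// Hypothetical fail-closed preflight guard (illustrative; the real checks
/// live in qlora_preflight.rs). Unknown block types are rejected with an
/// operator-facing message instead of being guessed at.
fn preflight_block(block_type: &str) -> Result<(), String> {
    // Assumed known set; the real list would come from the verified config.
    const KNOWN: &[&str] = &["full_attention", "linear_attention"];
    if KNOWN.contains(&block_type) {
        Ok(())
    } else {
        Err(format!(
            "preflight: unsupported block type `{block_type}`; refusing to \
             train. Pin the official config.json and extend the \
             architecture guards before retrying."
        ))
    }
}
```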

4. Optional follow-up (implementation phase, later)

  • After official config.json is available, add explicit parsing in hf_load.rs (e.g. HfArchitecture::Qwen36 or map to Qwen35 if key namespaces match model.language_model.layers.*).
  • Extend qlora_preflight.rs with architecture-specific guards and diagnostics.
  • Update contracts/mens/training-presets.v1.yaml and docs only when a concrete default 3.6 base is chosen for the product.
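The alias decision in the first bullet could be gated by a key-namespace check along these lines. Only the model.language_model.layers.* namespace comes from the note above; the extra allowed prefixes (embed_tokens, lm_head.weight) are assumptions about what the 3.5 keymap accepts, to be replaced with the verified list from hf_keymap.rs.

```rust
/// Hypothetical sketch: a 3.6 checkpoint may alias the Qwen 3.5 keymap only
/// if every tensor name lives in a namespace the 3.5 parser already
/// understands. The allowed prefixes below (other than layers.*) are
/// assumed, not verified against hf_keymap.rs.
fn keys_alias_qwen35<'a>(mut tensor_names: impl Iterator<Item = &'a str>) -> bool {
    tensor_names.all(|name| {
        name.starts_with("model.language_model.layers.")
            || name.starts_with("model.language_model.embed_tokens")
            || name == "lm_head.weight"
    })
}
```

Any tensor outside the allow-list (a vision tower, an MoE router, a new norm) would force the explicit HfArchitecture::Qwen36 path instead of the alias.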