Mens strategy inputs checklist
This document is the handoff sheet for the next pass.
Its job is simple:
- confirm that discovery is complete enough,
- make sure the implementation-planning pass uses the new groundwork docs,
- prevent the next pass from redoing research that has already been done.
Required groundwork bundle
The second-pass implementation-planning work should treat the following documents as mandatory inputs:
- reference/mens-laziness-accuracy-audit.md
- reference/mens-measurement-gap-analysis.md
- architecture/mens-lane-segmentation-research.md
- reference/mens-external-tech-options.md
- reference/mens-training.md
- reference/mens-qlora-data-strategy.md
- reference/mens-training-data-contract.md
What the next pass must not redo
The next pass should not spend most of its tokens rediscovering:
- that output-surface strictness is weaker than desired,
- that metric drift exists between telemetry producers and consumers,
- that docs can contaminate a code-only lane,
- that retrieval and constrained decoding are realistic adoption candidates,
- that Burn is a selective R&D lane rather than the mainline training default.
Those points are already established in this groundwork bundle.
Implementation-planning prerequisites
Before writing a second-pass implementation plan, confirm the following:
A. Audit prerequisites
- Critical and High findings from the laziness/accuracy audit are accepted as real issues or explicitly rejected with rationale.
- The planning pass names a single owner surface for:
- output normalization,
- validity checking,
- scorecard decision thresholds,
- runtime generation metrics.
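As an illustration, the single-owner requirement can be made mechanically checkable. This is a hypothetical sketch in Python; the module paths (`pipeline.normalize`, `telemetry.generation`, and so on) are invented placeholders, not names from the repo:

```python
from typing import Mapping, Sequence

# Hypothetical candidate map: each owner surface lists the modules that
# currently implement it. The planning pass is done with this step only
# when every surface lists exactly one owner.
CANDIDATE_OWNERS: Mapping[str, Sequence[str]] = {
    "output_normalization": ["pipeline.normalize"],
    "validity_checking": ["pipeline.validate"],
    "scorecard_thresholds": ["scorecard.thresholds"],
    # Two candidates: this surface is not yet resolved to a single owner.
    "runtime_generation_metrics": ["telemetry.generation", "bench.metrics"],
}

def unresolved_surfaces(candidates: Mapping[str, Sequence[str]]) -> list[str]:
    """Return the surfaces that still list zero or multiple owner modules."""
    return [name for name, owners in candidates.items() if len(owners) != 1]
```

A check like this turns "names a single owner surface" from a prose requirement into something a CI step could assert.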
B. Measurement prerequisites
- The planning pass uses the KPI tiers from the measurement analysis:
- product KPIs,
- diagnostic KPIs,
- contextual metrics.
- It explicitly distinguishes:
- training metrics,
- corpus/data metrics,
- generation/runtime metrics.
- It does not substitute corpus quality metrics for model success metrics.
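The tier and kind split above can be encoded so the "no corpus metrics as success metrics" rule is enforced rather than merely stated. A minimal sketch with invented metric names; the real tier assignments live in the measurement-gap analysis:

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    PRODUCT = "product"        # KPIs the plan is judged by
    DIAGNOSTIC = "diagnostic"  # explains why a product KPI moved
    CONTEXTUAL = "contextual"  # background only, never a success claim

class Kind(Enum):
    TRAINING = "training"      # training-loop metrics
    CORPUS = "corpus"          # corpus/data quality metrics
    GENERATION = "generation"  # generation/runtime metrics

@dataclass(frozen=True)
class Metric:
    name: str
    tier: Tier
    kind: Kind

def is_valid_success_metric(m: Metric) -> bool:
    """A success metric must be a product KPI, and corpus/data metrics
    may never stand in for model success metrics."""
    return m.tier is Tier.PRODUCT and m.kind is not Kind.CORPUS
```

With this shape, a plan that tries to promote a deduplication ratio to a success metric fails the check by construction.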
C. Data-lane prerequisites
- The planning pass states whether lane segmentation is:
- metadata only,
- mixture-level,
- adapter-level,
- benchmark-level,
- or some combination.
- It explicitly protects the code-only lane from prose-target contamination.
- It defines how docs-derived data will be used:
- as code-only supervision,
- as docs/chat supervision,
- as retrieval context,
- or all three in separate lanes.
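One way to make the contamination rule concrete is a router that treats a single metadata field as authoritative for lane ownership. A hedged sketch with assumed field names (`lane`, `source_kind`) that the real training-data contract would define:

```python
# Hypothetical lane router. The "lane" metadata field is authoritative,
# and docs-derived rows can never land in the code-only lane.
ALLOWED_LANES = {"code_only", "docs_chat", "retrieval_context"}

def route(row: dict) -> str:
    lane = row.get("lane")
    if lane not in ALLOWED_LANES:
        raise ValueError(f"row has no authoritative lane: {lane!r}")
    if lane == "code_only" and row.get("source_kind") == "docs":
        # Prose-target contamination guard for the code-only lane.
        raise ValueError("docs-derived row routed into the code-only lane")
    return lane
```

The design choice here is that routing raises rather than silently reassigning, so contamination surfaces at ingest time instead of in benchmark results.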
D. External-technology prerequisites
- Every external technique selected for implementation is assigned one of:
- adopt now,
- prototype,
- watchlist.
- The implementation plan explains why the repo should adopt that technique now rather than later.
- Each selected option has a success metric tied to the KPI contract.
Recommended second-pass structure
The next pass should organize its implementation plan in this order:
1. SSOT unification
   - shared normalization,
   - shared validity contract,
   - shared telemetry/event ownership.
2. Metric contract implementation
   - fix producer/consumer drift,
   - define summary artifacts,
   - wire runtime generation metrics.
3. Lane segmentation
   - metadata contract,
   - source routing,
   - benchmark separation.
4. Adopt-now options
   - retrieval/context improvements,
   - benchmark strengthening,
   - pragmatic decoding constraints.
5. Prototype options
   - stronger grammar constraints,
   - semantic benchmark subsets,
   - Burn R&D experiments if the gate still points there.
Decision questions the next pass must answer
The implementation-planning pass should explicitly answer these questions:
Output contract
- What does “code only” mean operationally?
- Is fenced output ever allowed in transport, or is raw code the only target?
- What exact canonicalization sequence becomes the product contract?
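To make the canonicalization question concrete, here is one candidate sequence, sketched in Python. It assumes fenced output is tolerated in transport but raw code is the only product target; the actual contract must fix the exact order of steps:

```python
import re

# Hypothetical canonicalization sequence for a "code only" contract:
# 1. normalize line endings,
# 2. unwrap a single enclosing markdown fence if present,
# 3. trim trailing whitespace per line,
# 4. end with exactly one newline.
_FENCE = re.compile(r"\A```[^\n]*\n(.*?)\n?```\s*\Z", re.DOTALL)

def canonicalize(output: str) -> str:
    text = output.replace("\r\n", "\n")
    m = _FENCE.match(text)
    if m:  # fenced transport is tolerated, but only raw code survives
        text = m.group(1)
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).strip("\n") + "\n"
```

Whatever sequence is chosen, the point of the question above is that exactly one such function should exist and every producer and scorer should call it.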
Validity contract
- Which function or module becomes the SSOT validator?
- Does validity include HIR and canonicalization re-validation?
- Which narrower validation modes still exist, and why?
Metrics contract
- Which artifact becomes the one comparable benchmark summary?
- Where is `TimeToFirstValidMs` recorded?
- Which token accounting source becomes canonical?
- Which current metrics are deprecated or moved to secondary status?
Lane contract
- Which rows belong in the code-only lane?
- Which rows belong in docs/chat lanes?
- Which metadata field is authoritative for lane ownership?
- How will the scorecard benchmark separate lanes?
Burn decision contract
- What specific evidence would justify investing in Burn R&D next?
- What evidence would instead justify staying QLoRA-first?
Suggested second-pass output bundle
The next pass will likely need:
- one implementation strategy document,
- one metrics/schema migration plan,
- one lane-segmentation implementation plan,
- one benchmark rollout plan,
- optional ADR updates if the architecture boundary changes materially.
Completion criteria for the next pass
The second-pass implementation plan will be ready when:
- it names the SSOTs instead of describing parallel alternatives,
- it attaches each proposed change to a measurable KPI improvement,
- it avoids adding a second benchmark or normalization system when an existing one can be extended,
- it makes the code-only lane stricter without blocking future docs/chat/multimodal lanes,
- it explains whether the remaining gap is still a systems problem or has become a backbone-model problem.
Final handoff note
The central strategic question is still the right one:
Are the remaining failures due mostly to missing architecture around Qwen, or due to limits of using a non-Vox-native base model at all?
This groundwork bundle is designed so that the next pass can answer that question with an implementation strategy rather than with another broad discovery pass.