# Vox source → compiler → Mens training (pipeline SSOT)
This page is the persistent crosswalk for contributors: where .vox files are enforced, how they relate to documentation, and how they reach Mens fine-tuning. It deliberately separates compile-time lexing from training-time tokenization.
## 1. Authoritative .vox layout
| Tree | Role | Enforcement |
|---|---|---|
| `examples/golden/**/*.vox` | Canonical, training-eligible demos | `cargo test -p vox-compiler --test golden_vox_examples` (parse → HIR → WebIR validate → Syntax-K metrics) |
| `examples/parser-inventory/**/*.vox` | Negative / recovery fixtures | Must not be mixed into Mens goldens; excluded by SSOT |
| `examples/examples.ssot.v1.yaml` | Policy file: declares golden roots, negative roots, doc scan roots | `cargo test -p vox-compiler --test examples_ssot` |
| mdBook includes | Hash-include paths under `docs/src` must resolve to existing `.vox` under `examples/golden/` (see Golden Examples corpus) | `cargo test -p vox-compiler --test examples_ssot` |
Operator entry: `examples/README.md`.
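The core invariant the SSOT policy encodes is that golden roots are training-eligible and negative roots never are. A minimal sketch of that invariant (field names are illustrative, not the real `examples.ssot.v1.yaml` schema or the actual `examples_ssot` test):

```rust
// Hypothetical in-memory mirror of the SSOT policy; field names are
// illustrative, not the real examples.ssot.v1.yaml schema.
struct ExamplesSsot {
    golden_roots: Vec<String>,   // canonical, training-eligible trees
    negative_roots: Vec<String>, // parser-inventory fixtures, never trained on
}

impl ExamplesSsot {
    /// A file is training-eligible only if it sits under a golden root
    /// and under no negative root.
    fn is_training_eligible(&self, path: &str) -> bool {
        let under = |roots: &[String]| roots.iter().any(|r| path.starts_with(r.as_str()));
        under(&self.golden_roots) && !under(&self.negative_roots)
    }
}

fn main() {
    let ssot = ExamplesSsot {
        golden_roots: vec!["examples/golden/".into()],
        negative_roots: vec!["examples/parser-inventory/".into()],
    };
    assert!(ssot.is_training_eligible("examples/golden/hello.vox"));
    assert!(!ssot.is_training_eligible("examples/parser-inventory/recovery_case.vox"));
}
```

The path-prefix check is a simplification; the real policy file may use globs rather than plain prefixes.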
## 2. Lexer and parser (language surface)
- Lexer: `crates/vox-compiler/src/lexer/` (logos-derived `Token` stream; batch API `lex`).
- Parser / typechecker / lowering: monolithic `vox-compiler` (see Compiler IR pipeline, IR emission SSOT).
The lexer’s keyword inventory is the source of truth for which characters become which tokens before AST construction. It does not define Mens vocabulary.
Lexing note: `lex` currently skips spans that do not match a token (logos errors are dropped). Prefer adding explicit `#[token("@…")]` entries for documented decorators so source is not silently altered.
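To illustrate the note above, here is a hand-rolled, std-only sketch (not the actual logos-derived lexer in `crates/vox-compiler`) of the alternative behavior: unmatched spans surface as explicit error tokens instead of being dropped, so diagnostics can point at them:

```rust
// Minimal illustration (NOT the real vox-compiler lexer): preserve
// unmatched spans as Token::Error instead of silently skipping them.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Decorator(&'a str), // e.g. @pure, @scheduled (explicitly enumerated)
    Ident(&'a str),
    Error(&'a str),     // unmatched span, kept for diagnostics
}

fn lex(src: &str) -> Vec<Token<'_>> {
    let mut out = Vec::new();
    let mut rest = src;
    loop {
        rest = rest.trim_start();
        if rest.is_empty() {
            break;
        }
        if let Some(stripped) = rest.strip_prefix('@') {
            // Decorator: '@' followed by an identifier.
            let end = stripped
                .find(|c: char| !c.is_alphanumeric() && c != '_')
                .unwrap_or(stripped.len());
            out.push(Token::Decorator(&rest[..end + 1]));
            rest = &rest[end + 1..];
        } else if rest.starts_with(|c: char| c.is_alphabetic()) {
            let end = rest
                .find(|c: char| !c.is_alphanumeric() && c != '_')
                .unwrap_or(rest.len());
            out.push(Token::Ident(&rest[..end]));
            rest = &rest[end..];
        } else {
            // Unrecognized character: emit an error token rather than dropping it.
            let end = rest.char_indices().nth(1).map(|(i, _)| i).unwrap_or(rest.len());
            out.push(Token::Error(&rest[..end]));
            rest = &rest[end..];
        }
    }
    out
}

fn main() {
    let toks = lex("@pure add #");
    assert_eq!(
        toks,
        vec![Token::Decorator("@pure"), Token::Ident("add"), Token::Error("#")]
    );
}
```

With logos specifically, the analogous move is to stop discarding the error variant and carry it through to the token stream; the sketch only demonstrates the principle.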
## 3. Documentation corpus
- Verified snippets: pulled from `examples/golden/` via `{{#include}}` (see the Golden Examples book page and documentation governance).
- `vox mens pipeline` may ingest `docs/src` into mix-side JSONL; the default production mix may remain code-heavy (see Mens native training § documentation corpus lane).
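The include-path constraint from section 1 can be sketched as a scan of a docs page for `{{#include …}}` directives, checking that each path points under `examples/golden/`. This is an assumption-laden, std-only sketch, not the real `examples_ssot` test:

```rust
// Sketch only: extract {{#include <path>}} targets from an mdBook page and
// check they point under examples/golden/. The real enforcement lives in
// `cargo test -p vox-compiler --test examples_ssot`.
fn include_paths(page: &str) -> Vec<&str> {
    let marker = "{{#include ";
    let mut paths = Vec::new();
    let mut rest = page;
    while let Some(start) = rest.find(marker) {
        let after = &rest[start + marker.len()..];
        match after.find("}}") {
            Some(end) => {
                // mdBook includes may carry an anchor suffix (path:anchor);
                // keep only the path part.
                let raw = after[..end].trim();
                paths.push(raw.split(':').next().unwrap_or(raw));
                rest = &after[end..];
            }
            None => break, // unterminated directive; stop scanning
        }
    }
    paths
}

fn all_under_golden(page: &str) -> bool {
    include_paths(page)
        .iter()
        .all(|p| p.starts_with("../examples/golden/") || p.starts_with("examples/golden/"))
}

fn main() {
    assert!(all_under_golden("Intro\n{{#include ../examples/golden/hello.vox}}\n"));
    assert!(!all_under_golden("{{#include ../examples/parser-inventory/bad.vox}}"));
}
```

The relative-path prefixes here are hypothetical; actual pages resolve includes relative to their own location under `docs/src`.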
## 4. Mens training path (model input)
- Golden / codegen pairs: `vox_corpus` walks `examples/golden/**/*.vox` (and other configured roots) to build instruction–response rows.
- Mix + validate: `mens/config/mix.yaml`, `vox mens corpus validate`, etc. (see Native ML pipeline and Mens native training).
- QLoRA default: `vox mens train` uses the Hugging Face tokenizer for the chosen base model, not `VoxTokenizer` and not the compile lexer. Lab `VoxTokenizer` in `vox-tensor` is a small Burn/dogfood path only.
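The golden-to-row step can be pictured with a small sketch. The `CorpusRow` shape and the instruction template below are hypothetical, not `vox_corpus`'s actual schema:

```rust
// Hypothetical shape of one training row; field names are illustrative,
// not the real vox_corpus output schema.
struct CorpusRow {
    instruction: String,
    response: String,
}

/// Build one instruction-response pair from a golden example.
/// Assumption (for illustration only): the instruction is derived from the
/// file name and the response is the .vox source itself.
fn row_from_golden(path: &str, source: &str) -> CorpusRow {
    let name = path.rsplit('/').next().unwrap_or(path);
    CorpusRow {
        instruction: format!("Write a Vox program equivalent to {name}"),
        response: source.to_string(),
    }
}

fn main() {
    let row = row_from_golden("examples/golden/hello.vox", "fn main() { print(\"hi\") }");
    assert!(row.instruction.contains("hello.vox"));
}
```

In the real pipeline such rows are then serialized to JSONL and blended per `mens/config/mix.yaml` before validation.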
## 5. Gap checklist (goldens vs journeys)
Use this when adding files under `examples/golden/`:
| Journey / capability | Golden coverage (Apr 2026) | Suggested follow-up |
|---|---|---|
| Script / CLI `vox run` | `mesh/noop.vox`, `hello.vox`, `std_http_wrappers.vox` | Optional: dedicated `golden/script_args.vox` if CLI argv story grows |
| Reactive UI | `reactive_counter.vox`, `dashboard_ui.vox`, `web_routing_fullstack.vox` | Expand when `layout_groups` grammar lands (see backlog docs) |
| Data + HTTP API | `crud_api.vox`, `blog_fullstack.vox` | — |
| Actors / workflows / MCP | `counter_actor.vox`, `checkout_workflow.vox`, `mcp_tools.vox` | — |
| `@scheduled` decorator | `scheduled_tick.vox` | `WebIrModule.scheduled_jobs` carries name + interval from HIR |
| `@pure` / `@require` / `@deprecated` | `ref_effects.vox` (regions wired in mdBook API pages) | HTTP Result / Error mapping: `http_error_mapping.vox` |
| Error / Result patterns | `http_error_mapping.vox`, `type_system.vox` (partial) | — |
## 6. Related links
- Language surface SSOT
- Populi data pipeline (mesh / control-plane vs training data)
- Mens training data contract
- Vox corpus lab (research 2026) — Tier B mass corpus, batch lanes, eval harness sketch
- Mens vision and multimodal inputs (research 2026)
- Mens Qwen family migration (research 2026)