# Vox source → compiler → Mens training (pipeline SSOT)
This page is the persistent crosswalk for contributors: where .vox files are enforced, how they relate to documentation, and how they reach Mens fine-tuning. It deliberately separates compile-time lexing from training-time tokenization.
## 1. Authoritative .vox layout
| Tree | Role | Enforcement |
|---|---|---|
| `examples/golden/**/*.vox` | Canonical, training-eligible demos | `cargo test -p vox-compiler --test golden_vox_examples` (parse → HIR → WebIR validate → Syntax-K metrics) |
| `examples/parser-inventory/**/*.vox` | Negative / recovery fixtures | Must not be mixed into Mens goldens; excluded by SSOT |
| `examples/examples.ssot.v1.yaml` | Policy file: declares golden roots, negative roots, doc scan roots | `cargo test -p vox-compiler --test examples_ssot` |
| mdBook includes | Hash-include paths under `docs/src` must resolve to existing `.vox` under `examples/golden/` (see Golden Examples corpus) | `cargo test -p vox-compiler --test examples_ssot` |
Operator entry: `examples/README.md`.
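The core invariant the SSOT policy encodes is that golden roots are training-eligible and negative roots never are. A minimal sketch of that invariant (field names are illustrative, not the real `examples.ssot.v1.yaml` schema or the actual `examples_ssot` test):

```rust
// Hypothetical in-memory mirror of the SSOT policy; field names are
// illustrative, not the real examples.ssot.v1.yaml schema.
struct ExamplesSsot {
    golden_roots: Vec<String>,   // canonical, training-eligible trees
    negative_roots: Vec<String>, // parser-inventory fixtures, never trained on
}

impl ExamplesSsot {
    /// A file is training-eligible only if it sits under a golden root
    /// and under no negative root.
    fn is_training_eligible(&self, path: &str) -> bool {
        let under = |roots: &[String]| roots.iter().any(|r| path.starts_with(r.as_str()));
        under(&self.golden_roots) && !under(&self.negative_roots)
    }
}

fn main() {
    let ssot = ExamplesSsot {
        golden_roots: vec!["examples/golden/".into()],
        negative_roots: vec!["examples/parser-inventory/".into()],
    };
    assert!(ssot.is_training_eligible("examples/golden/hello.vox"));
    assert!(!ssot.is_training_eligible("examples/parser-inventory/recovery_case.vox"));
}
```

The path-prefix check is a simplification; the real policy file may use globs rather than plain prefixes.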
## 2. Lexer and parser (language surface)
- Lexer: `crates/vox-compiler/src/lexer/` (logos-derived `Token` stream; batch API `lex`).
- Parser / typechecker / lowering: monolithic `vox-compiler` (see Compiler IR pipeline, IR emission SSOT).
The lexer’s keyword inventory is the source of truth for which characters become which tokens before AST construction. It does not define Mens vocabulary.
Lexing note: `lex` currently skips spans that do not match a token (logos errors are dropped). Prefer adding explicit `#[token("@…")]` entries for documented decorators so source is not silently altered.
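To illustrate the note above, here is a hand-rolled, std-only sketch (not the actual logos-derived lexer in `crates/vox-compiler`) of the alternative behavior: unmatched spans surface as explicit error tokens instead of being dropped, so diagnostics can point at them:

```rust
// Minimal illustration (NOT the real vox-compiler lexer): preserve
// unmatched spans as Token::Error instead of silently skipping them.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Decorator(&'a str), // e.g. @pure, @scheduled (explicitly enumerated)
    Ident(&'a str),
    Error(&'a str),     // unmatched span, kept for diagnostics
}

fn lex(src: &str) -> Vec<Token<'_>> {
    let mut out = Vec::new();
    let mut rest = src;
    loop {
        rest = rest.trim_start();
        if rest.is_empty() {
            break;
        }
        if let Some(stripped) = rest.strip_prefix('@') {
            // Decorator: '@' followed by an identifier.
            let end = stripped
                .find(|c: char| !c.is_alphanumeric() && c != '_')
                .unwrap_or(stripped.len());
            out.push(Token::Decorator(&rest[..end + 1]));
            rest = &rest[end + 1..];
        } else if rest.starts_with(|c: char| c.is_alphabetic()) {
            let end = rest
                .find(|c: char| !c.is_alphanumeric() && c != '_')
                .unwrap_or(rest.len());
            out.push(Token::Ident(&rest[..end]));
            rest = &rest[end..];
        } else {
            // Unrecognized character: emit an error token rather than dropping it.
            let end = rest.char_indices().nth(1).map(|(i, _)| i).unwrap_or(rest.len());
            out.push(Token::Error(&rest[..end]));
            rest = &rest[end..];
        }
    }
    out
}

fn main() {
    let toks = lex("@pure add #");
    assert_eq!(
        toks,
        vec![Token::Decorator("@pure"), Token::Ident("add"), Token::Error("#")]
    );
}
```

With logos specifically, the analogous move is to stop discarding the error variant and carry it through to the token stream; the sketch only demonstrates the principle.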
## 3. Documentation corpus
- Verified snippets: pulled from `examples/golden/` via `{{#include}}` (see the Golden Examples book page and documentation governance).
- `vox mens pipeline` may ingest `docs/src` into mix-side JSONL; the default production mix may remain code-heavy (see Mens native training § documentation corpus lane).
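The include-path constraint from section 1 can be sketched as a scan of a docs page for `{{#include …}}` directives, checking that each path points under `examples/golden/`. This is an assumption-laden, std-only sketch, not the real `examples_ssot` test:

```rust
// Sketch only: extract {{#include <path>}} targets from an mdBook page and
// check they point under examples/golden/. The real enforcement lives in
// `cargo test -p vox-compiler --test examples_ssot`.
fn include_paths(page: &str) -> Vec<&str> {
    let marker = "{{#include ";
    let mut paths = Vec::new();
    let mut rest = page;
    while let Some(start) = rest.find(marker) {
        let after = &rest[start + marker.len()..];
        match after.find("}}") {
            Some(end) => {
                // mdBook includes may carry an anchor suffix (path:anchor);
                // keep only the path part.
                let raw = after[..end].trim();
                paths.push(raw.split(':').next().unwrap_or(raw));
                rest = &after[end..];
            }
            None => break, // unterminated directive; stop scanning
        }
    }
    paths
}

fn all_under_golden(page: &str) -> bool {
    include_paths(page)
        .iter()
        .all(|p| p.starts_with("../examples/golden/") || p.starts_with("examples/golden/"))
}

fn main() {
    assert!(all_under_golden("Intro\n{{#include ../examples/golden/hello.vox}}\n"));
    assert!(!all_under_golden("{{#include ../examples/parser-inventory/bad.vox}}"));
}
```

The relative-path prefixes here are hypothetical; actual pages resolve includes relative to their own location under `docs/src`.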
## 4. Mens training path (model input)
- Golden / codegen pairs: `vox_corpus` walks `examples/golden/**/*.vox` (and other configured roots) to build instruction–response rows.
- Mix + validate: `mens/config/mix.yaml`, `vox mens corpus validate`, etc. (see Native ML pipeline and Mens native training).
- QLoRA default: `vox mens train` uses the Hugging Face tokenizer for the chosen base model, not `VoxTokenizer` and not the compile lexer. Lab `VoxTokenizer` in `vox-tensor` is a small Burn/dogfood path only.
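The golden-to-row step can be pictured with a small sketch. The `CorpusRow` shape and the instruction template below are hypothetical, not `vox_corpus`'s actual schema:

```rust
// Hypothetical shape of one training row; field names are illustrative,
// not the real vox_corpus output schema.
struct CorpusRow {
    instruction: String,
    response: String,
}

/// Build one instruction-response pair from a golden example.
/// Assumption (for illustration only): the instruction is derived from the
/// file name and the response is the .vox source itself.
fn row_from_golden(path: &str, source: &str) -> CorpusRow {
    let name = path.rsplit('/').next().unwrap_or(path);
    CorpusRow {
        instruction: format!("Write a Vox program equivalent to {name}"),
        response: source.to_string(),
    }
}

fn main() {
    let row = row_from_golden("examples/golden/hello.vox", "fn main() { print(\"hi\") }");
    assert!(row.instruction.contains("hello.vox"));
}
```

In the real pipeline such rows are then serialized to JSONL and blended per `mens/config/mix.yaml` before validation.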
## 5. Gap checklist (goldens vs journeys)
Use this when adding files under `examples/golden/`:
| Journey / capability | Golden coverage (Apr 2026) | Suggested follow-up |
|---|---|---|
| Script / CLI `vox run` | `mesh/noop.vox`, `hello.vox`, `std_http_wrappers.vox` | Optional: dedicated `golden/script_args.vox` if CLI argv story grows |
| Reactive UI | `reactive_counter.vox`, `dashboard_ui.vox`, `web_routing_fullstack.vox` | Expand when `layout_groups` grammar lands (see backlog docs) |
| Data + HTTP API | `crud_api.vox`, `blog_fullstack.vox` | — |
| Actors / workflows / MCP | `counter_actor.vox`, `checkout_workflow.vox`, `mcp_tools.vox` | — |
| `@scheduled` decorator | `scheduled_tick.vox` | `WebIrModule.scheduled_jobs` carries name + interval from HIR |
| `@pure` / `@require` / `@deprecated` | `ref_effects.vox` (regions wired in mdBook API pages) | HTTP Result / Error mapping: `http_error_mapping.vox` |
| Error / Result patterns | `http_error_mapping.vox`, `type_system.vox` (partial) | — |
## 6. Related links
- Language surface SSOT
- Populi data pipeline (mesh / control-plane vs training data)
- Mens training data contract
- Vox corpus lab (research 2026) — Tier B mass corpus, batch lanes, eval harness sketch
- Mens vision and multimodal inputs (research 2026)
- Mens Qwen family migration (research 2026)