Compiler Architecture
The Vox compiler is organized as a modular pipeline of conceptual stages. The current implementation consolidates the lexer, parser, AST, HIR, type checker, and emitters under crates/vox-compiler/src/, with each stage represented by explicit modules. This document keeps the conceptual stage boundaries even though the implementation modules live in one crate.
Pipeline Overview
Source Code (.vox)
│
▼
┌────────────────┐
│ Lexer │ Tokenization (logos)
└──────┬─────────┘
│ Vec<Token>
▼
┌────────────────┐
│ Parser │ Recursive descent parser → AST Module
└──────┬─────────┘
│ Module (AST root)
▼
┌────────────────┐
│ AST │ Strongly-typed AST wrappers
└──────┬─────────┘
│ Module (Decl, Expr, Stmt, Pattern)
▼
┌────────────────┐
│ HIR │ Desugaring + name resolution + dead code detection
└──────┬─────────┘
│ HirModule
▼
┌────────────────┐
│ Typeck │ Bidirectional type checking + HM inference
└──────┬─────────┘
│ Typed HIR + Vec<Diagnostic>
▼
┌────────────────┐
│ Web IR │ HIR→WebIR lower + validate
└──────┬─────────┘
│ WebIrModule
▼
┌────────────────┐
│ App Contract │ HIR→AppContract (HTTP/RPC/islands/server config)
└──────┬─────────┘
│ AppContractModule
▼
┌────────────────┐
│ Runtime Proj │ HIR→RuntimeProjection (DB/task capability hints)
└──────┬─────────┘
│ RuntimeProjectionModule
▼
┌──────────────────┬─────────────────────┐
│ vox-codegen-rust │ vox-codegen-ts │
│ (quote! → .rs) │ (string → .ts/tsx) │
└──────────────────┴─────────────────────┘
Current path note:
- codegen_ts is still the production TS emitter path.
- VOX_WEBIR_VALIDATE defaults on (WebIR lower/validate gate); set it to 0/false/no/off to skip.
- app_contract::project_app_contract is the SSOT for route/RPC/island/server-config codegen inputs.
- runtime_projection::project_runtime_from_hir is the SSOT for orchestration-facing DB capability projection.
- VOX_WEBIR_EMIT_REACTIVE_VIEWS defaults on so reactive view: can use the Web IR TSX bridge when parity checks pass; set it to 0/false/no/off for legacy emit_hir_expr views only.
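The default-on behavior described for VOX_WEBIR_VALIDATE and VOX_WEBIR_EMIT_REACTIVE_VIEWS can be sketched as a small helper. This is a minimal illustration, not the compiler's actual code: the function names are hypothetical, and the trimming and case-insensitive matching are assumptions beyond the four documented opt-out values.

```rust
/// Interprets a default-on flag value (hypothetical helper, not the
/// compiler's real API). `None` means the variable is unset.
fn flag_default_on(raw: Option<&str>) -> bool {
    match raw {
        // Unset: the gate stays enabled.
        None => true,
        Some(value) => {
            // Assumption: surrounding whitespace and letter case are ignored.
            let v = value.trim().to_ascii_lowercase();
            // Only the documented opt-out spellings disable the gate.
            !matches!(v.as_str(), "0" | "false" | "no" | "off")
        }
    }
}

/// Reads a default-on flag from the process environment.
fn env_flag_default_on(name: &str) -> bool {
    flag_default_on(std::env::var(name).ok().as_deref())
}
```

Under these assumptions, env_flag_default_on("VOX_WEBIR_VALIDATE") returns true unless the variable is set to one of the four opt-out values.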
ML Training Pipeline
Vox has a native ML training loop powered by Burn (a pure-Rust deep learning framework):
docs/src/*.md + examples/*.vox
│
▼
vox mens corpus extract # produces validated.jsonl
│
▼
vox mens corpus pairs # produces train.jsonl (instruction-response pairs)
│
▼
vox mens train # native Burn / HF path (default CLI features)
│
▼
mens/runs/v1/model_final.bin
The training loop is defined in crates/vox-cli/src/training/native.rs.
Stage Details
1. Lexer (vox-compiler::lexer)
Purpose: Converts source text into a flat stream of tokens.
Implementation: Uses the logos crate for high-performance, zero-copy tokenization.
Output: Vec<Token> — each token carries its kind and span.
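To make the output shape concrete, here is a rough sketch of tokens that carry a kind and a span. It is a hand-written stand-in rather than the logos-generated lexer, and the TokenKind variants, the Span layout, and the lex and text helpers are all hypothetical; the real token set lives in the lexer module.

```rust
/// Byte range of a token in the source text (assumed layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical token kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Ident,
    Number,
    Punct,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token {
    kind: TokenKind,
    span: Span,
}

/// Hand-written stand-in for the generated lexer: skips ASCII whitespace
/// and produces a flat Vec<Token>.
fn lex(src: &str) -> Vec<Token> {
    let bytes = src.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let start = i;
        let kind = if b.is_ascii_alphabetic() || b == b'_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            TokenKind::Ident
        } else if b.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            TokenKind::Number
        } else {
            // Any other character becomes a single punctuation token;
            // advance by its full UTF-8 length to stay on a char boundary.
            i += src[start..].chars().next().map_or(1, char::len_utf8);
            TokenKind::Punct
        };
        tokens.push(Token { kind, span: Span { start, end: i } });
    }
    tokens
}

/// Zero-copy access: the token text is a slice of the original source.
fn text<'a>(src: &'a str, token: &Token) -> &'a str {
    &src[token.span.start..token.span.end]
}
```

Because each token stores only a span, later stages can recover the text from the source buffer without allocating, which is the zero-copy property mentioned above.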
2. Parser (vox-compiler::parser)
Purpose: Transforms a token stream into an AST module.
Implementation: A hand-written recursive descent parser producing ast::decl::Module. The parser is resilient to errors, meaning it continues parsing after encountering invalid syntax — this is critical for LSP support, where the user is actively typing.
Key features:
- Error recovery with synchronization points
- Trailing comma support in parameter lists
- Duplicate parameter name detection
- Indentation-aware formatting (indent.rs)
See crates/vox-compiler/src/parser/descent/mod.rs for the implementation entrypoint.
Output: Module (AST root) with source spans on declarations and expressions.
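Three of the features listed above (recovery at synchronization points, trailing commas, and duplicate parameter detection) can be illustrated on a parameter list. The sketch below is an assumption-laden toy, not the code in parser/descent: tokens are plain strings, the comma is taken as the only synchronization point, and ParamList, parse_params, and is_ident are hypothetical names.

```rust
use std::collections::HashSet;

/// Result of parsing a parameter list: names plus collected diagnostics.
#[derive(Debug)]
struct ParamList {
    names: Vec<String>,
    errors: Vec<String>,
}

/// Returns true for ASCII identifiers such as `x` or `_tmp1`.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Parses the tokens between `(` and `)`. Errors are recorded and parsing
/// continues, so one bad parameter does not hide the rest.
fn parse_params(tokens: &[&str]) -> ParamList {
    let mut names = Vec::new();
    let mut errors = Vec::new();
    let mut seen = HashSet::new();
    let mut i = 0;
    while i < tokens.len() {
        let tok = tokens[i];
        if is_ident(tok) {
            // HashSet::insert returns false when the name was already seen.
            if seen.insert(tok) {
                names.push(tok.to_string());
            } else {
                errors.push(format!("duplicate parameter `{}`", tok));
            }
            i += 1;
        } else {
            errors.push(format!("expected parameter name, found `{}`", tok));
            // Recovery: skip ahead to the next comma (synchronization point).
            while i < tokens.len() && tokens[i] != "," {
                i += 1;
            }
        }
        // Accept a separating comma or a trailing one before the end.
        if i < tokens.len() {
            if tokens[i] == "," {
                i += 1;
            } else {
                errors.push(format!("expected `,`, found `{}`", tokens[i]));
            }
        }
    }
    ParamList { names, errors }
}
```

Collecting diagnostics in a list instead of returning at the first error is what lets an editor keep showing results for the valid parts of a file while the user is still typing.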
3. AST (vox-compiler::ast)
Purpose: Strongly-typed wrappers around the untyped CST nodes.
See crates/vox-compiler/src/ast/ for the node hierarchy.
6. Code Generation
Rust Codegen (vox-compiler::codegen_rust)
Emits Rust source using the quote! macro. Each decorator maps to specific Rust constructs:
| Vox | Generated Rust |
|---|---|
| @server fn | Axum handler + route registration |
| @table type | Struct + SQLite schema |
| @test fn | #[test] function |
| @deprecated | #[deprecated] attribute |
| actor | Tokio task + mpsc mailbox |
| workflow | Plain async function today; interpreted runtime provides partial durable step recording |
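The "task + mpsc mailbox" shape used for actors can be sketched with the standard library alone. This is an assumption-heavy illustration: an OS thread and std::sync::mpsc stand in for the Tokio task and Tokio channel so the example has no external dependencies, and CounterMsg, CounterHandle, and spawn_counter are hypothetical names, not generated output.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type for a counter actor.
enum CounterMsg {
    Add(i64),
    /// Request the current total; the reply travels on its own channel.
    Get(mpsc::Sender<i64>),
}

/// Handle held by callers; it only exposes the mailbox sender.
struct CounterHandle {
    mailbox: mpsc::Sender<CounterMsg>,
}

impl CounterHandle {
    fn add(&self, n: i64) {
        // A send error only means the actor has already stopped.
        let _ = self.mailbox.send(CounterMsg::Add(n));
    }

    fn get(&self) -> Option<i64> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.mailbox.send(CounterMsg::Get(reply_tx)).ok()?;
        reply_rx.recv().ok()
    }
}

/// Spawns the actor loop: a single owner of the state, fed by a mailbox.
/// A std thread stands in for the Tokio task here.
fn spawn_counter() -> CounterHandle {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut total = 0i64;
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                CounterMsg::Add(n) => total += n,
                CounterMsg::Get(reply) => {
                    let _ = reply.send(total);
                }
            }
        }
    });
    CounterHandle { mailbox: tx }
}
```

Since only the spawned loop touches total, no lock is needed; callers interact with the state purely by sending messages, which are processed in the order they were sent from a given handle.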
TypeScript Codegen (vox-compiler::codegen_ts)
Emits TypeScript/TSX in modular files:
| Module | Output |
|---|---|
| jsx.rs | React JSX components |
| component.rs | Component declarations and hooks |
| activity.rs | Activity/workflow client wrappers |
| emitter.rs | TanStack Router trees, optional server fns, islands metadata |
| adt.rs | TypeScript discriminated union types |
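As an illustration of string-based emission for the last row, the sketch below renders an algebraic data type as a TypeScript discriminated union. It is not the code in adt.rs: the Variant struct, the emit_ts_union function, and the choice of "tag" as the discriminant field are assumptions made for the example.

```rust
/// One variant of a hypothetical algebraic data type.
struct Variant<'a> {
    name: &'a str,
    /// (field name, TypeScript type) pairs.
    fields: &'a [(&'a str, &'a str)],
}

/// Emits a TypeScript discriminated union as a string.
/// Assumes at least one variant and a `tag` discriminant field.
fn emit_ts_union(type_name: &str, variants: &[Variant]) -> String {
    let mut out = format!("export type {} =", type_name);
    for variant in variants {
        // Each variant becomes an object type keyed by its tag literal.
        out.push_str(&format!("\n  | {{ tag: \"{}\"", variant.name));
        for (field, ty) in variant.fields {
            out.push_str(&format!("; {}: {}", field, ty));
        }
        out.push_str(" }");
    }
    out.push(';');
    out
}
```

For a Shape type with a Circle variant holding radius: number and a field-less Unit variant, this produces a union whose members TypeScript can narrow by checking the tag field.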
Normative strategy for reducing frontend emitter complexity while preserving React interop: ADR 012 — Internal web IR strategy. Detailed implementation sequencing and weighted task quotas: Internal Web IR implementation blueprint. Ordered file-by-file execution map: WebIR operations catalog. Canonical current-vs-target representation mapping: Internal Web IR side-by-side schema. Quantified K-complexity delta for the canonical worked app: WebIR K-complexity quantification. Reproducible per-token-class computation: WebIR K-metric appendix.
Supporting Crates
| Crate | Purpose |
|---|---|
| vox-cli | vox command-line entry point; see ref-cli.md for the implemented subcommand set |
| vox-lsp | Language Server Protocol implementation |
| vox-runtime | Tokio/Axum runtime: actors, scheduler, subscriptions, storage |
| vox-pm | Package manager: CAS store, dependency resolution, caching |
| vox-db | Database abstraction layer |
| vox-ludus | Gamification system |
| vox-orchestrator | Multi-agent orchestration |
| vox-toestub | AI anti-pattern detector |
| vox-tensor | Native ML tensors via Burn 0.19 (Wgpu/NdArray backends) |
| vox-eval | Automated evaluation of training data quality |
| vox-doc-pipeline | Rust-native doc extraction + SUMMARY.md generation |
| vox-integration-tests | End-to-end pipeline tests |
Adding a Language Feature
The full checklist for adding a new language construct:
- Lexer: add tokens to crates/vox-compiler/src/lexer/token.rs
- Parser: add grammar rules in crates/vox-compiler/src/parser/descent/
- AST: add node types in crates/vox-compiler/src/ast/
- HIR: map AST → HIR in crates/vox-compiler/src/hir/lower/
- Type Check: add inference rules in crates/vox-compiler/src/typeck/
- WebIR: add/update lowering + validation semantics in crates/vox-compiler/src/web_ir/ when the feature affects web-facing behavior
- Codegen: emit code in both crates/vox-compiler/src/codegen_rust/ and crates/vox-compiler/src/codegen_ts/
- Test: add integration coverage in vox-integration-tests/tests/ and WebIR/parity coverage where applicable
- Docs: add frontmatter + code example in docs/src/
- Training: run vox mens corpus extract to include the new construct in ML data
Next Steps
- Language Reference — Full syntax and feature reference
- Actors & Workflows — Workflow durability and actor persistence
- Ecosystem & Tooling — CLI commands, package manager, LSP
- Web IR operations catalog — numbered compiler/emitter tasks OP-0001–OP-0320 + supplemental OP-S049–OP-S220 batch map
- Web IR acceptance gates G1–G6 — parser, K-metric, parity, and rollout thresholds