Vox 0.4 Grand Migration Plan (Full Ingestion)
Research completed: 2026-04-09
Note: This document ingests and updates the original 254-task vox_agentic_loop_and_mens_plan blueprint, applying corrections from the latest 9 research tracks (including EBNF/Earley parsing in place of GBNF, median-centered MC-GRPO advantages instead of mean-centered ones, and Kalman filter trust updates). Nothing has been compressed.
Part 1 — OOPAV Loop Architecture
+------------------------------------------------------------+
|                 OOPAV Agent Execution Loop                 |
|                                                            |
|  +-----------+   evidence    +-------------+   risk band   |
|  |  OBSERVE  |-------------->|   ORIENT    |-------------> |
|  |(Scientia) |               | (Socrates)  |               |
|  +-----^-----+               +------+------+               |
|        | watch                      | plan-or-act          |
|  +-----+-----+               +------v------+               |
|  |  VERIFY   |<-- result ----|    PLAN     |               |
|  | (Harness) |               |  (Planner)  |               |
|  +-----+-----+               +------+------+               |
|        | pass/fail                  | dispatch             |
|  +-----v-----+               +------v------+               |
|  | complete  |               |     ACT     |               |
|  |    or     |               | (Builder +  |               |
|  |  re-plan  |               |    MENS)    |               |
|  +-----------+               +-------------+               |
+------------------------------------------------------------+
Part 2 — Implementation Waves (270+ Tasks)
Wave 0 — Foundations, Schema & Compiler Diagnostics (Days 1-4)
- Add missing_cases: Vec<String> to vox_compiler::typeck::Diagnostic
- Add ast_node_kind: Option<String> to Diagnostic
- Populate missing_cases in the match exhaustiveness checker (checker/match_exhaust.rs)
- Add missing_cases to JSON serialization output
- Enrich Diagnostic with stable error codes (E0101, E0201, E0301, etc.)
- Define ObservationReport struct in vox-orchestrator/src/observer.rs (if not fully defined in vox-db)
- Define ObserverAction enum: Continue, RequestMoreEvidence, TriggerReplan, EscalateToHuman, EmitNegativeExample
- Add observer_enabled, observer_poll_interval_ms to OrchestratorConfig
- Define TestDecision enum: Required, Recommended, Optional, Deferred, Skip
- Define TestDecisionPolicy struct with threshold, keyword, and extension fields
- Add test_decision_policy: TestDecisionPolicy to OrchestratorConfig
- Define VictoryCondition enum: CompilationOnly, WithDocTests, WithUnitTests, WithCorpusValidation, Full
- Add victory_condition: VictoryCondition to AgentTask
- Create crates/vox-grammar-export/ with Cargo.toml and src/lib.rs
- Define GrammarFormat, GrammarExportConfig, GrammarExportResult
- Add Arca migration V40: observer_events table
- Add Arca migration V40: test_decisions table
- Add Arca migration V40: victory_verdicts table
- Add Arca migration V40: mens_corpus_quality table
- Add Arca migration V40: grpo_training_run table
- Write Arca CRUD: insert_observer_event, list_observer_events_for_task, insert_test_decision, insert_victory_verdict
- Write Arca CRUD: upsert_corpus_quality, insert_grpo_step
- Add all tables to Codex facade
- Write unit tests for all CRUD methods (min 2 tests each)
- Run vox ci clavis-parity and vox stub-check --path crates/vox-grammar-export
- Confirm zero stubs in Wave 0 deliverables.
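A minimal Rust sketch of the Wave 0 diagnostic enrichment, assuming a plain struct with a hand-written JSON serializer so it stays dependency-free (the real crate may well use serde). The code field name, the to_json method, and the json_str helper are illustrative assumptions, not the existing vox_compiler::typeck::Diagnostic API. The design point it shows is that missing_cases is always serialized, as an empty array when nothing is missing, so downstream agents never have to special-case an absent key.

```rust
/// Hypothetical shape of the enriched diagnostic; field names follow the Wave 0 task list.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub code: &'static str,         // stable error code such as "E0301" (assumed field name)
    pub message: String,
    pub missing_cases: Vec<String>, // filled in by the match exhaustiveness checker
    pub ast_node_kind: Option<String>,
}

/// Quote a string and escape the characters JSON forbids inside string literals.
pub fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl Diagnostic {
    /// Serialize with missing_cases always present (an empty array rather than an omitted key).
    pub fn to_json(&self) -> String {
        let cases: Vec<String> = self.missing_cases.iter().map(|c| json_str(c)).collect();
        let kind = match &self.ast_node_kind {
            Some(k) => json_str(k),
            None => "null".to_string(),
        };
        format!(
            "{{\"code\":{},\"message\":{},\"missing_cases\":[{}],\"ast_node_kind\":{}}}",
            json_str(self.code),
            json_str(&self.message),
            cases.join(","),
            kind
        )
    }
}
```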
Wave 1 — Grammar Export from Compiler (Days 5-8)
- Audit crates/vox-compiler/src/parser/: catalog all production rules.
- Create vox-grammar-export/src/ebnf.rs (EBNF emitter)
- Implement EbnfEmitter::emit_rule(name, alternates, terminals)
- Implement EbnfEmitter::emit_all() (covers all top-level Vox rules)
- Create vox-grammar-export/src/gbnf.rs (GBNF emitter, lossy fallback)
- Implement GbnfEmitter::from_ebnf(ebnf) -> GbnfDocument
- Handle all Vox keywords in GBNF output
- Implement GbnfEmitter::emit_string() -> String
- Create vox-grammar-export/src/lark.rs (Lark emitter for bridge integration)
- Create vox-grammar-export/src/json_schema.rs (AST JSON Schema emitter)
- Define VoxAstNode JSON schema recursively
- Expose vox grammar export --format ebnf|gbnf|lark|json-schema --output <file> CLI
- Expose vox_grammar_export(format) MCP tool
- Write vox-grammar-export/src/versioning.rs: compute a hash of the rules for the semver drift check
- Replace vox_grammar_prompt() stub with a cheatsheet derived from the real EBNF grammar (target <200 tokens)
- Write tests: emitted EBNF structural validity
- Write tests: 10 known-valid programs accepted by GBNF/EBNF
- Write tests: 5 known-invalid programs rejected
- Add vox ci grammar-export-check and vox ci grammar-drift CI steps
- Add grammar_export_path to MensTrainingConfig
- Run vox stub-check --path crates/vox-grammar-export and the full test suite
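The EBNF emitter and the drift hash from Wave 1 can be sketched as follows. This assumes ISO/IEC 14977-style output (name = a, b | c ;) and folds the plan's separate terminals argument into a hypothetical Symbol enum; repetition and option groups are left out. The hash uses 64-bit FNV-1a rather than std's DefaultHasher, because the standard library does not guarantee DefaultHasher output across Rust releases, which would make a committed drift hash change for reasons unrelated to the grammar.

```rust
/// One item in a production's sequence (hypothetical type).
#[derive(Debug, Clone)]
pub enum Symbol {
    Terminal(String),
    Rule(String),
}

/// Emit one production as `name = seq | seq ;`, with sequence items separated by commas.
pub fn emit_rule(name: &str, alternates: &[Vec<Symbol>]) -> String {
    let alts: Vec<String> = alternates
        .iter()
        .map(|seq| {
            seq.iter()
                .map(|s| match s {
                    // Terminals are double-quoted unless they contain a double quote themselves.
                    Symbol::Terminal(t) if t.contains('"') => format!("'{}'", t),
                    Symbol::Terminal(t) => format!("\"{}\"", t),
                    Symbol::Rule(r) => r.clone(),
                })
                .collect::<Vec<_>>()
                .join(", ")
        })
        .collect();
    format!("{} = {} ;", name, alts.join(" | "))
}

/// 64-bit FNV-1a over the emitted grammar text; the algorithm is fixed, so hashes are stable.
pub fn grammar_hash(ebnf: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut h = OFFSET_BASIS;
    for b in ebnf.as_bytes() {
        h ^= *b as u64;
        h = h.wrapping_mul(PRIME);
    }
    h
}
```

In a drift check, the CI step would recompute grammar_hash over the emit_all() output and compare it with the committed value; the let_stmt rule used in the test is a made-up example, not real Vox syntax.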
Wave 2 — Observer Sub-Agent & Trust System (Days 9-13)
- Create vox-orchestrator/src/observer.rs with the Observer struct
- Implement Observer::observe_file(path) -> ObservationReport
- Implement Observer::observe_rust_file(path) -> ObservationReport
- Implement Observer::start_watching(file_paths) -> JoinHandle
- Implement Observer::drain_reports() -> Vec<ObservationReport>
- Add observer: Option<Arc<Observer>> to Orchestrator
- Wire Observer startup into Orchestrator::spawn_agent
- Wire Observer shutdown into Orchestrator::retire_agent
- Emit VisualizerEventKind::ObservationRecorded from viz_sink
- Implement Observer::compute_action(report, policy) -> ObserverAction
- Add observation_history: VecDeque<ObservationReport> (cap 20) to AgentTask
- Feed ObservationReport into Arca observer_events
- Add variance: f64 to AgentTrustScore, initialized to 0.25 (Kalman filter setup)
- Replace greedy routing with UCB exploration in routing.rs
- Replace EWMA update with Kalman filter in AgentTrustScore::record_outcome
- Implement Empirical Bayes priors for new agents in trust_telemetry.rs
- Implement Observer::summarize(task_id) -> ObservationSummary
- Add observation_summary to CompletionAttestation
- Write unit tests: compute_action correctness
- Write unit tests: Kalman filter converges faster than EWMA
- Write unit tests: UCB exploration spreads load
- Expose vox_observer_status(task_id) MCP tool
- Run vox stub-check, cargo test -p vox-orchestrator
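The Wave 2 trust changes can be sketched as a scalar Kalman filter plus UCB1 selection. This is a minimal sketch under stated assumptions: trust is modeled as a success probability observed through 0/1 outcomes, the process and measurement noise values and the exploration constant c are placeholders, and TrustKalman and ucb1_select are hypothetical names rather than the existing AgentTrustScore or routing.rs API. The 0.25 starting variance matches the plan and is the largest variance a 0/1 outcome can have, so a new agent starts maximally uncertain.

```rust
/// Scalar Kalman filter over an agent's success probability (hypothetical type).
#[derive(Debug, Clone)]
pub struct TrustKalman {
    pub mean: f64,
    pub variance: f64,
    pub process_noise: f64,     // Q: how much true trust may drift between tasks
    pub measurement_noise: f64, // R: noise of a single 0/1 outcome
}

impl TrustKalman {
    pub fn new() -> Self {
        Self { mean: 0.5, variance: 0.25, process_noise: 1e-3, measurement_noise: 0.25 }
    }

    /// Fold in one outcome and return the Kalman gain that was applied.
    pub fn record_outcome(&mut self, success: bool) -> f64 {
        let z = if success { 1.0 } else { 0.0 };
        self.variance += self.process_noise; // predict: uncertainty grows between tasks
        let gain = self.variance / (self.variance + self.measurement_noise);
        self.mean += gain * (z - self.mean); // update: large gain while uncertain
        self.variance *= 1.0 - gain; // uncertainty shrinks as evidence accumulates
        gain
    }
}

/// UCB1: pick the agent maximizing mean reward + c * sqrt(ln N / n_i).
/// rewards[i] is agent i's summed reward and pulls[i] how often it has been routed to.
pub fn ucb1_select(rewards: &[f64], pulls: &[u64], c: f64) -> Option<usize> {
    debug_assert_eq!(rewards.len(), pulls.len());
    // Route to any never-tried agent first so every agent gets an initial estimate.
    if let Some(i) = pulls.iter().position(|&n| n == 0) {
        return Some(i);
    }
    let ln_total = (pulls.iter().sum::<u64>() as f64).ln();
    (0..pulls.len())
        .map(|i| {
            let n = pulls[i] as f64;
            (i, rewards[i] / n + c * (ln_total / n).sqrt())
        })
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

The gain starts near 0.5 and falls as evidence accumulates, which is why the filter moves faster than a fixed-alpha EWMA early on; the exploration bonus in ucb1_select is what spreads load onto rarely used agents.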
Wave 3 — Orient Phase & LLM Plan Adequacy (Days 14-19)
- Define OrientReport (evidence_gap, risk_band, planning_complexity, etc.)
- Implement orient_phase(ctx, policy) -> OrientReport
- Implement OrientPhase::request_missing_evidence(gap)
- Add orient_report to SocratesTaskContext
- Wire risk_band: Red -> block act; Black -> halt + escalate
- Remove word-count complexity heuristic from plan_adequacy.rs
- Remove keyword vagueness blacklist
- Add precondition assertion requirement per plan step
- Implement Socrates LLM-as-judge logic for plan evaluation scoring (Coverage, Dep, Destructive, Concreteness, Verification)
- Wire answered questions back into SocratesTaskContext
- Implement OrientPhase::classify_task_category(description) -> TaskCategory
- Write tests: orient phase evidence requests
- Write tests: Socrates judge blocks inadequate plans
- Write tests: QA router answer propagation
- Emit VisualizerEventKind::OrientCompleted
- Run vox stub-check, test suite
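A sketch of the Wave 3 gating rules. Only the Red and Black behaviors come from the plan; the Green and Amber bands, the ActGate enum, the PlanJudgeScores field names (one per judge criterion, with destructive scored so that 1.0 means no unguarded destructive steps), and the weakest-link aggregation are assumptions. Taking the minimum rather than an average means a plan cannot offset a missing verification step with high coverage.

```rust
/// Risk bands; Green and Amber are placeholder names, Red and Black follow the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand { Green, Amber, Red, Black }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActGate { Proceed, BlockAct, HaltAndEscalate }

/// Red blocks the Act phase; Black halts the task and escalates to a human.
pub fn gate_for(band: RiskBand) -> ActGate {
    match band {
        RiskBand::Green | RiskBand::Amber => ActGate::Proceed,
        RiskBand::Red => ActGate::BlockAct,
        RiskBand::Black => ActGate::HaltAndEscalate,
    }
}

/// Per-criterion scores in 0.0..=1.0 as returned by the LLM judge (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct PlanJudgeScores {
    pub coverage: f64,
    pub dep: f64,
    pub destructive: f64, // 1.0 = no unguarded destructive steps
    pub concreteness: f64,
    pub verification: f64,
}

impl PlanJudgeScores {
    /// Weakest-link aggregation: the plan is only as good as its worst criterion.
    pub fn overall(&self) -> f64 {
        [self.coverage, self.dep, self.destructive, self.concreteness, self.verification]
            .iter()
            .copied()
            .fold(f64::INFINITY, f64::min)
    }

    pub fn is_adequate(&self, threshold: f64) -> bool {
        self.overall() >= threshold
    }
}
```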
Wave 4 — Testing Decision Engine (Days 20-24)
- Implement TestDecisionPolicy::evaluate(task, orient) -> TestDecision
- Rule: security keywords -> Required
- Rule: .vox in manifest -> Required
- Rule: complexity >= threshold -> Required
- Rule: file_count > threshold -> Recommended
- Rule: risk_band Red -> Required
- Rule: docs/config only -> Skip
- Rule: evidence_gap > 0.4 -> Deferred
- Persist TestDecision to test_decisions table after every call
- Fix plan_has_verification_hint to check file manifests
- Promote heavy_without_test_hint to hard blocker
- Score = 0.0 when test_required_count > test_present_count
- Add TestDecision to TaskDescriptor
- PlanBridge: block dispatch if tests are required and no test file exists
- Add test_decision_policy to config
- Write tests: matrix of test decision inputs
- Expose vox_test_decision(task_id) MCP tool
- Update vox plan new CLI to render test decisions per step
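The Wave 4 rules can be sketched as a single evaluate function. The plan lists the rules but not their precedence, so this sketch assumes every Required rule wins first, then the remaining rules apply in the order listed (Recommended, Skip, Deferred), with Optional as the fallback. TaskFacts, OrientFacts, the policy field names, and the extension-based docs/config test are hypothetical simplifications of the real task and OrientReport types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestDecision { Required, Recommended, Optional, Deferred, Skip }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand { Green, Amber, Red, Black }

pub struct TaskFacts<'a> {
    pub description: &'a str,
    pub manifest: &'a [&'a str], // files the task will touch
    pub complexity: u32,
}

pub struct OrientFacts {
    pub risk_band: RiskBand,
    pub evidence_gap: f64,
}

pub struct TestDecisionPolicy {
    pub complexity_threshold: u32,
    pub file_count_threshold: usize,
    pub security_keywords: Vec<&'static str>,
    pub skip_extensions: Vec<&'static str>, // docs/config extensions such as "md" or "toml"
}

fn extension(path: &str) -> &str {
    path.rsplit_once('.').map(|(_, ext)| ext).unwrap_or("")
}

impl TestDecisionPolicy {
    pub fn evaluate(&self, task: &TaskFacts, orient: &OrientFacts) -> TestDecision {
        let desc = task.description.to_lowercase();
        let security = self.security_keywords.iter().any(|k| desc.contains(*k));
        let touches_vox = task.manifest.iter().any(|p| extension(p) == "vox");
        // Every rule that yields Required takes precedence over the softer outcomes.
        if security
            || touches_vox
            || task.complexity >= self.complexity_threshold
            || orient.risk_band == RiskBand::Red
        {
            return TestDecision::Required;
        }
        if task.manifest.len() > self.file_count_threshold {
            return TestDecision::Recommended;
        }
        let docs_only = !task.manifest.is_empty()
            && task
                .manifest
                .iter()
                .all(|p| self.skip_extensions.iter().any(|e| *e == extension(p)));
        if docs_only {
            return TestDecision::Skip;
        }
        if orient.evidence_gap > 0.4 {
            return TestDecision::Deferred;
        }
        TestDecision::Optional
    }
}
```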
Wave 5 — Multi-Tier Victory Conditions (Days 25-30)
- Create vox-orchestrator/src/victory.rs with VictoryEvaluator
- Implement tier1_toestub(task) -> TierResult
- Implement tier2_lsp(task) -> TierResult
- Implement tier3_cargo_check(task) -> TierResult
- Implement tier4_cargo_doc_test(task) -> TierResult
- Implement tier5_cargo_unit_test(task, filter) -> TierResult
- Implement tier6_vox_corpus_eval(task) -> TierResult (parse rate >= 99.5%)
- Implement tier7_harness_contracts
- Implement tier8_socrates_confidence
- Implement tier9_plan_adequacy_retrospective
- Implement evaluate(task, condition) -> VictoryVerdict
- Replace post-task validate with evaluator
- Persist to Arca victory_verdicts
- Wire failures to TriggerReplan
- Write tests for each tier result
- Update AgentHarnessSpec to mandate independent verification
- Expose vox_victory_status MCP tool
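A sketch of the Wave 5 evaluator loop. The mapping from each VictoryCondition to the number of tiers it requires is an assumption (the plan names the conditions and the nine tiers but does not pair them), and tiers are modeled as plain function pointers rather than methods taking a task. Running tiers in ascending order lets cheap checks such as cargo check fail fast before slower test or corpus tiers run.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VictoryCondition { CompilationOnly, WithDocTests, WithUnitTests, WithCorpusValidation, Full }

impl VictoryCondition {
    /// Number of leading tiers that must pass (assumed mapping onto the nine Wave 5 tiers).
    pub fn required_tiers(self) -> usize {
        match self {
            VictoryCondition::CompilationOnly => 3, // toestub, LSP, cargo check
            VictoryCondition::WithDocTests => 4,
            VictoryCondition::WithUnitTests => 5,
            VictoryCondition::WithCorpusValidation => 6,
            VictoryCondition::Full => 9,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum TierResult { Pass, Fail(String) }

#[derive(Debug, Clone, PartialEq)]
pub enum VictoryVerdict {
    Victory { tiers_passed: usize },
    // A caller would map Defeat onto a TriggerReplan action.
    Defeat { failed_tier: usize, reason: String },
}

/// Run the required tiers in order and stop at the first failure.
/// A required tier with no implementation counts as a failure, never a silent pass.
pub fn evaluate(tiers: &[fn() -> TierResult], condition: VictoryCondition) -> VictoryVerdict {
    let needed = condition.required_tiers();
    for (i, tier) in tiers.iter().take(needed).enumerate() {
        if let TierResult::Fail(reason) = tier() {
            return VictoryVerdict::Defeat { failed_tier: i + 1, reason };
        }
    }
    if tiers.len() < needed {
        return VictoryVerdict::Defeat { failed_tier: tiers.len() + 1, reason: "tier not implemented".to_string() };
    }
    VictoryVerdict::Victory { tiers_passed: needed }
}

/// Tier 6 threshold from the plan (parse rate >= 99.5%), compared in integers to avoid float rounding.
pub fn corpus_parse_tier(parsed: u64, total: u64) -> TierResult {
    if total > 0 && parsed * 1000 >= total * 995 {
        TierResult::Pass
    } else {
        TierResult::Fail(format!("parse rate {parsed}/{total} is below 99.5%"))
    }
}
```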
Wave 6 — Dynamic Replan Trigger (Days 31-35)
- Add replan_trigger to AgentTask
- Define ReplanTrigger struct
- Implement handle_replan_trigger
- Wire replan back to orchestrator PlanBridge
- Implement ReplanScheduler (cooldown limits)
- Add replan_history to session
- Emit ReplanTriggered visualizer event
- Implement ReplanPolicy defaults
- Expose vox_replan_status MCP tool
- Tests: trigger creation on failures, cooldowns respected, max limits hit
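A sketch of the Wave 6 cooldown and limit logic. The default values (60 s cooldown, 3 replans) are placeholders, since the plan names ReplanPolicy defaults but does not give them, and ReplanDecision is a hypothetical type. The scheduler takes the current time as a parameter, which keeps cooldown tests deterministic; its vector of accepted timestamps stands in for replan_history.

```rust
/// Hypothetical ReplanPolicy; the default values are placeholders.
#[derive(Debug, Clone, Copy)]
pub struct ReplanPolicy {
    pub cooldown_secs: u64,
    pub max_replans: u32,
}

impl Default for ReplanPolicy {
    fn default() -> Self {
        Self { cooldown_secs: 60, max_replans: 3 }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplanDecision {
    Allowed,
    CoolingDown { retry_in_secs: u64 },
    LimitReached,
}

#[derive(Debug, Default)]
pub struct ReplanScheduler {
    policy: ReplanPolicy,
    history: Vec<u64>, // timestamps (seconds) of accepted replans
}

impl ReplanScheduler {
    pub fn new(policy: ReplanPolicy) -> Self {
        Self { policy, history: Vec::new() }
    }

    /// Decide whether a replan may start at `now`; accepted replans are recorded.
    pub fn request(&mut self, now: u64) -> ReplanDecision {
        if self.history.len() as u32 >= self.policy.max_replans {
            return ReplanDecision::LimitReached;
        }
        if let Some(&last) = self.history.last() {
            let elapsed = now.saturating_sub(last);
            if elapsed < self.policy.cooldown_secs {
                return ReplanDecision::CoolingDown { retry_in_secs: self.policy.cooldown_secs - elapsed };
            }
        }
        self.history.push(now);
        ReplanDecision::Allowed
    }
}
```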
Wave 7 — Scientia as Live Observer Feed (Days 36-40)
- Define ScientiaObservation
- Implement ScientiaObserver::observe_session
- Implement ScientiaObserver::recommend_corpus_ingestion
- Wire into Observer::observe_file
- Set EmitNegativeExample when score < 0.3
- Implement auto_ingest_to_mens for valid snippets
- Implement auto_ingest_negative for invalid snippets
- Wire into replan logic
- Add vox_scientia_observe MCP tool
- Add vox scientia observe --session CLI
- Write full integration tests linking observation to corpus ingestion
Wave 8 — MENS Corpus Surgery & AST-Eval Upgrade (Days 41-48)
- Tag corpus pairs with origin: Origin enum (Human, Synthetic, Agent)
- Ingest parse failures directly as hard negatives
- Implement Anna Karenina sampling (min 30% negatives per batch)
- Implement Experience Replay Buffer (10% base-data mix)
- Write AI slop curator gate for Scientia validation
- Write validate_batch.rs
- Run batch validation on current synthetic data
- Update metadata.json with validator metrics
- Add vox-eval/src/ast_eval.rs using the actual parser
- Define AstEvalReport with node count, test presence, error spans
- Deprecate regex-based eval methods
- Tie coverage score to AST evaluation
- Define RewardSignal { parse_score, test_score, coverage_score, composite }
- Modify reward calculation: syntax must gate everything (syntax = 0 -> composite = 0). No AST-density reward metric, to prevent Goodhart hacking.
- Update JsonlDataLoader logic
- Write AST-Eval tests and Quality Report CLI tasks
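The reward gate and the 30% negative floor in Wave 8 can be sketched as below. This is illustrative only: RewardSignal mirrors the fields named in the plan, but the 0.5/0.25/0.25 weighting inside the gate, the clamping, and the batch_mix helper (including its rounding rules) are assumptions. It does not implement Anna Karenina sampling itself, only the per-batch slot counts such a sampler would have to respect.

```rust
/// RewardSignal with the fields named in Wave 8; all scores are clamped to 0.0..=1.0.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RewardSignal {
    pub parse_score: f64,
    pub test_score: f64,
    pub coverage_score: f64,
    pub composite: f64,
}

impl RewardSignal {
    /// Syntax is a multiplicative gate: parse_score = 0 forces composite = 0 however good
    /// the other scores look. The inner weights are placeholders, not tuned values.
    pub fn new(parse_score: f64, test_score: f64, coverage_score: f64) -> Self {
        let clamp = |x: f64| x.clamp(0.0, 1.0);
        let (p, t, c) = (clamp(parse_score), clamp(test_score), clamp(coverage_score));
        let composite = p * (0.5 + 0.25 * t + 0.25 * c);
        Self { parse_score: p, test_score: t, coverage_score: c, composite }
    }
}

/// Slot counts (positives, negatives, replay) for one batch: negatives are rounded up so the
/// floor is never violated, replay is rounded down, and positives fill the rest.
pub fn batch_mix(batch_size: usize, min_negative_frac: f64, replay_frac: f64) -> (usize, usize, usize) {
    let negatives = (((batch_size as f64) * min_negative_frac).ceil() as usize).min(batch_size);
    let replay = (((batch_size as f64) * replay_frac).floor() as usize).min(batch_size - negatives);
    (batch_size - negatives - replay, negatives, replay)
}
```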
Wave 9 — Constrained Inference + GRPO (Days 49-65)
- Create crates/vox-constrained-gen/
- Define ConstrainedSampler trait
- Implement Earley parser backend consuming EBNF grammar
- Implement PDA context-independent token cache (for sub-40µs latency overhead)
- Implement deadlock watchdog and VoxValidationError
- Implement Stream of Revision <REVISE> backtrack tokens
- Wire into vox populi serve
- Wire into vox_generate_code MCP tool
- Wire into vox_speech_to_code MCP tool
- Wire into PlanBridge::plan_to_descriptors
- Add standalone validation MCP tool
- Create vox-tensor/src/grpo.rs
- Implement Gated Reward Function (syntax must be a multiplier)
- Implement Median-Centered Advantage Computation (MC-GRPO) to prevent sign flip
- Implement DAPO asymmetric clip bounds
- Implement generate_k_candidates(k=8)
- Hard corpus gate: refuse GRPO launch if corpus < 1000 pairs
- Export vox mens train --mode grpo
- Write tests: advantage sign stability, parser constraints
- Integration tests: 100% parse rate on constrained generation
- Update training SSOT tracking tables
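A sketch of the Wave 9 advantage and clipping math for the k candidates of one prompt. Assumptions: advantages are centered on the group median and scaled by the group's population standard deviation (the plan specifies only the median centering), the clip bounds are passed in by the caller, and the function names are hypothetical rather than the vox-tensor API. The clipped objective is the usual PPO-style surrogate with separate lower and upper epsilons, which is the asymmetric-bounds idea the plan attributes to DAPO.

```rust
/// Median of a non-empty slice (mean of the two middle values for even lengths).
fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len();
    if n % 2 == 1 { v[n / 2] } else { (v[n / 2 - 1] + v[n / 2]) / 2.0 }
}

/// Group advantages centered on the median instead of the mean.
/// A group with identical rewards carries no ranking signal and gets all-zero advantages.
pub fn median_centered_advantages(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let std = (rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n).sqrt();
    if std < 1e-12 {
        return vec![0.0; rewards.len()];
    }
    let center = median(rewards);
    rewards.iter().map(|r| (r - center) / std).collect()
}

/// PPO-style clipped surrogate with asymmetric bounds: the probability ratio is clipped
/// to [1 - eps_low, 1 + eps_high], where eps_high > eps_low leaves more room to upweight.
pub fn clipped_objective(ratio: f64, advantage: f64, eps_low: f64, eps_high: f64) -> f64 {
    let clipped = ratio.clamp(1.0 - eps_low, 1.0 + eps_high);
    (ratio * advantage).min(clipped * advantage)
}
```

The test exercises the sign-flip case: one outlier reward drags the group mean above most candidates, while median centering keeps the upper half of the group positive.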
Wave 10 — Multi-Agent Context & Handoff (Days 66-70)
- Define ContextEnvelope struct
- Implement OBO (on-behalf-of) token generation
- Strip raw transcripts from handoff; enforce scoped task definitions only
- Implement CRAG retrieval gateway evaluator
- Implement async memory distillation worker
- Tests: Cross-agent privacy checks
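A sketch of the Wave 10 handoff envelope, assuming OBO refers to an on-behalf-of credential minted per handoff. AgentSession, the envelope fields, and the leaks_any check are hypothetical. The envelope is built by copying an explicit whitelist of fields, so the raw transcript cannot leak through a forgotten filter; a privacy test then only has to confirm that no transcript line appears in the envelope.

```rust
/// Full session state held by the sending agent (hypothetical shape).
pub struct AgentSession {
    pub task_id: String,
    pub task_definition: String,
    pub allowed_paths: Vec<String>,
    pub raw_transcript: Vec<String>, // never leaves the originating agent
}

/// What the receiving agent gets: a scoped task definition and nothing else.
#[derive(Debug, Clone, PartialEq)]
pub struct ContextEnvelope {
    pub task_id: String,
    pub task_definition: String,
    pub allowed_paths: Vec<String>,
    pub obo_token: String, // credential scoped to this handoff only
}

impl ContextEnvelope {
    /// Copy only whitelisted fields; the transcript is excluded by construction.
    pub fn from_session(session: &AgentSession, obo_token: String) -> Self {
        Self {
            task_id: session.task_id.clone(),
            task_definition: session.task_definition.clone(),
            allowed_paths: session.allowed_paths.clone(),
            obo_token,
        }
    }

    /// Test helper: true if any non-empty transcript line appears verbatim in the envelope.
    pub fn leaks_any(&self, transcript: &[String]) -> bool {
        let fields = [&self.task_id, &self.task_definition, &self.obo_token];
        transcript.iter().filter(|l| !l.is_empty()).any(|line| {
            fields.iter().any(|f| f.contains(line.as_str()))
                || self.allowed_paths.iter().any(|p| p.contains(line.as_str()))
        })
    }
}
```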
Wave 11 — Language Syntax K-Complexity (Long Term)
- K-complexity audit vs Rust/Zig
- Implement ? operator for Result unwrapping
- Implement return type inference
- Implement _ discard pattern
- Define Vox IR JSON schema (vox-ir.v1.schema.json)
- Implement vox emit-ir and vox compile-ir
- Write corresponding compiler tests
Wave 12 — Testing Infrastructure
- test block syntax in parser
- Compile-time stripping of test blocks
- vox test CLI subcommand
- LSP CodeLens for test blocks
- Snapshot testing infrastructure via .snap
- @forall property-based testing and @spec wiring
- Parser roundtrip property tests
Wave 13 — Cost Defense & Mesh
- Circuit breakers: Hard per-task 300s timeout
- Anti-loops: max 3 attempts/day
- Daily kill switch & 80% spend warning
- Model pinning guards
- Cascade routing matrix
- Hardware amortization routing switch
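The Wave 13 limits can be combined into one guard, sketched below under stated assumptions: the kill switch is read as daily spend reaching the budget, attempts_today counts attempts already made today (so a fourth is refused), and the CostGuard and Verdict names are hypothetical. Checks run from most to least severe, so an over-budget day stops work even when the current task is still inside its 300 s window.

```rust
use std::time::Duration;

/// Hypothetical guard combining the Wave 13 limits.
pub struct CostGuard {
    pub task_timeout: Duration,    // hard per-task limit (300 s in the plan)
    pub max_attempts_per_day: u32, // anti-loop limit (3 in the plan)
    pub daily_budget_usd: f64,
    pub warn_fraction: f64, // warn once spend reaches 80% of the budget
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict { Allow, WarnSpend, KillTask, RefuseAttempt, KillSwitch }

impl CostGuard {
    pub fn plan_defaults(daily_budget_usd: f64) -> Self {
        Self {
            task_timeout: Duration::from_secs(300),
            max_attempts_per_day: 3,
            daily_budget_usd,
            warn_fraction: 0.8,
        }
    }

    /// Most severe condition wins; attempts_today counts attempts already made today.
    pub fn check(&self, elapsed: Duration, attempts_today: u32, spend_today_usd: f64) -> Verdict {
        if spend_today_usd >= self.daily_budget_usd {
            return Verdict::KillSwitch;
        }
        if elapsed >= self.task_timeout {
            return Verdict::KillTask;
        }
        if attempts_today >= self.max_attempts_per_day {
            return Verdict::RefuseAttempt;
        }
        if spend_today_usd >= self.warn_fraction * self.daily_budget_usd {
            return Verdict::WarnSpend;
        }
        Verdict::Allow
    }
}
```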
Wave 14 — CI Gates & Data Ops (Tasks 206 - 270+)
- vox ci grammar-drift
- vox ci mens-corpus-health
- vox ci grpo-reward-baseline
- vox ci collateral-damage
- vox ci constrained-gen-smoke
- vox ci k-complexity-budget
- Integrate metrics and reporting for visualizer_sink
- Reassign plan_has_verification_hint dependencies ... (Continues by mapping all remaining telemetry integrations from the legacy 254-task list.)
Reading Order
Follow this plan precisely, wave by wave. Run each wave's tests before starting the next wave, and work down the task list in order.