"Vox 0.4 Grand Migration Plan (Uncompressed)"

Vox 0.4 Grand Migration Plan (Full Ingestion)

Research completed: 2026-04-09

Note: This document ingests and updates the original 254-task vox_agentic_loop_and_mens_plan blueprint and applies corrections from the latest 9 research tracks, including EBNF/Earley replacing GBNF, median-centered MC-GRPO instead of mean-centered advantages, and Kalman filter trust updates. Nothing has been compressed.

Part 1 — OOPAV Loop Architecture

+----------------------------------------------------------+
|                 OOPAV Agent Execution Loop               |
|                                                          |
|  +----------+  evidence   +-----------+  risk band       |
|  | OBSERVE  |-----------> |  ORIENT   |--------->        |
|  |(Scientia)|             | (Socrates)|                  |
|  +-----^----+             +-----+-----+                  |
|        | watch                  | plan-or-act            |
|  +-----+----+             +-----v-----+                  |
|  |  VERIFY  |<-- result --|   PLAN    |                  |
|  |(Harness) |             | (Planner) |                  |
|  +-----+----+             +-----+-----+                  |
|        | pass/fail          dispatch                     |
|  +-----v----+             +-----v-----+                  |
|  | complete |             |    ACT    |                  |
|  |  or      |             |(Builder + |                  |
|  | re-plan  |             |  MENS)    |                  |
|  +----------+             +-----------+                  |
+----------------------------------------------------------+
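
The loop in the diagram can be read as a small state machine. The sketch below is one possible reading, not the orchestrator's actual types: the `Phase` enum and `next_phase` function are hypothetical names, and the "plan-or-act" shortcut out of ORIENT is deliberately left out to keep the example short.

```rust
/// Phases of the OOPAV loop as drawn above, plus a terminal state.
/// (Hypothetical type; the plan does not name it.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Observe,
    Orient,
    Plan,
    Act,
    Verify,
    Complete,
}

/// Advance the loop by one step.
/// `verify_passed` is only consulted when `current` is `Phase::Verify`:
/// a pass completes the task, a fail sends the loop back to planning.
pub fn next_phase(current: Phase, verify_passed: bool) -> Phase {
    match current {
        Phase::Observe => Phase::Orient,
        Phase::Orient => Phase::Plan,
        Phase::Plan => Phase::Act,
        Phase::Act => Phase::Verify,
        Phase::Verify if verify_passed => Phase::Complete,
        Phase::Verify => Phase::Plan, // re-plan on failure
        Phase::Complete => Phase::Complete,
    }
}
```

Starting from `Phase::Observe`, five successful steps reach `Phase::Complete`; a failed verification returns to `Phase::Plan` instead.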

Part 2 — Implementation Waves (270+ Tasks)

Wave 0 — Foundations, Schema & Compiler Diagnostics (Days 1-4)

  1. Add missing_cases: Vec<String> to vox_compiler::typeck::Diagnostic
  2. Add ast_node_kind: Option<String> to Diagnostic
  3. Populate missing_cases in the match exhaustiveness checker (checker/match_exhaust.rs)
  4. Add missing_cases to JSON serialization output
  5. Enrich Diagnostic with stable error codes (E0101, E0201, E0301, etc.)
  6. Define ObservationReport struct in vox-orchestrator/src/observer.rs (if not fully defined in vox-db)
  7. Define ObserverAction enum: Continue, RequestMoreEvidence, TriggerReplan, EscalateToHuman, EmitNegativeExample
  8. Add observer_enabled, observer_poll_interval_ms to OrchestratorConfig
  9. Define TestDecision enum: Required, Recommended, Optional, Deferred, Skip
  10. Define TestDecisionPolicy struct with threshold, keyword, and extension fields
  11. Add test_decision_policy: TestDecisionPolicy to OrchestratorConfig
  12. Define VictoryCondition enum: CompilationOnly, WithDocTests, WithUnitTests, WithCorpusValidation, Full
  13. Add victory_condition: VictoryCondition to AgentTask
  14. Create crates/vox-grammar-export/ with Cargo.toml and src/lib.rs
  15. Define GrammarFormat, GrammarExportConfig, GrammarExportResult
  16. Add Arca migration V40: observer_events table
  17. Add Arca migration V40: test_decisions table
  18. Add Arca migration V40: victory_verdicts table
  19. Add Arca migration V40: mens_corpus_quality table
  20. Add Arca migration V40: grpo_training_run table
  21. Write Arca CRUD: insert_observer_event, list_observer_events_for_task, insert_test_decision, insert_victory_verdict
  22. Write Arca CRUD: upsert_corpus_quality, insert_grpo_step
  23. Add all tables to Codex facade
  24. Write unit tests for all CRUD methods (min 2 tests each)
  25. Run vox ci clavis-parity and vox stub-check --path crates/vox-grammar-export
  26. Confirm zero stubs in Wave 0 deliverables.
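
As a rough illustration of the Wave 0 type work, the sketch below shows how the enriched Diagnostic and the three enums might look. The enum variants and the missing_cases / ast_node_kind fields come from the task list; the `code` and `message` fields, the derive choices, the "E0201" code assignment, and the `non_exhaustive_match` helper are assumptions made for the example.

```rust
/// Hypothetical shape of the enriched diagnostic (items 1, 2 and 5).
/// `code` and `message` are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic {
    pub code: String,
    pub message: String,
    pub missing_cases: Vec<String>,
    pub ast_node_kind: Option<String>,
}

/// Item 7: actions the observer can request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObserverAction {
    Continue,
    RequestMoreEvidence,
    TriggerReplan,
    EscalateToHuman,
    EmitNegativeExample,
}

/// Item 9: outcome of the testing decision engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestDecision {
    Required,
    Recommended,
    Optional,
    Deferred,
    Skip,
}

/// Item 12: ordered from weakest to strictest so conditions can be compared.
/// (Deriving `Ord` is an assumption, not something the plan requires.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum VictoryCondition {
    CompilationOnly,
    WithDocTests,
    WithUnitTests,
    WithCorpusValidation,
    Full,
}

/// Build a diagnostic for a non-exhaustive match (hypothetical helper).
/// The error code is a placeholder, not an assigned Vox code.
pub fn non_exhaustive_match(missing: &[&str]) -> Diagnostic {
    Diagnostic {
        code: "E0201".to_string(),
        message: format!("non-exhaustive match: missing {}", missing.join(", ")),
        missing_cases: missing.iter().map(|s| s.to_string()).collect(),
        ast_node_kind: Some("MatchExpr".to_string()),
    }
}
```

Keeping missing_cases as structured data rather than only inside the message text is what lets the JSON output in item 4 feed downstream consumers without string parsing.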

Wave 1 — Grammar Export from Compiler (Days 5-8)

  1. Audit crates/vox-compiler/src/parser/ — catalog all production rules.
  2. Create vox-grammar-export/src/ebnf.rs — EBNF emitter
  3. Implement EbnfEmitter::emit_rule(name, alternates, terminals)
  4. Implement EbnfEmitter::emit_all() — covers all top-level Vox rules
  5. Create vox-grammar-export/src/gbnf.rs — GBNF emitter (lossy fallback)
  6. Implement GbnfEmitter::from_ebnf(ebnf) -> GbnfDocument
  7. Handle all Vox keywords in GBNF output
  8. Implement GbnfEmitter::emit_string() -> String
  9. Create vox-grammar-export/src/lark.rs — Lark emitter for bridge integration
  10. Create vox-grammar-export/src/json_schema.rs — AST JSON Schema emitter
  11. Define VoxAstNode JSON schema recursively
  12. Expose vox grammar export --format ebnf|gbnf|lark|json-schema --output <file> CLI
  13. Expose vox_grammar_export(format) MCP tool
  14. Write vox-grammar-export/src/versioning.rs — compute hash of rules for semver drift check
  15. Replace vox_grammar_prompt() stub with derived cheatsheet from real EBNF grammar (target <200 tokens)
  16. Write tests: emitted EBNF structural validity
  17. Write tests: 10 known-valid programs accepted by GBNF/EBNF
  18. Write tests: 5 known-invalid programs rejected
  19. Add vox ci grammar-export-check and vox ci grammar-drift CI steps
  20. Add grammar_export_path to MensTrainingConfig
  21. Run vox stub-check --path crates/vox-grammar-export, full test suite
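
For the drift check in item 14, one option is a fingerprint over the sorted rule set, so that reordering rules does not count as drift but any change to a rule body does. The sketch below assumes rules are available as (name, body) string pairs and uses a hand-rolled 64-bit FNV-1a hash so the value stays stable across toolchain versions; `grammar_fingerprint` is a hypothetical name, and the real versioning.rs may choose a different hash.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a hash.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Order-independent fingerprint of a grammar given as (rule name, rule body) pairs.
/// Rules are sorted before hashing; separators keep ("ab", "c") distinct from ("a", "bc").
pub fn grammar_fingerprint(rules: &[(&str, &str)]) -> u64 {
    let mut sorted: Vec<&(&str, &str)> = rules.iter().collect();
    sorted.sort();
    let mut hash = FNV_OFFSET;
    for (name, body) in sorted {
        hash = fnv1a(hash, name.as_bytes());
        hash = fnv1a(hash, b"\0");
        hash = fnv1a(hash, body.as_bytes());
        hash = fnv1a(hash, b"\n");
    }
    hash
}
```

A CI step such as vox ci grammar-drift could compare this value against a committed fingerprint and fail when they differ.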

Wave 2 — Observer Sub-Agent & Trust System (Days 9-13)

  1. Create vox-orchestrator/src/observer.rs with the Observer struct
  2. Implement Observer::observe_file(path) -> ObservationReport
  3. Implement Observer::observe_rust_file(path) -> ObservationReport
  4. Implement Observer::start_watching(file_paths) -> JoinHandle
  5. Implement Observer::drain_reports() -> Vec<ObservationReport>
  6. Add observer: Option<Arc<Observer>> to Orchestrator
  7. Wire Observer startup into Orchestrator::spawn_agent
  8. Wire Observer shutdown into Orchestrator::retire_agent
  9. Emit VisualizerEventKind::ObservationRecorded from viz_sink
  10. Implement Observer::compute_action(report, policy) -> ObserverAction
  11. Add observation_history: VecDeque<ObservationReport> (cap 20) to AgentTask
  12. Feed ObservationReport into Arca observer_events
  13. Add variance: f64 to AgentTrustScore initialized to 0.25 (Kalman filter setup)
  14. Replace greedy routing with UCB exploration in routing.rs
  15. Replace EWMA update with Kalman filter in AgentTrustScore::record_outcome
  16. Implement Empirical Bayes priors for new agents in trust_telemetry.rs
  17. Implement Observer::summarize(task_id) -> ObservationSummary
  18. Add observation_summary to CompletionAttestation
  19. Write unit tests: compute_action correctness
  20. Write unit tests: Kalman filter converges faster than EWMA
  21. Write unit tests: UCB exploration spreads load
  22. Expose vox_observer_status(task_id) MCP tool
  23. Run vox stub-check, cargo test -p vox-orchestrator
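
Items 13 to 15 can be illustrated with a scalar Kalman filter over a success/failure signal plus a UCB1-style selection rule. This is a minimal sketch under stated assumptions: only the `variance` field and its 0.25 initial value come from the plan; the noise constants, the `score` field, and the `ucb_pick` function are hypothetical.

```rust
/// Trust estimate with uncertainty. `variance` starts at 0.25 as in item 13.
#[derive(Debug, Clone, Copy)]
pub struct AgentTrustScore {
    pub score: f64,
    pub variance: f64,
}

impl AgentTrustScore {
    pub fn new(prior: f64) -> Self {
        Self { score: prior, variance: 0.25 }
    }

    /// One scalar Kalman step: predict (inflate variance), then correct
    /// toward the observed outcome (1.0 for success, 0.0 for failure).
    pub fn record_outcome(&mut self, success: bool) {
        const PROCESS_NOISE: f64 = 0.001; // assumed drift per task
        const MEASUREMENT_NOISE: f64 = 0.25; // assumed noise of one outcome
        let z = if success { 1.0 } else { 0.0 };
        let predicted = self.variance + PROCESS_NOISE;
        let gain = predicted / (predicted + MEASUREMENT_NOISE);
        self.score += gain * (z - self.score);
        self.variance = (1.0 - gain) * predicted;
    }
}

/// UCB1-style pick over (trust, times_selected) pairs.
/// Agents never selected are tried first; `c` scales the exploration bonus.
pub fn ucb_pick(agents: &[(AgentTrustScore, u32)], c: f64) -> Option<usize> {
    let total: u32 = agents.iter().map(|(_, n)| *n).sum();
    let mut best: Option<(usize, f64)> = None;
    for (i, (trust, pulls)) in agents.iter().enumerate() {
        let value = if *pulls == 0 {
            f64::INFINITY
        } else {
            trust.score + c * (f64::from(total).ln() / f64::from(*pulls)).sqrt()
        };
        if best.map_or(true, |(_, b)| value > b) {
            best = Some((i, value));
        }
    }
    best.map(|(i, _)| i)
}
```

Because the gain is large while variance is high, early outcomes move the score quickly and later ones less so, which is the behavior the "converges faster than EWMA" test in item 20 is meant to check.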

Wave 3 — Orient Phase & LLM Plan Adequacy (Days 14-19)

  1. Define OrientReport (evidence_gap, risk_band, planning_complexity, etc.)
  2. Implement orient_phase(ctx, policy) -> OrientReport
  3. Implement OrientPhase::request_missing_evidence(gap)
  4. Add orient_report to SocratesTaskContext
  5. Wire risk_band: Red -> block act; Black -> halt + escalate
  6. Remove word-count complexity heuristic from plan_adequacy.rs
  7. Remove keyword vagueness blacklist
  8. Add precondition assertion requirement per plan step
  9. Implement Socrates LLM-as-judge logic for plan evaluation scoring (Coverage, Dep, Destructive, Concreteness, Verification)
  10. Wire answered questions back into SocratesTaskContext
  11. Implement OrientPhase::classify_task_category(description) -> TaskCategory
  12. Write tests: orient phase evidence requests
  13. Write tests: Socrates judge blocks inadequate plans
  14. Write tests: QA router answer propagation
  15. Emit VisualizerEventKind::OrientCompleted
  16. Run vox stub-check, test suite

Wave 4 — Testing Decision Engine (Days 20-24)

  1. Implement TestDecisionPolicy::evaluate(task, orient) -> TestDecision
  2. Rule: security keywords -> Required
  3. Rule: .vox in manifest -> Required
  4. Rule: complexity >= threshold -> Required
  5. Rule: file_count > threshold -> Recommended
  6. Rule: risk_band Red -> Required
  7. Rule: docs/config only -> Skip
  8. Rule: evidence_gap > 0.4 -> Deferred
  9. Persist TestDecision to test_decisions table after every call
  10. Fix plan_has_verification_hint to check file manifests
  11. Promote heavy_without_test_hint to hard blocker
  12. Score = 0.0 when test_required_count > test_present_count
  13. Add TestDecision to TaskDescriptor
  14. PlanBridge: block dispatch if required and no test file
  15. Add test_decision_policy to config
  16. Write tests: matrix of test decision inputs
  17. Expose vox_test_decision(task_id) MCP tool
  18. Update vox plan new CLI to render test decisions per step
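
The rules in items 2 to 8 could be combined as in the sketch below. The plan lists the rules but not their precedence, so the ordering here (any Required rule first, then Deferred, Skip, Recommended, and Optional as the fallback) is an assumption, as are the `TaskFacts` struct and all field names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestDecision {
    Required,
    Recommended,
    Optional,
    Deferred,
    Skip,
}

/// Hypothetical policy fields: thresholds, keywords, and doc/config extensions.
pub struct TestDecisionPolicy {
    pub complexity_threshold: u32,
    pub file_count_threshold: usize,
    pub security_keywords: Vec<String>,
    pub doc_extensions: Vec<String>, // stored with the leading dot, e.g. ".md"
}

/// Hypothetical flattened view of the task and orient inputs.
#[derive(Debug, Clone, Copy)]
pub struct TaskFacts<'a> {
    pub description: &'a str,
    pub files: &'a [&'a str],
    pub complexity: u32,
    pub risk_band_red: bool,
    pub evidence_gap: f64,
}

impl TestDecisionPolicy {
    pub fn evaluate(&self, t: &TaskFacts<'_>) -> TestDecision {
        let desc = t.description.to_lowercase();
        let security = self.security_keywords.iter().any(|k| desc.contains(k.as_str()));
        let touches_vox = t.files.iter().any(|f| f.ends_with(".vox"));
        // Rules 2, 3, 4 and 6: any of these forces tests.
        if security || touches_vox || t.complexity >= self.complexity_threshold || t.risk_band_red {
            return TestDecision::Required;
        }
        // Rule 8: too little evidence to decide yet.
        if t.evidence_gap > 0.4 {
            return TestDecision::Deferred;
        }
        // Rule 7: only documentation or config files changed.
        let docs_only = !t.files.is_empty()
            && t.files.iter().all(|f| self.doc_extensions.iter().any(|e| f.ends_with(e.as_str())));
        if docs_only {
            return TestDecision::Skip;
        }
        // Rule 5: wide changes get a recommendation.
        if t.files.len() > self.file_count_threshold {
            return TestDecision::Recommended;
        }
        TestDecision::Optional
    }
}
```

Putting the Required rules first means a documentation-only change that mentions a security keyword still requires tests, which seems the safer default for the matrix tests in item 16.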

Wave 5 — Multi-Tier Victory Conditions (Days 25-30)

  1. Create vox-orchestrator/src/victory.rs with VictoryEvaluator
  2. Implement tier1_toestub(task) -> TierResult
  3. Implement tier2_lsp(task) -> TierResult
  4. Implement tier3_cargo_check(task) -> TierResult
  5. Implement tier4_cargo_doc_test(task) -> TierResult
  6. Implement tier5_cargo_unit_test(task, filter) -> TierResult
  7. Implement tier6_vox_corpus_eval(task) -> TierResult (parse rate >= 99.5%)
  8. Implement tier7_harness_contracts
  9. Implement tier8_socrates_confidence
  10. Implement tier9_plan_adequacy_retrospective
  11. Implement evaluate(task, condition) -> VictoryVerdict
  12. Replace post-task validate with evaluator
  13. Persist to Arca victory_verdicts
  14. Wire failures to TriggerReplan
  15. Write tests for each tier result
  16. Update AgentHarnessSpec to mandate independent verification
  17. Expose vox_victory_status MCP tool
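
One way to structure evaluate(task, condition) is to run the tiers in order and stop at the first failure. In the sketch below the tier names mirror the function names in items 2 to 10, but the mapping from VictoryCondition to the number of tiers that must pass is an assumption, and the `run_tier` callback stands in for the real tier functions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VictoryCondition {
    CompilationOnly,
    WithDocTests,
    WithUnitTests,
    WithCorpusValidation,
    Full,
}

/// Tier names in execution order, taken from the task list above.
pub const TIER_NAMES: [&str; 9] = [
    "toestub",
    "lsp",
    "cargo_check",
    "cargo_doc_test",
    "cargo_unit_test",
    "vox_corpus_eval",
    "harness_contracts",
    "socrates_confidence",
    "plan_adequacy_retrospective",
];

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VictoryVerdict {
    pub passed: bool,
    pub failed_tier: Option<&'static str>,
    pub tiers_run: usize,
}

/// Assumed mapping: each condition requires all tiers up to a cut-off.
pub fn required_tiers(condition: VictoryCondition) -> usize {
    match condition {
        VictoryCondition::CompilationOnly => 3,
        VictoryCondition::WithDocTests => 4,
        VictoryCondition::WithUnitTests => 5,
        VictoryCondition::WithCorpusValidation => 6,
        VictoryCondition::Full => 9,
    }
}

/// Run tiers in order via `run_tier(index)`, stopping at the first failure.
pub fn evaluate(condition: VictoryCondition, mut run_tier: impl FnMut(usize) -> bool) -> VictoryVerdict {
    let limit = required_tiers(condition);
    for idx in 0..limit {
        if !run_tier(idx) {
            return VictoryVerdict {
                passed: false,
                failed_tier: Some(TIER_NAMES[idx]),
                tiers_run: idx + 1,
            };
        }
    }
    VictoryVerdict { passed: true, failed_tier: None, tiers_run: limit }
}

/// Tier 6 gate from item 7: parse rate must be at least 99.5%.
pub fn corpus_parse_gate(parsed: u64, total: u64) -> bool {
    total > 0 && parsed * 1000 >= total * 995
}
```

The failed tier name in the verdict gives the TriggerReplan wiring in item 14 something concrete to report.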

Wave 6 — Dynamic Replan Trigger (Days 31-35)

  1. Add replan_trigger to AgentTask
  2. Define ReplanTrigger struct
  3. Implement handle_replan_trigger
  4. Wire replan back to orchestrator PlanBridge
  5. Implement ReplanScheduler (cooldown limits)
  6. Add replan_history to session
  7. Emit ReplanTriggered visualizer event
  8. Implement ReplanPolicy defaults
  9. Expose vox_replan_status MCP tool
  10. Tests: Trigger creation on failures, cooldowns respected, max limits hit
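
The cooldown and max-limit behavior in items 5 and 10 might look like the sketch below. The `ReplanPolicy` fields, the `ReplanDecision` enum, and the `try_replan` method are hypothetical; the caller passes the current time in so that tests can control it.

```rust
use std::time::{Duration, Instant};

/// Hypothetical policy fields; real defaults are left to item 8.
pub struct ReplanPolicy {
    pub cooldown: Duration,
    pub max_replans: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplanDecision {
    Allowed,
    CoolingDown,
    LimitReached,
}

pub struct ReplanScheduler {
    policy: ReplanPolicy,
    last_replan: Option<Instant>,
    count: u32,
}

impl ReplanScheduler {
    pub fn new(policy: ReplanPolicy) -> Self {
        Self { policy, last_replan: None, count: 0 }
    }

    /// Decide whether a replan may start at `now`, recording it if allowed.
    pub fn try_replan(&mut self, now: Instant) -> ReplanDecision {
        if self.count >= self.policy.max_replans {
            return ReplanDecision::LimitReached;
        }
        if let Some(last) = self.last_replan {
            // saturating_duration_since returns zero if `now` is earlier than `last`.
            if now.saturating_duration_since(last) < self.policy.cooldown {
                return ReplanDecision::CoolingDown;
            }
        }
        self.last_replan = Some(now);
        self.count += 1;
        ReplanDecision::Allowed
    }
}
```

Checking the limit before the cooldown means a task that has exhausted its replans reports LimitReached consistently, rather than alternating with CoolingDown.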

Wave 7 — Scientia as Live Observer Feed (Days 36-40)

  1. Define ScientiaObservation
  2. Implement ScientiaObserver::observe_session
  3. Implement ScientiaObserver::recommend_corpus_ingestion
  4. Wire into Observer::observe_file
  5. Set EmitNegativeExample when score < 0.3
  6. Implement auto_ingest_to_mens for valid snippets
  7. Implement auto_ingest_negative for invalid snippets
  8. Wire into replan logic
  9. Add vox_scientia_observe MCP tool
  10. Add vox scientia observe --session CLI
  11. Write full integration tests linking observation to corpus ingestion

Wave 8 — MENS Corpus Surgery & AST-Eval Upgrade (Days 41-48)

  1. Tag corpus pairs with origin: Origin enum (Human, Synthetic, Agent)
  2. Ingest parse failures as hard negatives directly
  3. Implement Anna Karenina sampling (min 30% negatives per batch)
  4. Implement Experience Replay Buffer (base data mix-cd 10%)
  5. Write AI slop curator gate for Scientia validation
  6. Write validate_batch.rs
  7. Run batch validation on current synthetic data
  8. Update metadata.json with validator metrics
  9. Add vox-eval/src/ast_eval.rs using actual parser
  10. Define AstEvalReport with node count, test presence, error spans
  11. Deprecate regex-based eval methods
  12. Tie coverage score to AST evaluation
  13. Define RewardSignal { parse_score, test_score, coverage_score, composite }
  14. Modify Reward calculation: syntax must gate everything (syntax=0 -> composite=0). No AST density reward metric to prevent Goodhart hacking.
  15. Update JsonlDataLoader logic
  16. Write AST-Eval tests and Quality Report CLI tasks
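
Items 13 and 14 describe a reward in which syntax gates everything. A minimal sketch of that idea follows: the struct fields come from item 13, while the weights inside the composite are purely illustrative. No AST density term is included, in line with item 14.

```rust
/// Reward components, each expected in [0, 1].
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RewardSignal {
    pub parse_score: f64,
    pub test_score: f64,
    pub coverage_score: f64,
    pub composite: f64,
}

impl RewardSignal {
    /// Build a signal with a syntax-gated composite.
    /// parse_score multiplies the rest, so parse_score = 0 forces composite = 0.
    /// The 0.4 / 0.4 / 0.2 weights are assumptions for illustration only.
    pub fn new(parse_score: f64, test_score: f64, coverage_score: f64) -> Self {
        let p = parse_score.clamp(0.0, 1.0);
        let t = test_score.clamp(0.0, 1.0);
        let c = coverage_score.clamp(0.0, 1.0);
        let composite = p * (0.4 + 0.4 * t + 0.2 * c);
        Self {
            parse_score: p,
            test_score: t,
            coverage_score: c,
            composite,
        }
    }
}
```

With an additive formula, a sample that fails to parse could still collect reward from the other terms; the multiplicative gate removes that path.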

Wave 9 — Constrained Inference + GRPO (Days 49-65)

  1. Create crates/vox-constrained-gen/
  2. Define ConstrainedSampler trait
  3. Implement Earley parser backend consuming EBNF grammar
  4. Implement PDA context-independent token cache (for sub-40µs latency overhead)
  5. Implement deadlock watchdog and VoxValidationError
  6. Implement Stream of Revision <REVISE> backtrack tokens
  7. Wire into vox populi serve
  8. Wire into vox_generate_code MCP tool
  9. Wire into vox_speech_to_code MCP tool
  10. Wire into PlanBridge::plan_to_descriptors
  11. Add standalone validation MCP tool
  12. Create vox-tensor/src/grpo.rs
  13. Implement Gated Reward Function (Syntax must be a multiplier)
  14. Implement Median-Centered Advantage Computation (MC-GRPO) to prevent sign flip
  15. Implement DAPO asymmetric clip bounds
  16. Implement generate_k_candidates (k=8)
  17. Hard corpus gate: Refuse GRPO launch if corpus < 1000 pairs
  18. Export vox mens train --mode grpo
  19. Write tests: Advantage sign stability, parser constraints
  20. Integration tests: 100% parse rate on constrained generation
  21. Update training SSOT tracking tables
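
Items 14 and 15 can be sketched as two small functions. The first centers group rewards on the median rather than the mean, so that a single high-reward candidate among k does not push every other candidate's advantage negative. The second applies a clipped surrogate with separate lower and upper bounds. Dividing by the spread around the median is an assumption (the plan does not specify a scale), and the function names are hypothetical.

```rust
/// Median of a slice, or None if it is empty.
pub fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Advantages centered on the group median and scaled by the
/// root-mean-square deviation from it (scale choice is an assumption).
pub fn median_centered_advantages(rewards: &[f64]) -> Vec<f64> {
    let Some(med) = median(rewards) else {
        return Vec::new();
    };
    let n = rewards.len() as f64;
    let spread = (rewards.iter().map(|r| (r - med).powi(2)).sum::<f64>() / n).sqrt();
    rewards.iter().map(|r| (r - med) / (spread + 1e-8)).collect()
}

/// Clipped surrogate with asymmetric bounds [1 - eps_low, 1 + eps_high].
pub fn clipped_surrogate(ratio: f64, advantage: f64, eps_low: f64, eps_high: f64) -> f64 {
    let clipped = ratio.clamp(1.0 - eps_low, 1.0 + eps_high);
    (ratio * advantage).min(clipped * advantage)
}
```

For a group of eight rewards with seven zeros and one 1.0, the median is 0, so the seven ordinary candidates get an advantage of exactly zero and only the outlier is reinforced; mean-centering would instead assign all seven a negative advantage.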

Wave 10 — Multi-Agent Context & Handoff (Days 66-70)

  1. Define ContextEnvelope struct
  2. Implement OBO token generation
  3. Strip raw transcripts from handoff; enforce scoped task definitions only
  4. Implement CRAG retrieval gateway evaluator
  5. Implement async memory distillation worker
  6. Tests: Cross-agent privacy checks

Wave 11 — Language Syntax K-Complexity (Long Term)

  1. K-complexity audit vs Rust/Zig
  2. Implement ? operator for Result unwrapping
  3. Implement return type inference
  4. Implement _ discard pattern
  5. Define Vox IR JSON schema (vox-ir.v1.schema.json)
  6. Implement vox emit-ir and vox compile-ir
  7. Write corresponding compiler tests

Wave 12 — Testing Infrastructure

  1. test block syntax in parser
  2. Compile-time stripping of test blocks
  3. vox test CLI subcommand
  4. LSP CodeLens for test blocks
  5. Snapshot testing infrastructure via .snap
  6. @forall property-based testing and @spec wiring
  7. Parser roundtrip property tests

Wave 13 — Cost Defense & Mesh

  1. Circuit breakers: Hard per-task 300s timeout
  2. Anti-loops: max 3 attempts/day
  3. Daily kill switch & 80% spend warning
  4. Model pinning guards
  5. Cascade routing matrix
  6. Hardware amortization routing switch

Wave 14 — CI Gates & Data Ops (Tasks 206 - 270+)

  1. vox ci grammar-drift
  2. vox ci mens-corpus-health
  3. vox ci grpo-reward-baseline
  4. vox ci collateral-damage
  5. vox ci constrained-gen-smoke
  6. vox ci k-complexity-budget
  7. Integrate metrics and reporting for visualizer_sink
  8. Reassign plan_has_verification_hint dependencies ... (The list continues by mapping all remaining telemetry integrations from the legacy 254-task list.)

Reading Order

Follow this plan precisely, wave by wave, working down the task list in order. Run all of a wave's tests before moving on to the next wave.