"Vox Agentic Loop Overhaul + MENS Syntax-Intelligence Blueprint"

Research completed: 2026-04-05

Two interlocked workstreams:

Agentic Loop — Observe → Orient → Plan → Act → Verify (OOPAV)

MENS Syntax Intelligence — Grammar-aware training, constrained inference, MCP pre-emit validation

Part 0 — Gap & Limitation Audit (20 Gaps)

| # | Gap | Evidence location |
|---|---|---|
| G-01 | No Observer role: nothing watches the environment between steps | `orchestrator/agent_lifecycle.rs`, `planning/mod.rs` |
| G-02 | Completeness declared too early: `cargo check` only, no `cargo test` or Vox parse-rate gate | `validation.rs:161-183` |
| G-03 | Testing decision hard-wired: `heavy_without_test_hint` is a soft penalty, never blocks | `plan_adequacy.rs:321` |
| G-04 | Plan complexity is a word-count heuristic: caps at 9, under-detects complex refactors | `plan_adequacy.rs:48-58` |
| G-05 | Socrates gate is post-hoc: scoring happens after the LLM commits, not before | `socrates.rs` |
| G-06 | `HarnessGate.independent_verification` always `false` | `harness.rs:244-250` |
| G-07 | `QARouter::answer()` discards the answer: `_answer: &str` is unused | `qa.rs:55` |
| G-08 | No autonomic replan trigger: only user-driven via `vox_replan` | `planning/replan.rs` |
| G-09 | Scaling ignores observer load / evidence quality | `orchestrator/scaling.rs` |
| G-10 | Scientia is a publication layer, not a live observation source | `vox-scientia-core/src/lib.rs` |
| G-11 | MENS corpus has only 340 pairs, 39 negatives | `mens/data/metadata.json` |
| G-12 | `vox_grammar_prompt()` is a 27-line hand-written stub | `compiler/src/llm_prompt.rs` |
| G-13 | `golden_validated.jsonl` is 60 bytes (empty) | `mens/data/golden_validated.jsonl` |
| G-14 | No grammar-constrained decoding at inference | `inference_and_serving.md` |
| G-15 | `vox-eval` uses regex, not the real parser | `vox_eval_crate.md` |
| G-16 | No GRPO/RLVR training loop: SFT only | `training_orchestration.md` |
| G-17 | MCP code emit has no pre-validation before file write | `vox-mcp/` |
| G-18 | `vox_schola_submit` failures not converted to negative examples | MCP tool `vox_schola_submit` |
| G-19 | `plan_has_verification_hint` ignores file manifests | `plan_adequacy.rs:259-271` |
| G-20 | `fatigue_active` penalty never propagated to planner thresholds | `socrates.rs:271-276` |

Part 1 — OOPAV Loop Architecture

+----------------------------------------------------------+
|                 OOPAV Agent Execution Loop               |
|                                                          |
|  +----------+  evidence   +-----------+  risk band       |
|  | OBSERVE  |-----------> |  ORIENT   |--------->        |
|  |(Scientia)|             | (Socrates)|                  |
|  +-----^----+             +-----+-----+                  |
|        | watch                  | plan-or-act            |
|  +-----+----+             +-----v-----+                  |
|  |  VERIFY  |<-- result --|   PLAN    |                  |
|  |(Harness) |             | (Planner) |                  |
|  +-----+----+             +-----+-----+                  |
|        | pass/fail          dispatch                     |
|  +-----v----+             +-----v-----+                  |
|  | complete |             |    ACT    |                  |
|  |  or      |             |(Builder + |                  |
|  | re-plan  |             |  MENS)    |                  |
|  +----------+             +-----------+                  |
+----------------------------------------------------------+

Testing Decision Policy

Required    -> security/auth/schema keywords in description
Required    -> .vox file in manifest
Required    -> complexity >= 7 AND file_count > 2
Required    -> orient.risk_band == Red
Recommended -> new fn/type, >20 LOC estimate
Skip        -> docs-only or config-only manifest
Deferred    -> evidence_gap > 0.4
Optional    -> everything else
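
The policy above can be sketched as a first-match rule chain. This is a minimal illustration, not the planned `TestDecisionPolicy::evaluate` itself: the `TaskFacts` struct, the `RiskBand::Green` variant, the docs/config extension list, and the reading of "new fn/type, >20 LOC estimate" as a conjunction are all assumptions made for the example. It also assumes the rules are checked top to bottom and the first match wins.

```rust
/// Outcome of the testing decision, mirroring the TestDecision enum planned in Wave 0.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestDecision { Required, Recommended, Optional, Deferred, Skip }

/// Only Red and Black are named in the blueprint; Green is a hypothetical "no concern" band.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand { Green, Red, Black }

/// Hypothetical input bundle; the real evaluate() is planned to take (task, orient).
pub struct TaskFacts<'a> {
    pub description: &'a str,
    pub manifest: &'a [&'a str],
    pub complexity: u8,
    pub estimated_loc: u32,
    pub adds_fn_or_type: bool,
    pub evidence_gap: f32,
    pub risk_band: RiskBand,
}

/// Assumed extension list for "docs-only or config-only" manifests.
fn is_docs_or_config(path: &str) -> bool {
    [".md", ".toml", ".json", ".yaml", ".yml"].iter().any(|ext| path.ends_with(*ext))
}

/// Applies the policy rules in the listed order; the first matching rule wins (assumption).
pub fn evaluate(t: &TaskFacts) -> TestDecision {
    let desc = t.description.to_lowercase();
    // Naive substring match; a real policy would likely tokenize the description.
    let has_keyword = ["security", "auth", "schema"].iter().any(|k| desc.contains(*k));
    let has_vox = t.manifest.iter().any(|p| p.ends_with(".vox"));
    if has_keyword
        || has_vox
        || (t.complexity >= 7 && t.manifest.len() > 2)
        || t.risk_band == RiskBand::Red
    {
        return TestDecision::Required;
    }
    if t.adds_fn_or_type && t.estimated_loc > 20 {
        return TestDecision::Recommended;
    }
    if !t.manifest.is_empty() && t.manifest.iter().all(|p| is_docs_or_config(p)) {
        return TestDecision::Skip;
    }
    if t.evidence_gap > 0.4 {
        return TestDecision::Deferred;
    }
    TestDecision::Optional
}
```

Under these assumptions a task described as "Auth migration" resolves to `Required` regardless of its manifest, while a manifest containing only `README.md` resolves to `Skip`.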

9-Tier Victory Conditions

| Tier | Check | When |
|---|---|---|
| 1 | TOESTUB: zero stubs | Always |
| 2 | LSP zero errors on `.vox` write files | Always |
| 3 | `cargo check --workspace` | Always |
| 4 | `cargo test --doc --workspace` | `WithDocTests` or `Full` |
| 5 | `cargo test <filter>` | `TestDecision::Required` |
| 6 | `vox corpus eval` parse_rate >= 99.5% | Any `.vox` in manifest |
| 7 | Harness contract satisfaction | Always |
| 8 | Socrates confidence >= `answer_threshold` | Always |
| 9 | Plan adequacy retrospective >= 0.75 | `Full` |
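
As a rough sketch of how the "When" column could translate into tier selection, the function below returns the tier numbers to run for a given condition. It follows the table literally; the function name `tiers_to_run` and its two boolean parameters are assumptions for illustration, and the table does not say how `WithUnitTests` or `WithCorpusValidation` affect selection, so they add nothing here beyond the always-on tiers.

```rust
/// Mirrors the VictoryCondition enum planned in Wave 0.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VictoryCondition {
    CompilationOnly,
    WithDocTests,
    WithUnitTests,
    WithCorpusValidation,
    Full,
}

/// Returns the tier numbers (1..=9) to run, in ascending order.
/// `test_required` stands for TestDecision::Required; `manifest_has_vox`
/// stands for "any .vox file in the manifest". Both names are hypothetical.
pub fn tiers_to_run(
    condition: VictoryCondition,
    test_required: bool,
    manifest_has_vox: bool,
) -> Vec<u8> {
    // Tiers 1-3 are marked "Always".
    let mut tiers = vec![1, 2, 3];
    if matches!(condition, VictoryCondition::WithDocTests | VictoryCondition::Full) {
        tiers.push(4);
    }
    if test_required {
        tiers.push(5);
    }
    if manifest_has_vox {
        tiers.push(6);
    }
    // Tiers 7 and 8 are marked "Always".
    tiers.extend([7, 8]);
    if condition == VictoryCondition::Full {
        tiers.push(9);
    }
    tiers
}
```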

Part 2 — MENS Syntax Intelligence

Grammar Export Pipeline

vox-compiler/src/parser/
    |  VoxGrammarExporter
    |-> EBNF text       -> docs/grammar/vox.ebnf
    |-> GBNF file       -> llama.cpp --grammar-file
    |-> JSON Schema     -> vox populi serve (constrained JSON mode)
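
To make the GBNF leg of this pipeline concrete, here is a small sketch of a rule emitter. The `Rule` and `Sym` types and the function names are hypothetical and do not reflect the planned `EbnfEmitter`/`GbnfEmitter` APIs. The sketch assumes the llama.cpp GBNF conventions of `name ::= alternatives`, double-quoted terminals, and dash-separated rule names, so underscores in names are rewritten to dashes.

```rust
/// A grammar symbol: a quoted literal or a reference to another rule.
pub enum Sym {
    Terminal(&'static str),
    NonTerminal(&'static str),
}

/// One production: a rule name and its alternatives (each a sequence of symbols).
pub struct Rule {
    pub name: &'static str,
    pub alternates: Vec<Vec<Sym>>,
}

fn emit_symbol(sym: &Sym) -> String {
    match sym {
        Sym::Terminal(text) => {
            // Escape backslashes and double quotes inside the literal.
            let escaped = text.replace('\\', "\\\\").replace('"', "\\\"");
            format!("\"{escaped}\"")
        }
        // Assumed GBNF naming rule: dashes instead of underscores.
        Sym::NonTerminal(name) => name.replace('_', "-"),
    }
}

/// Emits a single line of the form `name ::= a b | c`.
pub fn emit_rule(rule: &Rule) -> String {
    let alts: Vec<String> = rule
        .alternates
        .iter()
        .map(|seq| seq.iter().map(emit_symbol).collect::<Vec<_>>().join(" "))
        .collect();
    format!("{} ::= {}", rule.name.replace('_', "-"), alts.join(" | "))
}

/// Emits all rules, one per line, with a trailing newline.
pub fn emit_all(rules: &[Rule]) -> String {
    rules.iter().map(emit_rule).collect::<Vec<_>>().join("\n") + "\n"
}
```

A real exporter would also need whitespace rules, repetition operators, and character classes; the sketch covers only sequencing and alternation.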

Corpus Verification Pipeline

synthetic.jsonl (3.2 MB, unverified)
    |  vox corpus validate-batch
    |-> synthetic_valid.jsonl   -> split=training
    |-> synthetic_invalid.jsonl -> split=negative + correction signal

golden_extracted.jsonl (16 KB)
    |  vox corpus validate-batch
    |-> golden_validated.jsonl  <- currently 60 bytes / EMPTY -> must reach >=500 pairs
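
The split performed by `vox corpus validate-batch` might look roughly like the following. This is a sketch under stated assumptions: the validator is passed in as a closure standing in for the real Vox parser, records are treated as opaque JSONL lines, and `BatchSplit` is a hypothetical type.

```rust
/// Result of a batch validation: lines that parsed and lines that did not.
pub struct BatchSplit<'a> {
    pub valid: Vec<&'a str>,
    pub invalid: Vec<&'a str>,
}

impl<'a> BatchSplit<'a> {
    /// Fraction of non-empty lines that passed validation (0.0 for an empty batch).
    pub fn parse_rate(&self) -> f64 {
        let total = self.valid.len() + self.invalid.len();
        if total == 0 {
            0.0
        } else {
            self.valid.len() as f64 / total as f64
        }
    }
}

/// Splits JSONL content into valid and invalid records.
/// `is_valid` is a stand-in for the real parser check.
pub fn validate_batch<'a>(jsonl: &'a str, is_valid: impl Fn(&str) -> bool) -> BatchSplit<'a> {
    let (valid, invalid): (Vec<&str>, Vec<&str>) = jsonl
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty()) // skip blank lines
        .partition(|line| is_valid(line));
    BatchSplit { valid, invalid }
}
```

The valid half would feed `split=training` and the invalid half `split=negative`, with `parse_rate()` available for the metadata update.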

GRPO/RLVR Training Loop

for each prompt in training_set:
  candidates = generate_k(prompt, k=8, temperature=0.8)
  for each candidate_i in candidates:
    r_syntax   = vox_parser(candidate_i)       -> 0/1
    r_test     = run @test blocks              -> pass_rate
    r_coverage = ast_eval(candidate_i).score
    reward_i   = 0.6*r_syntax + 0.3*r_test + 0.1*r_coverage
  for each candidate_i in candidates:
    advantage_i = reward_i - mean(rewards)     # GRPO group mean baseline
  grpo_update(policy, candidates, advantages)
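
The reward and advantage steps of this loop are simple enough to sketch directly. The weights come from the loop above; the function names follow the Wave 9 task list but the signatures are assumptions. Note that some GRPO formulations also divide the centered reward by the group standard deviation; the sketch keeps the mean-only baseline described here.

```rust
/// Reward weights; defaults follow the 0.6 / 0.3 / 0.1 split used in the loop.
pub struct RewardWeights {
    pub parse_weight: f32,
    pub test_weight: f32,
    pub coverage_weight: f32,
}

impl Default for RewardWeights {
    fn default() -> Self {
        Self { parse_weight: 0.6, test_weight: 0.3, coverage_weight: 0.1 }
    }
}

/// Weighted composite of the three reward components, each expected in [0, 1].
pub fn composite_reward(w: &RewardWeights, r_syntax: f32, r_test: f32, r_coverage: f32) -> f32 {
    w.parse_weight * r_syntax + w.test_weight * r_test + w.coverage_weight * r_coverage
}

/// Group-relative advantages: each reward minus the mean reward of its group.
/// Returns an empty vector for an empty group.
pub fn compute_advantages(rewards: &[f32]) -> Vec<f32> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let mean = rewards.iter().sum::<f32>() / rewards.len() as f32;
    rewards.iter().map(|r| r - mean).collect()
}
```

Because the baseline is the group mean, the advantages of one group always sum to zero, so a group whose candidates all score the same contributes no gradient signal.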

MCP Pre-Emit Validation

vox_generate_code   -> mcp_pre_emit_validate("vox")
vox_speech_to_code  -> mcp_pre_emit_validate("vox")
PlanBridge step     -> mcp_pre_emit_validate("vox")
                             |
             parse OK?  -> write file
             parse ERR? -> VoxValidationError -> LLM retries
                        -> invalid snippet -> auto_ingest_negative(corpus)
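
The validate-then-write flow can be sketched as below. The brace-balance check is only a stand-in for the real Vox parser, and `EmitOutcome`, `emit_if_valid`, the error code string, and the in-memory `negatives` list (standing in for `auto_ingest_negative(corpus)`) are all hypothetical names introduced for the example. The planned `VoxValidationError` also carries `span` and `suggested_correction`, which are omitted here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reduced form of the planned VoxValidationError (span and correction omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VoxValidationError {
    pub code: &'static str,
    pub message: String,
}

/// Hypothetical outcome type for the emit step.
#[derive(Debug, PartialEq, Eq)]
pub enum EmitOutcome {
    Written,
    Rejected(VoxValidationError),
}

/// Stand-in validator: checks only that braces are balanced.
pub fn mcp_pre_emit_validate(code: &str) -> Result<(), VoxValidationError> {
    let mut depth: i32 = 0;
    for ch in code.chars() {
        match ch {
            '{' => depth += 1,
            '}' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return Err(VoxValidationError {
                code: "E_UNBALANCED",
                message: "unexpected closing brace".to_string(),
            });
        }
    }
    if depth > 0 {
        return Err(VoxValidationError {
            code: "E_UNBALANCED",
            message: "missing closing brace".to_string(),
        });
    }
    Ok(())
}

/// Writes the file only when validation passes; otherwise records the snippet
/// as a negative example and returns the error for the LLM retry.
pub fn emit_if_valid(
    path: &Path,
    code: &str,
    negatives: &mut Vec<String>,
) -> io::Result<EmitOutcome> {
    match mcp_pre_emit_validate(code) {
        Ok(()) => {
            fs::write(path, code)?;
            Ok(EmitOutcome::Written)
        }
        Err(err) => {
            negatives.push(code.to_string());
            Ok(EmitOutcome::Rejected(err))
        }
    }
}
```

The key property is that no file is touched on the error path, which matches the Wave 9 integration test "invalid snippet via MCP -> VoxValidationError, no file written".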

Part 3 — Implementation Waves (254 Tasks)

Wave 0 — Foundations & Schema (Days 1-3)

Define ObservationReport struct in vox-orchestrator/src/observer.rs
Define ObserverAction enum: Continue, RequestMoreEvidence, TriggerReplan, EscalateToHuman, EmitNegativeExample
Add observer_enabled, observer_poll_interval_ms to OrchestratorConfig
Define TestDecision enum: Required, Recommended, Optional, Deferred, Skip
Define TestDecisionPolicy struct with threshold, keyword, and extension fields
Add test_decision_policy: TestDecisionPolicy to OrchestratorConfig
Define VictoryCondition enum: CompilationOnly, WithDocTests, WithUnitTests, WithCorpusValidation, Full
Add victory_condition: VictoryCondition to AgentTask
Create crates/vox-grammar-export/ with Cargo.toml and src/lib.rs
Define GrammarFormat, GrammarExportConfig, GrammarExportResult
Add Arca migration V38: observer_events table
Add Arca migration V38: test_decisions table
Add Arca migration V38: victory_verdicts table
Add Arca migration V38: mens_corpus_quality table
Add Arca migration V38: grpo_training_run table
Write Arca CRUD: insert_observer_event, list_observer_events_for_task, insert_test_decision, insert_victory_verdict, upsert_corpus_quality, insert_grpo_step
Add all five tables to Codex facade
Write unit tests for all CRUD methods (min 2 tests each)
Run vox ci clavis-parity and vox stub-check --path crates/vox-grammar-export
Confirm zero stubs in Wave 0 deliverables

Wave 1 — Grammar Export from Compiler (Days 4-7)

Audit crates/vox-compiler/src/parser/ — catalog all production rules; write docs/src/architecture/vox-grammar-production-rules.md
Create vox-grammar-export/src/ebnf.rs — EBNF emitter
Implement EbnfEmitter::emit_rule(name, alternates, terminals)
Implement EbnfEmitter::emit_all() — covers all top-level Vox rules
Create vox-grammar-export/src/gbnf.rs — GBNF emitter for llama.cpp
Implement GbnfEmitter::from_ebnf(ebnf) -> GbnfDocument
Handle all Vox keywords in GBNF output
Implement GbnfEmitter::emit_string() -> String
Create vox-grammar-export/src/json_schema.rs — AST JSON Schema emitter
Define VoxAstNode JSON schema recursively
Expose vox grammar export --format ebnf|gbnf|json-schema --output <file> CLI
Expose vox_grammar_export(format) MCP tool
Write vox-grammar-export/src/versioning.rs — semver embedding + drift check
Replace vox_grammar_prompt() stub with derived cheatsheet from real grammar
Write tests: emitted EBNF structural validity
Write tests: 10 known-valid programs accepted by the GBNF
Write tests: 5 known-invalid programs rejected by the GBNF
Add vox ci grammar-export-check CI step
Add grammar_export_path to MensTrainingConfig
Run vox stub-check --path crates/vox-grammar-export; full test suite

Wave 2 — Observer Sub-Agent (Days 8-12)

Create vox-orchestrator/src/observer.rs — Observer struct
Implement Observer::observe_file(path) -> ObservationReport
Implement Observer::observe_rust_file(path) -> ObservationReport
Implement Observer::start_watching(file_paths) -> JoinHandle
Implement Observer::drain_reports() -> Vec<ObservationReport>
Add observer: Option<Arc<Observer>> to Orchestrator
Wire Observer startup into Orchestrator::spawn_agent
Wire Observer shutdown into Orchestrator::retire_agent
Emit VisualizerEventKind::ObservationRecorded from viz_sink
Implement Observer::compute_action(report, policy) -> ObserverAction
Add observation_history: VecDeque<ObservationReport> (cap 20) to AgentTask
Feed ObservationReport into Arca observer_events
Implement Observer::summarize(task_id) -> ObservationSummary
Add observation_summary: Option<ObservationSummary> to CompletionAttestation
Write unit tests: compute_action correctness
Write integration test: Observer on known-bad .vox -> errors within 2 polls
Write integration test: Observer on .rs with todo!() -> EmitNegativeExample
Write tests: summarize computes parse_rate trend from 3 sequential reports
Expose vox_observer_status(task_id) MCP tool
Run vox stub-check, cargo test -p vox-orchestrator
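
A minimal sketch of two pieces from this wave, the action decision and the capped history, is shown below. The `ObserverAction` variants and the cap of 20 come from the task list; the fields of `ObservationReport` and `ObserverPolicy`, the stub markers searched for, and the precedence of the rules are assumptions chosen for illustration.

```rust
use std::collections::VecDeque;

/// Cap on observation_history, as stated in the task list.
pub const OBSERVATION_HISTORY_CAP: usize = 20;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObserverAction {
    Continue,
    RequestMoreEvidence,
    TriggerReplan,
    EscalateToHuman,
    EmitNegativeExample,
}

/// Hypothetical report fields; the real struct is defined in Wave 0.
#[derive(Debug, Clone, PartialEq)]
pub struct ObservationReport {
    pub parse_errors: u32,
    pub stub_markers: u32,
    pub evidence_gap: f32,
}

/// Hypothetical policy knobs.
pub struct ObserverPolicy {
    pub evidence_gap_threshold: f32,
    pub escalate_after_errors: u32,
}

/// Counts stub macros in Rust source (assumed marker list).
pub fn count_stub_markers(source: &str) -> u32 {
    ["todo!()", "unimplemented!()"]
        .iter()
        .map(|marker| source.matches(*marker).count() as u32)
        .sum()
}

/// Maps a report to an action; rule precedence here is an assumption.
pub fn compute_action(report: &ObservationReport, policy: &ObserverPolicy) -> ObserverAction {
    if report.stub_markers > 0 {
        ObserverAction::EmitNegativeExample
    } else if report.parse_errors >= policy.escalate_after_errors {
        ObserverAction::EscalateToHuman
    } else if report.parse_errors > 0 {
        ObserverAction::TriggerReplan
    } else if report.evidence_gap > policy.evidence_gap_threshold {
        ObserverAction::RequestMoreEvidence
    } else {
        ObserverAction::Continue
    }
}

/// Appends a report, evicting the oldest entry once the cap is reached.
pub fn push_observation(history: &mut VecDeque<ObservationReport>, report: ObservationReport) {
    if history.len() == OBSERVATION_HISTORY_CAP {
        history.pop_front();
    }
    history.push_back(report);
}
```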

Wave 3 — Orient Phase & Enhanced Socrates (Days 13-17)

Define OrientReport { evidence_gap, missing_namespaces, recommended_retrieval, risk_band, planning_complexity_multiplier }
Implement orient_phase(ctx, policy) -> OrientReport
Add evidence_gap_threshold to ConfidencePolicy
Implement OrientPhase::request_missing_evidence(gap) -> Vec<SearchResult>
Add orient_report: Option<OrientReport> to SocratesTaskContext
Integrate orient_phase() into runtime.rs before each LLM inference request
Wire risk_band: Red -> block act; Black -> halt + escalate
Wire planning_complexity_multiplier into PlannerConfig
Implement OrientPhase::propagate_fatigue(fatigue_active, config)
Implement OrientPhase::auto_dispatch_socratic_question(gap) -> CorrelationId
Fix QARouter::answer() — store answer; add get_answer(corr_id) -> Option<String>
Wire answered questions back into SocratesTaskContext
Implement OrientPhase::classify_task_category(description) -> TaskCategory
Write tests: orient_phase with zero evidence -> RequestMoreEvidence
Write tests: propagate_fatigue(true) raises thresholds by >= 2
Write tests: classify_task_category returns Security for auth keywords
Write tests: auto_dispatch_socratic_question creates QARouter entry
Write tests: get_answer() returns stored string
Emit VisualizerEventKind::OrientCompleted { risk_band, evidence_gap }
Run vox stub-check, cargo test -p vox-orchestrator
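
The QARouter fix in this wave (store the answer, then expose `get_answer`) can be illustrated with a small in-memory router. This is a sketch only: the `CorrelationId` alias, the `ask` method, and the field layout are assumptions and may not match the existing `qa.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical id type; the real CorrelationId may differ.
pub type CorrelationId = u64;

/// In-memory router that keeps answers instead of discarding them.
#[derive(Debug, Default)]
pub struct QARouter {
    next_id: CorrelationId,
    pending: HashMap<CorrelationId, String>,
    answers: HashMap<CorrelationId, String>,
}

impl QARouter {
    /// Registers a question and returns its correlation id.
    pub fn ask(&mut self, question: &str) -> CorrelationId {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, question.to_string());
        id
    }

    /// Stores the answer for a pending question.
    /// Returns false when the id is unknown or already answered.
    pub fn answer(&mut self, corr_id: CorrelationId, answer: &str) -> bool {
        if self.pending.remove(&corr_id).is_none() {
            return false;
        }
        self.answers.insert(corr_id, answer.to_string());
        true
    }

    /// Returns the stored answer, if any.
    pub fn get_answer(&self, corr_id: CorrelationId) -> Option<String> {
        self.answers.get(&corr_id).cloned()
    }
}
```

With answers retained, the "wire answered questions back into SocratesTaskContext" task reduces to reading `get_answer` for each outstanding correlation id.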

Wave 4 — Testing Decision Engine (Days 18-22)

Implement TestDecisionPolicy::evaluate(task, orient) -> TestDecision
Rule: security keywords -> Required
Rule: .vox in manifest -> Required
Rule: complexity >= threshold -> Required
Rule: file_count > threshold -> Recommended
Rule: risk_band Red -> Required
Rule: docs/config only -> Skip
Rule: evidence_gap > 0.4 -> Deferred
Rule: default -> Optional
Persist TestDecision to test_decisions table after every call
Fix plan_has_verification_hint to check file manifests
Promote heavy_without_test_hint to hard blocker test_required_missing
Add test_required_count, test_present_count to PlanAdequacySummary
Score = 0.0 when test_required_count > test_present_count for coding goals
Add TestDecision to TaskDescriptor
PlanBridge: block dispatch if Required and no test file in manifest
Add test_decision_policy to OrchestratorConfig with sane defaults
Write tests: auth migration -> Required
Write tests: markdown-only manifest -> Skip
Write tests: complexity-8 .vox with no test step -> is_too_thin=true, test_required_missing
Write tests: test file in manifest -> plan_has_verification_hint=true
Write tests: PlanBridge blocks Required task with no test file
Expose vox_test_decision(task_id) MCP tool
Update vox plan new CLI to render test decisions per step
Run vox stub-check, full test suite

Wave 5 — Multi-Tier Victory Conditions (Days 23-28)

Create vox-orchestrator/src/victory.rs — VictoryEvaluator
Implement tier1_toestub(task) -> TierResult
Implement tier2_lsp(task) -> TierResult
Implement tier3_cargo_check(task) -> TierResult
Implement tier4_cargo_doc_test(task) -> TierResult (120s timeout)
Implement tier5_cargo_unit_test(task, filter) -> TierResult
Implement tier6_vox_corpus_eval(task) -> TierResult (parse_rate >= 99.5%)
Implement tier7_harness_contracts(task, harness) -> TierResult
Implement tier8_socrates_confidence(task, ctx, policy) -> TierResult
Implement tier9_plan_adequacy_retrospective(task) -> TierResult
Implement VictoryEvaluator::evaluate(task, condition) -> VictoryVerdict
Define VictoryVerdict { passed, tiers_run, first_failure, report }
Replace post_task_validate with VictoryEvaluator::evaluate
Persist every VictoryVerdict to Arca victory_verdicts
Wire passed=false -> TriggerReplan via Observer
Add max_victory_attempts: u32 to AgentTask (default 3)
Emit VisualizerEventKind::VictoryEvaluated
Update AgentHarnessSpec::minimal_contract_first — independent_verification: true for code tasks
Write tests: tier3 fails on bad Rust
Write tests: tier6 fails on invalid Vox
Write tests: Full passes for clean files + high confidence
Write tests: stub code -> first_failure = TierResult::Toestub
Write tests: max_victory_attempts guard
Expose vox_victory_status(task_id) MCP tool
Run vox stub-check, full test suite
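
The core of `VictoryEvaluator::evaluate` is an ordered run that records the first failing tier. The sketch below assumes evaluation stops at the first failure (the task list implies this through `first_failure` but does not state it), represents tiers as plain numbers, and takes the per-tier check as a closure. The `report` field of the planned `VictoryVerdict` is omitted.

```rust
/// Reduced form of the planned VictoryVerdict (the report field is omitted).
#[derive(Debug, PartialEq, Eq)]
pub struct VictoryVerdict {
    pub passed: bool,
    pub tiers_run: Vec<u8>,
    pub first_failure: Option<u8>,
}

/// Runs the given tiers in order and stops at the first failing tier.
/// `check` is a stand-in for the tierN_* functions; it returns true on pass.
pub fn evaluate<F>(tiers: &[u8], mut check: F) -> VictoryVerdict
where
    F: FnMut(u8) -> bool,
{
    let mut tiers_run = Vec::with_capacity(tiers.len());
    for &tier in tiers {
        tiers_run.push(tier);
        if !check(tier) {
            return VictoryVerdict {
                passed: false,
                tiers_run,
                first_failure: Some(tier),
            };
        }
    }
    VictoryVerdict { passed: true, tiers_run, first_failure: None }
}
```

Stopping early keeps expensive tiers such as `cargo test` from running when a cheap tier such as the stub check has already failed.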

Wave 6 — Dynamic Replan Trigger (Days 29-33)

Add replan_trigger: Option<ReplanTrigger> to AgentTask
Define ReplanTrigger { reason, failed_tier, observer_action, evidence_gaps }
Implement runtime.rs::handle_replan_trigger(task, trigger)
Wire replan result back into orchestrator via PlanBridge
Add replan_count: u32 to AgentTask; fail permanently after max
Implement ReplanScheduler — max 1 replan per 30s per session
Implement ReplanScheduler::should_replan(task) -> bool
Add replan_history: Vec<ReplanRecord> to PlanSession
Define ReplanRecord { version, trigger_reason, previous_score, new_score, created_at }
Emit VisualizerEventKind::ReplanTriggered
Implement ReplanPolicy in planning/policy.rs
Add replan_policy: ReplanPolicy to OrchestratorConfig
Expose vox_replan_status(session_id) MCP tool
Write tests: failed tier3 -> ReplanTrigger created -> replan called
Write tests: ReplanScheduler returns false within cooldown
Write tests: permanent failure after max replans
Write tests: replan_history persisted and retrievable
Write tests: MCP returns correct count and reason
Update vox plan replan CLI
Run full test suite, vox stub-check
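
The cooldown rule for `ReplanScheduler` (at most one replan per 30 seconds per session) can be sketched as follows. The planned signature is `should_replan(task) -> bool`; this version instead takes a session id and an explicit `now` so the behavior can be tested without sleeping, which is an assumption of the example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Rate limiter that allows one replan per cooldown window per session.
pub struct ReplanScheduler {
    cooldown: Duration,
    last_replan: HashMap<String, Instant>,
}

impl ReplanScheduler {
    pub fn new(cooldown: Duration) -> Self {
        Self { cooldown, last_replan: HashMap::new() }
    }

    /// Returns true and records `now` when the session is outside its cooldown;
    /// returns false (recording nothing) when a replan happened too recently.
    pub fn should_replan(&mut self, session_id: &str, now: Instant) -> bool {
        match self.last_replan.get(session_id) {
            Some(&last) if now.duration_since(last) < self.cooldown => false,
            _ => {
                self.last_replan.insert(session_id.to_string(), now);
                true
            }
        }
    }
}
```

A caller would construct it with `ReplanScheduler::new(Duration::from_secs(30))` and pass `Instant::now()` at each trigger; rejected triggers do not extend the window.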

Wave 7 — Scientia as Live Observer Feed (Days 34-38)

Audit vox-scientia-* crates; write docs/src/architecture/scientia-surface-audit.md
Define ScientiaObservation { session_id, source_path, worthiness_score, construct_coverage, citation_count, recommended_for_corpus, reason }
Implement ScientiaObserver::observe_session(session_id) -> ScientiaObservation
Implement ScientiaObserver::recommend_corpus_ingestion(obs) -> bool
Wire into Observer::observe_file for .vox files
Set EmitNegativeExample when worthiness_score < 0.3
Implement ScientiaObserver::auto_ingest_to_mens(obs, codex) -> split=training row
Implement ScientiaObserver::auto_ingest_negative(path, error, codex) -> split=negative row
Wire into handle_replan_trigger — replans >= max/2 emit negatives
Add scientia_observation: Option<ScientiaObservation> to ObservationReport
Expose vox_scientia_observe(session_id) MCP tool
Add vox scientia observe --session <id> CLI subcommand
Write tests: recommend_corpus_ingestion true for valid snippet with 3 constructs
Write tests: auto_ingest_to_mens inserts training row
Write tests: auto_ingest_negative inserts negative row
Write tests: full pipeline — Observer -> Scientia -> corpus row
Emit VisualizerEventKind::ScientiaObserved
Expose in VS Code extension telemetry push
Update governance.md
Run full test suite, vox stub-check

Wave 8 — MENS Corpus Surgery & AST-Eval Upgrade (Days 39-46)

Write vox-corpus/src/validate_batch.rs — batch parse validation
Run validate-batch on synthetic.jsonl -> synthetic_valid.jsonl + synthetic_invalid.jsonl
Run validate-batch on golden_extracted.jsonl -> populate golden_validated.jsonl
Update mens/data/metadata.json with parse_rate, last_validated_at, validator_version
Implement vox-eval/src/ast_eval.rs — ast_eval(code) -> AstEvalReport using real parser
Define AstEvalReport { parse_success, node_count, max_depth, construct_histogram, type_annotation_rate, has_tests, error_span }
Implement AstEvalReport::coverage_score() — weighted composite
Update vox-eval/src/lib.rs — re-export ast_eval; #[deprecated] on detect_constructs
Update construct_coverage_score(code) to delegate to AST eval
Update vox eval --mode ast CI integration
Upgrade vox corpus eval to AST engine
Define RewardSignal { parse_score, test_score, coverage_score, composite } in vox-tensor/src/data.rs
Implement reward_signal_for_pair(pair) -> RewardSignal
Add reward_signal: Option<RewardSignal> to TrainingPair
Update JsonlDataLoader to compute RewardSignal during loading
Add avg_reward_signal per split to metadata.json
Add vox corpus quality-report CLI command
Add mens/schemas/corpus_quality_record.schema.json
MILESTONE GATE: golden_validated.jsonl >= 500 pairs required before Wave 9
Write tests: ast_eval on valid Vox function -> parse_success=true
Write tests: ast_eval on invalid snippet -> parse_success=false, non-None error_span
Write tests: reward_signal_for_pair -> composite >= 0.8 for well-formed pair with tests
Write tests: validate_batch correctly separates mixed JSONL
Run vox stub-check --path crates/vox-eval, cargo test -p vox-eval

Wave 9 — Constrained Inference + GRPO Loop + MCP Pre-Emit (Days 47-60)

Create crates/vox-constrained-gen/ — grammar-constrained token sampling
Implement ConstrainedSampler::from_gbnf(gbnf_text) -> ConstrainedSampler (FSA from Wave 1 GBNF)
Implement ConstrainedSampler::mask_logits(logits, state) -> FsaState
Integrate into vox populi serve via ?grammar=vox or X-Vox-Grammar: true
Add constrained_generation: bool to MensServeConfig
Implement fallback: grammar deadlock -> VoxValidationError, request retry
Create vox-constrained-gen/src/llguidance_bridge.rs (optional feature-gated)
Define VoxValidationError { code, span, message, suggested_correction } in vox-compiler/src/error.rs
Implement mcp_pre_emit_validate(code, format) -> Result<(), VoxValidationError> in vox-mcp/src/code_validator.rs
Wire into vox_generate_code MCP tool
Wire into vox_speech_to_code MCP tool
Wire into PlanBridge::plan_to_descriptors for .vox steps
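Implement Rust pre-emit: rustc parse-only subprocess on temp file (stable rustc has no --parse-only flag; this needs a nightly -Z parse-only style option or an in-process parser)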
Add vox_validate_code(code, language) -> { valid, errors } standalone MCP tool
Implement MensGrpoTrainer::train_grpo(config, data) -> GrpoTrainingResult in vox-tensor/src/grpo.rs
Define GrpoConfig { k_samples, temperature, reward_weights, policy_lr, clip_epsilon, max_steps }
Define RewardWeights { parse_weight, test_weight, coverage_weight } defaults (0.6, 0.3, 0.1)
Implement generate_k_candidates(prompt, model, k) -> Vec<String>
Implement score_candidate(candidate) -> RewardSignal
Implement compute_advantages(rewards) -> Vec<f32> (group mean baseline)
Implement policy_gradient_update(model, candidates, advantages) (PPO-clip style)
Expose vox mens train --mode grpo CLI flag
Expose --k 8 --reward parse:0.6,test:0.3,coverage:0.1 arguments
Add GRPO telemetry: group_rewards, mean_reward, policy_loss, clip_fraction per step
Persist to Arca grpo_training_run table
Define GrpoTrainingResult { steps_completed, final_mean_reward, parse_rate, checkpoint_path }
Fix G-18: vox_schola_submit failures -> auto_ingest_negative
Add vox mens eval --mode grpo-reward (dry-run)
Add mens/config/grpo_default.toml (k=8, temp=0.8, max_steps=500)
Write tests: compute_advantages correctness
Write tests: constrained sampler produces only grammar-accepted tokens
Write tests: mcp_pre_emit_validate -> error for missing closing }
Write tests: mcp_pre_emit_validate -> Ok(()) for valid function
Write tests: vox_validate_code -> errors for invalid Rust
Write tests: GRPO loop completes 10 steps without panic on RTX 4080 SUPER
Write tests: train --mode grpo -> checkpoint with final_mean_reward > 0.5
Integration test: constrained generation -> 100% parse rate on 50 generations
Integration test: invalid snippet via MCP -> VoxValidationError, no file written
Integration test: GRPO model vs SFT baseline -> >= 5pp parse rate improvement
Run vox stub-check --path crates/vox-constrained-gen crates/vox-mcp, cargo test --workspace
Update docs/src/architecture/mens-training-ssot.md
Update examples/STYLE.md
Add vox ci grammar-constrained-gen-smoke-test
Add vox ci mens-corpus-health
Add vox ci grpo-reward-baseline
Persist all CI results to Arca for trend analysis
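
The logit-masking step at the heart of constrained sampling can be sketched in a few lines. This is a simplified stand-in for the planned `ConstrainedSampler::mask_logits(logits, state) -> FsaState`: it takes the set of grammar-allowed token ids directly instead of an FSA state, and `GrammarDeadlock` is a hypothetical error type representing the "grammar deadlock" fallback case.

```rust
/// Hypothetical error for the case where the grammar allows no token at all.
#[derive(Debug, PartialEq, Eq)]
pub struct GrammarDeadlock;

/// Sets the logit of every token not in `allowed_token_ids` to negative infinity,
/// so that softmax assigns it zero probability. Out-of-range ids are ignored.
pub fn mask_logits(
    logits: &mut [f32],
    allowed_token_ids: &[usize],
) -> Result<(), GrammarDeadlock> {
    let mut keep = vec![false; logits.len()];
    for &id in allowed_token_ids {
        if id < logits.len() {
            keep[id] = true;
        }
    }
    if !keep.iter().any(|&k| k) {
        // Nothing is allowed: the caller should surface a validation error and retry.
        return Err(GrammarDeadlock);
    }
    for (logit, &k) in logits.iter_mut().zip(&keep) {
        if !k {
            *logit = f32::NEG_INFINITY;
        }
    }
    Ok(())
}

/// Greedy pick: index of the largest non-NaN logit, or None for an empty slice.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &value) in logits.iter().enumerate() {
        if value.is_nan() {
            continue;
        }
        match best {
            Some((_, current)) if value <= current => {}
            _ => best = Some((i, value)),
        }
    }
    best.map(|(i, _)| i)
}
```

In a full sampler the allowed set would be derived from the current FSA state after each emitted token, and the deadlock branch would map to the `VoxValidationError` retry path listed in this wave.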

Part 4 — Observability & Telemetry (241-245)

Add ObservationReport to VS Code extension push-telemetry stream
Color-code agent viz nodes by OrientReport.risk_band
Add VictoryVerdict tier summary panel to workflow visualizer
Add TestDecision badge to each task card
Add RewardSignal.composite sparkline to MENS training progress panel

Part 5 — Documentation (246-254)

Write docs/src/architecture/oopav-loop.md
Write docs/src/architecture/observer-design.md
Write docs/src/architecture/victory-conditions.md
Write docs/src/architecture/test-decision-policy.md
Write docs/src/architecture/mens-grammar-intelligence.md
Update docs/src/architecture/mens-training-ssot.md
Update docs/src/contributors/contributor-hub.md
Update AGENTS.md
Update docs/agents/governance.md

Milestone Gates

| After Wave | Gate |
|---|---|
| 0 | All V38 Arca migrations applied; `vox stub-check` clean across all new crates |
| 1 | `vox grammar export --format gbnf` accepted by `llama.cpp --grammar-file` |
| 2 | Observer: live LSP error detection on modified `.vox` file integration test passes |
| 3 | Orient phase blocks `Red` band task from acting without evidence hydration |
| 4 | Complexity-8 `.vox` task with no test step rejected by `PlanBridge` |
| 5 | Full `VictoryCondition::Full` pass on a clean newly-generated Vox crate |
| 6 | Autonomic replan triggered and completed on a simulated tier-3 failure |
| 7 | `mens_corpus_quality` has >= 500 `split=training` rows from Scientia auto-ingestion |
| 8 | `golden_validated.jsonl` >= 500 pairs; AST eval parse_rate >= 99.5% |
| 9 | 100 consecutive constrained-inference generations parse_rate = 100%; GRPO dry-run `mean_reward > 0.4` |

GBNF over Outlines/llguidance first: GBNF integrates natively with llama.cpp, which already powers the local Populi server. llguidance is added only as an optional bridge for dynamic grammars. This ordering keeps new dependencies to a minimum.

AST eval over regex: Parse rate alone is binary. AstEvalReport adds a graded signal (construct density, type annotation rate, test presence) that enables richer GRPO reward shaping.

GRPO over PPO: GRPO eliminates the value network (critic), reducing training memory by an estimated ~40%. This is critical under the 16 GB VRAM constraint of the RTX 4080 SUPER. Group-relative baselines also suit the high reward variance across code-generation candidates.

Observer separate from Verifier: The Verifier is synchronous and post-hoc. The Observer is asynchronous and continuous, so Act can proceed without blocking while still receiving mid-flight course corrections via TriggerReplan.

MCP pre-emit failures as negative examples: Each failure is high-signal teaching data. Invalid LLM-generated code becomes a structured negative pair (error = correction signal), closing the training loop organically without human annotation.
