Vox Agentic Loop Overhaul + MENS Syntax-Intelligence Blueprint
Research completed: 2026-04-05
Two interlocked workstreams:
- Agentic Loop — Observe → Orient → Plan → Act → Verify (OOPAV)
- MENS Syntax Intelligence — Grammar-aware training, constrained inference, MCP pre-emit validation
Part 0 — Gap & Limitation Audit (20 Gaps)
| # | Gap | Evidence location |
|---|---|---|
| G-01 | No Observer role — nothing watches the environment between steps | orchestrator/agent_lifecycle.rs, planning/mod.rs |
| G-02 | Completeness declared too early — cargo check only, no cargo test or Vox parse-rate gate | validation.rs:161-183 |
| G-03 | Testing decision hard-wired — heavy_without_test_hint is a soft penalty, never blocks | plan_adequacy.rs:321 |
| G-04 | Plan complexity is word-count heuristic — caps at 9, under-detects complex refactors | plan_adequacy.rs:48-58 |
| G-05 | Socrates gate is post-hoc — scoring happens after LLM commits, not before | socrates.rs |
| G-06 | HarnessGate.independent_verification always false | harness.rs:244-250 |
| G-07 | QARouter::answer() discards the answer — _answer: &str unused | qa.rs:55 |
| G-08 | No autonomic replan trigger — only user-driven via vox_replan | planning/replan.rs |
| G-09 | Scaling ignores observer load / evidence quality | orchestrator/scaling.rs |
| G-10 | Scientia is a publication layer, not a live observation source | vox-scientia-core/src/lib.rs |
| G-11 | MENS corpus only 340 pairs, 39 negatives | mens/data/metadata.json |
| G-12 | vox_grammar_prompt() is a 27-line hand-written stub | compiler/src/llm_prompt.rs |
| G-13 | golden_validated.jsonl is 60 bytes (empty) | mens/data/golden_validated.jsonl |
| G-14 | No grammar-constrained decoding at inference | inference_and_serving.md |
| G-15 | vox-eval uses regex, not the real parser | vox_eval_crate.md |
| G-16 | No GRPO/RLVR training loop — SFT only | training_orchestration.md |
| G-17 | MCP code emit has no pre-validation before file write | vox-mcp/ |
| G-18 | vox_schola_submit failures not converted to negative examples | MCP tool vox_schola_submit |
| G-19 | plan_has_verification_hint ignores file manifests | plan_adequacy.rs:259-271 |
| G-20 | fatigue_active penalty never propagated to planner thresholds | socrates.rs:271-276 |
Part 1 — OOPAV Loop Architecture
+----------------------------------------------------------+
|                OOPAV Agent Execution Loop                |
|                                                          |
|  +----------+  evidence   +-----------+  risk band       |
|  | OBSERVE  |-----------> |  ORIENT   |--------->        |
|  |(Scientia)|             | (Socrates)|                  |
|  +-----^----+             +-----+-----+                  |
|        | watch                  | plan-or-act            |
|  +-----+----+             +-----v-----+                  |
|  |  VERIFY  |<-- result --|   PLAN    |                  |
|  |(Harness) |             | (Planner) |                  |
|  +-----+----+             +-----+-----+                  |
|        | pass/fail              | dispatch               |
|  +-----v----+             +-----v-----+                  |
|  | complete |             |    ACT    |                  |
|  |    or    |             |(Builder + |                  |
|  | re-plan  |             |   MENS)   |                  |
|  +----------+             +-----------+                  |
+----------------------------------------------------------+
Testing Decision Policy
Required -> security/auth/schema keywords in description
Required -> .vox file in manifest
Required -> complexity >= 7 AND file_count > 2
Required -> orient.risk_band == Red
Recommended -> new fn/type, >20 LOC estimate
Skip -> docs-only or config-only manifest
Deferred -> evidence_gap > 0.4
Optional -> everything else
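The policy above can be read as an ordered rule table. The following is a minimal sketch under stated assumptions, not the planned `TestDecisionPolicy::evaluate` implementation: rules are checked top to bottom as listed, the "Recommended" line is treated as "new fn/type OR more than 20 estimated LOC", and `TaskFacts`, the `Green`/`Amber` bands, and the docs/config extension list are hypothetical stand-ins. Note that Wave 4 lists the complexity and file-count rules separately; this sketch follows the combined form shown here.

```rust
/// Decision variants, named as in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestDecision { Required, Recommended, Optional, Deferred, Skip }

/// Only `Red` and `Black` are named in the plan; `Green` and `Amber` are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand { Green, Amber, Red, Black }

/// Hypothetical flattened view of the inputs the policy needs.
#[derive(Debug, Clone, Copy)]
pub struct TaskFacts<'a> {
    pub description: &'a str,
    pub manifest: &'a [&'a str],
    pub complexity: u8,
    pub adds_fn_or_type: bool,
    pub estimated_loc: u32,
    pub risk_band: RiskBand,
    pub evidence_gap: f32,
}

/// Applies the rules in the order they are listed (an assumption about precedence).
pub fn evaluate(t: &TaskFacts) -> TestDecision {
    let desc = t.description.to_lowercase();
    // Naive substring match; a real policy would need word boundaries.
    let has_keyword = ["security", "auth", "schema"].iter().any(|k| desc.contains(k));
    let has_vox = t.manifest.iter().any(|p| p.ends_with(".vox"));
    // Assumed extension list for "docs-only or config-only".
    let docs_or_config_only = !t.manifest.is_empty()
        && t.manifest.iter().all(|p| {
            [".md", ".toml", ".json", ".yaml", ".yml"].iter().any(|ext| p.ends_with(ext))
        });
    let heavy = t.complexity >= 7 && t.manifest.len() > 2;

    if has_keyword || has_vox || heavy || t.risk_band == RiskBand::Red {
        TestDecision::Required
    } else if t.adds_fn_or_type || t.estimated_loc > 20 {
        TestDecision::Recommended
    } else if docs_or_config_only {
        TestDecision::Skip
    } else if t.evidence_gap > 0.4 {
        TestDecision::Deferred
    } else {
        TestDecision::Optional
    }
}
```

Under this ordering a docs-only manifest whose description mentions auth still resolves to `Required`; whether that precedence is intended is left open by the table.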
9-Tier Victory Conditions
| Tier | Check | When |
|---|---|---|
| 1 | TOESTUB — zero stubs | Always |
| 2 | LSP zero errors on .vox write files | Always |
| 3 | cargo check --workspace | Always |
| 4 | cargo test --doc --workspace | WithDocTests or Full |
| 5 | cargo test <filter> | TestDecision::Required |
| 6 | vox corpus eval parse_rate >= 99.5% | Any .vox in manifest |
| 7 | Harness contract satisfaction | Always |
| 8 | Socrates confidence >= answer_threshold | Always |
| 9 | Plan adequacy retrospective >= 0.75 | Full |
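One way to read the table is as a tier filter followed by an ordered run. The sketch below assumes that tiers run in ascending order and that evaluation stops at the first failing tier (suggested by the `first_failure` field planned for `VictoryVerdict`, but not stated outright). Tiers are identified by plain numbers here, the `report` field is omitted, and the `run_tier` callback is a hypothetical stand-in for the real checks.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VictoryCondition { CompilationOnly, WithDocTests, WithUnitTests, WithCorpusValidation, Full }

/// Mirrors the "When" column of the table for tiers 1..=9.
pub fn tier_applies(tier: u8, cond: VictoryCondition, test_required: bool, has_vox: bool) -> bool {
    match tier {
        1 | 2 | 3 | 7 | 8 => true, // "Always"
        4 => matches!(cond, VictoryCondition::WithDocTests | VictoryCondition::Full),
        5 => test_required,        // TestDecision::Required
        6 => has_vox,              // any .vox file in the manifest
        9 => cond == VictoryCondition::Full,
        _ => false,
    }
}

/// Simplified verdict: tiers are plain numbers and the report is omitted.
#[derive(Debug, PartialEq, Eq)]
pub struct VictoryVerdict {
    pub passed: bool,
    pub tiers_run: Vec<u8>,
    pub first_failure: Option<u8>,
}

/// Runs applicable tiers in ascending order; stops at the first failure (assumption).
pub fn evaluate<F: FnMut(u8) -> bool>(
    cond: VictoryCondition,
    test_required: bool,
    has_vox: bool,
    mut run_tier: F,
) -> VictoryVerdict {
    let mut tiers_run = Vec::new();
    for tier in 1..=9u8 {
        if !tier_applies(tier, cond, test_required, has_vox) {
            continue;
        }
        tiers_run.push(tier);
        if !run_tier(tier) {
            return VictoryVerdict { passed: false, tiers_run, first_failure: Some(tier) };
        }
    }
    VictoryVerdict { passed: true, tiers_run, first_failure: None }
}
```

Ordering cheap checks first keeps a failed `cargo check` from paying for a corpus evaluation that cannot change the verdict.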
Part 2 — MENS Syntax Intelligence
Grammar Export Pipeline
vox-compiler/src/parser/
  | VoxGrammarExporter
  |-> EBNF text   -> docs/grammar/vox.ebnf
  |-> GBNF file   -> llama.cpp --grammar-file
  |-> JSON Schema -> vox populi serve (constrained JSON mode)
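To make the GBNF branch concrete, here is a rough sketch of rendering a single production as a GBNF-style `name ::= alt | alt` line. The `Sym` type and the free function `emit_rule` are hypothetical and only loosely mirror the planned `EbnfEmitter::emit_rule(name, alternates, terminals)`; the rule used in the test is illustrative and is not taken from the actual Vox grammar.

```rust
/// Hypothetical grammar symbol: a quoted terminal or a reference to another rule.
pub enum Sym {
    Lit(String),
    Ref(String),
}

/// Renders one production as `name ::= seq | seq | ...`.
/// Terminals are double-quoted, with backslashes and quotes escaped.
pub fn emit_rule(name: &str, alternates: &[Vec<Sym>]) -> String {
    let alts: Vec<String> = alternates
        .iter()
        .map(|seq| {
            seq.iter()
                .map(|s| match s {
                    Sym::Lit(t) => {
                        format!("\"{}\"", t.replace('\\', "\\\\").replace('"', "\\\""))
                    }
                    Sym::Ref(r) => r.clone(),
                })
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect();
    format!("{} ::= {}", name, alts.join(" | "))
}
```

A full exporter would also need repetition, grouping, and character classes, which this sketch leaves out.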
Corpus Verification Pipeline
synthetic.jsonl (3.2 MB, unverified)
  | vox corpus validate-batch
  |-> synthetic_valid.jsonl   -> split=training
  |-> synthetic_invalid.jsonl -> split=negative + correction signal
golden_extracted.jsonl (16 KB)
  | vox corpus validate-batch
  |-> golden_validated.jsonl  <- currently 60 bytes / EMPTY -> must reach >=500 pairs
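The split step can be sketched as a partition over records driven by a parser callback. This is a simplified illustration, assuming each input line is a raw code snippet (the real pipeline would first extract the code field from each JSONL record) and using a brace-balance check as a stand-in for the Vox parser; `BatchSplit` and `balanced_braces` are hypothetical names.

```rust
/// Result of one validation pass: accepted snippets and (snippet, error) pairs.
pub struct BatchSplit {
    pub valid: Vec<String>,
    pub invalid: Vec<(String, String)>,
}

impl BatchSplit {
    /// Fraction of non-empty inputs that parsed; 0.0 for an empty batch.
    pub fn parse_rate(&self) -> f64 {
        let total = self.valid.len() + self.invalid.len();
        if total == 0 { 0.0 } else { self.valid.len() as f64 / total as f64 }
    }
}

/// Partitions non-empty lines by whether `parse` accepts them.
/// The error text is kept so invalid rows can carry a correction signal.
pub fn validate_batch<'a, I, P>(lines: I, parse: P) -> BatchSplit
where
    I: IntoIterator<Item = &'a str>,
    P: Fn(&str) -> Result<(), String>,
{
    let mut out = BatchSplit { valid: Vec::new(), invalid: Vec::new() };
    for line in lines.into_iter().map(str::trim).filter(|l| !l.is_empty()) {
        match parse(line) {
            Ok(()) => out.valid.push(line.to_string()),
            Err(e) => out.invalid.push((line.to_string(), e)),
        }
    }
    out
}

/// Stand-in for the real parser: only checks that braces balance.
pub fn balanced_braces(code: &str) -> Result<(), String> {
    let mut depth = 0i32;
    for c in code.chars() {
        match c {
            '{' => depth += 1,
            '}' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            return Err("unexpected `}`".to_string());
        }
    }
    if depth == 0 { Ok(()) } else { Err("missing closing `}`".to_string()) }
}
```

Keeping the parser as a parameter lets the same routine serve both `synthetic.jsonl` and `golden_extracted.jsonl`.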
GRPO/RLVR Training Loop
for each prompt in training_set:
    candidates = generate_k(prompt, k=8, temperature=0.8)
    for each candidate:
        r_syntax   = vox_parser(candidate)      -> 0/1
        r_test     = run @test blocks           -> pass_rate
        r_coverage = ast_eval(candidate).score
        reward     = 0.6*r_syntax + 0.3*r_test + 0.1*r_coverage
    advantage_i = reward_i - mean(rewards)      # GRPO group mean baseline
    grpo_update(policy, advantages)
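The reward and baseline arithmetic in the loop above is small enough to sketch directly. The version below follows the pseudocode as written (mean subtraction only); published GRPO formulations typically also divide by the group standard deviation, which is not shown because the plan does not call for it. Field names on `RewardWeights` are shortened for the sketch.

```rust
/// Reward weights with the plan's defaults (0.6, 0.3, 0.1).
pub struct RewardWeights {
    pub parse: f32,
    pub test: f32,
    pub coverage: f32,
}

impl Default for RewardWeights {
    fn default() -> Self {
        RewardWeights { parse: 0.6, test: 0.3, coverage: 0.1 }
    }
}

/// Weighted sum of the three per-candidate signals.
pub fn composite_reward(w: &RewardWeights, r_syntax: f32, r_test: f32, r_coverage: f32) -> f32 {
    w.parse * r_syntax + w.test * r_test + w.coverage * r_coverage
}

/// Group-mean baseline: advantage_i = reward_i - mean(rewards).
/// Returns an empty vector for an empty group.
pub fn compute_advantages(rewards: &[f32]) -> Vec<f32> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let mean = rewards.iter().sum::<f32>() / rewards.len() as f32;
    rewards.iter().map(|r| r - mean).collect()
}
```

If every candidate in a group earns the same reward, all advantages are zero and that prompt contributes no gradient, which is one reason a temperature of 0.8 with k=8 is useful for keeping candidates diverse.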
MCP Pre-Emit Validation
vox_generate_code  -> mcp_pre_emit_validate("vox")
vox_speech_to_code -> mcp_pre_emit_validate("vox")
PlanBridge step    -> mcp_pre_emit_validate("vox")
  |
  parse OK?  -> write file
  parse ERR? -> VoxValidationError -> LLM retries
             -> invalid snippet -> auto_ingest_negative(corpus)
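The gate above amounts to "parse first, then either write or record a negative". A minimal sketch, assuming the parser, the file writer, and the corpus ingester are injected as callbacks (all hypothetical stand-ins) and using a reduced `VoxValidationError` that carries only a message rather than the planned `{ code, span, message, suggested_correction }`:

```rust
/// Reduced error type for the sketch; the planned struct has more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VoxValidationError {
    pub message: String,
}

/// Validates `code` before emitting it.
/// On success the snippet is handed to `write`; on failure it is handed to
/// `ingest_negative` together with the parser error, and nothing is written.
pub fn pre_emit<P, W, N>(
    code: &str,
    parse: P,
    mut write: W,
    mut ingest_negative: N,
) -> Result<(), VoxValidationError>
where
    P: Fn(&str) -> Result<(), String>,
    W: FnMut(&str),
    N: FnMut(&str, &str),
{
    match parse(code) {
        Ok(()) => {
            write(code);
            Ok(())
        }
        Err(message) => {
            ingest_negative(code, &message);
            Err(VoxValidationError { message })
        }
    }
}

/// Stand-in parser for the sketch: accepts code whose brace counts match.
pub fn stand_in_parse(code: &str) -> Result<(), String> {
    let open = code.matches('{').count();
    let close = code.matches('}').count();
    if open == close { Ok(()) } else { Err("missing closing `}`".to_string()) }
}
```

The caller is expected to surface the returned error to the LLM so it can retry, as the diagram indicates.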
Part 3 — Implementation Waves (254 Tasks)
Wave 0 — Foundations & Schema (Days 1-3)
- Define `ObservationReport` struct in `vox-orchestrator/src/observer.rs`
- Define `ObserverAction` enum: `Continue`, `RequestMoreEvidence`, `TriggerReplan`, `EscalateToHuman`, `EmitNegativeExample`
- Add `observer_enabled`, `observer_poll_interval_ms` to `OrchestratorConfig`
- Define `TestDecision` enum: `Required`, `Recommended`, `Optional`, `Deferred`, `Skip`
- Define `TestDecisionPolicy` struct with threshold, keyword, and extension fields
- Add `test_decision_policy: TestDecisionPolicy` to `OrchestratorConfig`
- Define `VictoryCondition` enum: `CompilationOnly`, `WithDocTests`, `WithUnitTests`, `WithCorpusValidation`, `Full`
- Add `victory_condition: VictoryCondition` to `AgentTask`
- Create `crates/vox-grammar-export/` with `Cargo.toml` and `src/lib.rs`
- Define `GrammarFormat`, `GrammarExportConfig`, `GrammarExportResult`
- Add Arca migration V38: `observer_events` table
- Add Arca migration V38: `test_decisions` table
- Add Arca migration V38: `victory_verdicts` table
- Add Arca migration V38: `mens_corpus_quality` table
- Add Arca migration V38: `grpo_training_run` table
- Write Arca CRUD: `insert_observer_event`, `list_observer_events_for_task`, `insert_test_decision`, `insert_victory_verdict`, `upsert_corpus_quality`, `insert_grpo_step`
- Add all five tables to `Codex` facade
- Write unit tests for all CRUD methods (min 2 tests each)
- Run `vox ci clavis-parity` and `vox stub-check --path crates/vox-grammar-export`
- Confirm zero stubs in Wave 0 deliverables
Wave 1 — Grammar Export from Compiler (Days 4-7)
- Audit `crates/vox-compiler/src/parser/`: catalog all production rules; write `docs/src/architecture/vox-grammar-production-rules.md`
- Create `vox-grammar-export/src/ebnf.rs`: EBNF emitter
- Implement `EbnfEmitter::emit_rule(name, alternates, terminals)`
- Implement `EbnfEmitter::emit_all()`: covers all top-level Vox rules
- Create `vox-grammar-export/src/gbnf.rs`: GBNF emitter for `llama.cpp`
- Implement `GbnfEmitter::from_ebnf(ebnf) -> GbnfDocument`
- Handle all Vox keywords in GBNF output
- Implement `GbnfEmitter::emit_string() -> String`
- Create `vox-grammar-export/src/json_schema.rs`: AST JSON Schema emitter
- Define `VoxAstNode` JSON schema recursively
- Expose `vox grammar export --format ebnf|gbnf|json-schema --output <file>` CLI
- Expose `vox_grammar_export(format)` MCP tool
- Write `vox-grammar-export/src/versioning.rs`: semver embedding + drift check
- Replace `vox_grammar_prompt()` stub with derived cheatsheet from real grammar
- Write tests: emitted EBNF structural validity
- Write tests: 10 known-valid programs accepted by the GBNF
- Write tests: 5 known-invalid programs rejected by the GBNF
- Add `vox ci grammar-export-check` CI step
- Add `grammar_export_path` to `MensTrainingConfig`
- Run `vox stub-check --path crates/vox-grammar-export`; full test suite
Wave 2 — Observer Sub-Agent (Days 8-12)
- Create `vox-orchestrator/src/observer.rs`: `Observer` struct
- Implement `Observer::observe_file(path) -> ObservationReport`
- Implement `Observer::observe_rust_file(path) -> ObservationReport`
- Implement `Observer::start_watching(file_paths) -> JoinHandle`
- Implement `Observer::drain_reports() -> Vec<ObservationReport>`
- Add `observer: Option<Arc<Observer>>` to `Orchestrator`
- Wire Observer startup into `Orchestrator::spawn_agent`
- Wire Observer shutdown into `Orchestrator::retire_agent`
- Emit `VisualizerEventKind::ObservationRecorded` from `viz_sink`
- Implement `Observer::compute_action(report, policy) -> ObserverAction`
- Add `observation_history: VecDeque<ObservationReport>` (cap 20) to `AgentTask`
- Feed `ObservationReport` into Arca `observer_events`
- Implement `Observer::summarize(task_id) -> ObservationSummary`
- Add `observation_summary: Option<ObservationSummary>` to `CompletionAttestation`
- Write unit tests: `compute_action` correctness
- Write integration test: Observer on known-bad `.vox` -> errors within 2 polls
- Write integration test: Observer on `.rs` with `todo!()` -> `EmitNegativeExample`
- Write tests: `summarize` computes parse_rate trend from 3 sequential reports
- Expose `vox_observer_status(task_id)` MCP tool
- Run `vox stub-check`, `cargo test -p vox-orchestrator`
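As a rough illustration of `compute_action` and the capped history, the sketch below maps a report to an `ObserverAction` and keeps a bounded `VecDeque`. The report and policy fields (`parse_errors`, `stub_markers`, `evidence_gap`, `max_parse_errors`, `evidence_gap_threshold`) and the rule order are assumptions; only the action variants, the `todo!()` -> `EmitNegativeExample` behaviour, and the cap of 20 come from the task list.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObserverAction {
    Continue,
    RequestMoreEvidence,
    TriggerReplan,
    EscalateToHuman,
    EmitNegativeExample,
}

/// Hypothetical report fields for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ObservationReport {
    pub parse_errors: u32,
    pub stub_markers: u32, // e.g. `todo!()` occurrences
    pub evidence_gap: f32,
}

/// Hypothetical policy thresholds.
pub struct ObserverPolicy {
    pub max_parse_errors: u32,
    pub evidence_gap_threshold: f32,
}

/// Assumed precedence: stubs, then parse errors, then evidence gaps.
/// `EscalateToHuman` is not produced by this reduced rule set.
pub fn compute_action(r: &ObservationReport, p: &ObserverPolicy) -> ObserverAction {
    if r.stub_markers > 0 {
        ObserverAction::EmitNegativeExample
    } else if r.parse_errors > p.max_parse_errors {
        ObserverAction::TriggerReplan
    } else if r.evidence_gap > p.evidence_gap_threshold {
        ObserverAction::RequestMoreEvidence
    } else {
        ObserverAction::Continue
    }
}

/// Appends a report, evicting the oldest entry once `cap` is reached.
pub fn push_bounded(history: &mut VecDeque<ObservationReport>, report: ObservationReport, cap: usize) {
    if cap == 0 {
        return;
    }
    while history.len() >= cap {
        history.pop_front();
    }
    history.push_back(report);
}
```

A bounded deque keeps per-task memory flat while still leaving enough reports to compute the three-report parse_rate trend that `summarize` is tested against.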
Wave 3 — Orient Phase & Enhanced Socrates (Days 13-17)
- Define `OrientReport { evidence_gap, missing_namespaces, recommended_retrieval, risk_band, planning_complexity_multiplier }`
- Implement `orient_phase(ctx, policy) -> OrientReport`
- Add `evidence_gap_threshold` to `ConfidencePolicy`
- Implement `OrientPhase::request_missing_evidence(gap) -> Vec<SearchResult>`
- Add `orient_report: Option<OrientReport>` to `SocratesTaskContext`
- Integrate `orient_phase()` into `runtime.rs` before each LLM inference request
- Wire `risk_band`: `Red` -> block act; `Black` -> halt + escalate
- Wire `planning_complexity_multiplier` into `PlannerConfig`
- Implement `OrientPhase::propagate_fatigue(fatigue_active, config)`
- Implement `OrientPhase::auto_dispatch_socratic_question(gap) -> CorrelationId`
- Fix `QARouter::answer()`: store answer; add `get_answer(corr_id) -> Option<String>`
- Wire answered questions back into `SocratesTaskContext`
- Implement `OrientPhase::classify_task_category(description) -> TaskCategory`
- Write tests: `orient_phase` with zero evidence -> `RequestMoreEvidence`
- Write tests: `propagate_fatigue(true)` raises thresholds by >= 2
- Write tests: `classify_task_category` returns `Security` for auth keywords
- Write tests: `auto_dispatch_socratic_question` creates QARouter entry
- Write tests: `get_answer()` returns stored string
- Emit `VisualizerEventKind::OrientCompleted { risk_band, evidence_gap }`
- Run `vox stub-check`, `cargo test -p vox-orchestrator`
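The risk-band wiring can be pictured as a small gate function. This sketch assumes that a `Red` task stays blocked only while its evidence gap exceeds the threshold (one reading of "blocks Red band task from acting without evidence hydration" in the milestone gates); `OrientGate` and the `Green`/`Amber` bands are hypothetical.

```rust
/// `Red` and `Black` are named in the plan; `Green` and `Amber` are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand { Green, Amber, Red, Black }

/// Hypothetical outcome of the Orient phase for the Act step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrientGate { Proceed, BlockAct, HaltAndEscalate }

/// `Black` always halts; `Red` blocks Act until the evidence gap is at or
/// below the threshold (assumed meaning of "evidence hydration").
pub fn gate(band: RiskBand, evidence_gap: f32, evidence_gap_threshold: f32) -> OrientGate {
    match band {
        RiskBand::Black => OrientGate::HaltAndEscalate,
        RiskBand::Red if evidence_gap > evidence_gap_threshold => OrientGate::BlockAct,
        _ => OrientGate::Proceed,
    }
}
```

Running this check before each inference request, as the list specifies for `orient_phase()`, means a task can move from blocked to proceeding as soon as retrieval closes the gap.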
Wave 4 — Testing Decision Engine (Days 18-22)
- Implement `TestDecisionPolicy::evaluate(task, orient) -> TestDecision`
- Rule: security keywords -> `Required`
- Rule: `.vox` in manifest -> `Required`
- Rule: complexity >= threshold -> `Required`
- Rule: file_count > threshold -> `Recommended`
- Rule: risk_band Red -> `Required`
- Rule: docs/config only -> `Skip`
- Rule: evidence_gap > 0.4 -> `Deferred`
- Rule: default -> `Optional`
- Persist `TestDecision` to `test_decisions` table after every call
- Fix `plan_has_verification_hint` to check file manifests
- Promote `heavy_without_test_hint` to hard blocker `test_required_missing`
- Add `test_required_count`, `test_present_count` to `PlanAdequacySummary`
- Score = 0.0 when `test_required_count > test_present_count` for coding goals
- Add `TestDecision` to `TaskDescriptor`
- `PlanBridge`: block dispatch if `Required` and no test file in manifest
- Add `test_decision_policy` to `OrchestratorConfig` with sane defaults
- Write tests: auth migration -> `Required`
- Write tests: markdown-only manifest -> `Skip`
- Write tests: complexity-8 `.vox` with no test step -> `is_too_thin=true`, `test_required_missing`
- Write tests: test file in manifest -> `plan_has_verification_hint=true`
- Write tests: `PlanBridge` blocks `Required` task with no test file
- Expose `vox_test_decision(task_id)` MCP tool
- Update `vox plan new` CLI to render test decisions per step
- Run `vox stub-check`, full test suite
Wave 5 — Multi-Tier Victory Conditions (Days 23-28)
- Create `vox-orchestrator/src/victory.rs`: `VictoryEvaluator`
- Implement `tier1_toestub(task) -> TierResult`
- Implement `tier2_lsp(task) -> TierResult`
- Implement `tier3_cargo_check(task) -> TierResult`
- Implement `tier4_cargo_doc_test(task) -> TierResult` (120s timeout)
- Implement `tier5_cargo_unit_test(task, filter) -> TierResult`
- Implement `tier6_vox_corpus_eval(task) -> TierResult` (parse_rate >= 99.5%)
- Implement `tier7_harness_contracts(task, harness) -> TierResult`
- Implement `tier8_socrates_confidence(task, ctx, policy) -> TierResult`
- Implement `tier9_plan_adequacy_retrospective(task) -> TierResult`
- Implement `VictoryEvaluator::evaluate(task, condition) -> VictoryVerdict`
- Define `VictoryVerdict { passed, tiers_run, first_failure, report }`
- Replace `post_task_validate` with `VictoryEvaluator::evaluate`
- Persist every `VictoryVerdict` to Arca `victory_verdicts`
- Wire `passed=false` -> `TriggerReplan` via Observer
- Add `max_victory_attempts: u32` to `AgentTask` (default 3)
- Emit `VisualizerEventKind::VictoryEvaluated`
- Update `AgentHarnessSpec::minimal_contract_first`: `independent_verification: true` for code tasks
- Write tests: `tier3` fails on bad Rust
- Write tests: `tier6` fails on invalid Vox
- Write tests: `Full` passes for clean files + high confidence
- Write tests: stub code -> `first_failure = TierResult::Toestub`
- Write tests: `max_victory_attempts` guard
- Expose `vox_victory_status(task_id)` MCP tool
- Run `vox stub-check`, full test suite
Wave 6 — Dynamic Replan Trigger (Days 29-33)
- Add `replan_trigger: Option<ReplanTrigger>` to `AgentTask`
- Define `ReplanTrigger { reason, failed_tier, observer_action, evidence_gaps }`
- Implement `runtime.rs::handle_replan_trigger(task, trigger)`
- Wire replan result back into orchestrator via `PlanBridge`
- Add `replan_count: u32` to `AgentTask`; fail permanently after max
- Implement `ReplanScheduler`: max 1 replan per 30s per session
- Implement `ReplanScheduler::should_replan(task) -> bool`
- Add `replan_history: Vec<ReplanRecord>` to `PlanSession`
- Define `ReplanRecord { version, trigger_reason, previous_score, new_score, created_at }`
- Emit `VisualizerEventKind::ReplanTriggered`
- Implement `ReplanPolicy` in `planning/policy.rs`
- Add `replan_policy: ReplanPolicy` to `OrchestratorConfig`
- Expose `vox_replan_status(session_id)` MCP tool
- Write tests: failed tier3 -> `ReplanTrigger` created -> replan called
- Write tests: `ReplanScheduler` returns false within cooldown
- Write tests: permanent failure after max replans
- Write tests: `replan_history` persisted and retrievable
- Write tests: MCP returns correct count and reason
- Update `vox plan replan` CLI
- Run full test suite, `vox stub-check`
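The cooldown and replan cap can be sketched with a per-session table of last-replan times. This is a simplified illustration: it tracks both the 30s cooldown and the maximum count per session (the plan keeps `replan_count` on `AgentTask`), takes the current time as an argument so it can be tested deterministically, and uses a `should_replan(session_id, now)` signature that differs from the planned `should_replan(task)`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Rate-limits replans: at most one per `cooldown` and at most `max_replans`
/// in total, both tracked per session id in this sketch.
pub struct ReplanScheduler {
    cooldown: Duration,
    max_replans: u32,
    last: HashMap<String, Instant>,
    counts: HashMap<String, u32>,
}

impl ReplanScheduler {
    pub fn new(cooldown: Duration, max_replans: u32) -> Self {
        ReplanScheduler {
            cooldown,
            max_replans,
            last: HashMap::new(),
            counts: HashMap::new(),
        }
    }

    /// Returns true and records the replan if one is allowed at `now`.
    pub fn should_replan(&mut self, session_id: &str, now: Instant) -> bool {
        let count = self.counts.get(session_id).copied().unwrap_or(0);
        if count >= self.max_replans {
            return false; // permanent failure after the cap
        }
        if let Some(prev) = self.last.get(session_id) {
            if now.duration_since(*prev) < self.cooldown {
                return false; // still inside the cooldown window
            }
        }
        self.last.insert(session_id.to_string(), now);
        self.counts.insert(session_id.to_string(), count + 1);
        true
    }
}
```

Passing `now` explicitly avoids sleeping in the "returns false within cooldown" test listed above.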
Wave 7 — Scientia as Live Observer Feed (Days 34-38)
- Audit `vox-scientia-*` crates; write `docs/src/architecture/scientia-surface-audit.md`
- Define `ScientiaObservation { session_id, source_path, worthiness_score, construct_coverage, citation_count, recommended_for_corpus, reason }`
- Implement `ScientiaObserver::observe_session(session_id) -> ScientiaObservation`
- Implement `ScientiaObserver::recommend_corpus_ingestion(obs) -> bool`
- Wire into `Observer::observe_file` for `.vox` files
- Set `EmitNegativeExample` when `worthiness_score < 0.3`
- Implement `ScientiaObserver::auto_ingest_to_mens(obs, codex)` -> `split=training` row
- Implement `ScientiaObserver::auto_ingest_negative(path, error, codex)` -> `split=negative` row
- Wire into `handle_replan_trigger`: replans >= max/2 emit negatives
- Add `scientia_observation: Option<ScientiaObservation>` to `ObservationReport`
- Expose `vox_scientia_observe(session_id)` MCP tool
- Add `vox scientia observe --session <id>` CLI subcommand
- Write tests: `recommend_corpus_ingestion` true for valid snippet with 3 constructs
- Write tests: `auto_ingest_to_mens` inserts training row
- Write tests: `auto_ingest_negative` inserts negative row
- Write tests: full pipeline: Observer -> Scientia -> corpus row
- Emit `VisualizerEventKind::ScientiaObserved`
- Expose in VS Code extension telemetry push
- Update `governance.md`
- Run full test suite, `vox stub-check`
Wave 8 — MENS Corpus Surgery & AST-Eval Upgrade (Days 39-46)
- Write `vox-corpus/src/validate_batch.rs`: batch parse validation
- Run validate-batch on `synthetic.jsonl` -> `synthetic_valid.jsonl` + `synthetic_invalid.jsonl`
- Run validate-batch on `golden_extracted.jsonl` -> populate `golden_validated.jsonl`
- Update `mens/data/metadata.json` with `parse_rate`, `last_validated_at`, `validator_version`
- Implement `vox-eval/src/ast_eval.rs`: `ast_eval(code) -> AstEvalReport` using real parser
- Define `AstEvalReport { parse_success, node_count, max_depth, construct_histogram, type_annotation_rate, has_tests, error_span }`
- Implement `AstEvalReport::coverage_score()`: weighted composite
- Update `vox-eval/src/lib.rs`: re-export `ast_eval`; `#[deprecated]` on `detect_constructs`
- Update `construct_coverage_score(code)` to delegate to AST eval
- Update `vox eval --mode ast` CI integration
- Upgrade `vox corpus eval` to AST engine
- Define `RewardSignal { parse_score, test_score, coverage_score, composite }` in `vox-tensor/src/data.rs`
- Implement `reward_signal_for_pair(pair) -> RewardSignal`
- Add `reward_signal: Option<RewardSignal>` to `TrainingPair`
- Update `JsonlDataLoader` to compute `RewardSignal` during loading
- Add `avg_reward_signal` per split to `metadata.json`
- Add `vox corpus quality-report` CLI command
- Add `mens/schemas/corpus_quality_record.schema.json`
- MILESTONE GATE: `golden_validated.jsonl` >= 500 pairs required before Wave 9
- Write tests: `ast_eval` on valid Vox function -> `parse_success=true`
- Write tests: `ast_eval` on invalid snippet -> `parse_success=false`, non-`None` `error_span`
- Write tests: `reward_signal_for_pair` -> `composite >= 0.8` for well-formed pair with tests
- Write tests: `validate_batch` correctly separates mixed JSONL
- Run `vox stub-check --path crates/vox-eval`, `cargo test -p vox-eval`
Wave 9 — Constrained Inference + GRPO Loop + MCP Pre-Emit (Days 47-60)
- Create `crates/vox-constrained-gen/`: grammar-constrained token sampling
- Implement `ConstrainedSampler::from_gbnf(gbnf_text) -> ConstrainedSampler` (FSA from Wave 1 GBNF)
- Implement `ConstrainedSampler::mask_logits(logits, state) -> FsaState`
- Integrate into `vox populi serve` via `?grammar=vox` or `X-Vox-Grammar: true`
- Add `constrained_generation: bool` to `MensServeConfig`
- Implement fallback: grammar deadlock -> `VoxValidationError`, request retry
- Create `vox-constrained-gen/src/llguidance_bridge.rs` (optional feature-gated)
- Define `VoxValidationError { code, span, message, suggested_correction }` in `vox-compiler/src/error.rs`
- Implement `mcp_pre_emit_validate(code, format) -> Result<(), VoxValidationError>` in `vox-mcp/src/code_validator.rs`
- Wire into `vox_generate_code` MCP tool
- Wire into `vox_speech_to_code` MCP tool
- Wire into `PlanBridge::plan_to_descriptors` for `.vox` steps
- Implement Rust pre-emit: `rustc --parse-only` subprocess on temp file
- Add `vox_validate_code(code, language) -> { valid, errors }` standalone MCP tool
- Implement `MensGrpoTrainer::train_grpo(config, data) -> GrpoTrainingResult` in `vox-tensor/src/grpo.rs`
- Define `GrpoConfig { k_samples, temperature, reward_weights, policy_lr, clip_epsilon, max_steps }`
- Define `RewardWeights { parse_weight, test_weight, coverage_weight }` defaults `(0.6, 0.3, 0.1)`
- Implement `generate_k_candidates(prompt, model, k) -> Vec<String>`
- Implement `score_candidate(candidate) -> RewardSignal`
- Implement `compute_advantages(rewards) -> Vec<f32>` (group mean baseline)
- Implement `policy_gradient_update(model, candidates, advantages)` (PPO-clip style)
- Expose `vox mens train --mode grpo` CLI flag
- Expose `--k 8 --reward parse:0.6,test:0.3,coverage:0.1` arguments
- Add GRPO telemetry: `group_rewards`, `mean_reward`, `policy_loss`, `clip_fraction` per step
- Persist to Arca `grpo_training_run` table
- Define `GrpoTrainingResult { steps_completed, final_mean_reward, parse_rate, checkpoint_path }`
- Fix G-18: `vox_schola_submit` failures -> `auto_ingest_negative`
- Add `vox mens eval --mode grpo-reward` (dry-run)
- Add `mens/config/grpo_default.toml` (k=8, temp=0.8, max_steps=500)
- Write tests: `compute_advantages` correctness
- Write tests: constrained sampler produces only grammar-accepted tokens
- Write tests: `mcp_pre_emit_validate` -> error for missing closing `}`
- Write tests: `mcp_pre_emit_validate` -> `Ok(())` for valid function
- Write tests: `vox_validate_code` -> errors for invalid Rust
- Write tests: GRPO loop completes 10 steps without panic on RTX 4080 SUPER
- Write tests: `train --mode grpo` -> checkpoint with `final_mean_reward > 0.5`
- Integration test: constrained generation -> 100% parse rate on 50 generations
- Integration test: invalid snippet via MCP -> `VoxValidationError`, no file written
- Integration test: GRPO model vs SFT baseline -> >= 5pp parse rate improvement
- Run `vox stub-check --path crates/vox-constrained-gen crates/vox-mcp`, `cargo test --workspace`
- Update `docs/src/architecture/mens-training-ssot.md`
- Update `examples/STYLE.md`
- Add `vox ci grammar-constrained-gen-smoke-test`
- Add `vox ci mens-corpus-health`
- Add `vox ci grpo-reward-baseline`
- Persist all CI results to Arca for trend analysis
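The core of grammar-constrained sampling is the logit mask. The sketch below shows that step in isolation, assuming the grammar automaton has already produced the set of token ids allowed in the current state. It does not model the FSA itself, and its signatures are simplified relative to the planned `mask_logits(logits, state) -> FsaState`; `pick_greedy` is a hypothetical helper used to show the deadlock case.

```rust
/// Sets the logit of every token not in `allowed` to negative infinity,
/// so that softmax assigns it zero probability. Out-of-range ids are ignored.
pub fn mask_logits(logits: &mut [f32], allowed: &[usize]) {
    let mut keep = vec![false; logits.len()];
    for &id in allowed {
        if id < logits.len() {
            keep[id] = true;
        }
    }
    for (logit, kept) in logits.iter_mut().zip(keep) {
        if !kept {
            *logit = f32::NEG_INFINITY;
        }
    }
}

/// Greedy pick over the masked logits. Returns `None` when no finite logit
/// remains, i.e. the grammar allows no token in this state (a deadlock).
pub fn pick_greedy(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &l) in logits.iter().enumerate() {
        if l.is_finite() && best.map_or(true, |(_, b)| l > b) {
            best = Some((i, l));
        }
    }
    best.map(|(i, _)| i)
}
```

A `None` from the picker corresponds to the fallback item above: surface a `VoxValidationError` and request a retry rather than emitting an arbitrary token.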
Part 4 — Observability & Telemetry (241-245)
- Add `ObservationReport` to VS Code extension push-telemetry stream
- Color-code agent viz nodes by `OrientReport.risk_band`
- Add `VictoryVerdict` tier summary panel to workflow visualizer
- Add `TestDecision` badge to each task card
- Add `RewardSignal.composite` sparkline to MENS training progress panel
Part 5 — Documentation (246-254)
- Write `docs/src/architecture/oopav-loop.md`
- Write `docs/src/architecture/observer-design.md`
- Write `docs/src/architecture/victory-conditions.md`
- Write `docs/src/architecture/test-decision-policy.md`
- Write `docs/src/architecture/mens-grammar-intelligence.md`
- Update `docs/src/architecture/mens-training-ssot.md`
- Update `docs/src/contributors/contributor-hub.md`
- Update `AGENTS.md`
- Update `docs/agents/governance.md`
Milestone Gates
| After Wave | Gate |
|---|---|
| 0 | All V38 Arca migrations applied; vox stub-check clean across all new crates |
| 1 | vox grammar export --format gbnf accepted by llama.cpp --grammar-file |
| 2 | Observer: live LSP error detection on modified .vox file integration test passes |
| 3 | Orient phase blocks Red band task from acting without evidence hydration |
| 4 | Complexity-8 .vox task with no test step rejected by PlanBridge |
| 5 | Full VictoryCondition::Full pass on a clean newly-generated Vox crate |
| 6 | Autonomic replan triggered and completed on a simulated tier-3 failure |
| 7 | mens_corpus_quality has >= 500 split=training rows from Scientia auto-ingestion |
| 8 | golden_validated.jsonl >= 500 pairs; AST eval parse_rate >= 99.5% |
| 9 | 100 consecutive constrained-inference generations parse_rate = 100%; GRPO dry-run mean_reward > 0.4 |
Key Design Rationale
GBNF over Outlines/llguidance first: GBNF integrates natively with llama.cpp (already powering the local Populi server). llguidance added as optional bridge for dynamic grammars. Minimizes new dependencies.
AST eval over regex: A regex match or a bare parse check yields only a pass/fail signal. AstEvalReport adds graded signals (construct density, type annotation rate, test presence), enabling richer GRPO reward shaping.
GRPO over PPO: Eliminates the value network (critic), reducing memory ~40%. Critical under the 16 GB VRAM constraint on RTX 4080 SUPER. Group-relative baselines suit code generation's high candidate variance.
Observer separate from Verifier: Verifier is synchronous and post-hoc. Observer is asynchronous and continuous — allows Act to proceed without blocking while still delivering mid-flight course-corrections via TriggerReplan.
MCP pre-emit failures as negative examples: Each failure is high-signal teaching data. Invalid LLM-generated code becomes a structured negative pair (error = correction signal), closing the training loop organically without human annotation.