Plan adequacy — research synthesis and Vox behavior
Why “add more detail” often fails
Planner outputs are constrained by multiple stacked layers, not only model capability:
- Output token caps — APIs expose max_output_tokens, max_completion_tokens, etc.; vendors also tune for cost and latency, which favors shorter completions. See OpenAI’s guidance on controlling response length (Controlling the length of OpenAI model responses).
- Verbosity and reasoning budgets — On GPT‑5-class routes, verbosity steers detail; reasoning.effort consumes part of the completion budget before visible text. A fixed cap can leave little room for a long visible plan (same OpenAI article).
- Lossy context compaction — Long agent sessions summarize or drop old context; Cursor documents that summarization is lossy and can degrade task knowledge (Dynamic context discovery). Training for “self‑summarization” optimizes dense short carry‑forward state (~1k tokens vs multi‑k baselines) (Training Composer for longer horizons).
- Dynamic context harnesses — Agents are steered to pull context on demand rather than materializing one huge plan up front (same dynamic context post). That reduces token usage and sometimes improves quality, but it underserves users who want one detailed static plan.
- Infrastructure — Truncation, JSON parse failures on long structured outputs, timeouts, and rate limits all present as “the plan stopped early” or “it rewrote without adding substance.”
Implication: Safe mitigation is not “prompt harder once”; it is to measure thinness, expand in bounded steps, persist plans outside chat, and use telemetry to verify improvement.
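The budget effect described in the verbosity and reasoning bullet can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the cap, and the per-task token estimate are assumptions chosen for the example, not values taken from any vendor or from Vox.

```rust
/// Tokens left for the visible plan once hidden reasoning has been charged
/// against the same completion cap. All numbers used here are illustrative.
fn visible_budget(max_output_tokens: u32, reasoning_tokens: u32) -> u32 {
    max_output_tokens.saturating_sub(reasoning_tokens)
}

/// Rough check: does a plan of `tasks` tasks at `tokens_per_task` fit in
/// what remains after reasoning?
fn plan_fits(max_output_tokens: u32, reasoning_tokens: u32, tasks: u32, tokens_per_task: u32) -> bool {
    let needed = tasks.saturating_mul(tokens_per_task);
    needed <= visible_budget(max_output_tokens, reasoning_tokens)
}

fn main() {
    // Hypothetical route: 4096-token cap, 3000 tokens consumed by reasoning.
    let visible = visible_budget(4096, 3000);
    println!("visible budget: {} tokens", visible);
    // A 12-task plan at roughly 150 tokens per task needs 1800 tokens.
    println!("12-task plan fits: {}", plan_fits(4096, 3000, 12, 150));
}
```

Under these assumed numbers the plan does not fit, which is the situation where a prompt asking for "more detail" cannot help until the cap or the reasoning budget changes.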
Vox planning surfaces (where adequacy applies)
| Surface | Role | Adequacy integration |
|---|---|---|
| MCP vox_plan | LLM JSON task list + optional refinement | PlanRefinementReport: gap heuristics + plan-level adequacy; expansion-first refinement; optional plan_depth for token/detail targets |
| Orchestrator goal → synthesize_plan_nodes | Rule-based PlanNode DAG | Same report shape via plan_nodes_to_adequacy_tasks; adequacy JSON on plan_session_created lineage; optional tracing when thin |
| quality_gate | Blocks vague/destructive nodes | Uses orchestrator_node_text_findings plus file_manifest checks (tbd path / filename, empty path → tbd_placeholder / manifest_empty_path); adequacy is plan-level and complementary |
| Codex plan_sessions.iterative_loop_metadata_json | MCP iterative telemetry | Merge adequacy + refinement metadata for analytics |
Deterministic signals (tier‑1)
Implemented in vox-orchestrator planning/plan_adequacy.rs:
- Per-task: short text, vague phrases, TBD placeholders, destructive cues, dependency integrity, heavy tasks without test hints (aligned with legacy MCP gap behavior).
- Plan-level: minimum task count vs estimated goal complexity; missing verification for implementation-flavored goals; flat DAG (many tasks, no deps); goal path tokens without task files; mega-task clusters (several very high complexity tasks).
- Structural noise: many tasks but low surface (short descriptions, few file linkages); repeated task openings (copy-paste “detail” without distinct steps).
- Refinement regression (MCP): when a prior task list is supplied after a refine pass, signals include task-count compression, lost file linkage, and shrunk total description mass—guarding against “rewrite” that drops substance.
is_too_thin combines low adequacy score with structural reason codes so refinement triggers even when per-task keyword risk is moderate.
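A minimal sketch of how a few of these tier‑1 signals could be combined, assuming a simplified task shape. The struct layouts, thresholds, penalty weights, and the flat_dag code name are hypothetical. The sketch also assumes that "combines" means a low score together with at least one structural reason code, which is only one possible reading; the real logic lives in plan_adequacy.rs and may differ.

```rust
/// Hypothetical task shape; the real types in plan_adequacy.rs may differ.
struct Task {
    description: String,
    depends_on: Vec<String>,
}

/// Result of this sketch: a 0..1 score, reason codes, and the thin flag.
struct Adequacy {
    score: f64,
    reason_codes: Vec<&'static str>,
    is_too_thin: bool,
}

// Illustrative thresholds and weights, not the values Vox uses.
const MIN_DESCRIPTION_CHARS: usize = 40;
const FLAT_DAG_MIN_TASKS: usize = 4;
const THIN_SCORE: f64 = 0.5;

fn assess(tasks: &[Task], detail_target_min_tasks: usize) -> Adequacy {
    let mut reason_codes: Vec<&'static str> = Vec::new();
    let mut penalty = 0.0_f64;

    // Plan-level: fewer tasks than the expected floor for this goal.
    if tasks.len() < detail_target_min_tasks {
        reason_codes.push("too_few_tasks");
        penalty += 0.4;
    }
    // Plan-level: many tasks, none of which declares a dependency.
    if tasks.len() >= FLAT_DAG_MIN_TASKS && tasks.iter().all(|t| t.depends_on.is_empty()) {
        reason_codes.push("flat_dag"); // hypothetical code name
        penalty += 0.2;
    }
    // Only the plan-level codes pushed so far count as structural here.
    let structural = !reason_codes.is_empty();

    // Per-task: share of tasks with very short descriptions.
    if !tasks.is_empty() {
        let short = tasks
            .iter()
            .filter(|t| t.description.trim().len() < MIN_DESCRIPTION_CHARS)
            .count();
        penalty += 0.4 * (short as f64 / tasks.len() as f64);
    }
    // Per-task: TBD placeholders (a crude substring match for illustration).
    if tasks.iter().any(|t| t.description.to_lowercase().contains("tbd")) {
        reason_codes.push("tbd_placeholder");
        penalty += 0.1;
    }

    let score = (1.0 - penalty).clamp(0.0, 1.0);
    Adequacy { score, reason_codes, is_too_thin: score < THIN_SCORE && structural }
}

/// Small constructor used by the example below.
fn task(description: &str, depends_on: &[&str]) -> Task {
    Task {
        description: description.to_string(),
        depends_on: depends_on.iter().map(|d| d.to_string()).collect(),
    }
}

fn main() {
    let draft = vec![task("Do stuff", &[]), task("TBD later", &[])];
    let report = assess(&draft, 5);
    println!(
        "score={:.2} thin={} codes={:?}",
        report.score, report.is_too_thin, report.reason_codes
    );
}
```

Keeping the structural codes separate from the per-task penalties means a plan with individually acceptable tasks can still be flagged when there are too few of them.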
Safe expansion policy
- Expand, don’t wholesale rewrite — Refinement prompts require preserving existing task IDs and intent unless a gap code demands a fix; new work is additional tasks with new IDs.
- Bound rounds and token budget — Reuses max_refine_rounds, refine_budget_tokens, gap_risk_threshold; Auto mode refines on aggregate gap risk or is_too_thin.
- Optional auto-expansion when loop_mode is off — auto_expand_thin_plan (default on): run a small refinement pass when the draft is thin, so clients that never set loop_mode still benefit.
- Orchestrator shadow — plan_adequacy_shadow (default true): enqueue behavior unchanged; lineage + logs carry adequacy for dashboards before any enforcement.
- Orchestrator enforce (opt-in) — plan_adequacy_enforce / VOX_ORCHESTRATOR_PLAN_ADEQUACY_ENFORCE: native synthesized plans that remain thin after synthesis are rejected with ScopeDenied (after quality_gate); the same flag makes MCP vox_plan fail when the refined JSON plan is still thin.
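The first two policy points (expand rather than rewrite, and bound both rounds and tokens) can be sketched as a small loop. This is a sketch under stated assumptions: the Task and Limits types, the merge rule, and the refine_once callback are hypothetical stand-ins. Only the names max_refine_rounds and refine_budget_tokens come from the text above; how Vox actually enforces them is not shown here.

```rust
#[derive(Debug)]
struct Task {
    id: String,
    description: String,
}

struct Limits {
    max_refine_rounds: u32,
    refine_budget_tokens: u32,
}

/// Expansion-first merge (hypothetical rule): tasks already in the plan are
/// never dropped, an existing ID only accepts a longer description, and
/// proposals with unknown IDs are appended as new tasks.
fn merge_expand_only(current: &mut Vec<Task>, proposed: Vec<Task>) {
    for p in proposed {
        match current.iter().position(|t| t.id == p.id) {
            Some(i) => {
                if p.description.len() > current[i].description.len() {
                    current[i].description = p.description;
                }
            }
            None => current.push(p),
        }
    }
}

/// Refine while the plan is thin, stopping at the round cap or once the
/// token budget has been used up. `refine_once` returns the proposed tasks
/// and the tokens that call consumed.
fn refine_bounded<T, R>(
    mut plan: Vec<Task>,
    limits: &Limits,
    is_too_thin: T,
    mut refine_once: R,
) -> (Vec<Task>, u32)
where
    T: Fn(&[Task]) -> bool,
    R: FnMut(&[Task]) -> (Vec<Task>, u32),
{
    let mut rounds = 0;
    let mut spent_tokens = 0u32;
    while rounds < limits.max_refine_rounds
        && spent_tokens < limits.refine_budget_tokens
        && is_too_thin(plan.as_slice())
    {
        let (proposed, cost) = refine_once(plan.as_slice());
        spent_tokens = spent_tokens.saturating_add(cost);
        merge_expand_only(&mut plan, proposed);
        rounds += 1;
    }
    (plan, rounds)
}

fn main() {
    let start = vec![Task { id: "t1".into(), description: "Set up the crate".into() }];
    let limits = Limits { max_refine_rounds: 2, refine_budget_tokens: 1000 };
    // Stub refiner: proposes one new task per round and reports 300 tokens used.
    let stub = |plan: &[Task]| {
        let next = Task {
            id: format!("t{}", plan.len() + 1),
            description: "Add a follow-up step".into(),
        };
        (vec![next], 300u32)
    };
    let (plan, rounds) = refine_bounded(start, &limits, |p: &[Task]| p.len() < 4, stub);
    println!("rounds={} tasks={}", rounds, plan.len());
}
```

In this run the loop stops at the round cap while the plan is still thin by the stub's definition. That outcome is what the shadow and enforce flags then have to handle.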
Telemetry and rollout
Fields to record (conceptual)
Codex / JSON metadata SHOULD include where possible:
| Field | Purpose |
|---|---|
| adequacy_score | 0..1 structural adequacy |
| is_too_thin | Boolean trigger |
| adequacy_reason_codes | too_few_tasks, missing_plan_verification, etc. |
| detail_target_min_tasks | Expected floor for complexity |
| estimated_goal_complexity | Router/word heuristic |
| aggregate_unresolved_risk | Legacy gap rollup |
| refinement_rounds, loop_stop_reason | Loop outcome |
| plan_depth | minimal / standard / deep |
| initial_plan_max_output_tokens | Diagnose truncation (MCP metadata) |
| adequacy_before / adequacy_after | Tier‑1 snapshots before vs after refinement |
| task_count_before_refine / task_count_after_refine | Detect collapse vs expansion |
| adequacy_improved_heuristic | True if score rose, thin cleared, or aggregate risk dropped |
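As a rough illustration of the last three rows, the before/after snapshots can be compared as follows. The Snapshot struct and function names are assumptions made for this example. The boolean logic follows the table's wording ("score rose, thin cleared, or aggregate risk dropped"), not the actual implementation.

```rust
/// Hypothetical snapshot of the tier-1 fields captured before or after refinement.
#[derive(Clone, Copy, Debug)]
struct Snapshot {
    score: f64,
    is_too_thin: bool,
    aggregate_unresolved_risk: f64,
}

/// True if the score rose, the thin flag cleared, or aggregate risk dropped.
fn adequacy_improved(before: Snapshot, after: Snapshot) -> bool {
    let score_rose = after.score > before.score;
    let thin_cleared = before.is_too_thin && !after.is_too_thin;
    let risk_dropped = after.aggregate_unresolved_risk < before.aggregate_unresolved_risk;
    score_rose || thin_cleared || risk_dropped
}

/// Collapse check over task_count_before_refine / task_count_after_refine:
/// a refine pass that returns fewer tasks than it received is suspicious.
fn task_count_collapsed(before: usize, after: usize) -> bool {
    after < before
}

fn main() {
    let before = Snapshot { score: 0.35, is_too_thin: true, aggregate_unresolved_risk: 0.6 };
    let after = Snapshot { score: 0.70, is_too_thin: false, aggregate_unresolved_risk: 0.4 };
    println!("improved: {}", adequacy_improved(before, after));
    println!("collapsed: {}", task_count_collapsed(8, 5));
}
```

Recording the improvement flag and the collapse check side by side makes it possible to spot refine passes that raise the score while quietly shrinking the task list.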
Rollout stages
- Shadow (default) — plan_adequacy_shadow: true; only metrics + logs.
- Auto-expand MCP — Default on via auto_expand_thin_plan and Auto loop OR is_too_thin.
- Enforce native plans (opt-in) — VOX_ORCHESTRATOR_PLAN_ADEQUACY_ENFORCE blocks goal enqueue when the rule-based synthesized DAG is still thin.
- Enforce MCP plans (same flag) — When the flag is on, vox_plan returns a tool error if the plan is still is_too_thin after refinement (telemetry DB updates are skipped on that path).
- Stricter MCP / post-refine policy (future) — Optional extra gates (e.g. max aggregate gap risk) or questioning-first flows when facts are missing. Governance for when planning MUST ask before generating a plan is specified in planning-meta/12-question-gate-standard.md.
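The shadow and enforce stages can be read as a small decision table over two flags and the thin signal. The sketch below models only that decision. The Flags struct and Outcome enum are hypothetical, and how the flag or environment variable is parsed, and how ScopeDenied is produced, are deliberately left out.

```rust
/// Hypothetical view of the two rollout flags named above.
struct Flags {
    plan_adequacy_shadow: bool,
    plan_adequacy_enforce: bool,
}

/// What happens to a plan after refinement (illustrative outcome names).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Plan proceeds unchanged.
    Accept,
    /// Plan proceeds; adequacy is recorded in logs and lineage only.
    AcceptAndRecord,
    /// Plan is rejected (ScopeDenied for native plans, a tool error for vox_plan).
    Reject,
}

fn gate(flags: &Flags, is_too_thin: bool) -> Outcome {
    if !is_too_thin {
        return Outcome::Accept;
    }
    if flags.plan_adequacy_enforce {
        Outcome::Reject
    } else if flags.plan_adequacy_shadow {
        Outcome::AcceptAndRecord
    } else {
        Outcome::Accept
    }
}

fn main() {
    let shadow_only = Flags { plan_adequacy_shadow: true, plan_adequacy_enforce: false };
    let enforcing = Flags { plan_adequacy_shadow: true, plan_adequacy_enforce: true };
    println!("shadow, thin plan:  {:?}", gate(&shadow_only, true));
    println!("enforce, thin plan: {:?}", gate(&enforcing, true));
}
```

Checking enforce before shadow reflects the assumption that enforcement takes precedence when both flags are set.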
Example SQL (Codex SQLite)
plan_sessions.iterative_loop_metadata_json and orchestration lineage payloads may contain JSON blobs. Example exploration query (adjust DB path):
-- Recent MCP plan sessions with iterative metadata (if populated)
SELECT plan_session_id,
       iterative_loop_round,
       iterative_stop_reason,
       iterative_loop_metadata_json
FROM plan_sessions
WHERE iterative_loop_metadata_json IS NOT NULL
ORDER BY updated_at DESC
LIMIT 20;
Use json_extract(iterative_loop_metadata_json, '$.adequacy_after.score') (or $.adequacy_before.score) where SQLite JSON1 is enabled.
Related docs
- Socrates protocol — SSOT — telemetry surfaces for MCP tools
- Information-theoretic questioning — when to ask vs expand
- Anti-foot-gun planning standard