Context management phase 1 backlog

Purpose

This document is the prioritized first implementation wave for the context-management program. It is intentionally front-loaded toward high-win, low-regret changes that improve correctness before deeper optimization.

Companion documents:

Prioritization rules

Tasks are ordered by this priority stack:

  1. stop context bleed,
  2. stop silent under-grounding,
  3. make behavior observable,
  4. unify local surfaces,
  5. harden distributed handoff,
  6. then optimize quality and cost.

Phase 0: Contract and identity foundation

| Priority | ID | Owner | Task | Depends on | Verify |
| --- | --- | --- | --- | --- | --- |
| P0 | ctx.001 | orchestrator | Add Rust ContextEnvelope model mirroring the schema contract | none | unit_test, contract_validation |
| P0 | ctx.002 | mcp | Add adapter from MCP retrieval evidence to ContextEnvelope | ctx.001 | unit_test |
| P0 | ctx.003 | orchestrator | Add adapter from SessionRetrievalEnvelope to ContextEnvelope | ctx.001 | unit_test |
| P0 | ctx.004 | orchestrator | Add adapter from SocratesTaskContext to ContextEnvelope projection | ctx.001 | unit_test |
| P0 | ctx.005 | populi | Add remote payload wrapper for ContextEnvelope JSON in A2A delivery | ctx.001 | integration_test |
| P0 | ctx.006 | mcp | Introduce explicit session identity helper instead of silent "default" for new callers | none | unit_test |
| P0 | ctx.007 | orchestrator | Require session lineage on submit paths that expect continuity | ctx.006 | integration_test |
| P0 | ctx.008 | orchestrator | Add thread lineage fields to task and handoff context adapters | ctx.001 | integration_test |
| P0 | ctx.009 | cross_cutting | Emit context.capture and context.select tracing events in shadow mode | ctx.001 | telemetry_review |
| P0 | ctx.010 | tests | Add concurrent-session bleed regression fixtures | ctx.006 | integration_test |
| P0 | ctx.011 | docs | Document canonical session and thread invariants in reference docs | ctx.006 | docs_review |
| P0 | ctx.012 | ops | Add feature flags for envelope dual-write and identity enforcement | ctx.001 | manual_trace |
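
To make ctx.006 and ctx.007 concrete, here is a minimal sketch of a session identity helper that refuses to fall back silently to "default". Every name in it (SessionId, ResolvedSession, SessionIdError, Enforcement, resolve_session_id) is hypothetical and chosen for illustration; the real helper and flag wiring may look different.

```rust
/// Hypothetical newtype for a session identifier that went through the helper.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionId(String);

impl SessionId {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Why a caller-supplied session ID is considered unsafe (assumed taxonomy).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SessionIdError {
    /// No ID, or only whitespace, was supplied.
    Missing,
    /// The caller passed the literal legacy value "default".
    ImplicitDefault,
}

/// Assumed shape of the identity-enforcement flag (cf. ctx.012).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Enforcement {
    WarnOnly,
    Enforce,
}

/// Result of resolution: the ID to use plus an optional warning to log.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResolvedSession {
    pub id: SessionId,
    pub warning: Option<SessionIdError>,
}

/// Resolve a raw session ID. In warn-only mode the legacy "default" session
/// is still used, but the problem is returned so the caller can log it.
/// In enforce mode the same problem becomes an error.
pub fn resolve_session_id(
    raw: Option<&str>,
    mode: Enforcement,
) -> Result<ResolvedSession, SessionIdError> {
    let trimmed = raw.map(str::trim).filter(|s| !s.is_empty());
    let problem = match trimmed {
        None => Some(SessionIdError::Missing),
        Some("default") => Some(SessionIdError::ImplicitDefault),
        Some(_) => None,
    };
    match (problem, mode) {
        (Some(err), Enforcement::Enforce) => Err(err),
        (warning, _) => Ok(ResolvedSession {
            id: SessionId(trimmed.unwrap_or("default").to_string()),
            warning,
        }),
    }
}
```

Returning the warning as data, rather than logging inside the helper, keeps the function easy to unit-test and lets each surface decide how to report it.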

Phase 1: Local retrieval and gating hardening

| Priority | ID | Owner | Task | Depends on | Verify |
| --- | --- | --- | --- | --- | --- |
| P1 | ctx.101 | search | Centralize retrieval trigger evaluation into a shared policy module | ctx.001 | unit_test |
| P1 | ctx.102 | mcp | Switch chat preamble retrieval to shared trigger policy | ctx.101 | integration_test |
| P1 | ctx.103 | orchestrator | Switch task-submit retrieval to shared trigger policy | ctx.101 | integration_test |
| P1 | ctx.104 | search | Define common budget knobs for auto preamble, explicit search, and submit-time retrieval | ctx.101 | unit_test |
| P1 | ctx.105 | orchestrator | Distinguish no-retrieval, heuristic, verified, and corrective retrieval tiers in task context | ctx.101 | unit_test |
| P1 | ctx.106 | search | Add retrieval quality evaluator using contradiction, diversity, and citation coverage | ctx.101 | unit_test |
| P1 | ctx.107 | orchestrator | Fail closed on high-risk tasks that remain ungrounded after required retrieval | ctx.105 | integration_test |
| P1 | ctx.108 | mcp | Surface policy version and retrieval decision path in MCP responses | ctx.101 | manual_trace |
| P1 | ctx.109 | tests | Add fixtures for code-navigation, repo-structure, and factual-lookup trigger correctness | ctx.101 | eval_benchmark |
| P1 | ctx.110 | docs | Add search-vs-memory operator guidance | ctx.102 | docs_review |
| P1 | ctx.111 | cross_cutting | Emit context.retrieve spans with conversation, agent, and policy metadata | ctx.106 | telemetry_review |
| P1 | ctx.112 | ops | Add rollout toggles for retrieval-policy shadow and enforce modes | ctx.107 | canary_rollout |

Phase 2: Corrective retrieval and compaction

| Priority | ID | Owner | Task | Depends on | Verify |
| --- | --- | --- | --- | --- | --- |
| P2 | ctx.201 | search | Add corrective retrieval planner for weak or contradictory evidence | ctx.106 | unit_test |
| P2 | ctx.202 | search | Implement query rewrite and corpus-broaden hooks for second-pass retrieval | ctx.201 | unit_test |
| P2 | ctx.203 | orchestrator | Thread corrective-retrieval result into Socrates task context | ctx.201 | integration_test |
| P2 | ctx.204 | mcp | Preserve corrective retrieval metadata in MCP evidence envelopes | ctx.201 | unit_test |
| P2 | ctx.205 | mcp | Add envelope-based compaction output for long chat sessions | ctx.001 | integration_test |
| P2 | ctx.206 | orchestrator | Allow task submit to consume compacted session summaries | ctx.205 | integration_test |
| P2 | ctx.207 | mcp | Add note-taking envelope writer for durable task/session notes | ctx.001 | integration_test |
| P2 | ctx.208 | search | Add stale-context refresh rule using TTL and freshness metadata | ctx.001 | unit_test |
| P2 | ctx.209 | tests | Create contradiction-resolution benchmark set | ctx.201 | eval_benchmark |
| P2 | ctx.210 | cross_cutting | Emit context.compact and context.resolve spans | ctx.205 | telemetry_review |
| P2 | ctx.211 | docs | Document corrective retrieval and compaction lifecycle | ctx.205 | docs_review |
| P2 | ctx.212 | ops | Enable corrective retrieval in shadow mode for selected surfaces | ctx.201 | canary_rollout |

Phase 3: Handoff and distributed context integrity

| Priority | ID | Owner | Task | Depends on | Verify |
| --- | --- | --- | --- | --- | --- |
| P3 | ctx.301 | orchestrator | Add ContextEnvelope wrapper to local handoff payloads | ctx.001 | integration_test |
| P3 | ctx.302 | orchestrator | Preserve session/thread lineage through accept_handoff | ctx.301 | integration_test |
| P3 | ctx.303 | populi | Extend remote task envelope population with context lineage and artifact refs | ctx.005 | integration_test |
| P3 | ctx.304 | search | Implement production handling for A2ARetrievalRequest and A2ARetrievalResponse | ctx.005 | integration_test |
| P3 | ctx.305 | populi | Add remote retrieval worker flow using shared vox-search | ctx.304 | integration_test |
| P3 | ctx.306 | orchestrator | Reconcile remote result lineage with task, lease, and session authority | ctx.303 | integration_test |
| P3 | ctx.307 | populi | Add lease-aware failure states for remote context loss and retry | ctx.303 | integration_test |
| P3 | ctx.308 | cross_cutting | Emit context.handoff spans with sender, receiver, node, and lease identifiers | ctx.301 | telemetry_review |
| P3 | ctx.309 | tests | Add remote-handoff integrity evals for session continuity and authority ownership | ctx.303 | eval_benchmark |
| P3 | ctx.310 | docs | Document remote context contract for MENs and Populi | ctx.303 | docs_review |
| P3 | ctx.311 | ops | Add kill-switches for remote envelope enforcement and remote retrieval delegation | ctx.303 | canary_rollout |
| P3 | ctx.312 | orchestrator | Reject remote execution paths that lack explicit lineage when enforcement is on | ctx.311 | integration_test |

Phase 4: Conflict governance and enforceable release gates

| Priority | ID | Owner | Task | Depends on | Verify |
| --- | --- | --- | --- | --- | --- |
| P4 | ctx.401 | orchestrator | Implement conflict classifier for temporal, semantic, authority, source-trust, and policy conflicts | ctx.001 | unit_test |
| P4 | ctx.402 | orchestrator | Implement precedence and merge strategy engine | ctx.401 | unit_test |
| P4 | ctx.403 | search | Bind overwrite behavior to evidence and trust thresholds | ctx.401 | unit_test |
| P4 | ctx.404 | mcp | Mark stale or low-trust context as reference-only instead of inline | ctx.402 | integration_test |
| P4 | ctx.405 | orchestrator | Persist conflict-resolution events for review and metrics | ctx.401 | integration_test |
| P4 | ctx.406 | tests | Add merge-policy regression suite | ctx.402 | eval_benchmark |
| P4 | ctx.407 | cross_cutting | Create scorecard query surfaces for conflict rate and resolution outcomes | ctx.405 | telemetry_review |
| P4 | ctx.408 | ops | Promote high-risk task retrieval enforcement from shadow to opt-in enforce | ctx.107 | canary_rollout |
| P4 | ctx.409 | ops | Promote remote lineage enforcement from shadow to opt-in enforce | ctx.312 | canary_rollout |
| P4 | ctx.410 | ops | Add context-system release checklist and rollback matrix | ctx.407 | docs_review |
| P4 | ctx.411 | docs | Publish conflict-governance SSOT and deprecation criteria for legacy payloads | ctx.402 | docs_review |
| P4 | ctx.412 | cross_cutting | Freeze v1 KPI/SLO gates for CI and staged rollout dashboards | ctx.407 | telemetry_review |

Detailed operation expansion

The tables above are the phase-level seed. The following sections expand the complex work into operation-level tasks so the program does not claim progress too early on large multi-surface features.

Phase 0 detailed operations: contract and identity

| ID | Owner | Operation | Depends on | Verify |
| --- | --- | --- | --- | --- |
| ctx.013 | orchestrator | Define envelope fixture for chat_turn | ctx.001 | contract_validation |
| ctx.014 | orchestrator | Define envelope fixture for retrieval_evidence | ctx.001 | contract_validation |
| ctx.015 | orchestrator | Define envelope fixture for task_context | ctx.001 | contract_validation |
| ctx.016 | orchestrator | Define envelope fixture for handoff_context | ctx.001 | contract_validation |
| ctx.017 | orchestrator | Define envelope fixture for execution_context | ctx.001 | contract_validation |
| ctx.018 | mcp | Map chat history entries into envelope projections | ctx.013 | unit_test |
| ctx.019 | mcp | Add session-ID normalization helper with explicit warning path | ctx.006 | unit_test |
| ctx.020 | mcp | Audit every session_id default path under MCP chat and task surfaces | ctx.019 | manual_trace |
| ctx.021 | orchestrator | Add thread-id plumbing for task submit metadata | ctx.008 | integration_test |
| ctx.022 | orchestrator | Add session/thread fields to handoff metadata builder | ctx.008 | unit_test |
| ctx.023 | orchestrator | Add structured warn-only rejection path for missing remote lineage | ctx.007 | integration_test |
| ctx.024 | tests | Add fixture pair proving two concurrent sessions do not share retrieval envelope keys | ctx.010 | integration_test |
| ctx.025 | tests | Add fixture proving remote-bound work cannot silently use implicit default session lineage | ctx.023 | integration_test |
| ctx.026 | cross_cutting | Emit envelope-id generation and propagation traces | ctx.009 | telemetry_review |
| ctx.027 | docs | Document “default session” compatibility and deprecation posture | ctx.020 | docs_review |
| ctx.028 | ops | Add config matrix documenting warn-only vs enforce behavior for missing lineage | ctx.012 | docs_review |
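
The five envelope fixtures (ctx.013 to ctx.017) and the concurrent-session key fixture (ctx.024) can be pictured with a small sketch. The kind names come from the table above; the struct fields and the key shape are assumptions made for illustration, because the schema contract itself is not reproduced in this document.

```rust
/// Envelope kinds named by the fixture operations ctx.013 to ctx.017.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EnvelopeKind {
    ChatTurn,
    RetrievalEvidence,
    TaskContext,
    HandoffContext,
    ExecutionContext,
}

impl EnvelopeKind {
    /// Wire name as written in the backlog table.
    pub fn as_str(self) -> &'static str {
        match self {
            EnvelopeKind::ChatTurn => "chat_turn",
            EnvelopeKind::RetrievalEvidence => "retrieval_evidence",
            EnvelopeKind::TaskContext => "task_context",
            EnvelopeKind::HandoffContext => "handoff_context",
            EnvelopeKind::ExecutionContext => "execution_context",
        }
    }
}

/// Hypothetical minimal envelope; the real field set is defined by the
/// schema contract, which this sketch does not attempt to mirror.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContextEnvelope {
    pub kind: EnvelopeKind,
    pub session_id: String,
    pub thread_id: Option<String>,
    pub sequence: u64,
    pub body: String,
}

impl ContextEnvelope {
    /// Storage key scoped by session. Using a tuple instead of a joined
    /// string avoids collisions when a session ID contains a separator.
    pub fn key(&self) -> (&str, EnvelopeKind, u64) {
        (self.session_id.as_str(), self.kind, self.sequence)
    }
}
```

Because the session ID is the first component of the key, two sessions that produce the same kind and sequence number still map to distinct keys, which is the property the ctx.024 fixture pair is meant to prove.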

Phase 1 detailed operations: retrieval policy parity

| ID | Owner | Operation | Depends on | Verify |
| --- | --- | --- | --- | --- |
| ctx.113 | search | Define shared retrieval-policy decision result shape | ctx.101 | unit_test |
| ctx.114 | search | Classify query families into low-risk, normal, and high-risk buckets | ctx.101 | unit_test |
| ctx.115 | search | Define forced-search categories for codebase and environment claims | ctx.114 | docs_review |
| ctx.116 | mcp | Replace local trigger heuristics in chat preamble path with shared policy adapter | ctx.102 | integration_test |
| ctx.117 | mcp | Replace explicit search-tool trigger reporting with shared policy adapter | ctx.102 | integration_test |
| ctx.118 | orchestrator | Add policy-evaluation call before attach_goal_search_context_with_retrieval | ctx.103 | integration_test |
| ctx.119 | orchestrator | Preserve policy-evaluation rationale in task trace metadata | ctx.118 | telemetry_review |
| ctx.120 | search | Add per-surface retrieval budget knobs and defaults | ctx.104 | unit_test |
| ctx.121 | search | Add parity tests ensuring MCP and orchestrator classify the same query identically | ctx.113 | unit_test |
| ctx.122 | tests | Add code-navigation trigger fixture set | ctx.109 | eval_benchmark |
| ctx.123 | tests | Add repo-structure trigger fixture set | ctx.109 | eval_benchmark |
| ctx.124 | tests | Add factual-lookup trigger fixture set | ctx.109 | eval_benchmark |
| ctx.125 | tests | Add “should skip retrieval” low-risk fixture set | ctx.109 | eval_benchmark |
| ctx.126 | orchestrator | Add high-risk deny-complete gate when retrieval was required but absent | ctx.107 | integration_test |
| ctx.127 | cross_cutting | Emit trace field for retrieval-skip reason | ctx.111 | telemetry_review |
| ctx.128 | cross_cutting | Emit trace field for retrieval-policy version and risk tier | ctx.111 | telemetry_review |
| ctx.129 | docs | Publish policy table describing search-required vs memory-allowed behavior | ctx.110 | docs_review |
| ctx.130 | ops | Add shadow scorecard comparing pre-policy and post-policy retrieval decisions | ctx.112 | telemetry_review |
| ctx.131 | ops | Add rollback threshold for search-policy false positives | ctx.112 | docs_review |
| ctx.132 | ops | Add rollback threshold for search-policy false negatives | ctx.112 | docs_review |
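
A rough sketch of the shared decision shape (ctx.113), the risk buckets (ctx.114), and the parity property (ctx.121) follows. The keyword lists, the policy version string, and all type names are placeholders invented for this example; a real classifier would be driven by the fixture sets rather than by substring matching.

```rust
/// Risk buckets from ctx.114.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskTier {
    Low,
    Normal,
    High,
}

/// Calling surfaces that must agree on the decision (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Surface {
    McpChatPreamble,
    McpExplicitSearch,
    OrchestratorSubmit,
}

/// Assumed decision result shape, carrying the fields that ctx.127 and
/// ctx.128 want to surface in traces.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RetrievalDecision {
    pub policy_version: &'static str,
    pub risk_tier: RiskTier,
    pub retrieve: bool,
    pub skip_reason: Option<&'static str>,
}

/// Placeholder version string for the sketch.
pub const POLICY_VERSION: &str = "sketch-0";

/// Toy classifier: substring and exact-match rules stand in for real policy.
fn classify(query: &str) -> RiskTier {
    let q = query.to_lowercase();
    const HIGH: [&str; 4] = ["which file", "where is", "repo", "environment"];
    const LOW: [&str; 3] = ["hello", "thanks", "thank you"];
    if HIGH.iter().any(|k| q.contains(*k)) {
        RiskTier::High
    } else if LOW.iter().any(|k| q.trim() == *k) {
        RiskTier::Low
    } else {
        RiskTier::Normal
    }
}

/// Single entry point for every surface. The surface argument is accepted
/// but deliberately does not influence the decision, so parity holds by
/// construction; per-surface budgets would be looked up separately.
pub fn evaluate(query: &str, _surface: Surface) -> RetrievalDecision {
    let risk_tier = classify(query);
    let retrieve = risk_tier != RiskTier::Low;
    RetrievalDecision {
        policy_version: POLICY_VERSION,
        risk_tier,
        retrieve,
        skip_reason: if retrieve { None } else { Some("low_risk_query") },
    }
}
```

A parity test then reduces to asserting that evaluate returns equal decisions for the same query on two different surfaces.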

Phase 2 detailed operations: corrective retrieval and compaction

| ID | Owner | Operation | Depends on | Verify |
| --- | --- | --- | --- | --- |
| ctx.213 | search | Define corrective-retrieval trigger thresholds in config | ctx.201 | unit_test |
| ctx.214 | search | Add reason taxonomy for weak evidence, contradictions, and stale evidence | ctx.201 | unit_test |
| ctx.215 | search | Implement query-broaden rewrite helper | ctx.202 | unit_test |
| ctx.216 | search | Implement query-narrow rewrite helper | ctx.202 | unit_test |
| ctx.217 | search | Implement corpus recommendation output for correction stage | ctx.202 | unit_test |
| ctx.218 | orchestrator | Preserve correction-stage diagnostics inside Socrates task context | ctx.203 | integration_test |
| ctx.219 | mcp | Preserve correction-stage diagnostics inside MCP retrieval envelope | ctx.204 | unit_test |
| ctx.220 | mcp | Decide compaction owner and create design note in code/docs | ctx.205 | docs_review |
| ctx.221 | mcp | Define compaction input window selection rules | ctx.220 | docs_review |
| ctx.222 | mcp | Define compaction output envelope shape and lineage fields | ctx.205 | contract_validation |
| ctx.223 | mcp | Implement summary persistence path for compacted sessions | ctx.222 | integration_test |
| ctx.224 | orchestrator | Add read path for compacted session summary during submit | ctx.206 | integration_test |
| ctx.225 | mcp | Implement note-taking envelope write path distinct from compaction | ctx.207 | integration_test |
| ctx.226 | search | Add freshness-aware rejection or refresh rule for stale context | ctx.208 | unit_test |
| ctx.227 | tests | Add benchmark where corrective retrieval improves weak first-pass evidence | ctx.209 | eval_benchmark |
| ctx.228 | tests | Add benchmark where contradiction should escalate rather than continue retrieving | ctx.209 | eval_benchmark |
| ctx.229 | tests | Add session-compaction continuity benchmark | ctx.223 | eval_benchmark |
| ctx.230 | tests | Add stale-summary suppression benchmark | ctx.223 | eval_benchmark |
| ctx.231 | cross_cutting | Emit compaction generation and parent-envelope lineage traces | ctx.210 | telemetry_review |
| ctx.232 | ops | Add corrective-retrieval loop budget and stop-limit rollout controls | ctx.212 | canary_rollout |
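
The interaction between the reason taxonomy (ctx.214), the escalation case (ctx.228), and the loop budget (ctx.232) might look like the sketch below. It assumes a scalar quality score and escalates on any contradiction, which is only one possible reading of the backlog; the names and the threshold are illustrative.

```rust
/// Reason taxonomy from ctx.214.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WeakReason {
    WeakEvidence,
    Contradiction,
    Stale,
}

/// Result of one retrieval pass (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PassResult {
    pub quality: f64,
    pub reason: Option<WeakReason>,
}

/// Terminal outcome of the corrective loop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// Evidence was good enough after this many passes.
    Accepted { passes: u32 },
    /// A contradiction was found; stop retrieving and hand off for review.
    Escalate,
    /// The pass budget ran out without acceptable evidence.
    BudgetExhausted,
}

/// Run up to max_passes retrieval passes. The retrieve closure receives the
/// 1-based pass number so it can broaden or narrow the query on later passes.
pub fn run_corrective<F>(mut retrieve: F, min_quality: f64, max_passes: u32) -> Outcome
where
    F: FnMut(u32) -> PassResult,
{
    for pass in 1..=max_passes {
        let result = retrieve(pass);
        if result.quality >= min_quality && result.reason.is_none() {
            return Outcome::Accepted { passes: pass };
        }
        if result.reason == Some(WeakReason::Contradiction) {
            return Outcome::Escalate;
        }
    }
    Outcome::BudgetExhausted
}
```

Keeping max_passes as an explicit argument makes the stop limit a rollout control rather than a constant buried in the planner.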

Phase 3 detailed operations: handoff and remote context

| ID | Owner | Operation | Depends on | Verify |
| --- | --- | --- | --- | --- |
| ctx.313 | orchestrator | Extend HandoffPayload with session identity fields | ctx.301 | unit_test |
| ctx.314 | orchestrator | Extend HandoffPayload with thread identity fields | ctx.301 | unit_test |
| ctx.315 | orchestrator | Extend HandoffPayload with retrieval-envelope reference fields | ctx.301 | unit_test |
| ctx.316 | orchestrator | Add invariant requiring session/thread continuity on resumable handoff | ctx.302 | integration_test |
| ctx.317 | orchestrator | Add warn-only mode for missing handoff lineage | ctx.302 | integration_test |
| ctx.318 | orchestrator | Bridge handoff payloads to context-store retrieval references when available | ctx.315 | integration_test |
| ctx.319 | tests | Add local handoff continuity benchmark with session and thread preservation | ctx.316 | eval_benchmark |
| ctx.320 | tests | Add stale-handoff rejection benchmark for missing lineage | ctx.316 | eval_benchmark |
| ctx.321 | orchestrator | Move retrieval attachment earlier in submit path before remote relay build | ctx.303 | integration_test |
| ctx.322 | orchestrator | Add task-trace marker proving context assembly completed before remote relay | ctx.321 | telemetry_review |
| ctx.323 | populi | Extend remote envelope population with session identity | ctx.303 | integration_test |
| ctx.324 | populi | Extend remote envelope population with thread identity | ctx.303 | integration_test |
| ctx.325 | populi | Extend remote envelope population with artifact references | ctx.303 | integration_test |
| ctx.326 | populi | Extend remote envelope population with context-envelope reference or embedded snapshot | ctx.303 | integration_test |
| ctx.327 | populi | Add remote worker parser for richer remote envelope fields | ctx.303 | integration_test |
| ctx.328 | search | Implement requester-side send path for A2ARetrievalRequest | ctx.304 | integration_test |
| ctx.329 | populi | Implement worker-side retrieval handler using shared vox-search | ctx.305 | integration_test |
| ctx.330 | search | Implement response normalization from A2ARetrievalResponse into envelope form | ctx.304 | integration_test |
| ctx.331 | search | Implement refinement resend path using A2ARetrievalRefinement | ctx.304 | integration_test |
| ctx.332 | orchestrator | Reconcile remote result against lease lineage and session identity | ctx.306 | integration_test |
| ctx.333 | orchestrator | Add fallback path when remote result lacks required lineage | ctx.306 | integration_test |
| ctx.334 | tests | Add remote retrieval delegation benchmark | ctx.329 | eval_benchmark |
| ctx.335 | tests | Add remote result reconciliation benchmark | ctx.332 | eval_benchmark |
| ctx.336 | ops | Add canary matrix for remote envelope enforcement, remote retrieval delegation, and fallback modes | ctx.311 | canary_rollout |
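
The continuity invariant (ctx.316) and its warn-only mode (ctx.317) can be sketched as a single check. HandoffLineage below is a stand-in for the lineage fields that ctx.313 to ctx.315 add to HandoffPayload; its field names, and the rule that non-resumable handoffs pass without lineage, are assumptions for illustration.

```rust
/// Stand-in for the lineage fields added to HandoffPayload (hypothetical names).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct HandoffLineage {
    pub session_id: Option<String>,
    pub thread_id: Option<String>,
    pub retrieval_envelope_ref: Option<String>,
}

/// Assumed enforcement flag for missing handoff lineage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineageMode {
    WarnOnly,
    Enforce,
}

/// Outcome of the check; Warn and Reject list the missing field names.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LineageCheck {
    Ok,
    Warn(Vec<&'static str>),
    Reject(Vec<&'static str>),
}

/// Resumable handoffs must carry both session and thread identity.
/// The retrieval-envelope reference is optional in this sketch.
pub fn check_lineage(lineage: &HandoffLineage, resumable: bool, mode: LineageMode) -> LineageCheck {
    let mut missing = Vec::new();
    if lineage.session_id.is_none() {
        missing.push("session_id");
    }
    if lineage.thread_id.is_none() {
        missing.push("thread_id");
    }
    if missing.is_empty() || !resumable {
        return LineageCheck::Ok;
    }
    match mode {
        LineageMode::WarnOnly => LineageCheck::Warn(missing),
        LineageMode::Enforce => LineageCheck::Reject(missing),
    }
}
```

Returning the list of missing fields in both the Warn and Reject cases means the same structured data can feed a trace event in shadow mode and an error response once enforcement is on.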

Phase 4 detailed operations: conflict governance and release gates

| ID | Owner | Operation | Depends on | Verify |
| --- | --- | --- | --- | --- |
| ctx.413 | orchestrator | Define explicit precedence order across system, policy, user, peer, and derived context | ctx.401 | docs_review |
| ctx.414 | orchestrator | Add freshness-based conflict classifier branch | ctx.401 | unit_test |
| ctx.415 | orchestrator | Add semantic-disagreement classifier branch | ctx.401 | unit_test |
| ctx.416 | orchestrator | Add authority-conflict classifier branch | ctx.401 | unit_test |
| ctx.417 | orchestrator | Add policy-conflict classifier branch | ctx.401 | unit_test |
| ctx.418 | orchestrator | Add dedupe-key and tombstone behavior for superseded envelopes | ctx.402 | unit_test |
| ctx.419 | search | Add evidence-required overwrite rule for high-risk contexts | ctx.403 | unit_test |
| ctx.420 | mcp | Add reference-only injection mode for low-trust or stale envelopes | ctx.404 | integration_test |
| ctx.421 | orchestrator | Persist structured conflict-resolution event rows | ctx.405 | integration_test |
| ctx.422 | tests | Add stale-summary overwrite regression suite | ctx.406 | eval_benchmark |
| ctx.423 | tests | Add authority-override regression suite | ctx.406 | eval_benchmark |
| ctx.424 | tests | Add contradictory-evidence merge regression suite | ctx.406 | eval_benchmark |
| ctx.425 | cross_cutting | Add operator query surfaces for conflict-class counts by surface | ctx.407 | telemetry_review |
| ctx.426 | cross_cutting | Add operator query surfaces for merge-strategy outcomes | ctx.407 | telemetry_review |
| ctx.427 | ops | Add enforce-readiness checklist for local retrieval gate | ctx.408 | docs_review |
| ctx.428 | ops | Add enforce-readiness checklist for remote lineage gate | ctx.409 | docs_review |
| ctx.429 | ops | Add deprecation checklist for legacy payload readers | ctx.410 | docs_review |
| ctx.430 | ops | Add rollback drill for bad envelope parse or bad merge behavior | ctx.410 | canary_rollout |
| ctx.431 | docs | Publish operator SSOT for conflict interpretation and remediation | ctx.411 | docs_review |
| ctx.432 | cross_cutting | Freeze scorecard schema and CI reporting format for context-system gates | ctx.412 | telemetry_review |
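
As an illustration of ctx.413 and the classifier branches, the sketch below encodes one possible precedence order and a freshness tie-break. It assumes that the order listed in ctx.413 (system, policy, user, peer, derived) runs from highest to lowest precedence, which the backlog does not state explicitly; the types and the simplified three-way classification are likewise hypothetical.

```rust
use std::cmp::Ordering;

/// Context sources in ascending precedence, so the derived Ord ranks
/// System highest. The ordering itself is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Derived,
    Peer,
    User,
    Policy,
    System,
}

/// One competing piece of context (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub source: Source,
    pub captured_at_secs: u64,
    pub value: String,
}

/// Simplified subset of the conflict classes named in ctx.401.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConflictClass {
    Authority,
    Temporal,
    Semantic,
}

/// Classify a disagreement; None means the two candidates agree.
pub fn classify(a: &Candidate, b: &Candidate) -> Option<ConflictClass> {
    if a.value == b.value {
        None
    } else if a.source != b.source {
        Some(ConflictClass::Authority)
    } else if a.captured_at_secs != b.captured_at_secs {
        Some(ConflictClass::Temporal)
    } else {
        Some(ConflictClass::Semantic)
    }
}

/// Pick a winner: higher-precedence source first, then the fresher capture.
/// On a full tie the first argument wins.
pub fn resolve<'a>(a: &'a Candidate, b: &'a Candidate) -> &'a Candidate {
    let order = a
        .source
        .cmp(&b.source)
        .then(a.captured_at_secs.cmp(&b.captured_at_secs));
    match order {
        Ordering::Less => b,
        _ => a,
    }
}
```

Note that under this rule an older System value beats a newer Derived one; whether freshness should ever override authority is exactly the kind of decision the precedence document would need to settle.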

High-win first 15

If only a small first wave can ship immediately, do these first:

  1. ctx.001 canonical Rust envelope model.
  2. ctx.006 explicit session identity helper.
  3. ctx.007 task-submit lineage enforcement.
  4. ctx.010 concurrent-session bleed tests.
  5. ctx.101 shared retrieval trigger policy.
  6. ctx.102 MCP adoption of shared retrieval policy.
  7. ctx.103 orchestrator adoption of shared retrieval policy.
  8. ctx.106 retrieval quality evaluator.
  9. ctx.107 high-risk ungrounded-task fail-closed path.
  10. ctx.111 retrieval lifecycle spans.
  11. ctx.201 corrective retrieval planner.
  12. ctx.205 envelope-based compaction.
  13. ctx.301 local handoff envelope wrapper.
  14. ctx.303 remote task envelope lineage population.
  15. ctx.401 conflict classifier.

Rollout strategy

Stage 1: Shadow only

  • Emit envelopes and traces without changing current behavior.
  • Preserve current payloads and derive envelope projections from them.
  • Record bleed, grounding, and handoff correlation metrics before any enforcement.

Stage 2: Dual-write

  • Write both legacy payloads and normalized envelopes.
  • Compare envelope-derived behavior to current production behavior.
  • Gate remote and high-risk paths behind kill switches.

Stage 3: Local enforce

  • Enforce explicit session lineage on local handoff and task-submit paths.
  • Enforce retrieval requirements on high-risk local tasks.
  • Keep remote enforcement in shadow until correlation metrics are healthy.

Stage 4: Remote enforce

  • Require lineage and envelope presence for remote execution and remote retrieval.
  • Enable lease-aware remote context reconciliation.
  • Keep rollback flags for remote relay and retrieval delegation.

Stage 5: Legacy retirement

  • Remove legacy-only consumers after error budgets hold.
  • Keep adapters for historical replay and migration tooling as needed.
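
The five stages above can be read as a monotonic progression in which each stage switches on one more behavior. The sketch below encodes that reading; the flag names are invented for illustration, and in practice each flag would also be subject to its kill switch.

```rust
/// Rollout stages in order; the derived Ord follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RolloutStage {
    ShadowOnly,
    DualWrite,
    LocalEnforce,
    RemoteEnforce,
    LegacyRetirement,
}

/// Behavior implied by a stage (hypothetical flag names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StageBehavior {
    pub emit_traces: bool,
    pub write_envelopes: bool,
    pub enforce_local_lineage: bool,
    pub enforce_remote_lineage: bool,
    pub legacy_only_consumers_allowed: bool,
}

/// Map a stage to its behavior. Every flag is a threshold on the stage,
/// so moving one stage back is always a valid rollback.
pub fn behavior_for(stage: RolloutStage) -> StageBehavior {
    use RolloutStage::*;
    StageBehavior {
        emit_traces: true,
        write_envelopes: stage >= DualWrite,
        enforce_local_lineage: stage >= LocalEnforce,
        enforce_remote_lineage: stage >= RemoteEnforce,
        legacy_only_consumers_allowed: stage < LegacyRetirement,
    }
}
```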

Required rollback guardrails

| Guardrail | Purpose |
| --- | --- |
| envelope dual-write flag | disable canonical-write if adapter regression appears |
| explicit-session enforcement flag | fall back to warn-only when clients lag |
| retrieval-policy enforce flag | return to shadow if false negatives appear |
| corrective-retrieval flag | disable second-pass cost spikes quickly |
| remote-envelope enforcement flag | avoid breaking remote execution during rollout |
| conflict-engine enforce flag | revert to advisory mode if merges are too aggressive |

KPI and SLO framework

Core KPIs

| KPI | Definition | Initial target |
| --- | --- | --- |
| context bleed rate | percentage of cross-session contamination incidents in deterministic tests and canaries | 0 in tests, near-zero in canaries |
| unsupported factual claim rate | percentage of high-risk completions lacking required evidence | reduce materially release over release |
| retrieval adequacy rate | percentage of high-risk tasks with acceptable diversity, quality, and citation coverage | > 95% in controlled evals |
| corrective retrieval success rate | percentage of weak first passes improved by second pass | trend upward and stabilize |
| A2A handoff correlation success | percentage of handoffs preserving session/thread/task lineage end-to-end | > 99% in integration tests |
| remote authority mismatch rate | percentage of remote results that fail lease or lineage reconciliation | near-zero |
| token overhead delta | increase in input token cost after envelope adoption | bounded and visible |
| latency overhead delta | increase in end-to-end latency after policy changes | bounded and visible |

SLO candidates

  1. SLO-context-bleed: zero deterministic bleed regressions on main.
  2. SLO-high-risk-grounding: no enforced high-risk path ships with unsupported-claim rate above agreed budget.
  3. SLO-handoff-lineage: remote and local handoff lineage integrity remains above 99% in gated suites.
  4. SLO-observability: every enforced policy decision emits a correlated trace or event.
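
A CI gate over these candidates could be as small as the sketch below. The Scorecard fields are hypothetical counters, the 99% threshold is taken from SLO-handoff-lineage, and SLO-high-risk-grounding is left out because its budget is not yet agreed. Treating an empty handoff sample as a failure is a design choice of this sketch, not something the backlog specifies.

```rust
/// Hypothetical counters collected from gated suites.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scorecard {
    pub bleed_regressions: u64,
    pub handoffs_total: u64,
    pub handoffs_with_lineage: u64,
    pub enforced_decisions: u64,
    pub enforced_decisions_traced: u64,
}

/// Percentage of hits over total, or None when there is no data.
pub fn rate_percent(hits: u64, total: u64) -> Option<f64> {
    if total == 0 {
        None
    } else {
        Some(hits as f64 * 100.0 / total as f64)
    }
}

/// Return the names of the SLO candidates that the scorecard violates.
pub fn gate_failures(card: &Scorecard) -> Vec<&'static str> {
    let mut failed = Vec::new();
    // SLO-context-bleed: zero deterministic regressions.
    if card.bleed_regressions > 0 {
        failed.push("SLO-context-bleed");
    }
    // SLO-handoff-lineage: integrity strictly above 99%; no data fails.
    match rate_percent(card.handoffs_with_lineage, card.handoffs_total) {
        Some(p) if p > 99.0 => {}
        _ => failed.push("SLO-handoff-lineage"),
    }
    // SLO-observability: every enforced decision has a correlated trace.
    if card.enforced_decisions_traced < card.enforced_decisions {
        failed.push("SLO-observability");
    }
    failed
}
```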

Acceptance criteria for phase 1 completion

Phase 1 is complete only when all of the following are true:

  1. Canonical envelopes exist in code and contract form.
  2. Session and thread lineage are explicit on local task-submit and handoff paths.
  3. Search trigger policy is shared between MCP and orchestrator.
  4. Corrective retrieval is available in shadow mode with telemetry.
  5. Remote envelopes can carry structured lineage and artifact references.
  6. Conflict classes and observability vocabulary exist, even if full enforcement is still gated.
  7. Deterministic eval suites cover bleed, grounding, corrective retrieval, and handoff integrity.

Suggested next expansion after phase 1

After the first wave, expand the program by generating capability-level tasks under each epic using the work-item schema. With the detailed operation expansion included, this document seeds more than 120 explicit tasks, but the full program is expected to grow well beyond that, into the hundreds of implementation items described in the blueprint.