Context management implementation blueprint
Purpose
This document translates the research dossier into an implementation program that can expand into hundreds of work items without turning into an unstructured backlog.
Primary companion documents:
- Context management research findings 2026
- Context management phase 1 backlog
- `contracts/orchestration/context-work-item.schema.json`
Delivery model
Work-item hierarchy
The program should use three levels only:
| Level | Meaning | Typical size |
|---|---|---|
| Epic | a user-visible or architecture-visible pillar | 6-12 capabilities |
| Capability | a coherent slice of behavior or infrastructure | 3-8 tasks |
| Task | one implementable change or testable rollout step | 1 PR or small series |
Required fields for every work item
Every epic, capability, and task should conform to the work-item schema listed under the companion documents.
Required operational fields:
- stable ID,
- owner type,
- risk tier,
- dependencies,
- acceptance criteria,
- verification method,
- files hint,
- KPI targets where applicable.
Example work item
```json
{
  "schema_version": 1,
  "program_id": "context_management_sota_2026",
  "work_item_type": "task",
  "id": "ctx.session.reject-default-for-remote",
  "parent_id": "ctx.session.identity-contract",
  "title": "Reject implicit default session on remote task handoff",
  "description": "Require explicit session lineage when a task crosses agent or node boundaries.",
  "owner_type": "orchestrator",
  "deliverable_type": "code",
  "risk_tier": "high",
  "effort_band": "m",
  "status": "planned",
  "depends_on": ["ctx.contract.context-envelope-v1"],
  "files_hint": [
    "crates/vox-orchestrator/src/orchestrator/task_dispatch/submit/goal.rs",
    "crates/vox-orchestrator/src/a2a/envelope.rs"
  ],
  "acceptance_criteria": [
    "remote-bound tasks include explicit session lineage",
    "missing lineage causes structured fallback or rejection",
    "telemetry identifies the rejection reason"
  ],
  "verification_methods": [
    "integration_test",
    "manual_trace",
    "telemetry_review"
  ]
}
```
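A minimal sketch of how the required-field rules could be checked in code before a work item enters the backlog. The `WorkItem` struct models only a subset of the example's fields, and the rule that everything below epic level needs a `parent_id` is an assumption, not something the schema is confirmed to enforce.

```rust
/// Work-item levels from the hierarchy table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkItemType {
    Epic,
    Capability,
    Task,
}

/// Trimmed-down work item (hypothetical; only a subset of fields is modeled).
#[derive(Debug, Clone)]
pub struct WorkItem {
    pub id: String,
    pub work_item_type: WorkItemType,
    pub parent_id: Option<String>,
    pub acceptance_criteria: Vec<String>,
    pub verification_methods: Vec<String>,
}

/// Returns every rule the item violates; an empty vector means it passed.
pub fn validate_work_item(item: &WorkItem) -> Vec<&'static str> {
    let mut problems = Vec::new();
    if item.id.trim().is_empty() {
        problems.push("stable ID is required");
    }
    // Assumption: capabilities and tasks must name their parent.
    if item.work_item_type != WorkItemType::Epic && item.parent_id.is_none() {
        problems.push("parent_id is required below epic level");
    }
    if item.acceptance_criteria.is_empty() {
        problems.push("acceptance criteria are required");
    }
    if item.verification_methods.is_empty() {
        problems.push("verification method is required");
    }
    problems
}
```

Returning all violations at once, rather than failing on the first, keeps review feedback on a malformed item to a single round.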
Program epics
Epic 1: Canonical context contract
Goal: make all context-bearing payloads adapt to one envelope.
Capabilities:
- `ContextEnvelope` v1 schema and examples.
- Adapters for MCP retrieval, session summary, task context, and remote handoff.
- Dual-write and canonical-write migration support.
How to implement:
- Add envelope structs and serde adapters in Rust.
- Normalize legacy payloads at ingress boundaries.
- Emit versioned contract-validation tests for known payload fixtures.
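The ingress-normalization step could look roughly like the sketch below. The field names, the `EnvelopeKind` variants, and both `LegacyPayload` shapes are illustrative assumptions; the real envelope is defined by the contract schema, and the serde derives are omitted so the example stays stdlib-only.

```rust
/// Hypothetical envelope variants, one per adapter named in the capabilities.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EnvelopeKind {
    RetrievalEvidence,
    SessionSummary,
    TaskContext,
    RemoteHandoff,
}

/// Illustrative envelope; the real field set comes from the contract schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ContextEnvelope {
    pub schema_version: u32,
    pub kind: EnvelopeKind,
    pub session_id: Option<String>,
    pub body: String,
}

/// Hypothetical legacy shapes arriving at an ingress boundary.
pub enum LegacyPayload {
    RetrievalHits(Vec<String>),
    Summary { session_id: String, text: String },
}

impl From<LegacyPayload> for ContextEnvelope {
    /// Normalizes a legacy payload into the canonical envelope.
    fn from(payload: LegacyPayload) -> Self {
        match payload {
            LegacyPayload::RetrievalHits(hits) => ContextEnvelope {
                schema_version: 1,
                kind: EnvelopeKind::RetrievalEvidence,
                // Legacy retrieval hits carried no session identity.
                session_id: None,
                body: hits.join("\n"),
            },
            LegacyPayload::Summary { session_id, text } => ContextEnvelope {
                schema_version: 1,
                kind: EnvelopeKind::SessionSummary,
                session_id: Some(session_id),
                body: text,
            },
        }
    }
}
```

Keeping the conversion in a `From` impl means each ingress boundary calls `.into()` once and downstream code only ever sees the canonical shape.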
Epic 2: Session and thread identity
Goal: eliminate accidental context bleed.
Capabilities:
- Canonical session/thread/workspace identity contract.
- Default-session hardening rules.
- Session lineage on task submit, handoff, and remote execution.
How to implement:
- Introduce session identity helpers in MCP and orchestrator.
- Reject or relabel implicit defaults on remote/handoff paths.
- Add invariants and regression tests for concurrent sessions.
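One possible shape for the reject-or-relabel rule, as a sketch only. `Route`, `LineageMode`, `SessionDecision`, and the `implicit:<task_id>` relabel format are all hypothetical names chosen for illustration; the only detail taken from this document is that `"default"` is the implicit session value being hardened.

```rust
/// Where the task is headed (hypothetical classification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Local,
    Handoff,
    Remote,
}

/// Rollout mode for missing lineage: relabel and warn, or hard reject.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineageMode {
    WarnOnly,
    Reject,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SessionDecision {
    /// Caller supplied a real session ID.
    Explicit(String),
    /// Local path: the implicit default is still tolerated.
    DefaultAllowed,
    /// Implicit default replaced by a synthetic ID; callers should emit telemetry.
    Relabeled(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MissingLineage {
    pub route: Route,
}

pub fn resolve_session(
    session_id: Option<&str>,
    route: Route,
    mode: LineageMode,
    task_id: &str,
) -> Result<SessionDecision, MissingLineage> {
    // Treat absent, blank, and literal "default" IDs as implicit.
    let explicit = session_id
        .map(str::trim)
        .filter(|s| !s.is_empty() && *s != "default");
    match (explicit, route) {
        (Some(id), _) => Ok(SessionDecision::Explicit(id.to_string())),
        (None, Route::Local) => Ok(SessionDecision::DefaultAllowed),
        (None, _) => match mode {
            LineageMode::Reject => Err(MissingLineage { route }),
            LineageMode::WarnOnly => Ok(SessionDecision::Relabeled(format!("implicit:{task_id}"))),
        },
    }
}
```

The `WarnOnly` mode gives existing clients a migration window: their tasks still run, but under a task-scoped ID that cannot bleed into another caller's default session.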
Epic 3: Compaction and note-taking
Goal: preserve long-horizon coherence without bloating prompts.
Capabilities:
- Envelope-based compaction outputs.
- Structured notes and session summaries.
- Compaction lineage and regeneration policy.
How to implement:
- Create summary and note envelope variants.
- Persist compaction generation and parent lineage.
- Add selection policy that prefers summaries plus recent working set over raw history.
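The selection policy in the last bullet can be sketched as a priority-ordered greedy fill under a token budget. The type names, the three-kind taxonomy, and the exact priority order are assumptions for illustration; the document only states that summaries plus the recent working set should win over raw history.

```rust
/// Kinds of context, declared in assumed priority order (earlier = preferred).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ItemKind {
    Summary,
    RecentTurn,
    RawHistory,
}

#[derive(Debug, Clone)]
pub struct ContextItem {
    pub label: &'static str,
    pub kind: ItemKind,
    pub tokens: usize,
    /// True when a newer compaction generation replaced this item.
    pub superseded: bool,
}

/// Greedy selection: drop superseded items, order by kind priority,
/// then take each item that still fits in the remaining budget.
pub fn select_for_prompt(mut items: Vec<ContextItem>, budget_tokens: usize) -> Vec<ContextItem> {
    items.retain(|item| !item.superseded);
    // Stable sort: items of the same kind keep their input order.
    items.sort_by_key(|item| item.kind);
    let mut used = 0usize;
    let mut selected = Vec::new();
    for item in items {
        if used + item.tokens <= budget_tokens {
            used += item.tokens;
            selected.push(item);
        }
    }
    selected
}
```

Filtering on `superseded` before ranking is what prevents a stale summary from being re-injected just because it is cheap.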
Epic 4: Retrieval policy engine
Goal: make search-vs-memory decisions explicit and consistent.
Capabilities:
- Shared trigger evaluation across MCP and orchestrator.
- Risk-tier to retrieval-policy mapping.
- Budget-aware injection and refresh rules.
How to implement:
- Centralize trigger logic in a policy module rather than duplicating it in tool handlers.
- Thread policy version through retrieval diagnostics and envelopes.
- Emit traces for every retrieval decision.
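A sketch of what a shared policy evaluation might return, so that MCP and orchestrator call sites can be compared on identical input. The tier-to-requirement mapping, the reason strings, and `POLICY_VERSION` are placeholder assumptions; the actual mapping is a program decision.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskTier {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Requirement {
    Skip,
    Optional,
    Required,
}

/// Decision plus the explanation fields a trace would carry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PolicyDecision {
    pub requirement: Requirement,
    pub policy_version: u32,
    pub reason: &'static str,
}

/// Hypothetical version stamp threaded into diagnostics and envelopes.
pub const POLICY_VERSION: u32 = 1;

/// Pure function of its inputs, so every surface can call it and get the same answer.
pub fn evaluate(risk: RiskTier, trigger_fired: bool, budget_tokens: usize) -> PolicyDecision {
    let (requirement, reason) = match (risk, trigger_fired) {
        // Assumption: high-risk work retrieves regardless of triggers or budget.
        (RiskTier::High, _) => (Requirement::Required, "high risk tier always retrieves"),
        (_, _) if budget_tokens == 0 => (Requirement::Skip, "no injection budget left"),
        (RiskTier::Medium, true) => (Requirement::Required, "trigger fired at medium risk"),
        (RiskTier::Low, true) => (Requirement::Optional, "trigger fired at low risk"),
        (_, false) => (Requirement::Skip, "no trigger fired"),
    };
    PolicyDecision { requirement, policy_version: POLICY_VERSION, reason }
}
```

Because the result carries a `reason`, the "why retrieval ran" trace is a by-product of evaluation rather than separate logging code in each tool handler.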
Epic 5: Corrective retrieval and evidence repair
Goal: recover when first-pass retrieval is weak or contradictory.
Capabilities:
- Retrieval quality evaluator.
- Query/corpus rewrite stage.
- Escalation and replan contract.
How to implement:
- Convert evidence-quality and contradiction metrics into decision thresholds.
- Add a second-pass retrieval mode with rewritten query and recommended corpora.
- Make Socrates and planning consume the correction result explicitly.
Epic 6: Search-plane unification
Goal: expose the same retrieval semantics to all surfaces.
Capabilities:
- Common budgets for preamble, tool, and task-submit retrieval.
- Corpus selection policy that covers memory, knowledge, chunks, repo, and future web.
- Stable retrieval evidence shape for both local and remote use.
How to implement:
- Move per-surface limits into policy config.
- Preserve both lexical and vector diagnostics visibly.
- Add support for a future web-research corpus without changing envelope shape.
Epic 7: Handoff and A2A context integrity
Goal: make agent handoffs stateful, structured, and debuggable.
Capabilities:
- Handoff payloads carry normalized context lineage.
- A2A messages include session/thread/task identity.
- Handoff policy specifies what is copied, summarized, or refreshed.
How to implement:
- Add context-envelope wrappers to handoff and A2A send paths.
- Preserve sender and receiver identity in every handoff span.
- Add tests for local and remote handoff continuity.
Epic 8: MENs and Populi remote context delivery
Goal: make remote execution context-safe and single-owner.
Capabilities:
- Remote task envelopes carry context lineage and artifact refs.
- `A2ARetrievalRequest/Response/Refinement` become production flows, not just contracts.
- Lease-aware remote result reconciliation.
How to implement:
- Extend `RemoteTaskEnvelope` population to include context refs or embedded envelope snapshots.
- Add remote retrieval worker handling using shared `vox-search`.
- Reconcile lease, task, and context lineage at result ingestion.
Epic 9: Conflict resolution and governance
Goal: merge or escalate contradictory context deterministically.
Capabilities:
- Conflict taxonomy and precedence engine.
- Evidence-bound overwrite rules.
- Tombstoning, expiry, dedupe, and stale suppression.
How to implement:
- Implement conflict classifier before merge.
- Apply strategy by conflict class rather than one global merge rule.
- Persist conflict events for debugging and KPI measurement.
Epic 10: Context observability
Goal: make context behavior traceable end to end.
Capabilities:
- OpenTelemetry-aligned spans and events.
- Stable context lifecycle event names.
- Dashboards and query surfaces for debugging.
How to implement:
- Add explicit span hooks at capture, retrieve, compact, select, handoff, resolve, and gate stages.
- Include conversation, task, session, agent, and node identifiers.
- Add operator-facing views for policy version, merge strategy, and retrieval path.
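A small sketch of stable span naming and identifier attachment for the seven stages listed above. The `context.` name prefix and the attribute keys are assumptions, not established OpenTelemetry conventions, and no tracing library is used here.

```rust
/// Lifecycle stages named in the span-hook list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Capture,
    Retrieve,
    Compact,
    Select,
    Handoff,
    Resolve,
    Gate,
}

impl Stage {
    /// Hypothetical stable span name; the `context.` prefix is an assumption.
    pub fn span_name(self) -> &'static str {
        match self {
            Stage::Capture => "context.capture",
            Stage::Retrieve => "context.retrieve",
            Stage::Compact => "context.compact",
            Stage::Select => "context.select",
            Stage::Handoff => "context.handoff",
            Stage::Resolve => "context.resolve",
            Stage::Gate => "context.gate",
        }
    }
}

/// Identifiers to attach to a span; absent ones are skipped, never defaulted.
#[derive(Debug, Default, Clone)]
pub struct SpanIds {
    pub conversation_id: Option<String>,
    pub task_id: Option<String>,
    pub session_id: Option<String>,
    pub agent_id: Option<String>,
    pub node_id: Option<String>,
}

impl SpanIds {
    /// Flattens the present identifiers into key/value attributes.
    pub fn attributes(&self) -> Vec<(&'static str, String)> {
        let pairs = [
            ("conversation.id", self.conversation_id.clone()),
            ("task.id", self.task_id.clone()),
            ("session.id", self.session_id.clone()),
            ("agent.id", self.agent_id.clone()),
            ("node.id", self.node_id.clone()),
        ];
        let mut out = Vec::new();
        for (key, value) in pairs.iter() {
            if let Some(v) = value {
                out.push((*key, v.clone()));
            }
        }
        out
    }
}
```

Only identifiers go into attributes here; payload bodies stay out of spans entirely, which sidesteps most redaction questions.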
Epic 11: Evaluation and release gates
Goal: block regressions before context bugs reach users.
Capabilities:
- Deterministic session and retrieval test corpus.
- Eval harness for handoff and corrective retrieval.
- Rollout scorecards and CI gates.
How to implement:
- Add fixed fixtures for chat, retrieval, and handoff cases.
- Run per-epic benchmark suites with baseline comparisons.
- Promote gates from shadow to enforce only after metrics stabilize.
Epic 12: Rollout, migration, and deprecation
Goal: ship safely without breaking existing clients or stored data.
Capabilities:
- Dual-write transition plan.
- Fallback and kill-switch matrix.
- Legacy payload retirement criteria.
How to implement:
- Use additive payload fields first.
- Record adoption and failure rates by surface.
- Remove legacy shapes only after coverage and error budgets pass.
Second-pass critique and corrections
What the first blueprint got right
- It chose the correct architectural center: a canonical context envelope.
- It identified the right major systems: MCP, orchestrator, search, Socrates, Populi, and MENs.
- It prioritized anti-bleed, retrieval policy, handoff, conflict handling, and telemetry in the right broad order.
What the first blueprint under-specified
| Weak spot in v1 | Why it is a problem | Correction in this revision |
|---|---|---|
| “centralize policy” was too vague | current code has multiple trigger enums and call-site ownership boundaries | use a shared policy contract and parity tests before extracting shared code |
| compaction was listed too casually | there is no obvious single compaction runtime owner yet | add a compaction-ownership design slice before implementation |
| handoff work was too small | current handoff payloads and accept path do not preserve session/thread context | break handoff into identity, payload, context-store bridge, and verification tasks |
| remote context delivery was too compressed | remote relay ordering and payload shape are both incomplete | split remote work into ordering fix, payload expansion, worker intake, and result reconciliation |
| conflict handling was scheduled too late | trust/precedence fields influence adapter design immediately | define minimal conflict vocabulary at contract stage and delay full enforcement only |
| task counts were too low for distributed work | A2A, MENs, and corrective retrieval each require many integration and rollout steps | expand complex epics into explicit operation packs |
Corrected sequencing
The safer program order is:
- contract and identity,
- current-path telemetry,
- ordering fixes on submit and handoff paths,
- retrieval policy parity,
- corrective retrieval,
- compaction ownership and implementation,
- remote context payload expansion,
- remote retrieval delegation,
- conflict engine shadow mode,
- enforce only after eval and canary evidence.
Explicit operation packs by epic
This section expands each epic into concrete operations. These are intentionally explicit so that complex work does not collapse into underspecified “implementation” tasks.
Epic 1 operations: canonical context contract
- Define the Rust `ContextEnvelope` type and serde helpers.
- Create fixture examples for each envelope variant.
- Add validation tests against `contracts/communication/context-envelope.schema.json`.
- Define a backward-compatible “legacy projection” API for legacy payloads.
- Add versioned parsing behavior: strict for tests, permissive for runtime additive fields.
- Add tracing helpers that log envelope IDs without dumping sensitive payloads.
- Document allowed producers and consumers for each variant.
- Add a migration note for legacy shapes that cannot losslessly round-trip.
Entry points:
- `crates/vox-orchestrator/src/mcp_tools/memory/retrieval.rs`
- `crates/vox-orchestrator/src/socrates.rs`
- `crates/vox-orchestrator/src/handoff.rs`
- `crates/vox-orchestrator/src/a2a/envelope.rs`
Epic 2 operations: session and thread identity
- Define canonical identity fields and defaulting rules.
- Add MCP helper for explicit session allocation and validation.
- Audit all current uses of default `"default"` session behavior.
- Tag remote or handoff-bound work as requiring explicit lineage.
- Thread session and thread IDs through task submit and planning paths.
- Add session lineage fields to handoff payloads.
- Add rejection or warn-only modes for missing lineage.
- Add concurrent-session tests for bleed prevention.
- Add migration behavior for existing clients that omit session IDs.
- Emit telemetry whenever fallback defaulting still occurs.
Entry points:
- `crates/vox-orchestrator/src/mcp_tools/tools/chat_tools/chat/message.rs`
- `crates/vox-orchestrator/src/mcp_tools/tools/task_tools.rs`
- `crates/vox-orchestrator/src/orchestrator/task_dispatch/submit/goal.rs`
- `crates/vox-orchestrator/src/handoff.rs`
- `crates/vox-orchestrator/src/orchestrator/agent_lifecycle.rs`
Epic 3 operations: compaction and note-taking
- Decide compaction owner: MCP turn loop, orchestrator, or dedicated helper surface.
- Define compaction input and output envelope shapes.
- Define what raw history is preserved, summarized, or dropped.
- Define compaction lineage fields and generation increments.
- Add summary storage and retrieval rules.
- Add note-taking envelope shape distinct from compaction summaries.
- Define reinjection priority between raw history, summaries, and notes.
- Add compaction-trigger thresholds and disable flags.
- Add tests for factual continuity after compaction.
- Add tests for not re-injecting stale or superseded summaries.
Important critique:
The first blueprint assumed compaction could be scheduled immediately. The codebase currently has memory and transcript surfaces but not a single obvious compaction runtime owner, so this epic must start with design and ownership, not code-first implementation.
Epic 4 operations: retrieval policy engine
- Define a policy contract shared by MCP and orchestrator call sites.
- Normalize trigger names and semantics across surfaces.
- Define risk-tier classes and mapping to retrieval requirements.
- Define common budget knobs for preamble, explicit tool, and submit-time retrieval.
- Add a policy-evaluation result struct with explanation fields.
- Add parity tests comparing MCP and orchestrator decisions for the same input.
- Preserve policy version in all retrieval evidence envelopes.
- Add operator-visible traces for “why retrieval ran” or “why retrieval skipped.”
- Add deny-list or forced-search rules for high-risk categories.
- Add canary mode for policy decisions before enforcement.
Important critique:
The first blueprint talked about “centralizing trigger logic,” but the correct first move is to centralize the contract and semantics, not necessarily the code module, because current crate ownership is still split.
Epic 5 operations: corrective retrieval and evidence repair
- Convert retrieval quality signals into a first-pass evaluator.
- Define thresholds for contradiction, narrow evidence, stale evidence, and weak coverage.
- Implement rewrite rules for query broadening and narrowing.
- Implement corpus override or recommendation hints.
- Preserve verification reason and verification query consistently.
- Add retry budget and loop limit controls.
- Thread corrective results into Socrates context and planning metadata.
- Add explicit “still insufficient” escalation outputs.
- Add eval cases where second pass improves outcome.
- Add eval cases where second pass should stop and ask or abstain.
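The evaluator, retry budget, and "still insufficient" escalation above could combine into one decision function along these lines. The two quality signals, the threshold values used in any call, and all type names are assumptions; the real signal set would also cover narrow and stale evidence.

```rust
/// First-pass quality signals, each in the range 0.0..=1.0 (assumed scale).
#[derive(Debug, Clone, Copy)]
pub struct EvidenceQuality {
    pub coverage: f32,
    pub contradiction: f32,
}

#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub min_coverage: f32,
    pub max_contradiction: f32,
    /// Loop limit: how many rewrite passes may run after the first pass.
    pub max_retries: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NextStep {
    Accept,
    RetryWithRewrite { attempt: u32 },
    /// Retry budget spent and evidence still weak: ask or abstain.
    EscalateStillInsufficient,
}

pub fn next_step(quality: EvidenceQuality, retries_so_far: u32, t: Thresholds) -> NextStep {
    let good_enough =
        quality.coverage >= t.min_coverage && quality.contradiction <= t.max_contradiction;
    if good_enough {
        NextStep::Accept
    } else if retries_so_far < t.max_retries {
        NextStep::RetryWithRewrite { attempt: retries_so_far + 1 }
    } else {
        NextStep::EscalateStillInsufficient
    }
}
```

Making escalation an explicit variant, rather than an error, lets Socrates and planning treat "stop and ask" as a normal outcome with its own eval cases.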
Epic 6 operations: search-plane unification
- Inventory per-surface search limits and modes.
- Move those settings into policy and env-backed config where appropriate.
- Define a single evidence envelope surface for local and remote use.
- Preserve backend provenance across MCP and orchestrator callers.
- Make RRF and corpus-specific contributions visible in telemetry.
- Define how Tantivy and Qdrant participation should be surfaced to callers.
- Add explicit deferred-scope handling for `WebResearch`.
- Add tests for exact-token, semantic, and hybrid search parity.
- Add docs describing supported vs deferred corpora.
Important critique:
The first blueprint implied that future web corpus integration was near at hand. The code review shows it should remain explicitly deferred until a real executor and trust model exist.
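To illustrate the operation about making RRF and corpus-specific contributions visible, here is a sketch of reciprocal rank fusion that keeps the per-backend share of each score. It uses the standard formula, where each backend adds 1 / (k + rank) for a document it ranked; the `FusedHit` type and the backend labels are hypothetical and do not describe the current `vox-search` implementation.

```rust
use std::collections::HashMap;

/// Fused result that retains which backend contributed how much.
#[derive(Debug, Clone)]
pub struct FusedHit {
    pub doc_id: String,
    pub score: f64,
    /// (backend label, share of the score) pairs, suitable for telemetry.
    pub contributions: Vec<(String, f64)>,
}

/// Reciprocal rank fusion over named rankings; rank is 1-based.
pub fn rrf_fuse(rankings: &[(&str, Vec<&str>)], k: f64) -> Vec<FusedHit> {
    let mut by_doc: HashMap<String, FusedHit> = HashMap::new();
    for (backend, ranked) in rankings {
        for (idx, doc) in ranked.iter().enumerate() {
            let share = 1.0 / (k + (idx as f64 + 1.0));
            let hit = by_doc.entry(doc.to_string()).or_insert_with(|| FusedHit {
                doc_id: doc.to_string(),
                score: 0.0,
                contributions: Vec::new(),
            });
            hit.score += share;
            hit.contributions.push((backend.to_string(), share));
        }
    }
    let mut fused: Vec<FusedHit> = by_doc.into_values().collect();
    // Highest score first; ties broken by document ID for determinism.
    fused.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.doc_id.cmp(&b.doc_id)));
    fused
}
```

For example, with a lexical ranking `[a, b]`, a vector ranking `[b, c]`, and the commonly used `k = 60`, document `b` ranks first because it is the only one both backends returned, and its `contributions` list shows both shares.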
Epic 7 operations: handoff and A2A context integrity
- Extend `HandoffPayload` with session/thread/context-envelope references.
- Define which fields are embedded vs referenced by durable artifact IDs.
- Add validation invariants for session/thread continuity.
- Bridge handoff payloads to context-store retrieval envelopes where appropriate.
- Add sender/receiver identity traces.
- Add local A2A message wrappers for envelope-aware handoff.
- Add context-transfer tests for local handoff.
- Add stale-handoff tests for missing or expired lineage.
- Add policy for partial handoff versus hard reset.
- Add documentation for receiver obligations before resuming work.
Epic 8 operations: MENs and Populi remote context delivery
- Fix submit ordering so required context exists before remote relay uses it.
- Expand `RemoteTaskEnvelope` population with lineage and context references.
- Decide when context is embedded versus passed as durable artifact refs.
- Add worker-side intake that can parse the richer envelope.
- Add remote retrieval request handling using `A2ARetrievalRequest`.
- Add remote retrieval response handling and requester-side normalization.
- Add refinement follow-up flow for weak remote evidence.
- Add result reconciliation against lease, task, and session lineage.
- Add failure handling for missing artifacts or expired context.
- Add kill-switches and staged rollout controls.
- Add remote inbox, relay, and result tests.
- Add explicit operator docs for context-safe remote execution.
Important critique:
This was the most under-decomposed part of the first blueprint. Distributed context delivery is not one capability. It is a chain of ordering, serialization, transport, worker intake, result reconciliation, and rollback work.
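The reconciliation link in that chain might be sketched as below: a remote result is accepted only if it matches a known, unexpired lease and carries the same task and session lineage. `Lease`, `RemoteResult`, and `RejectReason` are hypothetical types, and timestamps are plain integers for simplicity.

```rust
/// Lease held by the requester for one remote task (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Lease {
    pub lease_id: String,
    pub task_id: String,
    pub session_id: String,
    pub expires_at: u64,
}

/// Result reported back by a remote worker (hypothetical shape).
#[derive(Debug, Clone)]
pub struct RemoteResult {
    pub lease_id: String,
    pub task_id: String,
    pub session_id: Option<String>,
    pub received_at: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RejectReason {
    UnknownLease,
    LeaseExpired,
    TaskMismatch,
    MissingSessionLineage,
    SessionMismatch,
}

/// Accepts a result only when lease, task, and session lineage all agree.
pub fn reconcile(result: &RemoteResult, lease: Option<&Lease>) -> Result<(), RejectReason> {
    let lease = match lease {
        Some(l) if l.lease_id == result.lease_id => l,
        _ => return Err(RejectReason::UnknownLease),
    };
    if result.received_at > lease.expires_at {
        return Err(RejectReason::LeaseExpired);
    }
    if result.task_id != lease.task_id {
        return Err(RejectReason::TaskMismatch);
    }
    match result.session_id.as_deref() {
        None => Err(RejectReason::MissingSessionLineage),
        Some(s) if s != lease.session_id.as_str() => Err(RejectReason::SessionMismatch),
        Some(_) => Ok(()),
    }
}
```

Each rejection has its own reason so that telemetry can separate transport problems (expired lease) from lineage bugs (session mismatch).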
Epic 9 operations: conflict resolution and governance
- Define minimal conflict classes in the envelope contract.
- Add a conflict classifier operating on normalized envelopes.
- Define precedence order across system, user, policy, peer, and derived context.
- Add freshness and expiry rules.
- Add evidence-required overwrite rules for high-risk updates.
- Add dedupe keys and tombstoning behavior.
- Add event logging for conflict decisions.
- Add shadow-mode merge strategy output before enforcement.
- Add regression tests for semantic disagreement and stale-summary suppression.
- Add docs for operator interpretation of conflict events.
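A sketch of how precedence, freshness, and evidence-required overwrite rules could compose into a single resolver. The precedence order (system highest, derived lowest) is read off the order in which the sources are listed above and is an assumption, as are the `Claim` fields and the tie-handling choice.

```rust
use std::cmp::Ordering;

/// Context sources in ascending precedence (assumed: System wins over all).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Derived,
    Peer,
    Policy,
    User,
    System,
}

#[derive(Debug, Clone)]
pub struct Claim {
    pub value: String,
    pub source: Source,
    /// Logical timestamp; larger means fresher.
    pub observed_at: u64,
    pub has_evidence: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    KeepExisting,
    Overwrite,
    Escalate,
}

pub fn resolve(existing: &Claim, incoming: &Claim, high_risk: bool) -> Resolution {
    // Dedupe: identical values are not a conflict.
    if existing.value == incoming.value {
        return Resolution::KeepExisting;
    }
    // Evidence-bound overwrite: high-risk updates without evidence never win silently.
    if high_risk && !incoming.has_evidence {
        return Resolution::Escalate;
    }
    match incoming.source.cmp(&existing.source) {
        Ordering::Greater => Resolution::Overwrite,
        Ordering::Less => Resolution::KeepExisting,
        // Same precedence: fresher wins; an exact tie is escalated, not guessed.
        Ordering::Equal => match incoming.observed_at.cmp(&existing.observed_at) {
            Ordering::Greater => Resolution::Overwrite,
            Ordering::Less => Resolution::KeepExisting,
            Ordering::Equal => Resolution::Escalate,
        },
    }
}
```

In shadow mode the returned `Resolution` would only be logged alongside the legacy merge result, which yields the disagreement rate needed before enforcement.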
Epic 10 operations: context observability
- Define stable span names and event payload fields.
- Map them to OpenTelemetry conventions where possible.
- Add envelope, session, task, thread, agent, and node identifiers to traces.
- Add sampling guidance so context-debugging spans are not dropped during rollout.
- Add retrieval, handoff, compaction, and conflict dashboards or query specs.
- Add correlation rules between local and remote events.
- Add redaction guidance for payload-bearing spans and logs.
- Add canary review queries and operator runbook snippets.
Epic 11 operations: evaluation and release gates
- Define deterministic fixture families by failure mode.
- Create session bleed test corpus.
- Create retrieval trigger parity test corpus.
- Create contradiction and corrective-retrieval test corpus.
- Create handoff continuity test corpus.
- Create remote relay and remote result reconciliation test corpus.
- Define scorecard formats and threshold interpretation.
- Add shadow-vs-enforce comparison dashboards or reports.
- Add CI gating order for unit, integration, eval, and canary evidence.
Epic 12 operations: rollout, migration, and deprecation
- Define dual-write and dual-read stages by surface.
- Add per-surface feature flags.
- Define fallback behavior when envelope parsing fails.
- Define compatibility behavior for missing lineage fields.
- Define rollback conditions for each major epic.
- Define telemetry thresholds required to move from shadow to enforce.
- Define deprecation criteria for legacy payloads.
- Define archival or replay strategy for legacy stored payloads.
- Add operator-facing upgrade and rollback notes.
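The shadow-to-enforce promotion rule, the rollback condition, and the kill switch can be expressed as one small state function. This is a sketch under assumed metric names and thresholds; real gates would be defined per surface and per epic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateMode {
    Shadow,
    Enforce,
}

/// Counters collected while the new path runs (hypothetical metric set).
#[derive(Debug, Clone, Copy)]
pub struct GateMetrics {
    pub samples: u64,
    /// Cases where the new path disagreed with the legacy path.
    pub disagreements: u64,
    pub parse_failures: u64,
}

#[derive(Debug, Clone, Copy)]
pub struct PromotionThresholds {
    pub min_samples: u64,
    pub max_disagreement_rate: f64,
    pub max_parse_failure_rate: f64,
}

/// Decides the next mode: promote when healthy, roll back when not,
/// and hold the current mode while evidence is still too thin.
pub fn next_mode(
    current: GateMode,
    kill_switch: bool,
    m: GateMetrics,
    t: PromotionThresholds,
) -> GateMode {
    if kill_switch {
        return GateMode::Shadow;
    }
    if m.samples == 0 || m.samples < t.min_samples {
        return current;
    }
    let disagreement_rate = m.disagreements as f64 / m.samples as f64;
    let parse_failure_rate = m.parse_failures as f64 / m.samples as f64;
    let healthy = disagreement_rate <= t.max_disagreement_rate
        && parse_failure_rate <= t.max_parse_failure_rate;
    if healthy {
        GateMode::Enforce
    } else {
        GateMode::Shadow
    }
}
```

The same function covers rollback: an enforcing surface whose rates exceed the thresholds drops back to shadow on the next evaluation.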
Capability generation rules
When splitting an epic into capabilities, every capability must answer:
- What user-visible or operator-visible problem does it solve?
- Which code surfaces own the behavior?
- What evidence proves success?
- What contexts can it break if incorrectly rolled out?
When splitting a capability into tasks, every task must:
- change one contract, one policy, one test surface, or one rollout control at a time,
- have a rollback path,
- have an observable success signal,
- avoid mixing unrelated surfaces in one PR unless the change is purely mechanical.
For complex distributed or multi-surface capabilities, add one more rule:
- break sequencing-sensitive work into explicit ordering, serialization, transport, intake, reconciliation, and rollback tasks rather than one “wire it up” task.
Suggested epic-to-owner map
| Epic | Primary owner | Secondary owner |
|---|---|---|
| canonical contract | orchestrator | mcp |
| session identity | mcp | orchestrator |
| compaction | mcp | orchestrator |
| retrieval policy | search | orchestrator |
| corrective retrieval | search | mcp |
| search-plane unification | search | mcp |
| handoff integrity | orchestrator | mcp |
| MENs/Populi context delivery | populi | orchestrator |
| conflict governance | orchestrator | search |
| observability | cross_cutting | ops |
| evaluation | tests | search |
| rollout and deprecation | ops | cross_cutting |
Sequencing rules
Order of operations
- Freeze the canonical contract and session identity model.
- Instrument the current lifecycle before changing behavior.
- Unify retrieval policy and corrective retrieval next.
- Harden handoff and remote execution once envelope semantics are stable.
- Introduce conflict-resolution enforcement after observability and tests exist.
- Promote from shadow to enforce only after eval metrics hold.
What must not happen
- Do not deploy remote context delivery before session lineage is explicit.
- Do not enforce search requirements before the retrieval policy engine is shared.
- Do not merge conflicting context silently once conflict classes are available.
- Do not compact aggressively without compaction lineage and recovery tests.
Target scale
The following sizing is intentionally large because the system spans multiple crates and rollout phases:
| Epic count | Capabilities per epic | Tasks per capability | Estimated total tasks |
|---|---|---|---|
| 12 | 8-12 | 4-10 | 384-1440 |
This is the correct scale for the program. The system already exists in partial form; the remaining work is integration, hardening, telemetry, and release engineering.
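The total in the table is the product of the lower bounds and the product of the upper bounds. A quick check, with `task_range` as a hypothetical helper name:

```rust
/// Multiplies (min, max) bounds to get the estimated task-count range.
pub fn task_range(epics: u32, capabilities: (u32, u32), tasks: (u32, u32)) -> (u32, u32) {
    (
        epics * capabilities.0 * tasks.0,
        epics * capabilities.1 * tasks.1,
    )
}
```

Calling `task_range(12, (8, 12), (4, 10))` returns `(384, 1440)`, matching the estimated total above.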
Verification posture
Each epic should include at least one of:
- unit tests for adapters or policy logic,
- integration tests across MCP/orchestrator/Populi seams,
- deterministic eval fixtures,
- telemetry review queries,
- canary rollout checks.
The preferred rollout path is always:
- contract added,
- adapter added,
- telemetry added,
- shadow behavior enabled,
- benchmark reviewed,
- enforcement enabled only once eval and canary evidence holds.
Next document
The prioritized first implementation wave lives in the Context management phase 1 backlog.