"Context management implementation blueprint"

Purpose

This document translates the research dossier into an implementation program that can expand into hundreds of work items without turning into an unstructured backlog.

Primary companion documents:

Delivery model

Work-item hierarchy

The program should use three levels only:

Level	Meaning	Typical size
Epic	a user-visible or architecture-visible pillar	6-12 capabilities
Capability	a coherent slice of behavior or infrastructure	3-8 tasks
Task	one implementable change or testable rollout step	1 PR or small series

Required fields for every work item

Every epic, capability, and task should conform to:

contracts/orchestration/context-work-item.schema.json

Required operational fields:

stable ID,
owner type,
risk tier,
dependencies,
acceptance criteria,
verification method,
files hint,
KPI targets where applicable.

Example work item

{
  "schema_version": 1,
  "program_id": "context_management_sota_2026",
  "work_item_type": "task",
  "id": "ctx.session.reject-default-for-remote",
  "parent_id": "ctx.session.identity-contract",
  "title": "Reject implicit default session on remote task handoff",
  "description": "Require explicit session lineage when a task crosses agent or node boundaries.",
  "owner_type": "orchestrator",
  "deliverable_type": "code",
  "risk_tier": "high",
  "effort_band": "m",
  "status": "planned",
  "depends_on": ["ctx.contract.context-envelope-v1"],
  "files_hint": [
    "crates/vox-orchestrator/src/orchestrator/task_dispatch/submit/goal.rs",
    "crates/vox-orchestrator/src/a2a/envelope.rs"
  ],
  "acceptance_criteria": [
    "remote-bound tasks include explicit session lineage",
    "missing lineage causes structured fallback or rejection",
    "telemetry identifies the rejection reason"
  ],
  "verification_methods": [
    "integration_test",
    "manual_trace",
    "telemetry_review"
  ]
}
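
The required operational fields can be checked mechanically before a work item enters the backlog. The following is a minimal sketch, assuming a hypothetical `WorkItem` struct that mirrors only a subset of the JSON example above. The schema file named earlier remains the authority, `missing_required_fields` is an illustrative helper name, and treating an empty `depends_on` as valid for root items is an assumption.

```rust
/// Hypothetical in-memory mirror of a subset of the work-item fields above.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct WorkItem {
    id: String,
    owner_type: String,
    risk_tier: String,
    depends_on: Vec<String>,
    acceptance_criteria: Vec<String>,
    verification_methods: Vec<String>,
    files_hint: Vec<String>,
}

/// Returns the names of required fields that are empty.
/// Assumption: `depends_on` may be empty for root items, so it is not checked.
fn missing_required_fields(item: &WorkItem) -> Vec<&'static str> {
    let mut missing = Vec::new();
    if item.id.trim().is_empty() {
        missing.push("id");
    }
    if item.owner_type.trim().is_empty() {
        missing.push("owner_type");
    }
    if item.risk_tier.trim().is_empty() {
        missing.push("risk_tier");
    }
    if item.acceptance_criteria.is_empty() {
        missing.push("acceptance_criteria");
    }
    if item.verification_methods.is_empty() {
        missing.push("verification_methods");
    }
    if item.files_hint.is_empty() {
        missing.push("files_hint");
    }
    missing
}
```

A check like this could run in CI over the backlog so that incomplete items are flagged before they are scheduled.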

Program epics

Epic 1: Canonical context contract

Goal: make every context-bearing payload conform to one envelope.

Capabilities:

ContextEnvelope v1 schema and examples.
Adapters for MCP retrieval, session summary, task context, and remote handoff.
Dual-write and canonical-write migration support.

How to implement:

Add envelope structs and serde adapters in Rust.
Normalize legacy payloads at ingress boundaries.
Emit versioned contract-validation tests for known payload fixtures.
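
The "normalize at ingress" step can be sketched as a one-way conversion from a legacy shape into the envelope. This is an illustration only: the field set, the `EnvelopeKind` variants, and `LegacyTaskContext` are hypothetical stand-ins, and serde derives are left out so the sketch has no dependencies.

```rust
/// Hypothetical envelope variants; the real set is defined by the v1 schema.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum EnvelopeKind {
    Retrieval,
    SessionSummary,
    TaskContext,
    RemoteHandoff,
}

/// Hypothetical minimal envelope; real fields come from the contract.
#[derive(Debug, Clone, PartialEq)]
struct ContextEnvelope {
    schema_version: u32,
    kind: EnvelopeKind,
    session_id: Option<String>,
    body: String,
}

/// Stand-in for a pre-envelope payload: bare text plus an optional session.
struct LegacyTaskContext {
    text: String,
    session: Option<String>,
}

/// Ingress adapter: a legacy shape becomes an envelope once, at the boundary.
impl From<LegacyTaskContext> for ContextEnvelope {
    fn from(legacy: LegacyTaskContext) -> Self {
        ContextEnvelope {
            schema_version: 1,
            kind: EnvelopeKind::TaskContext,
            // A blank session string is treated as absent, not as a real ID.
            session_id: legacy.session.filter(|s| !s.trim().is_empty()),
            body: legacy.text,
        }
    }
}
```

Keeping the conversion in a `From` impl means downstream code only ever sees `ContextEnvelope`, which is what makes fixture-based contract tests practical.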

Epic 2: Session and thread identity

Goal: eliminate accidental context bleed.

Capabilities:

Canonical session/thread/workspace identity contract.
Default-session hardening rules.
Session lineage on task submit, handoff, and remote execution.

How to implement:

Introduce session identity helpers in MCP and orchestrator.
Reject or relabel implicit defaults on remote/handoff paths.
Add invariants and regression tests for concurrent sessions.
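
The "reject or relabel" rule can be expressed as one pure function that every submit and handoff path calls. A minimal sketch, assuming hypothetical names (`Route`, `LineageMode`, `check_session_lineage`) and assuming that local work may still use the implicit default session:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Route {
    Local,
    Handoff,
    Remote,
}

/// Rollout mode for missing lineage on boundary-crossing work.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LineageMode {
    WarnOnly,
    Reject,
}

#[derive(Debug, PartialEq)]
enum LineageDecision {
    Accepted(String),
    Relabeled { session_id: String, reason: &'static str },
    Rejected { reason: &'static str },
}

/// `fresh_id` is the explicit session ID to use when relabeling in warn-only mode.
fn check_session_lineage(
    session_id: Option<&str>,
    route: Route,
    mode: LineageMode,
    fresh_id: &str,
) -> LineageDecision {
    // An empty ID or the literal "default" counts as an implicit default.
    let explicit = session_id.filter(|s| !s.is_empty() && *s != "default");
    match (explicit, route, mode) {
        (Some(id), _, _) => LineageDecision::Accepted(id.to_string()),
        // Assumption: local work may still fall back to the default session.
        (None, Route::Local, _) => LineageDecision::Accepted("default".to_string()),
        (None, _, LineageMode::WarnOnly) => LineageDecision::Relabeled {
            session_id: fresh_id.to_string(),
            reason: "implicit_default_on_boundary",
        },
        (None, _, LineageMode::Reject) => LineageDecision::Rejected {
            reason: "missing_session_lineage",
        },
    }
}
```

The `reason` strings are placeholders for whatever telemetry vocabulary the program settles on. Returning them from the decision keeps the rejection cause visible to callers and traces.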

Epic 3: Compaction and note-taking

Goal: preserve long-horizon coherence without bloating prompts.

Capabilities:

Envelope-based compaction outputs.
Structured notes and session summaries.
Compaction lineage and regeneration policy.

How to implement:

Create summary and note envelope variants.
Persist compaction generation and parent lineage.
Add selection policy that prefers summaries plus recent working set over raw history.
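
One way to read "summaries plus recent working set over raw history" is as a budgeted selection: take summaries first, then the newest raw turns, and skip anything that does not fit. The sketch below assumes a token count per item and a hypothetical `ContextItem` type. It is not the selection policy itself, only an illustration of the preference order.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ItemKind {
    Summary,
    RawTurn,
}

#[derive(Debug, Clone, PartialEq)]
struct ContextItem {
    kind: ItemKind,
    turn: u32,
    tokens: usize,
}

/// Picks summaries first, then up to `recent_turns` of the newest raw turns,
/// skipping any item that would exceed the remaining token budget.
/// The result is returned in chronological order.
fn select_context(items: &[ContextItem], budget: usize, recent_turns: usize) -> Vec<ContextItem> {
    let mut remaining = budget;
    let mut picked = Vec::new();

    for item in items.iter().filter(|i| i.kind == ItemKind::Summary) {
        if item.tokens <= remaining {
            remaining -= item.tokens;
            picked.push(item.clone());
        }
    }

    // Recent working set: newest raw turns first.
    let mut raw: Vec<&ContextItem> = items
        .iter()
        .filter(|i| i.kind == ItemKind::RawTurn)
        .collect();
    raw.sort_by(|a, b| b.turn.cmp(&a.turn));
    for item in raw.into_iter().take(recent_turns) {
        if item.tokens <= remaining {
            remaining -= item.tokens;
            picked.push(item.clone());
        }
    }

    picked.sort_by_key(|i| i.turn);
    picked
}
```

Older raw turns are never selected here, which is why compaction lineage matters: a summary has to be traceable to the history it replaced.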

Epic 4: Retrieval policy engine

Goal: make search-vs-memory decisions explicit and consistent.

Capabilities:

Shared trigger evaluation across MCP and orchestrator.
Risk-tier to retrieval-policy mapping.
Budget-aware injection and refresh rules.

How to implement:

Centralize trigger logic in a policy module rather than duplicating it in tool handlers.
Thread policy version through retrieval diagnostics and envelopes.
Emit traces for every retrieval decision.
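
A shared policy module could expose a single evaluation function whose result carries both the decision and its explanation. The sketch below is hypothetical throughout: the tier names, surfaces, per-surface limits, reason strings, and `POLICY_VERSION` value are placeholders chosen for illustration.

```rust
/// Placeholder version tag, threaded through diagnostics and envelopes.
const POLICY_VERSION: &str = "retrieval-policy/sketch-1";

#[derive(Debug, Clone, Copy, PartialEq)]
enum RiskTier {
    Low,
    Medium,
    High,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Surface {
    Preamble,
    Tool,
    TaskSubmit,
}

#[derive(Debug, PartialEq)]
struct PolicyDecision {
    retrieve: bool,
    max_hits: usize,
    reason: &'static str,
    policy_version: &'static str,
}

/// Single decision point intended to be called by both MCP and orchestrator code.
fn evaluate_retrieval(risk: RiskTier, surface: Surface, memory_is_fresh: bool) -> PolicyDecision {
    // Illustrative per-surface budgets; real values belong in policy config.
    let surface_limit = match surface {
        Surface::Preamble => 3,
        Surface::Tool => 10,
        Surface::TaskSubmit => 5,
    };
    let (retrieve, reason) = match (risk, memory_is_fresh) {
        (RiskTier::High, _) => (true, "high_risk_requires_evidence"),
        (_, false) => (true, "memory_stale"),
        (_, true) => (false, "memory_fresh"),
    };
    PolicyDecision {
        retrieve,
        max_hits: if retrieve { surface_limit } else { 0 },
        reason,
        policy_version: POLICY_VERSION,
    }
}
```

Because the function is pure, a parity test can feed the same inputs through each call site and compare the resulting `PolicyDecision` values directly.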

Epic 5: Corrective retrieval and evidence repair

Goal: recover when first-pass retrieval is weak or contradictory.

Capabilities:

Retrieval quality evaluator.
Query/corpus rewrite stage.
Escalation and replan contract.

How to implement:

Convert evidence-quality and contradiction metrics into decision thresholds.
Add a second-pass retrieval mode with rewritten query and recommended corpora.
Make Socrates and planning consume the correction result explicitly.
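
Turning quality metrics into decision thresholds might look like the following sketch. The metric names, threshold values used in any caller, and the `NextStep` outcomes are assumptions for illustration. The point is only that the second pass is bounded and that exhaustion produces an explicit escalation rather than a silent retry.

```rust
/// Hypothetical first-pass quality signals, each in the range 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
struct RetrievalQuality {
    evidence_score: f32,
    contradiction_ratio: f32,
}

#[derive(Debug, Clone, Copy)]
struct Thresholds {
    min_evidence: f32,
    max_contradiction: f32,
    /// Total retrieval passes allowed, including the first.
    max_passes: u32,
}

#[derive(Debug, PartialEq)]
enum NextStep {
    Accept,
    SecondPass { reason: &'static str },
    Escalate { reason: &'static str },
}

/// Decides what to do after `passes_done` retrieval passes have completed.
fn next_step(q: RetrievalQuality, t: Thresholds, passes_done: u32) -> NextStep {
    let problem = if q.contradiction_ratio > t.max_contradiction {
        Some("contradictory_evidence")
    } else if q.evidence_score < t.min_evidence {
        Some("weak_evidence")
    } else {
        None
    };
    match problem {
        None => NextStep::Accept,
        // Retry only while the pass budget allows it.
        Some(reason) if passes_done < t.max_passes => NextStep::SecondPass { reason },
        Some(reason) => NextStep::Escalate { reason },
    }
}
```

The `reason` carried by `SecondPass` is what a rewrite stage would use to choose between broadening the query and switching corpora.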

Epic 6: Search-plane unification

Goal: expose the same retrieval semantics to all surfaces.

Capabilities:

Common budgets for preamble, tool, and task-submit retrieval.
Corpus selection policy that covers memory, knowledge, chunks, repo, and future web.
Stable retrieval evidence shape for both local and remote use.

How to implement:

Move per-surface limits into policy config.
Preserve both lexical and vector diagnostics visibly.
Add support for a future web-research corpus without changing envelope shape.

Epic 7: Handoff and A2A context integrity

Goal: make agent handoffs stateful, structured, and debuggable.

Capabilities:

Handoff payloads carry normalized context lineage.
A2A messages include session/thread/task identity.
Handoff policy specifies what is copied, summarized, or refreshed.

How to implement:

Add context-envelope wrappers to handoff and A2A send paths.
Preserve sender and receiver identity in every handoff span.
Add tests for local and remote handoff continuity.

Epic 8: MENs and Populi remote context delivery

Goal: make remote execution context-safe and single-owner.

Capabilities:

Remote task envelopes carry context lineage and artifact refs.
A2ARetrievalRequest/Response/Refinement become production flows, not just contracts.
Lease-aware remote result reconciliation.

How to implement:

Extend RemoteTaskEnvelope population to include context refs or embedded envelope snapshots.
Add remote retrieval worker handling using shared vox-search.
Reconcile lease, task, and context lineage at result ingestion.
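
Reconciliation at result ingestion can be sketched as a sequence of checks that each produce a named rejection. The `Lineage` struct, the timestamp representation, and the reason strings below are hypothetical and do not describe the actual RemoteTaskEnvelope layout.

```rust
/// Hypothetical lineage triple carried by a remote task and echoed in its result.
#[derive(Debug, Clone, PartialEq)]
struct Lineage {
    task_id: String,
    session_id: String,
    lease_id: String,
}

#[derive(Debug, PartialEq)]
enum Reconcile {
    Accept,
    Reject(&'static str),
}

/// Compares what the requester recorded at dispatch with what the worker reported.
/// `now` and `lease_expires_at` are seconds on the same clock (an assumption).
fn reconcile_result(
    expected: &Lineage,
    reported: &Lineage,
    lease_expires_at: u64,
    now: u64,
) -> Reconcile {
    if reported.task_id != expected.task_id {
        return Reconcile::Reject("task_mismatch");
    }
    // Single-owner rule: only the current lease holder may deliver a result.
    if reported.lease_id != expected.lease_id {
        return Reconcile::Reject("lease_mismatch");
    }
    if now > lease_expires_at {
        return Reconcile::Reject("lease_expired");
    }
    if reported.session_id != expected.session_id {
        return Reconcile::Reject("session_lineage_mismatch");
    }
    Reconcile::Accept
}
```

Checking the lease before the session means a result from a superseded worker is rejected for the ownership reason, which is usually the more actionable signal.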

Epic 9: Conflict resolution and governance

Goal: merge or escalate contradictory context deterministically.

Capabilities:

Conflict taxonomy and precedence engine.
Evidence-bound overwrite rules.
Tombstoning, expiry, dedupe, and stale suppression.

How to implement:

Implement conflict classifier before merge.
Apply strategy by conflict class rather than one global merge rule.
Persist conflict events for debugging and KPI measurement.
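
"Classify before merge, then apply a strategy per class" can be illustrated with two small functions. Everything here is an assumption made for the sketch: the conflict classes, the precedence order (system highest, derived lowest), and the choice to escalate when a lower-precedence source contradicts a higher one.

```rust
/// Context sources in ascending precedence (assumed order for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Source {
    Derived,
    Peer,
    Policy,
    User,
    System,
}

#[derive(Debug, Clone, PartialEq)]
struct Claim {
    key: String,
    value: String,
    source: Source,
    observed_at: u64,
}

#[derive(Debug, PartialEq)]
enum ConflictClass {
    Unrelated,
    Duplicate,
    Stale,
    Contradiction,
}

#[derive(Debug, PartialEq)]
enum Resolution {
    KeepBoth,
    DropIncoming,
    ReplaceExisting,
    Escalate,
}

fn classify(existing: &Claim, incoming: &Claim) -> ConflictClass {
    if existing.key != incoming.key {
        ConflictClass::Unrelated
    } else if existing.value == incoming.value {
        ConflictClass::Duplicate
    } else if existing.source == incoming.source {
        // Same source, different value: one of the two is out of date.
        ConflictClass::Stale
    } else {
        ConflictClass::Contradiction
    }
}

/// Applies one strategy per conflict class instead of a single global merge rule.
fn resolve(existing: &Claim, incoming: &Claim) -> Resolution {
    match classify(existing, incoming) {
        ConflictClass::Unrelated => Resolution::KeepBoth,
        ConflictClass::Duplicate => Resolution::DropIncoming,
        ConflictClass::Stale if incoming.observed_at >= existing.observed_at => {
            Resolution::ReplaceExisting
        }
        ConflictClass::Stale => Resolution::DropIncoming,
        ConflictClass::Contradiction if incoming.source > existing.source => {
            Resolution::ReplaceExisting
        }
        // A lower-precedence contradiction is surfaced, never merged silently.
        ConflictClass::Contradiction => Resolution::Escalate,
    }
}
```

In shadow mode, the `Resolution` would be logged as a conflict event without being applied, which gives the KPI data needed before enforcement.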

Epic 10: Context observability

Goal: make context behavior traceable end to end.

Capabilities:

OpenTelemetry-aligned spans and events.
Stable context lifecycle event names.
Dashboards and query surfaces for debugging.

How to implement:

Add explicit span hooks at capture, retrieve, compact, select, handoff, resolve, and gate stages.
Include conversation, task, session, agent, and node identifiers.
Add operator-facing views for policy version, merge strategy, and retrieval path.
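
The span hooks can share one helper so that every stage emits the same identifiers and never the payload itself. The sketch below uses only the standard library. The span names and the `SpanIds` struct are hypothetical, and a real implementation would presumably emit through whatever tracing layer the crates already use rather than formatting a string.

```rust
/// Lifecycle stages named in the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Capture,
    Retrieve,
    Compact,
    Select,
    Handoff,
    Resolve,
    Gate,
}

impl Stage {
    /// Hypothetical stable span names; the real names belong in the telemetry contract.
    fn span_name(self) -> &'static str {
        match self {
            Stage::Capture => "context.capture",
            Stage::Retrieve => "context.retrieve",
            Stage::Compact => "context.compact",
            Stage::Select => "context.select",
            Stage::Handoff => "context.handoff",
            Stage::Resolve => "context.resolve",
            Stage::Gate => "context.gate",
        }
    }
}

/// Identifiers attached to every context span.
struct SpanIds<'a> {
    conversation_id: &'a str,
    task_id: &'a str,
    session_id: &'a str,
    agent_id: &'a str,
    node_id: &'a str,
}

/// Renders a span record that includes the payload size but not its content.
fn span_record(stage: Stage, ids: &SpanIds<'_>, payload: &str) -> String {
    format!(
        "{} conversation={} task={} session={} agent={} node={} payload_bytes={}",
        stage.span_name(),
        ids.conversation_id,
        ids.task_id,
        ids.session_id,
        ids.agent_id,
        ids.node_id,
        payload.len()
    )
}
```

Recording only `payload_bytes` keeps spans useful for debugging size and flow while leaving sensitive content out of logs.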

Epic 11: Evaluation and release gates

Goal: block regressions before context bugs reach users.

Capabilities:

Deterministic session and retrieval test corpus.
Eval harness for handoff and corrective retrieval.
Rollout scorecards and CI gates.

How to implement:

Add fixed fixtures for chat, retrieval, and handoff cases.
Run per-epic benchmark suites with baseline comparisons.
Promote gates from shadow to enforce only after metrics stabilize.
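
"Only after metrics stabilize" needs an operational definition. One possible reading, sketched below with hypothetical names, is that the last `window` evaluation runs must all stay within a tolerance of the recorded baseline. Demotion from enforce back to shadow is deliberately not modeled here and is assumed to be handled by a kill-switch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum GateMode {
    Shadow,
    Enforce,
}

/// One evaluation run: the observed score and the baseline it is compared to.
struct EvalRun {
    score: f64,
    baseline: f64,
}

/// Promotes a shadow gate only when the most recent `window` runs all score
/// no worse than `baseline - tolerance`. An enforcing gate stays enforcing.
fn next_gate_mode(current: GateMode, runs: &[EvalRun], window: usize, tolerance: f64) -> GateMode {
    if current == GateMode::Enforce {
        return GateMode::Enforce;
    }
    // Not enough history yet: stay in shadow.
    if window == 0 || runs.len() < window {
        return GateMode::Shadow;
    }
    let recent = &runs[runs.len() - window..];
    let stable = recent.iter().all(|r| r.score >= r.baseline - tolerance);
    if stable {
        GateMode::Enforce
    } else {
        GateMode::Shadow
    }
}
```

Requiring a full window of history prevents a single good run from promoting a gate.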

Epic 12: Rollout, migration, and deprecation

Goal: ship safely without breaking existing clients or stored data.

Capabilities:

Dual-write transition plan.
Fallback and kill-switch matrix.
Legacy payload retirement criteria.

How to implement:

Use additive payload fields first.
Record adoption and failure rates by surface.
Remove legacy shapes only after coverage and error budgets pass.

Second-pass critique and corrections

What the first blueprint got right

It chose the correct architectural center: a canonical context envelope.
It identified the right major systems: MCP, orchestrator, search, Socrates, Populi, and MENs.
It prioritized anti-bleed, retrieval policy, handoff, conflict handling, and telemetry in the right broad order.

What the first blueprint under-specified

Weak spot in v1	Why it is a problem	Correction in this revision
“centralize policy” was too vague	current code has multiple trigger enums and call-site ownership boundaries	use a shared policy contract and parity tests before extracting shared code
compaction was listed too casually	there is no obvious single compaction runtime owner yet	add a compaction-ownership design slice before implementation
handoff work was too small	current handoff payloads and accept path do not preserve session/thread context	break handoff into identity, payload, context-store bridge, and verification tasks
remote context delivery was too compressed	remote relay ordering and payload shape are both incomplete	split remote work into ordering fix, payload expansion, worker intake, and result reconciliation
conflict handling was scheduled too late	trust/precedence fields influence adapter design immediately	define minimal conflict vocabulary at contract stage and delay full enforcement only
task counts were too low for distributed work	A2A, MENs, and corrective retrieval each require many integration and rollout steps	expand complex epics into explicit operation packs

Corrected sequencing

The safer program order is:

contract and identity,
current-path telemetry,
ordering fixes on submit and handoff paths,
retrieval policy parity,
corrective retrieval,
compaction ownership and implementation,
remote context payload expansion,
remote retrieval delegation,
conflict engine shadow mode,
enforce only after eval and canary evidence.

Explicit operation packs by epic

This section expands each epic into concrete operations. These are intentionally explicit so that complex work does not collapse into underspecified “implementation” tasks.

Epic 1 operations: canonical context contract

Define the Rust ContextEnvelope type and serde helpers.
Create fixture examples for each envelope variant.
Add validation tests against contracts/communication/context-envelope.schema.json.
Define a backward-compatible “legacy projection” API for legacy payloads.
Add versioned parsing behavior: strict for tests, permissive for runtime additive fields.
Add tracing helpers that log envelope IDs without dumping sensitive payloads.
Document allowed producers and consumers for each variant.
Add a migration note for legacy shapes that cannot losslessly round-trip.

Entry points:

crates/vox-orchestrator/src/mcp_tools/memory/retrieval.rs
crates/vox-orchestrator/src/socrates.rs
crates/vox-orchestrator/src/handoff.rs
crates/vox-orchestrator/src/a2a/envelope.rs

Epic 2 operations: session and thread identity

Define canonical identity fields and defaulting rules.
Add MCP helper for explicit session allocation and validation.
Audit all current call sites that fall back to the implicit "default" session.
Tag remote or handoff-bound work as requiring explicit lineage.
Thread session and thread IDs through task submit and planning paths.
Add session lineage fields to handoff payloads.
Add rejection or warn-only modes for missing lineage.
Add concurrent-session tests for bleed prevention.
Add migration behavior for existing clients that omit session IDs.
Emit telemetry whenever fallback defaulting still occurs.

Entry points:

crates/vox-orchestrator/src/mcp_tools/tools/chat_tools/chat/message.rs
crates/vox-orchestrator/src/mcp_tools/tools/task_tools.rs
crates/vox-orchestrator/src/orchestrator/task_dispatch/submit/goal.rs
crates/vox-orchestrator/src/handoff.rs
crates/vox-orchestrator/src/orchestrator/agent_lifecycle.rs

Epic 3 operations: compaction and note-taking

Decide compaction owner: MCP turn loop, orchestrator, or dedicated helper surface.
Define compaction input and output envelope shapes.
Define what raw history is preserved, summarized, or dropped.
Define compaction lineage fields and generation increments.
Add summary storage and retrieval rules.
Add note-taking envelope shape distinct from compaction summaries.
Define reinjection priority between raw history, summaries, and notes.
Add compaction-trigger thresholds and disable flags.
Add tests for factual continuity after compaction.
Add tests for not re-injecting stale or superseded summaries.

Important critique:

The first blueprint assumed compaction could be scheduled immediately. The codebase currently has memory and transcript surfaces but not a single obvious compaction runtime owner, so this epic must start with design and ownership, not code-first implementation.

Epic 4 operations: retrieval policy engine

Define a policy contract shared by MCP and orchestrator call sites.
Normalize trigger names and semantics across surfaces.
Define risk-tier classes and mapping to retrieval requirements.
Define common budget knobs for preamble, explicit tool, and submit-time retrieval.
Add a policy-evaluation result struct with explanation fields.
Add parity tests comparing MCP and orchestrator decisions for the same input.
Preserve policy version in all retrieval evidence envelopes.
Add operator-visible traces for “why retrieval ran” or “why retrieval skipped.”
Add deny-list or forced-search rules for high-risk categories.
Add canary mode for policy decisions before enforcement.

Important critique:

The first blueprint talked about “centralizing trigger logic,” but the correct first move is to centralize the contract and semantics, not necessarily the code module, because current crate ownership is still split.

Epic 5 operations: corrective retrieval and evidence repair

Convert retrieval quality signals into a first-pass evaluator.
Define thresholds for contradiction, narrow evidence, stale evidence, and weak coverage.
Implement rewrite rules for query broadening and narrowing.
Implement corpus override or recommendation hints.
Preserve verification reason and verification query consistently.
Add retry budget and loop limit controls.
Thread corrective results into Socrates context and planning metadata.
Add explicit “still insufficient” escalation outputs.
Add eval cases where second pass improves outcome.
Add eval cases where second pass should stop and ask or abstain.

Epic 6 operations: search-plane unification

Inventory per-surface search limits and modes.
Move those settings into policy and env-backed config where appropriate.
Define a single evidence envelope surface for local and remote use.
Preserve backend provenance across MCP and orchestrator callers.
Make RRF and corpus-specific contributions visible in telemetry.
Define how Tantivy and Qdrant participation should be surfaced to callers.
Add explicit deferred-scope handling for WebResearch.
Add tests for exact-token, semantic, and hybrid search parity.
Add docs describing supported vs deferred corpora.

Important critique:

The first blueprint implied that future web corpus integration was near at hand. The code review shows it should remain explicitly deferred until a real executor and trust model exist.

Epic 7 operations: handoff and A2A context integrity

Extend HandoffPayload with session/thread/context-envelope references.
Define which fields are embedded vs referenced by durable artifact IDs.
Add validation invariants for session/thread continuity.
Bridge handoff payloads to context-store retrieval envelopes where appropriate.
Add sender/receiver identity traces.
Add local A2A message wrappers for envelope-aware handoff.
Add context-transfer tests for local handoff.
Add stale-handoff tests for missing or expired lineage.
Add policy for partial handoff versus hard reset.
Add documentation for receiver obligations before resuming work.

Epic 8 operations: MENs and Populi remote context delivery

Fix submit ordering so required context exists before remote relay uses it.
Expand RemoteTaskEnvelope population with lineage and context references.
Decide when context is embedded versus passed as durable artifact refs.
Add worker-side intake that can parse the richer envelope.
Add remote retrieval request handling using A2ARetrievalRequest.
Add remote retrieval response handling and requester-side normalization.
Add refinement follow-up flow for weak remote evidence.
Add result reconciliation against lease, task, and session lineage.
Add failure handling for missing artifacts or expired context.
Add kill-switches and staged rollout controls.
Add remote inbox, relay, and result tests.
Add explicit operator docs for context-safe remote execution.

Important critique:

This was the most under-decomposed part of the first blueprint. Distributed context delivery is not one capability. It is a chain of ordering, serialization, transport, worker intake, result reconciliation, and rollback work.

Epic 9 operations: conflict resolution and governance

Define minimal conflict classes in the envelope contract.
Add a conflict classifier operating on normalized envelopes.
Define precedence order across system, user, policy, peer, and derived context.
Add freshness and expiry rules.
Add evidence-required overwrite rules for high-risk updates.
Add dedupe keys and tombstoning behavior.
Add event logging for conflict decisions.
Add shadow-mode merge strategy output before enforcement.
Add regression tests for semantic disagreement and stale-summary suppression.
Add docs for operator interpretation of conflict events.

Epic 10 operations: context observability

Define stable span names and event payload fields.
Map them to OpenTelemetry conventions where possible.
Add envelope, session, task, thread, agent, and node identifiers to traces.
Add sampling guidance so context-debugging spans are not dropped during rollout.
Add retrieval, handoff, compaction, and conflict dashboards or query specs.
Add correlation rules between local and remote events.
Add redaction guidance for payload-bearing spans and logs.
Add canary review queries and operator runbook snippets.

Epic 11 operations: evaluation and release gates

Define deterministic fixture families by failure mode.
Create session bleed test corpus.
Create retrieval trigger parity test corpus.
Create contradiction and corrective-retrieval test corpus.
Create handoff continuity test corpus.
Create remote relay and remote result reconciliation test corpus.
Define scorecard formats and threshold interpretation.
Add shadow-vs-enforce comparison dashboards or reports.
Add CI gating order for unit, integration, eval, and canary evidence.

Epic 12 operations: rollout, migration, and deprecation

Define dual-write and dual-read stages by surface.
Add per-surface feature flags.
Define fallback behavior when envelope parsing fails.
Define compatibility behavior for missing lineage fields.
Define rollback conditions for each major epic.
Define telemetry thresholds required to move from shadow to enforce.
Define deprecation criteria for legacy payloads.
Define archival or replay strategy for legacy stored payloads.
Add operator-facing upgrade and rollback notes.

Capability generation rules

When splitting an epic into capabilities, every capability must answer:

What user-visible or operator-visible problem does it solve?
Which code surfaces own the behavior?
What evidence proves success?
What contexts can it break if incorrectly rolled out?

When splitting a capability into tasks, every task must:

change one contract, one policy, one test surface, or one rollout control at a time,
have a rollback path,
have an observable success signal,
avoid mixing unrelated surfaces in one PR unless the change is purely mechanical.

For complex distributed or multi-surface capabilities, add one more rule:

break sequencing-sensitive work into explicit ordering, serialization, transport, intake, reconciliation, and rollback tasks rather than one “wire it up” task.

Suggested epic-to-owner map

Epic	Primary owner	Secondary owner
canonical contract	orchestrator	mcp
session identity	mcp	orchestrator
compaction	mcp	orchestrator
retrieval policy	search	orchestrator
corrective retrieval	search	mcp
search-plane unification	search	mcp
handoff integrity	orchestrator	mcp
MENs/Populi context delivery	populi	orchestrator
conflict governance	orchestrator	search
observability	cross_cutting	ops
evaluation	tests	search
rollout and deprecation	ops	cross_cutting

Sequencing rules

Order of operations

Freeze the canonical contract and session identity model.
Instrument the current lifecycle before changing behavior.
Unify retrieval policy and corrective retrieval next.
Harden handoff and remote execution once envelope semantics are stable.
Introduce conflict-resolution enforcement after observability and tests exist.
Promote from shadow to enforce only after eval metrics hold.

What must not happen

Do not deploy remote context delivery before session lineage is explicit.
Do not enforce search requirements before the retrieval policy engine is shared.
Do not merge conflicting context silently once conflict classes are available.
Do not compact aggressively without compaction lineage and recovery tests.

Target scale

The following sizing is intentionally large because the system spans multiple crates and rollout phases:

Epic count	Capabilities per epic	Tasks per capability	Estimated total tasks
12	8-12	4-10	384-1440

This scale fits the program because the system already exists in partial form: the remaining work is integration, hardening, telemetry, and release engineering, each of which breaks down into many small, separately verifiable tasks.
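
The estimate in the table is the product of the per-level ranges. A small sketch (the function name is illustrative) makes the bounds reproducible if the ranges change:

```rust
/// Inclusive (min, max) range.
type Range = (u32, u32);

/// Multiplies the per-level ranges to get the total task range for the program.
fn estimated_tasks(epics: u32, capabilities_per_epic: Range, tasks_per_capability: Range) -> Range {
    (
        epics * capabilities_per_epic.0 * tasks_per_capability.0,
        epics * capabilities_per_epic.1 * tasks_per_capability.1,
    )
}
```

With the values from the table, `estimated_tasks(12, (8, 12), (4, 10))` gives 12 × 8 × 4 = 384 at the low end and 12 × 12 × 10 = 1440 at the high end.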

Verification posture

Each epic should include at least one of:

unit tests for adapters or policy logic,
integration tests across MCP/orchestrator/Populi seams,
deterministic eval fixtures,
telemetry review queries,
canary rollout checks.

The preferred rollout path is always:

contract added,
adapter added,
telemetry added,
shadow behavior enabled,
benchmark reviewed,
enforcement enabled only after eval and canary metrics hold.

Next document

The prioritized first implementation wave lives in:

Context management phase 1 backlog
