Agent Handoff Continuity & Context Compaction
1. Context
Evaluation of a multi-agent orchestration architecture covering conversation-history compaction, state sharing across agent invocations, and dynamic retrieval constraints.
2. Empirical Findings & Failure Modes
Silent Context Truncation
- Compaction surfaces (flat files, raw buffers) that rely on arbitrary line or byte limits truncate silently: foundational prompt instructions and constraints are quietly evicted.
- Failure Mode: Agents confidently emit incorrect results because they are unaware that their initialization logic was dropped.
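A minimal guard against this failure mode is to pin foundational instructions before applying any size limit. The sketch below is hypothetical (the message schema, `pinned` flag, and byte budget are assumptions, not the evaluated system's API):

```python
# Hypothetical sketch: compact a message history under a byte budget
# while guaranteeing pinned (foundational) messages are never evicted.

def compact_history(messages, max_bytes):
    """messages: list of dicts like {"role": ..., "content": ..., "pinned": bool}."""
    pinned = [m for m in messages if m.get("pinned")]
    others = [m for m in messages if not m.get("pinned")]

    # Reserve budget for pinned content first; fail loudly, never silently.
    budget = max_bytes - sum(len(m["content"].encode()) for m in pinned)
    if budget < 0:
        raise ValueError("Pinned instructions alone exceed the context budget")

    kept = []
    # Walk newest-first so the most recent turns survive truncation.
    for m in reversed(others):
        size = len(m["content"].encode())
        if size <= budget:
            kept.append(m)
            budget -= size
    kept.reverse()
    return pinned + kept
```

The key property is that eviction is explicit and ordered (oldest unpinned turns go first), rather than an arbitrary tail cut that can delete the system prompt.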
Context Bleed in Multi-Agent Handoffs
- Passing the full conversational history of Agent A into Agent B pollutes Agent B's reasoning context.
- Failure Mode: Planner agents hallucinate logic derived from the raw tool outputs of downstream worker agents.
Identity Smuggling & Infinite Loops
- Without cryptographically bound session boundaries (a thread_id tied to each handoff), agents suffer identity confusion across invocations.
- Failure Mode: Agents enter infinite cycles of output rejection ("Mirror Mirror" loop) or improperly assume the authority level of an upstream caller.
Naive RAG Attention Dilution
- Hardcoding "always retrieve" policies across tool suites floods context windows with tangentially related chunks ("hard distractors"), diluting attention and burning budget.
3. Validated Architectural Adjustments
- Opaque Execution (A2A Protocol): Implement Agent-to-Agent opaque execution. Do not pass conversational transcripts across boundaries. Pass strictly scoped Task definitions, and leverage secure URI "Artifacts" for large data transmission.
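The opaque-handoff shape can be sketched as follows. The `Task` and `Artifact` types and the transcript heuristic are illustrative assumptions, not the A2A specification's exact schema:

```python
# Hypothetical sketch of an opaque A2A-style handoff: the caller sends a
# scoped Task plus artifact references, never its own conversation transcript.
from dataclasses import dataclass

@dataclass(frozen=True)
class Artifact:
    uri: str          # e.g. a signed object-store URL; contents fetched on demand
    media_type: str

@dataclass(frozen=True)
class Task:
    task_id: str
    instruction: str   # strictly scoped: what to do, not how we got here
    artifacts: tuple = ()  # large payloads travel by reference, not inline

def handoff(task: Task) -> dict:
    # Boundary check (crude heuristic, an assumption of this sketch):
    # reject payloads that look like a raw transcript dump.
    if "role:" in task.instruction or len(task.instruction) > 2_000:
        raise ValueError("Handoff payload looks like a transcript, not a Task")
    return {"task_id": task.task_id, "accepted": True}
```

The design choice: because the receiving agent only ever sees `instruction` and `artifacts`, context bleed from the caller's transcript is structurally impossible rather than merely discouraged.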
- On-Behalf-Of (OBO) Token Binding: Enforce cryptographic provenance by attaching user-scoped OBO tokens and unique Thread IDs to every agent handoff.
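A minimal sketch of the binding, assuming a symmetric key shared by the orchestrator (a real deployment would likely use asymmetric signatures or a token service; the key and claim names here are assumptions):

```python
# Hypothetical sketch: bind each handoff to a user identity and thread_id
# via an HMAC, so a downstream agent can verify provenance before acting.
import hmac, hashlib, json

SECRET = b"shared-orchestrator-key"  # assumption: distributed out of band

def sign_handoff(user_id: str, thread_id: str, task_id: str) -> dict:
    payload = {"user_id": user_id, "thread_id": thread_id, "task_id": task_id}
    msg = json.dumps(payload, sort_keys=True).encode()
    payload["sig"] = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return payload

def verify_handoff(token: dict) -> bool:
    claims = {k: v for k, v in token.items() if k != "sig"}
    msg = json.dumps(claims, sort_keys=True).encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token.get("sig", ""))
```

Because the thread_id is inside the signed claims, an agent cannot silently inherit a different session or a caller's authority: any tampered claim invalidates the signature.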
- Unified CRAG Gateway: Strip generic RAG triggers. Deploy Corrective Retrieval-Augmented Generation (CRAG) via a lightweight evaluator model to dynamically route requests between Trust Memory, Vector Retrieval, or Web searches.
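The routing logic reduces to a small decision function; the thresholds and route names below are assumptions for illustration, and `evaluator` stands in for the lightweight scoring model:

```python
# Hypothetical sketch of a CRAG-style gateway: a lightweight evaluator
# scores retrieval confidence and routes to memory, vector store, or web.

def route_query(query: str, evaluator) -> str:
    """evaluator(query) -> float in [0, 1]; thresholds are assumptions."""
    score = evaluator(query)
    if score >= 0.8:
        return "trusted_memory"    # high confidence: answer from curated memory
    if score >= 0.4:
        return "vector_retrieval"  # ambiguous: consult the vector index
    return "web_search"            # low confidence: fall back to the open web
```

Only the chosen route's results enter the context window, which is what prevents the "always retrieve" attention dilution described above.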
- Asynchronous Memory Distillation: Separate active turns (Short-Term Memory) from durable persistence. Dedicate an async background worker to extract semantic key-value relationships from the transcript into a Graph/Vector store, preventing silent rolling truncation.
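The worker pattern can be sketched as below. The trivial `key: value` parse stands in for a real extraction model, and the queue/sentinel shape is an assumption of this sketch:

```python
# Hypothetical sketch: a background worker distills finished turns into
# key-value "facts" for a long-term store, keeping the live window small.
import asyncio

def distill(turn: str) -> dict:
    # Assumption: in practice this would call a small extraction model;
    # here we fake it with a trivial "key: value" parse.
    key, _, value = turn.partition(":")
    return {key.strip(): value.strip()} if value else {}

async def memory_worker(queue: asyncio.Queue, store: dict):
    while True:
        turn = await queue.get()
        if turn is None:  # sentinel: shut down cleanly
            break
        store.update(distill(turn))
        queue.task_done()
```

Because distillation runs off the hot path, the active window can be compacted aggressively without losing facts: they have already been persisted to the durable store.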