Research Notes: Achieving Serverless-like Performance with MCP
Context
The goal is to analyze what can be learned from connectionless or "serverless" paradigms, such as UCP (Universal Commerce Protocol) or conceptually connectionless transports like UDP, to enhance the Model Context Protocol (MCP) in Vox. We want to reduce overhead and improve performance while preserving the power and compatibility of the existing MCP standard.
Findings & Enhancements for MCP
1. In-Memory Short-Circuiting (Fast Path)
Native Vox tools (like `read_file` or `write_file`) should completely bypass standard MCP JSON-RPC over stdio when called from an internal agent.
- How to apply: Implement a `NativeToolRegistry` that handles native file-system tool requests synchronously and in-process. This removes serialization, pipe overhead, and latency constraints.
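The registry described above might look like the following sketch. The `NativeToolRegistry` name comes from these notes; the function signature, error type, and dispatch-by-name shape are assumptions for illustration:

```rust
use std::collections::HashMap;
use std::fs;

// A native tool is just an in-process function: no JSON-RPC, no stdio pipe.
type NativeTool = fn(&str) -> Result<String, String>;

pub struct NativeToolRegistry {
    tools: HashMap<&'static str, NativeTool>,
}

impl NativeToolRegistry {
    pub fn new() -> Self {
        let mut tools: HashMap<&'static str, NativeTool> = HashMap::new();
        // read_file: the argument is a path, the result is the file contents.
        tools.insert("read_file", |path| {
            fs::read_to_string(path).map_err(|e| e.to_string())
        });
        Self { tools }
    }

    /// Synchronous, in-process dispatch. Returns `None` for non-native tools,
    /// which should then fall back to the normal MCP JSON-RPC path.
    pub fn call(&self, name: &str, arg: &str) -> Option<Result<String, String>> {
        self.tools.get(name).map(|tool| tool(arg))
    }
}
```

An orchestrator would try `registry.call(...)` first and only spin up the stdio transport when it returns `None`.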
2. Prompt Caching & Schema LRU
MCP often suffers from redundant schema transmissions during tool initialization.
- How to apply: Use an LRU `SchemaCache` to avoid re-serializing and re-sending tool descriptions on every request. Apply Anthropic's `cache_control` prompt-caching markers so schemas are only processed once per session by the LLM provider.
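A minimal sketch of such a cache, using only the standard library (the real `SchemaCache` in `vox-mcp` may store parsed schemas rather than strings):

```rust
use std::collections::HashMap;

/// Minimal LRU cache for serialized tool schemas.
pub struct SchemaCache {
    capacity: usize,
    map: HashMap<String, String>, // tool name -> serialized JSON schema
    order: Vec<String>,           // least recently used first
}

impl SchemaCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: Vec::new() }
    }

    pub fn get(&mut self, name: &str) -> Option<&String> {
        if self.map.contains_key(name) {
            // Move the key to the most-recently-used position.
            self.order.retain(|k| k != name);
            self.order.push(name.to_string());
            self.map.get(name)
        } else {
            None
        }
    }

    pub fn put(&mut self, name: String, schema: String) {
        if self.map.contains_key(&name) {
            self.order.retain(|k| k != &name);
        } else if self.map.len() >= self.capacity {
            // Evict the least recently used entry.
            let lru = self.order.remove(0);
            self.map.remove(&lru);
        }
        self.order.push(name.clone());
        self.map.insert(name, schema);
    }
}
```

The `Vec`-based recency list is O(n) per touch; a production cache would use a linked-list-backed LRU, but the eviction semantics are the same.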
3. Serverless Invocation & Streamable HTTP
To eliminate persistent server costs and avoid idle CPU overhead, MCP servers can be scaled down to zero.
- How to apply: Follow the SSE (Server-Sent Events) or streamable-HTTP chunked-encoding model. Instead of a long-lived process, tools can be triggered via HTTP routes or lambda-like handlers (e.g. `awslabs/mcp`).
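The wire format for the SSE path is simple enough to sketch. This helper frames a single event per the SSE format (event name, `data:`-prefixed payload lines, blank-line terminator); how the tool-call result maps onto events is an assumption:

```rust
/// Frame one Server-Sent Event: an optional event name, the data payload,
/// and the blank line that terminates the event.
pub fn sse_event(event: Option<&str>, data: &str) -> String {
    let mut out = String::new();
    if let Some(name) = event {
        out.push_str(&format!("event: {}\n", name));
    }
    // SSE requires each payload line to carry its own `data:` prefix.
    for line in data.lines() {
        out.push_str(&format!("data: {}\n", line));
    }
    out.push('\n');
    out
}
```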
4. Dynamic Context & "Pull" vs "Push"
MCP typically pushes context proactively. Serverless patterns prefer pulling only what is immediately required.
- How to apply: Resources and templates in MCP should return lightweight URIs or pagination cursors first, streaming the bulk payload only when requested.
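The pull model can be sketched as a paginated resource: the first response carries only a small page and a cursor, and the bulk payload is fetched page by page on demand. The type names and page size here are illustrative, not MCP API:

```rust
/// A resource that returns a lightweight cursor instead of the full
/// payload; callers pull subsequent pages only when they need them.
pub struct PagedResource {
    items: Vec<String>,
    page_size: usize,
}

pub struct Page {
    pub items: Vec<String>,
    pub next_cursor: Option<usize>, // None = nothing left to pull
}

impl PagedResource {
    pub fn new(items: Vec<String>, page_size: usize) -> Self {
        Self { items, page_size }
    }

    pub fn read(&self, cursor: usize) -> Page {
        let end = (cursor + self.page_size).min(self.items.len());
        Page {
            items: self.items[cursor..end].to_vec(),
            next_cursor: if end < self.items.len() { Some(end) } else { None },
        }
    }
}
```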
Implementation Task Plan
The following tasks, each of roughly equal difficulty, advance these optimizations in our native infrastructure.
- Task 1: Complete the `SchemaCache` Implementation
  - Ensure the `vox-mcp` crate caches all tool JSON schemas with LRU eviction.
  - Implement and verify the `prompt_caching` formatting for Anthropic / OpenAI.
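For the Anthropic side of Task 1, prompt caching is requested by attaching a `cache_control` marker of type `ephemeral` to a content block. A minimal formatter, hand-rolling the JSON with deliberately naive escaping for the sketch:

```rust
/// Build a text content block carrying Anthropic's `cache_control` marker so
/// the serialized tool schemas are cached across requests. Escaping here
/// covers only quotes and backslashes; real code would use a JSON library.
pub fn cached_system_block(text: &str) -> String {
    let escaped = text.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        "{{\"type\":\"text\",\"text\":\"{}\",\"cache_control\":{{\"type\":\"ephemeral\"}}}}",
        escaped
    )
}
```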
- Task 2: Native Tool Short-Circuit
  - In `vox-mcp`, handle file tools (`read_file`, `write_file`) in-process for orchestrator agents without spawning a subprocess.
  - Enable and pass the integration test `test_native_read_file_short_circuit`.
- Task 3: Implement A2A (Agent-To-Agent) Connectionless Handoff
  - Implement lightweight context handoff in the `vox-mcp` crate instead of routing through full prompt evaluation.
  - Minimize JSON payload size by transmitting diffs or delta states between agents.
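The delta transmission in Task 3 can be sketched as a key-value diff, assuming agent context is representable as a string map (the actual state representation in `vox-mcp` is not specified here). Changed or added keys carry their new value; removed keys carry `None`:

```rust
use std::collections::HashMap;

/// Compute the delta between an old and new agent context so only the
/// changed portion crosses the agent boundary.
pub fn context_delta(
    old: &HashMap<String, String>,
    new: &HashMap<String, String>,
) -> HashMap<String, Option<String>> {
    let mut delta = HashMap::new();
    for (k, v) in new {
        if old.get(k) != Some(v) {
            delta.insert(k.clone(), Some(v.clone()));
        }
    }
    for k in old.keys() {
        if !new.contains_key(k) {
            delta.insert(k.clone(), None); // key was removed
        }
    }
    delta
}
```

The receiver applies the delta to its copy of the previous state instead of re-deriving context through full prompt evaluation.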
- Task 4: Set Up Compiler-Driven Data Extraction (CI/CD)
  - Add logic to the `vox check` command to emit training-data JSONL.
  - Prepare a script to generate instruction-code pairs for model sync.
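The JSONL emission in Task 4 could start from a helper like this one. The `instruction`/`code` field names are placeholders; the notes do not specify the record shape expected by model sync:

```rust
/// Serialize one instruction-code pair as a single-line JSONL record,
/// escaping the characters that would break a one-line JSON string.
pub fn jsonl_record(instruction: &str, code: &str) -> String {
    fn esc(s: &str) -> String {
        s.replace('\\', "\\\\")
            .replace('"', "\\\"")
            .replace('\n', "\\n")
            .replace('\t', "\\t")
    }
    format!(
        "{{\"instruction\":\"{}\",\"code\":\"{}\"}}",
        esc(instruction),
        esc(code)
    )
}
```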
- Task 5: Refine `check_search_index` in `vox-typeck`
  - Implement the missing type-checking blocks for `SearchIndexDecl` to ensure database stability.
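The exact shape of the missing `SearchIndexDecl` checks isn't specified in these notes, but one plausible piece is validating that every field an index declares actually exists on the indexed table. All names and the error format below are hypothetical:

```rust
/// Hypothetical check: every field named by a search-index declaration must
/// exist as a column on the target table; unknown fields produce errors.
pub fn check_search_index(index_fields: &[&str], table_columns: &[&str]) -> Vec<String> {
    index_fields
        .iter()
        .filter(|f| !table_columns.contains(*f))
        .map(|f| format!("search index references unknown column `{}`", f))
        .collect()
}
```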