"Unified orchestration — SSOT"

This document captures compatibility rules and opt-in migration toggles while MCP, CLI, and DeI share one orchestrator contract (vox-orchestrator).

Workspace journey store (Codex)

Repo-backed vox-mcp and vox-orchestrator-d open the primary VoxDb via connect_workspace_journey_optional (default .vox/store.db). Env: VOX_WORKSPACE_JOURNEY_STORE, VOX_WORKSPACE_JOURNEY_FALLBACK_CANONICAL (env SSOT). Daemon diagnostics: JSON-RPC method orch.workspace_journey (bind repository_id vs discovered repo).

Bridge / routing policy: Vox-first codegen remains the default MCP path (vox_generate_code, local inference server for vox generate); non-Vox edits stay bounded behind explicit tools and repository policy — see completion policy SSOT.

Journey envelope (v1): contracts/orchestration/journey-envelope.v1.schema.json is the machine SSOT for per-request metadata (journey_id, session_id, thread_id, trace/correlation ids, repository_id, origin_surface). MCP vox_chat_message embeds this shape in structured transcript payloads; CLI and daemon surfaces wire fields incrementally.
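As an illustration of the envelope shape, the sketch below mirrors the listed fields in a plain Rust struct. The struct name, which fields are optional, and the `missing_fields` helper are assumptions made for this example; the JSON Schema file remains the authority on required fields and types.

```rust
/// Hypothetical in-memory mirror of the journey envelope v1 fields.
/// Optionality is assumed here; consult the JSON Schema for the real rules.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct JourneyEnvelopeV1 {
    pub journey_id: String,
    pub session_id: String,
    pub thread_id: Option<String>,
    pub trace_id: Option<String>,
    pub correlation_id: Option<String>,
    pub repository_id: String,
    pub origin_surface: String,
}

impl JourneyEnvelopeV1 {
    /// Names of the (assumed) required fields that are still empty.
    pub fn missing_fields(&self) -> Vec<&'static str> {
        let required = [
            ("journey_id", &self.journey_id),
            ("session_id", &self.session_id),
            ("repository_id", &self.repository_id),
            ("origin_surface", &self.origin_surface),
        ];
        required
            .iter()
            .filter(|(_, value)| value.trim().is_empty())
            .map(|(name, _)| *name)
            .collect()
    }
}
```

A surface that wires fields incrementally could call `missing_fields` before embedding the envelope in a payload and report which ids it has not populated yet.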

Canonical MENS dev journey (Codex): Tables developer_journey_definitions / developer_journey_steps (baseline fragment developer_journeys) seed canonical_journey.v1.greenfield_vox_mens_devloop. MCP vox_journey_canonical_steps returns ordered step_json rows when VoxDb is attached. Human-readable limitation ids for journey maturity live in contracts/journeys/limitations.v1.yaml.

DeI planning on the daemon: JSON-line DeI methods ai.plan.new, ai.plan.replan, ai.plan.status, and ai.plan.execute are handled on the vox-orchestrator-d stdio surface (orch_daemon::dei_dispatch); docs may still say vox-dei-d as the logical stdio peer. Persistent plan rows require the same Codex VoxDb handle the orchestrator was built with.

Ownership: who writes what

| Concern | Embedded MCP (vox-mcp) | vox-orchestrator-d (daemon) | VoxDb / Turso |
| --- | --- | --- | --- |
| Session chat transcript (RAM) | Orchestrator ContextStore in-process | Same process model per ADR 022 until RPC parity | |
| Structured chat turns | chat_append_workspace_message + journey envelope v1 | Future orch.* parity for remote clients | conversation_messages, conversations |
| Legacy chat_transcripts rows | MCP chat path (dual-write) | Not primary writer today | chat_transcripts |
| Workspace journey attach / diagnostics | connect_workspace_journey_optional, MCP tooling | JSON-RPC orch.workspace_journey | journey + repo bind rows |
| Routing decisions (routing_decisions) | MCP chat / codegen tools; orchestrator AiTaskProcessor when DB attached | Same table when daemon shares DB | local-first SQLite |
| Unified routing experiment flag | VOX_UNIFIED_ROUTING (telemetry reason shape in vox-runtime::routing_telemetry) | | |

HITL Doubt Flow

The unified orchestrator integrates with the vox-dei Human-In-The-Loop (HITL) crate. When an agent detects ambiguity, it invokes the vox_doubt_task MCP tool, which transitions the task to TaskStatus::Doubted and emits a TaskDoubted event. The ResolutionAgent inside vox-dei then resolves the doubt with the user and submits an audit report that hooks into the gamification system (vox-ludus). For structural details, see the canonical HITL Doubt Loop SSOT.

Contract surfaces

  • Repo reconstruction campaigns: JSON Schema contracts/orchestration/repo-reconstruction.schema.json; benchmark tiers and KPI guidance in repo reconstruction benchmark ladder. Remote task envelopes may include optional exec_lease_id and campaign_id for mesh correlation (see ADR 017).
  • Types: vox_orchestrator::contract: TaskCapabilityHints, SessionContractEnvelope, OrchestrationMigrationFlags (orchestration_v2_enabled, legacy_orchestration_fallback), MCP ↔ DeI plan tool alignment (MCP_PLAN_TOOL_NAMES, DEI_PLAN_METHODS_NEW_REPLAN_STATUS).
  • Runtime config: vox_orchestrator::OrchestratorConfig — process-wide limits, Socrates gates, scaling knobs, and nested orchestration_migration (OrchestrationMigrationFlags). Loaded from Vox.toml [orchestrator] and VOX_ORCHESTRATOR_* env overrides via OrchestratorConfig::merge_env_overrides in crates/vox-orchestrator/src/config/.

Agent queue capabilities (TaskCapabilityHints)

On Orchestrator::spawn_agent, each new AgentQueue gets capabilities from merge_agent_capabilities (crates/vox-orchestrator/src/capability_probe.rs):

  1. Start from default_agent_capabilities in config / TOML.
  2. Overlay host probe via probe_host_capabilities: cpu_cores (from available_parallelism), arch (std::env::consts::ARCH), hostname (HOSTNAME / COMPUTERNAME, or sysinfo when built with system-metrics).
  3. Labels: config labels preserved first; probe-supplied labels appended without duplicates.
  4. GPU / NPU flags: operator config wins if already true; otherwise probe may set gpu_cuda when VOX_MESH_ADVERTISE_GPU=1|true (legacy workstation advertisement), or gpu_vulkan / gpu_webgpu / npu from the matching VOX_MESH_ADVERTISE_* vars (not driver probes). Optional VOX_MESH_DEVICE_CLASS fills device_class. See mobile / edge AI SSOT.
  5. min_vram_mb / min_cpu_cores: filled from probe only when unset in config.
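The precedence rules in steps 3 to 5 can be sketched as follows. This is a minimal illustration and not the code in capability_probe.rs: `CapabilityHints` is a hypothetical, trimmed-down stand-in for TaskCapabilityHints, and exact, case-sensitive matching of the advertise values is an assumption.

```rust
/// Hypothetical, trimmed-down stand-in for TaskCapabilityHints.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct CapabilityHints {
    pub labels: Vec<String>,
    pub gpu_cuda: bool,
    pub min_vram_mb: Option<u64>,
    pub min_cpu_cores: Option<u32>,
}

/// Merge operator config with a host probe using the documented precedence.
pub fn merge_capabilities(config: &CapabilityHints, probe: &CapabilityHints) -> CapabilityHints {
    let mut merged = config.clone();
    // Step 3: config labels stay first; probe labels are appended without duplicates.
    for label in &probe.labels {
        if !merged.labels.contains(label) {
            merged.labels.push(label.clone());
        }
    }
    // Step 4: operator config wins when already true; otherwise the probe may set the flag.
    merged.gpu_cuda = config.gpu_cuda || probe.gpu_cuda;
    // Step 5: the probe fills minimums only when config left them unset.
    merged.min_vram_mb = config.min_vram_mb.or(probe.min_vram_mb);
    merged.min_cpu_cores = config.min_cpu_cores.or(probe.min_cpu_cores);
    merged
}

/// Probe-side GPU advertisement: the variable accepts "1" or "true"
/// (exact matching is assumed for this sketch).
pub fn advertises_gpu(raw: Option<&str>) -> bool {
    matches!(raw, Some("1") | Some("true"))
}
```

Because the probe never overrides a value the operator set, a config that pins min_vram_mb keeps that number even when the probe reports something different.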

Routing reads capability_requirements on tasks and applies GPU / VRAM / min_cpu_cores / prefer_gpu_compute soft penalties in crates/vox-orchestrator/src/services/routing.rs (mens / Mens-style training hints).

When MCP polls GET /v1/populi/nodes, each row becomes a RemotePopuliRoutingHint: if last_seen_unix_ms is older than orchestrator stale_threshold_ms at poll time, heartbeat_stale is set and experimental Populi routing signals skip that node (maintenance / quarantine were already excluded).
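A minimal sketch of the staleness check, assuming a strict "older than" comparison against the threshold. `NodeRow` and `RoutingHint` are hypothetical subsets of the real row and of RemotePopuliRoutingHint, and the saturating subtraction for clock skew is a choice made for this example.

```rust
/// Hypothetical subset of a GET /v1/populi/nodes row.
pub struct NodeRow {
    pub node_id: String,
    pub last_seen_unix_ms: u64,
}

/// Hypothetical subset of RemotePopuliRoutingHint.
#[derive(Debug, PartialEq)]
pub struct RoutingHint {
    pub node_id: String,
    pub heartbeat_stale: bool,
}

/// Build a hint at poll time; a node is stale when its last heartbeat is
/// older than the threshold.
pub fn to_routing_hint(row: &NodeRow, now_unix_ms: u64, stale_threshold_ms: u64) -> RoutingHint {
    // saturating_sub avoids underflow if last_seen is slightly in the future.
    let age_ms = now_unix_ms.saturating_sub(row.last_seen_unix_ms);
    RoutingHint {
        node_id: row.node_id.clone(),
        heartbeat_stale: age_ms > stale_threshold_ms,
    }
}

/// Node ids that experimental routing signals may still consider.
pub fn routable_nodes(hints: &[RoutingHint]) -> Vec<&str> {
    hints
        .iter()
        .filter(|hint| !hint.heartbeat_stale)
        .map(|hint| hint.node_id.as_str())
        .collect()
}
```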

Optional VOX_ORCHESTRATOR_MESH_EXEC_LEASE_RECONCILE: same poll tick may call GET /v1/populi/exec/leases and compare each holder_node_id to the fresh node list (tracing target vox.mcp.populi_reconcile; Codex event mesh_exec_lease_reconcile when VOX_MESH_CODEX_TELEMETRY). Opt-in VOX_ORCHESTRATOR_MESH_EXEC_LEASE_AUTO_REVOKE performs POST /v1/populi/admin/exec-lease/revoke on mismatches (mesh/admin token; aggressive — see env SSOT).
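The reconcile comparison might look like the sketch below. It assumes a "mismatch" means the lease holder is absent from the fresh node list, which the text does not spell out; `ExecLease` and `orphaned_leases` are hypothetical names, and the HTTP calls themselves are omitted.

```rust
use std::collections::HashSet;

/// Hypothetical subset of a GET /v1/populi/exec/leases row.
pub struct ExecLease {
    pub lease_id: String,
    pub holder_node_id: String,
}

/// Lease ids whose holder does not appear in the fresh node list.
/// These are the candidates an opt-in auto-revoke would act on.
pub fn orphaned_leases<'a>(leases: &'a [ExecLease], fresh_node_ids: &[&str]) -> Vec<&'a str> {
    let known: HashSet<&str> = fresh_node_ids.iter().copied().collect();
    leases
        .iter()
        .filter(|lease| !known.contains(lease.holder_node_id.as_str()))
        .map(|lease| lease.lease_id.as_str())
        .collect()
}
```

Keeping the comparison separate from the revoke call lets the default mode only report mismatches, with revocation as a distinct opt-in step.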

See also mens SSOT for VOX_MESH_* and local registry.

Mesh distribution vs single-process embedding

  • Embedding: Each vox-mcp (or vox dei CLI) process constructs an in-memory Orchestrator. That is “single-process gravity” for RAM-local queues and locks.
  • Distribution: With VOX_MESH_ENABLED, durable coordination (locks, oplog mirror, A2A inboxes, heartbeats) is backed by Turso so another MCP or laptop can participate in the same logical mesh. Two nodes = two orchestrator instances sharing one cross-node SSOT via the DB and HTTP A2A relay — not one magic cluster master in RAM.
  • Bootstrap SSOT: build_repo_scoped_orchestrator and build_repo_scoped_orchestrator_for_repository are the shared factory for MCP, CLI, and other embedders so repository id, affinity groups, and memory shard paths stay aligned.

For table-level detail and conflict rules, see Mens coordination.

A2A delivery planes

The orchestrator intentionally uses more than one delivery plane; these are not interchangeable transports with hidden semantics.

| Canonical plane | Current wire token(s) | Guarantees | Use for |
| --- | --- | --- | --- |
| local_ephemeral | MCP route=local | in-process only, best-effort per-receiver FIFO, restart-volatile | low-latency same-node agent coordination |
| local_durable | MCP route=db | durable row storage, explicit durable ack/poll semantics | cross-process local inboxes and persistence-friendly retries |
| remote_mesh | MCP route=mesh, Populi HTTP A2A | HTTP relay with bearer/JWT auth, explicit inbox lease + ack, client-supplied idempotency | cross-node messaging and remote task envelopes |
| broadcast | local bus broadcast, bulletin/event fanout | receiver-local ordering only, no shared durable semantics | fanout notifications |
| stream | DeI JSON lines, vox-orchestrator-d orch.* JSON lines/TCP, MCP WS gateway, SSE, OpenClaw WS | ordered per connection/byte stream, reconnect semantics vary by transport | incremental output and live updates |

Machine-readable source of truth for these names lives in contracts/communication/protocol-catalog.yaml. MCP A2A responses surface the canonical plane names in addition to legacy wire tokens so callers can migrate without breaking compatibility.
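For callers migrating from wire tokens to canonical names, a lookup along these lines may help. The enum and method names are hypothetical; only the plane names and the three MCP route tokens come from this document, and the protocol catalog remains the source of truth.

```rust
/// Canonical delivery planes (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeliveryPlane {
    LocalEphemeral,
    LocalDurable,
    RemoteMesh,
    Broadcast,
    Stream,
}

impl DeliveryPlane {
    /// Canonical plane name as listed in this document.
    pub fn canonical_name(self) -> &'static str {
        match self {
            DeliveryPlane::LocalEphemeral => "local_ephemeral",
            DeliveryPlane::LocalDurable => "local_durable",
            DeliveryPlane::RemoteMesh => "remote_mesh",
            DeliveryPlane::Broadcast => "broadcast",
            DeliveryPlane::Stream => "stream",
        }
    }

    /// Map a legacy MCP `route` token to its canonical plane.
    /// Unknown tokens yield None instead of a silent default.
    pub fn from_mcp_route(route: &str) -> Option<Self> {
        match route {
            "local" => Some(DeliveryPlane::LocalEphemeral),
            "db" => Some(DeliveryPlane::LocalDurable),
            "mesh" => Some(DeliveryPlane::RemoteMesh),
            _ => None,
        }
    }
}
```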

Environment and config

OrchestratorConfig: VOX_ORCHESTRATOR_* overrides

Boolean fields use Rust bool parsing (true / false only). Invalid values log a warning and leave the current setting unchanged.
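The boolean rule can be illustrated with Rust's standard `str::parse::<bool>()`, which accepts exactly "true" and "false". The helper below is a hypothetical sketch and not the body of merge_env_overrides: it returns the warning text so the behavior is easy to test, where the real code logs it, and it does not trim whitespace (an assumption).

```rust
/// Apply an optional env override to a boolean field.
/// Returns a warning message when the raw value is rejected; the field is
/// left unchanged in that case.
pub fn apply_bool_override(current: &mut bool, var: &str, raw: Option<&str>) -> Option<String> {
    // An unset variable is not an error and changes nothing.
    let raw = raw?;
    match raw.parse::<bool>() {
        Ok(value) => {
            *current = value;
            None
        }
        // "1", "yes", "TRUE" and similar all land here.
        Err(_) => Some(format!("{var}: ignoring invalid boolean {raw:?}")),
    }
}
```

Note that values such as 1 or TRUE are rejected under this rule, unlike the telemetry switches elsewhere in this document that accept 1 / true.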

| Variable | Maps to |
| --- | --- |
| VOX_ORCHESTRATOR_ENABLED | enabled |
| VOX_ORCHESTRATOR_MAX_AGENTS | max_agents |
| VOX_ORCHESTRATOR_LOCK_TIMEOUT_MS | lock_timeout_ms |
| VOX_ORCHESTRATOR_TOESTUB_GATE | toestub_gate |
| VOX_ORCHESTRATOR_MAX_DEBUG_ITERATIONS | max_debug_iterations |
| VOX_ORCHESTRATOR_SOCRATES_GATE_SHADOW | socrates_gate_shadow |
| VOX_ORCHESTRATOR_SOCRATES_GATE_ENFORCE | socrates_gate_enforce |
| VOX_ORCHESTRATOR_SOCRATES_REPUTATION_ROUTING | socrates_reputation_routing |
| VOX_ORCHESTRATOR_SOCRATES_REPUTATION_WEIGHT | socrates_reputation_weight |
| VOX_ORCHESTRATOR_TRUST_GATE_RELAX_ENABLED | trust_gate_relax_enabled: when true and Codex agent_reliability for the agent is ≥ trust_gate_relax_min_reliability, Socrates enforce, completion grounding enforce, and strict scope may skip completion requeue / enqueue denial (see PolicyTrustRelax). |
| VOX_ORCHESTRATOR_TRUST_GATE_RELAX_MIN_RELIABILITY | trust_gate_relax_min_reliability: minimum reliability (default 0.85, aligned with trust auto-approve floor). |
| VOX_ORCHESTRATOR_ATTENTION_ENABLED / VOX_ORCHESTRATOR_ATTENTION_BUDGET_MS / VOX_ORCHESTRATOR_ATTENTION_ALERT_THRESHOLD / VOX_ORCHESTRATOR_ATTENTION_INTERRUPT_COST_MS / VOX_ORCHESTRATOR_ATTENTION_TRUST_ROUTING_WEIGHT | Pilot attention budget + dynamic interruption gating (see information-theoretic-questioning.md, env-vars.md). Vox.toml also supports [orchestrator].interruption_calibration for per-channel gain offsets and backlog/trust calibration. |
| VOX_ORCHESTRATOR_LOG_LEVEL | log_level (raw string) |
| VOX_ORCHESTRATOR_FALLBACK_SINGLE | fallback_to_single_agent |
| VOX_ORCHESTRATOR_MIN_AGENTS | min_agents |
| VOX_ORCHESTRATOR_SCALING_THRESHOLD | scaling_threshold |
| VOX_ORCHESTRATOR_IDLE_RETIREMENT_MS | idle_retirement_ms |
| VOX_ORCHESTRATOR_SCALING_ENABLED | scaling_enabled |
| VOX_ORCHESTRATOR_COST_PREFERENCE | cost_preference (performance / economy) |
| VOX_ORCHESTRATOR_SCALING_LOOKBACK | scaling_lookback_ticks |
| VOX_ORCHESTRATOR_RESOURCE_WEIGHT | resource_weight |
| VOX_ORCHESTRATOR_RESOURCE_CPU_MULT | resource_cpu_multiplier |
| VOX_ORCHESTRATOR_RESOURCE_MEM_MULT | resource_mem_multiplier |
| VOX_ORCHESTRATOR_RESOURCE_EXPONENT | resource_exponent |
| VOX_ORCHESTRATOR_SCALING_PROFILE | scaling_profile (conservative / balanced / aggressive) |
| VOX_ORCHESTRATOR_MAX_SPAWN_PER_TICK | max_spawn_per_tick |
| VOX_ORCHESTRATOR_SCALING_COOLDOWN_MS | scaling_cooldown_ms |
| VOX_ORCHESTRATOR_URGENT_REBALANCE_THRESHOLD | urgent_rebalance_threshold |
| VOX_ORCHESTRATOR_MIGRATION_V2_ENABLED | orchestration_migration.orchestration_v2_enabled |
| VOX_ORCHESTRATOR_MIGRATION_LEGACY_FALLBACK | orchestration_migration.legacy_orchestration_fallback |
| VOX_ORCHESTRATOR_MESH_CONTROL_URL | populi_control_url: HTTP base for GET /v1/populi/nodes (read-only); MCP vox_orchestrator_status includes mesh_snapshot JSON when set. Uses VOX_MESH_TOKEN on the client when present. Does not change task routing. |
| VOX_ORCHESTRATOR_MESH_REMOTE_EXECUTE_EXPERIMENTAL | populi_remote_execute_experimental (TOML alias: mesh_remote_execute_experimental): enables staged rollout for remote task-envelope dispatch over populi A2A relay (with local fallback). |
| VOX_ORCHESTRATOR_MESH_REMOTE_LEASE_GATING_ENABLED | populi_remote_lease_gating_enabled (TOML: mesh_remote_lease_gating_enabled): when true with matching roles, relay is awaited before local enqueue; success puts the task in remote-hold (single owner, no local dequeue). Relay failure deterministically falls back to local queue only (no fire-and-forget duplicate relay). |
| VOX_ORCHESTRATOR_MESH_REMOTE_LEASE_GATED_ROLES | populi_remote_lease_gated_roles: comma-separated planner, builder, verifier, reproducer, researcher (case-insensitive). Empty list means no task matches gating. |
| VOX_ORCHESTRATOR_MESH_REMOTE_RESULT_POLL_INTERVAL_SECS | populi_remote_result_poll_interval_secs (TOML alias: mesh_remote_result_poll_interval_secs): remote_task_result inbox poll interval in seconds; 0 disables. Implemented in vox_orchestrator::a2a::spawn_populi_remote_result_poller (MCP and other embedders pass a join slot). |
| VOX_ORCHESTRATOR_MESH_REMOTE_WORKER_POLL_INTERVAL_SECS | populi_remote_worker_poll_interval_secs (TOML alias: mesh_remote_worker_poll_interval_secs): remote_task_envelope worker poll interval in seconds; 0 disables remote worker consumption while keeping result polling optional. Implemented in vox_orchestrator::a2a::spawn_populi_remote_worker_poller. |
| VOX_ORCHESTRATOR_MESH_REMOTE_RESULT_MAX_MESSAGES_PER_POLL | populi_remote_result_max_messages_per_poll: per-page size when draining the parent mesh inbox for remote_task_result rows (minimum 1; default 64). The poller walks cursor pages (before_message_id, newest-first) up to a fixed cap so deep inboxes do not hide older results behind unrelated A2A mail. |
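The gated-roles value can be parsed roughly as follows. This is a sketch under assumptions: `AgentRole`, `parse_gated_roles`, and `is_lease_gated` are hypothetical names, and silently skipping unknown or empty tokens is a choice made for this example that the document does not confirm.

```rust
/// Roles accepted by the gated-roles list (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentRole {
    Planner,
    Builder,
    Verifier,
    Reproducer,
    Researcher,
}

/// Parse a comma-separated, case-insensitive role list.
pub fn parse_gated_roles(raw: &str) -> Vec<AgentRole> {
    let mut roles = Vec::new();
    for token in raw.split(',') {
        let role = match token.trim().to_ascii_lowercase().as_str() {
            "planner" => AgentRole::Planner,
            "builder" => AgentRole::Builder,
            "verifier" => AgentRole::Verifier,
            "reproducer" => AgentRole::Reproducer,
            "researcher" => AgentRole::Researcher,
            // Assumption: unknown or empty tokens are skipped.
            _ => continue,
        };
        if !roles.contains(&role) {
            roles.push(role);
        }
    }
    roles
}

/// A task is lease-gated only when gating is on and its role is listed,
/// so an empty list gates nothing.
pub fn is_lease_gated(gating_enabled: bool, gated: &[AgentRole], task_role: AgentRole) -> bool {
    gating_enabled && gated.contains(&task_role)
}
```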

Populi client helpers now expose typed HTTP status errors (PopuliRegistryError::HttpStatus) and non-claimer inbox cursor paging (before_message_id, plus A2AInboxPager), so orchestrator fallback logic can branch on status codes (403/404/409) without brittle string matching.
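The newest-first cursor walk with a page cap can be sketched as below. This does not reproduce A2AInboxPager: `InboxMessage`, `drain_remote_results`, and the `fetch_page(before_message_id, page_size)` closure are hypothetical stand-ins for the HTTP client, and stopping on a short page is an assumption.

```rust
/// Hypothetical subset of an inbox row.
#[derive(Debug, Clone, PartialEq)]
pub struct InboxMessage {
    pub id: u64,
    pub kind: String,
}

/// Walk newest-first pages and collect remote_task_result rows.
/// The walk stops at `max_pages`, at an empty page, or at a short page.
pub fn drain_remote_results<F>(mut fetch_page: F, page_size: usize, max_pages: usize) -> Vec<InboxMessage>
where
    F: FnMut(Option<u64>, usize) -> Vec<InboxMessage>,
{
    // The documented minimum page size is 1.
    let page_size = page_size.max(1);
    let mut before: Option<u64> = None;
    let mut results = Vec::new();
    for _ in 0..max_pages {
        let page = fetch_page(before, page_size);
        // Pages are newest-first, so the last row is the oldest one seen.
        let Some(oldest) = page.last() else { break };
        before = Some(oldest.id);
        let short_page = page.len() < page_size;
        // Unrelated A2A mail is skipped but still advances the cursor.
        results.extend(page.into_iter().filter(|m| m.kind == "remote_task_result"));
        if short_page {
            break;
        }
    }
    results
}
```

Advancing the cursor past unrelated messages is what keeps older results reachable in a deep inbox, while the page cap bounds the work done per poll tick.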

Placement and lease observability (roadmap contract)

Phase 5 (scheduler unification) targets decision reason codes and structured fields so operators can audit why a task ran locally, on a lease-held remote worker, or on a cloud dispatch surface. Until code catches up, rely on the experimental toggles in the table above and on mens SSOT.

Documentation contract for eventual stable instrumentation (field names may differ slightly in Rust, but the concepts are stable):

| Field / concept | Purpose |
| --- | --- |
| task_id | Correlate orchestrator task lifecycle across logs and traces. |
| lease_id | Correlate remote execution with Populi lease records when ADR 017 semantics are implemented. |
| placement_reason | Machine-readable code for the selected execution surface (local vs lease-remote vs cloud dispatch). |
| populi_node_id / claimer_node_id | Mesh identity for inbox claims and execution attribution where applicable. |

Current stable placement_reason codes:

  • local_queue_default
  • populi_remote_lease_hold
  • local_queue_fallback_after_remote_relay_error
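One way to read these codes against the lease-gating behavior is sketched below. The mapping from relay outcome to code is inferred from the code names and the lease-gating description, not taken from the implementation; `PlacementReason`, `place_task`, and the `relay` closure (a stand-in for awaiting the Populi A2A relay) are hypothetical.

```rust
/// Stable placement_reason codes listed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlacementReason {
    LocalQueueDefault,
    PopuliRemoteLeaseHold,
    LocalQueueFallbackAfterRemoteRelayError,
}

impl PlacementReason {
    /// Machine-readable code string.
    pub fn code(self) -> &'static str {
        match self {
            PlacementReason::LocalQueueDefault => "local_queue_default",
            PlacementReason::PopuliRemoteLeaseHold => "populi_remote_lease_hold",
            PlacementReason::LocalQueueFallbackAfterRemoteRelayError => {
                "local_queue_fallback_after_remote_relay_error"
            }
        }
    }
}

/// Decide placement for one task. The relay is attempted only for
/// lease-gated tasks, and a relay error falls back to the local queue.
pub fn place_task<E>(lease_gated: bool, relay: impl FnOnce() -> Result<(), E>) -> PlacementReason {
    if !lease_gated {
        return PlacementReason::LocalQueueDefault;
    }
    match relay() {
        Ok(()) => PlacementReason::PopuliRemoteLeaseHold,
        Err(_) => PlacementReason::LocalQueueFallbackAfterRemoteRelayError,
    }
}
```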

Rollout and kill switches: Populi remote execution rollout checklist. Work-type boundaries: placement policy matrix.

Other CLI / data plane

Canonical descriptions for VOX_BENCHMARK_TELEMETRY / VOX_SYNTAX_K_TELEMETRY (and related Codex row shapes) live in env-vars.md. Trust boundaries for optional telemetry: telemetry-trust-ssot.

| Variable | Purpose |
| --- | --- |
| VOX_BENCHMARK_TELEMETRY | When 1 / true, CLI benchmark entry points append benchmark_event rows via VoxDb::record_benchmark_event. |
| VOX_SYNTAX_K_TELEMETRY | When 1 / true, syntax-K benchmark classes append syntax_k_event rows via VoxDb::record_syntax_k_event (session syntaxk:<repository_id>). If unset, falls back to VOX_BENCHMARK_TELEMETRY. |
| VOX_WORKFLOW_JOURNAL_CODEX_OFF | When 1 / true, skip Codex append for interpreted workflow journal rows. By default, when DB config resolves after vox workflow run / vox mens workflow run (workflow-runtime), Vox appends versioned workflow journal rows via VoxDb::record_workflow_journal_entry (session workflow:<repository_id>, metric workflow_journal_entry). Rows can include lifecycle events, retry events (ActivityAttemptRecovered, ActivityAttemptFailed, ActivityRetryScheduled), replay events, and per-step payloads (for example MeshActivity / MeshActivitySkipped) keyed by durable run_id + activity_id semantics described in durable execution. |
| VOX_MESH_MAX_STALE_MS | Client-side filter for mens node lists in MCP snapshots (see mens SSOT). |
| VOX_MESH_CODEX_TELEMETRY | When 1 / true, append populi_control_event rows via VoxDb::record_populi_control_event (session mens:<repository_id>): after vox run local registry publish when the CLI was built with populi (includes vox-populi), after vox-mcp startup publish when mens is enabled, and after MCP vox_orchestrator_status mens HTTP snapshot when Codex is connected. Implementation: vox_db::populi_registry_telemetry. Never stores VOX_MESH_TOKEN. |
| VOX_MCP_LLM_COST_EVENTS | Optional override for MCP LLM CostIncurred bus events vs Codex-only accounting; see vox-mcp.md. |
| VOX_REPOSITORY_ROOT | Optional directory for repository_id discovery in benchmark telemetry (and other CLI paths that adopt the same pattern); align with MCP’s discovered repo root when subprocess CWD differs. |
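The fallback from VOX_SYNTAX_K_TELEMETRY to VOX_BENCHMARK_TELEMETRY can be sketched with an injectable lookup so it is testable without touching the process environment. The function names are hypothetical, and exact, case-sensitive matching of 1 / true is an assumption.

```rust
/// Telemetry switches accept "1" or "true" (exact match assumed here).
fn is_truthy(raw: &str) -> bool {
    matches!(raw, "1" | "true")
}

/// Resolve the syntax-K telemetry switch. The benchmark variable is
/// consulted only when the syntax-K variable is unset.
pub fn syntax_k_telemetry_enabled(lookup: impl Fn(&str) -> Option<String>) -> bool {
    match lookup("VOX_SYNTAX_K_TELEMETRY") {
        // An explicit value, even a falsy one, suppresses the fallback.
        Some(raw) => is_truthy(&raw),
        None => lookup("VOX_BENCHMARK_TELEMETRY").map_or(false, |raw| is_truthy(&raw)),
    }
}
```

In a real process the lookup would typically be `|key| std::env::var(key).ok()`.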

TOML: under [orchestrator], set orchestration_migration = { orchestration_v2_enabled = true, … } (field names match OrchestrationMigrationFlags in crates/vox-orchestrator/src/contract.rs). When v2 is enabled, MCP vox_submit_task success JSON may include orchestration_contract: "v2" as a client hint.

Optional [mens] in Vox.toml merges mens scope/URL/labels for CLI and MCP (see mens SSOT); env wins per field when set.

Effective Socrates thresholds still merge from vox-socrates-policy with optional overrides in OrchestratorConfig::socrates_policy — no literal drift outside the policy crate + merge logic.

Deprecation / compatibility matrix (current)

| Surface | Rule |
| --- | --- |
| MCP tool names | Add aliases before removing names; vox_plan, vox_replan, vox_plan_status stay stable. |
| DeI RPC ids | ai.plan.* method strings unchanged (vox_cli::dei_daemon::method). |
| Orchestrator daemon RPC ids | orch.* method strings are versioned in vox_protocol::orch_daemon_method; contract schema contracts/orchestration/orch-daemon-rpc-methods.schema.json. |
| File sessions + Codex | Both remain valid; MCP SessionManager uses with_db when Codex is attached. |
| vox db | Remains implementation SSOT; vox scientia is a documented facade only. |