External repositories & workspace SSOT
Single source of truth for repository identity, layout-derived affinity, and tenant-scoped on-disk paths. Applies to the Vox monorepo and arbitrary Git checkouts.
Invariants
- Repository root — Prefer the Git work tree root (ancestor with `.git`). If there is no Git checkout, fall back to the canonicalized starting path (typically process CWD or a client override).
- `repository_id` — Stable 16-hex string: `blake3(origin_url + NUL + canonical_root_path)` when `remote.origin.url` is readable from `.git/config`; otherwise `blake3(canonical_root_path)` only.
- Tool CWD — Git MCP tools use `current_dir` = Git work tree (or repository root). Cargo MCP tools use `current_dir` = repository root and return a structured error when the root is not a Cargo package/workspace.
- Affinity groups — If `repo_root/Vox.toml` contains a non-empty `affinity_groups` array, `load_from_config` builds the registry from explicit `name` + `patterns` (glob strings). Otherwise `AffinityGroupRegistry::detect_from_repository_layout` (in `vox-orchestrator`) prefers, in order:
  - Cargo `[workspace].members` (including simple `crates/*` expansion),
  - Node `package.json` `workspaces` (incl. Yarn object form) and `pnpm-workspace.yaml` `packages` (glob expansion to dirs with `package.json`),
  - Python root (`pyproject.toml` / `setup.py`),
  - Go root (`go.mod`),
  - `crates/` directory scan,
  - single catch-all `**/*`.
- Orchestrator memory — `vox-mcp` shards file-backed memory under `repo_root/.vox/cache/repos/<repository_id>/memory/` (and `MEMORY.md` beside it) so concurrent opens of different repos do not share the same relative `./memory` tree.
- CLI benchmark telemetry vs MCP — Opt-in Codex rows use `bench:<repository_id>` (see `VoxDb::record_benchmark_event`). Subprocesses spawned with a different CWD than the IDE/MCP server should set `VOX_REPOSITORY_ROOT` to the same logical repo root MCP discovered so `repository_id` (and thus session keys) stay aligned.
- Sessions — JSONL sessions default to `.sessions/<repository_id>/` when using MCP `ServerState::new`; `SessionConfig.repository_id` is set so dual-written Codex `agent_sessions.task_snapshot` JSON includes the same tenant id.
- Codex / Turso rows — Repo-scoped filesystem paths use `repository_id`; optional future migrations may add a `repository_id` column (or composite keys) on Codex tables per ADR 004 — not required for MCP memory/session sharding above.
- Agent scopes — `.vox/agents/{name}.md` `scope:` lists are parsed by `vox_repository::load_agent_scopes`; task paths are checked with `normalize_task_path`.
- Cross-repo working set — Explicit polyrepo manifests live at `repo_root/.vox/repositories.yaml`; Vox does not ambient-scan the whole machine for unrelated clones.
- Cross-repo refresh cache — Re-resolved catalog snapshots and related metadata live under `repo_root/.vox/cache/repos/<repository_id>/`.
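As a rough illustration of the `repository_id` and memory-sharding invariants, the sketch below builds the hash input and the tenant-scoped memory path using only the standard library. The function names are hypothetical and are not the `vox-repository` API. The BLAKE3 digest itself is left to the caller, and treating the 16-hex id as the first 8 bytes of that digest is an assumption; the text above only states the length.

```rust
use std::path::{Path, PathBuf};

/// Builds the byte string that would be hashed into a `repository_id`.
/// With an origin URL: origin_url + NUL + canonical_root_path.
/// Without one: canonical_root_path only.
/// (Hypothetical helper name, for illustration.)
fn repository_id_preimage(origin_url: Option<&str>, canonical_root: &str) -> Vec<u8> {
    let mut bytes = Vec::new();
    if let Some(url) = origin_url {
        bytes.extend_from_slice(url.as_bytes());
        // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        bytes.push(0);
    }
    bytes.extend_from_slice(canonical_root.as_bytes());
    bytes
}

/// Renders the first 8 bytes of a digest as 16 lowercase hex characters.
/// ASSUMPTION: the 16-hex id is a prefix of the digest.
fn to_16_hex(digest: &[u8]) -> String {
    digest.iter().take(8).map(|b| format!("{:02x}", b)).collect()
}

/// Tenant-scoped memory directory:
/// repo_root/.vox/cache/repos/<repository_id>/memory
fn memory_dir(repo_root: &Path, repository_id: &str) -> PathBuf {
    repo_root
        .join(".vox")
        .join("cache")
        .join("repos")
        .join(repository_id)
        .join("memory")
}
```

A caller would feed the preimage to a BLAKE3 implementation, pass the digest bytes to `to_16_hex`, and use the result as the directory component in `memory_dir`. Because the id is part of the path, two repositories opened concurrently end up in different directories.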
MCP tools
| Tool | Behavior |
|---|---|
| `vox_git_*` | `current_dir` = Git root (see `git_tools::git_cwd`); subprocesses use `tokio::process` from the async tool dispatcher. |
| `vox_validate_file`, `vox_run_tests`, `vox_check_workspace`, `vox_test_all`, `vox_build_crate`, `vox_lint_crate`, `vox_coverage_report` | `current_dir` = repository root when invoking `cargo`; `tokio::process` + `tokio::fs` for validate. `vox_lint_crate` runs TOESTUB via `tokio::task::spawn_blocking` after clippy. |
| `vox_repo_index_status` / `vox_repo_index_refresh` | Bounded walk of `repository.root`; optional JSON cache under `.vox/cache/repos/<repository_id>/repo_index.json`. |
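The Cargo rows in the table can be pictured with a small sketch: build the command with `current_dir` set to the repository root, and return a structured error instead of spawning when the root has no manifest. This uses `std::process::Command` to stay dependency-free, whereas the tools above use `tokio::process`, whose `Command` offers the same `current_dir` builder. `CargoRootError` and `cargo_command` are hypothetical names, and checking only for the presence of `Cargo.toml` is a simplification of "is a Cargo package/workspace".

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical structured error for a root that is not a Cargo project.
#[derive(Debug, PartialEq)]
enum CargoRootError {
    MissingManifest { root: PathBuf },
}

/// Prepares (but does not spawn) a `cargo` invocation rooted at `repo_root`.
/// Returns an error when `repo_root/Cargo.toml` is not a file.
fn cargo_command(repo_root: &Path, args: &[&str]) -> Result<Command, CargoRootError> {
    if !repo_root.join("Cargo.toml").is_file() {
        return Err(CargoRootError::MissingManifest {
            root: repo_root.to_path_buf(),
        });
    }
    let mut cmd = Command::new("cargo");
    // The working directory is the repository root, not the process CWD.
    cmd.current_dir(repo_root).args(args);
    Ok(cmd)
}
```

Returning the error before any process is spawned lets the caller report the misconfigured root directly, instead of surfacing whatever `cargo` prints when run outside a package.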
Config
- `VoxConfig::load_from_repo_root` (`vox-config`) — Applies `repo_root/Vox.toml` before CWD `Vox.toml`, then env. Use when loading settings from a discovered repository root.
- Cross-repo catalog manifest — `.vox/repositories.yaml` is the local-first workspace manifest for cataloged repositories. It may include local roots plus remote adapter descriptors (`remote_mcp`, `remote_git_host`, `remote_search_service`) without weakening single-repo path safety.
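The layering order described for `VoxConfig::load_from_repo_root` can be sketched as an ordered merge. This is only an illustration. `Settings` and `layer_settings` are hypothetical names, settings are reduced to string key/value pairs, and the sketch assumes that a layer applied later overrides keys set by an earlier one, which the text above implies but does not state outright.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for parsed configuration (hypothetical type).
type Settings = BTreeMap<String, String>;

/// Applies layers in the order given.
/// ASSUMPTION: a later layer overrides keys set by an earlier layer.
fn layer_settings(layers: &[&Settings]) -> Settings {
    let mut merged = Settings::new();
    for layer in layers {
        for (key, value) in layer.iter() {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}
```

Under that assumption, calling `layer_settings(&[&repo_root_toml, &cwd_toml, &env])` mirrors the documented order: repository-root `Vox.toml` first, then the CWD `Vox.toml`, then environment values.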
Crates
Policy: New code that needs Git root, `repository_id`, workspace layout, or agent scope parsing must depend on `vox-repository` (and `vox-config` for `Vox.toml`), not ad-hoc `std::env::current_dir` + manual walks in `vox-cli` or other crates.
| Crate | Role |
|---|---|
| `vox-repository` | `discover_repository`, `RepositoryContext` (`has_vox_agents_dir`, `vox_toml`), `RepoCapabilities`, layout helpers (`cargo_workspace_member_dirs`, `node_workspace_packages`, `python_roots`, `go_roots`), `load_agent_scopes`, `normalize_task_path`. |
| `vox-orchestrator` | `load_from_config` / `AffinityGroupRegistry::detect_from_repository_layout`, sessions, memory config consumed by MCP. |
| `vox-mcp` | `ServerState::repository`, git/compiler/task/repo_index wiring. Included in the root workspace (`cargo check --workspace` / CI). |
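The layout-based fallback order listed under Invariants for `detect_from_repository_layout` can be read as a first-match chain. The sketch below is one way to picture it. `LayoutMarkers`, `AffinitySource`, and `pick_affinity_source` are hypothetical names, and it assumes the first matching layout wins outright, with no merging of several sources.

```rust
/// Which layout markers were found at the repository root
/// (hypothetical struct, for illustration only).
#[derive(Debug, Default, Clone, Copy)]
struct LayoutMarkers {
    cargo_workspace_members: bool,
    node_workspaces: bool,
    python_root: bool,
    go_root: bool,
    crates_dir: bool,
}

/// The source that affinity groups would be derived from.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum AffinitySource {
    CargoWorkspace,
    NodeWorkspaces,
    PythonRoot,
    GoRoot,
    CratesDirScan,
    CatchAll,
}

/// Walks the documented preference order and returns the first match.
/// ASSUMPTION: the first match wins; sources are not combined.
fn pick_affinity_source(m: LayoutMarkers) -> AffinitySource {
    if m.cargo_workspace_members {
        AffinitySource::CargoWorkspace
    } else if m.node_workspaces {
        AffinitySource::NodeWorkspaces
    } else if m.python_root {
        AffinitySource::PythonRoot
    } else if m.go_root {
        AffinitySource::GoRoot
    } else if m.crates_dir {
        AffinitySource::CratesDirScan
    } else {
        // Nothing recognizable: a single group matching `**/*`.
        AffinitySource::CatchAll
    }
}
```

In this reading, a repository with both a Cargo workspace and a `package.json` workspace resolves to the Cargo source, and an empty directory resolves to the catch-all group.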
Cross-repo catalog
Use the repo catalog when you want one operator workflow to query several repositories without rebinding the MCP server root.
Current policy:
- catalog membership is explicit
- each local entry resolves into its own `RepositoryContext`
- remote entries are adapter metadata first, query backends later
- cross-repo paths stay per-repository; there is no shared global path namespace
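To make the "no shared global path namespace" policy concrete, here is a minimal sketch in which every path is addressed as a catalog entry plus a relative path, and resolution never leaves that entry's root. `Catalog` and `resolve` are hypothetical and are not the `vox-repository` API. The real `normalize_task_path` may apply different rules.

```rust
use std::collections::BTreeMap;
use std::path::{Component, Path, PathBuf};

/// Explicit catalog: entry name -> local repository root.
/// (Hypothetical type; membership is whatever was listed, nothing is scanned.)
struct Catalog {
    roots: BTreeMap<String, PathBuf>,
}

impl Catalog {
    /// Resolves `relative` inside the named entry's root.
    /// Returns `None` for unknown entries, absolute paths, and `..` segments.
    fn resolve(&self, entry: &str, relative: &str) -> Option<PathBuf> {
        let root = self.roots.get(entry)?;
        let rel = Path::new(relative);
        // Only plain segments and `.` are allowed, so the result stays under `root`.
        let stays_inside = rel
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
        if stays_inside {
            Some(root.join(rel))
        } else {
            None
        }
    }
}
```

The same relative path, such as `src/lib.rs`, resolves to a different absolute path for each entry, and a path like `../docs/secret` is rejected instead of reaching into a sibling repository.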
See also: Cross-repo querying and observability.
Related
- orchestration-unified.md — MCP/DeI plan alignment, migration flags, benchmark telemetry env.
- mens.md — `VOX_MESH_*` contract, local registry, HTTP control plane.
- ADR 004 (`docs/src/adr/004-codex-arca-turso.md`) — Codex env and Turso.
- AGENTS.md §2.2.2 — short agent-oriented summary.