"ADR 005: Socrates anti-hallucination SSOT"

Status

Accepted — baseline implementation in progress.

Context

LLM surfaces (MCP chat, planning, TOESTUB review, research-style flows) each used their own ad hoc confidence thresholds and prompts. That caused drift (e.g. a prompt asking for “≥80%” confidence while the client filtered at ≥40) and made abstention and escalation non-deterministic for agents.

Decision

Single policy crate — vox-socrates-policy holds ConfidencePolicy, RiskDecision, and RiskBand; all crates import it for thresholds and classification.
Orchestrator types — vox-orchestrator::socrates defines EvidenceItem, ClaimRecord, ConfidenceSignal, SocratesOutcome, and optional SocratesTaskContext on AgentTask.
Gating — Task completion may run a Socrates gate when socrates_gate_enforce is true and the task has socrates context; shadow mode logs without blocking.
Persistence — Reliability and claim outcomes use Codex tables from schema V10 (agent_reliability, claim_outcomes).
MCP — Chat/plan responses may include optional socrates telemetry JSON.
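
To make the single-policy idea concrete, here is a minimal sketch of what a shared threshold type could look like. Only the type names ConfidencePolicy, RiskDecision, and RiskBand come from this ADR; the variants, fields, method names, the 0.0 to 1.0 scale, and the default thresholds are assumptions chosen for illustration and may not match the actual vox-socrates-policy crate.

```rust
// Hypothetical sketch: type names follow the ADR, but every variant,
// field, method, and threshold value below is an assumption.

/// Coarse risk classification derived from a confidence score.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand {
    Low,
    Medium,
    High,
}

/// What a caller should do with a candidate answer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskDecision {
    Answer,
    Escalate,
    Abstain,
}

/// Single source of thresholds, with confidence on an assumed 0.0..=1.0 scale.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ConfidencePolicy {
    /// At or above this value the answer is returned as-is.
    pub answer_min: f64,
    /// At or above this value (but below `answer_min`) the task escalates.
    pub escalate_min: f64,
}

impl Default for ConfidencePolicy {
    fn default() -> Self {
        // Illustrative values only.
        Self { answer_min: 0.8, escalate_min: 0.4 }
    }
}

impl ConfidencePolicy {
    /// Classify a confidence score. NaN fails both comparisons and
    /// therefore lands in the most conservative band.
    pub fn band(&self, confidence: f64) -> RiskBand {
        if confidence >= self.answer_min {
            RiskBand::Low
        } else if confidence >= self.escalate_min {
            RiskBand::Medium
        } else {
            RiskBand::High
        }
    }

    /// Map the band to a deterministic action.
    pub fn decide(&self, confidence: f64) -> RiskDecision {
        match self.band(confidence) {
            RiskBand::Low => RiskDecision::Answer,
            RiskBand::Medium => RiskDecision::Escalate,
            RiskBand::High => RiskDecision::Abstain,
        }
    }
}
```

Because every surface would call the same `decide` function, a prompt and a client filter can no longer disagree about where the cutoff sits: changing a threshold means changing one struct.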

Consequences

New workspace member vox-socrates-policy (minimal dependency surface).
Schema migration V10 for reputation-style metrics.
Documentation cross-links: AGENTS.md, docs/agents/orchestrator.md, handoff protocol, MCP reference.

Rollout

Deploy policy crate + docs (no behavior change if gates off).
Enable socrates_gate_shadow in staging; inspect logs.
Enable socrates_gate_enforce for pilot agents/tasks with explicit SocratesTaskContext.
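
The three rollout steps correspond to three gate behaviors, which can be sketched as below. The flag names socrates_gate_shadow and socrates_gate_enforce come from this ADR; the GateConfig struct, the GateResult enum, the run_gate function, and the choice to let enforce take precedence over shadow are hypothetical and only illustrate the intended semantics.

```rust
// Hypothetical sketch: flag names follow the ADR; the struct layout,
// result enum, and precedence of enforce over shadow are assumptions.

#[derive(Debug, Clone, Copy, Default)]
pub struct GateConfig {
    pub socrates_gate_shadow: bool,
    pub socrates_gate_enforce: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateResult {
    /// Gate did not run (both flags off, or the task has no Socrates context).
    Skipped,
    /// Gate ran and the task may complete.
    Allowed,
    /// Gate ran in shadow mode and would have blocked; completion proceeds
    /// and the caller is expected to log the outcome.
    ShadowWouldBlock,
    /// Gate ran in enforce mode and blocks completion.
    Blocked,
}

/// Decide how task completion should proceed.
/// `passes` stands in for the result of the actual Socrates evaluation.
pub fn run_gate(cfg: GateConfig, has_socrates_context: bool, passes: bool) -> GateResult {
    let gate_active = cfg.socrates_gate_shadow || cfg.socrates_gate_enforce;
    if !has_socrates_context || !gate_active {
        return GateResult::Skipped;
    }
    if passes {
        return GateResult::Allowed;
    }
    if cfg.socrates_gate_enforce {
        GateResult::Blocked
    } else {
        GateResult::ShadowWouldBlock
    }
}
```

With both flags off (step one) the function always returns `Skipped`, which matches the "no behavior change" claim. Shadow mode never returns `Blocked`, so staging traffic is observed but not interrupted, and tasks without context are skipped even under enforcement.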

References

Socrates protocol SSOT
crates/vox-socrates-policy
crates/vox-orchestrator/src/socrates.rs
