ADR 005: Socrates anti-hallucination SSOT
Status
Accepted — baseline implementation in progress.
Context
LLM surfaces (MCP chat, planning, TOESTUB review, research-style flows) each used ad hoc confidence thresholds and prompts. That caused drift (e.g. a prompt asking for “≥80%” while the client filtered at ≥40) and made abstention and escalation non-deterministic for agents.
Decision
- Single policy crate: `vox-socrates-policy` holds `ConfidencePolicy`, `RiskDecision`, and `RiskBand`; all crates import it for thresholds and classification.
- Orchestrator types: `vox-orchestrator::socrates` defines `EvidenceItem`, `ClaimRecord`, `ConfidenceSignal`, `SocratesOutcome`, and optional `SocratesTaskContext` on `AgentTask`.
- Gating: Task completion may run a Socrates gate when `socrates_gate_enforce` is true and the task has `socrates` context; shadow mode logs without blocking.
- Persistence: Reliability and claim outcomes use Codex tables from schema V10 (`agent_reliability`, `claim_outcomes`).
- MCP: Chat/plan responses may include optional `socrates` telemetry JSON.
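
The single-policy idea above can be sketched as follows. This is a minimal illustration, not the actual `vox-socrates-policy` API: only the type names `ConfidencePolicy`, `RiskDecision`, and `RiskBand` come from this ADR. The fields, enum variants, threshold values, and the `classify` and `decide` methods are assumptions made for the example.

```rust
/// Hypothetical risk bands; the ADR names `RiskBand` but not its variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskBand {
    Low,    // confidence at or above the answer threshold
    Medium, // between the abstain and answer thresholds
    High,   // below the abstain threshold (or not a number)
}

/// Hypothetical decisions derived from a band.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskDecision {
    Answer,
    Escalate,
    Abstain,
}

/// One shared set of thresholds, so prompts and client filters cannot drift.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ConfidencePolicy {
    pub answer_min: f64,    // assumed field name
    pub abstain_below: f64, // assumed field name
}

impl Default for ConfidencePolicy {
    fn default() -> Self {
        // Placeholder values for illustration, not the project's real thresholds.
        Self { answer_min: 0.8, abstain_below: 0.4 }
    }
}

impl ConfidencePolicy {
    /// Map a confidence score in [0, 1] to a band.
    /// NaN fails both comparisons and therefore lands in `High`.
    pub fn classify(&self, confidence: f64) -> RiskBand {
        let c = confidence.clamp(0.0, 1.0);
        if c >= self.answer_min {
            RiskBand::Low
        } else if c >= self.abstain_below {
            RiskBand::Medium
        } else {
            RiskBand::High
        }
    }

    /// Deterministic decision for a given confidence score.
    pub fn decide(&self, confidence: f64) -> RiskDecision {
        match self.classify(confidence) {
            RiskBand::Low => RiskDecision::Answer,
            RiskBand::Medium => RiskDecision::Escalate,
            RiskBand::High => RiskDecision::Abstain,
        }
    }
}
```

Because every caller goes through the same `decide` function in this sketch, two surfaces given the same confidence score cannot reach different abstain or escalate outcomes, which is the property the decision is after.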
Consequences
- New workspace member `vox-socrates-policy` (minimal dependency surface).
- Schema migration V10 for reputation-style metrics.
- Documentation cross-links: `AGENTS.md`, `docs/agents/orchestrator.md`, handoff protocol, MCP reference.
Rollout
- Deploy policy crate + docs (no behavior change if gates off).
- Enable `socrates_gate_shadow` in staging; inspect logs.
- Enable `socrates_gate_enforce` for pilot agents/tasks with explicit `SocratesTaskContext`.
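
The rollout phases map onto gate behavior roughly as sketched below. The flag names `socrates_gate_shadow` and `socrates_gate_enforce`, and the rule that a task needs Socrates context before the gate applies, come from this ADR. `GateConfig`, `GateResult`, `evaluate_gate`, the `passes_policy` input, and the choice that enforce takes precedence over shadow are hypothetical.

```rust
/// Hypothetical config holder for the two flags named in this ADR.
#[derive(Debug, Clone, Copy, Default)]
pub struct GateConfig {
    pub socrates_gate_shadow: bool,
    pub socrates_gate_enforce: bool,
}

/// Hypothetical outcome of running the gate at task completion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateResult {
    /// Gates are off, or the task carries no Socrates context.
    Skipped,
    /// Shadow mode: record what would have happened, never block.
    ShadowLogged { would_block: bool },
    /// Enforce mode and the policy check passed.
    Allowed,
    /// Enforce mode and the policy check failed.
    Blocked,
}

/// Decide what the gate does for one task.
/// `passes_policy` stands in for whatever check the real gate performs.
pub fn evaluate_gate(
    cfg: GateConfig,
    has_socrates_context: bool,
    passes_policy: bool,
) -> GateResult {
    if !has_socrates_context {
        return GateResult::Skipped;
    }
    if cfg.socrates_gate_enforce {
        // Assumption: enforce wins when both flags are set.
        if passes_policy {
            GateResult::Allowed
        } else {
            GateResult::Blocked
        }
    } else if cfg.socrates_gate_shadow {
        GateResult::ShadowLogged { would_block: !passes_policy }
    } else {
        GateResult::Skipped
    }
}
```

With both flags off (the first rollout step), every task is `Skipped`, matching the "no behavior change" expectation. Shadow mode surfaces `would_block` for log inspection in staging before any pilot task can actually be `Blocked`.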
References
- Socrates protocol SSOT
- `crates/vox-socrates-policy`
- `crates/vox-orchestrator/src/socrates.rs`