"Telemetry retention and sensitivity SSOT"

Telemetry retention and sensitivity SSOT

Status

Roadmap: sensitivity classes below are normative for future implementation. Current TTLs are authoritative in retention-policy.yaml and db_retention.

Sensitivity classes

Class	Definition	Examples
S0	Coarse counters, version strings, bucketed timings	Aggregated benchmark names, build timing buckets
S1	Operational metadata without user content	`repository_id` labels, mesh event names, model ids
S2	Workspace-adjacent: can infer project shape	Relative paths in CI findings, repo-scoped session keys, cross-repo query metadata (see telemetry-metric-contract)
S3	Content-bearing	Chat text, prompts, tool args (full), retrieval hits, transcripts

Rule: centralized “usage telemetry” MUST stay at S0–S1 unless explicitly classified as S2 with user/org opt-in and documented re-identification risk.

Retention alignment

Today: `research_metrics`

retention-policy.yaml lists research_metrics with 365 days (days relative to created_at). Prune is operator-driven via vox db prune-plan / prune-apply.

Today: `build_run*` telemetry tables

The vox ci build-timings --deep command persists structured build telemetry in build_run plus child tables (build_crate_sample, build_warning, build_run_dependency_shape). Retention follows retention-policy.yaml:

Table	Prune rule	Notes
`build_run`	`days` / 365 / `recorded_at`	Parent run cadence aligned with benchmark retention horizon.
`build_crate_sample`, `build_warning`, `build_run_dependency_shape`	(via FK)	`ON DELETE CASCADE` from `build_run`; no separate policy rows needed.

Today: `ci_completion_*`

Completion ingest persists workspace-adjacent rows (ci_completion.rs), classified S2 (paths, fingerprints). retention-policy.yaml defines:

Table	Prune rule	Notes
`ci_completion_run`	`days` / 365 / `finished_at`	Same default horizon as `research_metrics` for comparable org-local telemetry.
`ci_completion_finding`, `ci_completion_detector_snapshot`	(via FK)	`ON DELETE CASCADE` from `ci_completion_run`; no separate policy rows.
`ci_completion_suppression`	`expires_lt_now` / `expires_at`	TTL suppressions auto-prune when `expires_at` is set and past `datetime('now')`; `expires_at` NULL stays until manual change or a future policy decision.

Policy alignment: there is no separate “manual vs automated” conflict for runs: automated prune-apply ages out old runs (and cascaded children) on the same 365-day calendar basis as research_metrics. Suppressions without expiry remain operator-visible for governance until edited or a stricter rule is adopted.

Other adjacent tables

Tables such as conversation_messages, agent_events, behavior_events, llm_interactions (see agents.rs schema) are content or behavior stores. They MUST NOT be folded into “telemetry” naming without a separate data-class chapter in telemetry-trust-ssot.

Today: `agent_exec_history`

Execution time telemetry records for agentic budgeting (exec_time_telemetry). Classified S1 (tool names, IDs, duration, costs). Retention is set to 90 days in retention-policy.yaml because budgeting models only need a recent trailing window to detect anomalies; stale execution timings become irrelevant quickly.

Orchestrator and Populi sidecars

Memory / log retention in orchestrator (for example local log retention knobs) is separate from SQL TTL; document any future alignment in this file.
Populi privacy_class on envelopes (a2a/envelope.rs) MUST be referenced when classifying mesh-visible events.

Controls linkage

Prune: contracts/db/retention-policy.yaml
Emergency / feature off: env and flags documented per subsystem (mesh telemetry, Ludus, MCP cost events) — consolidated index in env-vars

Vox: The AI-Native Programming Language