Vox Scientia Gap Analysis (April 2026)
[!IMPORTANT] This document is a research artifact written to
docs/src/architecture/scientia-gap-analysis-2026.mdper the project's AGENTS.md policy. It identifies 45 concrete problems across all stages of the Scientia lifecycle with proposed solutions and a recommended execution wave order.
Dimension 1 — Inbound Research Discovery
Problem 1: The "inbound" pipeline exists only in a research doc
Status: scientia-external-discovery-research-2026.md describes a Collector → Evaluator → Synthesizer multi-agent inbound stack, but no crate, no schema, no CLI command, and no DB table has been created for it.
Impact: Scientia is entirely outbound. It can package discoveries but cannot autonomously surface new ones from external literature. Without the inbound stack, "making discoveries externally" requires fully manual effort.
Solution: Implement the inbound pipeline in three slices:
- Add
crates/vox-scientia-ingest/as a new crate withInboundItem,FeedSource, andIngestSessionstructs. - Add
scientia_external_intelligenceDB table underpublish_cloud. - Expose
vox scientia ingest-feedsCLI andvox_scientia_ingest_feedsMCP tool.
Owner crates: vox-scientia-ingest (new), vox-db, vox-cli, vox-mcp | Severity: Critical | Effort: Large
Problem 2: No RSS/Atom feed parsing crate is wired
Status: The research doc recommends feed-rs, but there is no Cargo.toml dependency and no source code consuming feeds.
Solution:
- Add
feed-rs = "1.3"dependency. - Implement
FeedCrawler::crawl_all(sources: &[FeedSource]) -> Vec<InboundItem>. - Persist source registry in
scientia_feed_sourcestable keyed by URL +last_crawled_at_ms.
Severity: High | Effort: Small
Problem 3: No Reddit/HN inbound read path exists (only outbound)
Status: vox-publisher/src/adapters/reddit.rs handles outbound submission. The research doc proposes inverting this for read-only monitoring, but no implementation exists.
Solution:
- Add
RedditInboundClientbehindscientia-inbound-redditfeature flag. - Use existing
refresh_access_tokenmachinery (read-only scope). - Gate on
VOX_SCIENTIA_REDDIT_INBOUND=1via Clavis.
Severity: Medium | Effort: Medium
Problem 4: No Socrates inbound policy profile — only outbound preflight profiles
Status: PreflightProfile variants (DoubleBlind, MetadataComplete, ArxivAssist) evaluate outgoing manifests. The research doc specifies a NewsInbound profile that doesn't exist in publication_preflight.rs.
Impact: Any inbound external article would bypass the quality gate entirely. Noise and "slop" would enter the discovery corpus unchecked.
Solution:
- Add
PreflightProfile::NewsInboundvariant checking:requires_code_repo_link,requires_reproducible_benchmark,maximum_opinion_ratio. - Apply
ComplexityJudgefromvox-socrates-policyon inbound article text. - High-contradiction items go to
Quarantinestate inscientia_external_intelligence.status.
Owner: vox-publisher, vox-socrates-policy | Severity: Critical | Effort: Medium
Problem 5: No semantic deduplication before inbound insert
Status: memory_hybrid.rs does BM25 + vector retrieval, but there is no pre-insert duplicate-detection call for the inbound pipeline. The research doc specifies a similarity > 0.9 guard that is unimplemented.
Impact: The same arXiv preprint reported by multiple sources will be inserted three times, bloating the corpus with redundant signal.
Solution:
- Add
IngestDeduplicator::is_duplicate(embedding: &[f32], threshold: f64) -> boolquerying the SQLite embeddings table before insert. - On duplicate, append the source URL to the existing document's
provenance_json. - Threshold pinned in
scientia_heuristics.rs(not a magic constant).
Severity: Medium | Effort: Small
Problem 6: No scientia_external_intelligence DB table or migration
Status: The research doc identifies this table but it does not exist in publish_cloud.rs.
Solution: Add additive migration:
CREATE TABLE IF NOT EXISTS scientia_external_intelligence (
id TEXT PRIMARY KEY,
source_url TEXT NOT NULL,
source_kind TEXT NOT NULL, -- 'rss', 'reddit', 'hn', 'arxiv'
title TEXT NOT NULL,
abstract_text TEXT,
embedding_id TEXT,
provenance_json TEXT DEFAULT '[]',
ingest_status TEXT NOT NULL DEFAULT 'pending',
preflight_score REAL,
ingested_at_ms INTEGER NOT NULL,
reviewed_at_ms INTEGER
);
Owner: vox-db | Severity: Critical | Effort: Small
Problem 7: Inbound Scholarly Digest has no synthesis loop contract
Status: The research doc specifies a Collector → Evaluator → Synthesizer multi-agent flow, but the Synthesizer has no design contract in code or contracts directory.
Solution:
- Add
contracts/scientia/scholarly-digest.v1.schema.jsonspecifying the digest output structure (cluster, delta summary, impact assessment). - Add
vox scientia digest-generateCLI to drive the A2A multi-agent synthesis flow. - Use
Tier 1(local model) for initial categorization; escalateComplexityBand::ComplextoTier 2.
Severity: High | Effort: Medium
Problem 8: No persistent registry of external intelligence sources
Status: Feed URLs have no registry table. Sources would be hardcoded or passed per-invocation.
Solution:
- Add
scientia_feed_sourcestable:(id, url, source_kind, crawl_interval_ms, enabled, last_crawled_at_ms, last_error). - Add
vox scientia feed-source-add/feed-source-list/feed-source-disablecommands.
Severity: Medium | Effort: Small
Dimension 2 — RAG-to-Scientia Feedback Loop
Problem 9: Scientia publications never re-enter the search corpora
Status: After a successful publication, the manifest and evidence pack are stored in publish_cloud tables but are never indexed into vox-search corpora.
Impact: The system cannot search its own published discoveries. This is a fundamental closed-loop failure.
Solution:
- Add
PostPublishIndexerstep inpostPublishAudit. - On
publication_status = 'published', embed manifest title + abstract + evidence metadata intoDocumentChunkscorpus withsource_kind = 'scientia_publication'. - Tag chunk with manifest digest for retrieval attribution.
Owner: vox-publisher, vox-search | Severity: Critical | Effort: Medium
Problem 10: Evidence packs are not linked into the knowledge graph
Status: metadata_json.scientia_evidence is stored per-manifest but never inserted into the KnowledgeGraph SQLite tables.
Impact: Multi-hop queries like "what findings relate to our GRPO reward shaping work?" cannot traverse from publication to its evidence chain.
Solution:
- Add
EvidencePackKGIndexerinserting typed nodes and edges:- Node:
Publication(id, title, pub_date) - Node:
BenchmarkRun(run_id, result_summary) - Edge:
has_evidence(publication_id → benchmark_run_id) - Edge:
cites_doc(publication_id → doc_path)
- Node:
Severity: Medium | Effort: Medium
Problem 11: Socrates Abstain events are not persisted for analysis or training
Status: The RAG SSOT §8 explicitly identifies "Hallucination events → Not persisted" as a gap.
Impact: We cannot detect patterns in what Scientia fails to answer. min_training_pair_confidence = 0.75 floor is defined but high-confidence Abstain events are lost.
Solution:
- Add
socrates_abstain_eventsArca table:(id, query_hash, confidence, contradiction_ratio, risk_decision, suggested_query, timestamp). - Persist on every
Abstainoutcome from the research path. - Include abstain rate and top abstain queries in
vox telemetry search-quality-report.
Owner: vox-db, vox-socrates-policy | Severity: High | Effort: Small
Problem 12: CRAG loop fires and fetches web evidence that is never persisted
Status: The CRAG loop in bundle.rs fetches Tavily results and re-runs RRF fusion. However, there is no mechanism to persist the corrected retrieval result.
Impact: The same low-quality query will trigger Tavily again on the next execution — burning credits and adding latency — because the new evidence was never stored.
Solution:
- After CRAG correction (evidence_quality improved above threshold), store Tavily-retrieved content into
DocumentChunkscorpus withsource_kind = 'crag_web_result'and a 7-day TTL.
Severity: High | Effort: Small
Problem 13: No awareness of in-progress Scientia findings in the RAG pipeline
Status: When an agent query matches a topic that Scientia has already identified as a StrongCandidate discovery, the RAG pipeline has no way to surface this.
Solution:
- Add
FindingsDraftCorpusas a new optionalSearchCorpusvariant backed bypublication_manifestswherestatus = 'draft' AND discovery_tier = 'strong_candidate'. - Activate when
SearchIntent::Researchand query relevance exceeds threshold. - Gate with
VOX_SEARCH_FINDINGS_DRAFT=1.
Severity: Medium | Effort: Medium
Dimension 3 — Internal Scientific Discovery Mechanisms
Problem 14: Discovery ranking constants are hardcoded in Rust
Status: scientia_discovery.rs calls ScientiaHeuristics::default() with embedded numeric constants. The impact-readership research doc explicitly identifies this as architectural debt.
Impact: Tuning discovery sensitivity requires a code change and recompile.
Solution:
- Load heuristics from
contracts/scientia/scientia-discovery-heuristics.v1.yaml. - Implement
ScientiaHeuristics::from_yaml(path: &Path) -> Result<Self>.
Owner: vox-publisher, vox-scientia-core | Severity: High | Effort: Small
Problem 15: Signal catalog (discovery_signals) has no formal schema contract
Status: Signal codes like eval_gate_passed, human_advance_attested are string literals without a machine-checkable registry.
Impact: A typo in a signal code silently produces an Informational signal instead of Strong.
Solution:
- Add
contracts/scientia/discovery-signal-codes.v1.yamlenumerating all valid codes with their strength level. - Add
vox ci scientia-signal-codesCI check. - Consider
SignalCodeenum generated from the YAML at build time.
Severity: Medium | Effort: Small
Problem 16: No multi-hop hypothesis chain generation
Status: scientia_prior_art.rs checks overlap and scientia_finding_ledger.rs scores novelty, but there is no mechanism to chain multiple findings into a composite hypothesis.
Solution:
- Design
HypothesisChainBuilderinvox-scientia-core:- Fetch
StrongCandidatemanifests. - Query KnowledgeGraph for shared evidence nodes.
- Use MENS Lane G or Tier 2 model to propose hypothesis chains.
- Return
HypothesisCandidatestructs with attribution map.
- Fetch
- Add
vox scientia hypothesis-scanCLI. - Gate as
human_approval_required = trueper automation boundary matrix.
Severity: High | Effort: Large
Problem 17: No experimental design scaffolding
Status: Once a hypothesis is identified, there is no tooling to scaffold a research experiment (define metrics, set baseline run, configure eval gate).
Solution:
- Add
vox scientia experiment-scaffold --hypothesis-id <id>which:- Creates a draft manifest pre-filled with the hypothesis.
- Emits a
scientia_evidencetemplate with placeholder eval gate and benchmark block. - Generates a checklist of evidence needed to reach
AutoDraftEligible.
- All generated content marked
machine_suggested = true.
Severity: Medium | Effort: Medium
Problem 18: prior_art_max_lexical_overlap and prior_art_max_semantic_overlap are always None
Status: In scientia_discovery.rs lines 289-291, both overlap fields are hardcoded to None in rank_candidate(). They are only populated by a separately-called merge_novelty_overlap_into_rank().
Impact: Any ranking performed without the explicit merge call returns None for novelty overlap, making the rank appear to have perfect novelty when it may not.
Solution:
- Rename
rank_candidate()→rank_candidate_without_novelty(). - Add
rank_candidate_with_novelty(…, novelty_bundle: Option<&NoveltyEvidenceBundleV1>)that internally merges. - Update all callers (CLI, MCP, scan paths).
Owner: vox-publisher | Severity: High | Effort: Small
Problem 19: evidence_completeness_score counts 11 binary signals with equal weight
Status: All 11 evidence signals contribute 1 point each. human_meaningful_advance = true weighs the same as !doc_section_hints.is_empty().
Impact: Completeness scores are misleading. The submission_readiness_score KPI is contaminated.
Solution:
- Load per-signal weights from the heuristics YAML (Problem 14).
human_meaningful_advanceandeval_gate_passedshould weigh 3×; doc hints 1×.
Severity: Medium | Effort: Small
Problem 20: No contamination risk detection for internal eval corpora
Status: The worthiness unification research doc identifies contamination_risk_flag as a candidate signal. No implementation exists.
Impact: An internal benchmark may be inflated due to training data overlapping with the eval set — a form of benchmark leakage that Scientia has no detector for.
Solution:
- Add
ContaminationRiskAssessor::assess(eval_corpus_id, training_corpus_ids) -> ContaminationRiskinvox-scientia-core. - Use n-gram overlap as a first-pass detector.
- Emit
contamination_risk_flaginworthiness_signals.v2withsoft_gateclassification.
Severity: Medium | Effort: Medium
Problem 21: MENS Lane G (research-expert) is not integrated into Scientia evidence flow
Status: mens-research-track-blueprint-2026.md gives Lane G a spec. The blueprint says "when research_model_enabled is true, the orchestrator delegates to this adapter." But:
research_model_enabledis not a field in any config or runtime struct.- No gate in
scientia_evidence.rsor the orchestrator dispatches to Lane G.
Solution:
- Add
research_model_enabled: booltoVoxPopuliConfig(orSocratesTaskContext). - When
research_model_enabled && complexity >= Complex, dispatch synthesis to Lane G endpoint. - Add
MENS_LANE_G_ENDPOINTenv var resolved via Clavis.
Owner: vox-orchestrator, vox-scientia-core | Severity: High | Effort: Medium
Dimension 4 — Outbound Publication Pipeline
Problem 22: LaTeX/journal template engine is absent from submission/mod.rs
Status: The readiness audit (§Phase 1 "Remaining") explicitly lists: "LaTeX/camera-ready package builder, figure/filename validators, template compliance against JMLR/TMLR/JAIR style packs" as still missing.
Solution:
- Add
TemplateProfileenum:Jmlr,Tmlr,Jair,Arxiv,Generic. - Implement
SubmissionPackageBuilder::build_with_template(profile):- Validate source directory against profile requirements.
- Check figure formats (PDF preferred for JMLR, etc.).
- Generate
manifest.jsonwith SHA-256 digests. - Create deterministic
.ziparchive.
Owner: vox-publisher | Severity: High | Effort: Large
Problem 23: arXiv format preflight profile is missing
Status: The readiness audit explicitly states arxiv_format_profile is "missing."
Solution:
- Add
PreflightProfile::ArxivFormatchecking:- No filenames with spaces or non-ASCII characters.
- Root LaTeX file present.
- All
\includegraphicstargets resolvable. - No disallowed extensions in root.
- Wire into
publication-preflight --profile arxiv_format.
Severity: Medium | Effort: Small
Problem 24: Crossref adapter is documented but not wired
Status: crossref_metadata.rs exists (transform is drafted). But no adapter in scholarly/ actually submits to Crossref.
Solution:
- Implement
CrossrefAdapterinscholarly/crossref.rs. - Use existing
crossref_metadata.rsfor payload construction. - Gate behind
VOX_SCHOLARLY_ENABLE_CROSSREF=1andCROSSREF_API_KEYvia Clavis. - Add
vox scientia crossref-depositCLI (dry-run by default).
Severity: High | Effort: Medium
Problem 25: CITATION.cff generation is incomplete / not wired to CLI
Status: citation_cff.rs exists (5.4KB) but the readiness audit lists this as "Missing machine-readable citation assets."
Solution:
- Audit
citation_cff.rsagainst CFF 1.2.0 spec. - Wire
vox scientia generate-citation-cff --output CITATION.cffas a CLI command. - Include
CITATION.cffinSubmissionPackageBuilderoutput for Zenodo profile.
Severity: Medium | Effort: Small
Problem 26: Zenodo adapter only generates metadata JSON — no HTTP deposit
Status: The readiness audit says "Zenodo → partial (metadata done, upload/deposit not done)."
Solution:
- Add
ZenodoDepositClientinscholarly/zenodo.rsusing the Zenodo REST API. - Implement: deposition creation → file upload → publish workflow.
ZENODO_ACCESS_TOKENvia Clavis.- Add
--sandboxmode for pre-production validation.
Owner: vox-publisher | Severity: High | Effort: Medium
Problem 27: No automatic submission status synchronization
Status: publication-scholarly-remote-status-sync-batch requires manual invocation. No scheduler calls it.
Impact: Submission status drift: an accepted paper may show as "submitted" indefinitely.
Solution:
- Add a scheduled worker that calls
publication-scholarly-remote-status-sync-batchfor all non-terminal submissions. - Add
milestone_eventstable:(publication_id, milestone, recorded_at_ms, external_id)with valuessubmitted | under_review | accepted | published | rejected.
Owner: vox-db, vox-publisher | Severity: High | Effort: Medium
Problem 28: Author / co-author model mismatch (single author string vs authors[] array)
Status: The readiness audit §Lifecycle stage 2 flags: digest and CLI use a single author string; full co-author list lives in a JSON block. Mismatches if they disagree.
Solution:
- Add preflight check: if
scientific_publication.authors[]present, derivedisplay_authorfromauthors[0], warn on disagreement. - Soft-deprecate the manifest
authorfield. - Update
manifest_completion_reportto checkauthors[].orcidcompleteness separately.
Severity: Medium | Effort: Small
Problem 29: Revision lifecycle has no external venue revision ID mapping
Status: When digest changes, there is no way to know what revision number it corresponds to at the external venue (e.g., TMLR v2, OpenReview R2).
Solution:
- Add
scholarly_revision_maptable perscholarly-external-schema-plan.md. - Capture external revision ID on each adapter submit response.
publication-statusshould show unified timeline:v1(digest=abc) → submitted → R1 → v2(digest=xyz) → R2 → accepted.
Severity: Medium | Effort: Medium
Problem 30: Double-blind anonymization gate is partial (email heuristic only)
Status: The readiness audit (§Lifecycle stage 3) states: "email heuristic present, broader anonymization missing" for double_blind profile.
Solution:
- Extend
publication_preflight.rsdouble-blind checks to scan:abstract_textfield for name/institution patterns (heuristic regex).- Generated filenames and LaTeX comments for author metadata.
- Acknowledgements section stub.
- Add
AnonymizationScanResult { risk_level: High | Medium | Low }. High→ hard fail;Medium→ warning innext_actions.
Severity: Medium | Effort: Small
Problem 31: HN submission has no structured handoff payload
Status: The social execution board template exists but hn_assist in destination_transform_previews() (scientia_discovery.rs:470) just concatenates a string.
Solution:
- Add
HnHandoffPayload { title: String, url: String, comment: String }tosyndication_outcome.rs. - Generate structured JSON during
destination_transform_previews(). - Add CI check that
titlerespects the 80-char HN limit.
Severity: Low | Effort: Small
Dimension 5 — SSOT Convergence and Structural Problems
Problem 32: Worthiness scoring exists in 5 competing locations with no CI parity check
Status: Numerics appear in publication_worthiness.rs, publication-worthiness.default.yaml, worthiness-signals.v2.schema.json, scientia_heuristics.rs, and scientia_finding_ledger.rs.
Impact: Updating a threshold requires touching 2-4 files. Silent inconsistency risk is high.
Solution:
- Declare
publication-worthiness.default.yamlas the single source of numeric truth. ScientiaHeuristics::from_default_yaml()loads and validates against the JSON schema at startup.- Add
vox ci scientia-worthiness-paritycross-checking YAML values against unit test constants. - All Rust constants reference the loaded struct, not magic numbers.
Owner: vox-publisher, contracts | Severity: High | Effort: Medium
Problem 33: The 232-task wave backlog has no CI tracking or CLI surface
Status: implementation-wave-backlog.v1.yaml exists but there is no vox ci scientia-wave-progress and no CLI to query wave completion.
Solution:
- Add
vox scientia wave-statusCLI that reads the YAML and checks which expected artifacts exist on disk. - Emit completion percentage per wave.
- Add as informational step in
vox ci ssot-drift.
Severity: Medium | Effort: Small
Problem 34: vox-publisher is still the God Object the package-family split was meant to dissolve
Status: vox-publisher/src/ has 28 source files; lib.rs alone is 40KB. vox-scientia-core does not exist as a crate. AGENTS.md limits to 500 lines / 12 methods.
Solution:
- Execute the Split Wave: move
scientia_evidence.rs,scientia_heuristics.rs,scientia_discovery.rs,scientia_contracts.rstovox-scientia-core. - Wire
vox-publisheras a re-export shim. - Track in a
scientia-split-migration-ledger.md.
Severity: Medium | Effort: Large
Problem 35: Research Index does not link the RAG SSOT as the canonical retrieval reference
Status: rag-and-research-architecture-2026.md is the current-state SSOT for retrieval. research-index.md mentions it tangentially but does not surface it as the canonical SSOT.
Solution:
- Add "Retrieval and RAG Architecture (Current)" section to
research-index.mdlinking to the RAG SSOT. - Also cross-link from
scientia-publication-automation-ssot.mdsource anchors.
Severity: Low | Effort: Small
Problem 36: contracts/index.yaml likely does not register all 27 scientia contracts
Status: The impact-readership research doc mandates contract registration in contracts/index.yaml. No evidence all 27 contracts/scientia/ files are registered.
Solution:
- Audit
contracts/index.yamlagainstcontracts/scientia/directory listing. - Add missing registrations.
- Add CI check that enforces
contracts/scientia/⊆contracts/index.yaml.
Severity: Medium | Effort: Small
Problem 37: voxgiantia-publication-architecture.md may be a shadow SSOT
Status: This 6.7KB doc is not referenced in the main SSOT's source anchors. It is unclear if it is superseded or covers a distinct scope.
Solution:
- Audit the doc for overlap with
scientia-publication-automation-ssot.md. - If superseded: add deprecation header + link to current SSOT.
- If distinct: add to SSOT source anchors with a scope label.
Severity: Low | Effort: Small
Problem 38: Syndication security docs are architecturally isolated from Scientia
Status: news_syndication_incident_patterns.md and news_syndication_security.md are not linked from the Scientia SSOT or the inbound discovery research doc.
Solution:
- Add links from
scientia-external-discovery-research-2026.mdto both syndication docs in a "Security constraints" section. - Ensure
NewsInboundpreflight (Problem 4) incorporates the threat taxonomy fromnews_syndication_security.md.
Severity: Low | Effort: Small
Dimension 6 — Quality, Evaluation, and Autonomy Gaps
Problem 39: No golden test set for search recall
Status: The RAG SSOT §8 explicitly identifies "Recall@K golden set → Not built" as a gap.
Solution:
- Build 50-100 labelled
(query, expected_doc_ids)pairs from real orchestrator queries. - Add
vox ci search-recall-at-kemitting Recall@5 and MRR metrics. - Gate on ≤5% relative regression budget per PR.
Severity: Medium | Effort: Medium
Problem 40: No RAGAS-style faithfulness metric
Status: The RAG SSOT §8 identifies "RAGAS faithfulness → Not implemented" as a gap.
Solution:
- Implement lightweight faithfulness check: compare claim-sentences in answers against retrieved passages using existing BM25 lexical overlap logic.
- Run as a periodic background job (not on every completion).
- Persist results to Arca. Flag completions below
min_faithfulness = 0.4for analysis.
Severity: Medium | Effort: Medium
Problem 41: Socrates has no evaluate_research_need() dispatch path
Status: The RAG SSOT §4.4 shows SocratesResearchDecision as [PLANNED]. The struct is defined in the doc but does not exist in crates/vox-socrates-policy/src/lib.rs.
Impact: When Socrates returns Abstain, the caller has no structured signal about whether to trigger CRAG or simply decline.
Solution:
- Implement
evaluate_research_need(confidence, contradiction_ratio, complexity) -> SocratesResearchDecisioninvox-socrates-policy. - Wire into the orchestrator's pre-generation hook.
- Auto-dispatch CRAG when
should_research = true.
Owner: vox-socrates-policy, vox-orchestrator | Severity: High | Effort: Medium
Problem 42: The Coverage Paradox fix is documented but not coded
Status: The RAG SSOT §4.3 documents the fix (only apply contradiction penalty when citation_coverage >= 0.3) as [PLANNED].
Impact: Agents fall into a refusal loop on abstract synthesis queries — the very class most relevant to Scientia research workflows.
Solution:
- Add
citation_coverage: Option<f64>parameter toclassify_risk(). - When
citation_coverage < 0.3, suppressmax_contradiction_ratio_for_answerpenalty. - Add unit test:
low_coverage_high_contradiction_should_ask_not_abstain.
Owner: vox-socrates-policy | Severity: High | Effort: Small
Problem 43: No Tavily credit budget tracking or doctor warning
Status: The RAG SSOT §8 identifies "Tavily credit usage → Not tracked" as a gap.
Impact: Aggressive CRAG loops can exhaust the session credit budget silently.
Solution:
- Track
tavily_credits_used: u32in theSearchPolicysession context. - When usage ≥ 80% of budget, emit
SearchRefinementAction::BudgetWarning. - Add
vox clavis doctorcheck displaying current credit budget.
Severity: Medium | Effort: Small
Problem 44: CLI/MCP tools bypass the vox-scientia-api package boundary
Status: vox-cli/src/commands/scientia.rs and vox-mcp/src/tools/scientia_tools.rs both directly import from vox-publisher, not vox-scientia-api.
Impact: When vox-publisher is eventually split, every CLI/MCP callsite will break.
Solution:
- Create
crates/vox-scientia-api/as a façade crate. - Update
vox-cliandvox-mcpCargo.tomlto depend onvox-scientia-api. - Add FROZEN marker on
vox-publisher's public surface.
Severity: Medium | Effort: Small
Problem 45: No end-to-end integration test for the Scientia lifecycle
Status: Unit tests exist for individual functions. acceptance_matrix.ps1 exists. But no integration test exercises the full pipeline: prepare → preflight → approve → scholarly-pipeline-run → status → metrics.
Solution:
- Add
tests/scientia_lifecycle_test.rsusinglocal_ledger/echo_ledgeradapters (no external credentials needed). - Cover: manifest creation → preflight pass → dual approval → external job tick → status assertion.
- Add to
vox ci scientia-novelty-ledger-contractsor asvox ci scientia-lifecycle.
Severity: Medium | Effort: Medium
Summary Priority Matrix
| # | Problem | Severity | Effort | Owner Crate |
|---|---|---|---|---|
| 1 | No inbound pipeline crate | Critical | Large | vox-scientia-ingest (new) |
| 4 | No Socrates inbound profile | Critical | Medium | vox-publisher, vox-socrates-policy |
| 6 | No external intelligence DB table | Critical | Small | vox-db |
| 9 | Publications never re-enter search corpora | Critical | Medium | vox-publisher, vox-search |
| 18 | Prior art overlaps always None in rank_candidate() | High | Small | vox-publisher |
| 11 | Socrates Abstain events not persisted | High | Small | vox-db, vox-socrates-policy |
| 12 | CRAG results not stored back | High | Small | vox-search |
| 14 | Discovery ranking constants hardcoded in Rust | High | Small | vox-publisher |
| 16 | No multi-hop hypothesis chain generation | High | Large | vox-scientia-core |
| 21 | Lane G not integrated into Scientia evidence flow | High | Medium | vox-orchestrator |
| 22 | LaTeX package builder absent | High | Large | vox-publisher |
| 24 | Crossref adapter not wired | High | Medium | vox-publisher |
| 26 | Zenodo adapter metadata-only, no HTTP deposit | High | Medium | vox-publisher |
| 27 | No automatic submission status sync | High | Medium | vox-db, vox-publisher |
| 32 | Worthiness scoring split across 5 locations | High | Medium | vox-publisher, contracts |
| 41 | Socrates research dispatch not coded | High | Medium | vox-socrates-policy |
| 42 | Coverage Paradox fix not coded | High | Small | vox-socrates-policy |
| 5 | No semantic deduplication inbound | Medium | Small | vox-scientia-ingest |
| 7 | No Scholarly Digest contract | Medium | Medium | contracts, vox-scientia-core |
| 10 | Evidence packs not in knowledge graph | Medium | Medium | vox-scientia-core, vox-search |
| 13 | No FindingsDraftCorpus in RAG | Medium | Medium | vox-search |
| 15 | No signal code registry/CI check | Medium | Small | contracts, CI |
| 19 | Evidence completeness uses equal weights | Medium | Small | vox-publisher |
| 20 | No contamination risk detection | Medium | Medium | vox-scientia-core |
| 23 | arXiv format preflight missing | Medium | Small | vox-publisher |
| 25 | CITATION.cff generation incomplete | Medium | Small | vox-publisher |
| 28 | Author/co-author model mismatch | Medium | Small | vox-publisher, vox-db |
| 29 | No revision lifecycle mapping | Medium | Medium | vox-db, vox-publisher |
| 30 | Double-blind anonymization gate is partial | Medium | Small | vox-publisher |
| 33 | Wave backlog has no CI tracking | Medium | Small | CI, vox-cli |
| 34 | vox-publisher God Object not split | Medium | Large | All Scientia crates |
| 36 | Contract index missing scientia registrations | Medium | Small | contracts |
| 39 | No golden test set for search recall | Medium | Medium | vox-search |
| 40 | No RAGAS-style faithfulness metric | Medium | Medium | vox-search, vox-db |
| 43 | No Tavily credit tracking | Medium | Small | vox-search, vox-clavis |
| 44 | CLI/MCP bypass vox-scientia-api boundary | Medium | Small | vox-cli, vox-mcp |
| 45 | No lifecycle integration test | Medium | Medium | vox-db |
| 2 | No RSS/Atom feed parsing crate | Medium | Small | vox-scientia-ingest |
| 8 | No feed source registry table | Medium | Small | vox-db |
| 17 | No experimental design scaffolding | Medium | Medium | vox-scientia-core |
| 3 | No Reddit/HN inbound read path | Low | Medium | vox-publisher |
| 31 | HN submission unstructured handoff | Low | Small | vox-publisher |
| 35 | Research index missing RAG SSOT link | Low | Small | docs |
| 37 | Shadow SSOT doc voxgiantia-publication-architecture.md | Low | Small | docs |
| 38 | Syndication security docs isolated from Scientia | Low | Small | docs |
Recommended Execution Order (7 Waves)
Wave 0 — Quick Wins (1–3 days each, unblock parity and safety)
- P18: Fix
rank_candidate()always-None novelty overlap - P42: Code the Coverage Paradox fix in
classify_risk() - P43: Add Tavily credit tracking and doctor warning
- P15: Add discovery signal code registry and CI check
- P19: Load evidence completeness weights from YAML
- P44: Create
vox-scientia-apifaçade and update CLI/MCP
Wave 1 — Foundation Hardening (1–2 weeks)
- P11: Persist Socrates Abstain events to Arca
- P12: Store CRAG results back into DocumentChunks
- P14: Load
ScientiaHeuristicsfrom YAML contract - P28: Author/co-author model preflight + soft-deprecation
- P32: Unify worthiness scoring to YAML source of truth + parity CI
- P35, P36, P37, P38: Documentation and contract housekeeping
- P41: Implement
evaluate_research_need()dispatch in Socrates - P33: Add
vox scientia wave-statusCLI
Wave 2 — Inbound Pipeline (new crate focus)
- P6: Add
scientia_external_intelligenceDB table - P8: Add
scientia_feed_sourcesDB table and CLI commands - P1: Create
vox-scientia-ingestcrate shell - P2: Wire
feed-rsfor RSS/Atom crawling - P4: Add
PreflightProfile::NewsInboundin Socrates - P5: Add
IngestDeduplicatoragainst embeddings table - P7: Add
scholarly-digest.v1.schema.json+digest-generateCLI
Wave 3 — RAG Feedback Loop
- P9:
PostPublishIndexer— publications back intoDocumentChunks - P10:
EvidencePackKGIndexer— evidence chains into KnowledgeGraph - P13:
FindingsDraftCorpusvariant for in-progress findings
Wave 4 — Discovery Intelligence Upgrade
- P16:
HypothesisChainBuilderwith Lane G integration - P17:
experiment-scaffoldCLI - P20:
ContaminationRiskAssessor - P21: Wire Lane G into the Scientia synthesis path
Wave 5 — Outbound Publication Completeness
- P22: LaTeX/template engine in
SubmissionPackageBuilder - P23:
PreflightProfile::ArxivFormat - P24:
CrossrefAdapterwired - P25: Complete
citation_cff.rsand wire CLI - P26:
ZenodoDepositClientHTTP submit - P27: Auto status sync scheduler +
milestone_eventstable - P29:
scholarly_revision_maptable - P30: Extended double-blind anonymization scan
- P31: Structured
HnHandoffPayload
Wave 6 — God Object Split and Structural
- P34: Extract
vox-scientia-corefromvox-publisher - P45: Lifecycle integration test suite
Wave 7 — Quality and Evaluation
- P39: Golden recall test set +
vox ci search-recall-at-k - P40: Lightweight RAGAS-style faithfulness metric
Appendix: Cross-References
| Concern | Primary SSOT | Owner Crate |
|---|---|---|
| Publication pipeline | scientia-publication-automation-ssot.md | vox-publisher |
| RAG retrieval | rag-and-research-architecture-2026.md | vox-search |
| Hallucination gate | vox-socrates-policy/src/lib.rs | vox-socrates-policy |
| Evidence model | scientia_evidence.rs, scientia-evidence-graph.schema.json | vox-publisher |
| Discovery ranking | scientia_discovery.rs, publication-worthiness.default.yaml | vox-publisher |
| Inbound discovery | scientia-external-discovery-research-2026.md | vox-scientia-ingest (TBD) |
| MENS Lane G | mens-research-track-blueprint-2026.md | vox-orchestrator |
| Worthiness signals | worthiness-signals.v2.schema.json | contracts |
| Impact/readership | scientia-impact-readership-research-2026.md | assistive only |
| Automation boundaries | scientia-publication-worthiness-ssot-unification-research-2026.md | policy |