"Vox Scientia Gap Analysis (April 2026)"

Vox Scientia Gap Analysis (April 2026)

[!IMPORTANT] This document is a research artifact written to docs/src/architecture/scientia-gap-analysis-2026.md per the project's AGENTS.md policy. It identifies 45 concrete problems across all stages of the Scientia lifecycle with proposed solutions and a recommended execution wave order.


Dimension 1 — Inbound Research Discovery

Problem 1: The "inbound" pipeline exists only in a research doc

Status: scientia-external-discovery-research-2026.md describes a Collector → Evaluator → Synthesizer multi-agent inbound stack, but no crate, no schema, no CLI command, and no DB table has been created for it.

Impact: Scientia is entirely outbound. It can package discoveries but cannot autonomously surface new ones from external literature. Without the inbound stack, "making discoveries externally" requires fully manual effort.

Solution: Implement the inbound pipeline in three slices:

  1. Add crates/vox-scientia-ingest/ as a new crate with InboundItem, FeedSource, and IngestSession structs.
  2. Add scientia_external_intelligence DB table under publish_cloud.
  3. Expose vox scientia ingest-feeds CLI and vox_scientia_ingest_feeds MCP tool.

Owner crates: vox-scientia-ingest (new), vox-db, vox-cli, vox-mcp | Severity: Critical | Effort: Large


Problem 2: No RSS/Atom feed parsing crate is wired

Status: The research doc recommends feed-rs, but there is no Cargo.toml dependency and no source code consuming feeds.

Solution:

  • Add feed-rs = "1.3" dependency.
  • Implement FeedCrawler::crawl_all(sources: &[FeedSource]) -> Vec<InboundItem>.
  • Persist source registry in scientia_feed_sources table keyed by URL + last_crawled_at_ms.

Severity: High | Effort: Small


Problem 3: No Reddit/HN inbound read path exists (only outbound)

Status: vox-publisher/src/adapters/reddit.rs handles outbound submission. The research doc proposes inverting this for read-only monitoring, but no implementation exists.

Solution:

  • Add RedditInboundClient behind scientia-inbound-reddit feature flag.
  • Use existing refresh_access_token machinery (read-only scope).
  • Gate on VOX_SCIENTIA_REDDIT_INBOUND=1 via Clavis.

Severity: Medium | Effort: Medium


Problem 4: No Socrates inbound policy profile — only outbound preflight profiles

Status: PreflightProfile variants (DoubleBlind, MetadataComplete, ArxivAssist) evaluate outgoing manifests. The research doc specifies a NewsInbound profile that doesn't exist in publication_preflight.rs.

Impact: Any inbound external article would bypass the quality gate entirely. Noise and "slop" would enter the discovery corpus unchecked.

Solution:

  • Add PreflightProfile::NewsInbound variant checking: requires_code_repo_link, requires_reproducible_benchmark, maximum_opinion_ratio.
  • Apply ComplexityJudge from vox-socrates-policy on inbound article text.
  • High-contradiction items go to Quarantine state in scientia_external_intelligence.status.

Owner: vox-publisher, vox-socrates-policy | Severity: Critical | Effort: Medium


Problem 5: No semantic deduplication before inbound insert

Status: memory_hybrid.rs does BM25 + vector retrieval, but there is no pre-insert duplicate-detection call for the inbound pipeline. The research doc specifies a similarity > 0.9 guard that is unimplemented.

Impact: The same arXiv preprint reported by multiple sources will be inserted three times, bloating the corpus with redundant signal.

Solution:

  • Add IngestDeduplicator::is_duplicate(embedding: &[f32], threshold: f64) -> bool querying the SQLite embeddings table before insert.
  • On duplicate, append the source URL to the existing document's provenance_json.
  • Threshold pinned in scientia_heuristics.rs (not a magic constant).
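
As a sketch of the guard, a cosine-similarity scan over stored embeddings, the `is_duplicate` signature and the >0.9 threshold come from the research doc, while the in-memory `corpus` field is an illustrative stand-in for the SQLite embeddings table:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| (*x as f64) * (*y as f64)).sum();
    let na: f64 = a.iter().map(|x| (*x as f64).powi(2)).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|y| (*y as f64).powi(2)).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

pub struct IngestDeduplicator {
    /// Embeddings already in the corpus (stand-in for the SQLite table).
    pub corpus: Vec<Vec<f32>>,
}

impl IngestDeduplicator {
    /// True when any stored embedding exceeds the similarity threshold.
    pub fn is_duplicate(&self, embedding: &[f32], threshold: f64) -> bool {
        self.corpus.iter().any(|e| cosine_similarity(e, embedding) > threshold)
    }
}
```

On a duplicate hit, the caller would append the new source URL to the existing row's provenance_json rather than inserting a second document.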

Severity: Medium | Effort: Small


Problem 6: No scientia_external_intelligence DB table or migration

Status: The research doc identifies this table but it does not exist in publish_cloud.rs.

Solution: Add additive migration:

CREATE TABLE IF NOT EXISTS scientia_external_intelligence (
  id TEXT PRIMARY KEY,
  source_url TEXT NOT NULL,
  source_kind TEXT NOT NULL,  -- 'rss', 'reddit', 'hn', 'arxiv'
  title TEXT NOT NULL,
  abstract_text TEXT,
  embedding_id TEXT,
  provenance_json TEXT DEFAULT '[]',
  ingest_status TEXT NOT NULL DEFAULT 'pending',
  preflight_score REAL,
  ingested_at_ms INTEGER NOT NULL,
  reviewed_at_ms INTEGER
);

Owner: vox-db | Severity: Critical | Effort: Small


Problem 7: Inbound Scholarly Digest has no synthesis loop contract

Status: The research doc specifies a Collector → Evaluator → Synthesizer multi-agent flow, but the Synthesizer has no design contract in code or contracts directory.

Solution:

  • Add contracts/scientia/scholarly-digest.v1.schema.json specifying the digest output structure (cluster, delta summary, impact assessment).
  • Add vox scientia digest-generate CLI to drive the A2A multi-agent synthesis flow.
  • Use Tier 1 (local model) for initial categorization; escalate ComplexityBand::Complex to Tier 2.

Severity: High | Effort: Medium


Problem 8: No persistent registry of external intelligence sources

Status: Feed URLs have no registry table. Sources would be hardcoded or passed per-invocation.

Solution:

  • Add scientia_feed_sources table: (id, url, source_kind, crawl_interval_ms, enabled, last_crawled_at_ms, last_error).
  • Add vox scientia feed-source-add / feed-source-list / feed-source-disable commands.

Severity: Medium | Effort: Small


Dimension 2 — RAG-to-Scientia Feedback Loop

Problem 9: Scientia publications never re-enter the search corpora

Status: After a successful publication, the manifest and evidence pack are stored in publish_cloud tables but are never indexed into vox-search corpora.

Impact: The system cannot search its own published discoveries. This is a fundamental closed-loop failure.

Solution:

  • Add PostPublishIndexer step in postPublishAudit.
  • On publication_status = 'published', embed manifest title + abstract + evidence metadata into DocumentChunks corpus with source_kind = 'scientia_publication'.
  • Tag chunk with manifest digest for retrieval attribution.

Owner: vox-publisher, vox-search | Severity: Critical | Effort: Medium


Problem 10: Evidence packs are not linked into the knowledge graph

Status: metadata_json.scientia_evidence is stored per-manifest but never inserted into the KnowledgeGraph SQLite tables.

Impact: Multi-hop queries like "what findings relate to our GRPO reward shaping work?" cannot traverse from publication to its evidence chain.

Solution:

  • Add EvidencePackKGIndexer inserting typed nodes and edges:
    • Node: Publication(id, title, pub_date)
    • Node: BenchmarkRun(run_id, result_summary)
    • Edge: has_evidence(publication_id → benchmark_run_id)
    • Edge: cites_doc(publication_id → doc_path)

Severity: Medium | Effort: Medium


Problem 11: Socrates Abstain events are not persisted for analysis or training

Status: The RAG SSOT §8 explicitly identifies "Hallucination events → Not persisted" as a gap.

Impact: We cannot detect patterns in what Scientia fails to answer. min_training_pair_confidence = 0.75 floor is defined but high-confidence Abstain events are lost.

Solution:

  • Add socrates_abstain_events Arca table: (id, query_hash, confidence, contradiction_ratio, risk_decision, suggested_query, timestamp).
  • Persist on every Abstain outcome from the research path.
  • Include abstain rate and top abstain queries in vox telemetry search-quality-report.

Owner: vox-db, vox-socrates-policy | Severity: High | Effort: Small


Problem 12: CRAG loop fires and fetches web evidence that is never persisted

Status: The CRAG loop in bundle.rs fetches Tavily results and re-runs RRF fusion. However, there is no mechanism to persist the corrected retrieval result.

Impact: The same low-quality query will trigger Tavily again on the next execution — burning credits and adding latency — because the new evidence was never stored.

Solution:

  • After CRAG correction (evidence_quality improved above threshold), store Tavily-retrieved content into DocumentChunks corpus with source_kind = 'crag_web_result' and a 7-day TTL.

Severity: High | Effort: Small


Problem 13: No awareness of in-progress Scientia findings in the RAG pipeline

Status: When an agent query matches a topic that Scientia has already identified as a StrongCandidate discovery, the RAG pipeline has no way to surface this.

Solution:

  • Add FindingsDraftCorpus as a new optional SearchCorpus variant backed by publication_manifests where status = 'draft' AND discovery_tier = 'strong_candidate'.
  • Activate when SearchIntent::Research and query relevance exceeds threshold.
  • Gate with VOX_SEARCH_FINDINGS_DRAFT=1.

Severity: Medium | Effort: Medium


Dimension 3 — Internal Scientific Discovery Mechanisms

Problem 14: Discovery ranking constants are hardcoded in Rust

Status: scientia_discovery.rs calls ScientiaHeuristics::default() with embedded numeric constants. The impact-readership research doc explicitly identifies this as architectural debt.

Impact: Tuning discovery sensitivity requires a code change and recompile.

Solution:

  • Load heuristics from contracts/scientia/scientia-discovery-heuristics.v1.yaml.
  • Implement ScientiaHeuristics::from_yaml(path: &Path) -> Result<Self>.

Owner: vox-publisher, vox-scientia-core | Severity: High | Effort: Small


Problem 15: Signal catalog (discovery_signals) has no formal schema contract

Status: Signal codes like eval_gate_passed, human_advance_attested are string literals without a machine-checkable registry.

Impact: A typo in a signal code silently produces an Informational signal instead of Strong.

Solution:

  • Add contracts/scientia/discovery-signal-codes.v1.yaml enumerating all valid codes with their strength level.
  • Add vox ci scientia-signal-codes CI check.
  • Consider SignalCode enum generated from the YAML at build time.
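
A minimal sketch of what a generated `SignalCode` enum could look like; the two code strings are from this document, while the enum shape and `SignalStrength` levels are assumptions about the generated output:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignalStrength { Strong, Supporting, Informational }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignalCode { EvalGatePassed, HumanAdvanceAttested }

impl SignalCode {
    /// Parsing fails loudly on a typo instead of silently degrading
    /// the signal to Informational.
    pub fn parse(code: &str) -> Result<Self, String> {
        match code {
            "eval_gate_passed" => Ok(Self::EvalGatePassed),
            "human_advance_attested" => Ok(Self::HumanAdvanceAttested),
            other => Err(format!("unknown signal code: {other}")),
        }
    }

    /// Strength level as declared in the YAML registry.
    pub fn strength(self) -> SignalStrength {
        match self {
            Self::EvalGatePassed | Self::HumanAdvanceAttested => SignalStrength::Strong,
        }
    }
}
```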

Severity: Medium | Effort: Small


Problem 16: No multi-hop hypothesis chain generation

Status: scientia_prior_art.rs checks overlap and scientia_finding_ledger.rs scores novelty, but there is no mechanism to chain multiple findings into a composite hypothesis.

Solution:

  • Design HypothesisChainBuilder in vox-scientia-core:
    1. Fetch StrongCandidate manifests.
    2. Query KnowledgeGraph for shared evidence nodes.
    3. Use MENS Lane G or Tier 2 model to propose hypothesis chains.
    4. Return HypothesisCandidate structs with attribution map.
  • Add vox scientia hypothesis-scan CLI.
  • Gate as human_approval_required = true per automation boundary matrix.

Severity: High | Effort: Large


Problem 17: No experimental design scaffolding

Status: Once a hypothesis is identified, there is no tooling to scaffold a research experiment (define metrics, set baseline run, configure eval gate).

Solution:

  • Add vox scientia experiment-scaffold --hypothesis-id <id> which:
    1. Creates a draft manifest pre-filled with the hypothesis.
    2. Emits a scientia_evidence template with placeholder eval gate and benchmark block.
    3. Generates a checklist of evidence needed to reach AutoDraftEligible.
  • All generated content marked machine_suggested = true.

Severity: Medium | Effort: Medium


Problem 18: prior_art_max_lexical_overlap and prior_art_max_semantic_overlap are always None

Status: In scientia_discovery.rs lines 289-291, both overlap fields are hardcoded to None in rank_candidate(). They are only populated by a separately-called merge_novelty_overlap_into_rank().

Impact: Any ranking performed without the explicit merge call returns None for novelty overlap, making the rank appear to have perfect novelty when it may not.

Solution:

  • Rename rank_candidate() to rank_candidate_without_novelty().
  • Add rank_candidate_with_novelty(…, novelty_bundle: Option<&NoveltyEvidenceBundleV1>) that internally merges.
  • Update all callers (CLI, MCP, scan paths).

Owner: vox-publisher | Severity: High | Effort: Small


Problem 19: evidence_completeness_score counts 11 binary signals with equal weight

Status: All 11 evidence signals contribute 1 point each. human_meaningful_advance = true weighs the same as !doc_section_hints.is_empty().

Impact: Completeness scores are misleading. The submission_readiness_score KPI is contaminated.

Solution:

  • Load per-signal weights from the heuristics YAML (Problem 14).
  • human_meaningful_advance and eval_gate_passed should weigh 3×; doc hints 1×.
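
A minimal sketch of the weighted score, assuming the 3×/1× weights above; the inlined weight table stands in for the heuristics YAML from Problem 14:

```rust
/// Weighted completeness: sum of weights for present signals divided by
/// the total possible weight, so the result stays in [0.0, 1.0].
pub fn evidence_completeness_score(signals: &[(&str, bool)]) -> f64 {
    let weight = |name: &str| match name {
        // 3x signals per the proposal; everything else stays at 1x.
        "human_meaningful_advance" | "eval_gate_passed" => 3.0,
        _ => 1.0,
    };
    let total: f64 = signals.iter().map(|&(n, _)| weight(n)).sum();
    let hit: f64 = signals.iter().filter(|&&(_, v)| v).map(|&(n, _)| weight(n)).sum();
    if total == 0.0 { 0.0 } else { hit / total }
}
```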

Severity: Medium | Effort: Small


Problem 20: No contamination risk detection for internal eval corpora

Status: The worthiness unification research doc identifies contamination_risk_flag as a candidate signal. No implementation exists.

Impact: An internal benchmark may be inflated due to training data overlapping with the eval set — a form of benchmark leakage that Scientia has no detector for.

Solution:

  • Add ContaminationRiskAssessor::assess(eval_corpus_id, training_corpus_ids) -> ContaminationRisk in vox-scientia-core.
  • Use n-gram overlap as a first-pass detector.
  • Emit contamination_risk_flag in worthiness_signals.v2 with soft_gate classification.
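
The n-gram first pass could look like the following sketch: the ratio of the eval corpus's word n-grams that also appear in the training corpus. Function names and the n-gram size are illustrative assumptions:

```rust
use std::collections::HashSet;

/// Word n-grams of `text` as a set, for overlap comparison.
fn ngrams(text: &str, n: usize) -> HashSet<Vec<&str>> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words.windows(n).map(|w| w.to_vec()).collect()
}

/// Fraction of eval n-grams also present in the training text.
/// A high ratio suggests benchmark leakage worth flagging.
pub fn ngram_overlap_ratio(eval_text: &str, training_text: &str, n: usize) -> f64 {
    let eval_grams = ngrams(eval_text, n);
    if eval_grams.is_empty() { return 0.0; }
    let train_grams = ngrams(training_text, n);
    let shared = eval_grams.intersection(&train_grams).count();
    shared as f64 / eval_grams.len() as f64
}
```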

Severity: Medium | Effort: Medium


Problem 21: MENS Lane G (research-expert) is not integrated into Scientia evidence flow

Status: mens-research-track-blueprint-2026.md gives Lane G a spec. The blueprint says "when research_model_enabled is true, the orchestrator delegates to this adapter." But:

  • research_model_enabled is not a field in any config or runtime struct.
  • No gate in scientia_evidence.rs or the orchestrator dispatches to Lane G.

Solution:

  • Add research_model_enabled: bool to VoxPopuliConfig (or SocratesTaskContext).
  • When research_model_enabled && complexity >= Complex, dispatch synthesis to Lane G endpoint.
  • Add MENS_LANE_G_ENDPOINT env var resolved via Clavis.

Owner: vox-orchestrator, vox-scientia-core | Severity: High | Effort: Medium


Dimension 4 — Outbound Publication Pipeline

Problem 22: LaTeX/journal template engine is absent from submission/mod.rs

Status: The readiness audit (§Phase 1 "Remaining") explicitly lists: "LaTeX/camera-ready package builder, figure/filename validators, template compliance against JMLR/TMLR/JAIR style packs" as still missing.

Solution:

  • Add TemplateProfile enum: Jmlr, Tmlr, Jair, Arxiv, Generic.
  • Implement SubmissionPackageBuilder::build_with_template(profile):
    1. Validate source directory against profile requirements.
    2. Check figure formats (PDF preferred for JMLR, etc.).
    3. Generate manifest.json with SHA-256 digests.
    4. Create deterministic .zip archive.

Owner: vox-publisher | Severity: High | Effort: Large


Problem 23: arXiv format preflight profile is missing

Status: The readiness audit explicitly states arxiv_format_profile is "missing."

Solution:

  • Add PreflightProfile::ArxivFormat checking:
    • No filenames with spaces or non-ASCII characters.
    • Root LaTeX file present.
    • All \includegraphics targets resolvable.
    • No disallowed extensions in root.
  • Wire into publication-preflight --profile arxiv_format.
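
The filename rules above can be sketched as a simple validator; the function name is hypothetical, and resolving `\includegraphics` targets is omitted since it needs LaTeX parsing:

```rust
/// Returns one message per violated arXiv packaging rule; empty = pass.
pub fn arxiv_filename_violations(filenames: &[&str]) -> Vec<String> {
    let mut violations = Vec::new();
    for name in filenames {
        if name.contains(' ') {
            violations.push(format!("{name}: contains spaces"));
        }
        if !name.is_ascii() {
            violations.push(format!("{name}: non-ASCII characters"));
        }
    }
    // At least one root LaTeX file must be present.
    if !filenames.iter().any(|n| n.ends_with(".tex")) {
        violations.push("no root LaTeX file present".to_string());
    }
    violations
}
```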

Severity: Medium | Effort: Small


Problem 24: Crossref adapter is documented but not wired

Status: crossref_metadata.rs exists (transform is drafted). But no adapter in scholarly/ actually submits to Crossref.

Solution:

  • Implement CrossrefAdapter in scholarly/crossref.rs.
  • Use existing crossref_metadata.rs for payload construction.
  • Gate behind VOX_SCHOLARLY_ENABLE_CROSSREF=1 and CROSSREF_API_KEY via Clavis.
  • Add vox scientia crossref-deposit CLI (dry-run by default).

Severity: High | Effort: Medium


Problem 25: CITATION.cff generation is incomplete / not wired to CLI

Status: citation_cff.rs exists (5.4KB) but the readiness audit lists this as "Missing machine-readable citation assets."

Solution:

  • Audit citation_cff.rs against CFF 1.2.0 spec.
  • Wire vox scientia generate-citation-cff --output CITATION.cff as a CLI command.
  • Include CITATION.cff in SubmissionPackageBuilder output for Zenodo profile.

Severity: Medium | Effort: Small


Problem 26: Zenodo adapter only generates metadata JSON — no HTTP deposit

Status: The readiness audit says "Zenodo → partial (metadata done, upload/deposit not done)."

Solution:

  • Add ZenodoDepositClient in scholarly/zenodo.rs using the Zenodo REST API.
  • Implement: deposition creation → file upload → publish workflow.
  • ZENODO_ACCESS_TOKEN via Clavis.
  • Add --sandbox mode for pre-production validation.

Owner: vox-publisher | Severity: High | Effort: Medium


Problem 27: No automatic submission status synchronization

Status: publication-scholarly-remote-status-sync-batch requires manual invocation. No scheduler calls it.

Impact: Submission status drift: an accepted paper may show as "submitted" indefinitely.

Solution:

  • Add a scheduled worker that calls publication-scholarly-remote-status-sync-batch for all non-terminal submissions.
  • Add milestone_events table: (publication_id, milestone, recorded_at_ms, external_id) with values submitted | under_review | accepted | published | rejected.

Owner: vox-db, vox-publisher | Severity: High | Effort: Medium


Problem 28: Author / co-author model mismatch (single author string vs authors[] array)

Status: The readiness audit §Lifecycle stage 2 flags: digest and CLI use a single author string; full co-author list lives in a JSON block. Mismatches if they disagree.

Solution:

  • Add preflight check: if scientific_publication.authors[] present, derive display_author from authors[0], warn on disagreement.
  • Soft-deprecate the manifest author field.
  • Update manifest_completion_report to check authors[].orcid completeness separately.

Severity: Medium | Effort: Small


Problem 29: Revision lifecycle has no external venue revision ID mapping

Status: When digest changes, there is no way to know what revision number it corresponds to at the external venue (e.g., TMLR v2, OpenReview R2).

Solution:

  • Add scholarly_revision_map table per scholarly-external-schema-plan.md.
  • Capture external revision ID on each adapter submit response.
  • publication-status should show unified timeline: v1(digest=abc) → submitted → R1 → v2(digest=xyz) → R2 → accepted.

Severity: Medium | Effort: Medium


Problem 30: Double-blind anonymization gate is partial (email heuristic only)

Status: The readiness audit (§Lifecycle stage 3) states: "email heuristic present, broader anonymization missing" for double_blind profile.

Solution:

  • Extend publication_preflight.rs double-blind checks to scan:
    • abstract_text field for name/institution patterns (heuristic regex).
    • Generated filenames and LaTeX comments for author metadata.
    • Acknowledgements section stub.
  • Add AnonymizationScanResult { risk_level: High | Medium | Low }.
  • High → hard fail; Medium → warning in next_actions.
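
A hedged sketch of the extended scan: the substring checks below stand in for the proper regex heuristics, and only `AnonymizationScanResult` and the risk levels come from the proposal above:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RiskLevel { High, Medium, Low }

pub struct AnonymizationScanResult { pub risk_level: RiskLevel }

/// Scans abstract text, filenames, and LaTeX comments (passed as plain
/// strings here) for de-anonymization risks.
pub fn scan_for_deanonymization(texts: &[&str]) -> AnonymizationScanResult {
    // Email-like content is the strongest de-anonymization signal.
    let email = texts.iter().any(|t| t.contains('@') && t.contains('.'));
    // A surviving acknowledgements section often names people/institutions.
    let ack = texts.iter().any(|t| t.to_lowercase().contains("acknowledgements"));
    let risk_level = if email { RiskLevel::High }
        else if ack { RiskLevel::Medium }
        else { RiskLevel::Low };
    AnonymizationScanResult { risk_level }
}
```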

Severity: Medium | Effort: Small


Problem 31: HN submission has no structured handoff payload

Status: The social execution board template exists but hn_assist in destination_transform_previews() (scientia_discovery.rs:470) just concatenates a string.

Solution:

  • Add HnHandoffPayload { title: String, url: String, comment: String } to syndication_outcome.rs.
  • Generate structured JSON during destination_transform_previews().
  • Add CI check that title respects the 80-char HN limit.
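
The payload plus the title-length check can be sketched as follows; serde serialization is omitted to keep the sketch dependency-free, and the constructor shape is an assumption:

```rust
/// Structured handoff payload replacing the concatenated string.
pub struct HnHandoffPayload {
    pub title: String,
    pub url: String,
    pub comment: String,
}

impl HnHandoffPayload {
    /// Rejects titles over HN's 80-character limit at construction time,
    /// so malformed payloads never reach the execution board.
    pub fn new(title: &str, url: &str, comment: &str) -> Result<Self, String> {
        let len = title.chars().count();
        if len > 80 {
            return Err(format!("title exceeds 80 chars: {len}"));
        }
        Ok(Self { title: title.into(), url: url.into(), comment: comment.into() })
    }
}
```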

Severity: Low | Effort: Small


Dimension 5 — SSOT Convergence and Structural Problems

Problem 32: Worthiness scoring exists in 5 competing locations with no CI parity check

Status: Numerics appear in publication_worthiness.rs, publication-worthiness.default.yaml, worthiness-signals.v2.schema.json, scientia_heuristics.rs, and scientia_finding_ledger.rs.

Impact: Updating a threshold requires touching 2-4 files. Silent inconsistency risk is high.

Solution:

  • Declare publication-worthiness.default.yaml as the single source of numeric truth.
  • ScientiaHeuristics::from_default_yaml() loads and validates against the JSON schema at startup.
  • Add vox ci scientia-worthiness-parity cross-checking YAML values against unit test constants.
  • All Rust constants reference the loaded struct, not magic numbers.

Owner: vox-publisher, contracts | Severity: High | Effort: Medium


Problem 33: The 232-task wave backlog has no CI tracking or CLI surface

Status: implementation-wave-backlog.v1.yaml exists but there is no vox ci scientia-wave-progress and no CLI to query wave completion.

Solution:

  • Add vox scientia wave-status CLI that reads the YAML and checks which expected artifacts exist on disk.
  • Emit completion percentage per wave.
  • Add as informational step in vox ci ssot-drift.

Severity: Medium | Effort: Small


Problem 34: vox-publisher is still the God Object the package-family split was meant to dissolve

Status: vox-publisher/src/ has 28 source files; lib.rs alone is 40KB. vox-scientia-core does not exist as a crate. AGENTS.md caps modules at 500 lines / 12 methods.

Solution:

  • Execute the Split Wave: move scientia_evidence.rs, scientia_heuristics.rs, scientia_discovery.rs, scientia_contracts.rs to vox-scientia-core.
  • Wire vox-publisher as a re-export shim.
  • Track in a scientia-split-migration-ledger.md.

Severity: Medium | Effort: Large


Problem 35: Research index does not surface the RAG SSOT as canonical

Status: rag-and-research-architecture-2026.md is the current-state SSOT for retrieval. research-index.md mentions it tangentially but does not surface it as the canonical SSOT.

Solution:

  • Add "Retrieval and RAG Architecture (Current)" section to research-index.md linking to the RAG SSOT.
  • Also cross-link from scientia-publication-automation-ssot.md source anchors.

Severity: Low | Effort: Small


Problem 36: contracts/index.yaml likely does not register all 27 scientia contracts

Status: The impact-readership research doc mandates contract registration in contracts/index.yaml. No evidence all 27 contracts/scientia/ files are registered.

Solution:

  • Audit contracts/index.yaml against contracts/scientia/ directory listing.
  • Add missing registrations.
  • Add CI check that enforces parity between the contracts/scientia/ directory and contracts/index.yaml.

Severity: Medium | Effort: Small


Problem 37: voxgiantia-publication-architecture.md may be a shadow SSOT

Status: This 6.7KB doc is not referenced in the main SSOT's source anchors. It is unclear if it is superseded or covers a distinct scope.

Solution:

  • Audit the doc for overlap with scientia-publication-automation-ssot.md.
  • If superseded: add deprecation header + link to current SSOT.
  • If distinct: add to SSOT source anchors with a scope label.

Severity: Low | Effort: Small


Problem 38: Syndication security docs are architecturally isolated from Scientia

Status: news_syndication_incident_patterns.md and news_syndication_security.md are not linked from the Scientia SSOT or the inbound discovery research doc.

Solution:

  • Add links from scientia-external-discovery-research-2026.md to both syndication docs in a "Security constraints" section.
  • Ensure NewsInbound preflight (Problem 4) incorporates the threat taxonomy from news_syndication_security.md.

Severity: Low | Effort: Small


Dimension 6 — Quality, Evaluation, and Autonomy Gaps

Problem 39: No golden test set for search recall

Status: The RAG SSOT §8 explicitly identifies "Recall@K golden set → Not built" as a gap.

Solution:

  • Build 50-100 labelled (query, expected_doc_ids) pairs from real orchestrator queries.
  • Add vox ci search-recall-at-k emitting Recall@5 and MRR metrics.
  • Gate on ≤5% relative regression budget per PR.
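
The two metrics the CI check would emit can be sketched as below; signatures and doc-id types are illustrative assumptions:

```rust
/// Fraction of expected documents found in the top-k retrieved results.
pub fn recall_at_k(retrieved: &[&str], expected: &[&str], k: usize) -> f64 {
    if expected.is_empty() { return 0.0; }
    let top_k = &retrieved[..k.min(retrieved.len())];
    let hits = expected.iter().filter(|e| top_k.contains(e)).count();
    hits as f64 / expected.len() as f64
}

/// Reciprocal rank of the first relevant document (0.0 if none retrieved).
/// Averaging this over the golden set gives MRR.
pub fn reciprocal_rank(retrieved: &[&str], expected: &[&str]) -> f64 {
    retrieved.iter()
        .position(|d| expected.contains(d))
        .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
}
```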

Severity: Medium | Effort: Medium


Problem 40: No RAGAS-style faithfulness metric

Status: The RAG SSOT §8 identifies "RAGAS faithfulness → Not implemented" as a gap.

Solution:

  • Implement lightweight faithfulness check: compare claim-sentences in answers against retrieved passages using existing BM25 lexical overlap logic.
  • Run as a periodic background job (not on every completion).
  • Persist results to Arca. Flag completions below min_faithfulness = 0.4 for analysis.
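
A first-pass sketch, substituting plain token overlap for the BM25 machinery: the score is the share of answer tokens that also appear somewhere in the retrieved passages, compared against the min_faithfulness floor above:

```rust
use std::collections::HashSet;

/// Fraction of answer tokens supported by the retrieved passages.
/// Crude, but cheap enough for a periodic background job.
pub fn lexical_faithfulness(answer: &str, passages: &[&str]) -> f64 {
    let answer_tokens: HashSet<String> =
        answer.split_whitespace().map(|t| t.to_lowercase()).collect();
    if answer_tokens.is_empty() { return 0.0; }
    let passage_tokens: HashSet<String> = passages.iter()
        .flat_map(|p| p.split_whitespace())
        .map(|t| t.to_lowercase())
        .collect();
    let supported = answer_tokens.intersection(&passage_tokens).count();
    supported as f64 / answer_tokens.len() as f64
}
```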

Severity: Medium | Effort: Medium


Problem 41: Socrates has no evaluate_research_need() dispatch path

Status: The RAG SSOT §4.4 shows SocratesResearchDecision as [PLANNED]. The struct is defined in the doc but does not exist in crates/vox-socrates-policy/src/lib.rs.

Impact: When Socrates returns Abstain, the caller has no structured signal about whether to trigger CRAG or simply decline.

Solution:

  • Implement evaluate_research_need(confidence, contradiction_ratio, complexity) -> SocratesResearchDecision in vox-socrates-policy.
  • Wire into the orchestrator's pre-generation hook.
  • Auto-dispatch CRAG when should_research = true.
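
A hedged sketch of the dispatch function: the name and parameters come from the RAG SSOT, while the struct fields, `ComplexityBand` variants, and thresholds here are assumptions:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ComplexityBand { Simple, Complex }

#[derive(Debug, PartialEq, Eq)]
pub struct SocratesResearchDecision {
    pub should_research: bool,
}

/// Low confidence with little internal contradiction suggests a thin
/// corpus, where external research (CRAG) can help. High contradiction
/// means the evidence conflicts, so more retrieval is unlikely to
/// resolve it and the caller should decline instead.
pub fn evaluate_research_need(
    confidence: f64,
    contradiction_ratio: f64,
    complexity: ComplexityBand,
) -> SocratesResearchDecision {
    let should_research = confidence < 0.5
        && contradiction_ratio < 0.3
        && complexity != ComplexityBand::Simple;
    SocratesResearchDecision { should_research }
}
```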

Owner: vox-socrates-policy, vox-orchestrator | Severity: High | Effort: Medium


Problem 42: The Coverage Paradox fix is documented but not coded

Status: The RAG SSOT §4.3 documents the fix (only apply contradiction penalty when citation_coverage >= 0.3) as [PLANNED].

Impact: Agents fall into a refusal loop on abstract synthesis queries — the very class most relevant to Scientia research workflows.

Solution:

  • Add citation_coverage: Option<f64> parameter to classify_risk().
  • When citation_coverage < 0.3, suppress max_contradiction_ratio_for_answer penalty.
  • Add unit test: low_coverage_high_contradiction_should_ask_not_abstain.
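
A minimal sketch of the fix: the 0.3 coverage floor and the Ask-not-Abstain behavior follow the SSOT description, while the `RiskDecision` variants and the 0.5 test threshold are illustrative assumptions:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RiskDecision { Answer, Ask, Abstain }

/// The contradiction penalty only applies when citation coverage is high
/// enough for contradictions to be meaningful (the Coverage Paradox fix).
pub fn classify_risk(
    contradiction_ratio: f64,
    citation_coverage: Option<f64>,
    max_contradiction_ratio_for_answer: f64,
) -> RiskDecision {
    let coverage_sufficient = citation_coverage.map_or(true, |c| c >= 0.3);
    if contradiction_ratio > max_contradiction_ratio_for_answer {
        if coverage_sufficient {
            RiskDecision::Abstain // well-grounded contradictions: refuse
        } else {
            RiskDecision::Ask // low coverage: clarify instead of refusing
        }
    } else {
        RiskDecision::Answer
    }
}
```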

Owner: vox-socrates-policy | Severity: High | Effort: Small


Problem 43: No Tavily credit budget tracking or doctor warning

Status: The RAG SSOT §8 identifies "Tavily credit usage → Not tracked" as a gap.

Impact: Aggressive CRAG loops can exhaust the session credit budget silently.

Solution:

  • Track tavily_credits_used: u32 in the SearchPolicy session context.
  • When usage ≥ 80% of budget, emit SearchRefinementAction::BudgetWarning.
  • Add vox clavis doctor check displaying current credit budget.
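
Session-level accounting with the 80% warning can be sketched as follows; `SearchRefinementAction` is named in the solution above, but the `BudgetWarning` payload and struct shape are assumptions:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SearchRefinementAction {
    Proceed,
    BudgetWarning { used: u32, budget: u32 },
}

pub struct TavilyBudget {
    pub used: u32,
    pub budget: u32,
}

impl TavilyBudget {
    /// Records a Tavily call's credit cost and warns once usage reaches
    /// 80% of the session budget.
    pub fn record_call(&mut self, credits: u32) -> SearchRefinementAction {
        self.used = self.used.saturating_add(credits);
        if self.used as f64 >= 0.8 * self.budget as f64 {
            SearchRefinementAction::BudgetWarning { used: self.used, budget: self.budget }
        } else {
            SearchRefinementAction::Proceed
        }
    }
}
```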

Severity: Medium | Effort: Small


Problem 44: CLI/MCP tools bypass the vox-scientia-api package boundary

Status: vox-cli/src/commands/scientia.rs and vox-mcp/src/tools/scientia_tools.rs both directly import from vox-publisher, not vox-scientia-api.

Impact: When vox-publisher is eventually split, every CLI/MCP callsite will break.

Solution:

  • Create crates/vox-scientia-api/ as a façade crate.
  • Update vox-cli and vox-mcp Cargo.toml to depend on vox-scientia-api.
  • Add FROZEN marker on vox-publisher's public surface.

Severity: Medium | Effort: Small


Problem 45: No end-to-end integration test for the Scientia lifecycle

Status: Unit tests exist for individual functions. acceptance_matrix.ps1 exists. But no integration test exercises the full pipeline: prepare → preflight → approve → scholarly-pipeline-run → status → metrics.

Solution:

  • Add tests/scientia_lifecycle_test.rs using local_ledger / echo_ledger adapters (no external credentials needed).
  • Cover: manifest creation → preflight pass → dual approval → external job tick → status assertion.
  • Add to vox ci scientia-novelty-ledger-contracts or as vox ci scientia-lifecycle.

Severity: Medium | Effort: Medium


Summary Priority Matrix

| # | Problem | Severity | Effort | Owner Crate |
|---|---------|----------|--------|-------------|
| 1 | No inbound pipeline crate | Critical | Large | vox-scientia-ingest (new) |
| 4 | No Socrates inbound profile | Critical | Medium | vox-publisher, vox-socrates-policy |
| 6 | No external intelligence DB table | Critical | Small | vox-db |
| 9 | Publications never re-enter search corpora | Critical | Medium | vox-publisher, vox-search |
| 18 | Prior art overlaps always None in rank_candidate() | High | Small | vox-publisher |
| 11 | Socrates Abstain events not persisted | High | Small | vox-db, vox-socrates-policy |
| 12 | CRAG results not stored back | High | Small | vox-search |
| 14 | Discovery ranking constants hardcoded in Rust | High | Small | vox-publisher |
| 16 | No multi-hop hypothesis chain generation | High | Large | vox-scientia-core |
| 21 | Lane G not integrated into Scientia evidence flow | High | Medium | vox-orchestrator |
| 22 | LaTeX package builder absent | High | Large | vox-publisher |
| 24 | Crossref adapter not wired | High | Medium | vox-publisher |
| 26 | Zenodo adapter metadata-only, no HTTP deposit | High | Medium | vox-publisher |
| 27 | No automatic submission status sync | High | Medium | vox-db, vox-publisher |
| 32 | Worthiness scoring split across 5 locations | High | Medium | vox-publisher, contracts |
| 41 | Socrates research dispatch not coded | High | Medium | vox-socrates-policy |
| 42 | Coverage Paradox fix not coded | High | Small | vox-socrates-policy |
| 5 | No semantic deduplication inbound | Medium | Small | vox-scientia-ingest |
| 7 | No Scholarly Digest contract | Medium | Medium | contracts, vox-scientia-core |
| 10 | Evidence packs not in knowledge graph | Medium | Medium | vox-scientia-core, vox-search |
| 13 | No FindingsDraftCorpus in RAG | Medium | Medium | vox-search |
| 15 | No signal code registry/CI check | Medium | Small | contracts, CI |
| 19 | Evidence completeness uses equal weights | Medium | Small | vox-publisher |
| 20 | No contamination risk detection | Medium | Medium | vox-scientia-core |
| 23 | arXiv format preflight missing | Medium | Small | vox-publisher |
| 25 | CITATION.cff generation incomplete | Medium | Small | vox-publisher |
| 28 | Author/co-author model mismatch | Medium | Small | vox-publisher, vox-db |
| 29 | No revision lifecycle mapping | Medium | Medium | vox-db, vox-publisher |
| 30 | Double-blind anonymization gate is partial | Medium | Small | vox-publisher |
| 33 | Wave backlog has no CI tracking | Medium | Small | CI, vox-cli |
| 34 | vox-publisher God Object not split | Medium | Large | All Scientia crates |
| 36 | Contract index missing scientia registrations | Medium | Small | contracts |
| 39 | No golden test set for search recall | Medium | Medium | vox-search |
| 40 | No RAGAS-style faithfulness metric | Medium | Medium | vox-search, vox-db |
| 43 | No Tavily credit tracking | Medium | Small | vox-search, vox-clavis |
| 44 | CLI/MCP bypass vox-scientia-api boundary | Medium | Small | vox-cli, vox-mcp |
| 45 | No lifecycle integration test | Medium | Medium | vox-db |
| 2 | No RSS/Atom feed parsing crate | Medium | Small | vox-scientia-ingest |
| 8 | No feed source registry table | Medium | Small | vox-db |
| 17 | No experimental design scaffolding | Medium | Medium | vox-scientia-core |
| 3 | No Reddit/HN inbound read path | Low | Medium | vox-publisher |
| 31 | HN submission unstructured handoff | Low | Small | vox-publisher |
| 35 | Research index missing RAG SSOT link | Low | Small | docs |
| 37 | Shadow SSOT doc voxgiantia-publication-architecture.md | Low | Small | docs |
| 38 | Syndication security docs isolated from Scientia | Low | Small | docs |

Wave 0 — Quick Wins (1–3 days each, unblock parity and safety)

  • P18: Fix rank_candidate() always-None novelty overlap
  • P42: Code the Coverage Paradox fix in classify_risk()
  • P43: Add Tavily credit tracking and doctor warning
  • P15: Add discovery signal code registry and CI check
  • P19: Load evidence completeness weights from YAML
  • P44: Create vox-scientia-api façade and update CLI/MCP

Wave 1 — Foundation Hardening (1–2 weeks)

  • P11: Persist Socrates Abstain events to Arca
  • P12: Store CRAG results back into DocumentChunks
  • P14: Load ScientiaHeuristics from YAML contract
  • P28: Author/co-author model preflight + soft-deprecation
  • P32: Unify worthiness scoring to YAML source of truth + parity CI
  • P35, P36, P37, P38: Documentation and contract housekeeping
  • P41: Implement evaluate_research_need() dispatch in Socrates
  • P33: Add vox scientia wave-status CLI

Wave 2 — Inbound Pipeline (new crate focus)

  • P6: Add scientia_external_intelligence DB table
  • P8: Add scientia_feed_sources DB table and CLI commands
  • P1: Create vox-scientia-ingest crate shell
  • P2: Wire feed-rs for RSS/Atom crawling
  • P4: Add PreflightProfile::NewsInbound in Socrates
  • P5: Add IngestDeduplicator against embeddings table
  • P7: Add scholarly-digest.v1.schema.json + digest-generate CLI

Wave 3 — RAG Feedback Loop

  • P9: PostPublishIndexer — publications back into DocumentChunks
  • P10: EvidencePackKGIndexer — evidence chains into KnowledgeGraph
  • P13: FindingsDraftCorpus variant for in-progress findings

Wave 4 — Discovery Intelligence Upgrade

  • P16: HypothesisChainBuilder with Lane G integration
  • P17: experiment-scaffold CLI
  • P20: ContaminationRiskAssessor
  • P21: Wire Lane G into the Scientia synthesis path

Wave 5 — Outbound Publication Completeness

  • P22: LaTeX/template engine in SubmissionPackageBuilder
  • P23: PreflightProfile::ArxivFormat
  • P24: CrossrefAdapter wired
  • P25: Complete citation_cff.rs and wire CLI
  • P26: ZenodoDepositClient HTTP submit
  • P27: Auto status sync scheduler + milestone_events table
  • P29: scholarly_revision_map table
  • P30: Extended double-blind anonymization scan
  • P31: Structured HnHandoffPayload

Wave 6 — God Object Split and Structural

  • P34: Extract vox-scientia-core from vox-publisher
  • P45: Lifecycle integration test suite

Wave 7 — Quality and Evaluation

  • P39: Golden recall test set + vox ci search-recall-at-k
  • P40: Lightweight RAGAS-style faithfulness metric

Appendix: Cross-References

| Concern | Primary SSOT | Owner Crate |
|---------|--------------|-------------|
| Publication pipeline | scientia-publication-automation-ssot.md | vox-publisher |
| RAG retrieval | rag-and-research-architecture-2026.md | vox-search |
| Hallucination gate | vox-socrates-policy/src/lib.rs | vox-socrates-policy |
| Evidence model | scientia_evidence.rs, scientia-evidence-graph.schema.json | vox-publisher |
| Discovery ranking | scientia_discovery.rs, publication-worthiness.default.yaml | vox-publisher |
| Inbound discovery | scientia-external-discovery-research-2026.md | vox-scientia-ingest (TBD) |
| MENS Lane G | mens-research-track-blueprint-2026.md | vox-orchestrator |
| Worthiness signals | worthiness-signals.v2.schema.json | contracts |
| Impact/readership | scientia-impact-readership-research-2026.md | assistive only |
| Automation boundaries | scientia-publication-worthiness-ssot-unification-research-2026.md | policy |