SCIENTIA publication readiness audit
Primary companion SSOT documents:
docs/src/architecture/scientia-publication-automation-ssot.mddocs/src/reference/scientia-publication-worthiness-rules.mddocs/src/reference/scientia-ssot-handbook.md(glossary, status vocabulary, checklists, SLOs)
Goal and scope
This audit maps the current SCIENTIA publication architecture in Vox to publication requirements needed for:
- core AI journals and workflows (
JMLR,TMLR,JAIR, and common ML journal expectations), - self-publication and archival identifiers (
arXiv,Zenodo,Crossref-grade metadata).
It also defines the implementation gap between where the codebase is now and what is needed for end-to-end automated scientific publication.
Current architecture baseline (where we are)
Implemented publication surfaces
- CLI facade:
vox scientiadelegates tovox dbpublication lifecycle handlers.crates/vox-cli/src/commands/scientia.rscrates/vox-cli/src/commands/db.rs
- Canonical publication object with digest hashing:
crates/vox-publisher/src/publication.rs
- Scholarly adapter interface and current local adapter:
crates/vox-publisher/src/scholarly/
- Persistence and state ledger:
crates/vox-db/src/schema/domains/publish_cloud.rscrates/vox-db/src/store/ops_publication.rs
- MCP parity tooling:
crates/vox-orchestrator/src/mcp_tools/tools/scientia_tools.rscontracts/mcp/tool-registry.canonical.yaml
- Existing docs and decision record:
docs/src/adr/011-scientia-publication-ssot.mddocs/src/how-to/how-to-scientia-publication.md
Implemented workflow
- Prepare manifest (
publication-prepare) - Run
publication-preflightand follow orderednext_actions - Record digest-bound approvals (
publication-approve) - Use
publication-scholarly-pipeline-runas the default scholarly path (dry-run first, then live) - Track state/submissions/checklist state in
publication-status
Architecture strengths
- Canonical
PublicationManifestwith stable digest. - Strong digest-bound approval semantics (dual approver gate).
- Durable ledger tables for manifest, approvals, attempts, scholarly submissions, and status events.
- CLI and MCP both expose the same lifecycle primitives.
Current adapter reality (2026-03)
Code ships local_ledger, echo_ledger, and credentialed zenodo / openreview adapters behind VOX_SCHOLARLY_ADAPTER, plus operator-assisted arXiv via staging export + handoff events. Journal portals (ScholarOne, native TMLR UI-only flows) and automated Crossref deposit remain out of scope until wired.
Phase 0 metadata (implemented)
Publication manifests may embed structured scholarly fields under metadata_json.scientific_publication (see vox_publisher::scientific_metadata). CLI: vox scientia publication-prepare … --scholarly-metadata-json <file>. MCP: optional scholarly_metadata object on vox_scientia_publication_prepare. This keeps the digest-bound contract while normalizing authors, license, funding, and reproducibility attestations for upcoming adapters.
External requirements matrix (where the target ecosystem is)
Core AI journals and venues
| Venue/workflow | Key requirements relevant to automation | Source |
|---|---|---|
JMLR | Mandatory official style, camera-ready source archive, reproducible build of manuscript, strict final preparation checks. | JMLR author guide |
TMLR | OpenReview submission flow, mandatory TMLR template, anonymized double-blind submission, ethics/broader-impact conditions when risk applies, supplementary reproducibility artifacts encouraged. | TMLR author guide, TMLR submissions |
JAIR | Mandatory JAIR style/template, production-ready source bundle, final formatting checklist, publication agreement and source package expectations. | JAIR final preparation, JAIR formatting |
| Common ML journal norm | Replication-oriented methodology, software/data disclosure expectations, statistical reporting quality. | Machine Learning journal info summary |
Self-publication and identifier systems
| Platform | Key requirements relevant to automation | Source |
|---|---|---|
arXiv | Registered submitter flow, accepted source/figure constraints, strict packaging/file naming, metadata quality and moderation rules. | arXiv submission guidelines, arXiv format policy |
Zenodo | GitHub release archiving flow, .zenodo.json and/or CITATION.cff, metadata precedence and richer Zenodo-specific metadata support. | Zenodo .zenodo.json, Zenodo CITATION.cff |
Crossref | DOI-quality metadata schema with required and recommended fields; richer records require contributors, ORCID, funding, license, citations, abstracts. | Crossref required/recommended metadata |
Automation feasibility notes
OpenReview(relevant toTMLR) supports API-based note/submission operations, but venue-level invitations and permissions still govern what automation can execute.ScholarOneexposes web services APIs, but practical automation requires site-specific API provisioning and credentials from the hosting publisher.arXivautomation is generally packaging-focused; final submit flow is account and policy bound.
Gap analysis (where we need to go)
Lifecycle stage 1: authoring and package assembly
| Item | Current SCIENTIA state | Gap | Risk | Recommended slice |
|---|---|---|---|---|
| Journal template support | Stores markdown body only | No template-aware build for JMLR/TMLR/JAIR | Submission rejects or manual rebuilds | Add SubmissionPackageBuilder with template profiles (jmlr, tmlr, jair, arxiv) |
| Source bundle generation | No camera-ready archive builder | No zip/tar source pack with compile validation | Delays and formatting failures | Add package artifact table + generated archives + compile check |
| Figure and asset checks | No figure policy validation | No arXiv/journal file format checks | Hard submission failures | Add preflight validator (file names, format family, missing includes) |
Lifecycle stage 2: metadata normalization
| Item | Current SCIENTIA state | Gap | Risk | Recommended slice |
|---|---|---|---|---|
| Author metadata | Primary author string plus optional metadata_json.scientific_publication.authors | Digest and CLI still use single author for simplicity; full co-author list lives in JSON block | Mismatches if author string disagrees with authors[] | Prefer deriving display author from first scientific author when present; validate consistency in preflight (Phase 1) |
| Funding/COI/license | Free-form metadata_json only | No normalized compliance fields | Compliance omissions | Add strongly typed compliance block |
| Citations | Optional citations_json blob | No schema/validation/export adapters (BibTeX/JATS/Crossref maps) | Inconsistent citation data | Add citation schema + exporters |
Lifecycle stage 3: policy and compliance gates
| Item | Current SCIENTIA state | Gap | Risk | Recommended slice |
|---|---|---|---|---|
| Double-blind readiness | Dual approver gate exists | No anonymization gate/checklist | Desk reject risk for blind review venues | Add anonymization scanner and attestation |
| Ethics/broader impact | No explicit policy object | No risk flag / statement requirements | Ethics non-compliance | Add policy declarations + required fields by venue |
| Data/code availability | No reproducibility declaration schema | No explicit artifact disclosure gate | Reproducibility review friction | Add reproducibility checklist schema + gate |
Lifecycle stage 4: submission adapters
| Item | Current SCIENTIA state | Gap | Risk | Recommended slice |
|---|---|---|---|---|
| Journal/preprint connectors | local_ledger, echo_ledger, zenodo, openreview, plus arXiv-assist staging/handoff | No Crossref or journal-portal adapters; some venues remain human-submit by design | Manual steps persist for account-bound portals and DOI deposit | Keep current adapters, add Crossref export/deposit only when operationally real |
| Venue-specific payloads | Manifest + staging/export helpers exist for Zenodo/OpenReview/arXiv-assist | Still no single default checklist across scholarly/social surfaces without reading multiple docs | Operator routing overhead | Use publication-preflight / publication-status as the checklist surfaces and publication-scholarly-pipeline-run as the default path |
| Retry/idempotency semantics | Digest-bound jobs, polling, and retry taxonomy exist | Worker preflight and permanent-vs-retryable classification need to stay aligned with operator preflight | Operational fragility if workers retry conceptually permanent failures | Reuse preflight in worker ticks and keep a small explicit classification enum |
Lifecycle stage 5: post-submission tracking
| Item | Current SCIENTIA state | Gap | Risk | Recommended slice |
|---|---|---|---|---|
| External status sync | Records local submit receipt/state | No remote status poll/ingest | State drift | Add periodic status sync job + transition mapping |
| Revision lifecycle | Version increments on digest change | No venue revision linkage semantics | Confusing revision history | Add external revision ID mapping |
| Acceptance/publication milestones | Generic status rows | No normalized milestone model | Weak reporting | Add milestone events (submitted, under_review, accepted, published) |
Lifecycle stage 6: archival and citation outputs
| Item | Current SCIENTIA state | Gap | Risk | Recommended slice |
|---|---|---|---|---|
| DOI and identifier strategy | No real DOI submission adapter | No DOI minting workflow support | No persistent identifier automation | Add DOI adapter path (Zenodo first, Crossref metadata export next) |
| Citation files | No generated CITATION.cff / .zenodo.json | Missing machine-readable citation assets | Reduced discoverability and citation quality | Add deterministic metadata exporters |
| Publication package provenance | Digest present | No signed or policy-bound package attestation | Trust and audit gaps | Add package provenance manifest derived from digest |
Detailed architecture recommendation
flowchart LR
manuscriptSource[ManuscriptSource] --> packageBuilder[SubmissionPackageBuilder]
packageBuilder --> complianceGates[PolicyAndFormatGates]
complianceGates --> metadataMapper[MetadataMapper]
metadataMapper --> adapterRouter[AdapterRouter]
adapterRouter --> journalAdapters[JournalAdapters]
adapterRouter --> preprintAdapters[PreprintAdapters]
adapterRouter --> doiAdapters[DoiAdapters]
journalAdapters --> statusSync[SubmissionStatusSync]
preprintAdapters --> statusSync
doiAdapters --> statusSync
statusSync --> codexLedger[CodexPublicationLedger]
codexLedger --> readinessReports[ReadinessAndOpsReports]
Implementation roadmap
Phase 0 (immediate): schema and policy groundwork
- Extend publication metadata shape in
vox-publisherandvox-dbwith:authors[]with ORCID/affiliation,- funding/conflict/license fields,
- reproducibility and ethics declarations.
- Keep backward compatibility by storing new typed blocks in additive fields before strict migration.
Phase 1 (MVP automation): package and gate engine
- Done (core):
vox_publisher::publication_preflight(metadata parse, author alignment, citations JSON, double-blind email scan, readiness score). CLI:publication-prepare --preflight,publication-prepare-validated,publication-preflight. MCP:vox_scientia_publication_prepare(preflight,preflight_profile),vox_scientia_publication_preflight. - Done (Zenodo bridge):
vox_publisher::zenodo_metadata::zenodo_deposition_metadata+ CLIpublication-zenodo-metadata(metadata JSON only; no HTTP). - Remaining: LaTeX/camera-ready package builder, figure/filename validators, template compliance against JMLR/TMLR/JAIR style packs.
Phase 2 (first external adapters): self-publication first
- Implement adapters in this order:
Zenodoarchive/DOI submission path,OpenReviewsubmission pathway forTMLR-style workflows,- assisted
arXivpackage export and submit handoff, Crossrefmetadata export/deposit pathway when operationally enabled.
- Persist adapter credentials/config via existing
VOX_*conventions and policy gates.
Phase 3 (operations): status sync and revision intelligence
- Add scheduled status synchronization and retry jobs.
- Normalize external status transitions into
publication_status_events. - Add revision mapping between local digest versions and external revision IDs.
Phase 4 (reporting and governance)
- Add readiness dashboards and compliance reports:
- metadata completeness rate,
- submission success/failure rate by adapter,
- median time from draft to submitted/published.
- Add CI checks for publication metadata schema conformance.
Concrete code touchpoints for implementation
- Contract and model:
crates/vox-publisher/src/publication.rscrates/vox-publisher/src/scholarly/
- DB schema and operations:
crates/vox-db/src/schema/domains/publish_cloud.rscrates/vox-db/src/store/ops_publication.rs
- CLI:
crates/vox-cli/src/commands/db.rscrates/vox-cli/src/commands/scientia.rs
- MCP:
crates/vox-orchestrator/src/mcp_tools/tools/scientia_tools.rscontracts/mcp/tool-registry.canonical.yaml
Recommended KPIs
submission_readiness_score: percent of required fields and checks passed for target venue.time_to_submission_ms: draft to first external submission.submission_success_rate: successful submissions per adapter.revision_turnaround_ms: digest update to remote revision acknowledgement.metadata_completeness_rate: share of records with ORCID/funding/license/citations populated.
Rollout stages, legacy modes, and ledger metrics
Stages (recommended):
- Dev / CI —
local_ledger/echo_ledgeronly; no live repository credentials. - Staging — turn on one live adapter with Clavis-backed secrets and per-adapter
VOX_SCHOLARLY_DISABLE_*kill-switches; runpublication-preflight(and venue staging export) before submit. - Production — dual digest-bound approval enforced; a scheduler or supervisor runs
publication-external-jobs-tickandpublication-scholarly-remote-status-sync-batch(or their loop variants with bounded iterations). Operator-assisted arXiv usespublication-arxiv-handoff-recordfor append-only audit rows.
Legacy / restricted: Treat echo-only and dry-run paths as non-production. Shared developer profiles must not embed production Zenodo/OpenReview tokens.
Operational metrics: vox scientia publication-external-pipeline-metrics (alias: vox db publication-external-pipeline-metrics) returns a read-only JSON rollup: job counts by status and adapter (plus in-window slices), attempt/retry totals, error_class histogram, terminal latency averages and p50/p90/p99 in the window, per-adapter terminal success and retry ratios (metrics_schema_version 2), snapshot activity, scholarly submission rows (in-window slice), and publication_attempts counts by channel. KPI baselines: capture periodic snapshots of this JSON (e.g. weekly) for regression review.
Fast local acceptance slice: pwsh -File scripts/scientia/acceptance_matrix.ps1 runs publication DB integration tests and scholarly_remote_status unit tests.
Conclusion
SCIENTIA already has a strong publication ledger and governance core (manifest + digest + approvals + durable state tracking). The main gap is not control-plane integrity; it is publication-system interoperability and venue-specific packaging/compliance automation. The recommended path is to keep the current SSOT model and add typed metadata, preflight gates, and real adapters in phased order.