"Scientia Publication Endpoints — Ground-Truth Research & Implementation Policy (April 2026)"

[!IMPORTANT] This is v2 of the endpoint research. It supersedes the v1 written earlier in the same session. Web searches and code audit conducted 2026-04-13. Covers all files in crates/vox-publisher/src/adapters/, crates/vox-publisher/src/scholarly/, crates/vox-publisher/src/switching.rs, crates/vox-publisher/src/syndication_outcome.rs, crates/vox-publisher/src/types.rs, crates/vox-publisher/src/gate.rs, crates/vox-publisher/src/social_retry.rs, and crates/vox-publisher/src/scientia_heuristics.rs.


Table of Contents

  1. How to Read This Document
  2. Cross-Cutting Structural Audit
  3. Platform-by-Platform Audit (Social / Community)
  4. Platform-by-Platform Audit (Scholarly / Archival)
  5. ResearchGate — Full Policy Analysis
  6. New Scholarly Targets (ORCID, Figshare)
  7. Platform Priority Matrix (Updated)
  8. Hallucination Inventory (Updated)
  9. Unified SSoT Data Model Requirements
  10. Implementation Policy
  11. Task Backlog (Updated)

1. How to Read

For each channel:

  • Code reality — exact file + line count + what it actually does.
  • True API mechanics — verified, sourced.
  • Gap delta — specific discrepancies numbered EP-NNN for traceability.
  • Maintenance burden — how much ongoing work this will require.
  • Recommendation — keep / fix / defer / do not implement.

2. Cross-Cutting Structural Audit

These gaps span multiple adapters and must be fixed as a baseline before any adapter-specific work.

2.1 social_retry.rs is Dead Code

social_retry.rs (82 lines) defines run_with_retries, budget_from_distribution_policy, and SocialRetryBudget. This is well-designed infrastructure. However, grep across the entire publisher crate reveals zero call sites for run_with_retries. The retry system exists but is never invoked.

EP-001 (Critical): Wire run_with_retries into all social adapter dispatch paths before considering any adapter "complete." Without this, a single transient 429 or network error fails the entire publication attempt and leaves persistent retry state inconsistent.

The correct pattern (to be applied uniformly):

let budget = social_retry::budget_from_distribution_policy(&item);
let result = social_retry::run_with_retries(budget, || async {
    some_adapter::post(...).await
}).await;

2.2 switching.rs Channel Registry Is Stale and Incomplete

switching.rs::apply_channel_allowlist (line 285–311) handles: rss, twitter, github, open_collective, reddit, hacker_news, youtube, crates_io.

EP-002 (High): bluesky, mastodon, linkedin, discord are present in SyndicationConfig (types.rs) and SyndicationResult (syndication_outcome.rs) but are absent from apply_channel_allowlist, failed_channels, successful_channels, and outcome_for_channel in switching.rs.

Consequence: These four channels can never be gated by the allowlist system, never appear in retry plans, and their outcomes are invisible to the retry infrastructure even though SyndicationResult tracks them.

EP-003 (High): normalize_distribution_json_value_with_warnings also omits bluesky, mastodon, linkedin, discord from the contract-shape expansion block (lines 193–211). Publishing via the channels/channel_payloads contract shape will silently ignore these four channels.

2.3 SyndicationResult vs switching.rs Channel Mismatch

SyndicationResult has fields: rss, twitter, github, open_collective, reddit, hacker_news, youtube, crates_io, bluesky, mastodon, linkedin, discord.

switching.rs::outcome_for_channel matches only: rss, twitter, github, open_collective, reddit, hacker_news, youtube, crates_io.

EP-004 (High): The four newer channels have outcomes tracked in SyndicationResult but cannot be addressed by name in retry plans. plan_publication_retry_channels will return blocked_channels with reason: "unknown_channel" for these.
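
One way to keep EP-002 through EP-004 from recurring is to derive every channel check from a single list. The following is a minimal sketch under that assumption: ALL_CHANNELS, is_known_channel, apply_allowlist, and the "not_in_allowlist" reason are hypothetical names for illustration, not the existing switching.rs API. Only the "unknown_channel" reason comes from the audit above.

```rust
/// Hypothetical single source of truth for channel names. The twelve entries
/// mirror the SyndicationResult fields listed in section 2.3.
pub const ALL_CHANNELS: [&str; 12] = [
    "rss", "twitter", "github", "open_collective", "reddit", "hacker_news",
    "youtube", "crates_io", "bluesky", "mastodon", "linkedin", "discord",
];

/// True when the name is one of the registered channels.
pub fn is_known_channel(name: &str) -> bool {
    ALL_CHANNELS.contains(&name)
}

/// Splits the requested channels into those that may run and those that are
/// blocked, each blocked channel paired with a reason string.
pub fn apply_allowlist<'a>(
    requested: &[&'a str],
    allowlist: &[&str],
) -> (Vec<&'a str>, Vec<(&'a str, &'static str)>) {
    let mut allowed = Vec::new();
    let mut blocked = Vec::new();
    for &channel in requested {
        if !is_known_channel(channel) {
            blocked.push((channel, "unknown_channel"));
        } else if !allowlist.contains(&channel) {
            blocked.push((channel, "not_in_allowlist"));
        } else {
            allowed.push(channel);
        }
    }
    (allowed, blocked)
}
```

With one list, adding a channel to the types without registering it for gating and retry planning becomes a single-line omission that is easy to test for.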

2.4 OpenCollective Adapter Uses Wrong Auth Header

opencollective.rs line 46: .header("Api-Key", token).

The Open Collective GraphQL API v2 documents Personal-Token: {token} as the authentication header, not Api-Key.

✅ UPDATE: After verifying OC's API, the header Api-Key is the legacy form which was still accepted as of the audit date, but official docs use Personal-Token. Low severity but should be updated.

EP-005 (Low): Update opencollective.rs header from Api-Key to Personal-Token to align with documented API and avoid breakage if OC deprecates the legacy header.

2.5 makePublicOn Hardcoded to Null in OpenCollective

opencollective.rs line 37: "makePublicOn": null — hardcoded, ignoring config.scheduled_publish_at.

EP-006 (Medium): The OpenCollectiveConfig struct (types.rs line 172) already has scheduled_publish_at: Option<DateTime<Utc>> but the adapter never uses it.

Fix: "makePublicOn": config.scheduled_publish_at.map(|dt| dt.to_rfc3339()).

2.6 BlueskyConfig.link_facet Is Declared but Never Used

types.rs line 109: pub link_facet: bool in BlueskyConfig. The bluesky.rs adapter implements neither link cards (external embeds with thumbnails) nor richtext link facets, so this flag is declared but has no effect: a silent broken promise.

EP-007 (Medium): Either implement the link card as an app.bsky.embed.external embed (plus app.bsky.richtext.facet link facets if inline links are wanted), or remove the link_facet field and document that rich link support is deferred.

2.7 content_sha3_256 Includes syndication in Hash — Behavioral Risk

types.rs line 478: "syndication": self.syndication is included in the SHA3-256 content hash. This means changing any syndication routing config (e.g., adding a new channel, changing a dry_run flag) produces a different digest, triggering the dual-approval gate for content that did not actually change.

EP-008 (Medium): The hash should capture content (title, author, body, tags), not routing configuration. Suggest separating content_hash from routing_hash. Content identity should be stable across syndication config changes.
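
The separation proposed in EP-008 can be sketched as two digests over disjoint field sets. This is an illustration only: the Item struct and its routing field are hypothetical stand-ins for the real types, and std's DefaultHasher is swapped in for SHA3-256 purely so the example has no external dependencies. A real implementation would keep SHA3-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for the news item type.
#[derive(Clone)]
pub struct Item {
    pub title: String,
    pub author: String,
    pub body: String,
    pub tags: Vec<String>,
    /// Routing config, reduced here to (channel, dry_run) pairs.
    pub routing: Vec<(String, bool)>,
}

/// Stand-in digest. DefaultHasher replaces SHA3-256 for illustration only.
fn digest<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl Item {
    /// Covers only what a reader sees: title, author, body, tags.
    pub fn content_hash(&self) -> u64 {
        digest(&(&self.title, &self.author, &self.body, &self.tags))
    }

    /// Covers only where and how the item is dispatched.
    pub fn routing_hash(&self) -> u64 {
        digest(&self.routing)
    }
}
```

Under this split, adding a channel or flipping a dry_run flag changes routing_hash but leaves content_hash stable, so an approval gate keyed on content_hash would not re-trigger.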

2.8 GitHub Adapter May Create Issues Instead of Discussions

github.rs line 95: calls provider.create_discussion_or_issue(...). The vox-forge trait method is create_discussion_or_issue — the name implies a fallback to Issue creation if Discussion creation fails or if the repo doesn't have Discussions enabled.

EP-009 (Medium): For SCIENTIA publication events, creating an Issue instead of a Discussion is a UX regression (Issues appear in the bug tracker). Verify GitForgeProvider::create_discussion_or_issue never silently falls back to Issue creation when Discussion categories exist. If it does, rename and harden.

2.9 HackerNewsConfig Has No comment_draft Field

types.rs line 211–219 defines HackerNewsConfig with only mode, title_override, url_override. No field for the first-comment draft text.

EP-010 (Low): Add comment_draft: Option<String> to HackerNewsConfig for the queued handoff workflow. Without it, the manual assist output is incomplete.

2.10 No dry_run Guard in YouTube Adapter

youtube.rs::upload_video (line 107): No check of any dry_run flag before calling refresh_access_token, reading the video file from disk, or initiating the resumable upload. A dry-run pass will incur disk I/O and OAuth token refresh.

EP-011 (High): Add if cfg.dry_run { return Ok(format!("dry-run-youtube-{}", ...)); } before any I/O. This requires plumbing dry_run through the adapter signature (currently missing from upload_video's parameter list).
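
A minimal sketch of the guard described in EP-011, assuming dry_run is passed as a plain parameter. The YouTubeUpload struct and the signature are hypothetical, and because the original elides the dry-run id suffix, the title-based suffix below is a placeholder rather than the intended value.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical, reduced upload config for illustration.
pub struct YouTubeUpload {
    pub video_path: PathBuf,
    pub title: String,
}

/// Returns a synthetic id on dry runs before touching disk or network.
pub fn upload_video(cfg: &YouTubeUpload, dry_run: bool) -> Result<String, String> {
    if dry_run {
        // Placeholder suffix: the audit does not specify what follows the prefix.
        return Ok(format!("dry-run-youtube-{}", cfg.title.replace(' ', "-")));
    }

    // Only real runs reach disk I/O. Token refresh and the resumable upload
    // would follow here in the real adapter.
    let bytes = fs::read(&cfg.video_path)
        .map_err(|e| format!("cannot read {}: {e}", cfg.video_path.display()))?;
    Err(format!(
        "upload is not part of this sketch ({} bytes read)",
        bytes.len()
    ))
}
```

Placing the guard first means a dry run succeeds even when the video file is missing, which is the behaviour a preflight pass needs.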

2.11 MastodonConfig.status vs status_text Schema Inconsistency

types.rs line 114: pub status: Option<String> in MastodonConfig holds the full toot text. The name matches the Mastodon API, whose POST body field is also status. The previous audit documentation (the playbook) called it status_text, which was the inconsistency; the code is correct.

No code fix needed here — the types.rs field name is correct. Audit note only.

2.12 Bluesky.rs Requests Wrong PDS Endpoint

Confirmed in v1 audit: bsky.social is hardcoded at lines 46 and 74. AT Protocol requires resolving the user's PDS from their DID first. Additionally:

EP-012 (Critical): CreateSessionResponse at line 14 expects the field access_token, but the AT Protocol XRPC response returns accessJwt. The mismatch compiles cleanly and only shows up at runtime. If the field is marked #[serde(default)], Serde deserializes successfully and yields an empty token; without a default, deserialization fails with a missing-field error. Either way, every Bluesky post is currently failing.

2.13 social_retry.rs Does Not Parse Retry-After Headers

run_with_retries uses a geometric backoff based on attempt number. It does not inspect HTTP response bodies or headers (it receives Result<T, E>) and thus cannot honour a platform's Retry-After header.

EP-013 (Medium): Extend the retry system to accept platform-specified retry delays. Options:

  1. Make the error type carry an optional retry_after_ms.
  2. Or for specific adapters, parse Retry-After before returning Err and sleep inline.

Option 2 is simpler per adapter. Option 1 is cleaner but requires a new error type.
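
Option 1 could look roughly like the sketch below. AdapterError, parse_retry_after, and next_delay are hypothetical names, the doubling backoff is an assumption about what "geometric" means in social_retry.rs, and only the delta-seconds form of Retry-After is parsed (the HTTP-date form is left out).

```rust
use std::time::Duration;

/// Hypothetical adapter error that can carry a platform-specified delay.
#[derive(Debug)]
pub struct AdapterError {
    pub message: String,
    /// Set when the platform sent a Retry-After header.
    pub retry_after: Option<Duration>,
}

/// Parses the delta-seconds form of Retry-After (for example "30").
/// The HTTP-date form is not handled in this sketch and yields None.
pub fn parse_retry_after(header_value: &str) -> Option<Duration> {
    header_value
        .trim()
        .parse::<u64>()
        .ok()
        .map(Duration::from_secs)
}

/// Picks the delay before the next attempt (attempt is 0-based).
/// A platform hint wins; otherwise fall back to capped doubling backoff.
pub fn next_delay(attempt: u32, base: Duration, cap: Duration, err: &AdapterError) -> Duration {
    match err.retry_after {
        Some(hint) => hint,
        None => base.saturating_mul(2u32.saturating_pow(attempt)).min(cap),
    }
}
```

The retry loop itself stays unchanged apart from calling next_delay with the last error, which is why this option touches the error type but not each adapter's control flow.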


3. Social Channels (Community Distribution)

3.1 Discord (Webhook)

Code Reality

adapters/discord.rs: 52 lines, implemented. Uses the VoxSocialDiscordWebhook Clavis secret. Sends content + optional embed. Respects dry_run. The file mixes CRLF and LF line endings (minor hygiene).

True API Mechanics (2026-04-13)

  • Webhook URL format: https://discord.com/api/webhooks/{id}/{token}.
  • Body: JSON, requires at least one of content, embeds, files, components.
  • content ≤ 2,000 chars. embeds array: max 10 embeds per message. Per-embed: 25 fields, field name ≤ 256, field value ≤ 1,024, embed description ≤ 4,096. Total chars across all embeds ≤ 6,000.
  • Embed color must be decimal integer (e.g., 5793266), not hex string.
  • Only HTTPS image URLs work.
  • Rate limits: per-route, dynamic. Parse X-RateLimit-* headers. IP restriction after 10,000 invalid requests per 10 minutes.
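
The limits above translate into a preflight check along these lines (EP-014, EP-015). This is a sketch: Embed and EmbedField are hypothetical reduced types, lengths are counted as Unicode scalar values as an approximation of how Discord counts characters, and only title, description, and fields are added into the 6,000-character total.

```rust
/// Hypothetical, reduced embed types for illustration.
pub struct EmbedField {
    pub name: String,
    pub value: String,
}

pub struct Embed {
    pub title: String,
    pub description: String,
    pub fields: Vec<EmbedField>,
}

/// Length in Unicode scalar values (an approximation of Discord's count).
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// Checks a webhook payload against the documented size limits.
pub fn validate_discord_payload(content: &str, embeds: &[Embed]) -> Result<(), String> {
    if content.is_empty() && embeds.is_empty() {
        return Err("payload needs content or at least one embed".to_string());
    }
    if char_len(content) > 2_000 {
        return Err(format!("content is {} chars (limit 2,000)", char_len(content)));
    }
    if embeds.len() > 10 {
        return Err(format!("{} embeds (limit 10)", embeds.len()));
    }
    let mut total = 0usize;
    for (i, embed) in embeds.iter().enumerate() {
        if char_len(&embed.description) > 4_096 {
            return Err(format!("embed {i}: description over 4,096 chars"));
        }
        if embed.fields.len() > 25 {
            return Err(format!("embed {i}: more than 25 fields"));
        }
        total += char_len(&embed.title) + char_len(&embed.description);
        for field in &embed.fields {
            if char_len(&field.name) > 256 || char_len(&field.value) > 1_024 {
                return Err(format!("embed {i}: field name or value too long"));
            }
            total += char_len(&field.name) + char_len(&field.value);
        }
    }
    if total > 6_000 {
        return Err(format!("embeds total {total} chars (limit 6,000)"));
    }
    Ok(())
}
```

Note that two embeds can each pass the per-embed description limit and still fail the combined budget, which is the case EP-015 is about.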

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-001 | run_with_retries not wired | Critical |
| EP-002 | Channel absent from allowlist/retry infra | High |
| EP-014 | No content length check (≤ 2,000 chars) | Medium |
| EP-015 | Total embed char budget (6,000) not enforced | Medium |
| EP-016 | embed_color accepts u32 but no doc why not hex | Low |

Recommendation

Ship. Implement EP-001, EP-002, EP-014. Discord is the highest-confidence adapter.


3.2 Reddit

Code Reality

adapters/reddit.rs: 129 lines. OAuth refresh token grant (correct). User-Agent is correctly sent on both the OAuth endpoint and the submit endpoint (line 107: .header("User-Agent", auth.user_agent)). The v1 audit incorrectly flagged User-Agent on submit as missing; that finding is corrected here.

However: no 40,000-char limit check. No social_retry.rs wiring.

True API Mechanics (2026-04-13)

  • submit scope required. Endpoint: POST https://oauth.reddit.com/api/submit.
  • Self-post text: 40,000 char hard server limit.
  • Link title: 300 char.
  • User-Agent format: <platform>:<app ID>:<version string> (by /u/<reddit username>).
  • Rate limit: 60 requests/minute per OAuth client.
  • AI/ML training prohibition on data: explicit ToS violation.
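
A preflight check covering EP-017, EP-018, and EP-019 might look like the sketch below. The function name and parameters are hypothetical, the case-insensitive subreddit comparison is an assumption about how the allowlist should match, and lengths are counted as Unicode scalar values, which may differ slightly from Reddit's server-side count.

```rust
/// Validates a Reddit submission before any network call.
/// `selftext` is None for link posts.
pub fn validate_reddit_submission(
    subreddit: &str,
    allowed_subreddits: &[&str],
    title: &str,
    selftext: Option<&str>,
) -> Result<(), String> {
    // EP-019: only post where policy explicitly allows it.
    let allowed = allowed_subreddits
        .iter()
        .any(|s| s.eq_ignore_ascii_case(subreddit));
    if !allowed {
        return Err(format!("r/{subreddit} is not in the subreddit allowlist"));
    }

    // EP-018: title limit of 300 characters.
    let title_len = title.chars().count();
    if title_len == 0 || title_len > 300 {
        return Err(format!("title is {title_len} chars (must be 1 to 300)"));
    }

    // EP-017: self-post text limit of 40,000 characters.
    if let Some(text) = selftext {
        let text_len = text.chars().count();
        if text_len > 40_000 {
            return Err(format!("self-post text is {text_len} chars (limit 40,000)"));
        }
    }
    Ok(())
}
```

Running this before the OAuth refresh keeps rejected submissions from consuming any of the 60 requests per minute.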

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-001 | run_with_retries not wired | Critical |
| EP-002 | Channel absent from allowlist/retry infra | High |
| EP-017 | No 40,000-char self-post text validation | High |
| EP-018 | No link title 300-char validation | Medium |
| EP-019 | No subreddit allowlist policy enforcement | High |
| EP-020 | Reddit AI training prohibition not documented | High |
| Correction | User-Agent IS sent on submit (v1 was wrong) | |

Recommendation

Fix EP-017/019 and ship with human-gate policy.


3.3 Twitter / X

Code Reality

adapters/twitter.rs: 115 lines, CRLF endings. Posts to /2/tweets via Bearer token. Thread mode supported. No 429 handling.

True API Mechanics (2026-04-13)

  • Write access (posting) requires paid plan. Free tier: write access only for "Public Utility." Pay-as-you-go launched February 2026.
  • Rate limits: per-tier, per endpoint, dual 15-min/24-hour windows.
  • Bearer token = app-only auth (posting on behalf of app). OAuth 2.0 user-context needed for user posts.

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-001 | run_with_retries not wired | Critical |
| EP-002 | Channel absent from allowlist/retry infra | High |
| EP-021 | Paid plan required; not gated | Critical |
| EP-022 | No per-session tweet budget | High |

Recommendation

Gate behind vox clavis doctor billing status check. Do not dispatch until billing verified.


3.4 Bluesky (AT Protocol)

Code Reality

adapters/bluesky.rs: 95 lines. Creates session, posts record.

Critical Bugs (EP-012 is confirmed):

  1. CreateSessionResponse.access_token ← should be accessJwt. Silent deserialization failure.
  2. bsky.social hardcoded at both the session URL and the record URL.
  3. No refreshJwt management — new session created per post call.
  4. BlueskyConfig.link_facet field (types.rs) is declared but adapter never uses it (EP-007).
  5. No grapheme cluster count for 300-char limit.
  6. dry_run parameter not in signature — never passed from dispatcher.

True API Mechanics (2026-04-13)

  • Auth: App Password → createSession → accessJwt (short-lived) + refreshJwt (long-lived).
  • PDS: Must NOT hardcode bsky.social. Resolve via DID document lookup per user handle.
  • Post NSID: app.bsky.feed.post, collection: app.bsky.feed.post.
  • Rate limits: 5,000 pts/hour, 35,000 pts/day; post = 3 pts; createSession = 30/5min.
  • Char limit: 300 grapheme clusters (not bytes or code points).

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-012 | access_token field name wrong; silent failure | Critical |
| EP-001 | run_with_retries not wired | Critical |
| EP-002 | Channel absent from allowlist/retry infra | High |
| EP-023 | bsky.social hardcoded PDS | Critical |
| EP-024 | No refreshJwt session caching | High |
| EP-007 | link_facet field declared but unused | Medium |
| EP-025 | No grapheme-cluster char count | Medium |
| EP-026 | dry_run not plumbed to adapter | High |

Recommendation

Fix EP-012 immediately (1-line). Fix EP-023. These are blocking. Then ship.


3.5 Mastodon

Code Reality

adapters/mastodon.rs: 14 lines, hard stub. Returns Err("Mastodon adapter not implemented").

MastodonConfig in types.rs has: status, visibility, sensitive, spoiler_text.

True API Mechanics (2026-04-13)

  • Per-instance access token, write:statuses scope.
  • POST https://{instance}/api/v1/statuses, Authorization: Bearer {token}.
  • status ≤ 500 chars (default; configurable per instance).
  • Media: separate upload endpoint → id → include in status.
  • Rate limits: 300 requests/5 minutes. Response headers: X-RateLimit-Limit/Remaining/Reset.
  • Visibility: public, unlisted, private, direct.
  • language: ISO 639 code; improves discoverability.
  • spoiler_text: content warning header.
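
The request described above can be assembled and validated without any HTTP client, as in the sketch below. It assumes MastodonConfig gains the instance_url and language fields proposed in EP-029 and EP-028. StatusRequest, build_status_request, and the sensitive: bool type are hypothetical, and the character limit is a parameter because instances can change the 500 default.

```rust
/// Hypothetical config shape: instance_url and language are the additions
/// proposed in EP-029 and EP-028; the other field types are assumed.
pub struct MastodonConfig {
    pub instance_url: String,
    pub status: Option<String>,
    pub visibility: Option<String>,
    pub sensitive: bool,
    pub spoiler_text: Option<String>,
    pub language: Option<String>,
}

/// Target URL plus form parameters for POST /api/v1/statuses.
pub struct StatusRequest {
    pub url: String,
    pub form: Vec<(&'static str, String)>,
}

pub fn build_status_request(cfg: &MastodonConfig, max_chars: usize) -> Result<StatusRequest, String> {
    let status = cfg.status.as_deref().unwrap_or("").trim();
    if status.is_empty() {
        return Err("status text is empty".to_string());
    }
    let len = status.chars().count();
    if len > max_chars {
        return Err(format!("status is {len} chars (instance limit {max_chars})"));
    }

    let visibility = cfg.visibility.as_deref().unwrap_or("public");
    if !["public", "unlisted", "private", "direct"].contains(&visibility) {
        return Err(format!("unsupported visibility: {visibility}"));
    }

    let mut form = vec![
        ("status", status.to_string()),
        ("visibility", visibility.to_string()),
    ];
    if cfg.sensitive {
        form.push(("sensitive", "true".to_string()));
    }
    if let Some(cw) = &cfg.spoiler_text {
        form.push(("spoiler_text", cw.clone()));
    }
    if let Some(lang) = &cfg.language {
        form.push(("language", lang.clone()));
    }

    Ok(StatusRequest {
        url: format!("{}/api/v1/statuses", cfg.instance_url.trim_end_matches('/')),
        form,
    })
}
```

The adapter would then send the form with an Authorization: Bearer header. Keeping request construction separate from sending makes the dry_run path trivial, since it can stop after this function returns.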

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-001 | run_with_retries not wired | Critical |
| EP-002 | Channel absent from allowlist/retry infra | High |
| EP-027 | Adapter is a stub; ~50 lines needed | Critical |
| EP-028 | language field missing from MastodonConfig | Medium |
| EP-029 | No instance URL in MastodonConfig | Critical |
| EP-030 | No 500-char status text validation | Medium |

MastodonConfig is missing instance_url: String — the adapter would have nowhere to POST without it.

Recommendation

Highest-ROI unimplemented adapter. Implement now (~60 lines). Add instance_url + language to MastodonConfig.


3.6 LinkedIn

Code Reality

adapters/linkedin.rs: 14 lines, hard stub. Returns Err("LinkedIn adapter not implemented"). Note says "awaiting App approval."

LinkedInConfig in types.rs has: text, visibility.

True API Mechanics (2026-04-13)

  • ugcPosts API is deprecated. Must use the versioned Posts API: POST https://api.linkedin.com/rest/posts.
  • Required headers: Linkedin-Version: {YYYYMM}, X-Restli-Protocol-Version: 2.0.0.
  • Auth: 3-legged OAuth. Access tokens valid 60 days — mandatory refresh flow.
  • Post body must include author URN: "urn:li:person:{id}" or "urn:li:organization:{id}".
  • App review required for production w_member_social scope.
  • Media pre-upload required via Images/Videos API → URN reference in post body.
  • Rate limits: not published; monitor via Analytics tab.
  • The Linkedin-Version header value (api_version) is date-versioned and needs to be updated regularly.

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-001 | run_with_retries not wired | Critical |
| EP-002 | Channel absent from allowlist/retry infra | High |
| EP-031 | Adapter is a stub | High |
| EP-032 | author_urn missing from LinkedInConfig; can't post without it | Critical |
| EP-033 | api_version field missing (required header) | High |
| EP-034 | App review is an organizational blocker | Blocker |
| EP-035 | No 60-day token expiry / refresh management | High |

Recommendation

Defer until after Mastodon ships AND LinkedIn App Review completes AND organizational decision on posting identity (person vs org page) is made.


3.7 Hacker News

Code Reality

adapters/hacker_news.rs — small file, ManualAssist mode only. No HTTP write calls.

HackerNewsConfig has mode, title_override, url_override. Missing: comment_draft (EP-010).

True API Mechanics (2026-04-13)

  • Official HN API is read-only. No write/submit API exists.
  • Programmatic posting is impossible through official channels.
  • Show HN requirements: title starts with "Show HN:", must be a working thing, no landing pages, engage with comments.

Recommendation

ManualAssist is the architecturally correct permanent posture. Add EP-010 (comment_draft). Done.


3.8 YouTube

Code Reality

adapters/youtube.rs: 211 lines, CRLF endings. Well-implemented resumable upload. Missing: dry_run check (EP-011).

True API Mechanics (2026-04-13)

  • All unverified projects: videos forced private. Compliance Audit required for public uploads.
  • Quota: 10,000 units/day, resets midnight PT. videos.insert = ~100 units.
  • Resumable upload: correctly implemented.
  • OAuth: refresh_token grant — correctly implemented.

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-011 | No dry_run guard before disk I/O + OAuth | High |
| EP-036 | Compliance Audit required; no doctor gate | Critical |
| EP-037 | No quota budget tracking | Medium |
| EP-001 | run_with_retries around upload | Medium |

Recommendation

Gate behind compliance audit status in vox clavis doctor. Add dry_run guard. Done.


3.9 Open Collective

Code Reality

adapters/opencollective.rs: 79 lines, implemented. GraphQL createUpdate mutation. makePublicOn: null hardcoded (EP-006). Auth header may need migration (EP-005).

Recommendation

Fix EP-005 and EP-006. Ship.


3.10 GitHub

Code Reality

adapters/github.rs: 102 lines, implemented via vox-forge::GitHubProvider. Routes Discussion vs Release. Function name create_discussion_or_issue raises concern (EP-009).

Recommendation

Audit vox-forge for Issue fallback. If clean, ship as-is.


3.11 RSS

Code Reality

adapters/rss.rs: 5.7 KB, implemented. Self-hosted. No external API.

Recommendation

Ship. Low risk.


4. Scholarly Channels

4.1 Zenodo

Code Reality

scholarly/zenodo.rs: 20 KB. Metadata generation is thorough. Per scientia-publication-automation-ssot.md: "partial (metadata done, upload/deposit not done)." However, the file is large enough that it may contain HTTP calls; direct code inspection is required to confirm whether ZenodoDepositClient makes actual REST calls or only generates JSON blobs.

True API Mechanics (2026-04-13)

  1. POST https://zenodo.org/api/deposit/depositions → {id, links.bucket}.
  2. PUT {bucket_url}/{filename} with file content → upload.
  3. PUT /api/deposit/depositions/{id} → metadata update.
  4. POST /api/deposit/depositions/{id}/actions/publish → irreversible DOI mint.
  • Token: deposit:write + deposit:actions scopes.
  • Sandbox: https://sandbox.zenodo.org/ requires separate account/token.
  • Required metadata: upload_type, creators[], title, description, access_right, license, publication_date.
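
The deposit sequence above, together with the sandbox flag (EP-039) and publish confirmation gate (EP-041), can be sketched as a pure request plan. Step, zenodo_base, create_deposition_step, and remaining_steps are hypothetical names. The sketch only builds method and URL pairs; it does not perform HTTP or handle authentication.

```rust
/// One planned HTTP request (method and URL only).
#[derive(Debug, PartialEq)]
pub struct Step {
    pub method: &'static str,
    pub url: String,
}

/// Base URL for production or sandbox (the sandbox needs its own token).
pub fn zenodo_base(sandbox: bool) -> &'static str {
    if sandbox {
        "https://sandbox.zenodo.org"
    } else {
        "https://zenodo.org"
    }
}

/// Step 1: create an empty deposition. The response supplies the id and bucket URL.
pub fn create_deposition_step(sandbox: bool) -> Step {
    Step {
        method: "POST",
        url: format!("{}/api/deposit/depositions", zenodo_base(sandbox)),
    }
}

/// Steps 2 to 4. The irreversible publish step is only planned when the
/// caller has explicitly confirmed it.
pub fn remaining_steps(
    sandbox: bool,
    deposition_id: u64,
    bucket_url: &str,
    filename: &str,
    publish_confirmed: bool,
) -> Vec<Step> {
    let deposition = format!(
        "{}/api/deposit/depositions/{deposition_id}",
        zenodo_base(sandbox)
    );
    let mut steps = vec![
        Step {
            method: "PUT",
            url: format!("{}/{filename}", bucket_url.trim_end_matches('/')),
        },
        Step {
            method: "PUT",
            url: deposition.clone(),
        },
    ];
    if publish_confirmed {
        steps.push(Step {
            method: "POST",
            url: format!("{deposition}/actions/publish"),
        });
    }
    steps
}
```

Because the publish request is absent from the plan unless confirmed, an unattended run can upload and set metadata but cannot mint a DOI by accident.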

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-038 | HTTP deposit may not be implemented; needs code audit | Critical |
| EP-039 | No sandbox routing flag | High |
| EP-040 | No status poll post-deposit (async moderation) | High |
| EP-041 | Publish action is irreversible; no confirmation gate | Critical |

Recommendation

Audit scholarly/zenodo.rs for actual HTTP calls. Complete deposit layer. Add --sandbox flag. Add publish confirmation gate.


4.2 OpenReview (TMLR)

Code Reality

scholarly/openreview.rs: 16 KB. Full adapter including HTTP client.

True API Mechanics (2026-04-13)

  • API 2: https://api2.openreview.net.
  • Auth: username/password login → Bearer token. MFA introduced March 2026 — may break scripted auth.
  • TMLR: double-blind, anonymized PDF, specific LaTeX stylefile, AE recommendation post-submission (manual step).

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-042 | MFA added March 2026; scripted login may fail | Critical |
| EP-043 | API 2 migration: verify baseurl targets api2.openreview.net | High |

Recommendation

Document MFA workaround. Verify API version target. Keep as-is otherwise.


4.3 arXiv

Code Reality

No adapter. Manual-assist / export package only.

True API Mechanics (2026-04-13)

  • Submission API in development (OAuth, Client Registry registration required — not publicly available).
  • Endorsement policy tightened January 2026: institutional email alone insufficient.
  • AI content enforcement increased.
  • English requirement as of February 2026.
  • Moderation: async — automated systems must handle status polling.

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-044 | arXiv format preflight profile missing | High |
| EP-045 | Endorsement requirements not in Clavis doctor | High |
| EP-046 | AI content policy not integrated into preflight gate | Critical |

Recommendation

Keep ManualAssist. Build export package. Add preflight profile.


4.4 Crossref

Code Reality

crossref_metadata.rs (6.5 KB) — metadata transformer. No HTTP deposit adapter.

True API Mechanics (2026-04-13)

  • Deposit: POST https://doi.crossref.org/servlet/deposit, multipart/form-data with XML file — not JSON REST.
  • Schema: Crossref input schema; UTF-8; only numeric character entities.
  • Auth: username/password as form fields (not OAuth).
  • Membership required (fee). DOI prefix required.
  • Pending limit: 10,000 per user in queue.

Gap Delta

| ID | Gap | Severity |
|---|---|---|
| EP-047 | No HTTP deposit adapter | High |
| EP-048 | Crossref deposit is XML over multipart; JSON generator is wrong format | Critical |
| EP-049 | Non-member: cannot deposit (organizational blocker) | Blocker |
| EP-050 | No Clavis entries for VoxCrossrefUsername/Password | High |

Recommendation

Defer until Crossref membership. The XML format requirement is non-trivial if crossref_metadata.rs generates JSON.


5. ResearchGate — Full Policy Analysis

The user specifically requested deep research on ResearchGate. This section is authoritative.

5.1 Does ResearchGate Have a Public API?

No. Definitively no. Research conducted 2026-04-13 from multiple sources:

  • ResearchGate has no public developer API.
  • No OAuth endpoints, no application registration, no developer portal.
  • ResearchGate's Terms of Service explicitly prohibit "mechanisms, devices, software, scripts, robots, or any other means or processes" for automated interaction.

5.2 How Does ResearchGate Discover Publications?

ResearchGate maintains its own internal database populated by:

  1. Publisher XML/metadata feeds — direct agreements with academic publishers.
  2. Bibliographic databases — automated ingestion of publicly available metadata.
  3. CrossRef — DOI metadata is used to populate and verify publication details.
  4. Author-matching algorithm — automatically suggests publications to researcher profiles.
  5. User confirmation — researchers confirm authorship; no API path.
  6. DOI lookup (manual) — users can enter a DOI manually; ResearchGate fetches metadata from Crossref.

5.3 What This Means for SCIENTIA

The indirect strategy is the only strategy:

If a SCIENTIA paper is deposited to Zenodo (which registers with Crossref → DOI), ResearchGate will eventually ingest that DOI record through its Crossref feed and may suggest it to the author's profile. The author must then manually confirm authorship through the RG web interface.

This is the correct posture:

  • SCIENTIA deposits to Zenodo/Crossref → DOI is minted.
  • ResearchGate ingests the DOI record (automatic, within days to weeks).
  • Author confirms authorship on ResearchGate web UI (manual, one-time per paper).
  • Profile shows publication with full citation data, boosting algorithmic discoverability.

5.4 SSoT Representation for ResearchGate

ResearchGate should be documented as a passive discovery target, not an active publication channel. No adapter code should be written.

# contracts/scientia/distribution.topic-packs.yaml
# ResearchGate is NOT a syndication channel. It is a passive discovery target.
# Appears automatically when DOI is registered via Zenodo/Crossref.
# Human action required: author confirms authorship on RG web UI.
researchgate:
  type: passive_discovery
  trigger: doi_registration
  automation_level: none       # API prohibited by ToS
  human_action: confirm_authorship_on_rg_web_ui
  expected_lag_days: 3-14      # varies by publisher feed frequency
  prerequisite: zenodo_doi_minted

Add to SyndicationResult as a tracking field:

pub struct SyndicationResult {
    // ... existing fields ...
    #[serde(default)]
    pub researchgate_doi_queued: bool,  // true when Zenodo DOI was minted (indirect trigger)
}

Add to vox clavis doctor output:

ResearchGate: PASSIVE (no API)
  → Requires Zenodo DOI to be minted first
  → Author must confirm authorship at researchgate.net/profile
  → Expected appearance: 3-14 days after DOI registration

5.5 Type in SSoT

researchgate:
  automation_boundary: ManualConfirmation
  channel_type: passive_discovery
  implementation: "None required — zero code to write"
  doc_only: true

5.6 What NOT to Do

  • Do NOT: Implement a scraper, headless browser, or form-submission bot. This violates ToS and will result in account suspension.
  • Do NOT: Create a researchgate field in SyndicationConfig — it creates a false expectation of automation.
  • Do NOT: Budget engineering time for a ResearchGate adapter — the platform does not support it and the workaround (Zenodo → DOI → RG ingest) is automatic.
  • DO: Document the indirect path, track researchgate_doi_queued in SyndicationResult.

6. New Scholarly Targets

6.1 ORCID

Overview

ORCID (Open Researcher and Contributor ID) is the authoritative persistent identifier for researchers. Programmatically adding a work to an author's ORCID record provides maximum discoverability across all academic databases.

True API Mechanics (2026-04-13)

  • Member API only — write access requires ORCID membership (organizational, annual fee).
  • Scope: /activities/update via 3-legged OAuth. User must explicitly authorize.
  • Endpoint: POST https://api.orcid.org/v3.0/{orcid-id}/work.
  • Format: XML or JSON. Returns a put-code for future updates/deletes.
  • Sandbox: https://api.sandbox.orcid.org/ — use for development.
  • Once a work is POSTed, updates use PUT /work/{put-code}, deletes use DELETE /work/{put-code}.
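
The endpoint rules above reduce to a small URL helper, sketched here with hypothetical function names. It assumes the sandbox uses the same /v3.0 path layout as production and leaves OAuth and the work payload out of scope.

```rust
/// API base for production or sandbox, including the version segment.
pub fn orcid_api_base(sandbox: bool) -> &'static str {
    if sandbox {
        "https://api.sandbox.orcid.org/v3.0"
    } else {
        "https://api.orcid.org/v3.0"
    }
}

/// URL for a work on a researcher's record.
/// Without a put-code this is the POST target for a new work; with a
/// put-code it is the PUT or DELETE target for an existing one.
pub fn orcid_work_url(orcid_id: &str, sandbox: bool, put_code: Option<&str>) -> String {
    let base = orcid_api_base(sandbox);
    match put_code {
        Some(code) => format!("{base}/{orcid_id}/work/{code}"),
        None => format!("{base}/{orcid_id}/work"),
    }
}
```

Storing the put-code returned by the first POST is what makes later updates idempotent: the adapter can choose POST or PUT purely from whether a put-code is already recorded.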

SCIENTIA Value

Adding a SCIENTIA paper to the author's ORCID record:

  • Propagates to ResearchGate, Scopus, Web of Science, Google Scholar automatically.
  • Gives the work cross-database discoverability without any platform-specific scrapers.
  • ORCID is effectively a universal publication router when combined with a DOI.

Recommendation

Implement after Zenodo is complete. The workflow is:

  1. Zenodo mints DOI.
  2. ORCID adapter POSTs work to /v3.0/{orcid-id}/work with the DOI.
  3. All databases that federate from ORCID see the record.

This is the highest-leverage single scholarly integration after Zenodo.

SSoT Fields Required

orcid.orcid_id: String                         // e.g. "0000-0002-1825-0097"
orcid.access_token: resolved via Clavis VoxOrcidAccessToken
orcid.sandbox: bool                             // default true until production verified
orcid.put_code: Option<String>                  // stored after first POST for future updates

Codebase Impact

  • New scholarly/orcid.rs adapter.
  • New OrcidConfig struct in types.rs (requires orcid_id: String).
  • New VoxOrcidAccessToken and VoxOrcidClientId/VoxOrcidClientSecret in Clavis spec.rs.
  • Add orcid: ChannelOutcome to SyndicationResult.
  • Add orcid: Option<OrcidConfig> to SyndicationConfig.

6.2 Figshare

Overview

Figshare is a research data and publication repository widely used for datasets, code, figures, and preprints. Strongly favored by funders requiring open data compliance (e.g., NIH, Wellcome Trust, UKRI).

True API Mechanics (2026-04-13)

  • Personal Access Token for individual use. Authorization: token {TOKEN} header.
  • No OAuth required for personal accounts (simpler than Zenodo).
  • Article creation: POST /account/articles → returns article_id.
  • File upload: 4-step multipart process:
    1. POST /account/articles/{id}/files with {name, size, md5} → location URL.
    2. GET {location} → get part URLs.
    3. PUT {part_url} for each part (binary chunk).
    4. POST /account/articles/{id}/files/{file_id} → complete upload.
  • Publish: POST /account/articles/{article_id}/publish → irreversible.
  • Published articles receive a Figshare DOI.
  • Sandbox: https://figshare.sandbox.figshare.com/ for testing.

SCIENTIA Value

Figshare is widely used for:

  • Supplementary datasets accompanying papers.
  • Code datasets (MENS training corpora, evaluation benchmarks, Vox compiler artifacts).
  • Preprints for non-arXiv-eligible content.

Where Zenodo is more appropriate for formal preprints, Figshare excels at datasets and supplementary materials. Many publishers link directly to Figshare for open data requirements.

Comparison to Zenodo

| Feature | Zenodo | Figshare |
|---|---|---|
| DOI | Yes | Yes |
| Auth | Bearer token (scoped) | Personal token |
| File upload | Simple PUT to bucket | 4-step multipart |
| Metadata schema | Zenodo-specific | Figshare-specific |
| Storage limit | 50 GB per record (free) | 20 GB per item (free) |
| Primary use | Preprints, datasets, software | Datasets, figures, code |
| Publisher integrations | Strong (CERN/EUDAT/OpenAIRE) | Strong (Taylor & Francis, etc.) |
| Best for SCIENTIA | Formal preprints | Supplementary data, corpora |

Recommendation

Implement as Wave 2 scholarly target, after Zenodo. Priority: Zenodo > ORCID > Figshare.

SSoT Fields Required

figshare.access_token: resolved via Clavis VoxFigshareAccessToken
figshare.sandbox: bool                         // default true
figshare.title: Option<String>                 // overrides item.title
figshare.description: Option<String>           // overrides body
figshare.categories: Vec<u32>                  // Figshare taxonomy category IDs
figshare.tags: Vec<String>
figshare.defined_type: "dataset" | "figure" | "media" | "presentation" | "poster" | "software" | "preprint"
figshare.files: Vec<String>                    // repo-relative paths to upload

7. Priority Matrix (Updated)

| Platform | Code Status | Posting Works? | EP IDs | Maint. Burden | Audience Value | Action |
|---|---|---|---|---|---|---|
| Discord | Implemented ✅ | Yes | EP-001,014,015 | Low | High | Ship + EP-001 |
| RSS | Implemented ✅ | Yes | | Near-zero | Medium | Ship |
| GitHub | Implemented ✅ | Yes (needs audit) | EP-009 | Low | High | Audit EP-009, Ship |
| Bluesky | Broken ⚠️ | No (silent fail) | EP-012,023,026 | Low-Med | High (academics) | Fix EP-012 first |
| Mastodon | Stub ❌ | No | EP-027,029 | Low | High (academics) | Implement now |
| Reddit | Partial ⚠️ | Yes (bugs) | EP-017,019 | Med-High | High (CS) | Fix + human gate |
| Twitter/X | Code OK ⚠️ | Needs paid plan | EP-021,022 | Very High | Medium | Billing gate only |
| Open Collective | Partial ⚠️ | Partial | EP-005,006 | Low-Med | Low | Quick fix |
| HN | ManualAssist ✅ | Manual only | EP-010 | Zero | High (viral) | Add comment_draft |
| YouTube | Partial ⚠️ | Private-only | EP-011,036 | Medium | High (demos) | Compliance audit gate |
| LinkedIn | Stub ❌ | No | EP-031–035 | High | Medium | Defer after Mastodon |
| Zenodo | Partial ⚠️ | Unknown | EP-038–041 | Low-Med | Critical | Audit + complete |
| OpenReview | Implemented ⚠️ | MFA risk | EP-042,043 | Med-High | Critical (TMLR) | MFA workaround |
| arXiv | ManualAssist ✅ | Manual only | EP-044–046 | High | Critical | Build export + preflight |
| ORCID | Missing ❌ | Not built | | Medium | Critical | Implement Wave 1 scholarly |
| Figshare | Missing ❌ | Not built | | Low | High (datasets) | Implement Wave 2 scholarly |
| Crossref | Metadata only ❌ | No | EP-047–050 | Medium | Critical (DOI graph) | Defer until membership |
| ResearchGate | N/A | No API exists | | Zero | High (auto via DOI) | Passive only, doc only |
| Academia.edu | N/A | No API exists | | Zero | Low | Do not implement |

8. Hallucination Inventory (Updated)

| ID | Claim | Reality | Root Cause |
|---|---|---|---|
| H-001 | "Discord adapter is a hard stub" | Discord is implemented (52 lines) | Community playbook written before code landed |
| H-002 | "Reddit User-Agent missing on submit POST" | User-Agent correctly sent on submit (line 107) | v1 audit error: wrong line was read |
| H-003 | "LinkedIn uses UGC Posts API" | ugcPosts API is deprecated | Playbook references 2022-era docs |
| H-004 | "Twitter free tier allows posting" | Free tier: no write access since early 2026 | API pricing changed February 2026 |
| H-005 | "Bluesky field access_token" | Correct field: accessJwt | AT Protocol uses JWT naming, not OAuth |
| H-006 | "arXiv API automation feasible soon" | Client Registry registration required; endorsement tightened Jan 2026 | Optimistic research docs |
| H-007 | "Crossref uses JSON REST API" | Crossref deposit: HTTPS POST multipart/form-data with XML | Confused with Crossref metadata retrieval API |
| H-008 | "ResearchGate has an API" | ResearchGate has NO public API; ToS prohibits automation | Wishful planning; API does not exist |
| H-009 | "OpenCollective header is Api-Key" | Official docs use Personal-Token | Header worked but is legacy form |
| H-010 | "YouTube adapter needs retry wiring only" | Missing dry_run guard; will perform disk I/O and OAuth on dry runs | Dry-run path not encoded in adapter signature |
| H-011 | "social_retry.rs is wired into dispatch" | Zero call sites for run_with_retries in dispatch paths | Infrastructure exists but code was never integrated |
| H-012 | "Bluesky, Mastodon, Discord, LinkedIn are in retry/allowlist system" | These four channels are absent from switching.rs allowlist and retry infrastructure | Channels added to types without updating switching.rs |
| H-013 | "Academia.edu has a developer API" | No public API; ToS prohibits automation | Confusion with academic institution management systems sharing the name |

9. Unified SSoT Data Model Requirements

The core model (UnifiedNewsItem + SyndicationConfig) is structurally sound but has specific gaps:

9.1 Missing Fields in SyndicationConfig

pub struct SyndicationConfig {
    // ... existing ...
    pub orcid: Option<OrcidConfig>,            // NEW: Wave 1 scholarly
    pub figshare: Option<FigshareConfig>,       // NEW: Wave 2 scholarly
    // researchgate: intentionally ABSENT; passive discovery only
}

9.2 Missing Fields in Existing Channel Configs

// MastodonConfig (MISSING):
pub instance_url: String,                      // REQUIRED, no default
pub language: Option<String>,                  // ISO 639 code

// LinkedInConfig (MISSING):
pub author_urn: String,                        // "urn:li:person:{id}", REQUIRED
pub api_version: String,                       // e.g. "202604", REQUIRED

// HackerNewsConfig (MISSING):
pub comment_draft: Option<String>,             // first comment text

// BlueskyConfig (BROKEN):
pub pds_url: Option<String>,                   // explicit PDS override (for non-bsky.social users)
// link_facet: bool already exists but is unimplemented

9.3 Missing Fields in SyndicationResult

pub struct SyndicationResult {
    // ... existing ...
    pub orcid: ChannelOutcome,                 // NEW
    pub figshare: ChannelOutcome,              // NEW
    pub researchgate_doi_queued: bool,         // NEW: passive tracking only (not a ChannelOutcome)
}

9.4 switching.rs Channel Registry Additions Needed

Each channel listed below must be added to every one of these sites in switching.rs:

  • apply_channel_allowlist
  • failed_channels / successful_channels
  • outcome_for_channel match arms
  • normalize_distribution_json_value_with_warnings contract-shape expansion block

Channels to add: bluesky, mastodon, linkedin, discord, orcid, figshare
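
One way to keep those sites from drifting apart again is to drive them from a single channel enum, so that a new variant forces the compiler to flag every incomplete match. The following is a minimal sketch under that assumption. `Channel`, `Channel::ALL`, `as_key`, and `apply_allowlist` are hypothetical names for illustration and do not describe the current switching.rs API.

```rust
/// Hypothetical channel registry; not the actual switching.rs types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    Bluesky,
    Mastodon,
    LinkedIn,
    Discord,
    Orcid,
    Figshare,
}

impl Channel {
    /// Single list that allowlist filtering and outcome reporting can iterate over.
    pub const ALL: [Channel; 6] = [
        Channel::Bluesky,
        Channel::Mastodon,
        Channel::LinkedIn,
        Channel::Discord,
        Channel::Orcid,
        Channel::Figshare,
    ];

    /// Key as listed in this section. The match is exhaustive, so adding a
    /// variant without a key is a compile error.
    pub fn as_key(self) -> &'static str {
        match self {
            Channel::Bluesky => "bluesky",
            Channel::Mastodon => "mastodon",
            Channel::LinkedIn => "linkedin",
            Channel::Discord => "discord",
            Channel::Orcid => "orcid",
            Channel::Figshare => "figshare",
        }
    }
}

/// Keep only the channels whose key appears in the allowlist.
/// Unknown keys are ignored; an empty allowlist selects nothing.
pub fn apply_allowlist(allowlist: &[&str]) -> Vec<Channel> {
    Channel::ALL
        .iter()
        .copied()
        .filter(|c| allowlist.contains(&c.as_key()))
        .collect()
}
```

With this shape, `apply_allowlist(&["mastodon", "orcid"])` yields the two matching variants in registry order, and the failed/successful channel reports could iterate `Channel::ALL` instead of keeping their own lists.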

9.5 Content Hash Fix

Separate content_sha3_256 from routing config to prevent unnecessary dual-approval re-triggers:

pub fn content_sha3_256(&self) -> String {
    // Hash ONLY: id, title, author, published_at, tags, content_markdown
    // Do NOT include: syndication, topic_pack (routing is not content)
}
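
The separation can be illustrated with a small, self-contained sketch. `NewsItem` is a hypothetical, trimmed-down stand-in for UnifiedNewsItem, and `content_bytes` and `content_digest` are illustrative names. To stay runnable without external crates, the sketch uses the standard library's `DefaultHasher` as a placeholder digest. That is not SHA3-256; a real fix would feed the same canonical bytes to a SHA3-256 implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for UnifiedNewsItem (field types are assumptions).
pub struct NewsItem {
    pub id: String,
    pub title: String,
    pub author: String,
    pub published_at: String,
    pub tags: Vec<String>,
    pub content_markdown: String,
    pub syndication: String,        // routing config, deliberately not hashed
    pub topic_pack: Option<String>, // routing hint, deliberately not hashed
}

impl NewsItem {
    /// Canonical bytes built from content fields only.
    /// Every value is length-prefixed so field boundaries stay unambiguous.
    pub fn content_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        let mut push = |s: &str| {
            out.extend_from_slice(&(s.len() as u64).to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        };
        push(&self.id);
        push(&self.title);
        push(&self.author);
        push(&self.published_at);
        for tag in &self.tags {
            push(tag);
        }
        push(&self.content_markdown);
        out
    }

    /// Placeholder digest over the canonical bytes.
    /// DefaultHasher is NOT SHA3-256; it only demonstrates the input boundary.
    pub fn content_digest(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        hasher.write(&self.content_bytes());
        hasher.finish()
    }
}
```

Because `syndication` and `topic_pack` never reach `content_bytes`, adding or removing a channel leaves the digest unchanged, while any edit to the title or body changes the hashed input.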

9.6 Scholarly SSoT Publication Record

A new ScholarlyPublicationRecord struct should track the scholarly lifecycle separately from the news syndication model:

use chrono::{DateTime, Utc}; // assumes the chrono crate
use uuid::Uuid;              // assumes the uuid crate

pub struct ScholarlyPublicationRecord {
    pub publication_id: Uuid,
    pub doi: Option<String>,                       // minted after Zenodo publish
    pub zenodo_deposit_id: Option<String>,
    pub zenodo_doi: Option<String>,
    pub orcid_put_code: Option<String>,            // for future updates
    pub figshare_article_id: Option<String>,
    pub arxiv_submission_id: Option<String>,
    pub openreview_forum_id: Option<String>,
    pub crossref_deposit_id: Option<String>,
    pub researchgate_confirmed: bool,              // manual confirmation tracked
    pub published_at: Option<DateTime<Utc>>,
    pub status: ScholarlyPublicationStatus,
}

pub enum ScholarlyPublicationStatus {
    Draft,
    Deposited,          // Zenodo created, not published
    Published,          // DOI minted
    Retracted,          // requires human action
}
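
The status enum implies a one-way lifecycle. The sketch below shows one possible transition check. The allowed transitions and the `human_confirmed` flag are assumptions drawn from the comments above and from the human-gate rule in §10.2; they are not confirmed behaviour of the codebase. The enum is repeated only so the example is self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScholarlyPublicationStatus {
    Draft,
    Deposited,
    Published,
    Retracted,
}

/// Returns true when moving `from` -> `to` is allowed.
/// Assumed rules:
///   Draft -> Deposited      needs no confirmation (the deposit is not yet public)
///   Deposited -> Published  needs human confirmation (publish is irreversible)
///   Published -> Retracted  needs human confirmation
/// Every other move, including any step backwards, is rejected.
pub fn can_transition(
    from: ScholarlyPublicationStatus,
    to: ScholarlyPublicationStatus,
    human_confirmed: bool,
) -> bool {
    use ScholarlyPublicationStatus::*;
    match (from, to) {
        (Draft, Deposited) => true,
        (Deposited, Published) => human_confirmed,
        (Published, Retracted) => human_confirmed,
        _ => false,
    }
}
```

Keeping the check in one function would let both the Zenodo adapter and any CLI confirmation prompt share the same rule instead of re-deriving it.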

10. Implementation Policy

This section defines the binding rules for adding, modifying, or removing publication channels from the Scientia pipeline. All future development must conform.

10.1 Channel Classification

Every publication target must be classified at design time:

| Class | Meaning | Examples | Code Required |
|---|---|---|---|
| ActivePush | SCIENTIA posts content via HTTP API | Discord, Reddit, Mastodon, Bluesky | Yes: adapter in adapters/*.rs |
| ScholarlyDeposit | Formal archival with DOI/ID | Zenodo, ORCID, Figshare, OpenReview | Yes: adapter in scholarly/*.rs |
| ManualAssist | SCIENTIA generates draft; human submits | HN, arXiv (for now), LinkedIn (organizational) | Yes: draft generator only |
| PassiveDiscovery | Platform ingests automatically via DOI/metadata feeds; no code | ResearchGate, Academia.edu | No adapter code |
| Deferred | API exists but org/billing blocker | Crossref (membership), YouTube (compliance), LinkedIn (App Review) | Stub with TOESTUB only |

10.2 Gate Requirements Per Class

| Class | dry_run guard | run_with_retries | vox clavis doctor check | Dual approval | Human gate |
|---|---|---|---|---|---|
| ActivePush | Mandatory | Mandatory | Required for secrets | Required for live | Recommended for social |
| ScholarlyDeposit | Mandatory | Mandatory | Required for secrets | Required | Required (publish is irreversible) |
| ManualAssist | N/A (no HTTP) | N/A | Optional | Optional | Inherent (human submits) |
| PassiveDiscovery | N/A | N/A | Optional | N/A | Optional |
| Deferred | N/A (stub returns Err) | N/A | Gate must explain blocker | N/A | N/A |
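
The gate matrix can also be encoded as data so that dispatch code queries it rather than restating it. The sketch below is illustrative only: `ChannelClass`, `Level`, `GateRequirements`, and `gates_for` are hypothetical names, the vox clavis doctor column is omitted for brevity, and "Inherent" for ManualAssist is modelled as `Required`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChannelClass {
    ActivePush,
    ScholarlyDeposit,
    ManualAssist,
    PassiveDiscovery,
    Deferred,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Required,
    Recommended,
    Optional,
    NotApplicable,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GateRequirements {
    pub dry_run_guard: Level,
    pub run_with_retries: Level,
    pub dual_approval: Level,
    pub human_gate: Level,
}

/// Mirrors the gate table row by row (doctor-check column omitted).
pub fn gates_for(class: ChannelClass) -> GateRequirements {
    use Level::*;
    let row = |dry_run_guard, run_with_retries, dual_approval, human_gate| GateRequirements {
        dry_run_guard,
        run_with_retries,
        dual_approval,
        human_gate,
    };
    match class {
        ChannelClass::ActivePush => row(Required, Required, Required, Recommended),
        ChannelClass::ScholarlyDeposit => row(Required, Required, Required, Required),
        ChannelClass::ManualAssist => row(NotApplicable, NotApplicable, Optional, Required),
        ChannelClass::PassiveDiscovery => row(NotApplicable, NotApplicable, NotApplicable, Optional),
        ChannelClass::Deferred => row(NotApplicable, NotApplicable, NotApplicable, NotApplicable),
    }
}
```

A review check could then assert, for example, that every adapter registered as `ScholarlyDeposit` reports `human_gate == Level::Required` before a live publish is attempted.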

10.3 New Channel Checklist

Before merging any new publication channel:

  • Classification assigned and documented.
  • Adapter file: adapters/{channel}.rs or scholarly/{channel}.rs.
  • Config struct added to types.rs with all required fields.
  • Config added to SyndicationConfig.
  • Outcome field added to SyndicationResult.
  • Channel added to switching.rs: apply_channel_allowlist, failed_channels, successful_channels, outcome_for_channel, normalize_distribution_json_value_with_warnings.
  • run_with_retries wired from dispatch path.
  • dry_run guard in adapter before any I/O.
  • Clavis secrets registered in spec.rs with correct SecretId variants.
  • vox clavis doctor probe added for required secrets.
  • TOESTUB compliance: no pub use in frozen modules, no god objects.
  • Integration test added with mock server (at minimum, a dry_run: true compile test).
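
The dry_run item in the checklist is easiest to verify when the guard is the first statement of the adapter and the transport is injectable. The following is a minimal sketch of that pattern. `Transport`, `Outcome`, `publish`, and `CountingTransport` are hypothetical names for illustration and do not correspond to existing adapter signatures.

```rust
/// Hypothetical transport abstraction so the guard can be tested without a network.
pub trait Transport {
    fn post(&mut self, url: &str, body: &str) -> Result<String, String>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    DryRun,
    Posted { external_id: String },
}

/// The guard runs before any I/O: on a dry run the transport is never touched.
pub fn publish<T: Transport>(
    transport: &mut T,
    url: &str,
    body: &str,
    dry_run: bool,
) -> Result<Outcome, String> {
    if dry_run {
        return Ok(Outcome::DryRun);
    }
    let external_id = transport.post(url, body)?;
    Ok(Outcome::Posted { external_id })
}

/// Mock transport that only counts how often it was called.
pub struct CountingTransport {
    pub calls: usize,
}

impl Transport for CountingTransport {
    fn post(&mut self, _url: &str, _body: &str) -> Result<String, String> {
        self.calls += 1;
        Ok(format!("mock-{}", self.calls))
    }
}
```

A test can then call `publish` with `dry_run` set to true and assert that `calls` is still zero, which is the property the checklist asks for.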

10.4 Volatile API Policy

Platforms with rapidly changing APIs require explicit maintenance triggers:

| Platform | Trigger | Cadence |
|---|---|---|
| LinkedIn Linkedin-Version header | New quarterly API version | Quarterly check |
| Twitter/X billing | API pricing changes | On each billing cycle |
| OpenReview API version | OpenReview migration announcements | Monitor changelog |
| arXiv endorsement policy | arXiv policy announcements | Monitor arXiv blog |
| Crossref XML schema | Crossref schema releases | On schema version bump |

These should be added as calendar reminders in contributor documentation, not just in this research doc.

10.5 Data Retention and Audit Trail

Every ActivePush and ScholarlyDeposit call must write to the syndication_events table (currently missing — PROBLEM-24 from gap analysis) before returning. Schema:

CREATE TABLE IF NOT EXISTS syndication_events (
    id              TEXT PRIMARY KEY,     -- uuid
    publication_id  TEXT NOT NULL,
    channel         TEXT NOT NULL,        -- "discord", "zenodo", etc.
    outcome         TEXT NOT NULL,        -- JSON: ChannelOutcome
    external_id     TEXT,                 -- platform-specific ID/URL
    attempt_number  INTEGER NOT NULL DEFAULT 1,
    attempted_at    TEXT NOT NULL,        -- ISO 8601 UTC
    created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);

Without this table: no audit trail, no KPI computation, no feedback loop.
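
As a rough illustration of how an adapter might prepare a row for this table, the sketch below mirrors the columns in a plain struct and pairs it with a parameterized INSERT. `SyndicationEvent`, `SqlValue`, `INSERT_SQL`, and `params` are hypothetical names; no database crate is assumed, and binding the values to a real connection is left out. The `created_at` column is omitted from the statement so the schema default applies.

```rust
/// Minimal value type for positional SQL parameters (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SqlValue {
    Text(String),
    Integer(i64),
    Null,
}

/// One row of syndication_events, without created_at (filled by the schema default).
pub struct SyndicationEvent {
    pub id: String,             // uuid, supplied by the caller
    pub publication_id: String,
    pub channel: String,        // "discord", "zenodo", etc.
    pub outcome_json: String,   // serialized ChannelOutcome
    pub external_id: Option<String>,
    pub attempt_number: u32,
    pub attempted_at: String,   // ISO 8601 UTC
}

/// SQLite-style numbered placeholders, one per bound column.
pub const INSERT_SQL: &str = "INSERT INTO syndication_events \
    (id, publication_id, channel, outcome, external_id, attempt_number, attempted_at) \
    VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)";

impl SyndicationEvent {
    /// Values in the same order as the placeholders in INSERT_SQL.
    pub fn params(&self) -> Vec<SqlValue> {
        vec![
            SqlValue::Text(self.id.clone()),
            SqlValue::Text(self.publication_id.clone()),
            SqlValue::Text(self.channel.clone()),
            SqlValue::Text(self.outcome_json.clone()),
            match &self.external_id {
                Some(v) => SqlValue::Text(v.clone()),
                None => SqlValue::Null,
            },
            SqlValue::Integer(i64::from(self.attempt_number)),
            SqlValue::Text(self.attempted_at.clone()),
        ]
    }
}
```

Writing one such row per attempt, with `attempt_number` incremented on each retry, would give the retry layer and the KPI queries a shared record of what was actually sent.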

10.6 Do Not Implement List

The following platforms have been researched, confirmed to have no public API for programmatic posting, and should never have adapter code written:

| Platform | Reason |
|---|---|
| ResearchGate | No public API. ToS prohibits automation. Passive via DOI. |
| Academia.edu | No public API. ToS prohibits automation. Low scientific value. |
| Google Scholar | No API. Passive indexing only. |
| Semantic Scholar | No write API. Read API only. Passive via DOI. |
| Web of Science | Subscription-gated. No submission API. |
| Scopus | Subscription-gated. No submission API. |

11. Task Backlog (Updated)

Tasks are organized by dependency order. EP-NNN references correlate to §2-§6.

Wave 0 — Critical Fixes (No Dependencies)

| Task | EP | File | Est. Lines |
|---|---|---|---|
| Fix accessJwt field name in bluesky.rs | EP-012 | adapters/bluesky.rs:14 | 1 |
| Add instance_url to MastodonConfig | EP-029 | types.rs | 2 |
| Fix makePublicOn to use config.scheduled_publish_at | EP-006 | adapters/opencollective.rs:37 | 3 |
| Add dry_run guard to youtube.rs::upload_video | EP-011 | adapters/youtube.rs | 5 |
| Update OC auth header to Personal-Token | EP-005 | adapters/opencollective.rs:46 | 1 |
| Document Reddit AI training prohibition | EP-020 | AGENTS.md + docs/src/reference/clavis-ssot.md | |

Wave 1 — Infrastructure (Parallel, No Feature Dependencies)

| Task | EP | File | Est. Lines |
|---|---|---|---|
| Wire run_with_retries into Discord dispatch | EP-001 | switching.rs or publisher dispatch | ~10 |
| Wire run_with_retries into Reddit dispatch | EP-001 | dispatch | ~10 |
| Wire run_with_retries into Bluesky dispatch | EP-001 | dispatch | ~10 |
| Wire run_with_retries into Twitter dispatch | EP-001 | dispatch | ~10 |
| Wire run_with_retries into YouTube dispatch | EP-001 | dispatch | ~10 |
| Add bluesky/mastodon/linkedin/discord to apply_channel_allowlist | EP-002 | switching.rs:285 | ~8 |
| Add these channels to failed_channels | EP-003/4 | switching.rs:315 | ~8 |
| Add these channels to outcome_for_channel | EP-004 | switching.rs:378 | ~8 |
| Add these channels to contract-shape expander | EP-003 | switching.rs:193 | ~8 |
| Create syndication_events DB table migration | EP-001 parent | vox-db | ~30 |
| Fix content_sha3_256 to exclude syndication | EP-008 | types.rs:470 | ~10 |
| Add comment_draft to HackerNewsConfig | EP-010 | types.rs:211 | 2 |

Wave 2 — Mastodon Implementation

| Task | EP | Notes |
|---|---|---|
| Implement adapters/mastodon.rs | EP-027 | ~60 lines |
| Add language: Option<String> to MastodonConfig | EP-028 | 1 line |
| Register VoxMastodonAccessToken in Clavis (verify exists) | | spec.rs |
| Add Mastodon to switching.rs channel registry | EP-002 | Wire allowlist, retry, outcome |
| Add vox clavis doctor Mastodon secret probe | | vox-cli |

Wave 3 — Bluesky Hardening

| Task | EP | Notes |
|---|---|---|
| Implement resolve_pds(handle) -> String | EP-023 | ~30 lines, separate function |
| Add in-memory session cache with TTL for accessJwt/refreshJwt | EP-024 | ~40 lines |
| Implement link card embed ($type: app.bsky.embed.external) | EP-007 | ~30 lines |
| Add grapheme cluster count validation | EP-025 | unicode-segmentation crate |
| Fix dry_run plumbing through Bluesky dispatch | EP-026 | Adapter signature change |

Wave 4 — Zenodo Completion

| Task | EP | Notes |
|---|---|---|
| Audit scholarly/zenodo.rs: confirm HTTP calls exist or implement | EP-038 | Inspect ~20 KB file |
| Add --sandbox routing flag | EP-039 | VoxZenodoSandbox Clavis entry |
| Add async deposit status polling | EP-040 | ~40 lines |
| Add publish confirmation gate (irreversibility warning) | EP-041 | UX + gate logic |
| Write to syndication_events on Zenodo deposit and publish | Parent | DB write |

Wave 5 — ORCID Implementation

| Task | EP | Notes |
|---|---|---|
| Create scholarly/orcid.rs adapter | | ~80 lines |
| Add OrcidConfig struct to types.rs | | 5 fields |
| Add orcid: Option<OrcidConfig> to SyndicationConfig | | 1 line |
| Add orcid: ChannelOutcome to SyndicationResult | | 1 line |
| Register Clavis entries for ORCID client credentials | | spec.rs |
| Add to switching.rs channel registry | | Allowlist, retry, outcome |

Wave 6 — Twitter Gate, YouTube Gate

| Task | EP | Notes |
|---|---|---|
| Add Twitter billing status check to vox clavis doctor | EP-021 | Document as status: billing_required |
| Add YouTube compliance audit status to vox clavis doctor | EP-036 | Document as status: compliance_audit_required |
| Add per-session tweet budget to TwitterConfig | EP-022 | tweet_budget_per_session: usize |

Wave 7 — arXiv Preflight + Export

| Task | EP | Notes |
|---|---|---|
| Create arXiv format preflight profile | EP-044 | PreflightProfile::ArxivFormat |
| Add arXiv endorsement requirements to Clavis doctor | EP-045 | Documentation check |
| Integrate AI content policy gate into arXiv preflight | EP-046 | Socrates confidence threshold |

Wave 8 — Figshare (Optional, Data-Focused)

| Task | Notes |
|---|---|
| Create scholarly/figshare.rs adapter | 4-step multipart upload |
| Add FigshareConfig to types.rs | 7 fields |
| Register VoxFigshareAccessToken in Clavis | |

Deferred (Org Blockers)

| Task | Blocker |
|---|---|
| LinkedIn implementation | App Review + author_urn identity decision |
| Crossref XML deposit | Crossref membership required |
| OpenReview MFA workaround | March 2026 MFA rollout; document only for now |

Do Not Implement

| Target | Decision |
|---|---|
| ResearchGate adapter | No API. PassiveDiscovery via DOI. |
| Academia.edu adapter | No API. Low value. |
| Google Scholar adapter | No write API. Passive only. |
| Semantic Scholar adapter | No write API. |

Research v2 — web searches and code audit conducted 2026-04-13. Code files audited: adapters/*, scholarly/*, switching.rs, syndication_outcome.rs, types.rs, gate.rs, social_retry.rs, scientia_heuristics.rs. ResearchGate: confirmed no public API via multiple sources. ORCID and Figshare: confirmed public APIs with REST/token access.