Language surface SSOT (keywords, decorators, manifests)

Language surface SSOT

Problem

The same keyword, decorator, and surface-syntax information is maintained in multiple places, which causes drift and duplicate review burden:

Consumer | Location | Role
LSP completions | crates/vox-lsp/src/completions.rs | Snippets + docs for editor
MCP introspection | crates/vox-orchestrator/src/mcp_tools/tools/introspection_tools.rs | vox_language_surface, vox_decorator_registry
Website / search | docs/src/api/decorators.json, docs/src/api/keywords.json | Structured API search
Eval heuristics | crates/vox-eval/src/lib.rs | Regex-based construct detection
Speech / constrained decoding | contracts/speech-to-code/vox_grammar_artifact.json | Machine-readable lexer hints
Compiler (ground truth) | crates/vox-compiler/src/lexer/token.rs, parser docs in parser/mod.rs | What the language actually accepts

Implemented SSOT (code)

Decision: authoritative source

Ground truth remains the compiler lexer and parser (vox-compiler). Any manifest that lists keywords or decorators must either:

  1. Be generated from compiler metadata (preferred long-term), or
  2. Be validated in CI against a single checked-in contract under contracts/ that is itself generated or diff-tested against the compiler.
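
The CI validation in option 2 amounts to a set comparison between what the compiler accepts and what the contract lists. A minimal sketch follows, assuming both sides can be reduced to plain name lists; `SurfaceDrift` and `check_drift` are hypothetical names, not existing vox APIs, and the keywords in the usage note are illustrative only.

```rust
use std::collections::BTreeSet;

/// Result of comparing the compiler's surface list against a checked-in contract.
#[derive(Debug, PartialEq, Eq)]
pub struct SurfaceDrift {
    /// Names the compiler accepts but the contract omits.
    pub missing_from_contract: Vec<String>,
    /// Names the contract lists but the compiler does not accept.
    pub unknown_to_compiler: Vec<String>,
}

impl SurfaceDrift {
    /// True when both sides list exactly the same names.
    pub fn is_clean(&self) -> bool {
        self.missing_from_contract.is_empty() && self.unknown_to_compiler.is_empty()
    }
}

/// Compare two name lists as sets; ordering and duplicates are ignored.
/// (Hypothetical helper: how the lists are extracted is left open.)
pub fn check_drift(compiler: &[&str], contract: &[&str]) -> SurfaceDrift {
    let compiler: BTreeSet<&str> = compiler.iter().copied().collect();
    let contract: BTreeSet<&str> = contract.iter().copied().collect();
    SurfaceDrift {
        missing_from_contract: compiler
            .difference(&contract)
            .map(|s| (*s).to_string())
            .collect(),
        unknown_to_compiler: contract
            .difference(&compiler)
            .map(|s| (*s).to_string())
            .collect(),
    }
}
```

A CI step could call `check_drift(&["fn", "let", "component"], &["fn", "let", "actor"])` and fail the build when `is_clean()` returns false, printing both lists so the reviewer sees which side drifted.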

Recommended contract location (phased):

  • Add contracts/language/vox-language-surface.json (or .yaml + JSON Schema) as the machine-readable SSOT for minimal surface lists (keywords, decorator names, punctuators) used by speech and MCP.
  • Generate decorators.json rich fields (descriptions, docUrl, codegen hints) from a merge of: generated name list + hand-authored overlay file (e.g. contracts/language/decorator-overlays.yaml) so editorial content stays intentional.
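
The name-list-plus-overlay merge can be sketched as below. This is an illustration under assumptions: the overlay is modelled as an already-parsed name-to-description map (YAML parsing is out of scope), and `DecoratorEntry` and `merge_overlays` are hypothetical names. The real decorators.json carries more fields (docUrl, codegen hints) than the single `description` shown here.

```rust
use std::collections::BTreeMap;

/// One entry in the rich decorators manifest (illustrative fields only).
#[derive(Debug, PartialEq, Eq)]
pub struct DecoratorEntry {
    pub name: String,
    /// Editorial text from the overlay; `None` when no overlay exists yet.
    pub description: Option<String>,
}

/// Merge generated names with hand-authored overlay text.
/// Returns the merged entries plus overlay keys that match no generated
/// name (stale editorial content that a CI step could flag).
pub fn merge_overlays(
    generated_names: &[&str],
    overlays: &BTreeMap<String, String>,
) -> (Vec<DecoratorEntry>, Vec<String>) {
    let entries: Vec<DecoratorEntry> = generated_names
        .iter()
        .map(|name| DecoratorEntry {
            name: (*name).to_string(),
            description: overlays.get(*name).cloned(),
        })
        .collect();
    let stale: Vec<String> = overlays
        .keys()
        .filter(|k| !generated_names.contains(&k.as_str()))
        .cloned()
        .collect();
    (entries, stale)
}
```

Keeping the generated list as the driver means a decorator can never appear in the output unless the compiler knows it, while the overlay only contributes text. Stale overlay keys are returned rather than silently dropped so editorial content is not lost without review.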

Consumer map (target state)

vox-compiler (lexer/parser) ──► codegen / build.rs or `vox ci` step
        │
        ├──► contracts/language/* (committed)
        ├──► docs/src/api/*.json (generated)
        ├──► vox-lsp (include! or generated module)
        ├──► vox-mcp introspection (calls into vox-compiler or includes generated JSON)
        ├──► vox-eval (optional: generate regex table from same list, or call compiler)
        └──► contracts/speech-to-code/vox_grammar_artifact.json (generated)
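
For the optional vox-eval branch of the map, a regex table derived from the shared list could look like the sketch below. `escape_regex` and `keyword_pattern` are hypothetical helpers, the pattern is returned as a plain string so no regex crate is assumed, and the `\b` / `(?:...)` syntax assumes a dialect that supports word boundaries and non-capturing groups.

```rust
/// Backslash-escape characters that are special in common regex dialects.
fn escape_regex(literal: &str) -> String {
    let mut out = String::with_capacity(literal.len());
    for c in literal.chars() {
        if "\\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Build `\b(?:kw1|kw2|...)\b` from a keyword list.
/// Alternatives are sorted longest-first (then alphabetically) and
/// de-duplicated so the output is stable across runs.
pub fn keyword_pattern(keywords: &[&str]) -> String {
    let mut sorted: Vec<&str> = keywords.to_vec();
    sorted.sort_by(|a, b| b.len().cmp(&a.len()).then(a.cmp(b)));
    sorted.dedup();
    let alternatives: Vec<String> = sorted.iter().map(|k| escape_regex(k)).collect();
    format!(r"\b(?:{})\b", alternatives.join("|"))
}
```

Because the pattern is built from the same list the other consumers read, adding a keyword in the compiler updates the eval heuristic without a hand edit.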

Non-goals

  • Replacing the recursive-descent parser or logos lexer with external parser frameworks solely to deduplicate lists.
  • Deleting decorators.json editorial fields without an overlay story.

Syntax Modernization (Path C)

As part of the legacy codebase retirement (OP-0179, OP-0158), surface definitions are being realigned towards Path C syntax (component Name() { ... }). The legacy @component fn surface is formally deprecated and will be removed from the canonical SSOT generator once all downstream UI surfaces conform to Path C.

Implementation order

  1. Add a single generator entrypoint (crate binary or vox ci subcommand) that emits the minimal JSON contract from Token / parser tables.
  2. Wire one consumer (speech artifact or MCP) to the generated file; keep the old file until the diff between the two is zero.
  3. Migrate LSP and eval last (highest churn in snippets vs plain names).
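
Step 1 can be sketched as a small pure function that renders the minimal contract. Everything here is an assumption for illustration: `Surface` stands in for whatever the generator reads from the Token / parser tables, the JSON field names are not a settled schema, and the serialization is hand-rolled only to keep the example free of external crates.

```rust
/// Hypothetical stand-in for the tables a generator would read from the
/// compiler; the real `Token` definitions are not reproduced here.
pub struct Surface<'a> {
    pub keywords: &'a [&'a str],
    pub decorators: &'a [&'a str],
    pub punctuators: &'a [&'a str],
}

/// Escape the characters JSON requires inside a string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render a list as a sorted, de-duplicated JSON array so output is stable.
fn json_array(items: &[&str]) -> String {
    let mut sorted: Vec<&str> = items.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    let body: Vec<String> = sorted.iter().map(|s| json_string(s)).collect();
    format!("[{}]", body.join(","))
}

/// Emit the minimal contract as compact JSON (assumed field names).
pub fn emit_contract(surface: &Surface) -> String {
    format!(
        "{{\"keywords\":{},\"decorators\":{},\"punctuators\":{}}}",
        json_array(surface.keywords),
        json_array(surface.decorators),
        json_array(surface.punctuators),
    )
}
```

Sorting and de-duplicating before emitting keeps the committed file byte-stable, which is what makes the "diff is zero" check in step 2 meaningful.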

See also: Outbound HTTP policy, OpenAPI contract SSOT.