Research: Fuzzy & Partial Parsing for Iterative LLM Generation
Date: April 2026
Status: Emerging (Wave 12 Foundation)
Context: Optimizing the inner loop of LLM-native development
The Problem: Binary Failure in Classic Parsers
Traditional compilers treat parsing as binary "green/red": the input is either fully valid or rejected outright. A single missing brace at the end of a file discards the entire AST. For LLMs, which often generate code incrementally (streamed) or stop prematurely at context limits, this all-or-nothing failure destroys the feedback loop.
The Vox Strategy: Resilient ASTs
1. Partial Skeletons
The Vox recursive-descent parser (0.4) is being hardened to emit a "Skeleton AST" even under parse failure.
- Graceful Termination: If EOF is reached inside a block, the parser synthetically closes the block and marks the resulting node as `stub/eof-terminated`.
- Diagnostic Anchoring: Diagnostics are attached to the partially formed nodes, allowing the LLM to see where the parser lost track without discarding the preceding 90% of valid code.
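A minimal sketch of the recovery idea, assuming a token-based recursive-descent parser; the `Node` and `Parser` types and the diagnostic wording are illustrative, not the actual Vox 0.4 implementation:

```rust
// Hypothetical skeleton-AST types; the real parser/descent/mod.rs differs.
#[derive(Debug, PartialEq)]
enum Node {
    Block { children: Vec<Node>, eof_terminated: bool },
    Stmt(String),
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
    diagnostics: Vec<String>,
}

impl Parser {
    fn parse_block(&mut self) -> Node {
        let mut children = Vec::new();
        loop {
            match self.tokens.get(self.pos).map(String::as_str) {
                // Normal case: the block is explicitly closed.
                Some("}") => {
                    self.pos += 1;
                    return Node::Block { children, eof_terminated: false };
                }
                Some(tok) => {
                    children.push(Node::Stmt(tok.to_string()));
                    self.pos += 1;
                }
                // EOF inside the block: synthetically close it and anchor a
                // diagnostic instead of discarding everything parsed so far.
                None => {
                    self.diagnostics.push(format!(
                        "unclosed block at token {}: stub/eof-terminated",
                        self.pos
                    ));
                    return Node::Block { children, eof_terminated: true };
                }
            }
        }
    }
}
```

Even when the closing brace never arrives, the caller still receives a usable `Block` node plus an anchored diagnostic, which is exactly what the feedback loop needs.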
2. Fuzzy Token Matchers
Lexing in Vox 0.4 now supports "Phonetic Similarity" for keywords.
- Intent Detection: If an LLM emits `compnent` instead of `component`, the lexer identifies the high-probability intent and emits a `Warn` instead of an `Error` (enabled only in `mens-training` mode).
- Benefit: Reduces "stupid" hallucination failures that would otherwise trigger a full re-generation cycle.
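The matching itself could be sketched as below. Plain Levenshtein distance is used here as a stand-in for the "Phonetic Similarity" heuristic, and `KEYWORDS` and `fuzzy_keyword` are hypothetical names, not the Vox lexer API:

```rust
// Hypothetical keyword table; the real Vox keyword set differs.
const KEYWORDS: &[&str] = &["component", "signal", "render"];

/// Classic single-row Levenshtein edit distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let val = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(val);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Returns Some(keyword) only when `ident` is within edit distance 1 of
/// exactly one keyword; ambiguous or distant identifiers are left alone.
fn fuzzy_keyword(ident: &str) -> Option<&'static str> {
    let mut hits = KEYWORDS
        .iter()
        .copied()
        .filter(|k| levenshtein(ident, k) <= 1);
    match (hits.next(), hits.next()) {
        (Some(k), None) => Some(k),
        _ => None,
    }
}
```

Requiring a unique hit within distance 1 avoids silently rewriting an identifier that happens to sit close to two different keywords; in that case the lexer should fall through to the normal `Error` path.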
3. Incremental Verification
- AST Eval: Integrating the parser into `vox-eval` (Wave 8) allows expressions to be verified as they are generated, even while the surrounding module is still incomplete.
- Micro-Feedback: Provides the model with a "Self-Correction Gate" at the statement level.
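A toy version of such a gate, with `eval_line` and `gate` as hypothetical stand-ins for the `vox-eval` API and a trivial `+`/`-` evaluator in place of the real expression engine:

```rust
// Minimal left-to-right "+"/"-" evaluator standing in for vox-eval.
// The function names and error strings are illustrative only.
fn eval_line(src: &str) -> Result<i64, String> {
    let mut tokens = src.split_whitespace();
    let mut acc: i64 = tokens
        .next()
        .ok_or("empty expression")?
        .parse()
        .map_err(|e| format!("{}", e))?;
    while let Some(op) = tokens.next() {
        let rhs: i64 = tokens
            .next()
            .ok_or("dangling operator: feed back to model")?
            .parse()
            .map_err(|e| format!("{}", e))?;
        match op {
            "+" => acc += rhs,
            "-" => acc -= rhs,
            _ => return Err(format!("unknown operator `{}`", op)),
        }
    }
    Ok(acc)
}

/// Gate a stream of generated statements: complete ones are verified
/// immediately; a broken one yields statement-level micro-feedback
/// instead of forcing a full re-generation of the module.
fn gate(stmts: &[&str]) -> Vec<Result<i64, String>> {
    stmts.iter().map(|s| eval_line(s)).collect()
}
```

The point is the shape of the loop: each statement gets its own pass/fail signal the moment it is complete, so the model corrects one line rather than regenerating the file.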
Future Work (Wave 13)
- Probabilistic Grammars: Integrating the `vox-grammar-export` crate with constrained decoding engines (e.g., Guidance, Outlines) to prevent syntax errors entirely at the sampling layer.
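As a rough illustration of the export side, a sketch that serializes rules to a Lark-style grammar string of the kind such engines consume; the `Rule` type, `export_lark` name, and sample productions are invented for this note, not the actual `vox-grammar-export` interface:

```rust
// Hypothetical rule representation; the real crate's types differ.
struct Rule {
    name: &'static str,
    production: &'static str,
}

/// Serialize rules as one `name: production` line each, the general shape
/// of a Lark-style grammar that a constrained-decoding engine can load.
fn export_lark(rules: &[Rule]) -> String {
    rules
        .iter()
        .map(|r| format!("{}: {}", r.name, r.production))
        .collect::<Vec<_>>()
        .join("\n")
}
```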
References
- vox-grammar-export/README.md
- parser/descent/mod.rs
- research-grpo-ast-reward-hacking-2026.md