"Compiler Testing Research Synthesis"

Compiler Architecture Verification & Oracles

1. Context

Methodologies for validating an LLM-targeted, strongly-typed statically compiled DSL (Vox language), specifically focusing on Property-Based Testing (PBT), snapshot depth, and Oracle frameworks for LLM test generation.

2. Empirical Findings & Tradeoffs

Proptest vs. Quickcheck for ASTs

Quickcheck (Stateless, Trait-bound) has massive input-rejection rates when generating recursive algebraic datatypes (like ASTs).
Proptest (Stateful Strategies) is mandatory for AST coverage due to its capability for deterministic shrinking of massive, complex syntax trees.

Snapshot Brittleness

Deep snapshotting (capturing AST, HIR, and Codegen files for every test) induces unmanageable developer friction during early syntax iteration.
Shallow UI snapshotting (stderr/stdout) normalized for paths is highly stable, but obscures exact optimization layer regressions.

The LLM "Oracle Problem"

Relying on LLMs to generate both the complex fuzzing input and the expected assertion (the Oracle) for an undocumented, custom DSL yields an unacceptable false-positive rate (hallucination).
Pure Grammar Fuzzers reliably find parser crashes but fail to exercise the middle-end because their outputs rarely pass polymorphic type-checkers.

Mutation "Arid Nodes"

Performing source-level mutation creates noise. IR-level mutation testing generates "Arid Nodes" (e.g., mutating a debug logging statement), causing developer trust to plummet.

3. Validated Architectural Adjustments (4 Waves)

Wave 1 (Boundary Defense): Implement shallow, normalized UI snapshot tests. Enforce the primary parser invariant: parse(unparse(ast)) == ast.
Wave 2 (Frontend PBT): Deploy the @forall macro backed by the proptest framework to strictly enforce structural boundaries via stateful recursive shrinking.
Wave 3 (Semantic Contracts & MRs): Integrate lightweight @spec(requires, ensures) block constraints. These act as runtime assertion oracles (not SMT blockings), sidestepping the LLM Oracle problem.
Wave 4 (Differential Fuzzing): Use LLVM IR-layer equivalents (mutation on arithmetic/relational operators). Filter mutation operators strictly away from standard-out/logging paths to prevent Arid Node rejection.

Vox: The AI-Native Programming Language