Continual Learning Flywheel Risks
Executive Summary
Deploying an autonomous dogfooding or self-play training flywheel—in which a model continuously fine-tunes itself on its own generated outputs—carries a critical baseline risk of systemic degradation. Three interacting failure modes threaten the Vox MENS architecture:
- Recursive ingestion of synthetic data drives Model Autophagy Disorder (MAD), leading to irreversible variance loss and mode collapse.
- Reliance on a binary compile-pass oracle without semantic execution checks exposes the system to reward hacking and severe semantic drift.
- Repeated QLoRA fine-tuning cycles on limited data volumes induce catastrophic forgetting, mechanically overwriting the base model's generalized reasoning and natural language capabilities.
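The second failure mode above can be made concrete with a toy sketch. This is a hypothetical illustration (the function names and the candidate snippet are invented for this example, not taken from the report): a binary compile-pass oracle accepts any syntactically valid output, so a semantically wrong generation passes and can be rewarded, while even a single execution-based test catches it.

```python
# Hypothetical illustration of the reward-hacking surface of a binary
# compile-pass oracle versus a minimal execution-based check.

candidate = """
def add(a, b):
    return a - b   # compiles cleanly, but is semantically wrong
"""

def compile_pass_oracle(src: str) -> bool:
    """Binary oracle: accepts anything that merely compiles."""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def execution_check(src: str) -> bool:
    """Execution-based verification: run the code against a small test."""
    ns: dict = {}
    try:
        exec(src, ns)
        return ns["add"](2, 3) == 5
    except Exception:
        return False

print(compile_pass_oracle(candidate))  # True  -> the oracle is satisfied
print(execution_check(candidate))      # False -> semantic drift is caught
```

The gap between the two booleans is exactly the space a self-play loop will drift into if the only training signal is "it compiled."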
Contemporary research offers empirically validated countermeasures: transitioning from a "replace" to an "accumulate" synthetic data strategy; integrating execution-based verification or oracle-less proxy metrics; and deploying advanced PEFT stabilization techniques such as CURLoRA, O-LoRA, or FAPM. Agent-generated prose (Schola/Scientia) remains the most volatile element and requires stringent external filtering.
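The "replace" versus "accumulate" distinction can be sketched in a few lines. This is a schematic under stated assumptions (the names `real_corpus`, `generate_synthetic`, and the pool sizes are illustrative, and `generate_synthetic` is a stand-in for actual model generation): under "replace", each generation trains only on the latest synthetic batch, so the real-data anchor vanishes after one step; under "accumulate", synthetic data is appended alongside the fixed real corpus, so every training mix retains a real-data fraction.

```python
import random

random.seed(0)

real_corpus = [f"real_{i}" for i in range(100)]

def generate_synthetic(pool, n=100):
    # Stand-in for model generation conditioned on the current pool.
    return [f"syn_from_{random.choice(pool)}" for _ in range(n)]

# "Replace": the pool is overwritten each generation; after the first
# step no real data remains (the MAD / mode-collapse regime).
replace_pool = list(real_corpus)
for _ in range(3):
    replace_pool = generate_synthetic(replace_pool)

# "Accumulate": synthetic batches are appended; the real corpus stays
# in the mix, which is what empirically slows variance loss.
accumulate_pool = list(real_corpus)
for _ in range(3):
    accumulate_pool += generate_synthetic(accumulate_pool)

real_fraction = sum(s.startswith("real_") for s in accumulate_pool) / len(accumulate_pool)
print(len(replace_pool), len(accumulate_pool), round(real_fraction, 2))
# -> 100 400 0.25
```

The real-data fraction under "accumulate" still dilutes over generations, which is why the countermeasure is typically paired with the external filtering and verification steps listed above rather than used alone.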
Detailed Research Pages
- Quality and Mode Collapse in Self-Play LLM Loops
- The Compile-Pass Oracle and Semantic Degradation
- Catastrophic Forgetting in QLoRA Fine-Tuning
- The Risks of Agent-Generated Prose (Schola & Scientia)
- Minimum Viable Corpus Size for QLoRA Domain Adaptation
- Utilizing Parse Failures as Negative Examples
- Risk Taxonomy, Monitoring Design, and Open Research Questions
- Works Cited: Continual Learning Flywheel Risks