"MENS Research Track Blueprint 2026"

MENS Research Track Blueprint (2026)

1. Lane G: `research-expert` Specification

The research-expert lane is a dedicated training track focused on evidence synthesis, multi-hop reasoning, and contradiction resolution.

1.1 Objective

Unlike Lane A (code generation), Lane G is optimized for:

Evidence Synthesis: Merging RRF hit lists into coherent reasoning.
Multi-hop Logic: Chaining facts A + B to answer query C.
Abstention Calibration: Refusing to answer when evidence quality is below 0.3 or contradictory.

2. Training Paradigm

2.1 Base Model

Base: Qwen/Qwen3.5-4B.
Target: 16GB VRAM (Consumer GPU invariant).

2.2 Stage 1: SFT

Data: 10,000 synthetic multi-hop chains from vox-corpus research-gen.
Format: Instruction-pair with structured synthesis.

2.3 Stage 2: GRPO Fine-Tuning

Utilizes Group Relative Policy Optimization (GRPO) with Reinforcement Learning with Verifiable Rewards (RLVR).

Reward	Signal	Failure Penalty
Citation Groundedness	Cited URL exists in input	-1.0
Synthesis Completeness	All sub-questions answered	0.0
Format Adherence	Valid JSON/Structure	-0.5
Contradiction Res	Downstream gate consistency	0.0

To avoid data exhaustion and privacy leakage, we use rule-based synthetic generation of fictional knowledge graphs. This forces the model to learn the logic of composition rather than memorizing facts.

{
  "lane": "vox_research_expert",
  "task_family": "retrieve_and_synthesize",
  "hop_count": 3
}

4. Integration into Socrates

Local synthesis results are injected into the SocratesTaskContext. When research_model_enabled is true, the orchestrator delegates to this specific adapter rather than using the generic code model for research summaries.

Vox: The AI-Native Programming Language