"Populi GPU truth probe specification (NVML Layer A)"

Populi GPU truth probe specification (NVML Layer A)

This document implements the probe slice of ADR 018: Populi GPU truth layering: Layer A fields on NodeRecord (crates/vox-populi/src/node_registry.rs) populated from the driver when NVML is available.

Build / runtime

Surface	Behavior
Default builds	No NVML link. `vox_repository::probe_nvidia_gpu_inventory_best_effort` (`crates/vox-repository/src/gpu_inventory.rs`) returns `None`; join/heartbeat behave as before (env advertisement only).
`vox-repository` feature `nvml-probe`	Links `nvml-wrapper`. At runtime, `Nvml::init()` must succeed (NVIDIA driver + NVML present).
`vox-populi` feature `nvml-gpu-probe`	Enables `vox-repository/nvml-probe`.
`vox-cli` feature `mesh-nvml-probe`	Pulls `vox-populi` with NVML probe for operators who want inventory on `node_record_for_current_process`.

Typical build:

cargo build -p vox-cli --features populi,mesh-nvml-probe

Fields populated

When the probe succeeds, node_record_for_current_process (crates/vox-populi/src/lib.rs) sets:

gpu_total_count, gpu_healthy_count, gpu_allocatable_count — from NVML device enumeration (v1: healthy/allocatable match enumerated devices; refine with reservations in a later phase).
gpu_inventory_source — "nvml".
gpu_truth_layer — "layer_a_verified".
capabilities.min_vram_mb — minimum total VRAM in MiB across devices, only if not already set by config.

Heartbeat reconciliation

Operators should send the same [NodeRecord] shape on join and heartbeat (existing Populi HTTP contract). Rebuilding the record each tick via node_record_for_current_process (or equivalent) automatically refreshes Layer A after GPU hotplug, driver restart, or VM attach — subject to NVML visibility.

Layer B (allocatable after local reservations) and Layer C (labels/policy) remain separate; this spec does not merge operator lies with probe facts — ADR 018 precedence still applies when schedulers consume both.

ADR 018
Populi GPU mesh implementation plan 2026
Mens cloud GPU strategy (boundary vs Populi)

Vox: The AI-Native Programming Language

Populi GPU truth probe specification (NVML Layer A)

Build / runtime

Fields populated

Heartbeat reconciliation

Related