A deterministic Virtual Machine Monitor (VMM) for x86_64 built with KVM and the rust-vmm crate ecosystem. Designed for simulation testing of distributed systems where reproducibility is essential.
This is just an experiment with Claude + Pi.dev. Use at your own risk.
- CPUID filtering: Comprehensive filtering removes RDRAND, RDSEED, RDTSCP, optionally AVX2/AVX-512, and hides hypervisor presence
- Pinned TSC: Fixed time stamp counter frequency (default 3.0 GHz) for reproducible timing across hosts
- Virtual TSC: Software TSC counter that advances only on VM exits, enabling fully deterministic time progression
- Fixed processor identity: Optional model/family/stepping override for cross-host reproducibility
- SMP support: Multi-vCPU VMs with serialized execution (Antithesis-style), deterministic round-robin or randomized scheduling
- x86_64 boot: Full long mode setup with GDT, identity-mapped page tables (1 GB via 2 MB pages), and Linux boot protocol support
- In-kernel IRQ chip: PIC, IOAPIC, and LAPIC via KVM
- Serial console: COM1 with interrupt-driven I/O and output capture
- Linux kernel support: Loads ELF kernels via linux-loader
- ACPI tables: RSDP/RSDT/MADT for SMP CPU topology
- Complete state capture: CPU registers, FPU, debug registers, LAPIC, XCRs, IRQ chip (PIC master/slave, IOAPIC), PIT, KVM clock, and full guest memory
- Instant restore: Resume execution from any captured checkpoint
- Fork support: Create divergent execution paths from a single snapshot point
- Copy-on-write block device: Snapshots share the base disk image via `Arc`; only dirty 4 KB pages are cloned — a 512 MB disk with 1 MB of writes costs ~1 MB per snapshot, not 512 MB
- Entropy: Seeded ChaCha20 PRNG replacing hardware RNG, with snapshot/restore and reseed for exploration
- Block: Copy-on-write block device with optional disk image file backing (`--disk-image`). Supports fault injection (read errors, write errors, torn writes, corruption)
- Network: Simulated network with RX/TX queues, latency, jitter, bandwidth limiting, packet loss/corruption/reorder/duplication for fully controlled packet delivery between VMs
- Coverage-guided exploration: AFL-style edge coverage bitmaps, fork-from-snapshot branching, frontier-based search
- Three exploration modes: fault-schedule mutation, input-tree branching at `random_choice()` points, or hybrid
- Fault schedule minimization: Delta debugging (ddmin) to find the smallest schedule that triggers a bug
- Bug reproduction: Replay a bug report to verify it still triggers
- Assertion catalog: Compile-time registration of all assertion sites via `linkme`; reports show which assertions are exercised/unexercised
- Per-round history: Coverage growth curves, plateau detection, bug discovery timeline
- Binary dlog format: Per-exit event log for diagnosing non-determinism
- Structural diff: Compare two runs ignoring data payloads
- Register dumps: Periodic full-register snapshots in dlog
- Memory hashing: CRC32 page hashes at snapshot boundaries
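The per-snapshot cost claim for the copy-on-write block device (dirty pages only, shared base) can be sketched with `Arc`-backed 4 KB pages. This is an illustrative model, not the VMM's actual implementation; `CowDisk` and `private_pages` are hypothetical names:

```rust
use std::sync::Arc;

const PAGE_SIZE: usize = 4096;

/// Sketch of Arc-based CoW: snapshots clone the page *table* (cheap Arc
/// reference bumps); a page's 4 KB buffer is copied only on first write.
#[derive(Clone)]
pub struct CowDisk {
    pages: Vec<Arc<[u8; PAGE_SIZE]>>,
}

impl CowDisk {
    pub fn new(num_pages: usize) -> Self {
        let zero = Arc::new([0u8; PAGE_SIZE]);
        Self { pages: vec![zero; num_pages] }
    }

    /// Fork a snapshot: O(num_pages) pointer clones, no page data copied.
    pub fn snapshot(&self) -> Self {
        self.clone()
    }

    pub fn read(&self, page: usize, offset: usize) -> u8 {
        self.pages[page][offset]
    }

    /// Write a byte; clones the 4 KB page only if it is still shared.
    pub fn write(&mut self, page: usize, offset: usize, val: u8) {
        // make_mut copies the buffer iff another snapshot still holds it.
        let p = Arc::make_mut(&mut self.pages[page]);
        p[offset] = val;
    }

    /// Pages whose buffer is private to this snapshot (i.e. dirty).
    pub fn private_pages(&self) -> usize {
        self.pages.iter().filter(|p| Arc::strong_count(p) == 1).count()
    }
}
```

`Arc::make_mut` gives the clone-on-write behavior for free: a snapshot's memory footprint is proportional to the pages it has written, not the disk size.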
```
chaoscontrol/
├── flake.nix                      # Nix development environment
├── Cargo.toml                     # Workspace root
└── crates/
    ├── chaoscontrol-protocol/     # SDK ↔ VMM wire protocol (no_std)
    ├── chaoscontrol-sdk/          # Guest-side SDK (Antithesis-style)
    ├── chaoscontrol-fault/        # Host-side fault injection engine
    ├── chaoscontrol-vmm/          # VMM implementation
    ├── chaoscontrol-explore/      # Coverage-guided exploration engine
    ├── chaoscontrol-replay/       # Recording, replay, time-travel debugger
    ├── chaoscontrol-trace/        # eBPF-based KVM tracing
    ├── chaoscontrol-guest/        # Minimal SDK-instrumented guest binary
    ├── chaoscontrol-raft-guest/   # 3-node Raft consensus guest (35 assertions)
    ├── chaoscontrol-guest-net/    # Network guest library (smoltcp)
    └── chaoscontrol-net-guest/    # Network demo guest binary
```
When the guest kernel is built with CONFIG_KCOV=y, the SDK
automatically collects kernel code coverage and merges it into the
same AFL-style bitmap used by userspace SanCov. This gives the
explorer visibility into kernel code paths exercised by different
fault schedules — filesystem error handling, network stack branches,
scheduler decisions, etc.
```bash
# Build KCOV-enabled kernel (first time takes ~20 min)
nix build .#kcov-vmlinux -o result-kcov

# Run exploration with kernel coverage
cargo run --release --bin chaoscontrol-explore -- run \
  --kernel result-kcov/vmlinux --initrd guest/initrd-raft.gz \
  --vms 3 --rounds 200 --branches 16

# Guest SDK auto-detects KCOV — no code changes needed
```

On a standard kernel (without CONFIG_KCOV), the SDK gracefully falls back to userspace-only coverage — no crash, no error.
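Merging kernel PCs into the same bitmap as userspace SanCov can be illustrated with the classic AFL edge-hashing scheme: each (previous, current) program-counter pair hashes to one bucket. The constants and function names below are assumptions for illustration, not the project's exact code:

```rust
/// Illustrative AFL-style edge bitmap: a KCOV trace of kernel PCs is
/// folded into 64 Ki buckets, one per (prev, cur) edge.
pub const MAP_SIZE: usize = 1 << 16;

pub fn merge_kcov_trace(bitmap: &mut [u8; MAP_SIZE], pcs: &[u64]) {
    let mut prev: u64 = 0;
    for &pc in pcs {
        // Cheap multiplicative hash of the PC down to 16 bits (assumption).
        let cur = pc.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 48;
        // AFL edge id: previous location XOR current location.
        let idx = ((prev ^ cur) as usize) & (MAP_SIZE - 1);
        bitmap[idx] = bitmap[idx].saturating_add(1);
        prev = cur >> 1; // shift so A->B and B->A land in different buckets
    }
}

/// Number of distinct edges seen — the explorer's coverage signal.
pub fn edge_count(bitmap: &[u8; MAP_SIZE]) -> usize {
    bitmap.iter().filter(|&&b| b != 0).count()
}
```

Because the merge is a pure function of the PC trace, two deterministic runs produce byte-identical bitmaps, which is what makes frontier-based search meaningful.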
```bash
# Enter development environment
nix develop

# Build VMM + tools
cargo build

# Run tests (827 unit + doc tests)
cargo test

# Build guest binaries (statically linked, musl)
nix build .#guest-sdk    # → result/bin/chaoscontrol-guest
nix build .#guest-raft   # → result/bin/chaoscontrol-raft-guest
nix build .#guest-net    # → result/bin/chaoscontrol-net-guest

# Build initrd images (from guest binaries)
nix build .#initrd-sdk   # → result (gzipped cpio)
nix build .#initrd-raft
nix build .#initrd-net

# Build custom kernels
nix build .#net-vmlinux       # virtio-net enabled
nix build .#kcov-vmlinux      # KCOV coverage
nix build .#kcov-net-vmlinux  # both

# Boot a kernel
cargo run --bin boot -- <kernel-path> [initrd-path]

# Snapshot demo
cargo run --release --bin snapshot_demo -- <kernel-path> <initrd-path>
```

```bash
# Run Raft exploration with one command (builds kernel + guest + initrd)
nix run .#explore-raft

# Run with custom args (appended after defaults)
nix run .#explore-raft -- --output results/ --rounds 200
```

```bash
# Coverage-guided exploration
cargo run --release --bin chaoscontrol-explore -- run \
  --kernel <kernel-path> --initrd <initrd-path> \
  --vms 3 --rounds 200 --branches 16 --output results/

# With persistent disk image
cargo run --release --bin chaoscontrol-explore -- run \
  --kernel <kernel-path> --initrd <initrd-path> \
  --disk-image <path-to-ext4.img> \
  --vms 3 --rounds 200 --branches 16 --output results/

# Input-tree mode (branch at random_choice() points)
cargo run --release --bin chaoscontrol-explore -- run \
  --kernel <kernel-path> --initrd <initrd-path> \
  --mode input-tree --output results/

# Resume from checkpoint
cargo run --release --bin chaoscontrol-explore -- resume \
  --corpus results/ --rounds 500
```

Output directory contains:
- `checkpoint.json` — resumable exploration state
- `report.txt` — human-readable report with per-round history
- `assertions.json` — per-assertion verdicts and hit counts
- `bug_N.json` — bug reports (consumable by minimize/reproduce)
```bash
# 1. Explore — find bugs
cargo run --release --bin chaoscontrol-explore -- run \
  --kernel vmlinux --initrd initrd.gz \
  --vms 3 --rounds 100 --output results/
```
```bash
# 2. Minimize — shrink the fault schedule
cargo run --release --bin chaoscontrol-explore -- minimize \
  --kernel vmlinux --initrd initrd.gz \
  --bug results/bug_0.json --output minimized.json
```
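The minimization step uses delta debugging (ddmin): repeatedly try dropping chunks of the fault schedule, keep any drop after which the bug still triggers, and halve the chunk size when stuck. A simplified sketch (assumption: the real minimizer differs in details):

```rust
/// Simplified ddmin over a fault schedule, with a caller-supplied
/// "does this schedule still trigger the bug?" predicate.
pub fn ddmin<T: Clone>(events: &[T], triggers_bug: &dyn Fn(&[T]) -> bool) -> Vec<T> {
    let mut current: Vec<T> = events.to_vec();
    let mut chunk = current.len() / 2;
    while chunk >= 1 {
        let mut reduced = false;
        let mut start = 0;
        while start < current.len() {
            // Candidate schedule with current[start..start+chunk] removed.
            let mut candidate = current.clone();
            candidate.drain(start..(start + chunk).min(candidate.len()));
            if triggers_bug(&candidate) {
                current = candidate; // smaller schedule still fails: keep it
                reduced = true;
            } else {
                start += chunk; // this chunk is needed: try the next one
            }
        }
        if !reduced {
            chunk /= 2; // nothing droppable at this granularity: refine
        }
    }
    current
}
```

Each replay of a candidate schedule is itself deterministic, so the predicate is reliable and the result is the smallest schedule this greedy search can reach.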
```bash
# 3. Reproduce — verify the bug
cargo run --release --bin chaoscontrol-explore -- reproduce \
  --kernel vmlinux --initrd initrd.gz \
  --bug minimized.json --serial
```

```bash
# Replay a recorded session
cargo run --release --bin chaoscontrol-replay -- replay \
  --recording session.json --ticks 5000

# Triage — generate bug report from recording
cargo run --release --bin chaoscontrol-replay -- triage \
  --recording session.json --bug-id 1 --format markdown

# Show recording metadata
cargo run --release --bin chaoscontrol-replay -- info \
  --recording session.json
```
```bash
# Determinism log tools
cargo run --release --bin chaoscontrol-replay -- dlog diff a.dlog b.dlog
cargo run --release --bin chaoscontrol-replay -- dlog dump run.dlog
cargo run --release --bin chaoscontrol-replay -- dlog stats run.dlog
```

```bash
# Run exploration with live web dashboard
cargo run --release --bin chaoscontrol-explore --features dashboard -- run \
  --kernel vmlinux --initrd initrd.gz \
  --vms 3 --rounds 100 --dashboard

# Custom dashboard port
cargo run --release --bin chaoscontrol-explore --features dashboard -- run \
  --kernel vmlinux --initrd initrd.gz \
  --dashboard --dashboard-port 9090
```
```bash
# Review past results (standalone mode)
cargo run --release --bin chaoscontrol-dashboard -- serve --corpus results/
```

The dashboard shows:
- Coverage growth chart with bug discovery markers
- Per-assertion status table (failed/passed/unexercised)
- Round-by-round progress table
- Network fabric statistics
- Live updates via Server-Sent Events
Open http://localhost:8080 in a browser while exploration runs.
```bash
# Live KVM trace (requires sudo)
sudo chaoscontrol-trace live --pid <VMM_PID> --output trace.json

# Verify determinism between two traces
chaoscontrol-trace verify --trace-a run1.json --trace-b run2.json
```

`DeterministicVm` is the main entry point, configured via `VmConfig`:
```rust
use chaoscontrol_vmm::vm::{DeterministicVm, VmConfig};
use chaoscontrol_vmm::cpu::CpuConfig;

let config = VmConfig {
    memory_size: 256 * 1024 * 1024,
    cpu: CpuConfig {
        tsc_khz: 3_000_000,
        seed: 42,
        ..CpuConfig::default()
    },
    ..VmConfig::default()
};

let mut vm = DeterministicVm::new(config)?;
vm.load_kernel("vmlinux", Some("initrd.gz"))?;
vm.run()?;
```

Comprehensive CPUID filtering:
| CPUID Leaf | What's Filtered | Why |
|---|---|---|
| 0x1 | RDRAND, TSC-Deadline, hypervisor bit | Hardware RNG, timer jitter |
| 0x7 | RDSEED, AVX2, AVX-512 | Hardware RNG, ISA variation |
| 0x15 | TSC frequency info | Fixed crystal clock ratio |
| 0x16 | Processor frequency | Consistent MHz reporting |
| 0x40000000+ | KVM paravirt leaves | Hide hypervisor presence |
| 0x80000001 | RDTSCP | Bypasses MSR-trap path |
| 0x80000007 | Invariant TSC | Guest shouldn't assume host TSC |
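As a sketch of what the table's rows mean in code: the bit positions below are architectural (RDRAND = CPUID.01H:ECX[30], hypervisor-present = ECX[31], RDSEED = CPUID.07H:EBX[18], AVX2 = EBX[5], RDTSCP = CPUID.80000001H:EDX[27]), but `CpuidEntry` and `filter_entry` are illustrative stand-ins for KVM's `kvm_cpuid_entry2` handling, not the VMM's actual code:

```rust
/// Minimal stand-in for a KVM CPUID entry (only the registers we mask).
pub struct CpuidEntry {
    pub function: u32,
    pub ebx: u32,
    pub ecx: u32,
    pub edx: u32,
}

/// Clear the nondeterminism-bearing feature bits for one CPUID leaf.
pub fn filter_entry(e: &mut CpuidEntry) {
    match e.function {
        0x1 => {
            e.ecx &= !(1 << 30); // RDRAND
            e.ecx &= !(1 << 31); // hypervisor-present bit
        }
        0x7 => {
            e.ebx &= !(1 << 18); // RDSEED
            e.ebx &= !(1 << 5);  // AVX2 (optional in the VMM)
        }
        0x8000_0001 => {
            e.edx &= !(1 << 27); // RDTSCP
        }
        _ => {}
    }
}
```

With these bits cleared, well-behaved guests never issue the instructions at all; the VMM only has to trap the misbehaving ones.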
Virtual TSC for fully deterministic time:

```rust
use chaoscontrol_vmm::cpu::VirtualTsc;

let mut vtsc = VirtualTsc::new(3_000_000, 1_000);
vtsc.tick();                // Advance by 1000 counts
let ns = vtsc.elapsed_ns(); // Convert to nanoseconds
let snap = vtsc.snapshot(); // Serialize for checkpoints
```

The `chaoscontrol-sdk` crate provides a guest-side testing API inspired by Antithesis. Guest code uses it to annotate properties and receive guided random values:
```rust
use chaoscontrol_sdk::prelude::*;

chaoscontrol_init();

// Signal setup complete — faults may begin
lifecycle::setup_complete(&[("nodes", "3")]);

// Safety property: must always hold
cc_assert_always!(leader < num_nodes, "valid leader");

// Liveness property: must hold at least once across all runs
cc_assert_sometimes!(write_ok, "write succeeded");

// Reachability
cc_assert_reachable!("leader elected");
cc_assert_unreachable!("split brain");

// Guided random choice for exploration
let action = random::random_choice(3);
```

All assertion sites are registered at compile time via `linkme` and reported to the VMM at startup. The exploration report shows which assertions were exercised, passed, failed, or never reached.
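The cross-run verdict logic can be sketched as follows; the `Verdict` enum and function names are illustrative, and the crate's actual rules may differ in detail:

```rust
/// Illustrative per-assertion verdicts aggregated across all runs:
/// an `always` property fails if any hit failed; a `sometimes` property
/// passes once any hit was true; a site with zero hits is unexercised.
#[derive(Debug, PartialEq)]
pub enum Verdict { Passed, Failed, Unexercised }

pub fn always_verdict(hits: u64, failures: u64) -> Verdict {
    match (hits, failures) {
        (0, _) => Verdict::Unexercised,
        (_, 0) => Verdict::Passed,
        _ => Verdict::Failed,
    }
}

pub fn sometimes_verdict(hits: u64, passes: u64) -> Verdict {
    match (hits, passes) {
        (0, _) => Verdict::Unexercised,
        (_, 0) => Verdict::Failed, // exercised, but never true in any run
        _ => Verdict::Passed,
    }
}
```

This is why the compile-time catalog matters: only with the full list of sites can the report distinguish "passed" from "never reached".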
```rust
use chaoscontrol_fault::schedule::FaultScheduleBuilder;
use chaoscontrol_fault::faults::Fault;

let schedule = FaultScheduleBuilder::new()
    .at_ns(1_000_000_000, Fault::NetworkPartition {
        side_a: vec![0],
        side_b: vec![1, 2],
    })
    .at_ns(5_000_000_000, Fault::NetworkHeal)
    .at_ns(8_000_000_000, Fault::ProcessKill { target: 1 })
    .at_ns(10_000_000_000, Fault::InjectInterrupt { target: 0, irq: 5 })
    .build();
```

27 fault types across 6 categories: network (partition, latency, jitter, bandwidth, loss, corruption, reorder, duplication, heal), disk (I/O errors, torn writes, corruption, full), process (kill, pause, restart), clock (skew, jump), resource (memory pressure), interrupt (IRQ injection, NMI).
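Dispatching such a schedule deterministically amounts to keying faults on virtual-TSC nanoseconds rather than wall-clock time. A sketch with a hypothetical `Schedule` type (generic over the fault payload):

```rust
/// Faults keyed by virtual nanosecond timestamps: which faults fire
/// depends only on deterministic guest time, never on the host clock.
pub struct Schedule<F> {
    events: Vec<(u64, F)>, // sorted by nanosecond timestamp
    next: usize,
}

impl<F> Schedule<F> {
    pub fn new(mut events: Vec<(u64, F)>) -> Self {
        events.sort_by_key(|&(ns, _)| ns);
        Self { events, next: 0 }
    }

    /// Return every not-yet-fired fault due at or before `now_ns`, in order.
    pub fn due(&mut self, now_ns: u64) -> &[(u64, F)] {
        let start = self.next;
        while self.next < self.events.len() && self.events[self.next].0 <= now_ns {
            self.next += 1;
        }
        &self.events[start..self.next]
    }
}
```

The run loop polls `due(vtsc.elapsed_ns())` after each exit, so a given schedule replays identically on every host.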
The VM run loop handles exits and advances the virtual TSC deterministically:
- IoIn/IoOut: Serial port I/O, device access, SDK hypercalls
- Hlt: VM halted — fast-forward TSC + inject timer IRQ
- MmioRead/MmioWrite: Virtio MMIO, HPET, ACPI PM timer
- Hypercall: VMCALL-based SDK transport (preferred over port I/O)
- Every exit increments the virtual TSC by a fixed amount
Execution modes:
- `run()` — run until halt/shutdown
- `run_until(pattern)` — run until serial output matches
- `run_bounded(max_exits)` — run for N exits (deterministic scheduling)
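The key invariant of the loop above — one fixed TSC step per exit, so guest-visible time is a pure function of the exit sequence — can be sketched with a stand-in enum (the real loop matches on `kvm_ioctls::VcpuExit`; the step sizes here are illustrative):

```rust
/// Stand-in for the KVM exit reasons handled by the run loop.
pub enum Exit { Io, Mmio, Hlt, Hypercall, Shutdown }

/// Drive a sequence of exits; returns (final virtual TSC, exits handled).
pub fn run_bounded(exits: &[Exit], tsc_step: u64) -> (u64, usize) {
    let mut vtsc = 0u64;
    let mut handled = 0usize;
    for exit in exits {
        vtsc += tsc_step; // fixed advance per exit: deterministic time
        handled += 1;
        match exit {
            // Fast-forward toward the next timer IRQ (factor is illustrative).
            Exit::Hlt => vtsc += 1_000 * tsc_step,
            Exit::Shutdown => break,
            _ => {} // I/O, MMIO, hypercalls go to the device models
        }
    }
    (vtsc, handled)
}
```

Replaying the same exit sequence therefore always yields the same virtual TSC, which is what `dlog diff` relies on when diagnosing non-determinism.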
```toml
kvm-ioctls   = "0.19"  # KVM API
kvm-bindings = "0.10"  # KVM structures
vm-memory    = "0.17"  # Guest memory management
linux-loader = "0.13"  # Kernel loading (ELF)
vm-superio   = "0.8"   # Serial port emulation
vmm-sys-util = "0.12"  # EventFd, utilities
rand_chacha  = "0.3"   # Seeded PRNG
linkme       = "0.3"   # Compile-time assertion catalog
snafu        = "0.8"   # Error handling
```

- Boot Linux kernel in single-vCPU KVM VM
- CPUID filtering (RDRAND, RDSEED, RDTSCP, AVX, hypervisor)
- TSC pinning + virtual TSC tracking
- Complete snapshot/restore (CPU + memory + devices)
- Deterministic entropy (seeded ChaCha20)
- Deterministic block device with fault injection
- Deterministic network (simulated queues)
- Guest SDK (Antithesis-style assertions + guided randomness)
- Fault injection engine (network, disk, process, clock faults)
- Property oracle (cross-run assertion tracking + verdicts)
- VMM ↔ SDK hypercall integration (VMCALL + port I/O fallback)
- Virtio transport layer (MMIO-based, blk + net + rng)
- Multi-VM simulation controller with network fabric
- Deterministic scheduling across VMs
- SMP — multi-vCPU with serialized execution
- Coverage-guided exploration (AFL-style edge bitmaps)
- Input tree exploration — branch at random_choice() decision points
- Network simulation fidelity (jitter, bandwidth, duplication)
- Kernel coverage (KCOV) — kernel code path visibility
- Assertion catalog — compile-time registration via linkme
- Fault schedule minimization — delta debugging
- Bug reproduction from JSON reports
- Determinism logging (dlog) — binary event log + diff + stats
- Time-travel debugger with counterfactual analysis
- Per-round exploration history and plateau detection
- Per-assertion detail reports with JSON export
- Multi-VM networking (virtio-net + smoltcp TCP/IP)
- Interrupt injection faults (IRQ + NMI)
- Core pinning for reduced scheduling jitter
- Nix-native build pipeline (guest packages, initrd builder, kernel composer)
- Declarative simulation tests via `mkChaosTest`
Add ChaosControl as a flake input and define simulation tests for your own guest binaries:

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    chaoscontrol.url = "github:user/chaoscontrol";
  };

  outputs = { self, nixpkgs, chaoscontrol, ... }:
    let
      system = "x86_64-linux";
      cc = chaoscontrol.lib.${system};
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      # Define a simulation test as a flake check
      checks.${system}.my-consensus-test = cc.mkChaosTest {
        name = "my-consensus";
        kernel = cc.mkChaosKernel { virtioNet = true; };
        initrd = cc.mkChaosInitrd {
          init = self.packages.${system}.my-guest;
        };
        vms = 3;
        rounds = 100;
        branches = 8;
        seed = 42;
      };

      # Use pre-built kernels to skip kernel compilation
      checks.${system}.quick-test = cc.mkChaosTest {
        name = "quick";
        kernel = chaoscontrol.packages.${system}.net-vmlinux;
        initrd = cc.mkChaosInitrd {
          init = self.packages.${system}.my-guest;
        };
        rounds = 10;
      };
    };
}
```

Run with `nix flake check` (requires `system-features = kvm` in `nix.conf` for the builder).