|
2 | 2 |
|
3 | 3 | Minimal-overhead I/O benchmarking tool using [libblkio](https://gitlab.com/libblkio/libblkio). Designed for benchmarking vhost-user-blk backends (e.g., ubiblk) with busy-loop polling and direct libblkio calls. |
4 | 4 |
|
| 5 | +Also includes `io-profile`, a bpftrace-based IO + CPU profiling wrapper that runs any command (including blkbench) and produces a system-level profile. |
| 6 | + |
5 | 7 | ## Build |
6 | 8 |
|
7 | 9 | Prerequisites: libblkio (installed with pkg-config support), GCC, pthreads. |
@@ -128,6 +130,76 @@ blkbench: rw=randread, bs=4k, iodepth=32, numjobs=4, runtime=10s |
128 | 130 | ios: total=3200000, errors=0, flushes=0 |
129 | 131 | ``` |
130 | 132 |
|
| 133 | +## io-profile |
| 134 | + |
| 135 | +A reusable IO + CPU profiling wrapper. Runs any command and produces a standardized report with block IO metrics, syscall tracking, CPU utilization, and per-thread breakdown. |
| 136 | + |
| 137 | +Requires: `bpftrace` (needs root), `iostat`, `mpstat`, `awk`. |
| 138 | + |
| 139 | +### Usage |
| 140 | + |
| 141 | +```bash |
| 142 | +sudo ./io-profile [options] -- command [args...] |
| 143 | +``` |
| 144 | + |
| 145 | +### Options |
| 146 | + |
| 147 | +| Arg | Default | Description | |
| 148 | +|-----|---------|-------------| |
| 149 | +| `-o, --output DIR` | `./io-profile-results` | Output directory for reports | |
| 150 | +| `-d, --device DEV` | auto-detect | Block device to monitor | |
| 151 | +| `-p, --pid PID` | | Only trace this PID and children | |
| 152 | +| `--json` | off | Also emit machine-readable JSON summary | |
| 153 | +| `-v, --verbose` | off | Show progress messages | |
| 154 | + |
| 155 | +### Examples |
| 156 | + |
| 157 | +Profile a blkbench run: |
| 158 | +```bash |
| 159 | +sudo ./io-profile --json -o results/ -- \ |
| 160 | + ./blkbench --driver io_uring --path /tmp/test-disk.raw --rw randread --bs 4k --iodepth 32 --runtime 10 |
| 161 | +``` |
| 162 | + |
| 163 | +Profile any command: |
| 164 | +```bash |
| 165 | +sudo ./io-profile -d nvme0n1 -- fio job.fio |
| 166 | +sudo ./io-profile -- dd if=/dev/zero of=/tmp/test bs=1M count=1000 |
| 167 | +``` |
| 168 | + |
| 169 | +### What it collects |
| 170 | + |
| 171 | +- **Block IO**: throughput, IOPS, queue depth distribution, block size distribution, IO latency percentiles, sequential vs random ratio, read/write split |
| 172 | +- **Syscalls**: fsync/fdatasync/sync_file_range counts and rates, O_DIRECT detection |
| 173 | +- **CPU**: utilization percentiles, iowait, user/system split, context switch rate |
| 174 | +- **Per-thread**: IOPS, read/write bytes, fsync count, sequential percentage per thread |
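
The "sequential vs random ratio" can be made concrete with a small sketch. It assumes one common definition, namely that a request is sequential when it starts at the sector where the previous request ended; the heuristic io-profile actually uses is not described here and may differ. The `seq_percent` helper and its `start_sector sector_count` input format are hypothetical.

```bash
# Sketch: percentage of sequential requests in a trace read from stdin.
# Input: one request per line, "start_sector sector_count" (assumed format).
# Assumption: sequential means the request starts where the previous one ended.
seq_percent() {
    awk '
        NR > 1 && $1 == next_sector { seq++ }   # contiguous with the previous request
        { next_sector = $1 + $2 }               # start sector a sequential successor would have
        END {
            if (NR > 1) printf "%.0f\n", 100 * seq / (NR - 1)
            else        print "n/a"             # fewer than two requests: ratio undefined
        }
    '
}
```

For example, `printf '0 8\n8 8\n16 8\n1000 8\n1008 8\n' | seq_percent` prints `75`: three of the four transitions are contiguous and the jump to sector 1000 is not.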
| 175 | + |
| 176 | +### Example output |
| 177 | + |
| 178 | +``` |
| 179 | +=== IO Profile: ./blkbench --driver io_uring --path /tmp/test-disk.raw --rw randread ... === |
| 180 | +Duration: 10.2s | Device: nvme0n1 | Kernel: 6.8.0-94-generic |
| 181 | + |
| 182 | +IO Summary: |
| 183 | + Throughput: Read 1250.3 MB/s | Write 0.0 MB/s |
| 184 | + IOPS: Read 320,012 | Write 0 |
| 185 | + IO Threads: 4 |
| 186 | + R/W Ratio: 100% read / 0% write |
| 187 | + Sequential: 12% |
| 188 | + fsync calls: 0 (0.0/s) |
| 189 | + O_DIRECT: Yes (4 opens) |
| 190 | + |
| 191 | +Histograms: |
| 192 | + Queue Depth: p25=28 p50=31 p75=32 p99=32 max=32 |
| 193 | + Block Size: p25=4K p50=4K p75=4K p99=4K max=4K |
| 194 | + IO Latency: p25=8us p50=12us p75=18us p99=45us max=850us |
| 195 | +
|
| 196 | +CPU Summary: |
| 197 | + CPU Usage: p25=22% p50=25% p75=28% p99=35% |
| 198 | + IOWait: p25=0% p50=0% p75=1% p99=2% |
| 199 | + User/System: 25% user / 0% system |
| 200 | + Ctx Switches: 12,340/s |
| 201 | +``` |
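
A quick way to sanity-check a report like this is to recompute throughput as IOPS times block size. The sketch below assumes the block size is given in bytes and that the result is wanted in binary megabytes (MiB/s); whether the report's "MB/s" is binary or decimal is not stated, so treat that as an assumption. `throughput_mib` is a hypothetical helper name.

```bash
# Sketch: expected throughput in MiB/s from IOPS and block size in bytes.
# Assumption: 1 MiB = 1024 * 1024 bytes; the unit used by the report is not documented here.
throughput_mib() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / (1024 * 1024) }'
}
```

With the sample numbers, `throughput_mib 320012 4096` prints `1250.0`, which is close to the 1250.3 MB/s shown in the report and suggests, but does not confirm, that the figure is in binary megabytes.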
| 202 | + |
131 | 203 | ## Architecture |
132 | 204 |
|
133 | 205 | - **Threading**: One thread per job, each with its own libblkio queue — no shared mutable state during the benchmark. |
|