librx888: extract streaming engine; firmware pinning, CI tests, gr-rx888 design docs#23
Merged
Merged
Conversation
Captures the design for an out-of-tree GNU Radio 3.10 module wrapping rx888_stream as an int16-real source block (no DSP, no SoapySDR, HF path only). The recovery doc spells out which failure modes belong in the libusb driver layer vs. which belong in the GR block, so a wedged FX3 can be brought back without tearing down the flowgraph. https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
Splits rx888_stream into librx888 (libusb + threading + descriptor parsing + queue handoff + watchdog) and a thin CLI wrapper. The wire-protocol path is unchanged; only the output target moves from a hard-coded stdout write to a user-supplied sample callback. This is what gr-rx888 (https://github.com/ringof/gr-rx888) will link against — same code path for the CLI tool and the GNU Radio block, so there's one place to fix bugs and one place to add features. Makefile builds librx888.so, librx888.a, and librx888.pc; rx888_stream links the shared lib via \$ORIGIN rpath in-tree and via LIBDIR after install. Trims the now-orphaned stream_setup/teardown declarations from include/rx888.h. CLI surface narrows: the queue-depth, request-packets, watchdog, xtal, and verbose-stats flags from the old streamer are dropped in favour of the library's compile-time defaults. Same defaults the CLI used, so command lines without those flags keep working. https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
The previous commit dropped the CLI flags for queue depth, request
size, control/stream/watchdog timeouts, and verbose stats — those
were useful diagnostic and tuning knobs, not just clutter.
Library changes:
- rx888_config_t gains queue_depth, req_packets, ctrl_timeout_ms,
stream_timeout_ms, watchdog_ms. Library still owns the defaults
(32 / 1024 / 5000 / 0 / 3000) but callers can override.
- rx888_config_init_default() populates the struct. rx888_open()
rejects zero queue_depth / req_packets / ctrl_timeout_ms so
callers can't accidentally pass an uninitialised config.
- rx888_stats_t + rx888_get_stats() expose ok_xfers, bad_xfers,
bytes_out, in_flight, last_cb_ms atomics for monitoring.
CLI changes:
- -v restored: prints stats to stderr once per second.
- -q, -p restored: queue depth and request size.
- --ctrl-timeout, --stream-timeout, --watchdog-timeout restored.
- --fixup restored as independent flag (was bundled with -r).
https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
Three things in one commit because they share the same lifecycle:
1. No-hardware tests (run in CI on every PR):
- tests/librx888_api.c exercises the public API: NULL safety,
argument validation, default-config invariants, and the
no-device path. First run already caught one real bug:
rx888_open(&r, NULL) wasn't honouring the "*out cleared on
error" contract. Fixed in librx888.c.
- tests/cli_smoke.sh exercises rx888_stream's argparse: -h works,
every documented flag is present in --help, unknown flags exit
non-zero, the no-device run exits 1 (not 0, not segfault).
- make check builds and runs both.
2. Hardware acceptance scripts (gated on RX888_HW_TEST=1):
- tests/hw_smoke.sh: sustained capture for N seconds, verify
byte rate within tolerance.
- tests/hw_stop_start.sh: N consecutive open/start/stop/close
cycles, verify each produces data and exits cleanly.
- tests/hw_sample_check.py: post-capture sanity (no constant
runs, DC offset bounded, stddev non-trivial, not saturated).
- make hw-check runs all three.
These are inspired by the equivalent scenarios in
rx888-firmware/tests/fw_test.sh; the implementation is fresh.
3. GitHub Actions:
- .github/workflows/ci.yml: apt-installs deps, builds, validates
librx888's exported symbol set against an explicit allowlist,
runs make check.
- .github/workflows/firmware-bump.yml: cron + repository_dispatch
trigger that runs make firmware-latest, builds against the new
blob, runs the non-hardware tests, opens a draft PR.
https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
- README.md: rewrites build/install/quick-start around librx888 + make firmware + make check + make hw-check. Drops the silently- unreliable modprobe.d advice in favour of tmpfiles.d. Adds gr-rx888 pointer. Updates project layout and license note. - doc/librx888.md (new): API summary, threading model, configuration reference, stats, linking, v0.0 limits. - doc/rx888_stream.md: rewritten as a thin-CLI doc that points at librx888 for the engine. Trimmed historical sections. - doc/rx888_stream_testplan.md: rebuilt around make check / make hw-check. References the actual scripts under tests/. - doc/REAL_TEST.md: prepends a "make hw-check" quick-path section; preserves the existing pipeline-level scenarios. - doc/gr-rx888/development-plan.md and stream-division-of-recovery.md: fix stale src/rx888_stream.c line refs that became wrong after the librx888 split. Most pointers now name functions in librx888.c (writer_main, event_main, rx888_open, etc.) so they don't rot again on the next refactor. Verified: make all and make check both green after the doc changes. https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
The previous version gave the capture a SECS+3 second window and then compared the captured bytes against the bytes for SECS seconds. Result: a healthy streamer at 32 MS/s produced ~770 MB in ~12s and got flagged as "byte count above upper bound (impossible? something is wrong)" — but nothing was wrong, my arithmetic was. Fix: cap the capture with `head -c \$((RATE * 2 * SECS))`. head closes the pipe at exactly the target, rx888_stream sees EPIPE and exits cleanly. The captured file is exactly target_bytes if throughput was met, or less if it wasn't — which is the real failure mode we care about. Wall-clock deadline is enforced by an outer `timeout` as a safety net. Also prints elapsed time and an average MiB/s figure so the log shows the actual rate even on the pass path. https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
set -u + reading PIPESTATUS into two scalars on separate lines is a
known footgun: the first assignment is itself a command that resets
PIPESTATUS, so the second line sees an unbound array element and
aborts. Reported by `make hw-check` on real hardware:
line 55: PIPESTATUS[1]: unbound variable
Fix: copy "${PIPESTATUS[@]}" into a regular array in one assignment,
then read indices off that. Also tolerate missing elements with
:-0 as a belt-and-braces for any bash version that surfaces unset
indices differently.
https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
The 100-LSB minimum was a guess. A real RX888 capture at 32 MS/s with no antenna / quiet front-end legitimately lands around stddev = 5-10 LSB and tripped the test on first bench run. The actual failure this check is meant to catch is "buffer stuck at a constant value" (stddev == 0). The longest_run check already catches stuck-buffer reuse much more reliably, so this is a belt-and-braces "is anything moving" bound, not a signal-quality gate. Default 1 lets it serve that role without false positives on quiet captures. Override via RX888_MIN_STDDEV if you have a known signal source and want a tighter check. Also changes stddev field to float and bumps the printed precision to two decimals so values in the low single digits are readable. https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
Adds three flavours of dynamic checking on top of the existing
`make check`:
make check-asan # rebuild with -fsanitize=address,undefined
# and run the no-hardware tests
make check-valgrind # rebuild without -march=native and run
# tests/librx888_api under valgrind memcheck
make SAN=<spec> ... # underlying knob: any clang/gcc fsanitize
# spec. SAN=thread for race detection,
# SAN=memory for clang MSan, etc.
The SAN knob also propagates to the streaming binaries, so the bench
command `RX888_HW_TEST=1 make SAN=address,undefined hw-check`
exercises librx888's transfer ring + writer/event threads under
ASan against real hardware. That's where ASan earns its keep —
the no-hw tests exit before any USB allocation.
Build details:
- SAN strips -O3 and -fstack-protector-all (sanitizers want low
opt and ship their own stack guards).
- Both SAN and VALGRIND strip -march=native. Found by trial:
on this build host `memset` got vectorised to AVX-512 and
valgrind's VM SIGILL'd inside rx888_config_init_default.
- SAN_LDFLAGS threads through library + binary + test link rules
so libasan/libubsan are pulled in everywhere.
- tests/valgrind.supp silences the standard libusb static-init
and ld.so noise; real leaks in librx888 still surface. Not
actually triggered on Ubuntu 24.04 / libusb 1.0.27; left in
place for portability to older glibc/libusb.
CI workflow gains two jobs (asan-ubsan, valgrind) running in
parallel with the existing build-and-test job. Both blocking;
neither needs hardware.
Testplan doc updated; new exit criteria include all three
no-hardware passes plus a bench ASan run before merging to main.
Verified locally: plain check, check-asan, and check-valgrind all
green; ABI symbol surface unchanged.
https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c0aa5ab0d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
PR #23 review caught three real bugs. Fixing all three together because two are in the same teardown function. 1. Don't free buffers when transfers are leaked (src/librx888.c:rx888_stop) The cancellation drain in rx888_stop waits up to 2 seconds for in-flight transfers to complete. If they don't, the existing code correctly skipped libusb_free_transfer() on the still-owned-by- libusb transfer objects (use-after-free if a late callback fires). But it then unconditionally freed every r->buf[i] in the next loop — leaving libusb with a dangling t->buffer pointer. Gate buffer-free on the same !leaked condition. Matches the old monolithic streamer's behaviour, which intentionally leaked both transfers and buffers in this case. 2. Don't pthread_join the writer thread from its own callback (src/librx888.c:rx888_stop) The header documents rx888_stop() as safe to call from inside the user sample callback (which runs on writer_thread). The implementation tried to pthread_join(writer_thread) on every call, which from the writer's own context returns EDEADLK; we ignored the error, cleared writer_started, then xq_destroy'd the queue that the writer was about to come back and pop from. Result: UB when the callback returned. Detect pthread_self() == writer_thread, take a deferred path: set stop_flag, cancel transfers, xq_shutdown the queue so the writer's xq_pop_blocking() exits, and return. Caller must invoke rx888_close() (or rx888_stop() again) from a different thread to finish teardown. Updated header docstring to match the new contract. 3. Notice internal stops in the CLI (src/rx888_stream.c) librx888's stop_flag and rx888_stream's g_stop are separate. When the library terminates streaming because of NO_DEVICE / watchdog / libusb error, only stop_flag flips. The CLI was sleeping on g_stop alone, so a hardware failure left rx888_stream blocked in nanosleep forever instead of exiting and reporting the failure (which is what the old in-file event loop did on the same paths). Add a small public API: rx888_is_running(). CLI polls it once per 200ms tick alongside g_stop; on internal stop it prints a clear message and exits 1. Added to CI's exported-symbol allowlist and exercised in tests/librx888_api.c. Verified locally: make check, make check-asan, and make check-valgrind all green. librx888.so exports exactly the documented 9 symbols. Resolves: - #23 (comment) - #23 (comment) - #23 (comment) https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Started life as two design docs for a GNU Radio out-of-tree source block; grew into the full set of foundations needed before that block can be written.
librx888 (the streaming engine, now a shared library)
src/rx888_stream.cwas 1,297 lines of libusb + threading with a CLI bolted on. Split into:librx888.so/.a/.pc— the engine: device open, FX3 firmware upload, control plane, async transfer ring, event/writer threads, watchdog. Public API:rx888_open/rx888_start/rx888_stop/rx888_close/rx888_get_stats/rx888_config_init_default/rx888_strerror/rx888_version.rx888_stream— ~200 lines: argparse + signal handling + "write the sample callback to stdout". Same CLI surface as before (-v,-q,-p,--ctrl-timeout,--stream-timeout,--watchdog-timeout,--fixupall preserved). Verbose stats now driven byrx888_get_stats().gr-rx888(the GR source block from the design docs below) — one place to fix bugs, one place to add features.Firmware pinning to
ringof/rx888-firmwarereleasesfirmware/SDDC_FX3.imgwas vendored as an opaque blob. Replaced with:firmware/VERSIONpinned tov0.1.0-rc1firmware/SHA256SUMSwith the expected checksumfirmware/.gitignoreso the blob never gets committedmake firmwarecurls the asset, verifies the checksummake firmware-latestbumps VERSION + SHA256SUMS to whatever rx888-firmware just released (used by CI)make installrefuses to run without the blob fetchedWorth flagging: the previously-vendored blob (
824c5…) is not the same file as the v0.1.0-rc1 release (50ce6…). It had silently drifted. Fixed going forward by pinning.Tests
Critically: there were no tests for the streamer. Now there are.
No-hardware (CI):
make checktests/librx888_api.c— C program againstlibrx888.so. NULL safety, argument validation, default-config invariants, no-device path. Caught one real bug on its first run (rx888_open(&r, NULL)wasn't clearing*out; fixed inlibrx888.c).tests/cli_smoke.sh—rx888_stream -h, asserts every documented flag is still in--help, error paths.Hardware acceptance (bench):
RX888_HW_TEST=1 make hw-checktests/hw_smoke.sh— sustained capture, byte-rate within tolerance.tests/hw_stop_start.sh— N consecutive open/start/stop/close cycles.tests/hw_sample_check.py— content sanity (no constant runs, DC offset bounded, stddev non-trivial, not saturated).Inspired by
ringof/rx888-firmware/tests/fw_test.shscenarios; implementation is fresh.GitHub Actions
.github/workflows/ci.yml—apt installdeps, build under-Werror, validate librx888's exported symbol surface against an explicit allowlist, runmake check..github/workflows/firmware-bump.yml— cron +repository_dispatchtrigger that runsmake firmware-latest, builds and tests against the new blob, opens a draft PR.gr-rx888 design docs (unchanged scope from original PR)
doc/gr-rx888/development-plan.md— scope, repo layout, HF-only control surface (no R82XX/VHF since current firmware drops those paths), 6 phases (~4½ days), Ubuntu 24.04 target.doc/gr-rx888/stream-division-of-recovery.md— companion doc on which failure modes belong in librx888 (mechanism:restart()) vs. in the GR block layer (policy:recovery_mode+ retries).Line references in both updated to point at
librx888.csymbol names rather thanrx888_stream.cline numbers so they don't rot again.Docs
README.md,doc/rx888_stream.md,doc/rx888_stream_testplan.md,doc/REAL_TEST.mdall refreshed for librx888 + firmware pinning + the new test flow. Newdoc/librx888.mdcovers the library API, threading model, configuration, stats, and v0.0 limits.Test plan
Non-hardware (CI runs these automatically):
make allsucceeds under-Werrormake checkpasses (tests/librx888_api+tests/cli_smoke.sh)librx888.soexports exactly the documented symbol setHardware (bench only, with RX888 plugged in):
make firmwarefetchesSDDC_FX3-v0.1.0-rc1.imgand checksum-verifiesRX888_HW_TEST=1 make hw-checkpasses at 32 MS/srx888_stream -v -s 135000000 | pv -rab > /dev/nullsustains ≥ 90% theoretical throughputrx888_stream | rx888_dsp | iqrecordproduces valid SigMFDesign review (no execution needed):
apt install gnuradio-devis OK as the v1 GR target (radioconda noted as future)https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq