Skip to content

librx888: extract streaming engine; firmware pinning, CI tests, gr-rx888 design docs#23

Merged
ringof merged 10 commits into
mainfrom
claude/rx888-gnuradio-module-302or
May 15, 2026
Merged

librx888: extract streaming engine; firmware pinning, CI tests, gr-rx888 design docs#23
ringof merged 10 commits into
mainfrom
claude/rx888-gnuradio-module-302or

Conversation

@ringof
Copy link
Copy Markdown
Owner

@ringof ringof commented May 9, 2026

Summary

Started life as two design docs for a GNU Radio out-of-tree source block; grew into the full set of foundations needed before that block can be written.

librx888 (the streaming engine, now a shared library)

src/rx888_stream.c was 1,297 lines of libusb + threading with a CLI bolted on. Split into:

  • librx888.so / .a / .pc — the engine: device open, FX3 firmware upload, control plane, async transfer ring, event/writer threads, watchdog. Public API: rx888_open / rx888_start / rx888_stop / rx888_close / rx888_get_stats / rx888_config_init_default / rx888_strerror / rx888_version.
  • rx888_stream — ~200 lines: argparse + signal handling + "write the sample callback to stdout". Same CLI surface as before (-v, -q, -p, --ctrl-timeout, --stream-timeout, --watchdog-timeout, --fixup all preserved). Verbose stats now driven by rx888_get_stats().
  • Same code path will be reused by gr-rx888 (the GR source block from the design docs below) — one place to fix bugs, one place to add features.

Firmware pinning to ringof/rx888-firmware releases

firmware/SDDC_FX3.img was vendored as an opaque blob. Replaced with:

  • firmware/VERSION pinned to v0.1.0-rc1
  • firmware/SHA256SUMS with the expected checksum
  • firmware/.gitignore so the blob never gets committed
  • make firmware curls the asset, verifies the checksum
  • make firmware-latest bumps VERSION + SHA256SUMS to whatever rx888-firmware just released (used by CI)
  • make install refuses to run without the blob fetched

Worth flagging: the previously-vendored blob (824c5…) is not the same file as the v0.1.0-rc1 release (50ce6…). It had silently drifted. Fixed going forward by pinning.

Tests

Critically: there were no tests for the streamer. Now there are.

No-hardware (CI): make check

  • tests/librx888_api.c — C program against librx888.so. NULL safety, argument validation, default-config invariants, no-device path. Caught one real bug on its first run (rx888_open(&r, NULL) wasn't clearing *out; fixed in librx888.c).
  • tests/cli_smoke.shrx888_stream -h, asserts every documented flag is still in --help, error paths.

Hardware acceptance (bench): RX888_HW_TEST=1 make hw-check

  • tests/hw_smoke.sh — sustained capture, byte-rate within tolerance.
  • tests/hw_stop_start.sh — N consecutive open/start/stop/close cycles.
  • tests/hw_sample_check.py — content sanity (no constant runs, DC offset bounded, stddev non-trivial, not saturated).

Inspired by ringof/rx888-firmware/tests/fw_test.sh scenarios; implementation is fresh.

GitHub Actions

  • .github/workflows/ci.ymlapt install deps, build under -Werror, validate librx888's exported symbol surface against an explicit allowlist, run make check.
  • .github/workflows/firmware-bump.yml — cron + repository_dispatch trigger that runs make firmware-latest, builds and tests against the new blob, opens a draft PR.

gr-rx888 design docs (unchanged scope from original PR)

  • doc/gr-rx888/development-plan.md — scope, repo layout, HF-only control surface (no R82XX/VHF since current firmware drops those paths), 6 phases (~4½ days), Ubuntu 24.04 target.
  • doc/gr-rx888/stream-division-of-recovery.md — companion doc on which failure modes belong in librx888 (mechanism: restart()) vs. in the GR block layer (policy: recovery_mode + retries).

Line references in both updated to point at librx888.c symbol names rather than rx888_stream.c line numbers so they don't rot again.

Docs

README.md, doc/rx888_stream.md, doc/rx888_stream_testplan.md, doc/REAL_TEST.md all refreshed for librx888 + firmware pinning + the new test flow. New doc/librx888.md covers the library API, threading model, configuration, stats, and v0.0 limits.

Test plan

Non-hardware (CI runs these automatically):

  • make all succeeds under -Werror
  • make check passes (tests/librx888_api + tests/cli_smoke.sh)
  • librx888.so exports exactly the documented symbol set

Hardware (bench only, with RX888 plugged in):

  • make firmware fetches SDDC_FX3-v0.1.0-rc1.img and checksum-verifies
  • RX888_HW_TEST=1 make hw-check passes at 32 MS/s
  • 30-min rx888_stream -v -s 135000000 | pv -rab > /dev/null sustains ≥ 90% theoretical throughput
  • Existing pipeline still works: rx888_stream | rx888_dsp | iqrecord produces valid SigMF

Design review (no execution needed):

  • HF-only control surface in the gr-rx888 plan matches current firmware
  • In-librx888 (mechanism) vs. in-GR-block (policy) recovery split is right
  • Ubuntu 24.04 + apt install gnuradio-dev is OK as the v1 GR target (radioconda noted as future)

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq

claude added 5 commits May 9, 2026 00:08
Captures the design for an out-of-tree GNU Radio 3.10 module wrapping
rx888_stream as an int16-real source block (no DSP, no SoapySDR, HF
path only). The recovery doc spells out which failure modes belong in
the libusb driver layer vs. which belong in the GR block, so a wedged
FX3 can be brought back without tearing down the flowgraph.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
Splits rx888_stream into librx888 (libusb + threading + descriptor
parsing + queue handoff + watchdog) and a thin CLI wrapper.  The
wire-protocol path is unchanged; only the output target moves from a
hard-coded stdout write to a user-supplied sample callback.

This is what gr-rx888 (https://github.com/ringof/gr-rx888) will link
against — same code path for the CLI tool and the GNU Radio block,
so there's one place to fix bugs and one place to add features.

Makefile builds librx888.so, librx888.a, and librx888.pc; rx888_stream
links the shared lib via \$ORIGIN rpath in-tree and via LIBDIR after
install.  Trims the now-orphaned stream_setup/teardown declarations
from include/rx888.h.

CLI surface narrows: the queue-depth, request-packets, watchdog,
xtal, and verbose-stats flags from the old streamer are dropped in
favour of the library's compile-time defaults.  Same defaults the
CLI used, so command lines without those flags keep working.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
The previous commit dropped the CLI flags for queue depth, request
size, control/stream/watchdog timeouts, and verbose stats — those
were useful diagnostic and tuning knobs, not just clutter.

Library changes:
  - rx888_config_t gains queue_depth, req_packets, ctrl_timeout_ms,
    stream_timeout_ms, watchdog_ms.  Library still owns the defaults
    (32 / 1024 / 5000 / 0 / 3000) but callers can override.
  - rx888_config_init_default() populates the struct.  rx888_open()
    rejects zero queue_depth / req_packets / ctrl_timeout_ms so
    callers can't accidentally pass an uninitialised config.
  - rx888_stats_t + rx888_get_stats() expose ok_xfers, bad_xfers,
    bytes_out, in_flight, last_cb_ms atomics for monitoring.

CLI changes:
  - -v restored: prints stats to stderr once per second.
  - -q, -p restored: queue depth and request size.
  - --ctrl-timeout, --stream-timeout, --watchdog-timeout restored.
  - --fixup restored as independent flag (was bundled with -r).

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
Three things in one commit because they share the same lifecycle:

1. No-hardware tests (run in CI on every PR):
   - tests/librx888_api.c exercises the public API: NULL safety,
     argument validation, default-config invariants, and the
     no-device path.  First run already caught one real bug:
     rx888_open(&r, NULL) wasn't honouring the "*out cleared on
     error" contract.  Fixed in librx888.c.
   - tests/cli_smoke.sh exercises rx888_stream's argparse: -h works,
     every documented flag is present in --help, unknown flags exit
     non-zero, the no-device run exits 1 (not 0, not segfault).
   - make check builds and runs both.

2. Hardware acceptance scripts (gated on RX888_HW_TEST=1):
   - tests/hw_smoke.sh: sustained capture for N seconds, verify
     byte rate within tolerance.
   - tests/hw_stop_start.sh: N consecutive open/start/stop/close
     cycles, verify each produces data and exits cleanly.
   - tests/hw_sample_check.py: post-capture sanity (no constant
     runs, DC offset bounded, stddev non-trivial, not saturated).
   - make hw-check runs all three.
   These are inspired by the equivalent scenarios in
   rx888-firmware/tests/fw_test.sh; the implementation is fresh.

3. GitHub Actions:
   - .github/workflows/ci.yml: apt-installs deps, builds, validates
     librx888's exported symbol set against an explicit allowlist,
     runs make check.
   - .github/workflows/firmware-bump.yml: cron + repository_dispatch
     trigger that runs make firmware-latest, builds against the new
     blob, runs the non-hardware tests, opens a draft PR.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
- README.md: rewrites build/install/quick-start around librx888 +
  make firmware + make check + make hw-check.  Drops the silently-
  unreliable modprobe.d advice in favour of tmpfiles.d.  Adds gr-rx888
  pointer.  Updates project layout and license note.
- doc/librx888.md (new): API summary, threading model, configuration
  reference, stats, linking, v0.0 limits.
- doc/rx888_stream.md: rewritten as a thin-CLI doc that points at
  librx888 for the engine.  Trimmed historical sections.
- doc/rx888_stream_testplan.md: rebuilt around make check / make
  hw-check.  References the actual scripts under tests/.
- doc/REAL_TEST.md: prepends a "make hw-check" quick-path section;
  preserves the existing pipeline-level scenarios.
- doc/gr-rx888/development-plan.md and stream-division-of-recovery.md:
  fix stale src/rx888_stream.c line refs that became wrong after the
  librx888 split.  Most pointers now name functions in librx888.c
  (writer_main, event_main, rx888_open, etc.) so they don't rot
  again on the next refactor.

Verified: make all and make check both green after the doc changes.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
@ringof ringof changed the title doc/gr-rx888: development plan + stream-division-of-recovery librx888: extract streaming engine; firmware pinning, CI tests, gr-rx888 design docs May 15, 2026
claude added 4 commits May 15, 2026 03:01
The previous version gave the capture a SECS+3 second window and then
compared the captured bytes against the bytes for SECS seconds.
Result: a healthy streamer at 32 MS/s produced ~770 MB in ~12s and
got flagged as "byte count above upper bound (impossible? something
is wrong)" — but nothing was wrong, my arithmetic was.

Fix: cap the capture with `head -c \$((RATE * 2 * SECS))`.  head
closes the pipe at exactly the target, rx888_stream sees EPIPE and
exits cleanly.  The captured file is exactly target_bytes if
throughput was met, or less if it wasn't — which is the real failure
mode we care about.  Wall-clock deadline is enforced by an outer
`timeout` as a safety net.

Also prints elapsed time and an average MiB/s figure so the log
shows the actual rate even on the pass path.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
set -u + reading PIPESTATUS into two scalars on separate lines is a
known footgun: the first assignment is itself a command that resets
PIPESTATUS, so the second line sees an unbound array element and
aborts.  Reported by `make hw-check` on real hardware:

    line 55: PIPESTATUS[1]: unbound variable

Fix: copy "${PIPESTATUS[@]}" into a regular array in one assignment,
then read indices off that.  Also tolerate missing elements with
:-0 as a belt-and-braces for any bash version that surfaces unset
indices differently.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
The 100-LSB minimum was a guess.  A real RX888 capture at 32 MS/s
with no antenna / quiet front-end legitimately lands around stddev
= 5-10 LSB and tripped the test on first bench run.

The actual failure this check is meant to catch is "buffer stuck
at a constant value" (stddev == 0).  The longest_run check already
catches stuck-buffer reuse much more reliably, so this is a
belt-and-braces "is anything moving" bound, not a signal-quality
gate.  Default 1 lets it serve that role without false positives on
quiet captures.  Override via RX888_MIN_STDDEV if you have a known
signal source and want a tighter check.

Also changes stddev field to float and bumps the printed precision
to two decimals so values in the low single digits are readable.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
Adds three flavours of dynamic checking on top of the existing
`make check`:

  make check-asan       # rebuild with -fsanitize=address,undefined
                        # and run the no-hardware tests
  make check-valgrind   # rebuild without -march=native and run
                        # tests/librx888_api under valgrind memcheck
  make SAN=<spec> ...   # underlying knob: any clang/gcc fsanitize
                        # spec.  SAN=thread for race detection,
                        # SAN=memory for clang MSan, etc.

The SAN knob also propagates to the streaming binaries, so the bench
command `RX888_HW_TEST=1 make SAN=address,undefined hw-check`
exercises librx888's transfer ring + writer/event threads under
ASan against real hardware.  That's where ASan earns its keep —
the no-hw tests exit before any USB allocation.

Build details:

  - SAN strips -O3 and -fstack-protector-all (sanitizers want low
    opt and ship their own stack guards).
  - Both SAN and VALGRIND strip -march=native.  Found by trial:
    on this build host `memset` got vectorised to AVX-512 and
    valgrind's VM SIGILL'd inside rx888_config_init_default.
  - SAN_LDFLAGS threads through library + binary + test link rules
    so libasan/libubsan are pulled in everywhere.
  - tests/valgrind.supp silences the standard libusb static-init
    and ld.so noise; real leaks in librx888 still surface.  Not
    actually triggered on Ubuntu 24.04 / libusb 1.0.27; left in
    place for portability to older glibc/libusb.

CI workflow gains two jobs (asan-ubsan, valgrind) running in
parallel with the existing build-and-test job.  Both blocking;
neither needs hardware.

Testplan doc updated; new exit criteria include all three
no-hardware passes plus a bench ASan run before merging to main.

Verified locally: plain check, check-asan, and check-valgrind all
green; ABI symbol surface unchanged.

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
@ringof ringof marked this pull request as ready for review May 15, 2026 03:37
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c0aa5ab0d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/librx888.c Outdated
Comment thread src/librx888.c
Comment thread src/rx888_stream.c
PR #23 review caught three real bugs.  Fixing all three together
because two are in the same teardown function.

1. Don't free buffers when transfers are leaked (src/librx888.c:rx888_stop)

   The cancellation drain in rx888_stop waits up to 2 seconds for
   in-flight transfers to complete.  If they don't, the existing code
   correctly skipped libusb_free_transfer() on the still-owned-by-
   libusb transfer objects (use-after-free if a late callback fires).
   But it then unconditionally freed every r->buf[i] in the next
   loop — leaving libusb with a dangling t->buffer pointer.  Gate
   buffer-free on the same !leaked condition.  Matches the old
   monolithic streamer's behaviour, which intentionally leaked both
   transfers and buffers in this case.

2. Don't pthread_join the writer thread from its own callback
   (src/librx888.c:rx888_stop)

   The header documents rx888_stop() as safe to call from inside the
   user sample callback (which runs on writer_thread).  The
   implementation tried to pthread_join(writer_thread) on every call,
   which from the writer's own context returns EDEADLK; we ignored
   the error, cleared writer_started, then xq_destroy'd the queue
   that the writer was about to come back and pop from.  Result:
   UB when the callback returned.

   Detect pthread_self() == writer_thread, take a deferred path: set
   stop_flag, cancel transfers, xq_shutdown the queue so the writer's
   xq_pop_blocking() exits, and return.  Caller must invoke
   rx888_close() (or rx888_stop() again) from a different thread to
   finish teardown.  Updated header docstring to match the new
   contract.

3. Notice internal stops in the CLI (src/rx888_stream.c)

   librx888's stop_flag and rx888_stream's g_stop are separate.  When
   the library terminates streaming because of NO_DEVICE / watchdog /
   libusb error, only stop_flag flips.  The CLI was sleeping on g_stop
   alone, so a hardware failure left rx888_stream blocked in nanosleep
   forever instead of exiting and reporting the failure (which is what
   the old in-file event loop did on the same paths).

   Add a small public API: rx888_is_running().  CLI polls it once per
   200ms tick alongside g_stop; on internal stop it prints a clear
   message and exits 1.  Added to CI's exported-symbol allowlist and
   exercised in tests/librx888_api.c.

Verified locally: make check, make check-asan, and make check-valgrind
all green.  librx888.so exports exactly the documented 9 symbols.

Resolves:
- #23 (comment)
- #23 (comment)
- #23 (comment)

https://claude.ai/code/session_01BFQXgCjyoBx7aGJ5hE5Xeq
@ringof ringof merged commit 0259b14 into main May 15, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants