Skip to content

feat: counter-based RNG for deterministic JER smearing#1535

Open
alefisico wants to merge 4 commits intoscikit-hep:masterfrom
alefisico:feat/jer_smear_cbrng
Open

feat: counter-based RNG for deterministic JER smearing#1535
alefisico wants to merge 4 commits intoscikit-hep:masterfrom
alefisico:feat/jer_smear_cbrng

Conversation

@alefisico
Copy link
Copy Markdown
Contributor

Summary

The current JER smearing in CorrectedJetsFactory uses numpy.random, which produces different random numbers depending on how events are partitioned across chunks. This means the same jet can get different smearing corrections depending on the processing setup, making results non-reproducible.

This PR adds an opt-in counter-based RNG (CBRNG) using the Squares algorithm (arXiv:2004.06278) that generates deterministic random numbers based on per-jet physics quantities (event number, phi, eta). The same jet always gets the same random number regardless of partitioning, chunk size, or parallelization strategy.

This developement was created by @chuyuanliu in our framework, and I am just putting it in the central coffea because I consider it important for other users.

Key points:

  • New event_number parameter on build() — when provided, uses CBRNG; when omitted (None), falls back to the legacy RNG. Fully backwards compatible.
  • Per-jet 128-bit counter: [event_number(64), phi_bits(32) << 32 | eta_bits(32)]
  • Works with both eager awkward arrays and dask-awkward

Files changed

  • src/coffea/jetmet_tools/CorrectedJetsFactory.py — added event_number/seeds parameters to build(), CBRNG path alongside legacy RNG
  • src/coffea/jetmet_tools/cbrng.py — new module implementing the Squares CBRNG algorithm
  • tests/test_jetmet_correctionlib_adapters.py — added test_corrected_jets_factory_cbrng verifying determinism and partition independence

Test plan

  • test_corrected_jets_factory_cbrng — verifies same jets produce same smearing across calls
  • All 17 existing tests pass unchanged (no regressions)

…ctory

- Added a new Squares class for deterministic random number generation based on event number, phi, and eta.
- Updated rand_gauss function to utilize the new RNG for partition-independent results.
- Modified build method to accept event numbers for RNG seeding.
- Introduced tests to verify reproducibility and partition independence of the RNG outputs.
)

# Build 128-bit counter: [event_number(64), phi_bits(32) << 32 | eta_bits(32)]
counter = numpy.empty((len(phi_arr), 2), dtype=numpy.uint64)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this matters but you could implement building the counter this in fewer operations using array.byteswap() (it inverts the phi bits, but that doesn't matter much).

something like:

counter[:, 0] = event_number
counter[:, 1] = numpy.round(phi_arr, 3).view(numpy.uint32).astype(numpy.uint64).byteswap()
counter[:, 1] |= numpy.round(eta_arr, 3).view(numpy.uint32).astype(numpy.uint64)

I'm unsure if there's any major performance benefit or drawback to doing it this way, mostly just fewer lines eaten by the interpreter.

@lgray
Copy link
Copy Markdown
Collaborator

lgray commented Mar 26, 2026

Looks like I need to fix the macos CI environment.

@alefisico alefisico force-pushed the feat/jer_smear_cbrng branch from 3f295ae to 6d48502 Compare March 26, 2026 20:42
@ikrommyd
Copy link
Copy Markdown
Collaborator

This might be a dumb question, but can't you just seed already available PRNGs (numpy for example) with physics quantities so that you get deterministic random numbers?

@chuyuanliu
Copy link
Copy Markdown

This might be a dumb question, but can't you just seed already available PRNGs (numpy for example) with physics quantities so that you get deterministic random numbers?

This is actually a great question. When using stateful standard PRNGs (like MT19937 or PCG64), you can either create an instance for each partition or for each event/object.

In the former case, the seed will be partition-dependent, and so is the random sequence.

In the latter case, creating a huge number of RNG instances with different seeds is not vectorized in numpy, so you would have to introduce a Python-level for loop over the objects. Though, in practice, the quality of the random sequence generated by changing the seed and getting the first value is usually okay, the statistical properties are only guaranteed along a sequence generated from a single seed. In the worst case, it may introduce unexpected correlations. CBRNGs are designed to solve these exact issues.

Note that although numpy has an implementation of CBRNGs like Philox, the API only supports being used in the same schema as other stateful PRNGs (generating the next n values instead of mapping specific counters to values).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants