feat: counter-based RNG for deterministic JER smearing #1535
alefisico wants to merge 4 commits into scikit-hep:master
…ctory
- Added a new Squares class for deterministic random number generation based on event number, phi, and eta.
- Updated rand_gauss function to utilize the new RNG for partition-independent results.
- Modified build method to accept event numbers for RNG seeding.
- Introduced tests to verify reproducibility and partition independence of the RNG outputs.
    # Build 128-bit counter: [event_number(64), phi_bits(32) << 32 | eta_bits(32)]
    counter = numpy.empty((len(phi_arr), 2), dtype=numpy.uint64)
Not sure if this matters, but you could build the counter in fewer operations using array.byteswap() (it reverses the byte order of the phi bits, but that doesn't matter much).
Something like:

    counter[:, 0] = event_number
    counter[:, 1] = numpy.round(phi_arr, 3).view(numpy.uint32).astype(numpy.uint64).byteswap()
    counter[:, 1] |= numpy.round(eta_arr, 3).view(numpy.uint32).astype(numpy.uint64)

I'm unsure if there's any major performance benefit or drawback to doing it this way; it is mostly just fewer lines for the interpreter to process.
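The two ways of packing the counter discussed in this comment can be compared side by side. The sketch below is only an illustration, not the PR's code: it assumes phi_arr and eta_arr are float32 arrays (so that .view(numpy.uint32) is a one-to-one reinterpretation of the bits), and the jet values are made up.

```python
import numpy

# Made-up inputs; float32 is an assumption so that .view(uint32) keeps the length.
phi = numpy.array([0.1234, -2.5], dtype=numpy.float32)
eta = numpy.array([1.5, -0.75], dtype=numpy.float32)
event_number = numpy.array([1001, 1002], dtype=numpy.uint64)

# Reinterpret the rounded floats as 32-bit patterns, then widen to 64 bits.
phi_bits = numpy.round(phi, 3).view(numpy.uint32).astype(numpy.uint64)
eta_bits = numpy.round(eta, 3).view(numpy.uint32).astype(numpy.uint64)

# Variant 1: explicit shift, phi in the high 32 bits, eta in the low 32 bits.
counter_shift = numpy.empty((len(phi), 2), dtype=numpy.uint64)
counter_shift[:, 0] = event_number
counter_shift[:, 1] = (phi_bits << numpy.uint64(32)) | eta_bits

# Variant 2: byteswap moves the low 4 bytes into the high 4 bytes,
# reversing their byte order on the way; eta still fills the low 32 bits.
counter_swap = numpy.empty((len(phi), 2), dtype=numpy.uint64)
counter_swap[:, 0] = event_number
counter_swap[:, 1] = phi_bits.byteswap() | eta_bits
```

Both variants are deterministic functions of (event_number, phi, eta), but they produce different counters, so switching between them would change the generated random numbers.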
Looks like I need to fix the macOS CI environment.
This might be a dumb question, but can't you just seed already available PRNGs (numpy for example) with physics quantities so that you get deterministic random numbers?
This is actually a great question. When using stateful standard PRNGs (like MT19937 or PCG64), you can either create one instance per partition or one per event/object. In the former case, the seed is partition-dependent, and so is the random sequence. In the latter case, creating a huge number of RNG instances with different seeds is not vectorized in Note that although
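The partition dependence of the first option can be seen in a small experiment. This is a minimal sketch with made-up numbers, assuming one numpy Generator per chunk seeded from a base seed and the chunk index; the helper name gauss_per_partition is hypothetical and not part of coffea.

```python
import numpy

def gauss_per_partition(chunk_sizes, seed=42):
    """Draw one Gaussian number per jet, using one stateful RNG per chunk.

    Hypothetical helper: each chunk gets its own Generator seeded with
    (seed, chunk index), which is one common way to seed partitions.
    """
    out = []
    for index, size in enumerate(chunk_sizes):
        rng = numpy.random.default_rng([seed, index])
        out.append(rng.standard_normal(size))
    return numpy.concatenate(out)

# The same six jets, processed as one chunk or as two chunks of three.
one_chunk = gauss_per_partition([6])
two_chunks = gauss_per_partition([3, 3])

# Jets 3..5 are drawn from different generator states in the two setups.
same_tail = numpy.allclose(one_chunk[3:], two_chunks[3:])
print("tail identical across partitionings:", same_tail)
```

A counter-based generator avoids this because each number is a pure function of the per-jet counter rather than of a generator state that advances along the chunk.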
Summary
The current JER smearing in CorrectedJetsFactory uses numpy.random, which produces different random numbers depending on how events are partitioned across chunks. This means the same jet can get different smearing corrections depending on the processing setup, making results non-reproducible.

This PR adds an opt-in counter-based RNG (CBRNG) using the Squares algorithm (arXiv:2004.06278) that generates deterministic random numbers based on per-jet physics quantities (event number, phi, eta). The same jet always gets the same random number regardless of partitioning, chunk size, or parallelization strategy.
This development was created by @chuyuanliu in our framework; I am contributing it to central coffea because I consider it important for other users.
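For readers unfamiliar with Squares, here is a rough vectorized sketch of the four-round, 32-bit-output variant with a single 64-bit counter, as I understand it from the paper. It is not the code in cbrng.py: the PR uses a 128-bit counter, the function name squares32 is only illustrative, and the key below is just an example value (the paper describes how suitable keys should be constructed).

```python
import numpy

# Example key (assumption); real use should follow the paper's key construction.
KEY = numpy.uint64(0x548C9DECBCE65297)
_S32 = numpy.uint64(32)

def squares32(counter, key=KEY):
    """Four-round Squares on an array of 64-bit counters, returning uint32."""
    ctr = numpy.atleast_1d(numpy.asarray(counter, dtype=numpy.uint64))
    y = ctr * key              # uint64 array arithmetic wraps modulo 2**64
    z = y + key
    x = y.copy()
    x = x * x + y
    x = (x >> _S32) | (x << _S32)   # round 1: square, add, swap halves
    x = x * x + z
    x = (x >> _S32) | (x << _S32)   # round 2
    x = x * x + y
    x = (x >> _S32) | (x << _S32)   # round 3
    return ((x * x + z) >> _S32).astype(numpy.uint32)  # round 4

# Map the 32-bit outputs to uniform floats in [0, 1).
counters = numpy.arange(8, dtype=numpy.uint64)
uniform = squares32(counters) / 2**32
```

Because the output depends only on the counter and the key, evaluating any subset of counters gives the same values as evaluating them all at once.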
Key points:
- event_number parameter on build(): when provided, uses CBRNG; when omitted (None), falls back to the legacy RNG. Fully backwards compatible.
- 128-bit counter: [event_number(64), phi_bits(32) << 32 | eta_bits(32)]

Files changed
- src/coffea/jetmet_tools/CorrectedJetsFactory.py: added event_number/seeds parameters to build(), CBRNG path alongside legacy RNG
- src/coffea/jetmet_tools/cbrng.py: new module implementing the Squares CBRNG algorithm
- tests/test_jetmet_correctionlib_adapters.py: added test_corrected_jets_factory_cbrng verifying determinism and partition independence

Test plan
- test_corrected_jets_factory_cbrng: verifies that the same jets produce the same smearing across calls
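The kind of check described in the test plan can be sketched without coffea. Everything below is hypothetical and only mirrors the opt-in pattern of the PR: jet_random_numbers is not a coffea function, float32 phi/eta inputs are an assumption, and the splitmix64 finalizer stands in for the Squares generator.

```python
import numpy

def _mix64(x):
    """splitmix64 finalizer on uint64 arrays; a stand-in for Squares."""
    x = (x ^ (x >> numpy.uint64(30))) * numpy.uint64(0xBF58476D1CE4E5B9)
    x = (x ^ (x >> numpy.uint64(27))) * numpy.uint64(0x94D049BB133111EB)
    return x ^ (x >> numpy.uint64(31))

def jet_random_numbers(phi, eta, event_number=None, legacy_rng=None):
    """Uniform numbers in [0, 1), one per jet (hypothetical helper).

    With event_number given, the numbers depend only on per-jet quantities;
    with event_number=None, fall back to a stateful generator.
    """
    phi = numpy.asarray(phi, dtype=numpy.float32)
    eta = numpy.asarray(eta, dtype=numpy.float32)
    if event_number is None:
        rng = legacy_rng if legacy_rng is not None else numpy.random.default_rng()
        return rng.random(len(phi))
    ev = numpy.asarray(event_number, dtype=numpy.uint64)
    phi_bits = numpy.round(phi, 3).view(numpy.uint32).astype(numpy.uint64)
    eta_bits = numpy.round(eta, 3).view(numpy.uint32).astype(numpy.uint64)
    word = (phi_bits << numpy.uint64(32)) | eta_bits
    mixed = _mix64(_mix64(ev) ^ word)
    # Keep the top 53 bits so the result is exactly representable in float64.
    return (mixed >> numpy.uint64(11)) * 2.0**-53

def test_partition_independence():
    ev = numpy.array([1, 1, 2, 3, 3, 3], dtype=numpy.uint64)
    phi = numpy.array([0.1, -1.2, 2.0, 0.5, -0.5, 3.0], dtype=numpy.float32)
    eta = numpy.array([1.0, -2.0, 0.3, 0.0, 1.1, -1.1], dtype=numpy.float32)
    full = jet_random_numbers(phi, eta, event_number=ev)
    parts = numpy.concatenate([
        jet_random_numbers(phi[:2], eta[:2], event_number=ev[:2]),
        jet_random_numbers(phi[2:], eta[2:], event_number=ev[2:]),
    ])
    # Same jets, same numbers: across repeated calls and across chunkings.
    assert numpy.array_equal(full, jet_random_numbers(phi, eta, event_number=ev))
    assert numpy.array_equal(full, parts)
```

The actual test in the PR exercises CorrectedJetsFactory.build() rather than a standalone helper like this one.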