
feat: add BufferManager for more fine-grained control over VirtualArrays#1530

Open
pfackeldey wants to merge 2 commits into scikit-hep:master from pfackeldey:pfackeldey/buffer_manager

Conversation

@pfackeldey
Collaborator

@pfackeldey pfackeldey commented Mar 10, 2026

This PR adds a BufferManager class that users can opt into for more fine-grained control over virtual arrays and their buffers/generators. It only works when events is constructed with a buffer_cache and mode="virtual".

The BufferManager currently enables two things:

  1. You can "unmaterialize" arrays at any time and thus free memory. Otherwise, those buffers would only be cleaned up once events itself is cleaned up. Accessing a cleared array again is no problem, but likely involves re-reading from disk.
```python
def process(self, events):
    manager = coffea.nanoevents.BufferManager(events)

    jets = events.Jet[events.Jet.pt > 50.0]

    # frees the underlying memory of the materialized buffers of `events.Jet.pt`
    manager.clear(events.Jet.pt)
```

This is useful for more fine-grained control over memory usage: users can free memory more aggressively and get rid of buffers that are, e.g., used once and never again.
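The PR does not show BufferManager's internals, so here is a toy mental model of the materialize-on-access plus explicit-clear idea (all names here are hypothetical, not coffea API):

```python
# Illustrative sketch only: models lazy materialization with an explicit
# clear() that frees memory, after which a later access re-reads the data.
class ToyBufferManager:
    def __init__(self, loader):
        self._loader = loader  # callable: name -> buffer (e.g. reads from disk)
        self._cache = {}       # materialized buffers

    def get(self, name):
        # materialize on first access, then serve from the cache
        if name not in self._cache:
            self._cache[name] = self._loader(name)
        return self._cache[name]

    def clear(self, name):
        # drop the materialized buffer; the next get() re-materializes it
        self._cache.pop(name, None)


reads = []

def loader(name):
    reads.append(name)          # count how often we hit "disk"
    return [1, 2, 3]

mgr = ToyBufferManager(loader)
mgr.get("Jet.pt")   # first access materializes (1st read)
mgr.get("Jet.pt")   # served from cache, no new read
mgr.clear("Jet.pt") # free the memory
mgr.get("Jet.pt")   # re-materializes (2nd read)
print(len(reads))   # -> 2
```

The key trade-off this models: clear() buys memory back at the cost of a possible re-read on the next access.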

  2. You can start prefetching buffers in background thread(s), similar to batch prefetching in ML dataloaders, just with buffers for our awkward array fields. This is useful for running IO-bound and CPU-bound steps concurrently.
```python
def process(self, events):
    manager = coffea.nanoevents.BufferManager(events)

    # start prefetching all `events.Jet` columns in background threads
    with manager.prefetch(events.Jet, nthreads=4):
        out = my_analysis(events)

    return out
```
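The prefetch idea can be sketched generically as a thread pool warming a cache in the background while the main thread keeps working. This is a rough, self-contained illustration of the pattern, not the actual BufferManager implementation (the `prefetch`, `loader`, and `cache` names here are hypothetical):

```python
# Illustrative sketch only: a context manager that submits buffer reads to a
# thread pool on entry, so IO overlaps with the CPU-bound work in the body.
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

@contextmanager
def prefetch(cache, loader, names, nthreads=4):
    pool = ThreadPoolExecutor(max_workers=nthreads)
    futures = {name: pool.submit(loader, name) for name in names}
    try:
        yield  # the analysis body runs here, concurrently with the IO above
    finally:
        # make sure every prefetched buffer lands in the cache
        for name, fut in futures.items():
            cache[name] = fut.result()
        pool.shutdown()


cache = {}

def loader(name):
    return name.upper()  # stand-in for reading a buffer from disk

with prefetch(cache, loader, ["Jet.pt", "Jet.eta"], nthreads=2):
    pass  # CPU-bound analysis would go here

print(sorted(cache))  # -> ['Jet.eta', 'Jet.pt']
```

Because Python IO releases the GIL, this kind of thread-based prefetch can hide read latency even without free-threaded Python.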

Prefetching appears to be quite efficient in terms of runtime; see the following benchmarks:

Benchmark setup:

  • Code: https://github.com/ikrommyd/virtual-array-agc/blob/main/ttbar_analysis_pipeline.ipynb
  • Input file (copied locally): root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/130000/009086DB-1E42-7545-9A35-1433EC89D04B.root
  • Single-core 'emulation' with: `cpulimit -l 100 -- python ...`
  • Benchmarked with `hyperfine --warmup 3 ...`

Results:

| Chunk size | Default (no prefetch) | NanoEvents' `preload=` | 1 thread prefetch | 4 threads prefetch |
|------------|-----------------------|------------------------|-------------------|--------------------|
| 100k       | 4.758 s ± 0.073 s     | 4.741 s ± 0.050 s      | 4.471 s ± 0.068 s | 4.349 s ± 0.055 s  |
| 500k       | 8.193 s ± 0.190 s     | 8.084 s ± 0.049 s      | 6.863 s ± 0.123 s | 6.180 s ± 0.098 s  |
| 1M         | 12.076 s ± 0.107 s    | 12.400 s ± 0.332 s     | 9.961 s ± 0.232 s | 8.549 s ± 0.175 s  |

NanoEventsFactory's preload= mainly helps when streaming over the network, which is why it shows little benefit here (the input file is local).

I could not try prefetching with free-threaded Python yet because not all dependencies support it, but the results should only improve there.

To make proper use of this, the buffer cache PR #1508 needs to go in first.

@ikrommyd
Collaborator

@lgray any takes on the API here?
