
feat: add BufferManager for more fine-grained control over VirtualArrays#1530

Open
pfackeldey wants to merge 2 commits into scikit-hep:master from pfackeldey:pfackeldey/buffer_manager

Conversation

@pfackeldey
Collaborator

@pfackeldey pfackeldey commented Mar 10, 2026

This PR adds a BufferManager class that users can opt into for more fine-grained control over virtual arrays and their buffers/generators. It only works when events is constructed with a buffer_cache and mode="virtual".

The BufferManager currently enables two things:

  1. You can "unmaterialize" arrays at any time and thus free memory. Otherwise, those buffers would only be cleaned up once events itself is cleaned up. Accessing a cleared array again is no problem, but likely involves re-reading from disk.
```python
def process(self, events):
    manager = coffea.nanoevents.BufferManager(events)

    jets = events.Jet[events.Jet.pt > 50.0]

    # frees the underlying memory of the materialized buffers of `events.Jet.pt`
    manager.clear(events.Jet.pt)
```

This is useful for more fine-grained control over memory usage: users can free memory more aggressively and get rid of buffers that are, e.g., used once and never again.
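The PR does not show BufferManager's internals, so here is a toy mental model of the materialize-on-access plus explicit-clear idea (all names here are hypothetical, not coffea API):

```python
# Illustrative sketch only: models lazy materialization with an explicit
# clear() that frees memory, after which a later access re-reads the data.
class ToyBufferManager:
    def __init__(self, loader):
        self._loader = loader  # callable: name -> buffer (e.g. reads from disk)
        self._cache = {}       # materialized buffers

    def get(self, name):
        # materialize on first access, then serve from the cache
        if name not in self._cache:
            self._cache[name] = self._loader(name)
        return self._cache[name]

    def clear(self, name):
        # drop the materialized buffer; the next get() re-materializes it
        self._cache.pop(name, None)


reads = []

def loader(name):
    reads.append(name)          # count how often we hit "disk"
    return [1, 2, 3]

mgr = ToyBufferManager(loader)
mgr.get("Jet.pt")   # first access materializes (1st read)
mgr.get("Jet.pt")   # served from cache, no new read
mgr.clear("Jet.pt") # free the memory
mgr.get("Jet.pt")   # re-materializes (2nd read)
print(len(reads))   # -> 2
```

The key trade-off this models: clear() buys memory back at the cost of a possible re-read on the next access.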

  2. You can start prefetching buffers in background thread(s), similar to batch prefetching in ML dataloaders, just with buffers for our awkward array fields. This is useful for running IO-bound and CPU-bound steps concurrently.
```python
def process(self, events):
    manager = coffea.nanoevents.BufferManager(events)

    # start prefetching all `events.Jet` columns in background threads
    with manager.prefetch(events.Jet, nthreads=4):
        out = my_analysis(events)

    return out
```
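The prefetch idea can be sketched generically as a thread pool warming a cache in the background while the main thread keeps working. This is a rough, self-contained illustration of the pattern, not the actual BufferManager implementation (the `prefetch`, `loader`, and `cache` names here are hypothetical):

```python
# Illustrative sketch only: a context manager that submits buffer reads to a
# thread pool on entry, so IO overlaps with the CPU-bound work in the body.
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

@contextmanager
def prefetch(cache, loader, names, nthreads=4):
    pool = ThreadPoolExecutor(max_workers=nthreads)
    futures = {name: pool.submit(loader, name) for name in names}
    try:
        yield  # the analysis body runs here, concurrently with the IO above
    finally:
        # make sure every prefetched buffer lands in the cache
        for name, fut in futures.items():
            cache[name] = fut.result()
        pool.shutdown()


cache = {}

def loader(name):
    return name.upper()  # stand-in for reading a buffer from disk

with prefetch(cache, loader, ["Jet.pt", "Jet.eta"], nthreads=2):
    pass  # CPU-bound analysis would go here

print(sorted(cache))  # -> ['Jet.eta', 'Jet.pt']
```

Because Python IO releases the GIL, this kind of thread-based prefetch can hide read latency even without free-threaded Python.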

Prefetching appears to be quite efficient in terms of runtime; see the following benchmarks:

Benchmark setup:

  • Code: https://github.com/ikrommyd/virtual-array-agc/blob/main/ttbar_analysis_pipeline.ipynb
  • Input file (copied locally): root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/130000/009086DB-1E42-7545-9A35-1433EC89D04B.root
  • Single-core 'emulation' with: `cpulimit -l 100 -- python ...`
  • Benchmarked with `hyperfine --warmup 3 ...`

Results:

| Chunk size | Default (no prefetch) | NanoEvents' `preload=` | 1 thread prefetch | 4 threads prefetch |
|------------|-----------------------|------------------------|-------------------|--------------------|
| 100k       | 4.758 s ± 0.073 s     | 4.741 s ± 0.050 s      | 4.471 s ± 0.068 s | 4.349 s ± 0.055 s  |
| 500k       | 8.193 s ± 0.190 s     | 8.084 s ± 0.049 s      | 6.863 s ± 0.123 s | 6.180 s ± 0.098 s  |
| 1M         | 12.076 s ± 0.107 s    | 12.400 s ± 0.332 s     | 9.961 s ± 0.232 s | 8.549 s ± 0.175 s  |

NanoEventsFactory's preload= mainly helps when streaming over the network, which is why it shows little benefit here (the input file is local).

I could not try prefetching with free-threaded Python yet because not all dependencies support it, but the results should only improve there.

To make proper use of this, the buffer cache PR #1508 needs to go in first.

@ikrommyd
Collaborator

@lgray any takes on the API here?
