Skip to content

refactor: increased code sharing between CPU and GPU interpretation in RNTuple reading #1470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

fstrug
Copy link
Contributor

@fstrug fstrug commented Jul 17, 2025

Much of the code used to read RNTuple data via cpu and gpu interpretation is common but contained in separate functions. This means any changes made in one workflow will not be automatically reflected in the other. This pr increases function sharing between the two interpretation modes such that any code improvements are automatically shared. A FieldClusterMetadata class has been added to contain metadata used in the reading and deserializing of raw page data.

I have also changed the arrays() argument use_GDS to interpreter since it is not strictly necessary to have GDS support for gpu interpretation of RNTuple data and is more reflective of what the argument is controlling.

@fstrug fstrug changed the title fix: increased code sharing between CPU and GPU interpretation of RNTuple reading fix: increased code sharing between CPU and GPU interpretation in RNTuple reading Jul 17, 2025
…tation of RNTuple data. Changed arrays() argument 'use_GDS' to 'interpreter'.
@ianna ianna added the next-release Required for the next release label Jul 24, 2025
Copy link
Collaborator

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fstrug - nice! Please check some minor comments. Thanks.

@fstrug fstrug changed the title fix: increased code sharing between CPU and GPU interpretation in RNTuple reading refactor: increased code sharing between CPU and GPU interpretation in RNTuple reading Jul 24, 2025
Copy link
Collaborator

@ariostas ariostas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, Frank! This looks great to me. I like how you factored out some pieces to make the code cleaner.

The only comment I have is that I'm not very sure about the interpreter keyword argument, since I think it's a bit ambiguous. It might make sense to implement decompression_executor and interpretation_executor like how is done for TTrees, but I'm not familiar with those, so I'd have to look into that. So that can just be a follow-up.

At some point we'll have to sit down and fix all the keyword arguments for the common methods (arrays, iterate, etc). But I think it's still okay if we change some of the lesser used keyword arguments while we figure things out and people are not regularly working with RNTuples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next-release Required for the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants