feat: add tree to virtual array conversion #1393

Draft: wants to merge 25 commits into base: main

pfackeldey
Collaborator

This waits for scikit-hep/awkward#3364 (and a corresponding awkward release).

I have likely not found the optimal solution here. Happy to get feedback and input.

@ianna
Collaborator

ianna commented Apr 11, 2025

@pfackeldey - what is the plan for this PR? thanks!

@ikrommyd
Contributor

> @pfackeldey - what is the plan for this PR? thanks!

I think we were discussing with Peter adding some caching support. Uproot will deserialize each electron branch, for example, separately with its own offsets. All of those share the same count_branch, however, so it's probably best not to deserialize the same offsets dozens of times. It would be better to cache the result of deserializing the count_branch (length) and reuse it for the other branches that have the same count_branch.
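
For illustration only, here is a minimal sketch of the caching idea above. It is not Uproot's actual code: the `OffsetsCache` class, the `read_counts` callable, and the branch names (`nElectron`, `Electron_pt`, ...) are hypothetical stand-ins. It assumes a count branch yields one count per event and that the offsets are the cumulative sum of those counts. The `reads` counter shows one possible way to log how often each count branch is actually deserialized.

```python
from collections import Counter
from itertools import accumulate
from typing import Callable, Dict, List


class OffsetsCache:
    """Hypothetical helper: memoize offsets per count branch."""

    def __init__(self, read_counts: Callable[[str], List[int]]):
        # read_counts is a stand-in for the real (expensive) deserialization.
        self._read_counts = read_counts
        self._offsets: Dict[str, List[int]] = {}
        # Number of real deserializations per count branch.
        self.reads: Counter = Counter()

    def offsets(self, count_branch: str) -> List[int]:
        # Deserialize only on the first request for this count branch.
        if count_branch not in self._offsets:
            self.reads[count_branch] += 1
            counts = self._read_counts(count_branch)
            # Offsets are the running sum of the counts, starting at 0.
            self._offsets[count_branch] = [0, *accumulate(counts)]
        return self._offsets[count_branch]


# Toy data: three events with 2, 0 and 3 electrons.
fake_file = {"nElectron": [2, 0, 3]}
cache = OffsetsCache(lambda name: fake_file[name])

# Several data branches that share the same count branch.
for data_branch in ("Electron_pt", "Electron_eta", "Electron_phi"):
    offsets = cache.offsets("nElectron")

print(offsets)            # [0, 2, 2, 5]
print(dict(cache.reads))  # {'nElectron': 1}
```

Keying the cache on the count branch rather than on the data branch is what lets all electron branches share one deserialization in this sketch.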

@ariostas
Collaborator

> I think we were discussing with Peter to add some caching support.

Isn't there already some caching being done in Uproot? When trying to read the count_branch multiple times, it should already be hitting the cache.

@ikrommyd
Contributor

ikrommyd commented Apr 11, 2025

> > I think we were discussing with Peter to add some caching support.
>
> Isn't there already some caching being done in Uproot? When trying to read the count_branch multiple times it should already be hitting the cache

Yeah, I need to check whether it's hitting it; I haven't done that yet. Will do today. Do you know the best way to log that (the number of deserializations per branch)?

@ariostas
Collaborator

> Do you know the best way to log that (the number of deserializations per branch)?

I'm not sure. I've only skimmed the code, since at some point I'll have to do that for RNTuples.

@pfackeldey
Collaborator Author

> @pfackeldey - what is the plan for this PR? thanks!

I'm not sure. I'm not a big fan of this implementation, but I also don't know how it can be done in a better way. I was hoping for some input here.

@ianna added the inactive and help wanted labels and removed the inactive label on Apr 17, 2025
pre-commit-ci bot and others added 7 commits April 23, 2025 14:37
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.4 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.4...v0.11.5)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Updated Pyodide version

* Pinned chrome version

* Changed chrome version

* Try using node instead of chrome

* Remove chrome-specific setup

* Actually use Node

* Go back to chrome
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.6](astral-sh/ruff-pre-commit@v0.11.5...v0.11.6)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
fix issue with empty big_endian array

Co-authored-by: Ianna Osborne <[email protected]>
* safer branch title access

* empty str -> None

---------

Co-authored-by: Ianna Osborne <[email protected]>
* docs: add contributing guide

* style: pre-commit fixes

* Update CONTRIBUTING.md

Co-authored-by: Andres Rios Tascon <[email protected]>

* Update CONTRIBUTING.md

Co-authored-by: Andres Rios Tascon <[email protected]>

* Update CONTRIBUTING.md

Co-authored-by: Andres Rios Tascon <[email protected]>

* use pre-commit

* build local documentation howto

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Andres Rios Tascon <[email protected]>
@pfackeldey pfackeldey requested a review from ianna April 23, 2025 18:37
@pfackeldey
Collaborator Author

Hi @ianna,
I finally found a good implementation!
It now handles every awkward Content case programmatically. I could reuse logic similar to that of uproot.dask.
Could you have a look?

@pfackeldey pfackeldey marked this pull request as ready for review April 23, 2025 18:40
@pfackeldey removed the help wanted label on Apr 24, 2025
Collaborator

@ianna ianna left a comment

@pfackeldey - Thanks! I would rather avoid duplicating the code. What was the reason for copying the form_with_unique_keys utility function here? Thanks.

@pfackeldey
Collaborator Author

pfackeldey commented Apr 25, 2025

Needs: scikit-hep/awkward#3482 and thus a new awkward release

@pfackeldey pfackeldey marked this pull request as draft April 25, 2025 17:24
@pfackeldey pfackeldey marked this pull request as ready for review April 27, 2025 18:36
@pfackeldey pfackeldey marked this pull request as draft April 27, 2025 20:59
4 participants