Skip to content

fix: allow ATLAS-style file names for ROOT files #1472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

zlmarshall
Copy link

The ATLAS production system produces ROOT files with names like "something.root.1" (or "something.root.2" in case the job is the second attempt at producing the file, and so on). The suffix is always "." followed by an integer. In order to ensure that these files are also correctly treated by uproot when they're opened using the file:object style of calling open(), we need either a more general regex or an alternative regex pattern. Here I'm adding an alternative to be as specific as possible about what the alternative should be. There are other ways to solve this problem, but I don't think naming a bajillion files in our production system, or changing the behavior of PanDA, is going to be a viable solution, so better to parse things this way.

Very fundamentally, I don't quite understand why this regex is necessary. One could simply split on the colon and then try/except based on the first part being a file name and the second part being an object in that file. Since you look for the magic in the file anyway, I'm not sure what this function provides for you except perhaps tidier error messages, at the cost of this sort of difficulty in case file naming is abnormal (to be clear, I recognize that this file naming of PanDA's is really not nice, and ROOT itself has broken opening files on occasion because of it).

Zach Marshall and others added 2 commits July 22, 2025 00:44
The ATLAS production system produces ROOT files with names like
"something.root.1" (or "something.root.2" in case the job is the second
attempt at producing the file, and so on). The suffix is always "."
followed by an integer. In order to ensure that these files are also
correctly treated by uproot when they're opened using the file:object
style of calling open(), we need either a more general regex or an
alternative regex pattern. Here I'm adding an alternative to be as
specific as possible about what the alternative should be. There are
other ways to solve this problem, but I don't think naming a bajillion
files in our production system, or changing the behavior of PanDA, is
going to be a viable solution, so better to parse things this way.

Very fundamentally, I don't quite understand why this regex is
necessary. One could simply split on the colon and then try/except based
on the first part being a file name and the second part being an object
in that file. Since you look for the magic in the file anyway, I'm not
sure what this function provides for you except perhaps tidier error
messages, at the cost of this sort of difficulty in case file naming is
abnormal (to be clear, I recognize that this file naming of PanDA's is
really, really not nice, and ROOT itself has broken opening files on occasion
because of it).
@ikrommyd
Copy link
Contributor

@ariostas Indeed ATLAS files are like that. Uproot should treat this imo.

@ariostas
Copy link
Collaborator

Okay, sounds good! I think we should just accept any file extensions and not even bother checking the presence of ".root", as you mentioned. There might have been a reason behind it, but if all tests pass then it's probably fine.

@zlmarshall
Copy link
Author

Hi @ariostas ,

That'd be fine by me — is that something you'd like me to try in this or a separate PR, or something you'd like to do yourself?

Best,
Zach

@ariostas
Copy link
Collaborator

@zlmarshall you can try it in this PR if you're up for it! (and we'll just change the title if it works)

@zlmarshall
Copy link
Author

Ok, gave it a try; let me know if that works for you.

@zlmarshall zlmarshall changed the title Allowing ATLAS-style file names for ROOT files feat: Allowing ATLAS-style file names for ROOT files Jul 22, 2025
@ariostas
Copy link
Collaborator

Thank you @zlmarshall! That didn't work, which now makes sense to me, since I see that things like http://... are failing to be parsed correctly.

@ariostas ariostas changed the title feat: Allowing ATLAS-style file names for ROOT files fix: allow ATLAS-style file names for ROOT files Jul 22, 2025
Suggested change (including ".root" somewhere in the file name) works for me!

Co-authored-by: Andres Rios Tascon <[email protected]>
@ariostas
Copy link
Collaborator

@zlmarshall seems like there's more complex cases, so we should restrict it back to what you were originally proposing.

The regex can be

object_regex = re.compile(r"(.+\.root(\.[0-9]+)?):(.*$)", re.IGNORECASE)

and you just have to change match.group(2) to match.group(3) in line 320.

Also, you'll have to delete this line in the tests, since this would be a valid input now.

"http://xcacheserver:8762//https://originserver:4212/path/file.root.1:CollectionTree",

@ianna ianna added the next-release Required for the next release label Jul 24, 2025
@ariostas ariostas requested a review from ianna July 25, 2025 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next-release Required for the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants