-
Notifications
You must be signed in to change notification settings - Fork 85
fix: allow ATLAS-style file names for ROOT files #1472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The ATLAS production system produces ROOT files with names like "something.root.1" (or "something.root.2" in case the job is the second attempt at producing the file, and so on). The suffix is always "." followed by an integer. In order to ensure that these files are also correctly treated by uproot when they're opened using the file:object style of calling open(), we need either a more general regex or an alternative regex pattern. Here I'm adding an alternative to be as specific as possible about what the alternative should be. There are other ways to solve this problem, but I don't think naming a bajillion files in our production system, or changing the behavior of PanDA, is going to be a viable solution, so better to parse things this way. Very fundamentally, I don't quite understand why this regex is necessary. One could simply split on the colon and then try/except based on the first part being a file name and the second part being an object in that file. Since you look for the magic in the file anyway, I'm not sure what this function provides for you except perhaps tidier error messages, at the cost of this sort of difficulty in case file naming is abnormal (to be clear, I recognize that this file naming of PanDA's is really, really not nice, and ROOT itself has broken opening files on occasion because of it).
@ariostas Indeed ATLAS files are like that. Uproot should treat this imo. |
Okay, sounds good! I think we should just accept any file extensions and not even bother checking the presence of ".root", as you mentioned. There might have been a reason behind it, but if all tests pass then it's probably fine. |
Hi @ariostas , That'd be fine by me — is that something you'd like me to try in this or a separate PR, or something you'd like to do yourself? Best, |
@zlmarshall you can try it in this PR if you're up for it! (and we'll just change the title if it works) |
Ok, gave it a try; let me know if that works for you. |
Thank you @zlmarshall! That didn't work, which now makes sense to me, since I see that things like |
Suggested change (including ".root" somewhere in the file name) works for me! Co-authored-by: Andres Rios Tascon <[email protected]>
@zlmarshall seems like there's more complex cases, so we should restrict it back to what you were originally proposing. The regex can be object_regex = re.compile(r"(.+\.root(\.[0-9]+)?):(.*$)", re.IGNORECASE) and you just have to change Also, you'll have to delete this line in the tests, since this would be a valid input now. uproot5/tests/test_0976_path_object_split.py Line 205 in 731e3ba
|
The ATLAS production system produces ROOT files with names like "something.root.1" (or "something.root.2" in case the job is the second attempt at producing the file, and so on). The suffix is always "." followed by an integer. In order to ensure that these files are also correctly treated by uproot when they're opened using the file:object style of calling open(), we need either a more general regex or an alternative regex pattern. Here I'm adding an alternative to be as specific as possible about what the alternative should be. There are other ways to solve this problem, but I don't think naming a bajillion files in our production system, or changing the behavior of PanDA, is going to be a viable solution, so better to parse things this way.
Very fundamentally, I don't quite understand why this regex is necessary. One could simply split on the colon and then try/except based on the first part being a file name and the second part being an object in that file. Since you look for the magic in the file anyway, I'm not sure what this function provides for you except perhaps tidier error messages, at the cost of this sort of difficulty in case file naming is abnormal (to be clear, I recognize that this file naming of PanDA's is really not nice, and ROOT itself has broken opening files on occasion because of it).