[io] Properly abort when buffer size overflows max integer or size > maxBufferSize #19606
base: master
Conversation
Can you make a separate PR with only the change in variable name (…)?
Test Results: 20 files, 20 suites, 3d 6h 24m 9s ⏱️ For more details on these failures, see this check. Results for commit eb0dc82. ♻️ This comment has been updated with latest results.
Cool, many thanks!

I think generally we want to use `std::size_t` for buffer sizes instead of `Long64_t`.

We should probably also update `TBuffer::[Read|Write]Buf`, `TBuffer::ReadString`, `TBuffer::MapObject`, `TBuffer::[Check|Set]ByteCount`, and `TBuffer::SetBufferDisplacement`. Maybe also `TBuffer::[Read|Write]Clones`.

Sometimes we are checking for `kMaxBufferSize`, sometimes for `kMaxInt`. Shouldn't we always check for `kMaxBufferSize`?
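As a sketch of the check discussed here, consistently testing against `kMaxBufferSize` could look like the following. The constant values and the helper name are illustrative stand-ins, not ROOT's actual definitions:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the constants mentioned above; the real
// definitions live in ROOT and may differ.
constexpr std::int64_t kMaxInt        = std::numeric_limits<std::int32_t>::max();
constexpr std::int64_t kMaxBufferSize = 0x7FFFFFFE; // illustrative value

// As long as kMaxBufferSize <= kMaxInt, a single check against
// kMaxBufferSize subsumes the kMaxInt check.
static_assert(kMaxBufferSize <= kMaxInt, "buffer-size check subsumes the int check");

// Accept a 64-bit size parameter, but reject anything the buffer layer
// cannot actually represent.
bool CheckBufferSize(std::int64_t bufsize)
{
   return bufsize >= 0 && bufsize <= kMaxBufferSize;
}
```

With this shape, the wide parameter type and the narrow capacity limit are decoupled: callers can pass a `Long64_t`/`std::int64_t`, and oversized requests are rejected in one place.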
Regarding commit messages, I'd suggest
[NFC] remove unused headers
and
[io] accept and check long buffer size params
with an explanation why we (at this point) allow for long buffer sizes but then abort when they are actually used.
@pcanal: do we have an indication that the optimization of initializing the buffer size to the average buffer size seen so far in the file is actually useful? There are certainly write patterns where it hurts rather than helps. Removing this optimization would get us a fair amount of simplification in a number of read/write APIs.
```cpp
static void SetFileReadCalls(Long64_t readcalls = 0);
static void SetReadaheadSize(Long64_t bytes = 256000);
```
Maybe put those in another commit/PR.
Even if that changes the data type from signed to unsigned? Also, wouldn't it be better to have `ULong64_t` instead of `std::size_t`? `size_t` is `unsigned int` on 32-bit targets and `unsigned long long` or `unsigned long` on 64-bit targets, depending on the C++ implementation and the compilation target.
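The width concern can be made concrete with a small sketch. The alias name is hypothetical; ROOT defines `ULong64_t` elsewhere:

```cpp
#include <cstddef>
#include <cstdint>

// std::size_t follows the target: typically 32 bits on ILP32 and 64 bits on
// LP64/LLP64 platforms (on common platforms it tracks the pointer width).
static_assert(sizeof(std::size_t) == sizeof(void *),
              "size_t tracks the pointer width on common platforms");

// A fixed-width alias in the spirit of ROOT's ULong64_t: always 64 bits,
// regardless of the target. (Hypothetical name, for illustration only.)
using ULong64Like = std::uint64_t;
static_assert(sizeof(ULong64Like) == 8, "always 64 bits");
```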
as suggested by jblomer
In my opinion, that's the point. The …
Sounds reasonable. What if a TTree in the future contains a big entry over 4GB? Does it mean that it won't be readable on 32-bit platforms? How do we error out then, or is it silently cropped? Or is it going to be emulated as several buffers one after the other?
I think generally we have to distinguish between the in-memory buffer and what's serialized to disk. If a big atomic object (e.g., a histogram) is serialized to disk, it can't be read back on 32-bit platforms. I think that's fine and unavoidable: the 32-bit machine is simply not capable enough. On disk, of course, we will need to represent the size of objects in a platform-independent way. I think that the deserialization of the object length will be the proper point to throw errors. Regarding the concrete on-disk representation, the plan is to chunk large objects into multiple keys to keep the changes to the TFile on-disk format minimal.
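A sketch of that error point, assuming a hypothetical helper (not ROOT API) that converts the platform-independent 64-bit on-disk length into an in-memory size:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// The on-disk length is stored as a fixed 64-bit quantity. Deserialization
// is the natural place to fail when the object cannot fit into this
// platform's address space (e.g., a >4GB object on a 32-bit target).
std::size_t ToInMemorySize(std::uint64_t onDiskLength)
{
   if (onDiskLength > std::numeric_limits<std::size_t>::max())
      throw std::length_error("object too large for this platform");
   return static_cast<std::size_t>(onDiskLength);
}
```

On a 64-bit target the check never fires; on a 32-bit target it turns a silent truncation into a clear error.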
This is hard to really measure for sure, as it is of course very dependent on the actual workload. When this was introduced, it was in direct reaction to issues related not only to memory fragmentation (increase of process virtual size due to the inability to re-use some memory that is just a tad too small) but also to thread scaling (by reducing the amount of memory allocation, which in most cases requires the system to take a global lock).
On the other hand, we also need to make sure we properly error out when there is a request for it ...
That is the current plan.
```diff
@@ -170,8 +170,8 @@ class TObject {
    virtual void SetDrawOption(Option_t *option=""); // *MENU*
    virtual void SetUniqueID(UInt_t uid);
    virtual void UseCurrentStyle();
-   virtual Int_t Write(const char *name = nullptr, Int_t option = 0, Int_t bufsize = 0);
-   virtual Int_t Write(const char *name = nullptr, Int_t option = 0, Int_t bufsize = 0) const;
+   virtual Int_t Write(const char *name = nullptr, Int_t option = 0, Long64_t bufsize = 0);
```
That might be necessary, but it is a serious problem. This function is overridden a lot, both in our code and very possibly in user code. Unless those users have upgraded their code to use the `override` keyword (which is unlikely in my opinion), their code will compile correctly but do the wrong thing (revert to the default behavior rather than their customization ....)
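The pitfall described here can be reproduced with a minimal, self-contained example (the class names are hypothetical, not ROOT's):

```cpp
#include <cstdint>

// The base class changes the bufsize parameter from int to a 64-bit type.
struct Base {
   virtual int Write(std::int64_t bufsize) { return 0; } // was `int bufsize`
   virtual ~Base() = default;
};

// User code written against the old signature, without `override`: this now
// declares a new overload that hides Base::Write instead of overriding it.
struct UserClass : Base {
   virtual int Write(int bufsize) { return 1; } // customization, now dead code
};

// Calls through the base class silently fall back to the default behavior.
int CallThroughBase(Base &b) { return b.Write(64); }
```

Calling `CallThroughBase` on a `UserClass` instance returns 0, the base implementation, even though the user intended to customize `Write`. Had the user written `override`, the compiler would have rejected the signature mismatch instead.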
What if we implement a dummy `Int_t Write(const char *name, Int_t option, Int_t bufsize) const final;` to trigger a compilation error, or at least a warning, and avoid that silent wrong behavior?
That would indeed provoke a compilation error ....
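A sketch of the proposed dummy, with illustrative class names (not ROOT's actual `TObject`): keeping the legacy signature but sealing it with `final` turns the silent mismatch into a hard compile error.

```cpp
#include <cstdint>

struct Object {
   // New 64-bit signature that derived classes should override.
   virtual std::int64_t Write(const char *name, int option, std::int64_t bufsize)
   {
      return 0;
   }
   // Dummy with the legacy `int bufsize` signature, sealed with `final`;
   // it simply forwards to the new overload.
   virtual std::int64_t Write(const char *name, int option, int bufsize) final
   {
      return Write(name, option, static_cast<std::int64_t>(bufsize));
   }
   virtual ~Object() = default;
};

// Any derived class still using the old signature now fails to compile:
// struct User : Object {
//    // error: declaration of 'Write' overrides a 'final' function
//    std::int64_t Write(const char *name, int option, int bufsize);
// };
```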
This Pull request:
Changes or fixes:
Fixes #14770
And is a first step towards #6734