Bug: "OverflowError: Python integer xxxxxxxx out of bounds for int32" for files with segment size > max int32 value #343

@krylian

Description

Hello,

npTDMS version is 1.9.0

For the sake of scalability/stability, I am doing NI-DAQ acquisitions at a 100 kHz sampling rate over 8 analog input channels for 1 hour. The nidaqmx-python framework allows configuring TDMS logging, and it creates a single segment covering the full acquisition for all 8 channels.
So at the end, I have around 100K * 3600 * 8 samples in the segment. This is beyond the max int32 value, and npTDMS won't read the file back.
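The arithmetic above can be checked directly (figures taken from the acquisition described, so purely illustrative):

```python
# Total sample count: 100 kHz sampling rate * 3600 s * 8 channels.
samples = 100_000 * 3600 * 8
int32_max = 2**31 - 1  # 2147483647

print(samples)              # 2880000000
print(samples > int32_max)  # True: the count does not fit in an int32
```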

I traced the issue to _calculate_chunks(): data_size is a numpy int32, total_data_size is ~ 100K * 3600 * 8, and there are computations like total_data_size % data_size and total_data_size // data_size. This is where the overflow error occurs.
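A minimal reproduction of that failure mode outside of npTDMS (the sizes here are illustrative, and the behaviour assumes NumPy 2.x, where mixing a large Python int with a numpy int32 scalar raises instead of silently promoting to int64 as NumPy 1.x did):

```python
import numpy as np

data_size = np.int32(32)                  # illustrative per-row byte width
total_data_size = 100_000 * 3600 * 8 * 4  # illustrative byte count > int32 max

try:
    num_chunks = total_data_size // data_size
except OverflowError as e:
    # NumPy 2.x: "OverflowError: Python integer ... out of bounds for int32"
    print(e)
```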

  • A first fix was to use data_size.astype(int) instead of data_size in the computations (but when the file does not contain DaqmxData, data_size is not a numpy int32, so astype() fails).
  • Another idea addresses the root cause: data_size comes from the raw_data_widths slot of the DaqMxMetadata class, which is defined as "np.zeros(raw_data_widths_length, dtype=np.int32)". Changing it to int64 also fixes my issue.

But I don't know the overall impact of either of these two changes.
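A sketch of a variant of the first idea that sidesteps the astype() limitation: the built-in int() accepts both numpy integer scalars and plain Python ints, so the coercion works whether or not the file contains DAQmx data. The function name and signature below are hypothetical, not npTDMS's actual code:

```python
import numpy as np

def calculate_chunks(total_data_size, data_size):
    # Coerce to a plain Python int so the arithmetic below cannot
    # overflow; int() works for both numpy scalars and Python ints,
    # unlike .astype(), which only exists on numpy types.
    data_size = int(data_size)
    num_chunks = total_data_size // data_size
    remainder = total_data_size % data_size
    return num_chunks, remainder

# Works with a numpy int32 data_size and a total size beyond int32 range:
print(calculate_chunks(100_000 * 3600 * 8 * 4, np.int32(32)))
```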

Addendum: the issue occurs both when calling TdmsFile.open(file) and when using the tdmsinfo tool. I have seen people working with datasets of tens of GBs, so I assume they had big files but not big segments.
Reading by chunk would not help, as the failure already happens at open time.
