Open
Description
Problem
Input data with scale_factor/add_offset attributes (zarr v2 format) loses encoding during conversion, resulting in uint16 → float64 data type promotion.
Root Cause
- Original data in Zarr sample service has scale_factor: 0.0001, add_offset: -0.1, dtype: "<u2" (uint16)
- Conversion process doesn't detect these v2 attributes to create zarr v3 numcodecs.fixedscaleoffsetcodec
- Encoding propagation in create_measurements_encoding() overwrites with simple compressor, losing scale/offset configuration
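A minimal sketch of the missing detection step, assuming the v2 attributes are available as a plain dict. The function name `fixed_scale_offset_params` and its return shape are hypothetical and not part of eopf_geozarr; the sketch only derives the numbers a FixedScaleOffset codec would need and does not construct the codec itself.

```python
def fixed_scale_offset_params(attrs, stored_dtype):
    """Derive FixedScaleOffset-style parameters from CF-style v2 attributes.

    Hypothetical helper for illustration only. Returns None when the
    array carries neither scale_factor nor add_offset.
    """
    if "scale_factor" not in attrs and "add_offset" not in attrs:
        return None

    # A missing attribute falls back to the identity transform.
    scale_factor = float(attrs.get("scale_factor", 1.0))
    add_offset = float(attrs.get("add_offset", 0.0))
    if scale_factor == 0.0:
        raise ValueError("scale_factor must be non-zero")

    return {
        # CF decoding is raw * scale_factor + add_offset, while
        # FixedScaleOffset decodes as raw / scale + offset, so the
        # codec scale is the reciprocal of scale_factor.
        "offset": add_offset,
        "scale": 1.0 / scale_factor,
        # In numcodecs, astype names the encoded (stored) type; the
        # decoded type (dtype) is left for the caller to choose.
        "astype": stored_dtype,
    }


params = fixed_scale_offset_params(
    {"scale_factor": 0.0001, "add_offset": -0.1}, "<u2"
)
```

For the attributes listed above, this would yield an offset of -0.1 and a scale of approximately 10000, with the stored type kept as "<u2".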
Expected Behavior
Automatically convert zarr v2 scale/offset attributes to zarr v3 FixedScaleOffset codec:
numcodecs.zarr3.FixedScaleOffset(
    offset=-0.1,
    scale=10000,  # 1/0.0001
    dtype='uint16',
    astype='uint16'
)
Files Affected
- src/eopf_geozarr/s2_optimization/s2_multiscale.py (lines 319-329)
- src/eopf_geozarr/conversion/geozarr.py (encoding functions)
Impact
- Data type inflation (uint16 → float64)
- Loss of compression efficiency
- Incorrect data representation in output zarr v3 files
Notes
- For some reason, xarray still decodes the data properly, probably because it reads the scale_factor/add_offset attributes and applies the v2-style decoding to the v3 data.
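The arithmetic behind this note can be checked without any Zarr tooling. The sketch below uses plain Python and the values from this issue to compare attribute-based (CF-style) decoding with the FixedScaleOffset formula. The function names are illustrative only, and the codec formulas are written out by hand rather than called from numcodecs.

```python
SCALE_FACTOR = 0.0001
ADD_OFFSET = -0.1

# Codec parameters derived from the attributes.
SCALE = 1.0 / SCALE_FACTOR
OFFSET = ADD_OFFSET


def decode_cf(raw):
    """Attribute-based decoding: raw * scale_factor + add_offset."""
    return raw * SCALE_FACTOR + ADD_OFFSET


def decode_fso(raw):
    """FixedScaleOffset-style decoding: raw / scale + offset."""
    return raw / SCALE + OFFSET


def encode_fso(value):
    """FixedScaleOffset-style encoding: round((value - offset) * scale)."""
    return int(round((value - OFFSET) * SCALE))


# Compare both decoders over the ends and middle of the uint16 range.
for raw in (0, 1000, 65535):
    print(raw, decode_cf(raw), decode_fso(raw), encode_fso(decode_fso(raw)))
```

Both decoders agree up to floating-point rounding, and encoding the decoded value recovers the stored integer. That is consistent with the attributes alone being enough for a reader to reconstruct the physical values.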