
FixedScaleOffset codec not preserved during zarr v2 to v3 conversion #106

@emmanuelmathot

Problem

Input data with scale_factor/add_offset attributes (zarr v2 format) loses its encoding during conversion, so the stored uint16 data is promoted to float64.

Root Cause

  • The original data in the Zarr sample service has scale_factor: 0.0001, add_offset: -0.1, dtype: "<u2" (uint16)
  • The conversion process doesn't detect these v2 attributes, so no zarr v3 numcodecs.fixedscaleoffset codec is created
  • Encoding propagation in create_measurements_encoding() overwrites the encoding with a simple compressor, losing the scale/offset configuration
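The detection and propagation steps could be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: the helper names scale_offset_filter and merge_encoding are hypothetical, the encoding keys ("filters", "compressors", "chunks") are assumed to match what the xarray/zarr v3 writer expects, and the codec is represented by its keyword arguments instead of a numcodecs object so the sketch runs without numcodecs installed. It also drops scale_factor/add_offset from the attributes, on the assumption that leaving them in place would make a CF-aware reader such as xarray apply the scaling a second time on top of the codec.

```python
from typing import Any, Dict, List, Optional, Tuple

CF_KEYS = ("scale_factor", "add_offset")


def scale_offset_filter(
    attrs: Dict[str, Any], stored_dtype: str, decoded_dtype: str = "float64"
) -> Tuple[Optional[Dict[str, Any]], Dict[str, Any]]:
    """Hypothetical helper: map v2 CF attributes to FixedScaleOffset kwargs.

    Returns (kwargs, remaining_attrs); kwargs is None if no CF keys are present.
    CF decodes as stored * scale_factor + add_offset, while numcodecs'
    FixedScaleOffset decodes as stored / scale + offset, so scale = 1 / scale_factor.
    """
    if not any(key in attrs for key in CF_KEYS):
        return None, dict(attrs)
    scale_factor = float(attrs.get("scale_factor", 1.0))
    add_offset = float(attrs.get("add_offset", 0.0))
    kwargs = {
        "offset": add_offset,
        "scale": 1.0 / scale_factor,
        "dtype": decoded_dtype,  # dtype readers see after decoding
        "astype": stored_dtype,  # dtype of the stored chunks, e.g. "uint16"
    }
    # Assumption: remove the CF keys so the decoded values are not rescaled again.
    remaining = {k: v for k, v in attrs.items() if k not in CF_KEYS}
    return kwargs, remaining


def merge_encoding(
    existing: Dict[str, Any],
    compressors: List[Any],
    filters: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: set compressors without dropping existing keys."""
    merged = dict(existing)  # shallow copy; the caller's dict is left untouched
    merged["compressors"] = compressors
    if filters:
        # Keep any filters already configured and append the new ones.
        merged["filters"] = list(existing.get("filters") or []) + list(filters)
    return merged


# Illustrative usage with the sample-service attributes from this issue.
attrs = {"scale_factor": 0.0001, "add_offset": -0.1}
kwargs, attrs = scale_offset_filter(attrs, stored_dtype="uint16")
encoding = merge_encoding(
    {},
    compressors=["<compressor>"],  # placeholder for the real compressor object
    filters=[kwargs] if kwargs else None,
)
```

In real code the filter entry would presumably be numcodecs.zarr3.FixedScaleOffset(**kwargs). The design point is that merge_encoding copies the existing encoding and only sets what it owns, instead of replacing the whole encoding as described in the root cause above.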

Expected Behavior

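Automatically convert the zarr v2 scale/offset attributes to a zarr v3 FixedScaleOffset codec that keeps uint16 on disk and decodes to floating point. In numcodecs, dtype is the dtype of the decoded array and astype the dtype of the stored chunks, so dtype should be a float type and only astype uint16:

```python
import numcodecs.zarr3

numcodecs.zarr3.FixedScaleOffset(
    offset=-0.1,      # add_offset
    scale=10000,      # 1 / scale_factor (1 / 0.0001)
    dtype="float64",  # decoded (in-memory) dtype
    astype="uint16",  # encoded (stored) dtype
)
```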
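The mapping offset = add_offset, scale = 1 / scale_factor can be checked with plain arithmetic. The sketch below reimplements in pure Python the formulas numcodecs uses for FixedScaleOffset (encode: round((x - offset) * scale), decode: x / scale + offset) and compares them with CF decoding (raw * scale_factor + add_offset), which is what xarray applies based on the v2 attributes. It does not call numcodecs itself, and the helper names are illustrative.

```python
import math

SCALE_FACTOR = 0.0001       # v2 attribute from the sample service
ADD_OFFSET = -0.1           # v2 attribute from the sample service
SCALE = 1 / SCALE_FACTOR    # FixedScaleOffset "scale"
OFFSET = ADD_OFFSET         # FixedScaleOffset "offset"


def cf_decode(raw: int) -> float:
    """CF packed-data decoding: raw * scale_factor + add_offset."""
    return raw * SCALE_FACTOR + ADD_OFFSET


def fso_decode(raw: int) -> float:
    """FixedScaleOffset-style decoding: raw / scale + offset."""
    return raw / SCALE + OFFSET


def fso_encode(value: float) -> int:
    """FixedScaleOffset-style encoding: round((value - offset) * scale)."""
    return round((value - OFFSET) * SCALE)


# Collect every uint16 value where the two decodings disagree
# or where encoding the decoded value does not give back the stored integer.
bad = [
    raw
    for raw in range(2**16)
    if not math.isclose(cf_decode(raw), fso_decode(raw), abs_tol=1e-12)
    or fso_encode(fso_decode(raw)) != raw
]
print(len(bad))  # 0
```

With these parameters, stored values 0 to 65535 decode to -0.1 to 6.4535 in steps of 0.0001, and every stored integer survives a decode/encode round trip.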

Files Affected

  • src/eopf_geozarr/s2_optimization/s2_multiscale.py (lines 319-329)
  • src/eopf_geozarr/conversion/geozarr.py (encoding functions)

Impact

  • Data type inflation (uint16 → float64, i.e. 2 → 8 bytes per value, 4× the uncompressed size)
  • Loss of compression efficiency
  • Incorrect data representation in output zarr v3 files

Notes

  • For some reason, xarray still decodes the data correctly, probably because it reads the scale_factor/add_offset attributes and applies the v2-style (CF) scaling to the v3 arrays as well.

cc @vincentsarago
