Hi,
This is a follow-up to #131 (and an updated version of #132).
While comparing the source and target zarr stores from my regression tests, I noticed that the fill_value differs between my source and target data. I guess it is not preserved by the rechunk, and this can lead to output stores that are much larger than needed.
I've updated my previous test script to create a degenerate case in which almost every value being rechunked is the same.
If you run this script, you will see that the fill_value in "foo/bar/.zarray" changes from "fill_value": 1.0 to "fill_value": null between the source and target zarr stores. The on-disk size of the two stores also differs significantly: roughly two orders of magnitude (36K versus 3.1M in the du output below).
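For anyone who wants to check this without opening the JSON by hand, here is a minimal sketch that compares the "fill_value" entries of the .zarray files in two directory stores. It uses only the standard library and assumes the zarr v2 on-disk layout, where each array directory holds a .zarray JSON file; the helper names read_fill_values and diff_fill_values are my own and are not part of zarr or rechunker.

```python
import json
from pathlib import Path


def read_fill_values(store_dir):
    """Map each array path (relative to the store root) to its fill_value.

    Assumes a zarr v2 directory store, where every array directory
    contains a `.zarray` JSON file with a "fill_value" key.
    """
    root = Path(store_dir)
    fill_values = {}
    for meta_file in root.rglob(".zarray"):
        with open(meta_file) as f:
            meta = json.load(f)
        array_path = meta_file.parent.relative_to(root).as_posix()
        # JSON null is loaded as None
        fill_values[array_path] = meta.get("fill_value")
    return fill_values


def diff_fill_values(source_dir, target_dir):
    """Return {array_path: (source_fill, target_fill)} where the two differ.

    Only arrays present in both stores are compared.
    """
    source = read_fill_values(source_dir)
    target = read_fill_values(target_dir)
    return {
        path: (source[path], target[path])
        for path in sorted(source.keys() & target.keys())
        if source[path] != target[path]
    }
```

After running the script, calling diff_fill_values('testoutput/source.zarr', 'testoutput/target.zarr') should list 'foo/bar' if the fill_value was not carried over.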
Thanks,
Matt
❯ du -hs *
36K source.zarr
3.1M target.zarr
Here's a script that demonstrates the issue.
import zarr
from rechunker import rechunk
import shutil


def run_create_input_store():
    shutil.rmtree('testoutput/', ignore_errors=True)
    store = zarr.DirectoryStore('testoutput/source.zarr')
    root = zarr.group(store=store, overwrite=True)
    foo = root.create_group('foo')
    root.attrs['description'] = 'root description'
    foo.attrs['description'] = 'foo description'
    bar = foo.ones('bar', shape=(10000, 10000))
    bar[5000, 5000] = 3
    bar.attrs['description'] = 'foo description'
    zarr.consolidate_metadata(store)


def rechunkit():
    openstore = zarr.open_consolidated('testoutput/source.zarr')
    array_plan = rechunk(openstore, {'foo/bar': (1000, 1000)},
                         '1GB',
                         'testoutput/target.zarr',
                         temp_store='testoutput/temp.zarr')
    array_plan.execute()
    zarr.consolidate_metadata('testoutput/target.zarr')


if __name__ == '__main__':
    run_create_input_store()
    rechunkit()
    print('Compare the .zmetadata files in both your source.zarr and target.zarr directories')
    print('You will see that the "fill_value" in the source is 1.0 and it is null in the target.')
    source = zarr.open('testoutput/source.zarr')
    target = zarr.open('testoutput/target.zarr')
    print(source['foo']['bar'].fill_value)
    print(target['foo']['bar'].fill_value)
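
To see where the size difference comes from, it may help to count the chunk files actually stored for an array. In the zarr v2 format, fill_value is the default for uninitialized portions of an array, so a store with very few chunk files can still represent a full array. The sketch below is a rough stand-in for du that uses only the standard library; the helper name chunk_stats is hypothetical, and it assumes a zarr v2 directory store in which metadata files start with ".z" and every other file under the array directory is a chunk.

```python
from pathlib import Path


def chunk_stats(array_dir):
    """Return (number of chunk files, total bytes) for one array directory.

    Assumption: zarr v2 directory store, where metadata files are named
    `.zarray`, `.zattrs`, etc. and all other files are stored chunks.
    Pointing this at a group directory would also count the chunks of
    every array nested below it.
    """
    count = 0
    total_bytes = 0
    for entry in Path(array_dir).rglob("*"):
        if entry.is_file() and not entry.name.startswith(".z"):
            count += 1
            total_bytes += entry.stat().st_size
    return count, total_bytes
```

Comparing chunk_stats('testoutput/source.zarr/foo/bar') with chunk_stats('testoutput/target.zarr/foo/bar') should show whether the extra space in the target is taken up by chunks that hold nothing but the repeated value.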