A lightweight FastAPI service that exposes endpoints to interact with Zarr datasets stored in a Simple Storage Service (S3):
- a datasets endpoint to explore available multi-dimensional arrays,
- an extraction endpoint to slice and dice data,
- a probe endpoint to query specific values at given coordinates,
- an isoline endpoint to compute contour lines dynamically,
- a mesh endpoint to retrieve the support mesh of a dataset
Tip
Auto-generated API documentation is available at the /docs and /redoc endpoints.
Checks the service's health and returns a JSON object with a single status member.
Returns a list of all available Zarr datasets with their id and description.
Returns metadata (dimensions, variables, attributes) for a specific Zarr dataset.
The dataset parameter is expected to be the dataset id, which can be found with the previous endpoint.
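Assuming the service runs locally on its default port, a minimal client sketch using only the standard library (the exact route shapes are assumptions based on the descriptions above):

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local deployment

def metadata_url(dataset_id: str) -> str:
    # Assumed route shape: /datasets lists datasets (id + description),
    # /datasets/{id} returns dimensions, variables and attributes
    return f"{BASE_URL}/datasets/{dataset_id}"

def dataset_metadata(dataset_id: str) -> dict:
    # Fetch and decode the JSON metadata document for one dataset
    with urlopen(metadata_url(dataset_id), timeout=30) as resp:
        return json.load(resp)
```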
Extracts a subset of the data based on a bounding box and a specific variable.
Warning
Large extractions may impact performance. Be mindful of the bounding box size for high-resolution datasets.
The extract endpoint accepts the following query parameters:
| Name | Description | Optional | Default |
|---|---|---|---|
| variable | The variable to extract. | ✗ | |
| lon_min | Minimum longitude of the bounding box. | ✓ | None |
| lat_min | Minimum latitude of the bounding box. | ✓ | None |
| lon_max | Maximum longitude of the bounding box. | ✓ | None |
| lat_max | Maximum latitude of the bounding box. | ✓ | None |
| time | The time value/slice to extract. | ✓ | None |
| resolution_limit | Limit the amount of data on the lat/lon axes (decimate). | ✓ | None |
| format | Format of the extracted data (Supported: raw, geojson, mesh). | ✓ | raw |
| mesh_tile_size | When format=mesh, resample data on a grid of mesh_tile_size x mesh_tile_size. | ✓ | None |
| mesh_data_mapping | Whether the data of the mesh is on cells or on vertices. This overrides the dataset configuration. (Supported values: vertices, cells) | ✓ | vertices |
| interp_vars | Variables to interpolate during extraction. | ✓ | [] |
| interp_vars_method | Method for variable/time interpolation (e.g. linear, cubic, ...). | ✓ | nearest |
| interp_vars_params | Parameters for variable interpolation (e.g. method:linear). | ✓ | None |
| interp_time | Whether to interpolate values on the time dimension or to get the closest time step. Shortcut for interp_vars=YOUR_TIME_DIMENSION. | ✓ | False |
| interp_spatial_method | The method to use for spatial interpolation. (Supported: nearest, linear, cubic, idw, rbf) | ✓ | nearest |
| interp_spatial_params | Parameters for spatial interpolation (e.g. padding:1.0). | ✓ | padding:1.0 |
| as_dims | If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions. | ✓ | [] |
Important
You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with &my_additional_variable={VALUE}
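To illustrate how these parameters combine, here is a sketch that builds an extract query string with the standard library. The route path and dataset id are placeholders; only the parameter names come from the table above:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed local deployment

def extract_url(dataset_id, variable, bbox=None, **extra):
    # bbox: (lon_min, lat_min, lon_max, lat_max); extra collects any
    # dataset-specific dimensions passed as additional query parameters
    params = {"variable": variable}
    if bbox is not None:
        lon_min, lat_min, lon_max, lat_max = bbox
        params.update(lon_min=lon_min, lat_min=lat_min,
                      lon_max=lon_max, lat_max=lat_max)
    params.update(extra)
    # assumed route shape: /datasets/{id}/extract
    return f"{BASE_URL}/datasets/{dataset_id}/extract?{urlencode(params)}"

url = extract_url("my_dataset", "temperature",
                  bbox=(-5.0, 43.0, 10.0, 51.0),
                  interp_spatial_method="linear",
                  interp_spatial_params="padding:0.5")
```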
Retrieves the values of specified variables at a specific geographical location (point query).
The probe endpoint accepts the following query parameters:
| Name | Description | Optional | Default |
|---|---|---|---|
| variables | The list of variables to probe. | ✗ | |
| lon | The longitude coordinate to probe. | ✗ | |
| lat | The latitude coordinate to probe. | ✗ | |
| height | The height coordinate to probe (if 3D data). | ✓ | None |
| time | The time value/slice to probe. | ✓ | None |
| interp_time | Whether to interpolate values on the time dimension. | ✓ | False |
| interp_spatial_method | The method to use for spatial interpolation. (Supported: nearest, linear, cubic, idw, rbf) | ✓ | nearest |
| interp_spatial_params | Parameters for spatial interpolation (e.g. padding:1.0). | ✓ | padding:1.0 |
| interp_vars | Variables to interpolate during probing. | ✓ | [] |
| interp_vars_method | Method for variable/time interpolation. | ✓ | nearest |
| as_dims | If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions. | ✓ | [] |
Important
You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with &my_additional_variable={VALUE}
Tip
You can request multiple variables at once by repeating the variables parameter in the query string (e.g., ?variables=temp&variables=wind).
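As a sketch of the repeated-parameter form mentioned in the tip, Python's urlencode with doseq=True produces exactly that query string:

```python
from urllib.parse import urlencode

# doseq=True expands the list into repeated parameters,
# i.e. variables=temp&variables=wind
query = urlencode({
    "variables": ["temp", "wind"],
    "lon": 2.35,
    "lat": 48.86,
}, doseq=True)
# query == "variables=temp&variables=wind&lon=2.35&lat=48.86"
```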
Computes isolines (contour lines) for a given variable and specific levels.
The isoline endpoint accepts the following query parameters:
| Name | Description | Optional | Default |
|---|---|---|---|
| variable | The variable to generate isolines for. | ✗ | |
| levels | Comma-separated list of levels for isoline generation. | ✗ | |
| time | The time value to use for isoline generation. | ✓ | None |
| format | Format of the extracted data (Supported: raw, geojson). Ignored when mesh_tile_size is defined. | ✓ | raw |
| interp_time | Whether to interpolate values on the time dimension. | ✓ | False |
| interp_vars_method | Method for variable/time interpolation. | ✓ | nearest |
| as_dims | If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions. | ✓ | [] |
Important
You may need to specify additional non-generic variables or dimensions according to your dataset. To do so, you can add query parameters with &my_additional_variable={VALUE}
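Since levels is a single comma-separated parameter (not a repeated one), it can be built from a numeric list like this sketch (parameter names from the table above; everything else is a placeholder):

```python
from urllib.parse import urlencode

levels = [10, 15, 20, 25]
query = urlencode({
    "variable": "temperature",
    "levels": ",".join(str(level) for level in levels),
    "format": "geojson",
})
# urlencode percent-encodes the commas (levels=10%2C15%2C20%2C25);
# the server decodes them back when parsing the query string
```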
This endpoint lets you select data in a generic way, as raw multi-dimensional arrays. If you only specify the variable parameter, you will get all the data, but you are free to add additional parameters to fix some variables/dimensions.
The select endpoint accepts the following query parameters:
| Name | Description | Optional | Default |
|---|---|---|---|
| variable | The variable from which you want to select the data. | ✗ | |
| interp_vars | Variables to interpolate during selection. | ✓ | [] |
| interp_vars_method | Method for variable/time interpolation. | ✓ | nearest |
| as_dims | If a variable has the same name as a dim, force query parameters in this list to be treated as dimensions. | ✓ | [] |
Retrieves only the support mesh of the dataset.
The mesh endpoint accepts the following query parameters:
| Name | Description | Optional | Default |
|---|---|---|---|
| format | The format of the extracted data (Supported: mesh, geojson). | ✓ | mesh |
| mesh_data_mapping | Whether the data of the mesh is on cells or on vertices. This overrides the dataset configuration. (Supported values: vertices, cells) | ✓ | vertices |
Interpolation is applied in four different scenarios:

- Xarray Variable/Time Interpolation: Used when specifying interp_vars or interp_time.
  - Supported methods: linear, nearest, zero, slinear, quadratic, cubic, quintic, polynomial, pchip, barycentric, krogh, akima, makima.
  - Parameters: See the Xarray documentation.
- Regular Grid Mesh Extraction: Triggered by the extract endpoint on regular grid datasets, utilizing SciPy's RegularGridInterpolator.
  - Supported methods: linear, nearest, slinear, cubic, quintic, pchip.
  - Parameters: See the SciPy documentation.
- Irregular Grid Mesh Extraction: Triggered by the extract endpoint on irregular grid datasets, using SciPy or custom methods.
  - Supported methods: linear, nearest, cubic, RBF, IDW.
  - Parameters: For RBF, see the SciPy documentation.
- Point Probing: Used by the probe endpoint to retrieve values over time.
  - Supported methods: Currently, only IDW (Inverse Distance Weighting) is supported.
  - Parameters: radius (maximum search radius for neighbors) and power (distance weighting power).

For the extract and probe endpoints, you can pass interpolation options using the interp_spatial_params or interp_vars_params query parameters. Here is an example:

```
interp_spatial_params=padding:0.5,neighbors:5,smoothing:0.0,kernel:thin_plate_spline
```

- padding: A coefficient extending the requested bounding box to include contextual data for interpolation. This helps prevent boundary artifacts near tiles.
- Other parameters: Specific to the chosen interpolation method.
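To make the key:value,key:value format concrete, here is a small parser sketch (the service's actual parsing may differ, e.g. in how it coerces types):

```python
def parse_interp_params(raw: str) -> dict:
    # "padding:0.5,neighbors:5,kernel:thin_plate_spline" -> dict,
    # coercing values to int/float where possible
    params = {}
    for item in raw.split(","):
        key, _, value = item.partition(":")
        try:
            number = float(value)
            params[key] = int(number) if "." not in value else number
        except ValueError:
            params[key] = value  # non-numeric values stay as strings
    return params

parsed = parse_interp_params(
    "padding:0.5,neighbors:5,smoothing:0.0,kernel:thin_plate_spline")
```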
| Variable | Description | Default value |
|---|---|---|
| PORT | The port to be used when exposing the service | 8000 |
| HOSTNAME | The hostname to be used when exposing the service | localhost |
| AWS_ACCESS_KEY_ID | Access key ID of the S3 in which zarr data is stored | |
| AWS_SECRET_ACCESS_KEY | Secret access key of the S3 in which zarr data is stored | |
| AWS_DEFAULT_REGION | Region of the S3 in which zarr data is stored | |
| AWS_ENDPOINT_URL | Endpoint URL of the S3 in which zarr data is stored | |
| BUCKET_NAME | The name of the bucket in which zarr data is stored | |
| DATASETS_PATH | Path to the JSON file containing datasets description | datasets.json |
| CACHE_DIR | Path to the directory where cache will be stored. Cache will not be used if this variable is not provided | |
| CACHE_SIZE | Max size of cache folder (e.g. 1024KB, 512MB, 4GB) | 512MB |
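As a sketch of how these variables might be read at startup (names and defaults come from the table above; the service's actual loading code may differ):

```python
import os

# Read configuration from the environment, falling back to the
# documented defaults where the table provides one
config = {
    "port": int(os.environ.get("PORT", "8000")),
    "hostname": os.environ.get("HOSTNAME", "localhost"),
    "bucket_name": os.environ.get("BUCKET_NAME"),       # no default
    "datasets_path": os.environ.get("DATASETS_PATH", "datasets.json"),
    "cache_dir": os.environ.get("CACHE_DIR"),           # cache off when unset
    "cache_size": os.environ.get("CACHE_SIZE", "512MB"),
}
```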
Important
With some S3 providers, errors about checksum calculation can occur (error: `botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the PutObject operation: x-amz-content-sha256 must be UNSIGNED-PAYLOAD, or a valid sha256 value.`). In that case, you should set the AWS_REQUEST_CHECKSUM_CALCULATION environment variable to when_required.
You can build the image with the following command:

```shell
docker build -t <your-image-name> .
```

And then start the service with:

```shell
docker run -p 8000:8000 <your-image-name>
```

You will need to install multiple Python packages to run this app. To simplify, you can install Anaconda (or micromamba) and run these commands:
```shell
conda create -y -n kazarr_env python=3.11
conda install -y -n kazarr_env -c conda-forge \
    fastapi \
    uvicorn \
    xarray \
    zarr \
    numpy \
    pyproj \
    dask \
    s3fs \
    matplotlib \
    pyvista=0.47.1 \
    vtk-base=9.5.2 \
    scipy \
    uvloop \
    loguru
conda activate kazarr_env
python main.py
```

You can run a local object storage with an S3-compliant API using garage, with CLI access via s3cmd (`pipx install s3cmd`).
First, generate a secret with `openssl rand -base64 32` and create a garage configuration file:
```toml
metadata_dir = "/home/luc/Development/GeoData/s3-meta"
data_dir = "/home/luc/Development/GeoData/s3"
db_engine = "sqlite"
replication_factor = 1

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret = "your secret"

[s3_api]
s3_region = "localhost"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.localhost"

[s3_web]
bind_addr = "[::]:3902"
root_domain = ".web.garage.localhost"
index = "index.html"
```

Then launch the server with `garage -c ./garage.toml server` in a terminal and get your node ID in another terminal with `garage -c ./garage.toml status`.
Create the layout of your cluster with `garage -c ./garage.toml layout assign -z localhost -c 500G nodeID && garage -c ./garage.toml layout apply --version 1`.
Create a bucket with `garage -c ./garage.toml bucket create zarr-data`.
Create an access key with `garage -c ./garage.toml key create zarr-data-key`.
Allow the key to access your bucket with `garage -c ./garage.toml bucket allow --read --write --owner zarr-data --key zarr-data-key`.
Create an s3cmd configuration file:

```ini
[default]
access_key = your-key-id
secret_key = your-key-secret
host_base = http://localhost:3900
host_bucket = http://localhost:3900
use_https = False
```
Then synchronize any data from your local file system to garage with `s3cmd -c ./s3cmd.cfg sync ./zarr-data/ s3://zarr-data`.
Please read the Contributing file for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
An extra tool allows you to generate Zarr datasets from NetCDF or GRIB2 files. For more details, check the conversion tool.
This project is sponsored by
This project is licensed under the MIT License - see the license file for details.
