38 changes: 38 additions & 0 deletions README.md
@@ -67,6 +67,44 @@ eopf-geozarr validate output.zarr

# Get help
eopf-geozarr --help

### Pipeline-Oriented CLI

The package also ships a workflow-friendly entrypoint that mirrors the RabbitMQ Sensor payload used in
`data-model-pipeline`. Use it for dry runs or smoke tests that mimic the production event flow while
keeping the canonical payload contract in a single place. Refer to
[`docs/pipeline-integration.md`](docs/pipeline-integration.md) for the full cross-repository overview.

```bash
# Inspect available arguments (mirrors Sensor → WorkflowTemplate parameters)
eopf-geozarr-pipeline run --help

# Replay a payload.json (for example, from data-model-pipeline/workflows/payload.json)
eopf-geozarr-pipeline run \
--src-item "https://stac.core.eopf.eodc.eu/..." \
--output-zarr "s3://bucket/path/out.zarr" \
--groups measurements/reflectance/r10m,measurements/reflectance/r20m \
--register-collection sentinel-2-l2a \
--register-url https://api.explorer.eopf.copernicus.eu/stac \
--overwrite replace \
--metrics-out s3://bucket/metrics/latest.json

# Validate a produced GeoZarr store
python - <<'PY'
from eopf_geozarr.pipeline import validate_geozarr_store

report = validate_geozarr_store("s3://bucket/path/out.zarr")
print(report.summary())
for line in report.detailed():
    print(" -", line)
PY
```

### Observability and Metrics

Both CLIs now support `--metrics-out`, allowing workflows to persist per-run diagnostics either to the
local filesystem or directly to `s3://` destinations. The payload helpers expose the same field so Argo
Workflow templates and RabbitMQ messages stay aligned.
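To inspect a persisted metrics file afterwards, `fsspec` (which the package's S3 support builds on)
reads either kind of destination the same way. The bucket path below reuses the example above, and the
JSON keys are whatever the run emitted, so this sketch just pretty-prints them rather than assuming a
schema:

```python
import json

import fsspec  # backs the package's local and s3:// I/O

# The target must match the --metrics-out value used for the run;
# a local path works here exactly like an s3:// URI.
with fsspec.open("s3://bucket/metrics/latest.json", "r") as f:
    metrics = json.load(f)

# The metric keys are defined by the converter, so just display them.
for key, value in sorted(metrics.items()):
    print(f"{key}: {value}")
```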
Copilot AI Sep 26, 2025

There are two consecutive closing code fence blocks (```) on lines 107-108, which will cause markdown parsing errors. Remove one of the closing code fences.


### S3 Support
89 changes: 89 additions & 0 deletions docs/pipeline-integration.md
@@ -0,0 +1,89 @@
# Pipeline Integration Overview

The `eopf-geozarr` package provides the conversion logic that powers the GeoZarr workflow in
`data-model-pipeline`. This guide highlights the contract between both repositories so that developer
workflows, Argo templates, and RabbitMQ messages remain synchronized.

## Shared Responsibilities

| Concern | Provided by `data-model` | Provided by `data-model-pipeline` |
| --- | --- | --- |
| GeoZarr conversion & validation | `eopf_geozarr` conversion engine, CLI, pipeline runner, payload schema | Invokes library within Argo Workflows, publishes AMQP payloads |
| Payload contract | `GeoZarrPayload` dataclass, JSON schema, bundled fixtures | Sensors and tests consume the shared helpers |
| Observability | `--metrics-out` flag routes metrics to local or S3 destinations | Workflows collect and forward metrics to long-term storage |
| STAC registration helpers | `validate_geozarr_store`, pipeline runner hooks | Orchestrates STAC Transactions based on payload flags |

## Command-Line Surfaces

### `eopf-geozarr`

The original CLI remains the most direct way to convert EOPF datasets. It now accepts
`--metrics-out` so you can persist run summaries alongside converted assets. Metrics targets support
both local paths and S3 URIs.
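A quick smoke test can shell out to the CLI and read the diagnostics back. Whether every subcommand
accepts `--metrics-out` is an assumption in this sketch, so check `eopf-geozarr --help` and adjust to
the command you actually run:

```python
import json
import subprocess
from pathlib import Path

metrics_path = Path("metrics.json")

# Assumes `validate` honors --metrics-out like the rest of the CLI surface.
subprocess.run(
    ["eopf-geozarr", "validate", "output.zarr", "--metrics-out", str(metrics_path)],
    check=True,
)

print(json.loads(metrics_path.read_text()))
```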

### `eopf-geozarr-pipeline`

The pipeline-specific entrypoint mirrors the RabbitMQ payload processed by production Argo sensors.
It is ideal for replaying payloads locally or verifying template changes:

```bash
# Validate payload flags before triggering the workflow
$ eopf-geozarr-pipeline run --help

# Replay a bundled example payload
$ eopf-geozarr-pipeline run --payload-file <(python - <<'PY'
from eopf_geozarr.pipeline import load_example_payload
import json
print(json.dumps(load_example_payload("minimal")))
PY
)
```

Both CLIs normalize group names, default to the Sentinel-2 reflectance groups, and respect the shared
payload schema described below.
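Concretely, a message that exercises the same knobs as the CLI flags above could look like the sketch
below (using `validate_payload` from the helpers described in the next section). Only `src_item` and
`output_zarr` are required by the schema; the optional key spellings are assumed snake_case mirrors of
the CLI flags, so confirm them against `get_payload_schema()` before relying on this:

```python
import json

from eopf_geozarr.pipeline import validate_payload

# Required by PAYLOAD_JSON_SCHEMA: src_item and output_zarr.
# The optional keys below are assumed snake_case mirrors of the CLI flags.
payload = {
    "src_item": "https://stac.core.eopf.eodc.eu/...",
    "output_zarr": "s3://bucket/path/out.zarr",
    "groups": "measurements/reflectance/r10m,measurements/reflectance/r20m",
    "register_collection": "sentinel-2-l2a",
    "register_url": "https://api.explorer.eopf.copernicus.eu/stac",
    "overwrite": "replace",
    "metrics_out": "s3://bucket/metrics/latest.json",
}

validate_payload(payload)  # raises if the message drifts from the contract
print(json.dumps(payload, indent=2))
```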

## Python Helpers

The `eopf_geozarr.pipeline` package exposes helpers that keep repositories aligned:

```python
from eopf_geozarr.pipeline import (
GeoZarrPayload,
PAYLOAD_JSON_SCHEMA,
get_payload_schema,
load_example_payload,
run_pipeline,
validate_payload,
)

payload = GeoZarrPayload.from_payload(load_example_payload("full"))
payload.ensure_required()
validate_payload(payload.to_payload())
print(PAYLOAD_JSON_SCHEMA["required"]) # ["src_item", "output_zarr"]
```

- `GeoZarrPayload` parses CLI arguments or RabbitMQ payloads and produces normalized values.
- `PAYLOAD_JSON_SCHEMA` and `get_payload_schema()` deliver a canonical JSON schema for validation in
tests or runtime checks.
- `load_example_payload()` exposes fixtures that mirror the messages published by the AMQP tooling.
- `validate_payload()` wraps `jsonschema` with the library-managed schema, ensuring the same rules
apply everywhere.
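`run_pipeline` (imported above but not covered by the bullets) is the programmatic counterpart of
`eopf-geozarr-pipeline run`. A minimal sketch, assuming it consumes a parsed `GeoZarrPayload` — check
its actual signature before wiring it into anything:

```python
from eopf_geozarr.pipeline import GeoZarrPayload, load_example_payload, run_pipeline

payload = GeoZarrPayload.from_payload(load_example_payload("minimal"))
payload.ensure_required()

# Assumed call shape: the runner takes the parsed payload directly.
run_pipeline(payload)
```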

## Bundled Fixtures

Two JSON fixtures live under `eopf_geozarr/pipeline/resources`:

- `payload-minimal.json` represents the baseline message with only required fields.
- `payload-full.json` exercises optional knobs such as STAC registration and metrics targets.

Use these fixtures to seed integration tests in `data-model-pipeline` or to document payload
expectations in other repositories. They are also accessible at runtime via `load_example_payload()`.
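For example, an integration test in `data-model-pipeline` could pin the contract against the bundled
fixtures instead of copying JSON between repositories; this sketch uses only the helpers documented
above:

```python
import pytest

from eopf_geozarr.pipeline import (
    GeoZarrPayload,
    load_example_payload,
    validate_payload,
)


@pytest.mark.parametrize("name", ["minimal", "full"])
def test_bundled_payload_satisfies_schema(name: str) -> None:
    raw = load_example_payload(name)
    validate_payload(raw)  # schema-level contract
    GeoZarrPayload.from_payload(raw).ensure_required()  # parser-level contract
```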

## Next Steps

- `data-model-pipeline` should import the schema helpers when validating AMQP payloads (as sketched
  below) and update its tests to rely on the shared fixtures.
- `platform-deploy` can reference the same schema when templating new WorkflowTemplates or Flux
overlays, ensuring environment values stay in sync.
- Future payload changes should originate in this repository so all downstream consumers inherit the
update automatically.
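A minimal version of that first step, assuming the sensor hands the consumer a raw JSON message body:

```python
import json

from eopf_geozarr.pipeline import validate_payload


def on_sensor_message(body: bytes) -> dict:
    """Validate an incoming AMQP payload before triggering a workflow."""
    payload = json.loads(body)
    validate_payload(payload)  # same rules the CLIs and library enforce
    return payload
```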
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -63,6 +63,7 @@ nav:
- Installation: installation.md
- Quick Start: quickstart.md
- User Guide: converter.md
- Pipeline Integration: pipeline-integration.md
- API Reference: api-reference.md
- Examples: examples.md
- Architecture: architecture.md
21 changes: 16 additions & 5 deletions pyproject.toml
@@ -126,7 +126,21 @@ warn_unreachable = true
strict_equality = true

[[tool.mypy.overrides]]
module = ["zarr.*", "xarray.*", "rioxarray.*", "cf_xarray.*", "dask.*"]
module = [
"zarr.*",
"xarray.*",
"rioxarray.*",
"cf_xarray.*",
"dask.*",
"jsonschema",
"jsonschema.*",
"s3fs",
"s3fs.*",
"fsspec",
"fsspec.*",
"rasterio",
"rasterio.*",
]
ignore_missing_imports = true

[tool.pytest.ini_options]
@@ -141,10 +155,7 @@ markers = [

[tool.coverage.run]
source = ["src"]
-omit = [
-    "tests/*",
-    "setup.py",
-]
+omit = ["tests/*", "setup.py"]

[tool.coverage.report]
exclude_lines = [
12 changes: 7 additions & 5 deletions src/eopf_geozarr/__init__.py
@@ -13,18 +13,20 @@
    setup_datatree_metadata_geozarr_spec_compliant,
    validate_existing_band_data,
)
+from .validator import validate_geozarr_store

__version__ = version("eopf-geozarr")

__all__ = [
    "__version__",
-    "create_geozarr_dataset",
-    "setup_datatree_metadata_geozarr_spec_compliant",
-    "iterative_copy",
-    "consolidate_metadata",
    "async_consolidate_metadata",
-    "downsample_2d_array",
    "calculate_aligned_chunk_size",
+    "consolidate_metadata",
+    "create_geozarr_dataset",
+    "downsample_2d_array",
    "is_grid_mapping_variable",
+    "iterative_copy",
+    "setup_datatree_metadata_geozarr_spec_compliant",
    "validate_existing_band_data",
+    "validate_geozarr_store",
]