Feat/initial workflow #1
4ac2bef to cc6d716
Hi! I would appreciate any feedback on this PR. I went ahead and merged it already, since this is quite early stage. I recommend reading the related issue first and then reviewing the code in the new data-model-pipeline repo. The README is a good starting point for understanding what's going on.
Overall a nice setup, with all the build tooling and complete docs.
Does it need to be this complex, though? I added a couple of comments below to understand the reasons, or to encourage simplification.
Since this is merged already, perhaps these can turn into issues for a revision.
```dockerfile
# Pull lockfiles from data-model@<ref> (optionally a subdir)
RUN set -eux; \
    mkdir -p /tmp/dm; \
    curl -fsSL -o /tmp/dm/pyproject.toml "${DM_RAW_BASE}/${EOPF_GEOZARR_REF}/${EOPF_GEOZARR_SUBDIR}pyproject.toml"; \
    curl -fsSL -o /tmp/dm/uv.lock "${DM_RAW_BASE}/${EOPF_GEOZARR_REF}/${EOPF_GEOZARR_SUBDIR}uv.lock"; \
    test -s /tmp/dm/pyproject.toml && test -s /tmp/dm/uv.lock
```
Could we not instead just `uv pip install git+` from a tag of the data-model repo, or so? I have not seen this way of managing dependencies before: copying another project's pyproject.toml.
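A sketch of the suggested alternative, installing directly from a git ref of the data-model repo (the tag `v0.1.0` is a hypothetical example; whether the repo has such tags is an assumption):

```dockerfile
# Install straight from a git ref instead of copying the other project's lockfiles
# (v0.1.0 is a hypothetical tag name)
RUN uv pip install "git+https://github.com/EOPF-Explorer/data-model.git@v0.1.0"
```

This would pin the dependency via the tag itself, at the cost of losing the transitive pins that uv.lock provides.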
```dockerfile
ENV PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
ARG PORTABLE_BUILD=0
ARG EOPF_GEOZARR_REF=main
ARG EOPF_GEOZARR_SUBDIR=
```
Why do we need this logic to add the data-model project as a subdir? I see this option is nicely mentioned under https://github.com/EOPF-Explorer/data-model-pipeline?tab=readme-ov-file#variants, but I'm not sure why it's needed.
```dockerfile
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    libstdc++6 libgomp1 libexpat1 \
    gdal-bin proj-bin curl \
    && rm -rf /var/lib/apt/lists/*
```
Could we not instead use a parent image that has GDAL built and installed?
Or is the aim to keep dependencies simple?
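For illustration, the parent-image alternative could look roughly like this (the OSGeo project publishes GDAL images at `ghcr.io/osgeo/gdal`; the exact tag here is an assumption):

```dockerfile
# Base image that already ships GDAL and PROJ (tag is illustrative)
FROM ghcr.io/osgeo/gdal:ubuntu-small-3.8.5

# Only curl would still need installing for the lockfile fetch
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*
```

The trade-off is pulling a larger, externally maintained base image versus installing a handful of apt packages on a slim one.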
```shell
python - "$STAC" $GROUPS <<'PY'
import re, sys, xarray as xr

stac, *groups = sys.argv[1:]
if any(re.fullmatch(r"\d+", g.strip()) for g in groups):
    print("ERROR: 'groups' must be paths like /measurements/reflectance/r20m (got numeric token).")
    sys.exit(64)

def open_dt(url):
    try:
        return xr.open_datatree(url, engine="zarr", consolidated=True)
    except Exception:
        return xr.open_datatree(url, engine="zarr", consolidated=None)

dt = open_dt(stac)
missing = []
for g in groups:
    key = g.strip("/")
    try:
        _ = dt[key]
        print(f"[ok] {g}")
    except Exception as e:
        print(f"[missing] {g}: {e}")
        missing.append(g)

if missing:
    try:
        meas = dt["measurements"]
        names = set()
        children = getattr(meas, "groups", None) or getattr(meas, "children", None)
        if isinstance(children, dict):
            names.update(children.keys())
        elif isinstance(children, (list, tuple)):
            for s in children:
                if isinstance(s, str) and s:
                    names.add(s.split("/", 1)[0])
        if names:
            print("candidates under /measurements:")
            for n in sorted(names)[:50]:
                print("  /measurements/" + n)
    except Exception:
        pass
    sys.exit(64)
PY
```
Feels like at least this part should be a separate script, so it's easier to read and test. Maybe the shell script part could also be moved out of this YAML?
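As a minimal sketch of that suggestion: the stdlib-only argument validation could move into a standalone module that unit tests can import (the filename `check_groups.py` and the function names are hypothetical; the xarray-dependent group check is elided):

```python
#!/usr/bin/env python3
"""Hypothetical check_groups.py: argument validation pulled out of the workflow YAML."""
import re
import sys

EX_USAGE = 64  # same exit code the inline script uses for bad arguments


def numeric_tokens(groups):
    """Return the group arguments that are bare numbers instead of paths."""
    return [g for g in groups if re.fullmatch(r"\d+", g.strip())]


def main(argv):
    stac, *groups = argv
    if numeric_tokens(groups):
        print("ERROR: 'groups' must be paths like /measurements/reflectance/r20m (got numeric token).")
        return EX_USAGE
    # ...the xarray-based existence check from the workflow would follow here...
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The workflow step would then shrink to `python check_groups.py "$STAC" $GROUPS`, and the validation logic becomes directly testable.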
```
tests-output/

# uv
uv.lock
```
Why ignore the lockfile?
This PR drops the mock-up and introduces a pipeline that works locally (for me).
It is still early stage, but hopefully this gives a baseline.
Getting the environment right was quite fiddly (rasterio 1.4.3 / GDAL).
This is the result of a couple of iterations: in an earlier stage I just bolted it on top of the data-model repo, then separated it into this pipeline repo for clarity.
There are some reporting remnants that depend on / point to the unmerged data-model feature branch and related PR EOPF-Explorer/data-model#26.