chore(deps): Update vdata-model version converter to 0.7.1 and fix item ID parsing by emmanuelmathot · Pull Request #81 · EOPF-Explorer/data-pipeline

emmanuelmathot · 2026-01-15T08:03:22Z

Update the item ID parsing to correctly handle JSON file extensions and improve source URL processing in the conversion script. Additionally, introduce a new script for product submission workflow.

Copilot

Pull request overview

This pull request updates the eopf-geozarr dependency from v0.7.0 to v0.7.1 and fixes item ID parsing so that .json file extensions are handled correctly across multiple scripts. The changes primarily address URL parsing for STAC items that are accessed as direct .json files rather than through traditional STAC API endpoints.

Changes:

Updated eopf-geozarr dependency from v0.7.0 to v0.7.1
Fixed item ID parsing to strip .json extensions in three scripts (register_v1.py, convert_v1_s2.py, change_storage_tier.py)
Enhanced source URL detection in convert_v1_s2.py to handle .json files
Added keep_scale_offset parameter to convert_s2_optimized function call
Introduced new bulk submission script for testing workflows
Modified test workflow configuration

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
pyproject.toml	Updates eopf-geozarr dependency to v0.7.1
scripts/register_v1.py	Adds .json extension stripping to item ID parsing
scripts/convert_v1_s2.py	Fixes item ID parsing, updates source URL detection, and adds keep_scale_offset parameter
scripts/change_storage_tier.py	Adds .json extension stripping to item ID parsing
operator-tools/submit_test_workflow_wh_list.py	New bulk workflow submission script with hardcoded product list
operator-tools/submit_test_workflow_wh.py	Updates test configuration with new URLs and port

Copilot · 2026-01-15T15:59:38Z

operator-tools/submit_test_workflow_wh_list.py

+#!/usr/bin/env python3
+import json
+import time
+
+import requests
+
+# List of products to process (from S3 listing)
+products = [
+    "S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456",
+    "S2A_MSIL2A_20210920T120331_N0500_R023_T28RBS_20230108T205311",
+    "S2A_MSIL2A_20210927T115221_N0500_R123_T28RBS_20230112T171055",
+    "S2A_MSIL2A_20210930T120331_N0500_R023_T28RBS_20230109T180123",
+    "S2A_MSIL2A_20211007T115221_N0500_R123_T28RBS_20230102T185542",
+    "S2A_MSIL2A_20211010T120331_N0500_R023_T28RBS_20230106T001906",
+    "S2A_MSIL2A_20211017T115221_N0500_R123_T28RBS_20230104T081202",
+    "S2A_MSIL2A_20211020T120331_N0500_R023_T28RBS_20230105T095115",
+    "S2A_MSIL2A_20211027T115221_N0500_R123_T28RBS_20230105T031019",
+    "S2A_MSIL2A_20211030T120331_N0500_R023_T28RBS_20230105T031900",
+    "S2A_MSIL2A_20211106T115221_N0500_R123_T28RBS_20221229T073629",
+    "S2A_MSIL2A_20211109T120331_N0500_R023_T28RBS_20221229T073655",
+    "S2A_MSIL2A_20211116T115221_N0500_R123_T28RBS_20221230T200900",
+    "S2A_MSIL2A_20211119T120321_N0500_R023_T28RBS_20221229T191326",
+    "S2A_MSIL2A_20211126T115221_N0500_R123_T28RBS_20230102T132307",
+    "S2A_MSIL2A_20211129T120321_N0500_R023_T28RBS_20221230T041345",
+    "S2A_MSIL2A_20211206T115221_N0500_R123_T28RBS_20221224T140610",
+    "S2A_MSIL2A_20211209T120321_N0500_R023_T28RBS_20221225T010957",
+    "S2A_MSIL2A_20211216T115221_N0500_R123_T28RBS_20221224T043535",
+    "S2A_MSIL2A_20211219T120321_N0500_R023_T28RBS_20221224T151838",
+    "S2A_MSIL2A_20211226T115221_N0500_R123_T28RBS_20221224T161518",
+    "S2A_MSIL2A_20211229T120331_N0500_R023_T28RBS_20221227T100257",
+    "S2B_MSIL2A_20210922T115219_N0500_R123_T28RBS_20230113T174254",
+    "S2B_MSIL2A_20210925T120319_N0500_R023_T28RBS_20230109T005655",
+    "S2B_MSIL2A_20211002T115219_N0500_R123_T28RBS_20230102T210850",
+    "S2B_MSIL2A_20211005T120329_N0500_R023_T28RBS_20230105T092944",
+    "S2B_MSIL2A_20211012T115219_N0500_R123_T28RBS_20230103T222101",
+    "S2B_MSIL2A_20211015T120329_N0500_R023_T28RBS_20230104T221431",
+    "S2B_MSIL2A_20211022T115219_N0500_R123_T28RBS_20230104T151158",
+    "S2B_MSIL2A_20211025T120329_N0500_R023_T28RBS_20230105T094555",
+    "S2B_MSIL2A_20211101T115219_N0500_R123_T28RBS_20221228T204555",
+    "S2B_MSIL2A_20211104T120329_N0500_R023_T28RBS_20221230T183447",
+    "S2B_MSIL2A_20211111T115219_N0500_R123_T28RBS_20221229T220152",
+    "S2B_MSIL2A_20211114T120319_N0500_R023_T28RBS_20221229T121936",
+    "S2B_MSIL2A_20211121T115219_N0500_R123_T28RBS_20221231T033626",
+    "S2B_MSIL2A_20211124T120319_N0500_R023_T28RBS_20221229T221129",
+    "S2B_MSIL2A_20211201T115219_N0500_R123_T28RBS_20221224T083520",
+    "S2B_MSIL2A_20211204T120319_N0500_R023_T28RBS_20221222T175403",
+    "S2B_MSIL2A_20211211T115209_N0500_R123_T28RBS_20221225T033159",
+    "S2B_MSIL2A_20211214T120319_N0500_R023_T28RBS_20221226T065701",
+    "S2B_MSIL2A_20211221T115219_N0500_R123_T28RBS_20221224T053435",
+    "S2B_MSIL2A_20211224T120319_N0500_R023_T28RBS_20221227T125536",
+    "S2B_MSIL2A_20211231T115219_N0500_R123_T28RBS_20221226T114305",
+    "S2B_MSIL2A_20220103T120319_N0510_R023_T28RBS_20240423T015628",
+]
+
+# Process each product
+for i, product in enumerate(products, 1):
+    # Test STAC item submission
+    payload = {
+        "source_url": f"https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/{product}.json",
+        "collection": "sentinel-2-l2a",
+        "action": "convert-v1-s2-hp",  # specify the action to use the S2 high-priority trigger
+    }
+
+    message = json.dumps(payload)
+
+    # Submit via HTTP webhook endpoint
+    try:
+        response = requests.post(
+            "http://localhost:12000/samples",
+            data=message,
+            headers={"Content-Type": "application/json"},
+        )
+
+        print(f"[{i}/{len(products)}] ✅ Published workflow for item: {product}")
+        print(f"Response status: {response.status_code}")
+
+        if response.status_code != 200:
+            print(f"Warning: Non-200 response for {product}: {response.text}")
+
+    except Exception as e:
+        print(f"[{i}/{len(products)}] ❌ Error processing {product}: {str(e)}")
+
+    # Add small delay to avoid overwhelming the server
+    if i < len(products):
+        time.sleep(1)
+
+print(f"\n✅ Completed processing {len(products)} products.")


This new script is not documented in the operator-tools README.md or the main repository README.md. Consider adding documentation explaining the purpose of this script, when it should be used, and how it differs from submit_test_workflow_wh.py. The script appears to be for bulk submission of a predefined list of products, which would be helpful information for operators.

Copilot · 2026-01-15T15:59:39Z

operator-tools/submit_test_workflow_wh.py

 payload = {
-    "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014",
-    "collection": "sentinel-2-l2a-dp-test",
+    # "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014",
+    "source_url": "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456.json",
+    "collection": "sentinel-2-l2a-staging",
    "action": "convert-v1-s2-hp",  # specify the action to use the S2 high-priority trigger
 }

 message = json.dumps(payload)

 # Submit via HTTP webhook endpoint
 response = requests.post(
-    "http://localhost:12000/samples",
+    "http://localhost:12001/samples",


This file contains hardcoded test configuration values that have been changed. The commented-out source URL and modified collection/port suggest these are temporary testing changes. Consider either: (1) reverting these to sensible defaults if this is meant to be a template/example, or (2) making these parameters configurable via command-line arguments or environment variables to avoid checking in test-specific configuration.
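One way to follow option (2) is sketched below. It reads the endpoint, source URL, collection and action from environment variables and falls back to the values in the diff above. The variable names (`WEBHOOK_URL`, `SOURCE_URL`, `COLLECTION`, `ACTION`) and the helpers `load_config` and `build_message` are assumptions made for illustration, not names used in the repository.

```python
import json
import os

# Default taken from the diff above; override it through the environment.
DEFAULT_SOURCE_URL = (
    "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/"
    "cpm-manual/S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456.json"
)


def load_config(env=None):
    """Read webhook settings from the environment (hypothetical variable names)."""
    env = os.environ if env is None else env
    return {
        "webhook_url": env.get("WEBHOOK_URL", "http://localhost:12001/samples"),
        "source_url": env.get("SOURCE_URL", DEFAULT_SOURCE_URL),
        "collection": env.get("COLLECTION", "sentinel-2-l2a-staging"),
        "action": env.get("ACTION", "convert-v1-s2-hp"),
    }


def build_message(config):
    """Serialize the payload; the endpoint URL is not part of the message body."""
    payload = {key: config[key] for key in ("source_url", "collection", "action")}
    return json.dumps(payload)
```

With this shape, a tester could run something like `COLLECTION=my-test python submit_test_workflow_wh.py` without editing the checked-in file, and the committed defaults stay stable.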

scripts/register_v1.py

Copilot · 2026-01-15T15:59:39Z

scripts/convert_v1_s2.py

+    zarr_url = (
+        get_zarr_url(source_url)
+        if ("/items/" in source_url or source_url.endswith(".json"))
+        else source_url
+    )


The condition that decides when to call get_zarr_url now checks for "/items/" in the URL or a ".json" extension. The two checks are independent: a URL can contain "/items/" without ending in ".json", and vice versa. For example, "https://example.com/items/item-id" (no .json) calls get_zarr_url, and so does "https://example.com/stac-items/item-id.json" (no "/items/"). Only a URL that matches neither check is treated as a direct Zarr URL. This seems correct, but verify that it matches the intended behavior.
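To make the combinations concrete, the sketch below evaluates the condition from the diff on a few URLs. It also includes a hypothetical variant, `resolves_by_path`, that checks only the URL path. That variant is not part of this PR; it is shown because the string-based `endswith(".json")` check does not match a JSON URL that carries a query string. All example URLs are made up.

```python
from urllib.parse import urlparse


def resolves_as_written(source_url: str) -> bool:
    """The condition exactly as it appears in the diff."""
    return "/items/" in source_url or source_url.endswith(".json")


def resolves_by_path(source_url: str) -> bool:
    """Hypothetical variant: test only the URL path, ignoring query and fragment."""
    path = urlparse(source_url).path
    return "/items/" in path or path.endswith(".json")


cases = [
    "https://example.com/collections/c/items/item-id",      # STAC API item
    "https://example.com/stac-items/item-id.json",          # static JSON file
    "https://example.com/data/product.zarr",                # direct Zarr URL
    "https://example.com/stac-items/item-id.json?token=x",  # JSON file with a query string
]
for url in cases:
    print(resolves_as_written(url), resolves_by_path(url), url)
```

The two functions agree on the first three URLs and differ only on the last one. Whether that case matters depends on whether source URLs can carry query strings in practice.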

scripts/convert_v1_s2.py

Copilot · 2026-01-15T15:59:40Z

operator-tools/submit_test_workflow_wh_list.py

This script contains hardcoded configuration values (localhost URLs, collection names, action types) that appear to be for testing/development purposes. Consider adding a comment at the top of the file indicating this is a development/testing script and should not be used in production, or better yet, make these configurable via command-line arguments or environment variables.
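A command-line variant could look roughly like the sketch below. The flag names (`--endpoint`, `--base-url`, `--collection`, `--action`, `--delay`) and the helpers `parse_args` and `build_payloads` are hypothetical. The defaults are the values hardcoded in the script, and product IDs are passed as positional arguments instead of living in the source file.

```python
import argparse

# Base URL taken from the f-string in the script.
DEFAULT_BASE_URL = (
    "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual"
)


def parse_args(argv=None):
    """Parse options for a bulk submission run (all flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Development/testing helper: submit a list of products to the webhook."
    )
    parser.add_argument("--endpoint", default="http://localhost:12000/samples")
    parser.add_argument("--base-url", default=DEFAULT_BASE_URL)
    parser.add_argument("--collection", default="sentinel-2-l2a")
    parser.add_argument("--action", default="convert-v1-s2-hp")
    parser.add_argument("--delay", type=float, default=1.0,
                        help="seconds to wait between submissions")
    parser.add_argument("products", nargs="*", help="product IDs without the .json suffix")
    return parser.parse_args(argv)


def build_payloads(args):
    """Build one webhook payload per product ID."""
    base = args.base_url.rstrip("/")
    return [
        {
            "source_url": f"{base}/{product}.json",
            "collection": args.collection,
            "action": args.action,
        }
        for product in args.products
    ]
```

The description string passed to `ArgumentParser` also covers the first suggestion, since `--help` then states that the script is a development helper.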

Copilot · 2026-01-15T15:59:40Z

operator-tools/submit_test_workflow_wh_list.py

+    except Exception as e:
+        print(f"[{i}/{len(products)}] ❌ Error processing {product}: {str(e)}")


The error handling catches a generic Exception and logs only the string representation of the error. Consider catching specific exceptions (e.g., requests.Timeout, requests.RequestException) and handling them differently, or at minimum log the full traceback for debugging, for example with traceback.print_exc() or a proper logger called with exc_info=True (print itself does not accept exc_info).
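A minimal sketch of this suggestion follows, assuming a hypothetical helper `submit` that wraps the `requests.post` call from the script. The `post` parameter and the returned status strings are illustrative choices that make the function easy to exercise without a running server. `requests.Timeout` is caught before `requests.RequestException` because it is a subclass of it.

```python
import traceback

import requests


def submit(message: str, endpoint: str, post=requests.post, timeout: float = 10.0) -> str:
    """Post one message and classify the outcome.

    Returns "ok", "http-error", "timeout" or "failed".
    """
    try:
        response = post(
            endpoint,
            data=message,
            headers={"Content-Type": "application/json"},
            timeout=timeout,
        )
    except requests.Timeout:
        # A timeout is often transient, so it is reported separately.
        print(f"Timed out after {timeout}s: {endpoint}")
        return "timeout"
    except requests.RequestException:
        # Any other requests-level failure: print the full traceback.
        traceback.print_exc()
        return "failed"
    if response.status_code != 200:
        print(f"Non-200 response ({response.status_code}): {response.text}")
        return "http-error"
    return "ok"
```

Returning a status instead of printing a success line unconditionally also avoids reporting "Published workflow" when the server answered with a non-200 code.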

Copilot · 2026-01-15T15:59:41Z

scripts/change_storage_tier.py

        return {"processed": 0, "succeeded": 0, "failed": 0}

-    item_id = urlparse(stac_item_url).path.split("/")[-1]
+    item_id = urlparse(stac_item_url).path.split("/")[-1].replace(".json", "")


The new item ID parsing logic that strips .json extensions lacks test coverage. Consider adding tests to verify that item_id extraction works correctly for both traditional STAC item URLs (without .json) and direct JSON file URLs (with .json extension), ensuring the .replace(".json", "") logic works as expected.
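A test along these lines might look like the sketch below. The helper `item_id_from_url` wraps the changed expression so it can be tested in isolation; that helper, the second helper `item_id_suffix_only`, and the example URLs are assumptions for illustration. The second helper shows a suffix-only alternative using `str.removesuffix` (Python 3.9+), since `.replace(".json", "")` removes every occurrence of ".json", not only a trailing one.

```python
from urllib.parse import urlparse


def item_id_from_url(stac_item_url: str) -> str:
    """Same expression as the changed line in change_storage_tier.py."""
    return urlparse(stac_item_url).path.split("/")[-1].replace(".json", "")


def item_id_suffix_only(stac_item_url: str) -> str:
    """Hypothetical variant: strip ".json" only when it is the trailing suffix."""
    return urlparse(stac_item_url).path.split("/")[-1].removesuffix(".json")


def test_item_id_extraction():
    api_url = "https://example.com/collections/sentinel-2-l2a/items/S2A_TEST_ITEM"
    json_url = "https://example.com/cpm-manual/S2A_TEST_ITEM.json"
    assert item_id_from_url(api_url) == "S2A_TEST_ITEM"
    assert item_id_from_url(json_url) == "S2A_TEST_ITEM"
    # The two variants differ only when ".json" appears inside the name.
    odd_url = "https://example.com/items/a.json.v2"
    assert item_id_from_url(odd_url) == "a.v2"
    assert item_id_suffix_only(odd_url) == "a.json.v2"
```

For the Sentinel-2 product names in this PR the two variants give the same result, so the suffix-only form is a robustness option rather than a required fix.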

Copilot · 2026-01-15T15:59:41Z

scripts/convert_v1_s2.py

+    item_id = urlparse(source_url).path.rstrip("/").split("/")[-1].replace(".json", "")
    logger.info(f"🔄 Converting (S2 Optimized): {item_id}")
    logger.info(f"   Collection: {collection}")

    # Resolve source: STAC item or direct Zarr URL
-    zarr_url = get_zarr_url(source_url) if "/items/" in source_url else source_url
+    zarr_url = (
+        get_zarr_url(source_url)
+        if ("/items/" in source_url or source_url.endswith(".json"))
+        else source_url
+    )


The new item ID parsing logic that strips .json extensions and the updated source URL condition lack test coverage. Consider adding tests to verify that item_id extraction works correctly for URLs with and without .json extensions, and that the zarr_url resolution logic handles all combinations of "/items/" and ".json" correctly.
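The combinations mentioned here can be covered with a small table-driven test, sketched below. The helper `parse_source` is hypothetical: it bundles the two expressions from the diff so they can be called without running a conversion. The URLs are made-up examples.

```python
from urllib.parse import urlparse


def parse_source(source_url: str):
    """Return (item_id, resolve_via_stac) using the expressions from the diff."""
    item_id = urlparse(source_url).path.rstrip("/").split("/")[-1].replace(".json", "")
    resolve = "/items/" in source_url or source_url.endswith(".json")
    return item_id, resolve


CASES = [
    # (source_url, (expected item_id, expected call to get_zarr_url))
    ("https://example.com/collections/c/items/ITEM_A", ("ITEM_A", True)),
    ("https://example.com/collections/c/items/ITEM_A/", ("ITEM_A", True)),
    ("https://example.com/collections/c/items/ITEM_A.json", ("ITEM_A", True)),
    ("https://example.com/bucket/ITEM_A.json", ("ITEM_A", True)),
    ("https://example.com/bucket/ITEM_A.zarr", ("ITEM_A.zarr", False)),
]


def test_parse_source():
    for url, expected in CASES:
        assert parse_source(url) == expected, url
```

The last case documents that a direct Zarr URL keeps its ".zarr" suffix in the logged item ID, which follows from the expression as written.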

lhoupert

LGTM

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

emmanuelmathot added 5 commits January 15, 2026 09:03

fix: update item ID parsing and enhance source URL handling in conver…

d5696b5

…sion script; add product submission workflow script

fix: add missing required parameter 'keep_scale_offset' in conversion…

2ca3a82

… function

fix: remove '.json' extension from item ID in registration function

1c62ddb

fix: update source URL and collection name in STAC item submission sc…

850a3a1

…ript; change eopf-geozarr dependency to a specific branch

fix: update eopf-geozarr dependency to a specific version v0.7.1

1ee01ab

emmanuelmathot changed the title ~~Fix item ID parsing and enhance source URL handling~~ chore(deps): Update vdata-model version converter to 0.7.1 and fix item ID parsing Jan 15, 2026

Merge branch 'main' into static-json

02a6109

emmanuelmathot requested review from Copilot and lhoupert January 15, 2026 15:53

Copilot started reviewing on behalf of emmanuelmathot January 15, 2026 15:54 View session

Copilot AI reviewed Jan 15, 2026

View reviewed changes

lhoupert approved these changes Jan 15, 2026

View reviewed changes

emmanuelmathot and others added 2 commits January 15, 2026 17:20

Update scripts/register_v1.py

9a6cbea

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/convert_v1_s2.py

f5a2441

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

emmanuelmathot merged commit 005e053 into main Jan 15, 2026
2 checks passed

github-actions bot mentioned this pull request Jan 15, 2026

chore: release 1.3.0 #86

Merged

emmanuelmathot merged 8 commits into main from static-json


3 participants
