
chore(deps): Update vdata-model version converter to 0.7.1 and fix item ID parsing#81

Merged
emmanuelmathot merged 8 commits into main from static-json on Jan 15, 2026


@emmanuelmathot
Contributor

Update the item ID parsing to correctly handle JSON file extensions and improve source URL processing in the conversion script. Additionally, introduce a new script for the product submission workflow.

@emmanuelmathot changed the title from "Fix item ID parsing and enhance source URL handling" to "chore(deps): Update vdata-model version converter to 0.7.1 and fix item ID parsing" on Jan 15, 2026

Copilot AI left a comment


Pull request overview

This pull request updates the vdata-model dependency to version 0.7.1 and fixes item ID parsing to handle JSON file extensions correctly across multiple scripts. The changes primarily address URL parsing for STAC items that may be accessed as direct .json files rather than through traditional STAC API endpoints.

Changes:

  • Updated eopf-geozarr dependency from v0.7.0 to v0.7.1
  • Fixed item ID parsing to strip .json extensions in three scripts (register_v1.py, convert_v1_s2.py, change_storage_tier.py)
  • Enhanced source URL detection in convert_v1_s2.py to handle .json files
  • Added keep_scale_offset parameter to convert_s2_optimized function call
  • Introduced new bulk submission script for testing workflows
  • Modified test workflow configuration

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.

Summary per file:

  • pyproject.toml: Updates eopf-geozarr dependency to v0.7.1
  • scripts/register_v1.py: Adds .json extension stripping to item ID parsing
  • scripts/convert_v1_s2.py: Fixes item ID parsing, updates source URL detection, and adds keep_scale_offset parameter
  • scripts/change_storage_tier.py: Adds .json extension stripping to item ID parsing
  • operator-tools/submit_test_workflow_wh_list.py: New bulk workflow submission script with hardcoded product list
  • operator-tools/submit_test_workflow_wh.py: Updates test configuration with new URLs and port


Comment on lines +1 to +87
#!/usr/bin/env python3
import json
import time

import requests

# List of products to process (from S3 listing)
products = [
"S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456",
"S2A_MSIL2A_20210920T120331_N0500_R023_T28RBS_20230108T205311",
"S2A_MSIL2A_20210927T115221_N0500_R123_T28RBS_20230112T171055",
"S2A_MSIL2A_20210930T120331_N0500_R023_T28RBS_20230109T180123",
"S2A_MSIL2A_20211007T115221_N0500_R123_T28RBS_20230102T185542",
"S2A_MSIL2A_20211010T120331_N0500_R023_T28RBS_20230106T001906",
"S2A_MSIL2A_20211017T115221_N0500_R123_T28RBS_20230104T081202",
"S2A_MSIL2A_20211020T120331_N0500_R023_T28RBS_20230105T095115",
"S2A_MSIL2A_20211027T115221_N0500_R123_T28RBS_20230105T031019",
"S2A_MSIL2A_20211030T120331_N0500_R023_T28RBS_20230105T031900",
"S2A_MSIL2A_20211106T115221_N0500_R123_T28RBS_20221229T073629",
"S2A_MSIL2A_20211109T120331_N0500_R023_T28RBS_20221229T073655",
"S2A_MSIL2A_20211116T115221_N0500_R123_T28RBS_20221230T200900",
"S2A_MSIL2A_20211119T120321_N0500_R023_T28RBS_20221229T191326",
"S2A_MSIL2A_20211126T115221_N0500_R123_T28RBS_20230102T132307",
"S2A_MSIL2A_20211129T120321_N0500_R023_T28RBS_20221230T041345",
"S2A_MSIL2A_20211206T115221_N0500_R123_T28RBS_20221224T140610",
"S2A_MSIL2A_20211209T120321_N0500_R023_T28RBS_20221225T010957",
"S2A_MSIL2A_20211216T115221_N0500_R123_T28RBS_20221224T043535",
"S2A_MSIL2A_20211219T120321_N0500_R023_T28RBS_20221224T151838",
"S2A_MSIL2A_20211226T115221_N0500_R123_T28RBS_20221224T161518",
"S2A_MSIL2A_20211229T120331_N0500_R023_T28RBS_20221227T100257",
"S2B_MSIL2A_20210922T115219_N0500_R123_T28RBS_20230113T174254",
"S2B_MSIL2A_20210925T120319_N0500_R023_T28RBS_20230109T005655",
"S2B_MSIL2A_20211002T115219_N0500_R123_T28RBS_20230102T210850",
"S2B_MSIL2A_20211005T120329_N0500_R023_T28RBS_20230105T092944",
"S2B_MSIL2A_20211012T115219_N0500_R123_T28RBS_20230103T222101",
"S2B_MSIL2A_20211015T120329_N0500_R023_T28RBS_20230104T221431",
"S2B_MSIL2A_20211022T115219_N0500_R123_T28RBS_20230104T151158",
"S2B_MSIL2A_20211025T120329_N0500_R023_T28RBS_20230105T094555",
"S2B_MSIL2A_20211101T115219_N0500_R123_T28RBS_20221228T204555",
"S2B_MSIL2A_20211104T120329_N0500_R023_T28RBS_20221230T183447",
"S2B_MSIL2A_20211111T115219_N0500_R123_T28RBS_20221229T220152",
"S2B_MSIL2A_20211114T120319_N0500_R023_T28RBS_20221229T121936",
"S2B_MSIL2A_20211121T115219_N0500_R123_T28RBS_20221231T033626",
"S2B_MSIL2A_20211124T120319_N0500_R023_T28RBS_20221229T221129",
"S2B_MSIL2A_20211201T115219_N0500_R123_T28RBS_20221224T083520",
"S2B_MSIL2A_20211204T120319_N0500_R023_T28RBS_20221222T175403",
"S2B_MSIL2A_20211211T115209_N0500_R123_T28RBS_20221225T033159",
"S2B_MSIL2A_20211214T120319_N0500_R023_T28RBS_20221226T065701",
"S2B_MSIL2A_20211221T115219_N0500_R123_T28RBS_20221224T053435",
"S2B_MSIL2A_20211224T120319_N0500_R023_T28RBS_20221227T125536",
"S2B_MSIL2A_20211231T115219_N0500_R123_T28RBS_20221226T114305",
"S2B_MSIL2A_20220103T120319_N0510_R023_T28RBS_20240423T015628",
]

# Process each product
for i, product in enumerate(products, 1):
    # Test STAC item submission
    payload = {
        "source_url": f"https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/{product}.json",
        "collection": "sentinel-2-l2a",
        "action": "convert-v1-s2-hp",  # specify the action to use the S2 high-priority trigger
    }

    message = json.dumps(payload)

    # Submit via HTTP webhook endpoint
    try:
        response = requests.post(
            "http://localhost:12000/samples",
            data=message,
            headers={"Content-Type": "application/json"},
        )

        print(f"[{i}/{len(products)}] ✅ Published workflow for item: {product}")
        print(f"Response status: {response.status_code}")

        if response.status_code != 200:
            print(f"Warning: Non-200 response for {product}: {response.text}")

    except Exception as e:
        print(f"[{i}/{len(products)}] ❌ Error processing {product}: {str(e)}")

    # Add small delay to avoid overwhelming the server
    if i < len(products):
        time.sleep(1)

print(f"\n✅ Completed processing {len(products)} products.")

Copilot AI Jan 15, 2026


This new script is not documented in the operator-tools README.md or the main repository README.md. Consider adding documentation explaining the purpose of this script, when it should be used, and how it differs from submit_test_workflow_wh.py. The script appears to be for bulk submission of a predefined list of products, which would be helpful information for operators.

Copilot uses AI. Check for mistakes.
Comment on lines 7 to +18
 payload = {
-    "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014",
-    "collection": "sentinel-2-l2a-dp-test",
+    # "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014",
+    "source_url": "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456.json",
+    "collection": "sentinel-2-l2a-staging",
     "action": "convert-v1-s2-hp",  # specify the action to use the S2 high-priority trigger
 }
 
 message = json.dumps(payload)
 
 # Submit via HTTP webhook endpoint
 response = requests.post(
-    "http://localhost:12000/samples",
+    "http://localhost:12001/samples",

Copilot AI Jan 15, 2026


This file contains hardcoded test configuration values that have been changed. The commented-out source URL and modified collection/port suggest these are temporary testing changes. Consider either: (1) reverting these to sensible defaults if this is meant to be a template/example, or (2) making these parameters configurable via command-line arguments or environment variables to avoid checking in test-specific configuration.
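Option (2) could look roughly like the sketch below. It is hypothetical: the flag names and the WEBHOOK_URL environment variable are invented for illustration, while the defaults simply mirror the new values in this diff (the sentinel-2-l2a-staging collection, the convert-v1-s2-hp action, and the webhook on localhost port 12001).

```python
#!/usr/bin/env python3
"""Submit one STAC item to the workflow webhook (hypothetical CLI sketch)."""
import argparse
import json
import os


def parse_args(argv=None):
    # Defaults mirror the values hardcoded in submit_test_workflow_wh.py.
    # WEBHOOK_URL is an assumed environment variable, not an existing repo convention.
    parser = argparse.ArgumentParser(description="Submit a single item to the webhook.")
    parser.add_argument("source_url", help="STAC API item URL or direct item .json URL")
    parser.add_argument("--collection", default="sentinel-2-l2a-staging")
    parser.add_argument("--action", default="convert-v1-s2-hp")
    parser.add_argument(
        "--webhook-url",
        default=os.environ.get("WEBHOOK_URL", "http://localhost:12001/samples"),
    )
    return parser.parse_args(argv)


def build_payload(args):
    # Same payload shape the script already sends.
    return {
        "source_url": args.source_url,
        "collection": args.collection,
        "action": args.action,
    }


def main(argv=None):
    import requests  # imported lazily so the helpers above work without it

    args = parse_args(argv)
    response = requests.post(
        args.webhook_url,
        data=json.dumps(build_payload(args)),
        headers={"Content-Type": "application/json"},
        timeout=30,  # avoid hanging indefinitely on an unreachable webhook
    )
    print(f"Response status: {response.status_code}")
    return response.status_code
```

Calling main() from an `if __name__ == "__main__":` guard would keep one-off test values out of version control: the item URL and collection are passed on the command line instead of being edited in place.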

Comment on lines +97 to +101
zarr_url = (
get_zarr_url(source_url)
if ("/items/" in source_url or source_url.endswith(".json"))
else source_url
)

Copilot AI Jan 15, 2026


The condition for calling get_zarr_url now checks for either "/items/" in the URL or a ".json" extension. The two checks are independent: a URL can contain "/items/" without ending in ".json", or vice versa. For example, "https://example.com/items/item-id" (no .json) calls get_zarr_url, and so does "https://example.com/stac-items/item-id.json" (no "/items/"). This seems correct, but verify that it matches the intended behavior for every combination.
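The combinations can be checked with a tiny predicate that copies the new condition verbatim. The function name needs_stac_lookup is hypothetical; two of the example URLs come from this PR and the rest are illustrative.

```python
def needs_stac_lookup(source_url: str) -> bool:
    """True when convert_v1_s2.py would call get_zarr_url (condition copied from the diff)."""
    return "/items/" in source_url or source_url.endswith(".json")


cases = {
    # "/items/" only: STAC API item endpoint -> lookup
    "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014": True,
    # ".json" only: static item file on S3 -> lookup
    "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456.json": True,
    # both -> lookup
    "https://example.com/collections/c/items/item-id.json": True,
    # neither: treated as a direct Zarr URL and passed through unchanged
    "https://example.com/data/item-id.zarr": False,
    # caveat: a query string defeats the suffix check on the raw URL
    "https://example.com/static/item-id.json?version=2": False,
}

for url, expected in cases.items():
    assert needs_stac_lookup(url) is expected, url
```

The last case suggests that, if query strings or fragments can ever appear in source URLs, testing urlparse(source_url).path.endswith(".json") would be more robust; whether that matters depends on how the URLs are produced upstream.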

Comment on lines +1 to +87

Copilot AI Jan 15, 2026


This script contains hardcoded configuration values (localhost URLs, collection names, action types) that appear to be for testing/development purposes. Consider adding a comment at the top of the file indicating this is a development/testing script and should not be used in production, or better yet, make these configurable via command-line arguments or environment variables.
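A sketch of the environment-variable route, combined with a header docstring that flags the file as a development tool. The variable names WEBHOOK_URL, SOURCE_BASE_URL, COLLECTION and ACTION are hypothetical; the defaults reproduce the values currently hardcoded in the bulk script. Payload construction is kept separate from sending so the list can be inspected (or dry-run) without touching the server.

```python
#!/usr/bin/env python3
"""Development/testing tool: bulk-submit Sentinel-2 items to a local workflow webhook.

Not intended for production use. Configuration is read from environment
variables (names are illustrative, not an existing repository convention).
"""
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    webhook_url: str
    source_base_url: str
    collection: str
    action: str


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    env = os.environ if env is None else env
    # Defaults match the values hardcoded in the original script.
    return Config(
        webhook_url=env.get("WEBHOOK_URL", "http://localhost:12000/samples"),
        source_base_url=env.get(
            "SOURCE_BASE_URL",
            "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual",
        ),
        collection=env.get("COLLECTION", "sentinel-2-l2a"),
        action=env.get("ACTION", "convert-v1-s2-hp"),
    )


def build_payloads(products, config: Config):
    # One payload per product, pointing at the product's static item JSON.
    base = config.source_base_url.rstrip("/")
    return [
        {
            "source_url": f"{base}/{product}.json",
            "collection": config.collection,
            "action": config.action,
        }
        for product in products
    ]
```

The existing loop could then iterate over build_payloads(products, load_config()) and POST each payload to config.webhook_url, leaving only the product list in the file.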

Comment on lines +80 to +81
except Exception as e:
print(f"[{i}/{len(products)}] ❌ Error processing {product}: {str(e)}")

Copilot AI Jan 15, 2026


The error handling catches a generic Exception but prints only its string representation. Consider catching specific exceptions (e.g., requests.Timeout and requests.RequestException) and handling them differently, or at minimum record the full traceback for debugging, either with traceback.print_exc() or by switching to a logger and calling logger.exception(...) (or passing exc_info=True); print() itself has no exc_info option.
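A minimal sketch of that suggestion, assuming requests is installed. The helper name submit_payload and its injectable session argument are hypothetical; the argument exists only so the function can be exercised offline. requests.Timeout is caught first because it is a subclass of requests.RequestException.

```python
import logging

import requests

logger = logging.getLogger("submit_test_workflow")


def submit_payload(payload, url="http://localhost:12000/samples", session=requests, timeout=30.0):
    """POST one payload; return True on a 2xx response, False otherwise."""
    try:
        # json= serializes the dict and sets Content-Type: application/json
        response = session.post(url, json=payload, timeout=timeout)
        response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx
    except requests.Timeout:
        # Must precede RequestException, which it subclasses
        logger.warning("Timed out after %.0fs submitting %s", timeout, payload.get("source_url"))
        return False
    except requests.RequestException:
        # logger.exception logs at ERROR level and appends the full traceback
        logger.exception("Request failed for %s", payload.get("source_url"))
        return False
    return True
```

Unlike the original loop, this treats any 4xx/5xx as a failure and any 2xx (not only 200) as success; adjust if the webhook signals success differently.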

 return {"processed": 0, "succeeded": 0, "failed": 0}
 
-item_id = urlparse(stac_item_url).path.split("/")[-1]
+item_id = urlparse(stac_item_url).path.split("/")[-1].replace(".json", "")

Copilot AI Jan 15, 2026


The new item ID parsing logic that strips .json extensions lacks test coverage. Consider adding tests to verify that item_id extraction works correctly for both traditional STAC item URLs (without .json) and direct JSON file URLs (with .json extension), ensuring the .replace(".json", "") logic works as expected.
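A sketch of such tests. The helper extract_item_id is hypothetical: it follows the convert_v1_s2.py variant (which also strips a trailing slash) but uses str.removesuffix (Python 3.9+) instead of .replace(".json", ""), because replace removes the substring anywhere in the ID rather than only at the end. For the Sentinel-2 product IDs in this PR both forms give the same result; whether an ID can ever contain ".json" mid-string is an assumption worth confirming.

```python
from urllib.parse import urlparse


def extract_item_id(url: str) -> str:
    """Last path segment of a STAC item URL, minus a trailing ".json" (hypothetical helper)."""
    segment = urlparse(url).path.rstrip("/").split("/")[-1]
    return segment.removesuffix(".json")


def test_extract_item_id():
    item = "S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456"
    # STAC API item endpoint (no extension)
    assert extract_item_id(f"https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/{item}") == item
    # Static item JSON on S3
    assert extract_item_id(
        f"https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/{item}.json"
    ) == item
    # A trailing slash and a query string do not leak into the ID
    assert extract_item_id(f"https://example.com/items/{item}/") == item
    assert extract_item_id(f"https://example.com/items/{item}.json?x=1") == item
    # Only a trailing ".json" is removed; .replace() would also alter mid-string matches
    assert extract_item_id("https://example.com/items/abc.json-v2") == "abc.json-v2"
    assert "abc.json-v2".replace(".json", "") == "abc-v2"


test_extract_item_id()
```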

Comment on lines +92 to +101
+item_id = urlparse(source_url).path.rstrip("/").split("/")[-1].replace(".json", "")
 logger.info(f"🔄 Converting (S2 Optimized): {item_id}")
 logger.info(f" Collection: {collection}")
 
 # Resolve source: STAC item or direct Zarr URL
-zarr_url = get_zarr_url(source_url) if "/items/" in source_url else source_url
+zarr_url = (
+    get_zarr_url(source_url)
+    if ("/items/" in source_url or source_url.endswith(".json"))
+    else source_url
+)

Copilot AI Jan 15, 2026


The new item ID parsing logic that strips .json extensions and the updated source URL condition lack test coverage. Consider adding tests to verify that item_id extraction works correctly for URLs with and without .json extensions, and that the zarr_url resolution logic handles all combinations of "/items/" and ".json" correctly.
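For the zarr_url resolution, the network lookup can be replaced by a stub so every "/items/" and ".json" combination runs offline. This sketch assumes the step is factored into a hypothetical resolve_zarr_url(source_url, get_zarr_url) function; in the script as written, get_zarr_url is called directly, so a test there would patch that name instead (for example with unittest.mock.patch).

```python
def resolve_zarr_url(source_url, get_zarr_url):
    """Same condition as convert_v1_s2.py, with the STAC lookup injected (hypothetical refactor)."""
    if "/items/" in source_url or source_url.endswith(".json"):
        return get_zarr_url(source_url)
    return source_url


def test_resolve_zarr_url():
    calls = []

    def fake_get_zarr_url(url):
        # Record the call and return a placeholder instead of hitting the network.
        calls.append(url)
        return "s3://resolved/item.zarr"

    cases = [
        ("https://example.com/collections/c/items/item-id", True),       # "/items/" only
        ("https://example.com/cpm-manual/item-id.json", True),           # ".json" only
        ("https://example.com/collections/c/items/item-id.json", True),  # both
        ("https://example.com/data/item-id.zarr", False),                # neither
    ]
    for url, expect_lookup in cases:
        calls.clear()
        result = resolve_zarr_url(url, fake_get_zarr_url)
        if expect_lookup:
            assert result == "s3://resolved/item.zarr" and calls == [url], url
        else:
            assert result == url and calls == [], url


test_resolve_zarr_url()
```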

Contributor

@lhoupert lhoupert left a comment


LGTM

emmanuelmathot and others added 2 commits January 15, 2026 17:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@emmanuelmathot emmanuelmathot merged commit 005e053 into main Jan 15, 2026
2 checks passed
@github-actions github-actions bot mentioned this pull request Jan 15, 2026

Labels: None yet

Projects: None yet

3 participants