chore(deps): Update vdata-model version converter to 0.7.1 and fix item ID parsing #81
emmanuelmathot merged 8 commits into main
…sion script; add product submission workflow script
…ript; change eopf-geozarr dependency to a specific branch
Pull request overview
This pull request updates the vdata-model dependency to version 0.7.1 and fixes item ID parsing to handle JSON file extensions correctly across multiple scripts. The changes primarily address URL parsing for STAC items that may be accessed as direct .json files rather than through traditional STAC API endpoints.
Changes:
- Updated eopf-geozarr dependency from v0.7.0 to v0.7.1
- Fixed item ID parsing to strip .json extensions in three scripts (register_v1.py, convert_v1_s2.py, change_storage_tier.py)
- Enhanced source URL detection in convert_v1_s2.py to handle .json files
- Added keep_scale_offset parameter to convert_s2_optimized function call
- Introduced new bulk submission script for testing workflows
- Modified test workflow configuration
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pyproject.toml | Updates eopf-geozarr dependency to v0.7.1 |
| scripts/register_v1.py | Adds .json extension stripping to item ID parsing |
| scripts/convert_v1_s2.py | Fixes item ID parsing, updates source URL detection, and adds keep_scale_offset parameter |
| scripts/change_storage_tier.py | Adds .json extension stripping to item ID parsing |
| operator-tools/submit_test_workflow_wh_list.py | New bulk workflow submission script with hardcoded product list |
| operator-tools/submit_test_workflow_wh.py | Updates test configuration with new URLs and port |
    #!/usr/bin/env python3
    import json
    import time

    import requests

    # List of products to process (from S3 listing)
    products = [
        "S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456",
        "S2A_MSIL2A_20210920T120331_N0500_R023_T28RBS_20230108T205311",
        "S2A_MSIL2A_20210927T115221_N0500_R123_T28RBS_20230112T171055",
        "S2A_MSIL2A_20210930T120331_N0500_R023_T28RBS_20230109T180123",
        "S2A_MSIL2A_20211007T115221_N0500_R123_T28RBS_20230102T185542",
        "S2A_MSIL2A_20211010T120331_N0500_R023_T28RBS_20230106T001906",
        "S2A_MSIL2A_20211017T115221_N0500_R123_T28RBS_20230104T081202",
        "S2A_MSIL2A_20211020T120331_N0500_R023_T28RBS_20230105T095115",
        "S2A_MSIL2A_20211027T115221_N0500_R123_T28RBS_20230105T031019",
        "S2A_MSIL2A_20211030T120331_N0500_R023_T28RBS_20230105T031900",
        "S2A_MSIL2A_20211106T115221_N0500_R123_T28RBS_20221229T073629",
        "S2A_MSIL2A_20211109T120331_N0500_R023_T28RBS_20221229T073655",
        "S2A_MSIL2A_20211116T115221_N0500_R123_T28RBS_20221230T200900",
        "S2A_MSIL2A_20211119T120321_N0500_R023_T28RBS_20221229T191326",
        "S2A_MSIL2A_20211126T115221_N0500_R123_T28RBS_20230102T132307",
        "S2A_MSIL2A_20211129T120321_N0500_R023_T28RBS_20221230T041345",
        "S2A_MSIL2A_20211206T115221_N0500_R123_T28RBS_20221224T140610",
        "S2A_MSIL2A_20211209T120321_N0500_R023_T28RBS_20221225T010957",
        "S2A_MSIL2A_20211216T115221_N0500_R123_T28RBS_20221224T043535",
        "S2A_MSIL2A_20211219T120321_N0500_R023_T28RBS_20221224T151838",
        "S2A_MSIL2A_20211226T115221_N0500_R123_T28RBS_20221224T161518",
        "S2A_MSIL2A_20211229T120331_N0500_R023_T28RBS_20221227T100257",
        "S2B_MSIL2A_20210922T115219_N0500_R123_T28RBS_20230113T174254",
        "S2B_MSIL2A_20210925T120319_N0500_R023_T28RBS_20230109T005655",
        "S2B_MSIL2A_20211002T115219_N0500_R123_T28RBS_20230102T210850",
        "S2B_MSIL2A_20211005T120329_N0500_R023_T28RBS_20230105T092944",
        "S2B_MSIL2A_20211012T115219_N0500_R123_T28RBS_20230103T222101",
        "S2B_MSIL2A_20211015T120329_N0500_R023_T28RBS_20230104T221431",
        "S2B_MSIL2A_20211022T115219_N0500_R123_T28RBS_20230104T151158",
        "S2B_MSIL2A_20211025T120329_N0500_R023_T28RBS_20230105T094555",
        "S2B_MSIL2A_20211101T115219_N0500_R123_T28RBS_20221228T204555",
        "S2B_MSIL2A_20211104T120329_N0500_R023_T28RBS_20221230T183447",
        "S2B_MSIL2A_20211111T115219_N0500_R123_T28RBS_20221229T220152",
        "S2B_MSIL2A_20211114T120319_N0500_R023_T28RBS_20221229T121936",
        "S2B_MSIL2A_20211121T115219_N0500_R123_T28RBS_20221231T033626",
        "S2B_MSIL2A_20211124T120319_N0500_R023_T28RBS_20221229T221129",
        "S2B_MSIL2A_20211201T115219_N0500_R123_T28RBS_20221224T083520",
        "S2B_MSIL2A_20211204T120319_N0500_R023_T28RBS_20221222T175403",
        "S2B_MSIL2A_20211211T115209_N0500_R123_T28RBS_20221225T033159",
        "S2B_MSIL2A_20211214T120319_N0500_R023_T28RBS_20221226T065701",
        "S2B_MSIL2A_20211221T115219_N0500_R123_T28RBS_20221224T053435",
        "S2B_MSIL2A_20211224T120319_N0500_R023_T28RBS_20221227T125536",
        "S2B_MSIL2A_20211231T115219_N0500_R123_T28RBS_20221226T114305",
        "S2B_MSIL2A_20220103T120319_N0510_R023_T28RBS_20240423T015628",
    ]

    # Process each product
    for i, product in enumerate(products, 1):
        # Test STAC item submission
        payload = {
            "source_url": f"https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/{product}.json",
            "collection": "sentinel-2-l2a",
            "action": "convert-v1-s2-hp",  # specify the action to use the S2 high-priority trigger
        }

        message = json.dumps(payload)

        # Submit via HTTP webhook endpoint
        try:
            response = requests.post(
                "http://localhost:12000/samples",
                data=message,
                headers={"Content-Type": "application/json"},
            )

            print(f"[{i}/{len(products)}] ✅ Published workflow for item: {product}")
            print(f"Response status: {response.status_code}")

            if response.status_code != 200:
                print(f"Warning: Non-200 response for {product}: {response.text}")

        except Exception as e:
            print(f"[{i}/{len(products)}] ❌ Error processing {product}: {str(e)}")

        # Add small delay to avoid overwhelming the server
        if i < len(products):
            time.sleep(1)

    print(f"\n✅ Completed processing {len(products)} products.")
This new script is not documented in the operator-tools README.md or the main repository README.md. Consider adding documentation explaining the purpose of this script, when it should be used, and how it differs from submit_test_workflow_wh.py. The script appears to be for bulk submission of a predefined list of products, which would be helpful information for operators.
     payload = {
    -    "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014",
    -    "collection": "sentinel-2-l2a-dp-test",
    +    # "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014",
    +    "source_url": "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456.json",
    +    "collection": "sentinel-2-l2a-staging",
         "action": "convert-v1-s2-hp",  # specify the action to use the S2 high-priority trigger
     }

     message = json.dumps(payload)

     # Submit via HTTP webhook endpoint
     response = requests.post(
    -    "http://localhost:12000/samples",
    +    "http://localhost:12001/samples",
This file contains hardcoded test configuration values that have been changed. The commented-out source URL and modified collection/port suggest these are temporary testing changes. Consider either: (1) reverting these to sensible defaults if this is meant to be a template/example, or (2) making these parameters configurable via command-line arguments or environment variables to avoid checking in test-specific configuration.
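One way to act on option (2) is sketched below. It is only an illustration: the flag names, the environment variable names `WORKFLOW_COLLECTION` and `WEBHOOK_URL`, and the helper functions `parse_args` and `build_payload` are hypothetical and do not exist in the repository. The default values are copied from the hunk above.

```python
import argparse
import os


def parse_args(argv=None):
    """Read submission settings from flags, falling back to environment variables."""
    parser = argparse.ArgumentParser(description="Submit one test workflow to the webhook.")
    parser.add_argument("--source-url", required=True, help="STAC item URL or .json file URL")
    parser.add_argument(
        "--collection",
        # WORKFLOW_COLLECTION is a hypothetical variable name
        default=os.environ.get("WORKFLOW_COLLECTION", "sentinel-2-l2a-staging"),
    )
    parser.add_argument(
        "--webhook-url",
        # WEBHOOK_URL is a hypothetical variable name
        default=os.environ.get("WEBHOOK_URL", "http://localhost:12000/samples"),
    )
    parser.add_argument("--action", default="convert-v1-s2-hp")
    return parser.parse_args(argv)


def build_payload(args):
    """Build the same three-key payload the script currently hardcodes."""
    return {
        "source_url": args.source_url,
        "collection": args.collection,
        "action": args.action,
    }
```

With this shape, switching the port or collection for a local test becomes a flag or an exported variable rather than a committed edit.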
    zarr_url = (
        get_zarr_url(source_url)
        if ("/items/" in source_url or source_url.endswith(".json"))
        else source_url
    )
The condition that decides whether to call get_zarr_url now accepts either "/items/" in the URL or a ".json" suffix. The two checks are independent: a URL can contain "/items/" without ending in ".json", or end in ".json" without containing "/items/". For example, "https://example.com/items/item-id" (no .json) calls get_zarr_url, and so does "https://example.com/stac-items/item-id.json" (no "/items/"). This seems correct, but verify that it matches the intended behavior for every combination.
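The combinations can be enumerated with a small predicate. This is a sketch only: `needs_stac_resolution` is a hypothetical name that isolates the boolean condition from the hunk above, and the example URLs are made up.

```python
def needs_stac_resolution(source_url: str) -> bool:
    """Same condition as the PR: treat the URL as a STAC item to resolve."""
    return "/items/" in source_url or source_url.endswith(".json")


# Each URL is paired with whether get_zarr_url would be called for it.
cases = {
    "https://example.com/items/item-id": True,              # "/items/" only
    "https://example.com/stac-items/item-id.json": True,    # ".json" only
    "https://example.com/items/item-id.json": True,         # both
    "https://example.com/data/product.zarr": False,         # neither
    "https://example.com/stac-items/item-id.json?x=1": False,  # query string defeats endswith
}

for url, expected in cases.items():
    assert needs_stac_resolution(url) is expected, url
```

The last case is worth a look: because the check runs on the full URL string rather than on the parsed path, a ".json" URL with a query string is treated as a direct Zarr URL.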
(operator-tools/submit_test_workflow_wh_list.py, the same full script quoted above)
This script contains hardcoded configuration values (localhost URLs, collection names, action types) that appear to be for testing/development purposes. Consider adding a comment at the top of the file indicating this is a development/testing script and should not be used in production, or better yet, make these configurable via command-line arguments or environment variables.
    except Exception as e:
        print(f"[{i}/{len(products)}] ❌ Error processing {product}: {str(e)}")
The error handling catches a generic Exception and prints only the string representation of the error. Consider catching specific exceptions (e.g., requests.RequestException, requests.Timeout) and handling them differently, or at minimum record the full traceback for debugging, for example with traceback.print_exc() or a proper logger (logger.exception, or exc_info=True on a logging call; print itself does not accept exc_info).
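A minimal sketch of that suggestion follows, written against the standard library only so it runs without a server. The function `submit_all`, its `post` callable, and the `network_errors` parameter are hypothetical; in the actual script `post` would wrap `requests.post` and `network_errors` would be `requests.RequestException`.

```python
import logging

logger = logging.getLogger("bulk_submit")


def submit_all(products, post, network_errors=(OSError,)):
    """Submit each product and return the ones that failed.

    `post` takes a product name and returns an HTTP status code.
    Only `network_errors` are caught; anything else propagates.
    """
    products = list(products)
    failed = []
    for i, product in enumerate(products, 1):
        try:
            status = post(product)
        except network_errors:
            # logger.exception logs at ERROR level and appends the traceback
            logger.exception("[%d/%d] request failed for %s", i, len(products), product)
            failed.append(product)
            continue
        if status != 200:
            logger.warning("[%d/%d] non-200 response (%s) for %s", i, len(products), status, product)
            failed.append(product)
        else:
            logger.info("[%d/%d] published workflow for %s", i, len(products), product)
    return failed
```

Returning the failed products also lets the script report an accurate final count instead of printing a success line unconditionally.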
             return {"processed": 0, "succeeded": 0, "failed": 0}

    -    item_id = urlparse(stac_item_url).path.split("/")[-1]
    +    item_id = urlparse(stac_item_url).path.split("/")[-1].replace(".json", "")
The new item ID parsing logic that strips .json extensions lacks test coverage. Consider adding tests to verify that item_id extraction works correctly for both traditional STAC item URLs (without .json) and direct JSON file URLs (with .json extension), ensuring the .replace(".json", "") logic works as expected.
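Such tests could look roughly like the sketch below. The helper names `item_id_from_url` and `item_id_from_url_suffix_only` are hypothetical; the first mirrors the expression used in convert_v1_s2.py (which also strips a trailing slash), and the second is an alternative, not part of the PR, that removes ".json" only at the end of the name via `str.removesuffix` (Python 3.9+).

```python
from urllib.parse import urlparse


def item_id_from_url(url: str) -> str:
    """Last path segment with every ".json" occurrence removed (as in the PR)."""
    return urlparse(url).path.rstrip("/").split("/")[-1].replace(".json", "")


def item_id_from_url_suffix_only(url: str) -> str:
    """Variant that strips ".json" only when it is the final suffix."""
    return urlparse(url).path.rstrip("/").split("/")[-1].removesuffix(".json")


api_url = (
    "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/"
    "S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014"
)
file_url = (
    "https://s3.explorer.eopf.copernicus.eu/esa-zarr-sentinel-explorer-fra/cpm-manual/"
    "S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456.json"
)

# Traditional STAC item URL: the ID is the last path segment, unchanged.
assert item_id_from_url(api_url) == "S2C_MSIL2A_20251117T090251_N0511_R007_T35SMA_20251117T124014"
# Direct JSON file URL: the extension is stripped.
assert item_id_from_url(file_url) == "S2A_MSIL2A_20210917T115221_N0500_R123_T28RBS_20230110T165456"

# The two helpers differ only when ".json" appears inside the name (made-up URL).
odd_url = "https://example.com/data/a.json.backup.json"
assert item_id_from_url(odd_url) == "a.backup"
assert item_id_from_url_suffix_only(odd_url) == "a.json.backup"
```

For Sentinel-2 product names the two helpers agree, so the difference matters only if item names could ever contain ".json" in the middle.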
     item_id = urlparse(source_url).path.rstrip("/").split("/")[-1].replace(".json", "")
     logger.info(f"🔄 Converting (S2 Optimized): {item_id}")
     logger.info(f" Collection: {collection}")

     # Resolve source: STAC item or direct Zarr URL
    -zarr_url = get_zarr_url(source_url) if "/items/" in source_url else source_url
    +zarr_url = (
    +    get_zarr_url(source_url)
    +    if ("/items/" in source_url or source_url.endswith(".json"))
    +    else source_url
    +)
The new item ID parsing logic that strips .json extensions and the updated source URL condition lack test coverage. Consider adding tests to verify that item_id extraction works correctly for URLs with and without .json extensions, and that the zarr_url resolution logic handles all combinations of "/items/" and ".json" correctly.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Update the item ID parsing to correctly handle JSON file extensions and improve source URL processing in the conversion script. Additionally, introduce a new script for product submission workflow.