operator-tools/README_collections.md
Lines changed: 230 additions & 2 deletions
@@ -325,7 +325,7 @@ uv run operator-tools/manage_collections.py delete sentinel-2-l2a-staging --clea

 #### 5. `info` - Show Collection Information

-Display detailed information about a collection, including item count. **NEW**: Optionally include comprehensive S3 storage statistics.
+Display detailed information about a collection, including item count. **NEW**: Optionally include comprehensive S3 storage statistics and storage tier statistics from STAC metadata.

 ```bash
 # Basic collection info
@@ -334,6 +334,12 @@ uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging
 # Include S3 storage statistics (samples first 5 items)
 uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-stats

+# Include storage tier statistics from STAC metadata (all items)
+uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-stac-info
+
+# Combine both statistics
+uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-stats --s3-stac-info
+
 # With debug output (shows detailed URL extraction)
 uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-stats --debug
@@ -344,6 +350,7 @@ uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-sta

 **Options:**
 - `--s3-stats`: **[NEW]** Include S3 storage statistics (object count, total size)
+- `--s3-stac-info`: **[NEW]** Query STAC API and compute storage tier statistics for all assets of all items
 - `--debug`: **[NEW]** Show detailed debug information about S3 URL extraction
 - `--s3-endpoint`: S3 endpoint URL (optional, uses `AWS_ENDPOINT_URL` env var if not specified)
@@ -358,6 +365,11 @@ uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-sta
   - Object count and total size for sampled items
   - Estimated total storage for all items in collection
   - Works with **any S3 asset structure** (individual files, Zarr stores, directories)
+- **[NEW]** Storage tier statistics (when `--s3-stac-info` is used):
+  - Items/assets with tier info vs without tier info
+  - Distribution of storage tiers (STANDARD, STANDARD_IA, EXPRESS_ONEZONE, MIXED)
+  - Detailed breakdowns for mixed storage tiers
+  - Reads from STAC metadata (no S3 queries required)

 **S3 Statistics Behavior:**
 - Samples the first 5 items to avoid long wait times on large collections
@@ -367,6 +379,16 @@ uv run operator-tools/manage_collections.py info sentinel-2-l2a-staging --s3-sta
 - Provides estimated total based on sample average
+#### 6. `sync-storage-tiers` - Sync Storage Tier Metadata for Collection
+
+Sync storage tier metadata for all items in a collection with S3. This command queries S3 for current storage classes at the **object level** and updates STAC item metadata to match. It compares object-level distributions (not just asset-level tiers) and shows a detailed summary of mismatches found and corrections made.
+
+```bash
+# Dry run (preview changes)
+uv run operator-tools/manage_collections.py sync-storage-tiers sentinel-2-l2a-staging \
+- Keeping STAC metadata in sync with actual S3 storage classes
+- Finding and fixing storage tier mismatches across collections
+- Adding storage tier metadata to legacy items
+- Auditing storage tier accuracy before reporting
+
+**Best practices:**
+- Always use `--dry-run` first to preview changes
+- Review the problems section to understand mismatches
+- Use `--add-missing` for legacy items that don't have `alternate.s3`
+- Test on a single item with `manage_item.py sync-storage-tiers` before running on entire collection
+
 ### Global Options

 #### `--api-url`
@@ -433,7 +613,7 @@ The `manage_item.py` tool provides commands for working with individual STAC ite

 ### `info` - Show Item Information

-Display detailed information about a specific STAC item, including optional S3 statistics.
+Display detailed information about a specific STAC item, including optional S3 statistics and storage tier statistics.

 ```bash
 # Basic item info
@@ -442,6 +622,12 @@ uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID
 # Include S3 storage statistics
 uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID --s3-stats

+# Include storage tier statistics from STAC metadata
+uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID --s3-stac-info
+
+# Combine both statistics
+uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID --s3-stats --s3-stac-info
+
 # With debug output (shows detailed URL extraction)
 uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID --s3-stats --debug
 ```
@@ -454,6 +640,11 @@ uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID --s3-st
   - S3 URLs extracted from assets
   - Object count
   - Total size in GB
+- **With `--s3-stac-info`:**
+  - Total assets and tier coverage statistics
+  - Storage tier distribution by asset count
+  - Distribution breakdowns for mixed storage tiers
+  - Reads from STAC metadata (no S3 queries required)
 - **With `--debug`:**
   - Exact S3 URLs found in each asset
   - Which fields contain S3 URLs (`alternate.s3.href` vs main `href`)
@@ -463,6 +654,7 @@ uv run operator-tools/manage_item.py info sentinel-2-l2a-staging ITEM_ID --s3-st
 - Debugging why an item's S3 data isn't being found
 - Verifying S3 URLs are correctly formatted
 - Understanding how much S3 storage an item uses
+- Checking storage tier distribution for an item
 - Investigating issues before batch operations

 ### `delete` - Delete a Single Item
@@ -512,6 +704,42 @@ DELETION SUMMARY:
 - Verifying S3 cleanup works before scaling to collection
 - Debugging deletion issues

+### `sync-storage-tiers` - Sync Storage Tier Metadata for a Single Item
+
+Sync storage tier metadata for a single STAC item with S3. This command queries S3 for current storage classes at the **object level** and updates STAC item metadata to match. It compares object-level distributions (not just asset-level tiers) and shows detailed mismatches.
+
+```bash
+# Dry run (preview changes)
+uv run operator-tools/manage_item.py sync-storage-tiers sentinel-2-l2a-staging ITEM_ID \