113 changes: 113 additions & 0 deletions docs/DatasetConfig.md
@@ -831,6 +831,119 @@ There is no data-source-specific configuration.
The vector features should have `EC_hcat_c` and `EC_hcat_n` properties indicating the
HCAT category code and name respectively.

### rslearn.data_sources.earthdaily.Sentinel2

Sentinel-2 L2A data on the EarthDaily platform (collection: `sentinel-2-c1-l2a`).

This data source requires the optional `earthdaily[platform]` dependency and EarthDaily
credentials in the environment (see [EarthDaily documentation](https://earthdaily.github.io/earthdaily-python-client/main/) for supported methods).

By default, this data source applies per-asset scale/offset values from STAC
`raster:bands` metadata (`harmonize: true`). Set `harmonize: false` to keep raw values.
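
As a rough illustration of what harmonization means here, the sketch below applies a per-asset scale and offset to raw pixel values as `raw * scale + offset`. The `raster_band` dictionary mimics one entry of STAC `raster:bands` metadata. The function name `apply_scale_offset` and the sample scale/offset values are hypothetical and are not taken from rslearn or from EarthDaily metadata.

```python
def apply_scale_offset(raw_values, raster_band):
    """Apply a STAC raster:bands scale/offset to a flat list of raw pixel values.

    Missing keys fall back to the identity transform (scale 1.0, offset 0.0).
    """
    scale = raster_band.get("scale", 1.0)
    offset = raster_band.get("offset", 0.0)
    return [value * scale + offset for value in raw_values]


# Hypothetical metadata entry; real values come from the STAC item.
raster_band = {"scale": 0.0001, "offset": -0.1}
raw = [0, 1000, 5000, 10000]
harmonized = apply_scale_offset(raw, raster_band)
```

Because the result is generally fractional, a floating-point `dtype` such as `float32` is the natural choice for the band set when `harmonize` is left at its default.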

Only the keys documented below are supported in `init_args`; unknown keys raise an error.
The additional data source configuration looks like this:

```jsonc
{
  // Whether to apply STAC `raster:bands` scale/offset (default true). Set to false to
  // keep raw values.
  "harmonize": true,
  // Optional: EarthDaily Sentinel-2 asset keys to use (default null). If null and the
  // layer config is available, assets are inferred from the layer's band sets.
  // Example: ["red", "green", "blue", "nir", "swir16", "swir22"]
  "assets": null,
  // Optional: maximum cloud cover (%) to filter items at search time.
  // If set, it overrides any `eo:cloud_cover` filter in `query`.
  "cloud_cover_max": null,
  // Optional: default cloud cover threshold (%) to apply if cloud_cover_max is not set.
  "cloud_cover_threshold": null,
  // Maximum number of STAC items to fetch per window before rslearn grouping/matching.
  "search_max_items": 500,
  // Optional ordering of items before grouping (useful with SpaceMode.COMPOSITE +
  // CompositingMethod.FIRST_VALID): "cloud_cover" (default), "datetime", or null.
  "sort_items_by": "cloud_cover",
  // Whether to apply an SCL-based cloud mask during ingest (default false). If true,
  // cloudy pixels are set to mask_nodata_value before writing to the dataset tile store.
  // Note: this only applies when ingest is enabled (the default).
  "apply_cloud_mask": false,
  // Which asset key to use as the mask band (default "scl").
  "mask_band": "scl",
  // SCL values treated as cloudy (default [3, 8, 9, 10]); set to override.
  "exclude_scl_values": null,
  // Nodata value to write into cloudy pixels.
  "mask_nodata_value": 0,
  // Optional: STAC API `query` filter passed to searches.
  // Example: {"s2:product_type": {"eq": "S2MSI2A"}}
  // Note: if cloud_cover_max/cloud_cover_threshold is set, the effective query also
  // includes an `eo:cloud_cover` upper bound.
  "query": null,
  // Optional: STAC item property to sort by before grouping/matching (default null).
  // If set, it takes precedence over sort_items_by.
  "sort_by": null,
  // Whether to sort ascending when sort_by is set (default true).
  "sort_ascending": true,
  // Optional cache directory for cached item metadata.
  "cache_dir": null,
  // Timeout for HTTP asset downloads.
  "timeout": "10s",
  // Retry settings for EarthDaily API client requests (search/get item).
  "max_retries": 3,
  "retry_backoff_factor": 5.0,
  // EarthDaily service name (only "platform" is supported).
  "service_name": "platform"
}
```
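
The interaction between `query`, `cloud_cover_max`, and `cloud_cover_threshold` described in the comments above can be sketched roughly as follows. This is not the rslearn implementation: the function name `build_effective_query` is hypothetical, the use of the `lte` operator for the upper bound is an assumption, and so is the choice that `cloud_cover_threshold` only fills in a bound when `query` has no `eo:cloud_cover` filter of its own.

```python
CLOUD_COVER_KEY = "eo:cloud_cover"


def build_effective_query(query=None, cloud_cover_max=None, cloud_cover_threshold=None):
    """Return a copy of `query` with an eo:cloud_cover upper bound merged in."""
    effective = dict(query or {})
    if cloud_cover_max is not None:
        # Documented behavior: cloud_cover_max overrides any existing filter.
        effective[CLOUD_COVER_KEY] = {"lte": cloud_cover_max}
    elif cloud_cover_threshold is not None:
        # Assumption: the threshold is only a default, so an explicit
        # eo:cloud_cover filter in `query` is left untouched.
        effective.setdefault(CLOUD_COVER_KEY, {"lte": cloud_cover_threshold})
    return effective


user_query = {
    "s2:product_type": {"eq": "S2MSI2A"},
    "eo:cloud_cover": {"lt": 80},
}
effective = build_effective_query(user_query, cloud_cover_max=15.0)
```

The input dictionary is copied rather than mutated, so the same `query` value can be reused across searches.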

Example:

```jsonc
{
  "layers": {
    "sentinel2": {
      "type": "raster",
      "band_sets": [{
        // When harmonize=true (default), reflectance bands are typically scaled to float.
        // Use float32 (recommended) or set harmonize=false to keep raw integer values.
        "dtype": "float32",
        "bands": ["B02", "B03", "B04", "B08"]
      }],
      "compositing_method": "FIRST_VALID",
      "data_source": {
        "class_path": "rslearn.data_sources.earthdaily.Sentinel2",
        "query_config": {
          "space_mode": "COMPOSITE",
          "time_mode": "WITHIN",
          "max_matches": 1
        },
        "init_args": {
          "cloud_cover_max": 15.0,
          "sort_items_by": "cloud_cover",
          "apply_cloud_mask": true,
          "mask_nodata_value": 0
        }
      }
    }
  }
}
```
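
The example above enables `apply_cloud_mask`. A minimal sketch of what an SCL-based mask does is shown below: every pixel whose SCL class is in the excluded set is replaced with the nodata value. The function name `mask_cloudy_pixels` is hypothetical, and the sketch assumes the band and the SCL raster have already been resampled to the same grid; it uses plain nested lists rather than whatever array type rslearn uses internally.

```python
# Default SCL classes treated as cloudy, as documented for exclude_scl_values.
DEFAULT_EXCLUDE_SCL_VALUES = (3, 8, 9, 10)


def mask_cloudy_pixels(band, scl, exclude_scl_values=None, mask_nodata_value=0):
    """Return a copy of `band` with cloudy pixels set to `mask_nodata_value`.

    `band` and `scl` are 2D lists of equal shape (an assumption of this sketch).
    """
    if exclude_scl_values is None:
        exclude_scl_values = DEFAULT_EXCLUDE_SCL_VALUES
    excluded = set(exclude_scl_values)
    return [
        [
            mask_nodata_value if scl_value in excluded else pixel
            for pixel, scl_value in zip(band_row, scl_row)
        ]
        for band_row, scl_row in zip(band, scl)
    ]


band = [[1200, 1300], [1400, 1500]]
scl = [[4, 9], [3, 5]]
masked = mask_cloudy_pixels(band, scl)
# masked == [[1200, 0], [0, 1500]]
```

With `FIRST_VALID` compositing, pixels written as nodata in one item can then be filled from the next item in the sorted order.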

Available bands:
- B01
- B02
- B03
- B04
- B05
- B06
- B07
- B08
- B09
- B11
- B12
- B8A
- R, G, B (from the `visual` asset)
- scl, aot, wvp

### rslearn.data_sources.gcp_public_data.Sentinel2

This data source is for Sentinel-2 data on Google Cloud Storage.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -34,7 +34,7 @@ dependencies = [
extra = [
"accelerate>=1.10",
"cdsapi>=0.7.6",
-"earthdaily[platform]>=1.0.7",
+"earthdaily[platform]>=1.8.0",
"earthengine-api>=1.6.3",
"einops>=0.8",
# https://github.com/fsspec/universal_pathlib?tab=readme-ov-file#adding-universal_pathlib-to-your-project