diff --git a/README.md b/README.md
Lines changed: 57 additions & 24 deletions
diff --git a/docs/DatasetConfig.md b/docs/DatasetConfig.md
Lines changed: 39 additions & 30 deletions
diff --git a/docs/DatasetFormat.md b/docs/DatasetFormat.md
Lines changed: 3 additions & 0 deletions
diff --git a/docs/ModelConfig.md b/docs/ModelConfig.md
Lines changed: 49 additions & 10 deletions
@@ -79,10 +79,12 @@ directory `/path/to/dataset` and corresponding configuration file at
                 "bands": ["R", "G", "B"]
             }],
             "data_source": {
-                "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
-                "index_cache_dir": "cache/sentinel2/",
-                "sort_by": "cloud_cover",
-                "use_rtree_index": false
+                "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
+                "init_args": {
+                    "index_cache_dir": "cache/sentinel2/",
+                    "sort_by": "cloud_cover",
+                    "use_rtree_index": false
+                }
             }
         }
     }
@@ -189,8 +191,10 @@ automate this process. Update the dataset `config.json` with a new layer:
         }],
         "resampling_method": "nearest",
         "data_source": {
-            "name": "rslearn.data_sources.local_files.LocalFiles",
-            "src_dir": "file:///path/to/world_cover_tifs/"
+            "class_path": "rslearn.data_sources.local_files.LocalFiles",
+            "init_args": {
+                "src_dir": "file:///path/to/world_cover_tifs/"
+            }
         }
     }
 },
@@ -252,8 +256,7 @@ model:
 data:
   class_path: rslearn.train.data_module.RslearnDataModule
   init_args:
-    # Replace this with the dataset path.
-    path: /path/to/dataset/
+    path: ${DATASET_PATH}
     # This defines the layers that should be read for each window.
     # The key ("image" / "targets") is what the data will be called in the model,
     # while the layers option specifies which layers will be read.
@@ -351,7 +354,9 @@ trainer:
       ...
     - class_path: rslearn.train.prediction_writer.RslearnWriter
       init_args:
-        path: /path/to/dataset/
+        # We need to include this argument, but it will be overridden with the dataset
+        # path from data.init_args.path.
+        path: placeholder
         output_layer: output
 ```
 
@@ -504,24 +509,43 @@ This will produce PNGs in the vis directory. The visualizations are produced by
 SegmentationTask and overriding the visualize function.
 
 
-### Logging to Weights & Biases
+### Checkpoint and Logging Management
+
+Above, we needed to configure the checkpoint directory in the model config (the
+`dirpath` option under `lightning.pytorch.callbacks.ModelCheckpoint`), and explicitly
+specify the checkpoint path when applying the model. Additionally, metrics are logged
+to the local filesystem, where they are not well organized.
 
-We can log to W&B by setting the logger under trainer in the model configuration file:
+We can instead let rslearn automatically manage checkpoints, along with logging to
+Weights & Biases. To do so, we add project_name, run_name, and management_dir options
+to the model config. The project_name corresponds to the W&B project, and the run_name
+corresponds to the W&B run name. The management_dir is a directory to store project data;
+rslearn determines a per-project directory at `{management_dir}/{project_name}/{run_name}/`
+and uses it to store checkpoints.
 
 ```yaml
+model:
+  # ...
+data:
+  # ...
 trainer:
   # ...
-  logger:
-    class_path: lightning.pytorch.loggers.WandbLogger
-    init_args:
-      project: land_cover_model
-      name: version_00
+project_name: land_cover_model
+run_name: version_00
+# This sets the option via the MANAGEMENT_DIR environment variable.
+management_dir: ${MANAGEMENT_DIR}
 ```
 
-Now, runs with this model configuration should show on W&B. For `model fit` runs,
-the training and validation loss and accuracy metric will be logged. The accuracy
-metric is provided by SegmentationTask, and additional metrics can be enabled by
-passing the relevant init_args to the task, e.g. mean IoU and F1:
+Now, set the `MANAGEMENT_DIR` environment variable and run `model fit`:
+
+```
+export MANAGEMENT_DIR=./project_data
+rslearn model fit --config land_cover_model.yaml
+```
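As a rough illustration of the directory layout described above, the following sketch computes where checkpoints would be stored for this example. The `project_dir` helper is hypothetical and only mirrors the `{management_dir}/{project_name}/{run_name}/` pattern; it is not part of the rslearn API.

```python
import os
from pathlib import Path


def project_dir(management_dir: str, project_name: str, run_name: str) -> Path:
    """Return {management_dir}/{project_name}/{run_name} as a Path.

    Hypothetical helper for illustration only; not an rslearn function.
    """
    return Path(management_dir) / project_name / run_name


# Fall back to the value exported above if MANAGEMENT_DIR is not set.
management_dir = os.environ.get("MANAGEMENT_DIR", "./project_data")
print(project_dir(management_dir, "land_cover_model", "version_00"))
```

With the values used in this section, the checkpoints would be expected under `project_data/land_cover_model/version_00/`.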
+
+The training and validation loss and accuracy metric should now be logged to W&B. The
+accuracy metric is provided by SegmentationTask, and additional metrics can be enabled
+by passing the relevant init_args to the task, e.g. mean IoU and F1:
 
 ```yaml
       class_path: rslearn.train.tasks.segmentation.SegmentationTask
@@ -532,6 +556,13 @@ passing the relevant init_args to the task, e.g. mean IoU and F1:
         enable_f1_metric: true
 ```
 
+When calling `model test` and `model predict` with management_dir set, rslearn will
+automatically load the best checkpoint from the project directory, or raise an error if
+no checkpoint exists. This behavior can be overridden with the
+`--load_checkpoint_mode` and `--load_checkpoint_required` options (see `--help` for
+details). Logging is enabled during fit but not during test/predict; this can also
+be overridden, using `--log_mode`.
+
 
 ### Inputting Multiple Sentinel-2 Images
 
@@ -554,10 +585,12 @@ query_config section. This can replace the sentinel2 layer:
             "bands": ["R", "G", "B"]
         }],
         "data_source": {
-            "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
-            "index_cache_dir": "cache/sentinel2/",
-            "sort_by": "cloud_cover",
-            "use_rtree_index": false,
+            "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
+            "init_args": {
+              "index_cache_dir": "cache/sentinel2/",
+              "sort_by": "cloud_cover",
+              "use_rtree_index": false
+            },
             "query_config": {
                 "max_matches": 3
             }
 
@@ -83,10 +83,12 @@ duration of the layers is controlled by the duration of the window's time range.
         "bands": ["R", "G", "B"]
       }],
       "data_source": {
-        "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
-        "index_cache_dir": "cache/sentinel2/",
-        "sort_by": "cloud_cover",
-        "use_rtree_index": false
+        "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
+        "init_args": {
+          "index_cache_dir": "cache/sentinel2/",
+          "sort_by": "cloud_cover",
+          "use_rtree_index": false
+        }
       },
       "alias": "sentinel2"
     },
@@ -97,10 +99,12 @@ duration of the layers is controlled by the duration of the window's time range.
         "bands": ["R", "G", "B"]
       }],
       "data_source": {
-        "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
-        "index_cache_dir": "cache/sentinel2/",
-        "sort_by": "cloud_cover",
-        "use_rtree_index": false,
+        "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
+        "init_args": {
+          "index_cache_dir": "cache/sentinel2/",
+          "sort_by": "cloud_cover",
+          "use_rtree_index": false
+        },
         // The time offset is documented later.
         "time_offset": "60d"
       },
@@ -297,7 +301,7 @@ The data source specification looks like this:
 ```jsonc
 {
   // The class path of the data source.
-  "name": "rslearn.data_sources.gcp_public_data.Sentinel2",
+  "class_path": "rslearn.data_sources.gcp_public_data.Sentinel2",
   // The query configuration specifies how items should be matched to windows. It is
   // optional, and the values below are defaults.
   "query_config": {
@@ -314,9 +318,12 @@ The data source specification looks like this:
   "duration": null,
   // The ingest flag is optional, and defaults to true.
   "ingest": true,
-  // Data sources may expose additional configuration options. These would also be
-  // configured in this section.
-  // ...
+  // Data sources may expose additional configuration options, passed via init_args.
+  // class_path and init_args are handled by jsonargparse to instantiate the data
+  // source class.
+  "init_args": {
+    // ...
+  }
 }
 ```
 
@@ -886,29 +893,31 @@ attribute is "IW".
       }
     ],
     "data_source": {
-      "collection_name": "COPERNICUS/S1_GRD",
-      "dtype": "float32",
-      "filters": [
-        [
-          "transmitterReceiverPolarisation",
+      "class_path": "rslearn.data_sources.google_earth_engine.GEE",
+      "init_args": {
+        "collection_name": "COPERNICUS/S1_GRD",
+        "dtype": "float32",
+        "filters": [
+          [
+            "transmitterReceiverPolarisation",
+            [
+              "VV",
+              "VH"
+            ]
+          ],
           [
-            "VV",
-            "VH"
+            "instrumentMode",
+            "IW"
           ]
         ],
-        [
-          "instrumentMode",
-          "IW"
-        ]
-      ],
-      "gcs_bucket_name": "YOUR_BUCKET_NAME",
-      "index_fname": "cache/sentinel1_index",
-      "name": "rslearn.data_sources.google_earth_engine.GEE",
+        "gcs_bucket_name": "YOUR_BUCKET_NAME",
+        "index_fname": "cache/sentinel1_index",
+        "service_account_credentials": "/etc/credentials/gee_credentials.json",
+        "service_account_name": "YOUR_SERVICE_ACCOUNT_NAME"
+      },
       "query_config": {
         "max_matches": 1
-      },
-      "service_account_credentials": "/etc/credentials/gee_credentials.json",
-      "service_account_name": "YOUR_SERVICE_ACCOUNT_NAME"
+      }
     },
     "type": "raster"
   }
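The hunks above all apply the same transformation to `data_source`: `name` becomes `class_path`, and data-source-specific options move under `init_args`. A minimal migration sketch for existing dataset configs might look like the following. The `migrate_data_source` helper and the `TOP_LEVEL_KEYS` set are assumptions inferred from the examples in this diff (where `query_config`, `time_offset`, `duration`, and `ingest` stay at the top level); the set may be incomplete, and this is not an rslearn utility.

```python
from typing import Any

# Keys that remain at the top level of "data_source" in the examples in this diff.
# Assumption: inferred from the diff, so this set may be incomplete.
TOP_LEVEL_KEYS = {"query_config", "time_offset", "duration", "ingest"}


def migrate_data_source(old: dict[str, Any]) -> dict[str, Any]:
    """Convert {"name": ..., <options>} into {"class_path": ..., "init_args": {...}}."""
    if "class_path" in old:
        return old  # Already in the new format; leave untouched.
    new: dict[str, Any] = {"class_path": old["name"], "init_args": {}}
    for key, value in old.items():
        if key == "name":
            continue
        if key in TOP_LEVEL_KEYS:
            new[key] = value  # Generic options stay next to class_path.
        else:
            new["init_args"][key] = value  # Class-specific options move down.
    return new


old_config = {
    "name": "rslearn.data_sources.local_files.LocalFiles",
    "src_dir": "file:///path/to/world_cover_tifs/",
    "query_config": {"max_matches": 1},
}
new_config = migrate_data_source(old_config)
```

Here `new_config` keeps `query_config` at the top level and places `src_dir` under `init_args`, matching the shape of the updated examples.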
 
@@ -131,6 +131,9 @@ be merged/mosaicked together to form one raster or vector file for the window. I
 are multiple sub-lists, it typically corresponds to multi-temporal data, and each one
 will result in a different raster or vector file after the data is materialized.
 
+Materialization will use the first item group in `item_groups` to populate
+`layers/LAYER_NAME`, the second to populate `layers/LAYER_NAME.1`, and so on.
+
 For example, consider this query configuration for a data source
 (see [DatasetConfig](DatasetConfig.md) for details):
 
 
@@ -30,6 +30,10 @@ data:
     # other data related options
 trainer:
   # Lightning trainer options and callbacks.
+# Model management options.
+run_name: # ...
+project_name: # ...
+management_dir: ${MANAGEMENT_DIR}
 ```
 
 The YAML is parsed by jsonargparse, so each section directly configures a Python class
@@ -693,16 +697,51 @@ trainer:
         mode: max
         # We also keep the latest checkpoint.
         save_last: true
-  # The logger can be set to log to something other than the local filesystem, like
-  # W&B.
-  logger:
-    class_path: lightning.pytorch.loggers.WandbLogger
-    init_args:
-      # This is the W&B project name, and run name.
-      # You could set entity here as well, otherwise it will use the default based on
-      # the API key being used.
-      project: land_cover_model
-      name: version_00
+```
+
+## Model Management Options
+
+rslearn provides functionality to automatically manage checkpoints and logging. Without
+it, when running `model test` and `model predict`, the checkpoint needs to be
+explicitly specified using `--ckpt_path`.
+
+If enabled, model management will:
+1. Adjust the `dirpath` of any `ModelCheckpoint` callbacks to save checkpoints in
+   a project directory at `{management_dir}/{project_name}/{run_name}/`.
+2. If training is restarted, resume from the last checkpoint.
+3. During test/predict, automatically load the best checkpoint.
+4. Enable W&B logging and save the W&B run ID to the same project directory (so it can
+   be reused when resuming training).
+5. Save the model config with the W&B run.
+
+Common options are summarized below:
+
+```yaml
+# The management directory. Setting this (default null) enables model management. We
+# recommend setting it to ${MANAGEMENT_DIR} so that it can easily be changed in
+# different environments.
+management_dir: ${MANAGEMENT_DIR}
+# The project name; corresponds to the W&B project.
+project_name: my_project
+# The run name (a name for this experiment); corresponds to the W&B run.
+run_name: my_first_experiment
+# Optional description that will be added to the W&B run.
+run_description: this is my first experiment
+# Which checkpoint to load, if any (default 'auto').
+# 'none' never loads any checkpoint.
+# 'last' loads the most recent checkpoint.
+# 'best' loads the best checkpoint.
+# 'auto' will use 'last' during fit and 'best' during val/test/predict.
+load_checkpoint_mode: auto
+# Whether to fail if the expected checkpoint based on load_checkpoint_mode does not exist (default 'auto').
+# 'yes' will fail while 'no' won't.
+# 'auto' will use 'no' during fit and 'yes' during val/test/predict.
+load_checkpoint_required: auto
+# Whether to log to W&B (default 'auto').
+# 'yes' will enable logging.
+# 'no' will disable logging.
+# 'auto' will use 'yes' during fit and 'no' during val/test/predict.
+log_mode: auto
 ```
 
 ## Using Custom Classes