Commit 1e6c661

DanielNobbe, pre-commit-ci[bot], and ericspod authored
8525 improve documentation on the datalist format (#8539)
Fixes #8525.

### Description

I found the description of the Medical Segmentation Decathlon datalist format (short: decathlon datalist) lacking, although some parts of the framework depend on it, specifically the Auto3DSeg AutoRunner. I've added a comprehensive description of the format under `monai.data.decathlon_datalist.load_decathlon_datalist`, and some small notes elsewhere. There's a corresponding PR for the tutorials [here](Project-MONAI/tutorials#2019).

Please let me know if anything is incorrect, the codebase is quite big and I haven't been working with it for very long.

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [x] In-line docstrings updated.
- [x] Documentation updated, tested `make html` command in the `docs/` folder.

---------

Signed-off-by: Daniël Nobbe <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Kerfoot <[email protected]>
1 parent cf86981 commit 1e6c661

2 files changed, +50 -6 lines changed

monai/apps/auto3dseg/auto_runner.py

Lines changed: 9 additions & 0 deletions
@@ -194,6 +194,15 @@ class AutoRunner:
 ├── segresnet2d_0 # network scripts/configs/checkpoints and pickle object of the algo
 └── swinunetr_0 # network scripts/configs/checkpoints and pickle object of the algo
 
+
+The input config requires at least the following keys:
+- ``modality``: the modality of the data, e.g. "ct", "mri", etc.
+- ``datalist``: the path to the datalist file in JSON format.
+- ``dataroot``: the root directory of the data files.
+
+For the datalist file format, see the description under :py:func:`monai.data.load_decathlon_datalist`.
+Note that the AutoRunner will use the "validation" key in the datalist file if it exists, otherwise
+it will do cross-validation, by default with five folds (this is hardcoded).
 """
 
 analyze_params: dict | None
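
A minimal sketch of the input config described in the added docstring lines, assuming only the three key names and the "validation" rule stated there. The helpers `check_autorunner_input` and `has_validation_list`, and all file paths, are hypothetical and not part of MONAI; the snippet uses only the standard library so it runs without MONAI installed.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Keys the docstring lists as required in the AutoRunner input config.
REQUIRED_KEYS = ("modality", "datalist", "dataroot")


def check_autorunner_input(cfg: dict) -> list[str]:
    """Return the required keys missing from cfg (hypothetical helper, not a MONAI API)."""
    return [key for key in REQUIRED_KEYS if key not in cfg]


def has_validation_list(datalist_path: str) -> bool:
    """Report whether the datalist JSON holds a non-empty "validation" list."""
    with open(datalist_path) as f:
        return bool(json.load(f).get("validation"))


# Illustrative datalist with training items only (paths are made up).
workdir = Path(tempfile.mkdtemp())
datalist_file = workdir / "datalist.json"
datalist_file.write_text(
    json.dumps({"training": [{"image": "img_1.nii.gz", "label": "lbl_1.nii.gz"}]})
)

input_cfg = {
    "modality": "ct",
    "datalist": str(datalist_file),
    "dataroot": str(workdir),
}

missing = check_autorunner_input(input_cfg)
# Per the docstring, no "validation" key means cross-validation would be used.
needs_cross_validation = not has_validation_list(input_cfg["datalist"])
print(missing, needs_cross_validation)
```

A dict shaped like `input_cfg` is what the docstring calls the input config; how it is handed to `AutoRunner` is not shown in this diff, so check the class signature before relying on it.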

monai/data/decathlon_datalist.py

Lines changed: 41 additions & 6 deletions
@@ -92,8 +92,42 @@ def load_decathlon_datalist(
 ) -> list[dict]:
 """Load image/label paths of decathlon challenge from JSON file
 
-Json file is similar to what you get from http://medicaldecathlon.com/
-Those dataset.json files
+JSON file should follow the format of the Medical Segmentation Decathlon
+datalist.json files, see http://medicaldecathlon.com.
+The files are structured as follows:
+
+.. code-block:: python
+
+    {
+        "metadata_key_0": "metadata_value_0",
+        "metadata_key_1": "metadata_value_1",
+        ...,
+        "training": [
+            {"image": "path/to/image_1.nii.gz", "label": "path/to/label_1.nii.gz"},
+            {"image": "path/to/image_2.nii.gz", "label": "path/to/label_2.nii.gz"},
+            ...
+        ],
+        "test": [
+            "path/to/image_3.nii.gz",
+            "path/to/image_4.nii.gz",
+            ...
+        ]
+    }
+
+
+The metadata keys are optional for loading the datalist, but include:
+- some string items: ``name``, ``description``, ``reference``, ``licence``, ``release``, ``tensorImageSize``
+- two dict items: ``modality`` (keyed by channel index), and ``labels`` (keyed by label index)
+- and two integer items: ``numTraining`` and ``numTest``, with the number of items.
+
+The ``training`` key contains a list of dictionaries, each of which has at least
+the ``image`` and ``label`` keys.
+The image and label are loaded by :py:func:`monai.transforms.LoadImaged`, so both can be either
+a single file path or a list of file paths, in which case they are loaded as multi-channel images.
+Each item can also include a ``fold`` key for cross-validation purposes.
+The "test" key contains a list of image paths, without labels, MONAI also supports a "validation" list
+with the same format as the "training" list.
+
 
 Args:
 data_list_file_path: the path to the json file of datalist.
@@ -107,11 +141,11 @@ def load_decathlon_datalist(
 
 Returns a list of data items, each of which is a dict keyed by element names, for example:
 
-.. code-block::
+.. code-block:: python
 
 [
-{'image': '/workspace/data/chest_19.nii.gz', 'label': 0},
-{'image': '/workspace/data/chest_31.nii.gz', 'label': 1}
+{'image': '/workspace/data/chest_19.nii.gz', 'label': '/workspace/labels/chest_19.nii.gz'},
+{'image': '/workspace/data/chest_31.nii.gz', 'label': '/workspace/labels/chest_31.nii.gz'},
 ]
 
 """
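
The datalist structure documented above can be sketched with the standard library alone. This is a simplified stand-in, not MONAI's implementation: `load_items` and `split_by_fold` are hypothetical names, the file paths are made up, and the path handling here (joining each entry with the JSON file's directory) is an assumption that may differ from what `load_decathlon_datalist` does.

```python
from __future__ import annotations

import json
import os
import tempfile

# Illustrative datalist following the documented structure; all paths are made up.
datalist = {
    "name": "ExampleTask",
    "modality": {"0": "CT"},
    "labels": {"0": "background", "1": "organ"},
    "numTraining": 2,
    "numTest": 1,
    "training": [
        {"image": "imagesTr/case_1.nii.gz", "label": "labelsTr/case_1.nii.gz", "fold": 0},
        {"image": "imagesTr/case_2.nii.gz", "label": "labelsTr/case_2.nii.gz", "fold": 1},
    ],
    "test": ["imagesTs/case_3.nii.gz"],
}


def load_items(path: str, key: str = "training") -> list[dict]:
    """Read one list from the datalist and prefix paths with the JSON file's directory."""
    with open(path) as f:
        data = json.load(f)
    base_dir = os.path.dirname(path)
    items = []
    for entry in data[key]:
        # "test" entries are plain path strings; wrap them so every item is a dict.
        item = {"image": entry} if isinstance(entry, str) else dict(entry)
        for name in ("image", "label"):
            if name in item:
                value = item[name]
                # A list of paths stands for a multi-channel image.
                if isinstance(value, list):
                    item[name] = [os.path.join(base_dir, p) for p in value]
                else:
                    item[name] = os.path.join(base_dir, value)
        items.append(item)
    return items


def split_by_fold(items: list[dict], val_fold: int) -> tuple[list[dict], list[dict]]:
    """Use items whose "fold" equals val_fold for validation, the rest for training."""
    train = [i for i in items if i.get("fold") != val_fold]
    val = [i for i in items if i.get("fold") == val_fold]
    return train, val


tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "datalist.json")
with open(json_path, "w") as f:
    json.dump(datalist, f)

training = load_items(json_path, "training")
test = load_items(json_path, "test")
train_items, val_items = split_by_fold(training, val_fold=0)
```

The metadata keys (`name`, `modality`, and so on) are ignored by `load_items`, which matches the docstring's statement that they are optional for loading the datalist.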
@@ -134,7 +168,8 @@ def load_decathlon_datalist(
 
 
 def load_decathlon_properties(data_property_file_path: PathLike, property_keys: Sequence[str] | str) -> dict:
-"""Load the properties from the JSON file contains data property with specified `property_keys`.
+"""Extract the properties with the specified keys from the Decathlon JSON file.
+See under `load_decathlon_datalist` for the expected keys in the Decathlon challenge.
 
 Args:
 data_property_file_path: the path to the JSON file of data properties.
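
As a rough illustration of what extracting properties by key means, the sketch below reads selected top-level keys from a Decathlon-style JSON file. `extract_properties` is a hypothetical stand-in written for this example, not the body of `load_decathlon_properties`; raising `KeyError` for a missing key is an assumption, and the metadata values are made up.

```python
from __future__ import annotations

import json
import os
import tempfile
from typing import Sequence


def extract_properties(path: str, property_keys: Sequence[str] | str) -> dict:
    """Return the requested top-level keys of the JSON file (hypothetical helper)."""
    with open(path) as f:
        data = json.load(f)
    # Accept a single key as well as a sequence of keys.
    keys = [property_keys] if isinstance(property_keys, str) else list(property_keys)
    missing = [k for k in keys if k not in data]
    if missing:
        raise KeyError(f"keys {missing} not found in {path}")
    return {k: data[k] for k in keys}


# Illustrative metadata using key names listed in the datalist docstring.
metadata = {
    "name": "ExampleTask",
    "modality": {"0": "CT"},
    "labels": {"0": "background", "1": "organ"},
    "numTraining": 2,
    "numTest": 1,
}
properties_path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(properties_path, "w") as f:
    json.dump(metadata, f)

props = extract_properties(properties_path, ["modality", "labels"])
single = extract_properties(properties_path, "numTraining")
```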
