Commit 782dedc

Suraj Pai authored
Merge pull request #34 from project-lighter/docs

Update docs

2 parents 83327ac + 504642a commit 782dedc

File tree

21 files changed: +19912 −67 lines changed

docs/index.md

Lines changed: 40 additions & 2 deletions
@@ -14,5 +14,43 @@ This repository contains the code and resources for CT-FM, a 3D image-based pre-
  * **Task-Agnostic Training:** Enabling transferability across various radiological tasks.
  * **Open Source:** Model weights, data, and code are shared for collaborative development.
- ## Training
- CT-FM largely relies on the [lighter](https://github.com/project-lighter/lighter) package. Detailed descriptions on running the pretraining and downstream task specific finetuning can be found in the [replication guide](./replication-guide/data.md)
+ <br/>
+ <br/>
+
+ ## Quick Links
+ <div class="grid cards" markdown>
+
+ - __Downloading Data__
+
+     ---
+
+     All datasets used in the study are public
+
+     [:octicons-arrow-right-24: Download data](./replication-guide/data.md)
+
+ - __Use CT-FM models__
+
+     ---
+
+     CT-FM feature extractors and trained downstream models are available on HF
+
+     [:octicons-arrow-right-24: Go to HF](https://huggingface.co/project-lighter)
+
+ - __Reproduce our pre-training framework__
+
+     ---
+
+     Implement our pre-training method on your own data
+
+     [:octicons-arrow-right-24: Pretraining instructions](./replication-guide/pretraining.md)
+
+ - __Build your projects using Lighter__
+
+     ---
+
+     Almost all CT-FM experiments use Lighter as the configuration system
+
+     [:octicons-arrow-right-24: Explore here](https://github.com/project-lighter/lighter)
+
+ </div>

docs/js/sh-annotation.js

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
// this script is used to remove extra leading space when annotating shell code blocks ending with `\`
// character. See https://github.com/squidfunk/mkdocs-material/issues/3846 for more info.
document$.subscribe(() => {
  const tags = document.querySelectorAll("code .se")
  tags.forEach(tag => {
    if (tag.innerText.startsWith("\\")) {
      tag.innerText = "\\"
    }
  })
})

docs/replication-guide/analysis.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
<div class="grid cards" markdown>

- **Whole Body Segmentation Analysis:**
  [:octicons-arrow-right-24: totalseg_eval.ipynb](https://github.com/project-lighter/CT-FM/tree/main/notebooks/totalseg_eval.ipynb)

- **Tumor Segmentation Analysis and Visualization:**
  [:octicons-arrow-right-24: tumor-seg-eval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/tumor-seg-eval)

- **Head CT Triage Classification:**
  [:octicons-arrow-right-24: head-ct-triage-eval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/head-ct-triage-eval)

- **Medical Image Retrieval:**
  [:octicons-arrow-right-24: retrieval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/retrieval)

- **Semantic Evaluation - Anatomical Clustering, Semantic Search, PCA visualization:**
  [:octicons-arrow-right-24: semantic-eval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/semantic-eval)

- **Robustness - Saliency and Stability:**
  [:octicons-arrow-right-24: robustness](https://github.com/project-lighter/CT-FM/tree/main/notebooks/robustness)

</div>
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Baselines :material-chart-box-outline:

The selection of baselines varies depending on the evaluation task. Below is a breakdown by task:

## Whole Body Segmentation

- **Architectural Baseline:** Randomly initialized model.
- **SuPREM**
- **Merlin**
- **VISTA3D, Auto3DSeg, nnUNet:** Results reported in previous studies.

## Tumor Segmentation

- **Auto3DSeg Pipeline**

## Head CT Triage

- **Architectural Baseline:** Randomly initialized model.
- **SuPREM**

All baselines, as well as our methods, are implemented using lighter. For detailed configuration scripts and execution instructions, please refer to the [Downstream Tasks](./downstream.md) section.

docs/replication-guide/data.md

Lines changed: 38 additions & 14 deletions
@@ -10,7 +10,7 @@ For our pre-training experiments, we utilize 148,394 CT scans from the Imaging D

Execute the provided SQL query on Google BigQuery to filter for CT scans that meet our quality constraints. The query performs necessary quality checks on each scan.

- - Query file: [query.sql](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/query.sql)
+ [Query file](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/query.sql){ .md-button }

Running this query returns a table with CT scan records that satisfy our criteria. We then convert these query results to a manifest file that can be used to download the data.
@@ -19,7 +19,7 @@ This has already been done so you can skip to the next step if you don't want to

After reviewing the query results, use the Jupyter Notebook to create a manifest file. This manifest lists every DICOM file that needs to be downloaded.

- - Manifest creation notebook: [prepare_download.ipynb](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/prepare_download.ipynb)
+ [Manifest creation notebook](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/prepare_download.ipynb){ .md-button }
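The query-results-to-manifest conversion described above can be sketched in a few lines of Python. This is an illustrative stand-in for the prepare_download.ipynb notebook, not a copy of it: the `series_aws_url` column name and the s5cmd `cp` manifest syntax are assumptions based on typical IDC download workflows.

```python
import csv

def write_s5cmd_manifest(query_csv: str, manifest_txt: str, out_dir: str) -> int:
    """Convert exported BigQuery CSV results into an s5cmd manifest of `cp` commands.

    Assumes each row has a `series_aws_url` column holding the series' S3 prefix
    (a hypothetical column name used here for illustration).
    """
    n = 0
    with open(query_csv, newline="") as src, open(manifest_txt, "w") as dst:
        for row in csv.DictReader(src):
            # one `cp` line per series; s5cmd fetches every file under the prefix
            dst.write(f"cp {row['series_aws_url']}* {out_dir}/\n")
            n += 1
    return n
```

Running `s5cmd run manifest.txt` would then download everything listed, assuming s5cmd is installed and the URLs are valid.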
### 3. Download the DICOM Files

@@ -42,23 +42,47 @@ This command downloads all the specified DICOM files into the designated directo

The downloaded data is in DICOM format. To prepare it for your experiments, follow these steps:

- **Sorting:** Organize the DICOM files using the tool "dicomsort". While the specific usage may depend on your environment, a common workflow involves running a command to categorize files by patient or study. For example, you might first list the files and then run:

    ```
    dicomsort [options...] sourceDir targetDir/<patterns>
    ```

    For more detailed instructions and options, please refer to the [dicomsort GitHub repository](https://github.com/pieper/dicomsort).

- **Conversion:** Convert the sorted DICOM files to NRRD format using Plastimatch. A typical command looks similar to:

    ```
    plastimatch convert --input <SORTED_DIR> --output <CONVERTED_DIR> --format nrrd
    ```

    For additional details and advanced options, consult the [Plastimatch documentation](http://plastimatch.org) or relevant online resources.

- **Packaging:** Finally, generate a `.pkl` file that lists the scans. This file serves as the required input for the pre-training experiments.

For a complete example of these final steps, refer again to the [prepare_download.ipynb](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/prepare_download.ipynb) notebook.

Following these instructions will replicate the data download and preprocessing pipeline used in our study, enabling you to work with the same CT scan dataset.
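The packaging step can be sketched as a short script. The actual manifest structure expected by the pre-training code is defined in prepare_download.ipynb; here we simply assume a flat pickled list of scan paths for illustration.

```python
import pickle
from pathlib import Path

def build_scan_index(converted_dir: str, out_pkl: str) -> list:
    """Collect converted NRRD scans under a directory and pickle the sorted path list."""
    scans = sorted(str(p) for p in Path(converted_dir).rglob("*.nrrd"))
    with open(out_pkl, "wb") as f:
        pickle.dump(scans, f)  # the .pkl file consumed by the pre-training pipeline
    return scans
```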
## Downstream Tasks Data

We use several publicly available datasets for our downstream tasks, including:

<div class="grid cards" markdown>

- **Whole Body Segmentation:**
  [:octicons-arrow-right-24: TotalSegmentator-v2 dataset](https://zenodo.org/records/8367088)

- **Tumor Segmentation:**
  [:octicons-arrow-right-24: MSD dataset](http://medicaldecathlon.com/dataaws/)

- **Head CT Triage:**
  [:octicons-arrow-right-24: SinoCT](https://stanfordaimi.azurewebsites.net/datasets?domain=HEAD%2FBRAIN%2FNECK)
  [:octicons-arrow-right-24: CQ500](https://academictorrents.com/details/47e9d8aab761e75fd0a81982fa62bddf3a173831)

- **Medical Image Retrieval:**
  [:octicons-arrow-right-24: 3D-MIR](http://medicaldecathlon.com/dataaws/)
  [:octicons-arrow-right-24: OrganMNIST-3D](https://zenodo.org/records/10519652)

- **Stability Testing:**
  [:octicons-arrow-right-24: RIDER](https://www.cancerimagingarchive.net/collection/rider-lung-ct/)

</div>
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Downstream Task Adaptation

Our pre-trained CT-FM model has been adapted to three fine-tuned downstream tasks as well as several additional zero-shot tasks. While most downstream experiments leverage the Lighter framework, tumor segmentation is handled using Auto3DSeg.

## Whole Body Segmentation

In line with the configuration-based approach detailed in [Pretraining](./pretraining.md), we provide YAML config files for downstream adaptation. To facilitate thorough comparisons, a suite of shell scripts with the relevant configuration components is available. These can be found in the [evaluation](https://github.com/project-lighter/CT-FM/tree/main/evaluation) directory under “scripts.”

[View All Scripts](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts){.md-button}

<br/>
For TotalSeg experiments, refer to the scripts in the totalseg folder:

<div class="grid cards" markdown>

- **Full Finetuning on TotalSegmentatorV2:**
  [:octicons-arrow-right-24: fulltune.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/fulltune.sh)

- **Finetuning on the Merlin Split:**
  [:octicons-arrow-right-24: merlin.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/merlin.sh)

- **Few-Shot Fine-Tuning:**
  [:octicons-arrow-right-24: fewshot.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/fewshot.sh)

- **Pre-Training Checkpoint Selection:**
  [:octicons-arrow-right-24: checkpoint_selection.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/checkpoint_selection.sh)

- **Pre-Training Ablations:**
  [:octicons-arrow-right-24: pretraining_evaluation.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/pretraining_evaluation.sh)

</div>
!!! tip "Enabling Prediction Mode"

    To switch from training to prediction mode:

    - Replace the `fit` command with the `predict` command.
    - Append the prediction override configuration file `./evaluation/overrides/totalseg_predict_overrides.yaml` to your config list.
    - Remove the `--trainer#callbacks#0#until_epoch=0` flag, since the new callback now handles prediction mode.

**Example Transformation:**

Original command:
```
lighter fit --config=./evaluation/totalseg.yaml,./evaluation/overrides/totalseg_vista.yaml,./evaluation/baselines/segresnetds_ctfm.yaml --trainer#callbacks#0#until_epoch=0 --vars#name="ct_fm" --vars#project="totalseg" --system#model#trunk#ckpt_path=$ct_fm_path --vars#wandb_group='vista_v2'
```

Modified prediction command:
```
lighter predict --config=./evaluation/totalseg.yaml,./evaluation/overrides/totalseg_vista.yaml,./evaluation/baselines/segresnetds_ctfm.yaml,./evaluation/overrides/totalseg_predict_overrides.yaml --vars#name="ct_fm" --vars#project="totalseg" --vars#wandb_group='vista_v2'
```

By default, the predict command uses the checkpoint location specified while running the fit pipeline. To use a different checkpoint during prediction, add:
```
--args#predict#ckpt_path=<path>
```
## Tumor Segmentation with Auto3DSeg

Tumor segmentation is performed using Auto3DSeg, a robust segmentation workflow provided by MONAI. This pipeline is designed to simplify segmentation tasks and can be explored further via the official tutorial below.

[MONAI Auto3DSeg Tutorial](https://github.com/Project-MONAI/tutorials/blob/main/auto3dseg/README.md){.md-button}

### Workflow Overview

Auto3DSeg operates by running an AutoRunner that takes a configuration file (typically named `task.yaml`) as input. This file contains all the necessary parameters to handle the preprocessing, training, and validation stages of your segmentation task.

### Model Details

Our experiments focus on the segresnet_0 model variant, which is set up for single-fold training and validation. We run the baseline model using the default Auto3DSeg configuration. However, when integrating our CT-FM model into the pipeline, we make the following two key modifications:

- **Orientation Adjustment:**
  We change the default image orientation by setting the axcodes to `SPL`.

- **Checkpoint Specification:**
  The path to the pre-trained model checkpoint is provided via the `ckpt_path` field in the `hyper_parameters.yaml` file.

These adjustments allow us to directly benchmark the effectiveness of the pre-trained CT-FM model within the Auto3DSeg pipeline without necessitating major changes to the existing workflow.

!!! tip "Customizing Your Pipeline"
    By simply modifying the orientation and specifying the checkpoint path, you can leverage pre-trained models in the Auto3DSeg setup. This makes it easy to compare different configurations and accelerate your experimentation.
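As a rough sketch, the two modifications above might appear in `hyper_parameters.yaml` roughly as below. The `ckpt_path` field name comes from the description above, but the orientation key and the overall nesting are assumptions for illustration; the file generated by your own AutoRunner run is the authoritative reference.

```yaml
# illustrative fragment; nesting and the orientation key are assumed
ckpt_path: /path/to/ct_fm_pretrained.ckpt   # pre-trained CT-FM weights
orientation:
  axcodes: SPL                              # override the default image orientation
```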
## Head CT Triage Classification

:material-progress-clock: Coming soon...

<br/>
<br/>

!!! tip "Zero-shot evaluation"
    All zero-shot evaluations can be found on the [reproduce analysis page](./analysis.md).
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Extracting Features and Predictions

Our CT-FM models are available through the lighter-zoo pip package, with weights hosted on Hugging Face. This streamlined API lets you extract features and generate predictions for various radiological tasks.

Begin by installing the lighter-zoo package:

```bash
pip install lighter-zoo
```

!!! tip "Quick Start"
    For detailed examples and further guidance, visit our [Project Lighter page on Hugging Face](https://huggingface.co/project-lighter)[^1].

## Available Models

<div class="grid cards" markdown>

- **[:octicons-arrow-right-24: project-lighter/ct_fm_feature_extractor](https://huggingface.co/project-lighter/ct_fm_feature_extractor)**
  Extract deep features efficiently from CT scans.

- **[:octicons-arrow-right-24: project-lighter/whole_body_segmentation](https://huggingface.co/project-lighter/whole_body_segmentation)**
  Generate comprehensive segmentation maps for radiological analysis.

</div>

[^1]: Information adapted from [Hugging Face Project Lighter](https://huggingface.co/project-lighter).
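Whichever model you use, CT intensities are typically clipped to a Hounsfield-unit window and rescaled before inference. The sketch below maps a [-1024, 2048] HU window to [0, 1]; that window is an assumption chosen for illustration, and the authoritative preprocessing for each model is documented on its Hugging Face model card.

```python
def normalize_hu(hu: float, a_min: float = -1024.0, a_max: float = 2048.0) -> float:
    """Clip a Hounsfield-unit intensity to [a_min, a_max] and rescale to [0, 1]."""
    hu = min(max(hu, a_min), a_max)        # clip to the CT window
    return (hu - a_min) / (a_max - a_min)  # linear rescale
```

Applied voxel-wise (e.g., with NumPy) this yields the normalized volume that would then be batched for the model.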
