Commit 782dedc

Suraj Pai authored
Merge pull request #34 from project-lighter/docs

Update docs

2 parents 83327ac + 504642a commit 782dedc

File tree

21 files changed: +19912 −67 lines changed

docs/index.md

Lines changed: 40 additions & 2 deletions
@@ -14,5 +14,43 @@ This repository contains the code and resources for CT-FM, a 3D image-based pre-
  * **Task-Agnostic Training:** Enabling transferability across various radiological tasks.
  * **Open Source:** Model weights, data, and code are shared for collaborative development.
- ## Training
- CT-FM largely relies on the [lighter](https://github.com/project-lighter/lighter) package. Detailed descriptions on running the pretraining and downstream task specific finetuning can be found in the [replication guide](./replication-guide/data.md)
+ <br/>
+ <br/>
+
+ ## Quick Links
+ <div class="grid cards" markdown>
+
+ - __Downloading Data__
+
+     ---
+
+     All datasets used in the study are public
+
+     [:octicons-arrow-right-24: Download data](./replication-guide/data.md)
+
+ - __Use CT-FM models__
+
+     ---
+
+     CT-FM feature extractors and trained downstream models are available on HF
+
+     [:octicons-arrow-right-24: Go to HF](https://huggingface.co/project-lighter)
+
+ - __Reproduce our pre-training framework__
+
+     ---
+
+     Implement our pre-training method on your own data
+
+     [:octicons-arrow-right-24: Pretraining instructions](./replication-guide/pretraining.md)
+
+ - __Build your projects using Lighter__
+
+     ---
+
+     Almost all CT-FM experiments use Lighter as the configuration system
+
+     [:octicons-arrow-right-24: Explore here](https://github.com/project-lighter/lighter)
+
+ </div>

docs/js/sh-annotation.js

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
// this script is used to remove extra leading space when annotating shell code blocks ending with `\`
// character. See https://github.com/squidfunk/mkdocs-material/issues/3846 for more info.
document$.subscribe(() => {
  const tags = document.querySelectorAll("code .se")
  tags.forEach(tag => {
    if (tag.innerText.startsWith("\\")) {
      tag.innerText = "\\"
    }
  })
})

docs/replication-guide/analysis.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
<div class="grid cards" markdown>

- **Whole Body Segmentation Analysis:**
  [:octicons-arrow-right-24: totalseg_eval.ipynb](https://github.com/project-lighter/CT-FM/tree/main/notebooks/totalseg_eval.ipynb)

- **Tumor Segmentation Analysis and Visualization:**
  [:octicons-arrow-right-24: tumor-seg-eval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/tumor-seg-eval)

- **Head CT Triage Classification:**
  [:octicons-arrow-right-24: head-ct-triage-eval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/head-ct-triage-eval)

- **Medical Image Retrieval:**
  [:octicons-arrow-right-24: retrieval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/retrieval)

- **Semantic Evaluation - Anatomical Clustering, Semantic Search, PCA visualization:**
  [:octicons-arrow-right-24: semantic-eval](https://github.com/project-lighter/CT-FM/tree/main/notebooks/semantic-eval)

- **Robustness - Saliency and Stability:**
  [:octicons-arrow-right-24: robustness](https://github.com/project-lighter/CT-FM/tree/main/notebooks/robustness)

</div>
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Baselines :material-chart-box-outline:

The selection of baselines varies depending on the evaluation task. Below is a breakdown by task:

## Whole Body Segmentation

- **Architectural Baseline:** Randomly initialized model.
- **SuPREM**
- **Merlin**
- **VISTA3D, Auto3DSeg, nnUNet:** Results reported in previous studies.

## Tumor Segmentation

- **Auto3DSeg Pipeline**

## Head CT Triage

- **Architectural Baseline:** Randomly initialized model.
- **SuPREM**

All baselines, as well as our methods, are implemented using lighter. For detailed configuration scripts and execution instructions, please refer to the [Downstream Tasks](./downstream.md) section.

docs/replication-guide/data.md

Lines changed: 38 additions & 14 deletions
@@ -10,7 +10,7 @@ For our pre-training experiments, we utilize 148,394 CT scans from the Imaging D

Execute the provided SQL query on Google BigQuery to filter for CT scans that meet our quality constraints. The query performs necessary quality checks on each scan.

- - Query file: [query.sql](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/query.sql)
+ [Query file](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/query.sql){ .md-button }

Running this query returns a table with CT scan records that satisfy our criteria. We then convert these query results to a manifest file that can be used to download the data.
@@ -19,7 +19,7 @@ This has already been done so you can skip to the next step if you don't want to

After reviewing the query results, use the Jupyter Notebook to create a manifest file. This manifest lists every DICOM file that needs to be downloaded.

- - Manifest creation notebook: [prepare_download.ipynb](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/prepare_download.ipynb)
+ [Manifest creation notebook](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/prepare_download.ipynb){ .md-button }
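The query-results-to-manifest conversion described above can be sketched in a few lines of Python. This is an illustrative stand-in for the prepare_download.ipynb notebook, not a copy of it: the `series_aws_url` column name and the s5cmd `cp` manifest syntax are assumptions based on typical IDC download workflows.

```python
import csv

def write_s5cmd_manifest(query_csv: str, manifest_txt: str, out_dir: str) -> int:
    """Convert exported BigQuery CSV results into an s5cmd manifest of `cp` commands.

    Assumes each row has a `series_aws_url` column holding the series' S3 prefix
    (a hypothetical column name used here for illustration).
    """
    n = 0
    with open(query_csv, newline="") as src, open(manifest_txt, "w") as dst:
        for row in csv.DictReader(src):
            # one `cp` line per series; s5cmd fetches every file under the prefix
            dst.write(f"cp {row['series_aws_url']}* {out_dir}/\n")
            n += 1
    return n
```

Running `s5cmd run manifest.txt` would then download everything listed, assuming s5cmd is installed and the URLs are valid.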
### 3. Download the DICOM Files

@@ -42,23 +42,47 @@ This command downloads all the specified DICOM files into the designated directo

The downloaded data is in DICOM format. To prepare it for your experiments, follow these steps:

- **Sorting:** Organize the DICOM files using the tool "dicomsort". While the specific usage may depend on your environment, a common workflow involves running a command to categorize files by patient or study. For example, you might first list the files and then run:

    ```
    dicomsort [options...] sourceDir targetDir/<patterns>
    ```

    For more detailed instructions and options, please refer to the [dicomsort GitHub repository](https://github.com/pieper/dicomsort).

- **Conversion:** Convert the sorted DICOM files to NRRD format using Plastimatch. A typical command looks similar to:

    ```
    plastimatch convert --input <SORTED_DIR> --output <CONVERTED_DIR> --format nrrd
    ```

    For additional details and advanced options, consult the [Plastimatch documentation](http://plastimatch.org) or relevant online resources.

- **Packaging:** Finally, generate a `.pkl` file that lists the scans. This file serves as the required input for the pre-training experiments.

For a complete example of these final steps, refer again to the [prepare_download.ipynb](https://github.com/project-lighter/CT-FM/tree/main/notebooks/data-download/prepare_download.ipynb) notebook.

Following these instructions will replicate the data download and preprocessing pipeline used in our study, enabling you to work with the same CT scan dataset.
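The packaging step can be sketched as a short script. The actual manifest structure expected by the pre-training code is defined in prepare_download.ipynb; here we simply assume a flat pickled list of scan paths for illustration.

```python
import pickle
from pathlib import Path

def build_scan_index(converted_dir: str, out_pkl: str) -> list:
    """Collect converted NRRD scans under a directory and pickle the sorted path list."""
    scans = sorted(str(p) for p in Path(converted_dir).rglob("*.nrrd"))
    with open(out_pkl, "wb") as f:
        pickle.dump(scans, f)  # the .pkl file consumed by the pre-training pipeline
    return scans
```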
## Downstream Tasks Data

We use several publicly available datasets for our downstream tasks, including:

<div class="grid cards" markdown>

- **Whole Body Segmentation:**
  [:octicons-arrow-right-24: TotalSegmentator-v2 dataset](https://zenodo.org/records/8367088)

- **Tumor Segmentation:**
  [:octicons-arrow-right-24: MSD dataset](http://medicaldecathlon.com/dataaws/)

- **Head CT Triage:**
  [:octicons-arrow-right-24: SinoCT](https://stanfordaimi.azurewebsites.net/datasets?domain=HEAD%2FBRAIN%2FNECK)
  [:octicons-arrow-right-24: CQ500](https://academictorrents.com/details/47e9d8aab761e75fd0a81982fa62bddf3a173831)

- **Medical Image Retrieval:**
  [:octicons-arrow-right-24: 3D-MIR](http://medicaldecathlon.com/dataaws/)
  [:octicons-arrow-right-24: OrganMNIST-3D](https://zenodo.org/records/10519652)

- **Stability Testing:**
  [:octicons-arrow-right-24: RIDER](https://www.cancerimagingarchive.net/collection/rider-lung-ct/)

</div>
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Downstream Task Adaptation

Our pre-trained CT-FM model has been adapted to three fine-tuned downstream tasks as well as several additional zero-shot tasks. While most downstream experiments leverage the Lighter framework, tumor segmentation is handled using Auto3DSeg.

## Whole Body Segmentation

In line with the configuration-based approach detailed in [Pretraining](./pretraining.md), we provide YAML config files for downstream adaptation. To facilitate thorough comparisons, a suite of shell scripts with the relevant configuration components is available. These can be found in the [evaluation](https://github.com/project-lighter/CT-FM/tree/main/evaluation) directory under “scripts.”

[View All Scripts](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts){.md-button}

<br/>
For TotalSeg experiments, refer to the scripts in the totalseg folder:

<div class="grid cards" markdown>

- **Full Finetuning on TotalSegmentatorV2:**
  [:octicons-arrow-right-24: fulltune.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/fulltune.sh)

- **Finetuning on the Merlin Split:**
  [:octicons-arrow-right-24: merlin.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/merlin.sh)

- **Few-Shot Fine-Tuning:**
  [:octicons-arrow-right-24: fewshot.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/fewshot.sh)

- **Pre-Training Checkpoint Selection:**
  [:octicons-arrow-right-24: checkpoint_selection.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/checkpoint_selection.sh)

- **Pre-Training Ablations:**
  [:octicons-arrow-right-24: pretraining_evaluation.sh](https://github.com/project-lighter/CT-FM/tree/main/evaluation/scripts/totalseg/pretraining_evaluation.sh)

</div>
!!! tip "Enabling Prediction Mode"

    To switch from training to prediction mode:

    - Replace the `fit` command with the `predict` command.
    - Append the prediction override configuration file `./evaluation/overrides/totalseg_predict_overrides.yaml` to your config list.
    - Remove the `--trainer#callbacks#0#until_epoch=0` flag, since the new callback now handles prediction mode.

**Example Transformation:**

Original command:
```
lighter fit --config=./evaluation/totalseg.yaml,./evaluation/overrides/totalseg_vista.yaml,./evaluation/baselines/segresnetds_ctfm.yaml --trainer#callbacks#0#until_epoch=0 --vars#name="ct_fm" --vars#project="totalseg" --system#model#trunk#ckpt_path=$ct_fm_path --vars#wandb_group='vista_v2'
```

Modified prediction command:
```
lighter predict --config=./evaluation/totalseg.yaml,./evaluation/overrides/totalseg_vista.yaml,./evaluation/baselines/segresnetds_ctfm.yaml,./evaluation/overrides/totalseg_predict_overrides.yaml --vars#name="ct_fm" --vars#project="totalseg" --vars#wandb_group='vista_v2'
```

By default, the predict command uses the checkpoint location specified while running the fit pipeline. To use a different checkpoint during prediction, add:
```
--args#predict#ckpt_path=<path>
```
## Tumor Segmentation with Auto3DSeg

Tumor segmentation is performed using Auto3DSeg, a robust segmentation workflow provided by MONAI. This pipeline is designed to simplify segmentation tasks and can be explored further via the official tutorial below.

[MONAI Auto3DSeg Tutorial](https://github.com/Project-MONAI/tutorials/blob/main/auto3dseg/README.md){.md-button}

### Workflow Overview

Auto3DSeg operates by running an AutoRunner that takes a configuration file (typically named `task.yaml`) as input. This file contains all the necessary parameters to handle the preprocessing, training, and validation stages of your segmentation task.

### Model Details

Our experiments focus on the segresnet_0 model variant, which is set up for single-fold training and validation. We run the baseline model using the default Auto3DSeg configuration. However, when integrating our CT-FM model into the pipeline, we make the following two key modifications:

- **Orientation Adjustment:**
  We change the default image orientation by setting the axcodes to `SPL`.

- **Checkpoint Specification:**
  The path to the pre-trained model checkpoint is provided via the `ckpt_path` field in the `hyper_parameters.yaml` file.

These adjustments allow us to directly benchmark the effectiveness of the pre-trained CT-FM model within the Auto3DSeg pipeline without necessitating major changes to the existing workflow.

!!! tip "Customizing Your Pipeline"
    By simply modifying the orientation and specifying the checkpoint path, you can leverage pre-trained models in the Auto3DSeg setup. This makes it easy to compare different configurations and accelerate your experimentation.
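As a rough sketch, the two modifications above might appear in `hyper_parameters.yaml` roughly as below. The `ckpt_path` field name comes from the description above, but the orientation key and the overall nesting are assumptions for illustration; the file generated by your own AutoRunner run is the authoritative reference.

```yaml
# illustrative fragment; nesting and the orientation key are assumed
ckpt_path: /path/to/ct_fm_pretrained.ckpt   # pre-trained CT-FM weights
orientation:
  axcodes: SPL                              # override the default image orientation
```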
## Head CT Triage Classification

:material-progress-clock: Coming soon...

<br/>
<br/>

!!! tip "Zero-shot evaluation"
    All zero-shot evaluations can be found on the [reproduce analysis page](./analysis.md).
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Extracting Features and Predictions

Our CT-FM models are available through the lighter-zoo pip package, with weights hosted on Hugging Face. This streamlined API lets you extract features and generate predictions for various radiological tasks.

Begin by installing the lighter-zoo package:

```bash
pip install lighter-zoo
```

!!! tip "Quick Start"
    For detailed examples and further guidance, visit our [Project Lighter page on Hugging Face](https://huggingface.co/project-lighter)[^1].

## Available Models

<div class="grid cards" markdown>

- **[:octicons-arrow-right-24: project-lighter/ct_fm_feature_extractor](https://huggingface.co/project-lighter/ct_fm_feature_extractor)**
  Extract deep features efficiently from CT scans.

- **[:octicons-arrow-right-24: project-lighter/whole_body_segmentation](https://huggingface.co/project-lighter/whole_body_segmentation)**
  Generate comprehensive segmentation maps for radiological analysis.

</div>

[^1]: Information adapted from [Hugging Face Project Lighter](https://huggingface.co/project-lighter).
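Whichever model you use, CT intensities are typically clipped to a Hounsfield-unit window and rescaled before inference. The sketch below maps a [-1024, 2048] HU window to [0, 1]; that window is an assumption chosen for illustration, and the authoritative preprocessing for each model is documented on its Hugging Face model card.

```python
def normalize_hu(hu: float, a_min: float = -1024.0, a_max: float = 2048.0) -> float:
    """Clip a Hounsfield-unit intensity to [a_min, a_max] and rescale to [0, 1]."""
    hu = min(max(hu, a_min), a_max)        # clip to the CT window
    return (hu - a_min) / (a_max - a_min)  # linear rescale
```

Applied voxel-wise (e.g., with NumPy) this yields the normalized volume that would then be batched for the model.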
