Minimal version of the LAN pipeline for generating training data and training likelihood approximation networks.
We recommend using this pipeline with uv. See the uv documentation for installation instructions.
```bash
# Clone and setup
git clone <repo-url>
cd LAN_pipeline_minimal
uv sync
```

This installs the required dependencies:

- `ssm-simulators` - Data generation
- `lanfactory` - Network training
- `mlflow` - Experiment tracking
```
LAN_pipeline_minimal/
├── configs/
│   ├── examples/                  # Production-ready configs
│   │   ├── data_generation.yaml
│   │   ├── network_training_lan.yaml
│   │   └── network_training_cpn.yaml
│   ├── quick_test/                # Fast testing configs (~1-2 min)
│   │   ├── data_generation.yaml
│   │   └── network_training.yaml
│   ├── legacy/                    # Archived old workflow configs
│   └── README.md                  # Config documentation
├── sbatch_scripts/
│   ├── gen_sbatch.py              # Main orchestrator script
│   ├── sample_*.sh                # Example generated SBATCH scripts
│   └── legacy/                    # Archived old sbatch scripts
├── local_test_run.sh              # Local end-to-end test script
├── using_mlflow.md                # MLflow integration guide
└── pyproject.toml                 # Dependencies (from GitHub main branches)
```
Run a quick end-to-end test locally (~2-3 min):

```bash
./local_test_run.sh
```

This will:

- Generate test data with `ssm-simulators`
- Train a network with `lanfactory`
- Track everything in MLflow
The `gen_sbatch.py` script creates SBATCH scripts for Slurm clusters:
```bash
# View available commands
uv run python sbatch_scripts/gen_sbatch.py --help
uv run python sbatch_scripts/gen_sbatch.py generate --help
uv run python sbatch_scripts/gen_sbatch.py jaxtrain --help
uv run python sbatch_scripts/gen_sbatch.py torchtrain --help
```

```bash
# Generate SBATCH script for data generation (Slurm cluster)
uv run python sbatch_scripts/gen_sbatch.py generate \
    --config-path configs/examples/data_generation.yaml \
    --output-path /path/to/output \
    --n-jobs-in-array 10 \
    --partition gpu

# Or run directly (local)
uv run generate \
    --config-path configs/quick_test/data_generation.yaml \
    --output ./data \
    --n-files 5
```

```bash
# Generate SBATCH script for training (Slurm cluster);
# --data-generation-experiment-id links training to the data generation
# experiment for lineage tracking
uv run python sbatch_scripts/gen_sbatch.py jaxtrain \
    --config-path configs/examples/network_training_lan.yaml \
    --output-path /path/to/networks \
    --training-data-folder /path/to/data \
    --data-generation-experiment-id <exp-id> \
    --partition gpu

# Or run directly (local)
uv run jaxtrain \
    --config-path configs/quick_test/network_training.yaml \
    --training-data-folder ./data/ddm \
    --networks-path-base ./networks
```

See configs/README.md for detailed documentation on configuration options.
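Before launching a training job, it can help to confirm that the folder passed via `--training-data-folder` actually contains generated files. A minimal, hypothetical sketch — the `training_data_*.pickle` naming is an illustrative assumption, not the pipeline's documented output format:

```python
from pathlib import Path
import tempfile

def count_files(folder: str, pattern: str = "*") -> int:
    """Count files under `folder` matching `pattern` (non-recursive)."""
    return sum(1 for p in Path(folder).glob(pattern) if p.is_file())

# Demo with a throwaway directory standing in for ./data/ddm:
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        # Hypothetical file names; adjust the pattern to the actual output.
        (Path(tmp) / f"training_data_{i}.pickle").touch()
    n = count_files(tmp, "training_data_*.pickle")
    print(f"{n} training data files found")  # 5 training data files found
```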
Example data generation config:

```yaml
MODEL: 'ddm'
GENERATOR_APPROACH: 'lan'
PIPELINE:
  N_PARAMETER_SETS: 1000
  N_SUBRUNS: 20
SIMULATOR:
  N_SAMPLES: 20000
  DELTA_T: 0.001
TRAINING:
  N_SAMPLES_PER_PARAM: 2000
ESTIMATOR:
  TYPE: 'kde'
```

Example network training config:

```yaml
NETWORK_TYPE: "lan"
MODEL: "ddm"
N_EPOCHS: 20
LAYER_SIZES: [[100, 100, 100, 1]]
ACTIVATIONS: [['tanh', 'tanh', 'tanh']]
CPU_BATCH_SIZE: 1000
GPU_BATCH_SIZE: 50000
TRAINING_DATA_FOLDER: ""  # Set via CLI
```
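As a rough sanity check, the generation settings above imply the scale of a run. Interpreting the fields by their names (an assumption on our part):

```python
# Rough dataset-size arithmetic for the example data generation config above.
# Field interpretations are assumptions based on the field names.
n_parameter_sets = 1_000      # PIPELINE.N_PARAMETER_SETS
n_sim_samples = 20_000        # SIMULATOR.N_SAMPLES: simulator draws per parameter set
n_samples_per_param = 2_000   # TRAINING.N_SAMPLES_PER_PARAM: labeled examples per parameter set

total_simulator_draws = n_parameter_sets * n_sim_samples
total_training_examples = n_parameter_sets * n_samples_per_param
print(f"{total_simulator_draws:,} simulator draws")      # 20,000,000 simulator draws
print(f"{total_training_examples:,} training examples")  # 2,000,000 training examples
```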
This pipeline includes MLflow integration for experiment tracking. See using_mlflow.md for details.
Key features:
- Automatic experiment organization by model name (`{model}-data-generation`, `{model}-training`)
- Data lineage tracking between generation and training via `--data-generation-experiment-id`
- Works with both local SQLite and remote MLflow servers
```bash
# View MLflow UI after running experiments
uv run mlflow ui --backend-store-uri sqlite:///mlflow.db
# Open http://localhost:5000
```

This package pulls `ssm-simulators` and `lanfactory` from their GitHub `main` branches:
```toml
[tool.uv.sources]
lanfactory = { git = "https://github.com/lnccbrown/lanfactory", branch = "main" }
ssm-simulators = { git = "https://github.com/lnccbrown/ssm-simulators", branch = "main" }
```

To update to the latest versions:

```bash
uv sync --refresh
```