Minimal version of the LAN pipeline for generating training data and training likelihood approximation networks.
We recommend using this pipeline with uv. See the uv documentation for installation instructions.
```bash
# Clone and setup
git clone <repo-url>
cd LAN_pipeline_minimal
uv sync
```

This installs the required dependencies:

- `ssm-simulators` - Data generation
- `lanfactory` - Network training
- `mlflow` - Experiment tracking
```
LAN_pipeline_minimal/
├── configs/
│   ├── examples/                  # Production-ready configs
│   │   ├── data_generation.yaml
│   │   ├── network_training_lan.yaml
│   │   └── network_training_cpn.yaml
│   ├── quick_test/                # Fast testing configs (~1-2 min)
│   │   ├── data_generation.yaml
│   │   └── network_training.yaml
│   ├── legacy/                    # Archived old workflow configs
│   └── README.md                  # Config documentation
├── sbatch_scripts/
│   ├── gen_sbatch.py              # Main orchestrator script
│   ├── sample_*.sh                # Example generated SBATCH scripts
│   └── legacy/                    # Archived old sbatch scripts
├── local_test_run.sh              # Local end-to-end test script
├── using_mlflow.md                # MLflow integration guide
└── pyproject.toml                 # Dependencies (from GitHub main branches)
```
Run a quick end-to-end test locally (~2-3 min):

```bash
./local_test_run.sh
```

This will:

- Generate test data with `ssm-simulators`
- Train a network with `lanfactory`
- Track everything in MLflow
The `gen_sbatch.py` script creates SBATCH scripts for Slurm clusters:
```bash
# View available commands
uv run python sbatch_scripts/gen_sbatch.py --help
uv run python sbatch_scripts/gen_sbatch.py generate --help
uv run python sbatch_scripts/gen_sbatch.py jaxtrain --help
uv run python sbatch_scripts/gen_sbatch.py torchtrain --help
```

```bash
# Generate SBATCH script for data generation (Slurm cluster)
uv run python sbatch_scripts/gen_sbatch.py generate \
    --config-path configs/examples/data_generation.yaml \
    --output-path /path/to/output \
    --n-jobs-in-array 10 \
    --partition gpu

# Or run directly (local)
uv run generate \
    --config-path configs/quick_test/data_generation.yaml \
    --output ./data \
    --n-files 5
```

```bash
# Generate SBATCH script for training (Slurm cluster);
# --data-generation-experiment-id links training to the data generation
# experiment for lineage tracking
uv run python sbatch_scripts/gen_sbatch.py jaxtrain \
    --config-path configs/examples/network_training_lan.yaml \
    --output-path /path/to/networks \
    --training-data-folder /path/to/data \
    --data-generation-experiment-id <exp-id> \
    --partition gpu

# Or run directly (local)
uv run jaxtrain \
    --config-path configs/quick_test/network_training.yaml \
    --training-data-folder ./data/ddm \
    --networks-path-base ./networks
```

See configs/README.md for detailed documentation on configuration options.
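Before launching a training job, it can help to confirm that the folder passed via `--training-data-folder` actually contains generated files. A minimal, hypothetical sketch — the `training_data_*.pickle` naming is an illustrative assumption, not the pipeline's documented output format:

```python
from pathlib import Path
import tempfile

def count_files(folder: str, pattern: str = "*") -> int:
    """Count files under `folder` matching `pattern` (non-recursive)."""
    return sum(1 for p in Path(folder).glob(pattern) if p.is_file())

# Demo with a throwaway directory standing in for ./data/ddm:
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        # Hypothetical file names; adjust the pattern to the actual output.
        (Path(tmp) / f"training_data_{i}.pickle").touch()
    n = count_files(tmp, "training_data_*.pickle")
    print(f"{n} training data files found")  # 5 training data files found
```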
Example data generation config:

```yaml
MODEL: 'ddm'
GENERATOR_APPROACH: 'lan'
PIPELINE:
  N_PARAMETER_SETS: 1000
  N_SUBRUNS: 20
SIMULATOR:
  N_SAMPLES: 20000
  DELTA_T: 0.001
TRAINING:
  N_SAMPLES_PER_PARAM: 2000
ESTIMATOR:
  TYPE: 'kde'
```

Example network training config:

```yaml
NETWORK_TYPE: "lan"
MODEL: "ddm"
N_EPOCHS: 20
LAYER_SIZES: [[100, 100, 100, 1]]
ACTIVATIONS: [['tanh', 'tanh', 'tanh']]
CPU_BATCH_SIZE: 1000
GPU_BATCH_SIZE: 50000
TRAINING_DATA_FOLDER: ""  # Set via CLI
```
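As a rough sanity check, the generation settings above imply the scale of a run. Interpreting the fields by their names (an assumption on our part):

```python
# Rough dataset-size arithmetic for the example data generation config above.
# Field interpretations are assumptions based on the field names.
n_parameter_sets = 1_000      # PIPELINE.N_PARAMETER_SETS
n_sim_samples = 20_000        # SIMULATOR.N_SAMPLES: simulator draws per parameter set
n_samples_per_param = 2_000   # TRAINING.N_SAMPLES_PER_PARAM: labeled examples per parameter set

total_simulator_draws = n_parameter_sets * n_sim_samples
total_training_examples = n_parameter_sets * n_samples_per_param
print(f"{total_simulator_draws:,} simulator draws")      # 20,000,000 simulator draws
print(f"{total_training_examples:,} training examples")  # 2,000,000 training examples
```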
This pipeline includes MLflow integration for experiment tracking. See using_mlflow.md for details.
Key features:
- Automatic experiment organization by model name (`{model}-data-generation`, `{model}-training`)
- Data lineage tracking between generation and training via `--data-generation-experiment-id`
- Works with both local SQLite and remote MLflow servers
```bash
# View MLflow UI after running experiments
uv run mlflow ui --backend-store-uri sqlite:///mlflow.db
# Open http://localhost:5000
```

This package pulls `ssm-simulators` and `lanfactory` from their GitHub `main` branches:
```toml
[tool.uv.sources]
lanfactory = { git = "https://github.com/lnccbrown/lanfactory", branch = "main" }
ssm-simulators = { git = "https://github.com/lnccbrown/ssm-simulators", branch = "main" }
```

To update to the latest versions:

```bash
uv sync --refresh
```