sambklein/tabular_diffusion

tabular_diffusion

This project was generated by the RODEM machine learning template, setting it up with the standard machine learning frameworks PyTorch and Lightning, as well as configuration and workflow management through Hydra and Snakemake. Additionally, this repository relies on MLTools, which is used for certain helper functions.

Usage

After cloning the repo to any HPC cluster, start by setting up the workflow dependencies by running

module load GCCcore/12.3.0 Python/3.11.3
pip install --upgrade pip
pip install -r workflow_requirements.txt

We load Python 3.11.3 here (the module name is specific to Baobab) to make sure we have a recent enough Python version for some of the packages we want to use.

Next, make your WandB API key available through

printf "enter key: "; read WANDB_API_KEY; export WANDB_API_KEY

and configure your user-specific details by first copying the defaults

cp config/common/default.yaml config/common/private.yaml

and then filling in the respective values in config/common/private.yaml.
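
Before launching anything, it can be worth confirming that both setup steps took effect. The following is a minimal sketch, not part of the repository: the function name preflight_problems is hypothetical, and it only checks that WANDB_API_KEY is non-empty and that config/common/private.yaml exists, not that their contents are valid.

```python
import os
from pathlib import Path


def preflight_problems(repo_root=".", env=None):
    """Return a list of human-readable setup problems (empty if none found).

    Hypothetical helper for illustration; it is not provided by the repo.
    """
    env = os.environ if env is None else env
    problems = []

    # WANDB_API_KEY is the variable exported in the step above.
    if not env.get("WANDB_API_KEY"):
        problems.append("WANDB_API_KEY is not set or is empty")

    # private.yaml is the user-specific copy of default.yaml.
    private = Path(repo_root) / "config" / "common" / "private.yaml"
    if not private.is_file():
        problems.append(f"missing {private}")

    return problems


if __name__ == "__main__":
    # Run from the repository root in the same shell that exported the key.
    for problem in preflight_problems():
        print("setup problem:", problem)
```

Run from the repository root, this prints nothing when both steps have been completed.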

You also need to pull the latest container from the gitlab registry by running

invoke container-pull --login

Now we can launch experiments with

invoke experiment-run <experiment_name>

where <experiment_name> is used to identify the current run/experiment and to define the overall output directory, which will be <experiment_base_path>/initial_testing/<experiment_name>. The base path <experiment_base_path> will be the value you wrote earlier into config/common/private.yaml.
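
As a small illustration of how the output directory described above is composed, here is a sketch in Python. The function name and the example values are hypothetical; only the path layout is taken from the description, and the workflow itself may build the path differently.

```python
from pathlib import PurePosixPath


def experiment_output_dir(experiment_base_path, experiment_name):
    """Compose <experiment_base_path>/initial_testing/<experiment_name>.

    Hypothetical helper mirroring the layout described in this README.
    """
    return PurePosixPath(experiment_base_path) / "initial_testing" / experiment_name


if __name__ == "__main__":
    # Both values below are made-up examples, not defaults from the repo.
    print(experiment_output_dir("/scratch/my_user/experiments", "first_try"))
```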

The invoke command runs snakemake with the slurm profile, which automatically launches slurm jobs and uses a container for every workflow step. For testing, the profiles test-in-container and test are useful: they run the whole workflow on the current node, with and without a container respectively. When using these profiles, launch experiments on a compute node within an interactive session.

Configuration

All configuration is stored in config/ and handled by hydra. Additionally, there are two special files, config/common/default.yaml and config/common/private.yaml, which are used by both snakemake and hydra. Note that config/common/private.yaml should never be pushed, since it is user-specific.
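
Because private.yaml starts out as a copy of default.yaml, it can drift when new keys are added to the defaults later. The sketch below shows one way to spot that, assuming both files have already been loaded into plain dictionaries with a YAML parser of your choice. The function name and the example keys are hypothetical and do not reflect the actual contents of either file.

```python
def missing_keys(default, private, prefix=""):
    """List dotted key paths present in `default` but absent from `private`."""
    missing = []
    for key, value in default.items():
        path = f"{prefix}{key}"
        if key not in private:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(private[key], dict):
            # Recurse into nested sections that exist on both sides.
            missing.extend(missing_keys(value, private[key], prefix=f"{path}."))
    return missing


if __name__ == "__main__":
    # Made-up example data standing in for the two loaded YAML files.
    default_cfg = {"paths": {"base": "", "cache": ""}, "user": ""}
    private_cfg = {"paths": {"base": "/some/path"}, "user": "me"}
    print(missing_keys(default_cfg, private_cfg))
```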

Docker and Gitlab

This project is set up to use the CERN GitLab CI/CD to automatically build a Docker image based on docker/Dockerfile and requirements.txt. It also runs pre-commit as part of the pipeline. To change this behaviour, edit .gitlab-ci.yml.

Contributing

Contributions are welcome! Please submit a pull request or create an issue if you have any improvements or suggestions. Please run the provided pre-commit checks before opening a merge request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
