tabular_diffusion

This project was generated from the RODEM machine learning template, which sets it up with PyTorch and Lightning as the machine learning frameworks and with Hydra and Snakemake for configuration and workflow management. The repository also relies on MLTools for certain helper functions.

Usage

After cloning the repo to an HPC cluster, set up the workflow dependencies by running

module load GCCcore/12.3.0 Python/3.11.3
pip install --upgrade pip
pip install -r workflow_requirements.txt

We load Python 3.11.3 here (the module names are specific to Baobab) to make sure we have a recent enough Python version for some of the packages we want to use.

Next, make your WandB API key available through

printf "enter key: "; read WANDB_API_KEY; export WANDB_API_KEY

and configure your user-specific details by first copying the defaults

cp config/common/default.yaml config/common/private.yaml

and then filling in the respective values in config/common/private.yaml.

You also need to pull the latest container from the GitLab registry by running

invoke container-pull --login
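Before launching anything, it can be worth confirming that the key and the private config from the steps above are in place. The following is a minimal sketch and not part of this repository: the function name check_setup is hypothetical, and it relies only on the variable name WANDB_API_KEY and the path config/common/private.yaml mentioned above.

```shell
# Hypothetical helper (not provided by this repository): check that
# WANDB_API_KEY is non-empty in the current shell and that the private
# config file exists. The path can be overridden with the first argument.
check_setup() {
    config_file="${1:-config/common/private.yaml}"
    if [ -z "${WANDB_API_KEY:-}" ]; then
        echo "WANDB_API_KEY is not set" >&2
        return 1
    fi
    if [ ! -f "$config_file" ]; then
        echo "missing $config_file (copy it from config/common/default.yaml)" >&2
        return 1
    fi
    echo "setup looks complete"
}
```

Run from the repository root, check_setup either prints a confirmation or reports the first missing piece on stderr and returns a non-zero status.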

Now we can launch experiments with

invoke experiment-run <experiment_name>

where <experiment_name> identifies the current run/experiment and determines the overall output directory, <experiment_base_path>/initial_testing/<experiment_name>. The base path <experiment_base_path> is the value you set earlier in config/common/private.yaml.
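To make the output location concrete, the path composition described above can be sketched as a small shell function. This is only an illustration: the function name experiment_dir and the example values /scratch/alice and my_first_run are made up, and the workflow itself may build the path differently internally.

```shell
# Illustrative only: compose the output directory from the base path and
# the experiment name, following the layout described in this README.
# $1: value of experiment_base_path, $2: experiment name
experiment_dir() {
    # ${1%/} drops a single trailing slash so the result has no "//".
    printf '%s/initial_testing/%s\n' "${1%/}" "$2"
}

experiment_dir /scratch/alice my_first_run
# prints: /scratch/alice/initial_testing/my_first_run
```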

The invoke command runs Snakemake with the slurm profile, which automatically launches Slurm jobs and uses a container for every workflow step. For testing, the profiles test-in-container and test are useful: they run the whole workflow on the current node, with and without a container respectively. When using these profiles, launch experiments from an interactive session on a compute node.
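Since the local test profiles run everything on the current node, a quick guard against accidentally running on a login node can help. The sketch below is a rough heuristic, not something this repository provides: the function name in_slurm_job is hypothetical, and it only checks for SLURM_JOB_ID, which Slurm sets inside an allocation. It detects an allocation, not which node the shell is actually running on.

```shell
# Hypothetical guard: succeed only when the shell is inside a Slurm
# allocation, detected via the SLURM_JOB_ID environment variable.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "inside Slurm job ${SLURM_JOB_ID}: a local test profile can be used"
else
    echo "no Slurm job detected: request an interactive session first" >&2
fi
```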

Configuration

All configuration is stored in config/ and handled by Hydra. In addition, two special files, config/common/default.yaml and config/common/private.yaml, are used by both Snakemake and Hydra. Note that config/common/private.yaml should never be pushed, since it is user-specific.
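One way to guard against pushing the private file by accident is to ask git whether it is ignored. The sketch below uses git check-ignore; the function name is_ignored is hypothetical, and whether this repository's .gitignore already covers config/common/private.yaml is an assumption you should verify yourself.

```shell
# Hypothetical helper: succeed if git would ignore the given path.
# "git check-ignore -q" exits with 0 when the path matches an ignore rule
# and with 1 when it does not; the path does not need to exist.
is_ignored() {
    git check-ignore -q -- "$1" 2>/dev/null
}

# Example usage from the repository root:
#   is_ignored config/common/private.yaml || echo "private.yaml is NOT ignored" >&2
```

If the check fails, adding the path to .gitignore before the first commit keeps the file out of the history.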

Docker and Gitlab

This project is set up to use the CERN GitLab CI/CD to automatically build a Docker image based on docker/Dockerfile and requirements.txt. The pipeline also runs the pre-commit hooks. To change this behaviour, edit .gitlab-ci.yml.

Contributing

Contributions are welcome! Please submit a pull request or create an issue if you have any improvements or suggestions. Please run the provided pre-commit hooks before opening merge requests!

License

This project is licensed under the MIT License. See the LICENSE file for details.
