This project fine-tunes a Stable Diffusion model to generate images based on tabular data inputs, instead of traditional text prompts. It utilizes a custom neural network to process tabular data and condition the diffusion model's UNet.
- Tabular Data Input: Generates images from numerical or categorical data.
- Custom Conditioning: A custom network transforms tabular input into embeddings for the diffusion model (see the sketch after this list).
- Stable Diffusion v1.5: Built upon the robust and popular Stable Diffusion v1.5 model.
- Mixed Precision Training: Supports `float16` and `bfloat16` mixed precision training for efficient memory usage and faster training.
- Gradient Clipping: Implements gradient clipping to prevent gradient explosion during training.
- Fine-Tuning: Modifies the UNet part of the Stable Diffusion pipeline to understand the new conditioning.
- Easy Configuration: All experiment, model, and training parameters are managed through a single `default.yaml` file.
- Ready-to-use Scripts: Includes `train.py` for training and `inference.py` for generating images with a trained model.
- Reproducibility: Includes a `Dockerfile` for building a containerized environment.
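As a rough sketch of the custom conditioning idea (not the project's actual code; the class name, hidden width, and output shape are assumptions), such a network might map a tabular vector to the embedding sequence the UNet consumes in place of text-encoder output:

```python
import torch
import torch.nn as nn

class TabularConditioner(nn.Module):
    """Illustrative sketch: maps a tabular feature vector to a sequence of
    embeddings shaped like text-encoder output (batch, 77, 768 for SD v1.5)."""

    def __init__(self, num_input_layer: int, seq_len: int = 77, dim: int = 768):
        super().__init__()
        self.seq_len = seq_len
        self.dim = dim
        self.net = nn.Sequential(
            nn.Linear(num_input_layer, 512),
            nn.SiLU(),
            nn.Linear(512, seq_len * dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_input_layer) -> (batch, seq_len, dim)
        return self.net(x).view(-1, self.seq_len, self.dim)

# Usage: a batch of two 5-feature tabular rows -> UNet conditioning.
cond = TabularConditioner(num_input_layer=5)(torch.rand(2, 5))
print(cond.shape)  # torch.Size([2, 77, 768])
```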
- Python 3.8+
- PyTorch
- Git and Git LFS
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/your-repo-name.git
  cd your-repo-name
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Download Pre-trained Stable Diffusion v1.5: This project requires the original weights for Stable Diffusion v1.5. You need to download them and place them in a directory. The default configuration in `default.yaml` expects them at `../../../weight_original/stable-diffusion-v1-5`. You can download the weights from the Hugging Face Hub: stable-diffusion-v1-5.

  Make sure you have `git-lfs` installed to download the model weights correctly.
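As one option (an illustration, not part of this repo's scripts; the repo id is an assumption and may change on the Hub), the weights can also be fetched with the `huggingface_hub` Python package:

```python
from huggingface_hub import snapshot_download

# Download the SD v1.5 weights into the path default.yaml expects by default.
# repo_id is an assumption (the current community mirror on the Hub).
snapshot_download(
    repo_id="stable-diffusion-v1-5/stable-diffusion-v1-5",
    local_dir="../../../weight_original/stable-diffusion-v1-5",
)
```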
The dataloader expects a specific structure for the dataset, as shown in the `data_example/sushi` directory:

- `images/`: A folder containing all the image files.
- `metadata.jsonl`: A JSON Lines file where each line is a dictionary containing `file_name` and `custom_input`.

Example `metadata.jsonl` line:

```json
{"file_name": "sushi_0.jpg", "custom_input": [1, 0, 0, 0, 0]}
```

- `file_name`: The name of the image file in the `images` folder.
- `custom_input`: A list of numbers representing your tabular data. The length of this list must match `model_custom.num_input_layer` in your configuration file.

Update `data.path_dataset` in `default.yaml` to point to your dataset directory.
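As a minimal sketch (the record values here are illustrative), a `metadata.jsonl` in this layout can be generated from your own records with a few lines of Python:

```python
import json

# Illustrative records: one image file name plus its tabular features per row.
records = [
    {"file_name": "sushi_0.jpg", "custom_input": [1, 0, 0, 0, 0]},
    {"file_name": "sushi_1.jpg", "custom_input": [0, 1, 0, 0, 1]},
]

# Write one JSON object per line, the JSON Lines layout the dataloader expects.
with open("metadata.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```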
All settings are controlled via `default.yaml`. Key parameters to configure:

- `experiment`: Settings for experiment name, save directory, and seed.
- `model_custom`:
  - `num_input_layer`: The number of features in your tabular data input.
  - `path_sd`: Path to the pre-trained Stable Diffusion v1.5 weights.
- `train`: Training parameters like epochs, batch size, and learning rates.
  - `dtype`: Data type for training. Supports `"float32"`, `"float16"`, and `"bfloat16"`. Setting it to `"float16"` or `"bfloat16"` enables mixed precision training automatically if CUDA is available.
- `data`:
  - `path_dataset`: Path to your dataset directory.
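For orientation, a `default.yaml` assembled from the parameters above could look like the following sketch (key names such as `name`, `dir_save`, `seed`, `epochs`, `batch_size`, and `lr`, and all values, are illustrative assumptions; check the shipped file for the actual schema):

```yaml
experiment:
  name: exp                    # experiment name (illustrative key)
  dir_save: ../../../exp_results
  seed: 42

model_custom:
  num_input_layer: 5           # number of tabular features per sample
  path_sd: ../../../weight_original/stable-diffusion-v1-5

train:
  epochs: 100                  # illustrative values
  batch_size: 4
  lr: 1e-5
  dtype: "float16"             # "float32", "float16", or "bfloat16"

data:
  path_dataset: ./data_example/sushi
```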
To start training the model, run the `train.py` script. The script will:

- Load the configuration from `default.yaml`.
- Set up an experiment directory under `exp_results` with a timestamp.
- Save the configuration and code snapshots for reproducibility.
- Start the training process.

```bash
python train.py
```

Checkpoints and the best model will be saved in the experiment directory (e.g., `../../../exp_results/YYYYMMDD_HHMMSS_exp`).
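For reference, the mixed-precision and gradient-clipping pattern described in the features likely resembles this minimal sketch (the tiny linear model, optimizer, and loss are stand-ins for the project's UNet fine-tuning step, not its actual code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16  # corresponds to train.dtype in default.yaml

model = torch.nn.Linear(5, 5).to(device)          # stand-in for the UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda" and dtype == torch.float16))

x = torch.rand(4, 5, device=device)
with torch.autocast(device_type=device, dtype=dtype, enabled=(device == "cuda")):
    loss = model(x).pow(2).mean()                 # stand-in loss

scaler.scale(loss).backward()
scaler.unscale_(optimizer)  # unscale first so clipping sees the true gradient norms
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```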
After training, you can generate images using `inference.py`.
- Set the model path: Open `inference.py` and modify the `path_infer` variable to point to your experiment directory.

  ```python
  # in inference.py
  path_infer = "../../../exp_results/your_experiment_timestamp_exp"
  ```
- Provide input data: Modify the `list_values` variable with the tabular data you want to generate images from. Each input should be a string of space-separated numbers.

  ```python
  # in inference.py
  values_input_0 = "0 1 0 0 1"
  values_input_1 = "1 0 0 0 0"
  list_values = [values_input_0, values_input_1]
  ```
- Run the script:

  ```bash
  python inference.py
  ```
Generated images will be saved in the `img_gen` directory.
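For reference, a space-separated input string like the ones above presumably gets parsed into a float feature tensor along these lines (a sketch, not `inference.py`'s actual code):

```python
import torch

values_input_0 = "0 1 0 0 1"
# Split the space-separated string and build a (1, num_input_layer) tensor;
# the feature count must match model_custom.num_input_layer (5 here).
features = torch.tensor([[float(v) for v in values_input_0.split()]])
print(features.shape)  # torch.Size([1, 5])
```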