This repository contains an implementation of a diffusion model trained on pixel art images using a U-Net architecture. The model progressively learns to generate pixel art images from noise by reversing a diffusion process.
- Utilizes a U-Net architecture for denoising.
- Implements Denoising Diffusion Probabilistic Models (DDPM).
- Custom image loader for dataset handling.
- On-the-fly data augmentation (resizing, flipping, normalization).
- Visualization of diffusion steps.
- Training pipeline with loss tracking.
- Sampling pipeline to generate new images.
I used the dataset from this website, which consists of pixel art images stored in a directory. The images are loaded using a custom dataset class and transformed for training:
- Resize images to 32x32. The source images are already 32x32, but resizing guarantees a consistent input size.
- Apply random horizontal flipping with a probability of 50%.
- Normalize pixel values to the range [-1, 1].
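The three preprocessing steps above can be sketched as follows. The repository's own transform code is not shown here, so this is a minimal NumPy illustration rather than the actual loader: the function name `preprocess`, the nearest-neighbour resize, and the channel-first output layout are all assumptions made for the example.

```python
import numpy as np

def preprocess(image, size=32, flip_prob=0.5, rng=None):
    """Resize to size x size, randomly flip, and scale uint8 RGB to [-1, 1].

    `image` is an (H, W, 3) uint8 array. Nearest-neighbour resizing is an
    assumption chosen here because it keeps pixel-art edges sharp.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    # Nearest-neighbour resize: pick a source row/column for each output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    # Horizontal flip with probability flip_prob.
    if rng.random() < flip_prob:
        resized = resized[:, ::-1]
    # [0, 255] -> [0, 1] -> [-1, 1], then HWC -> CHW (the layout PyTorch expects).
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - 0.5) / 0.5
    return np.transpose(normalized, (2, 0, 1))

# Random stand-in image, not real dataset content.
demo = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
tensor_like = preprocess(demo, rng=np.random.default_rng(1))
print(tensor_like.shape, tensor_like.min(), tensor_like.max())
```

Subtracting 0.5 and dividing by 0.5 is the same arithmetic a per-channel mean/std normalization of 0.5 would apply, which is a common way to reach the [-1, 1] range.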
The diffusion model is implemented using U-Net with the following configurations:
- Input Size: 32x32
- Input/Output Channels: 3 (RGB images)
- Layers Per Block: 2
- Block Out Channels: (64, 128, 256)
- Down Blocks: ("DownBlock2D", "AttnDownBlock2D", "DownBlock2D")
- Up Blocks: ("UpBlock2D", "AttnUpBlock2D", "UpBlock2D")
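The block names listed above match those accepted by `UNet2DModel` in Hugging Face `diffusers`, so the configuration could plausibly be expressed as below. This is a sketch under that assumption: `diffusers` does not appear in this README's install command, and the helper name `build_unet` is hypothetical.

```python
# Configuration values copied from the list above.
unet_config = dict(
    sample_size=32,                      # input resolution
    in_channels=3,                       # RGB in
    out_channels=3,                      # RGB (predicted noise) out
    layers_per_block=2,
    block_out_channels=(64, 128, 256),
    down_block_types=("DownBlock2D", "AttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "AttnUpBlock2D", "UpBlock2D"),
)

def build_unet(config=unet_config):
    """Build the U-Net. Assumes the `diffusers` package is installed."""
    # Imported lazily so the config can be inspected without diffusers.
    from diffusers import UNet2DModel
    return UNet2DModel(**config)
```

Each entry in `block_out_channels` pairs with one down block and one up block, so the three tuples need to have the same length.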
- Optimizer: Adam (lr=1e-4)
- Loss Function: MSE Loss
- Epochs: 500
- Batch Size: 16
- Images are loaded and transformed.
- Random noise is added at different timesteps.
- The model predicts the noise added at a given timestep.
- The loss is computed between predicted and actual noise.
- The model parameters are updated through backpropagation.
- Training continues for 500 epochs.
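The noising and loss steps in the list above can be illustrated with plain NumPy. This sketch assumes 1000 timesteps and a linear beta schedule from 1e-4 to 0.02 (the defaults from the DDPM paper), since the README does not state which schedule the training script uses. The names `add_noise` and `mse_loss` are illustrative, and a zero array stands in for the model's prediction.

```python
import numpy as np

T = 1000                                   # number of diffusion timesteps (assumed)
betas = np.linspace(1e-4, 0.02, T)         # linear noise schedule (assumed)
alphas_cumprod = np.cumprod(1.0 - betas)   # cumulative product, often written abar_t

def add_noise(x0, noise, t):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    abar = alphas_cumprod[t].reshape(-1, 1, 1, 1)  # broadcast over C, H, W
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * noise

def mse_loss(predicted_noise, noise):
    """Mean squared error between predicted and actual noise."""
    return float(np.mean((predicted_noise - noise) ** 2))

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(16, 3, 32, 32))  # stand-in batch in [-1, 1]
noise = rng.standard_normal(x0.shape)
t = rng.integers(0, T, size=16)                # one random timestep per image
x_t = add_noise(x0, noise, t)

# A real model would predict the noise from (x_t, t); zeros stand in here.
predicted = np.zeros_like(noise)
loss = mse_loss(predicted, noise)
print(x_t.shape, loss)
```

In the actual training loop, this loss would be backpropagated through the U-Net and the Adam optimizer would update the parameters.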
Once trained, the model is used to generate pixel art images:
- A DDPM pipeline is created using the trained model.
- Random noise is used as input.
- The pipeline iteratively removes noise to generate an image.
- The generated images are saved as generated_images.png.
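Conceptually, the iterative denoising performed by a DDPM pipeline looks like the loop below. This is a NumPy sketch of ancestral DDPM sampling, not the repository's pipeline code: the linear beta schedule, the function name `ddpm_sample`, and the `dummy_predictor` that stands in for the trained U-Net are assumptions. Converting the result to an image grid and saving it is omitted.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, num_steps=1000, seed=0):
    """Ancestral DDPM sampling with a linear beta schedule (assumed)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)

    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t)
        # Mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alphas_cumprod[t]) * eps) / np.sqrt(alphas[t])
        # Add fresh noise (variance beta_t) at every step except the final one.
        z = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * z
    return np.clip(x, -1.0, 1.0)  # keep values in the training range

def dummy_predictor(x, t):
    """Placeholder for the trained model: always predicts zero noise."""
    return np.zeros_like(x)

samples = ddpm_sample(dummy_predictor, shape=(4, 3, 32, 32), num_steps=50)
print(samples.shape)
```

With a trained model in place of `dummy_predictor`, each step removes a little of the predicted noise, which is why sampling cost grows with the number of timesteps.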
The script includes a function to visualize the diffusion process:
- Clean Image
- Random Noise
- Noisy Image at a given timestep
- Predicted Noise
This helps in understanding how the model denoises images over time.
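A four-panel figure of this kind could be produced roughly as follows. The function names, the output path `diffusion_steps.png`, and the assumption that each input is a (C, H, W) array in [-1, 1] are illustrative only; the repository's own visualization function may differ.

```python
import numpy as np

def to_uint8_image(x):
    """Map a (C, H, W) array in [-1, 1] to an (H, W, C) uint8 image."""
    x = np.clip((x + 1.0) / 2.0, 0.0, 1.0)
    return (np.transpose(x, (1, 2, 0)) * 255).round().astype(np.uint8)

def plot_diffusion_panels(clean, noise, noisy, predicted_noise,
                          path="diffusion_steps.png"):
    """Save the four panels side by side. Assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    titles = ["Clean Image", "Random Noise", "Noisy Image", "Predicted Noise"]
    panels = (clean, noise, noisy, predicted_noise)
    fig, axes = plt.subplots(1, 4, figsize=(12, 3))
    for ax, img, title in zip(axes, panels, titles):
        ax.imshow(to_uint8_image(img))
        ax.set_title(title)
        ax.axis("off")
    fig.savefig(path)
    plt.close(fig)
```

Noise values fall outside [-1, 1], so the clipping in `to_uint8_image` saturates the noise panels; that is acceptable for a qualitative view but not for measuring anything.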
Run training and generation with:
- python train.py
- python generate.py

The project depends on:
- torch
- torchvision
- numpy
- matplotlib
- PIL (Pillow)
- tqdm
Install dependencies using:
pip install torch torchvision numpy matplotlib pillow tqdm

The trained model generates pixel art images that resemble the training dataset. The quality of generated images improves with more training epochs.
- Train on a larger dataset for better diversity.
- Experiment with different noise schedules.
- Optimize the model architecture for better performance.
- Implement other methods such as DDIM or latent diffusion.
- Try different architectures, such as transformer-based models.
- Add semantic conditioning. The model currently only reproduces what it has seen in the dataset; prompt-based generation is planned as future work.
This project is open-source and available under the MIT License.
Developed by z3lka.



