Pixel Art Diffusion Model

This repository contains an implementation of a diffusion model trained on pixel art images using a U-Net architecture. The model learns to generate pixel art from noise by reversing a diffusion process one denoising step at a time.

Example Outputs

Features

Utilizes a U-Net architecture for denoising.
Implements Denoising Diffusion Probabilistic Models (DDPM).
Custom image loader for dataset handling.
On-the-fly data augmentation (resizing, flipping, normalization).
Visualization of diffusion steps.
Training pipeline with loss tracking.
Sampling pipeline to generate new images.

Dataset

I used a pixel art dataset from this website. It consists of pixel art images stored in a single directory. The images are loaded with a custom dataset class and transformed for training.
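A minimal sketch of what such a custom dataset class could look like. The class name `PixelArtDataset`, the accepted file extensions, and the flat directory layout are assumptions for illustration, not necessarily what train.py does.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class PixelArtDataset(Dataset):
    """Loads every image file found directly inside `root` (hypothetical class)."""

    EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumption: common image formats

    def __init__(self, root, transform=None):
        # Sort the paths so the sample order is reproducible across runs.
        self.paths = sorted(
            p for p in Path(root).iterdir() if p.suffix.lower() in self.EXTENSIONS
        )
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Convert to RGB so every sample has exactly 3 channels.
        with Image.open(self.paths[index]) as img:
            image = img.convert("RGB")
        return self.transform(image) if self.transform else image
```

An instance can be wrapped in `torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)` to get shuffled batches of the size listed under Training Details, provided the transform returns tensors.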

Data Preprocessing

Resize images to 32x32. The source images are already 32x32, but resizing guarantees a consistent input size.
Apply random horizontal flipping with a probability of 50%.
Normalize pixel values to the range [-1, 1].

Model Architecture

The diffusion model is implemented using U-Net with the following configurations:

Input Size: 32x32
Input/Output Channels: 3 (RGB images)
Layers Per Block: 2
Block Out Channels: (64, 128, 256)
Down Blocks: ("DownBlock2D", "AttnDownBlock2D", "DownBlock2D")
Up Blocks: ("UpBlock2D", "AttnUpBlock2D", "UpBlock2D")
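The block names above match the block types of `UNet2DModel` in Hugging Face diffusers, so the configuration can be written as keyword arguments in that style. That the project uses diffusers is an assumption, as is the rule in `encoder_stages` (a hypothetical helper) that every down block except the last halves the resolution.

```python
# Configuration as listed above. The keyword names follow diffusers'
# UNet2DModel, which is an assumption about how the model is built.
unet_config = {
    "sample_size": 32,
    "in_channels": 3,
    "out_channels": 3,
    "layers_per_block": 2,
    "block_out_channels": (64, 128, 256),
    "down_block_types": ("DownBlock2D", "AttnDownBlock2D", "DownBlock2D"),
    "up_block_types": ("UpBlock2D", "AttnUpBlock2D", "UpBlock2D"),
}


def encoder_stages(config):
    """Return (block type, channels, resolution) for each encoder block.

    Assumes every down block except the last halves the resolution.
    """
    size = config["sample_size"]
    stages = []
    last = len(config["down_block_types"]) - 1
    for i, (block, channels) in enumerate(
        zip(config["down_block_types"], config["block_out_channels"])
    ):
        stages.append((block, channels, size))
        if i != last:
            size //= 2
    return stages


for block, channels, size in encoder_stages(unet_config):
    print(f"{block:<16} {channels:>4} channels at {size}x{size}")
```

Under these assumptions the attention block runs at 16x16 and the deepest block at 8x8. If diffusers is installed, `UNet2DModel(**unet_config)` should build the corresponding network.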

Training

Training Details

Optimizer: Adam (lr=1e-4)
Loss Function: MSE Loss
Epochs: 500
Batch Size: 16

Training Process

Images are loaded and transformed.
Random noise is added at different timesteps.
The model predicts the noise added at a given timestep.
The loss is computed between predicted and actual noise.
The model parameters are updated through backpropagation.
Training continues for 500 epochs.
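One iteration of the process above can be sketched as follows. Several parts are assumptions made to keep the example self-contained: the 1000 timesteps and the linear beta schedule are not stated in this README, and `TinyDenoiser` is a hypothetical stand-in for the U-Net that ignores the timestep.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TIMESTEPS = 1000  # assumption: not stated in this README

# Linear beta schedule (assumption) and the cumulative product of alphas.
betas = torch.linspace(1e-4, 0.02, NUM_TIMESTEPS)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


class TinyDenoiser(nn.Module):
    """Stand-in for the U-Net; it ignores the timestep to stay short."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x, t):
        return self.net(x)


def train_step(model, optimizer, clean_images):
    noise = torch.randn_like(clean_images)
    # One random timestep per image in the batch.
    t = torch.randint(0, NUM_TIMESTEPS, (clean_images.shape[0],))
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy_images = a_bar.sqrt() * clean_images + (1.0 - a_bar).sqrt() * noise

    # MSE between the predicted noise and the noise that was actually added.
    loss = F.mse_loss(model(noisy_images, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
batch = torch.rand(16, 3, 32, 32) * 2 - 1  # fake batch in [-1, 1]
print(f"loss: {train_step(model, optimizer, batch):.4f}")
```

A full run would call `train_step` for every batch of the data loader and repeat that for 500 epochs, recording the returned loss for tracking.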

Sampling and Image Generation

Once trained, the model is used to generate pixel art images:

A DDPM pipeline is created using the trained model.
Random noise is used as input.
The pipeline iteratively removes noise to generate an image.
The generated images are saved as generated_images.png.
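To show what the pipeline does on each iteration, here is a rough NumPy sketch of DDPM ancestral sampling. It is not the code in generate.py: the schedule values, the number of timesteps, and the function names are assumptions, and the zero-returning noise predictor is a placeholder for the trained U-Net.

```python
import numpy as np

NUM_TIMESTEPS = 1000  # assumption: not stated in this README
betas = np.linspace(1e-4, 0.02, NUM_TIMESTEPS)  # assumed linear schedule
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)


def reverse_step(x_t, t, predicted_noise, rng):
    """One DDPM ancestral sampling step from x_t to x_{t-1}."""
    coef = betas[t] / np.sqrt(1.0 - alphas_cumprod[t])
    mean = (x_t - coef * predicted_noise) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


def sample(predict_noise, shape=(3, 32, 32), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(NUM_TIMESTEPS)):
        x = reverse_step(x, t, predict_noise(x, t), rng)
    return x


def to_uint8(x):
    """Map [-1, 1] floats in CHW layout to an HWC uint8 image."""
    x = np.clip((x + 1.0) * 127.5, 0, 255).round().astype(np.uint8)
    return np.transpose(x, (1, 2, 0))


# Placeholder predictor that always returns zeros; a trained U-Net goes here.
image = to_uint8(sample(lambda x, t: np.zeros_like(x)))
print(image.shape, image.dtype)
```

The resulting array can be written to disk with `PIL.Image.fromarray(image).save("generated_images.png")`.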

Visualization

The script includes a function to visualize the diffusion process:

Clean Image
Random Noise
Noisy Image at a given timestep
Predicted Noise

This helps in understanding how the model denoises images over time.
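The first three panels can be reproduced with the closed-form forward process, x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise. The sketch below assumes a linear beta schedule with 1000 timesteps and uses a random array in place of a real image. The fourth panel needs the trained model and is left out.

```python
import numpy as np

NUM_TIMESTEPS = 1000  # assumption: not stated in this README
betas = np.linspace(1e-4, 0.02, NUM_TIMESTEPS)  # assumed linear schedule
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(clean, noise, t):
    """Closed-form forward process: sample x_t directly from x_0."""
    signal = np.sqrt(alphas_cumprod[t])
    sigma = np.sqrt(1.0 - alphas_cumprod[t])
    return signal * clean + sigma * noise


rng = np.random.default_rng(0)
clean = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # stand-in for a real image
noise = rng.standard_normal(clean.shape)

for t in (0, 250, 500, 999):
    noisy = add_noise(clean, noise, t)
    # Correlation with the clean image falls as t grows.
    corr = np.corrcoef(clean.ravel(), noisy.ravel())[0, 1]
    print(f"t={t:4d}  signal weight={np.sqrt(alphas_cumprod[t]):.3f}  corr={corr:.3f}")
```

To display a panel with matplotlib, map the array from [-1, 1] to [0, 1], move the channel axis last, and pass it to `plt.imshow`.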

Usage

Training the Model

python train.py

Generating Images

python generate.py

Dependencies

torch
torchvision
numpy
matplotlib
PIL (Pillow)
tqdm

Install dependencies using:

pip install torch torchvision numpy matplotlib pillow tqdm

Results

The trained model generates pixel art images that resemble the training dataset. The quality of generated images improves with more training epochs.

Future Improvements

Train on a larger dataset for better diversity.
Experiment with different noise schedules.
Optimize the model architecture for better performance.
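As a starting point for the noise-schedule experiment, the sketch below compares a linear schedule with the cosine schedule proposed by Nichol and Dhariwal (2021). Neither is confirmed as the schedule used in this repository, and the 1000 timesteps and function names are assumptions.

```python
import numpy as np


def linear_alphas_cumprod(num_steps, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def cosine_alphas_cumprod(num_steps, s=0.008):
    """Cosine schedule with betas clipped at 0.999."""
    steps = np.arange(num_steps + 1) / num_steps
    f = np.cos((steps + s) / (1.0 + s) * np.pi / 2.0) ** 2
    betas = np.clip(1.0 - f[1:] / f[:-1], 0.0, 0.999)
    return np.cumprod(1.0 - betas)


T = 1000  # assumption: not stated in this README
linear, cosine = linear_alphas_cumprod(T), cosine_alphas_cumprod(T)
for t in (100, 500, 900):
    print(f"t={t}: linear={linear[t]:.4f}  cosine={cosine[t]:.4f}")
```

The printed values are the fraction of signal variance remaining at each timestep. The cosine schedule keeps more signal at intermediate timesteps, which its authors report helps at low resolutions.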

ToDos

Implement other sampling and modeling methods, such as DDIM and latent diffusion.
Try different architectures, such as transformer-based models.
Add semantic conditioning. The model currently only reproduces what it has seen in the dataset; prompt-based generation is planned as future work.

License

This project is open-source and available under the MIT License.

Author

Developed by z3lka.
