A PyTorch-based reproduction of the LayoutDiffusion model proposed in the paper "LayoutDiffusion: Content-Aware Layout Generation with Denoising Diffusion Models" (arXiv:2305.18252).
This is a work-in-progress implementation that aims to reproduce the key components and results of the LayoutDiffusion paper. Roadmap:
- Paper reading and model structure analysis
- Dataset preparation (PubLayNet / RICO or custom layouts)
- Layout encoder & DDPM forward process (see the forward-process sketch after this list)
- Cross-modal conditioning (category, image, text embedding)
- Loss functions and training pipeline
- Sampling & layout rendering
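The DDPM forward process named above adds Gaussian noise to continuous layout coordinates via the usual closed-form q(x_t | x_0). Below is a minimal sketch of that step, assuming layouts are batches of normalized `[x, y, w, h]` boxes; the linear schedule and the `q_sample` helper are illustrative assumptions, not this repo's actual API.

```python
import torch

# Minimal DDPM forward (noising) process over continuous layout coordinates.
# Illustrative sketch: the schedule hyperparameters are assumptions.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha_bar_t

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)        # broadcast over elements
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Example: 8 layouts, each with 10 elements as [x, y, w, h] in [0, 1].
x0 = torch.rand(8, 10, 4)
t = torch.randint(0, T, (8,))
x_t = q_sample(x0, t)                               # noisy layouts at step t
```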
LayoutDiffusion is a content-aware layout generation method that formulates layout synthesis as a denoising diffusion process. Key contributions include:
- Category-conditional layout generation
- Cross-modal embeddings for alignment
- Transformer-based architecture with DDPM backbone
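To make the last point concrete, a category-conditioned transformer denoiser could look like the sketch below. Everything here (layer sizes, additive category/timestep conditioning, the `LayoutDenoiser` name) is a hypothetical illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LayoutDenoiser(nn.Module):
    """Toy transformer denoiser over layout tokens with category conditioning."""

    def __init__(self, n_categories=25, d_model=256, n_heads=8, n_layers=4, T=1000):
        super().__init__()
        self.box_proj = nn.Linear(4, d_model)               # [x, y, w, h] -> token
        self.cat_emb = nn.Embedding(n_categories, d_model)  # per-element category
        self.time_emb = nn.Embedding(T, d_model)            # diffusion timestep
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out = nn.Linear(d_model, 4)                    # predict noise on boxes

    def forward(self, noisy_boxes, categories, t):
        h = self.box_proj(noisy_boxes) + self.cat_emb(categories)
        h = h + self.time_emb(t).unsqueeze(1)               # broadcast over elements
        return self.out(self.encoder(h))

# Smoke test: 2 layouts x 10 elements each.
model = LayoutDenoiser()
eps_hat = model(torch.randn(2, 10, 4),
                torch.randint(0, 25, (2, 10)),
                torch.randint(0, 1000, (2,)))
```

Summing category and timestep embeddings into the token stream keeps the sketch short; the paper's cross-modal conditioning is richer than this.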
```
.
├── configs/     # YAML or JSON training configs
├── datasets/    # Processed layout datasets
├── models/      # Diffusion model, transformer blocks, encoders
├── scripts/     # Training and evaluation scripts
├── utils/       # Data loaders, metrics, visualizations
└── main.py      # Entry point
```
```bash
pip install torch torchvision einops matplotlib numpy
pip install diffusers transformers  # optional, for Hugging Face tooling
```
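After installing, a quick sanity check that the core dependencies import (standard PyTorch/torchvision calls only):

```python
import torch, torchvision, einops
print(torch.__version__, torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```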