
lukakeso/spatio_temporal_feature_injection


Spatio Temporal Feature Injection

Image Editing with Diffusion Models in the domain of fashion
Abstract:
Image editing, specifically appearance or style transfer, is an area of computer vision that has grown significantly thanks to recent advances in image diffusion models. Approaches to appearance transfer include fine-tuning, external network adapters, and tuning-free methods. In this work, we focus on the latter: we leverage the existing Stable Diffusion network and modify only the internal processing of its attention maps and latent vectors, without changing the model weights. We propose an appearance transfer method based on partial masking and timestep-controlled appearance modulation, which together control the area over which appearance is transferred and the amount transferred. Our results on the benchmarks outperform current baseline models, which suggests the method has potential for further improvement.

[Figure: architecture]
[Figure: examples of generated garments]
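To make the two controls named in the abstract concrete, here is a toy sketch in plain Python. It is not the repository's implementation: the function names, the linear blend, and the window-shaped schedule (start_frac, end_frac, strength) are assumptions for illustration only. Features are represented as one vector per spatial position; a soft mask in [0, 1] decides where appearance features replace structure features, and a timestep-dependent weight decides how much, and at which denoising steps, they are injected.

```python
def appearance_weight(t, num_steps, start_frac=0.0, end_frac=0.5, strength=1.0):
    """Hypothetical timestep schedule: return `strength` inside an injection
    window and 0.0 outside it. `t` counts down from num_steps - 1 (noisiest
    step) to 0 (final step)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    progress = 1.0 - t / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return strength if start_frac <= progress <= end_frac else 0.0


def inject(struct_feats, app_feats, mask, t, num_steps, **schedule):
    """Blend per-position feature vectors (hypothetical helper).

    struct_feats, app_feats: lists of feature vectors, one per spatial position.
    mask: one value in [0, 1] per position (1 = transfer appearance here).
    The mask controls where appearance is injected; the schedule controls
    how much, and at which denoising steps.
    """
    w = appearance_weight(t, num_steps, **schedule)
    return [
        [(1.0 - w * m) * s + w * m * a for s, a in zip(s_vec, a_vec)]
        for s_vec, a_vec, m in zip(struct_feats, app_feats, mask)
    ]
```

For example, with a mask of [1.0, 0.0] and 50 steps, inject(...) copies the appearance vector into the first position at the first step (t=49) and leaves the structure features untouched at the last step (t=0); passing strength=0.5 yields a half-and-half blend inside the window. Injecting only during early, noisy steps is one common choice in tuning-free editing; the schedule and the layers actually modified by this repository may differ.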

Environment

Our code builds on the diffusers library. To set up the environment, run:

git clone https://github.com/lukakeso/spatio_temporal_feature_injection.git
cd spatio_temporal_feature_injection
conda env create -f environment/environment.yaml
conda activate STFI

(Optional) You may also want to install SAM-HQ to extract the instance masks:

pip install git+https://github.com/SysCV/sam-hq.git

Then download the ViT-L HQ-SAM model checkpoint from the SAM-HQ repository.

Appearance Transfer

python run.py \
  --app_image_path example/0.jpg \
  --struct_image_path example/1.jpg \
  --prompt "high-quality, detailed, realistic photo of clothes" \
  --output_path results \
  --scenario 1

Arguments:
  --app_image_path      appearance image
  --struct_image_path   structure image
  --prompt              text prompt (the value above is the default prompt)
  --output_path         output folder
  --scenario            scenario number
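To process several appearance/structure pairs, the command above can be generated from Python. This is a minimal sketch, assuming it is run from the repository root; the helper names build_command and run_all and the example pair list are hypothetical, and only the flags documented above are used. Valid scenario numbers are defined by run.py and are not listed here.

```python
import shlex
import subprocess

DEFAULT_PROMPT = "high-quality, detailed, realistic photo of clothes"


def build_command(app_image, struct_image, output_path="results",
                  scenario=1, prompt=DEFAULT_PROMPT):
    """Build the argv list for one run.py call (flags as documented above)."""
    return [
        "python", "run.py",
        "--app_image_path", app_image,
        "--struct_image_path", struct_image,
        "--prompt", prompt,
        "--output_path", output_path,
        "--scenario", str(scenario),
    ]


def run_all(pairs, dry_run=True, **kwargs):
    """Run one command per (appearance, structure) pair.

    With dry_run=True (the default) the commands are only printed.
    """
    commands = [build_command(app, struct, **kwargs) for app, struct in pairs]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # shell-quoted preview of the command
        else:
            subprocess.run(cmd, check=True)  # raises if run.py exits non-zero
    return commands
```

Passing an argument list to subprocess.run, rather than one shell string, avoids quoting problems with the comma-separated prompt; call run_all(pairs, dry_run=False) once the printed commands look right.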

Acknowledgement

Our code is largely based on the following open-source projects: DIFT, Cross-image-attention, Eye-for-an-eye.

About

Main repository for my thesis: Image Editing with Diffusion Models in the domain of Fashion
