Releases: huggingface/diffusers
Instruct-Pix2Pix, DiT, LoRA
🪄 Instruct-Pix2Pix
Instruct-Pix2Pix is a Stable Diffusion model fine-tuned for editing images from human instructions. Given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image.
The model was released with the paper InstructPix2Pix: Learning to Follow Image Editing Instructions. More information about the model can be found in the paper.
pip install diffusers transformers safetensors accelerate
import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"

def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image = download_image(url)

prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edit.save("snowy_mountains.png")
- Add InstructPix2Pix pipeline by @patil-suraj #2040
🤖 DiT
Diffusion Transformers (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model is trained on the ImageNet-1K dataset and is able to generate class-conditional images at 256x256 or 512x512 resolution.
The model was released with the paper Scalable Diffusion Models with Transformers.
import torch
from diffusers import DiTPipeline
model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)
output = pipe(class_labels=class_ids)
image = output.images[0] # label 'white shark'
⚡ LoRA
LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
LoRA was proposed in LoRA: Low-Rank Adaptation of Large Language Models. In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. cloneofsimo was the first to try out LoRA training for Stable Diffusion in the popular lora GitHub repository.
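To make the idea concrete, here is a minimal, purely illustrative sketch in plain PyTorch of a low-rank update wrapped around a frozen linear layer (this is not the diffusers implementation, just the concept):
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Illustrative only: computes base(x) + scale * B(A(x)) with a rank-r update.
    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)  # A
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)   # B
        nn.init.zeros_(self.lora_up.weight)  # start as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))

# Only the update matrices A and B are trained, which is why LoRA checkpoints are tiny.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(1, 768))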
Diffusers now supports LoRA! This means you can now fine-tune a model like Stable Diffusion using consumer GPUs like the Tesla T4 or RTX 2080 Ti. LoRA support was added to UNet2DConditionModel and the DreamBooth training script by @patrickvonplaten in #1884.
By using LoRA, the fine-tuned checkpoints will be just 3 MBs in size. After fine-tuning, you can use the LoRA checkpoints like so:
from diffusers import StableDiffusionPipeline
import torch
model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")
prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
You can follow these resources to learn more about how to use LoRA in diffusers:
- text2image fine-tuning script (by @sayakpaul in #2031).
- Official documentation discussing how LoRA is supported (by @sayakpaul in #2086).
📐 Customizable Cross Attention
LoRA leverages a new method to customize the cross-attention layers deep inside the UNet. This can be useful for other creative approaches such as Prompt-to-Prompt, and it makes it easier to apply optimizations such as xFormers memory-efficient attention. This new "attention processor" abstraction was created by @patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
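Concretely, every attention layer in the UNet now delegates its computation to a small, swappable "processor" object. The snippet below is only a rough sketch of how this surfaces on the pipeline side; the exact attribute and method names have changed across diffusers versions, so treat them as assumptions and check the current documentation:
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Inspect which processor classes are attached to the UNet's attention layers
# (attribute name assumed from recent diffusers versions).
print({type(p).__name__ for p in pipe.unet.attn_processors.values()})

# Swapping implementations is a single call, e.g. restoring the default processors
# after enabling xFormers or loading LoRA attention processors.
pipe.unet.set_default_attn_processor()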
🌿 Flax => PyTorch
A long-requested feature: prolific community member @camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights to PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks @camenduru!
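A minimal sketch of the conversion path, assuming a repository that only ships Flax weights (the paths below are placeholders):
from diffusers import DiffusionPipeline

# `from_flax=True` loads the Flax weights and converts them to PyTorch on the fly.
pipeline = DiffusionPipeline.from_pretrained("path/to/your-flax-checkpoint", from_flax=True)

# Save the converted weights so PyTorch users can load them directly.
pipeline.save_pretrained("path/to/pytorch-checkpoint")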
🌀 Flax Img2Img
Another community member, @dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!
🎲 DEIS Scheduler
DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps.
The scheduler was introduced in the paper Fast Sampling of Diffusion Models with Exponential Integrator. More information about the scheduler can be found in the paper.
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
Reproducibility
One can now pass CPU generators to all pipelines even if the pipeline is on GPU. This ensures much better reproducibility across GPU hardware:
import torch
from diffusers import DDIMPipeline
import numpy as np
model_id = "google/ddpm-cifar10-32"
# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")
# create a generator for reproducibility
generator = torch.manual_seed(0)
# run pipeline for just two steps and return numpy tensor
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
Important New Guides
- Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
- Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
- LoRA: https://huggingface.co/docs/diffusers/training/lora
Important Bug Fixes
- Don't download safetensors if library is not installed: #2057
- Make sure that save_pretrained(...) doesn't accidentally delete files: #2038
- Fix CPU offload docs for maximum memory gain: #1968
- Fix conversion for exotically sorted weight names: #1959
- Fix intermediate checkpointing for textual inversion, thanks @lstein #2072
All commits
- update composable diffusion for an updated diffuser library by @nanlliu in #1697
- [Tests] Fix UnCLIP cpu offload tests by @anton-l in #1769
- Bump to 0.12.0.dev0 by @anton-l in #1771
- [Dreambooth] flax fixes by @pcuenca in #1765
- update train_unconditional_ort.py by @prathikr in #1775
- Only test for xformers when enabling them #1773 by @kig in #1776
- expose polynomial:power and cosine_with_restarts:num_cycles params by @zetyquickly in #1737
- [Flax] Stateless schedulers, fixes and refactors by @skirsten in #1661
- Correct hf hub download by @patrickvonplaten in #1767
- Dreambooth docs: minor fixes by @pcuenca in #1758
- Fix num images per prompt unclip by @patil-suraj in #1787
- Add Flax stable diffusion img2img pipeline by @dhruvrnaik in #1355
- Refactor cross attention and allow mechanism to tweak cross attention function by @patrickvonplaten in #1639
- Fix OOM when using PyTorch with JAX installed. by @pcuenca in #1795
- reorder model wrap + bug fix by @prathikr in #1799
- Remove hardcoded names from PT scripts by @patrickvonplaten in #1778
- [textual_inversion] unwrap_model text encoder before accessing weights by @patil-suraj in #1816
- fix small mistake in annotation: 32 -> 64 by @Line290 in #1780
- Make safety_checker optional in more pipelines by @pcuenca in #1796
- Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by @camenduru in #1844
- Avoid duplicating PyTorch + safetensors downloads. by @pcuenca in #1836
- Width was typod as weight by @Helw150 in #1800
- fix: resize transform now preserves aspect ratio by @parlance-zz in #1804
- Make xformers optional even if it is available by @kn in #1753
- Allow selecting precision to make Dreambooth class images by @kabachuha in #1832
- unCLIP image variation by @williamberman in #1781
- [Community Pipeline] MagicMix ...
v0.11.1: Patch release
This patch release fixes a bug with num_images_per_prompt in the UnCLIPPipeline.
- Fix num images per prompt unclip by @patil-suraj in #1787
v0.11.0: Karlo UnCLIP, safetensors, pipeline versions
🪄 Karlo UnCLIP by Kakao Brain
Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture, with an improved super-resolution stage that upscales from 64px to 256px while recovering high-frequency details in only a small number of denoising steps.
This alpha version of Karlo is trained on 115M image-text pairs, including COYO-100M high-quality subset, CC3M, and CC12M.
For more information about the architecture, see the Karlo repository: https://github.com/kakaobrain/karlo
pip install diffusers transformers safetensors accelerate
import torch
from diffusers import UnCLIPPipeline
pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a high-resolution photograph of a big red frog on a green leaf."
image = pipe(prompt).images[0]
Community pipeline versioning
The community pipelines hosted in diffusers/examples/community will now follow the installed version of the library.

E.g. if you have diffusers==0.9.0 installed, the pipelines from the v0.9.0 branch will be used: https://github.com/huggingface/diffusers/tree/v0.9.0/examples/community

If you've installed diffusers from source, e.g. with pip install git+https://github.com/huggingface/diffusers, then the latest versions of the pipelines will be fetched from the main branch.

To change the custom pipeline version, set the custom_revision argument like so:
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="one_step_unet", custom_revision="0.10.2"
)
🦺 safetensors
Many of the most important checkpoints now have safetensors weights available (https://github.com/huggingface/safetensors). Upon installing safetensors with:
pip install safetensors
you will see a nice speed-up when loading your model 🚀

Some of the most important checkpoints have safetensors weights added now:
- https://huggingface.co/stabilityai/stable-diffusion-2
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/stabilityai/stable-diffusion-2-depth
- https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
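With safetensors installed, nothing else changes on the loading side; a regular from_pretrained call (sketched below) picks up the safetensors weights automatically when the repository provides them:
import torch
from diffusers import StableDiffusionPipeline

# Uses the safetensors weights automatically when the `safetensors` package is installed
# and the checkpoint provides them; otherwise it falls back to the regular PyTorch weights.
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)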
Batched generation bug fixes 🐛
- Make sure all pipelines can run with batched input by @patrickvonplaten in #1669
We fixed a lot of bugs for batched generation. All pipelines should now correctly process batches of prompts and images 🤗
We also made it much easier to tweak images with reproducible seeds:
https://huggingface.co/docs/diffusers/using-diffusers/reusing_seeds
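For example, you can pass one generator per image in a batch, so each image gets its own reproducible seed and can be tweaked individually. A minimal sketch, assuming a Stable Diffusion pipeline already loaded as pipe on CUDA:
import torch

# One generator (and hence one seed) per image in the batch.
generators = [torch.Generator(device="cuda").manual_seed(seed) for seed in [0, 1, 2, 3]]

prompt = "Labrador in the style of Vermeer"
images = pipe(prompt, generator=generators, num_images_per_prompt=4).images

# Re-using the same seed for one of the slots reproduces that image exactly,
# so you can refine the prompt while keeping the composition you liked.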
📝 Changelog
- Remove spurious arg in training scripts by @pcuenca in #1644
- dreambooth: fix #1566: maintain fp32 wrapper when saving a checkpoint to avoid crash when running fp16 by @timh in #1618
- Allow k pipeline to generate > 1 images by @pcuenca in #1645
- Remove unnecessary offset in img2img by @patrickvonplaten in #1653
- Remove unnecessary kwargs in depth2img by @maruel in #1648
- Add text encoder conversion by @lawfordp2017 in #1559
- VersatileDiffusion: fix input processing by @LukasStruppek in #1568
- tensor format ort bug fix by @prathikr in #1557
- Deprecate init image correctly by @patrickvonplaten in #1649
- fix bug if we don't do_classifier_free_guidance by @MKFMIKU in #1601
- Handle missing global_step key in scripts/convert_original_stable_diffusion_to_diffusers.py by @Cyberes in #1612
- [SD] Make sure scheduler is correct when converting by @patrickvonplaten in #1667
- [Textual Inversion] Do not update other embeddings by @patrickvonplaten in #1665
- Added Community pipeline for comparing Stable Diffusion v1.1-4 checkpoints by @suvadityamuk in #1584
- Fix wrong type checking in convert_diffusers_to_original_stable_diffusion.py by @apolinario in #1681
- [Version] Bump to 0.11.0.dev0 by @patrickvonplaten in #1682
- Dreambooth: save / restore training state by @pcuenca in #1668
- Disable telemetry when DISABLE_TELEMETRY is set by @w4ffl35 in #1686
- Change one-step dummy pipeline for testing by @patrickvonplaten in #1690
- [Community pipeline] Add github mechanism by @patrickvonplaten in #1680
- Dreambooth: use warnings instead of logger in parse_args() by @pcuenca in #1688
- manually update train_unconditional_ort by @prathikr in #1694
- Remove all local telemetry by @anton-l in #1702
- Update main docs by @patrickvonplaten in #1706
- [Readme] Clarify package owners by @anton-l in #1707
- Fix the bug that torch version less than 1.12 throws TypeError by @chinoll in #1671
- RePaint fast tests and API conforming by @anton-l in #1701
- Add state checkpointing to other training scripts by @pcuenca in #1687
- Improve pipeline_stable_diffusion_inpaint_legacy.py by @cyber-meow in #1585
- apply amp bf16 on textual inversion by @jiqing-feng in #1465
- Add examples with Intel optimizations by @hshen14 in #1579
- Added a README page for docs and a "schedulers" page by @yiyixuxu in #1710
- Accept latents as optional input in Latent Diffusion pipeline by @daspartho in #1723
- Fix ONNX img2img preprocessing and add fast tests coverage by @anton-l in #1727
- Fix ldm tests on master by not running the CPU tests on GPU by @patrickvonplaten in #1729
- Docs: recommend xformers by @pcuenca in #1724
- Nightly integration tests by @anton-l in #1664
- [Batched Generators] This PR adds generators that are useful to make batched generation fully reproducible by @patrickvonplaten in #1718
- Fix ONNX img2img preprocessing by @peterto in #1736
- Fix MPS fast test warnings by @anton-l in #1744
- Fix/update the LDM pipeline and tests by @anton-l in #1743
- kakaobrain unCLIP by @williamberman in #1428
- [fix] pipeline_unclip generator by @williamberman in #1751
- unCLIP docs by @williamberman in #1754
- Correct help text for scheduler_type flag in scripts. by @msiedlarek in #1749
- Add resnet_time_scale_shift to VD layers by @anton-l in #1757
- Add attention mask to uclip by @patrickvonplaten in #1756
- Support attn2==None for xformers by @anton-l in #1759
- [UnCLIPPipeline] fix num_images_per_prompt by @patil-suraj in #1762
- Add CPU offloading to UnCLIP by @anton-l in #1761
- [Versatile] fix attention mask by @patrickvonplaten in #1763
- [Revision] Don't recommend using revision by @patrickvonplaten in #1764
- [Examples] Update train_unconditional.py to include logging argument for Wandb by @ash0ts in #1719
- Transformers version req for UnCLIP by @anton-l in #1766
v0.10.2: Patch release
This patch removes the hard requirement for transformers>=4.25.1
in case external libraries were downgrading the library upon startup in a non-controllable way.
- do not automatically enable xformers by @patrickvonplaten in #1640
- Adapt to forced transformers version in some dependent libraries by @anton-l in #1638
- Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627
🚨🚨🚨 Note that xformers is not automatically enabled anymore 🚨🚨🚨
The reasons for this are given here: #1640 (comment):
We should not automatically enable xformers for three reasons:
1. It's not a PyTorch-like API. PyTorch doesn't enable all the fastest options available by default.
2. We allocate GPU memory before the user even does .to("cuda").
3. This behavior is not consistent with cases where xformers is not installed.
=> This means: if you were used to having xformers automatically enabled, please make sure to add the following now:
import logging

from diffusers.utils.import_utils import is_xformers_available

logger = logging.getLogger(__name__)

unet = ...  # load unet

if is_xformers_available():
    try:
        unet.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
for the UNet (e.g. in dreambooth) or for the pipeline:
from diffusers.utils.import_utils import is_xformers_available

pipe = ...  # load pipeline

if is_xformers_available():
    try:
        pipe.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
v0.10.1: Patch release
This patch returns enable_xformers_memory_efficient_attention() to UNet2DCondition to restore backward compatibility.
- Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627
v0.10.0: Depth Guidance and Safer Checkpoints
🐳 Depth-Guided Stable Diffusion and 2.1 checkpoints
The new depth-guided stable diffusion model is fully supported in this release. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.
Installing the transformers library from source is required for the MiDaS model:
pip install --upgrade git+https://github.com/huggingface/transformers/
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
The updated Stable Diffusion 2.1 checkpoints are also released and fully supported:
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/stabilityai/stable-diffusion-2-1-base
🦺 Safe Tensors
We now support SafeTensors: a new simple format for storing tensors safely (as opposed to pickle) that is still fast (zero-copy).
- [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
- [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 |
|---|---|---|---|---|---|---|---|
| pickle (PyTorch) | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ |
| H5 (TensorFlow) | ✓ | ✗ | ✓ | ✓ | ~ | ~ | ✗ |
| SavedModel (TensorFlow) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
| MsgPack (Flax) | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
| SafeTensors | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |

More details about the comparison here: https://github.com/huggingface/safetensors#yet-another-format-
pip install safetensors
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.save_pretrained("./safe-stable-diffusion-2-1", safe_serialization=True)
# you can also push this checkpoint to the HF Hub and load from there
safe_pipe = StableDiffusionPipeline.from_pretrained("./safe-stable-diffusion-2-1")
New Pipelines
🖌️ Paint-by-example
An implementation of Paint by Example: Exemplar-based Image Editing with Diffusion Models by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen
- Add paint by example by @patrickvonplaten in #1533
import PIL.Image
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline
def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example_1.png"
mask_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example_1.png"
example_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example_1.jpg"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
example_image = download_image(example_url).resize((512, 512))
pipe = DiffusionPipeline.from_pretrained("Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]
Audio Diffusion and Latent Audio Diffusion
Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to and from mel spectrogram images.
from IPython.display import Audio, display
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to("cuda")
output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
[Experimental] K-Diffusion pipeline for Stable Diffusion
This pipeline is added to support the latest schedulers from @crowsonkb's k-diffusion.
The purpose of this pipeline is to compare scheduler implementations and updates, so new features from other pipelines are unlikely to be supported!
- [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
pip install k-diffusion
from diffusers import StableDiffusionKDiffusionPipeline
import torch
pipe = StableDiffusionKDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda")
pipe.set_scheduler("sample_heun")
image = pipe("astronaut riding horse", num_inference_steps=25).images[0]
New Schedulers
Heun scheduler inspired by Karras et al.
Algorithm 1 of Karras et al. Scheduler ported from @crowsonkb's k-diffusion.
- Add 2nd order heun scheduler by @patrickvonplaten in #1336
from diffusers import StableDiffusionPipeline, HeunDiscreteScheduler
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = HeunDiscreteScheduler.from_config(pipe.scheduler.config)
Single step DPM-Solver
Original paper can be found here and the improved version. The original implementation can be found here.
- Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
from diffusers import StableDiffusionPipeline, DPMSolverSinglestepScheduler
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)
📝 Changelog
- [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
- Hotfix for AttributeErrors in OnnxStableDiffusionInpaintPipelineLegacy by @anton-l in #1448
- Speed up test and remove kwargs from call by @patrickvonplaten in #1446
- v-prediction training support by @patil-suraj in #1455
- Fix Flax from_pt by @pcuenca in #1436
- Ensure Flax pipeline always returns numpy array by @pcuenca in #1435
- Add 2nd order heun scheduler by @patrickvonplaten in #1336
- fix slow tests by @patrickvonplaten in #1467
- Flax support for Stable Diffusion 2 by @pcuenca in #1423
- Updates Image to Image Inpainting community pipeline README by @vvvm23 in #1370
- StableDiffusion: Decode latents separately to run larger batches by @kig in #1150
- Fix bug in half precision for DPMSolverMultistepScheduler by @rtaori in #1349
- [Train unconditional] Unwrap model before EMA by @anton-l in #1469
- Add ort_nightly_directml to the onnxruntime candidates by @anton-l in #1458
- Allow saving trained betas by @patrickvonplaten in #1468
- Fix dtype model loading by @patrickvonplaten in #1449
- [Dreambooth] Make compatible with alt diffusion by @patrickvonplaten in #1470
- Add better docs xformers by @patrickvonplaten in #1487
- Remove reminder comment by @pcuenca in #1489
- Bump to 0.10.0.dev0 + deprecations by @anton-l in #1490
- Add doc for Stable Diffusion on Habana Gaudi by @regisss in #1496
- Replace deprecated hub utils in train_unconditional_ort by @anton-l in #1504
- [Deprecate] Correct stacklevel by @patrickvonplaten in #1483
- simplyfy AttentionBlock by @patil-suraj in #1492
- Standardize on using image argument in all pipelines by @fboulnois in #1361
- support v prediction in other schedulers by @patil-suraj in #1505
- Fix Flax flip_sin_to_cos by @akashgokul in #1369
- Add an explicit --image_size to the conversion script by @anton-l in #1509
- fix heun scheduler by @patil-suraj in #1512
- [docs] [dreambooth training] accelerate.utils.write_basic_config by @williamberman in #1513
- [docs] [dreambooth training] num_class_images clarification by @williamberman in #1508
- [From pretrained] Allow returning local path by @patrickvonplaten in #1450
- Update conversion script to correctly handle SD 2 by @patrickvonplaten in #1511
- [refactor] Making the xformers mem-efficient attention activation recursive by @blefaudeux in #1493
- Do not use torch.long in mps by @pcuenca in #1488
- Fix Imagic example by @dhruvrnaik in #1520
- Fix training docs to install datasets by @pedrogengo in #1476
- Finalize 2nd order schedulers by @patrickvonplaten in #1503
- Fixed mask+masked_image in sd inpaint pipeline by @antoche in #1516
- Create train_dreambooth_inpaint.py by @thedarkzeno in #1091
- Update FlaxLMSDiscreteScheduler by @dzlab in #1474
- [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
- Add xformers attention to VAE by @kig in #1507
- [CI] Add slow MPS tests by @anton-l in #1104
- [Stable Diffusion Inpaint] Allow tensor as input image & mask by @patrickvonplaten in #1527
- Compute embedding distances with torch.cdist by @blefaudeux in #1459
- [Upscaling] Fix batch size by @patrickvonplaten in...
v0.9.0: Stable Diffusion 2
🎨 Stable Diffusion 2 is here!
Installation
pip install diffusers[torch]==0.9 transformers
Stable Diffusion 2.0 is available in several flavors:
Stable Diffusion 2.0-V at 768x768
New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
repo_id = "stabilityai/stable-diffusion-2"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0]
image.save("astronaut.png")
Stable Diffusion 2.0-base at 512x512
The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")
Stable Diffusion 2.0 for Inpainting
This model for text-guided inpainting is finetuned from SD 2.0-base. It follows the mask-generation strategy presented in LAMA, which, in combination with the latent VAE representations of the masked image, is used as additional conditioning.
import PIL.Image
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
image.save("yellow_cat.png")
Stable Diffusion X4 Upscaler
The model was trained on crops of size 512x512 and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a noise_level as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))
prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
Saving & Loading is fixed for Versatile Diffusion
Previously there was a 🐛 when saving & loading versatile diffusion - this is fixed now so that memory efficient saving & loading works as expected.
- [Versatile Diffusion] Fix remaining tests by @patrickvonplaten in #1418
📝 Changelog
- add v prediction by @patil-suraj in #1386
- Adapt UNet2D for supre-resolution by @patil-suraj in #1385
- Version 0.9.0.dev0 by @anton-l in #1394
- Make height and width optional by @patrickvonplaten in #1401
- [Config] Add optional arguments by @patrickvonplaten in #1395
- Upscaling fixed by @patrickvonplaten in #1402
- Add the new SD2 attention params to the VD text unet by @anton-l in #1400
- Deprecate sample size by @patrickvonplaten in #1406
- Support SD2 attention slicing by @anton-l in #1397
- Add SD2 inpainting integration tests by @anton-l in #1412
- Fix sample size conversion script by @patrickvonplaten in #1408
- fix clip guided by @patrickvonplaten in #1414
- Fix all stable diffusion by @patrickvonplaten in #1415
- [MPS] call contiguous after permute by @kashif in #1411
- Deprecate predict_epsilon by @pcuenca in #1393
- Fix ONNX conversion and inference by @anton-l in #1416
- Allow to set config params directly in init by @patrickvonplaten in #1419
- Add tests for Stable Diffusion 2 V-prediction 768x768 by @anton-l in #1420
- StableDiffusionUpscalePipeline by @patil-suraj in #1396
- added initial v-pred support to DPM-solver by @kashif in #1421
- SD2 docs by @patrickvonplaten in #1424
v0.8.1: Patch release
This patch release fixes an error with CLIPVisionModelWithProjection imports on a non-git transformers installation.
pip install --upgrade diffusers
or pip install diffusers==0.8.1
- [Bad dependencies] Fix imports (#1382) by @patrickvonplaten
v0.8.0: Versatile Diffusion - Text, Images and Variations All in One Diffusion Model
🙆♀️ New Models
VersatileDiffusion
VersatileDiffusion, released by SHI-Labs, is a unified multi-flow multimodal diffusion model that is capable of doing multiple tasks such as text2image, image variations, dual-guided (text+image) image generation, and image2text.
- [Versatile Diffusion] Add versatile diffusion model by @patrickvonplaten @anton-l #1283
Make sure to install transformers from "main":
pip install git+https://github.com/huggingface/transformers
Then you can run:
from diffusers import VersatileDiffusionPipeline
import torch
import requests
from io import BytesIO
from PIL import Image
pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
# initial image
url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")
# prompt
prompt = "a red car"
# text to image
image = pipe.text_to_image(prompt).images[0]
# image variation
image = pipe.image_variation(image).images[0]
# dual-guided (text + image) generation
image = pipe.dual_guided(prompt, image).images[0]
More in-depth details can be found on:
AltDiffusion
AltDiffusion is a multilingual latent diffusion model that supports text-to-image generation for 9 different languages: English, Chinese, Spanish, French, Japanese, Korean, Arabic, Russian and Italian.
- Add AltDiffusion by @patrickvonplaten @patil-suraj #1299
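A minimal usage sketch; the AltDiffusionPipeline class and the BAAI/AltDiffusion-m9 checkpoint name reflect our understanding and should be treated as assumptions rather than the confirmed API:
import torch
from diffusers import AltDiffusionPipeline

# Multilingual checkpoint published by BAAI (name assumed for illustration).
pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", torch_dtype=torch.float16).to("cuda")

# Prompts can be written in any of the nine supported languages, e.g. Chinese:
prompt = "一只戴着墨镜的橘猫，数字绘画"
image = pipe(prompt).images[0]
image.save("alt_diffusion.png")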
Stable Diffusion Image Variations
StableDiffusionImageVariationPipeline by @justinpinkney is a Stable Diffusion model that takes an image as input and generates variations of that image. It is conditioned on CLIP image embeddings instead of text.
- StableDiffusionImageVariationPipeline by @patil-suraj #1365
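A minimal usage sketch; the checkpoint name below is the one published by @justinpinkney on the Hub and is assumed here for illustration:
import torch
from io import BytesIO

import requests
from PIL import Image
from diffusers import StableDiffusionImageVariationPipeline

pipe = StableDiffusionImageVariationPipeline.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers", torch_dtype=torch.float16
).to("cuda")

url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((512, 512))

# The pipeline is conditioned on the CLIP image embedding of `image` instead of a text prompt.
variation = pipe(image=init_image, guidance_scale=3.0).images[0]
variation.save("variation.png")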
Safe Latent Diffusion
Safe Latent Diffusion (SLD), released by the ml-research@TUDarmstadt group, is a new practical and sophisticated approach to prevent unsolicited content from being generated by diffusion models. One of the authors of the research contributed their implementation to diffusers.
- Add Safe Stable Diffusion Pipeline by @manuelbrack #1244
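A minimal usage sketch; the StableDiffusionPipelineSafe class and the AIML-TUDA/stable-diffusion-safe checkpoint name are assumptions on our part, so check the documentation for the exact API:
import torch
from diffusers import StableDiffusionPipelineSafe

# Checkpoint published by the authors (name assumed for illustration).
pipe = StableDiffusionPipelineSafe.from_pretrained(
    "AIML-TUDA/stable-diffusion-safe", torch_dtype=torch.float16
).to("cuda")

# Generation proceeds as usual; the safety guidance steers sampling away from unsolicited content.
image = pipe(prompt="a photograph of an astronaut riding a horse").images[0]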
VQ-Diffusion with classifier-free sampling
- vq diffusion classifier free sampling by @williamberman #1294
LDM super resolution
LDM super resolution is a latent 4x super-resolution diffusion model released by CompVis.
- Add LDM Super Resolution pipeline by @duongna21 #1116
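A minimal usage sketch; the LDMSuperResolutionPipeline class and the CompVis/ldm-super-resolution-4x-openimages checkpoint name are assumptions on our part:
import requests
from io import BytesIO
from PIL import Image
from diffusers import LDMSuperResolutionPipeline

pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages").to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
low_res_img = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((128, 128))

# 4x latent super-resolution: 128x128 -> 512x512
upscaled = pipe(low_res_img, num_inference_steps=100, eta=1.0).images[0]
upscaled.save("ldm_upscaled_cat.png")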
CycleDiffusion
CycleDiffusion is a method that uses Text-to-Image Diffusion Models for Image-to-Image Editing. It is capable of:
- Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
- Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
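A minimal usage sketch; the CycleDiffusionPipeline class and its arguments reflect our understanding of the community contribution and should be treated as assumptions (the source image path is a placeholder):
import torch
from PIL import Image
from diffusers import CycleDiffusionPipeline, DDIMScheduler

# CycleDiffusion relies on DDIM inversion, so a DDIM scheduler is required.
model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

# "source_image.png" is a placeholder: an image that matches the source prompt.
init_image = Image.open("source_image.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="An astronaut riding an elephant",
    source_prompt="An astronaut riding a horse",
    image=init_image,
    strength=0.8,
    guidance_scale=2,
    source_guidance_scale=1,
).images[0]
image.save("cycle_diffusion.png")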
CLIPSeg + StableDiffusionInpainting.
Uses CLIPSeg to automatically generate a mask using segmentation, and then applies Stable Diffusion in-painting.
K-Diffusion wrapper
The K-Diffusion Pipeline is a community pipeline that allows using any sampler from K-Diffusion with diffusers models.
- [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten #1360
🌀New SOTA Scheduler
DPMSolverMultistepScheduler is the 🧨 diffusers implementation of DPM-Solver++, a state-of-the-art scheduler that was contributed by one of the authors of the paper. This scheduler is able to achieve great quality in as few as 20 steps. It's a drop-in replacement for the default Stable Diffusion scheduler, so you can use it to essentially halve generation times. It works so well that we adopted it for the Stable Diffusion demo Spaces: https://huggingface.co/spaces/stabilityai/stable-diffusion, https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5.
You can use it like this:
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
🌐 Better scheduler API
The example above also demonstrates how to load schedulers using a new API that is coherent with model loading and therefore more natural and intuitive.
You can load a scheduler using from_pretrained, as demonstrated above, or you can instantiate one from an existing scheduler configuration. This is a way to replace the scheduler of a pipeline that was previously loaded:
from diffusers import DiffusionPipeline, EulerDiscreteScheduler
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
Read more about these changes in the documentation. See also the community pipeline that allows using any of the K-Diffusion samplers with diffusers, as mentioned above!
🎉 Performance
We work relentlessly to incorporate performance optimizations and memory reduction techniques to 🧨 diffusers. These are two of the most noteworthy incorporations in this release:
- Enable memory-efficient attention by default if xFormers is installed.
- Use batched-matmuls when possible.
🎁 Quality of Life improvements
- Fix/Enable all schedulers for in-painting
- Easier loading of local pipelines
- CPU offloading: multi-GPU support
📝 Changelog
- Add multistep DPM-Solver discrete scheduler by @LuChengTHU in #1132
- Remove warning about half precision on MPS by @pcuenca in #1163
- Fix typo latens -> latents by @duongna21 in #1171
- Fix community pipeline links by @pcuenca in #1162
- [Docs] Add loading script by @patrickvonplaten in #1174
- Fix dtype safety checker inpaint legacy by @patrickvonplaten in #1137
- Community pipeline img2img inpainting by @vvvm23 in #1114
- [Community Pipeline] Add multilingual stable diffusion to community pipelines by @juancopi81 in #1142
- [Flax examples] Load text encoder from subfolder by @duongna21 in #1147
- Link to Dreambooth blog post instead of W&B report by @pcuenca in #1180
- Fix small typo by @pcuenca in #1178
- [DDIMScheduler] fix noise device in ddim step by @patil-suraj in #1189
- MPS schedulers: don't use float64 by @pcuenca in #1169
- Warning for invalid options without "--with_prior_preservation" by @shirayu in #1065
- [ONNX] Improve ONNXPipeline scheduler compatibility, fix safety_checker by @anton-l in #1173
- Restore compatibility with deprecated StableDiffusionOnnxPipeline by @pcuenca in #1191
- Update pr docs actions by @mishig25 in #1194
- handle dtype xformers attention by @patil-suraj in #1196
- [Scheduler] Move predict epsilon to init by @patrickvonplaten in #1155
- add licenses to pipelines by @natolambert in #1201
- Fix cpu offloading by @anton-l in #1177
- Fix slow tests by @patrickvonplaten in #1210
- [Flax] fix extra copy pasta 🍝 by @camenduru in #1187
- [CLIPGuidedStableDiffusion] support DDIM scheduler by @patil-suraj in #1190
- Fix layer names convert LDM script by @duongna21 in #1206
- [Loading] Make sure loading edge cases work by @patrickvonplaten in #1192
- Add LDM Super Resolution pipeline by @duongna21 in #1116
- [Conversion] Improve conversion script by @patrickvonplaten in #1218
- DDIM docs by @patrickvonplaten in #1219
- apply repeat_interleave fix for mps to stable diffusion image2image pipeline by @jncasey in #1135
- Flax tests: don't hardcode number of devices by @pcuenca in #1175
- Improve documentation for the LPW pipeline by @exo-pla-net in #1182
- Factor out encode text with Copied from by @patrickvonplaten in #1224
- Match the generator device to the pipeline for DDPM and DDIM by @anton-l in #1222
- [Tests] Fix mps+generator fast tests by @anton-l in #1230
- [Tests] Adjust TPU test values by @anton-l in #1233
- Add a reference to the name 'Sampler' by @apolinario in #1172
- Fix Flax usage comments by @pcuenca in #1211
- [Docs] improve img2img example by @ruanrz in #1193
- [Stable Diffusion] Fix padding / truncation by @patrickvonplaten in #1226
- Finalize stable diffusion refactor by @patrickvonplaten in #1269
- Edited attention.py for older xformers by @Lime-Cakes in #1270
- Fix wrong link in text2img fine-tuning documentation by @daspartho in #1282
- [StableDiffusionInpaintPipeline] fix batch_size for mask and masked latents by @patil-suraj in #1279
- Add UNet 1d for RL model for planning + colab by @natolambert in #105
- Fix documentation typo for UNet2DModel and UNet2DConditionModel by @xenova in #1275
- add source link to composable diffusion model by @nanliu1 in #1293
- Fix incorrect link to Stable Diffusion notebook by @dhruvrnaik in #1291
- [dreambooth] link to bitsandbytes readme for installation by @0xdevalias in #1229
- Add Scheduler.from_pretrained and better scheduler changing by @patrickvonplaten in #1286
- Add AltDiffusion by @patrickvonplaten in #1299
- Better error messag...