@@ -12,22 +12,134 @@ specific language governing permissions and limitations under the License.
# Stable Diffusion XL
- Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release).
- The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).
+ Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.
- *The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels.
- These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION’s NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).*
+ The abstract of the paper is the following:
- For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release).
+ *We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*
## Tips
+ - Stable Diffusion XL works especially well with image sizes between 768 and 1024 pixels.
+ - The output image of Stable Diffusion XL can be improved by making use of a refiner, as shown below.
+
### Available checkpoints:
- *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`]
- *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`]
- TODO
+ ## Usage Example
+
+ Before using SDXL, make sure to have `transformers`, `accelerate`, `safetensors`, and `invisible-watermark` installed.
+ You can install the libraries as follows:
+
+ ```
+ pip install transformers
+ pip install accelerate
+ pip install safetensors
+ pip install "invisible-watermark>=2.0"
+ ```
+
+ ### Text-to-Image
+
+ You can use SDXL as follows for *text-to-image*:
+
+ ```py
+ from diffusers import StableDiffusionXLPipeline
+ import torch
+
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+ )
+ pipe.to("cuda")
+
+ prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ image = pipe(prompt=prompt).images[0]
+ ```
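+
+ The pipeline call also accepts the usual `diffusers` generation arguments, so you can control the output resolution, the number of denoising steps, and the random seed explicitly. A minimal sketch continuing from the example above (the parameter values are purely illustrative):
+
+ ```py
+ # Fix the seed for reproducibility and generate at an explicit 1024x1024 resolution
+ generator = torch.Generator("cuda").manual_seed(0)
+ image = pipe(
+     prompt=prompt,
+     height=1024,
+     width=1024,
+     num_inference_steps=50,
+     guidance_scale=7.5,
+     generator=generator,
+ ).images[0]
+ image.save("astronaut.png")
+ ```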
+
+ ### Refining the image output
+
+ The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
+ In this case, you only have to output the `latents` from the base model.
+
+ ```py
+ from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+ import torch
+
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+ )
+ pipe.to("cuda")
+
+ refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
+ )
+ refiner.to("cuda")
+
+ prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+
+ # Return latents from the base model and pass them to the refiner for the final image
+ image = pipe(prompt=prompt, output_type="latent").images[0]
+ image = refiner(prompt=prompt, image=image[None, :]).images[0]
+ ```
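+
+ Because the refiner is an image-to-image pipeline, you can additionally steer how strongly it reworks the base output through the standard img2img `strength` argument. A small variation of the refinement step above (the value is purely illustrative):
+
+ ```py
+ # A lower strength keeps the refined image closer to the base model's output,
+ # a higher strength lets the refiner rework it more aggressively
+ image = pipe(prompt=prompt, output_type="latent").images[0]
+ image = refiner(prompt=prompt, image=image[None, :], strength=0.3).images[0]
+ image.save("astronaut_refined.png")
+ ```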
+
+ ### Loading single-file checkpoints / original file format
+
+ By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the
+ original file format into `diffusers` (the paths below assume locally downloaded `.safetensors` checkpoints):
+
+ ```py
+ from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+ import torch
+
+ # Load the original single-file checkpoints from local paths
+ pipe = StableDiffusionXLPipeline.from_single_file(
+     "./sd_xl_base_0.9.safetensors", torch_dtype=torch.float16
+ )
+ pipe.to("cuda")
+
+ refiner = StableDiffusionXLImg2ImgPipeline.from_single_file(
+     "./sd_xl_refiner_0.9.safetensors", torch_dtype=torch.float16
+ )
+ refiner.to("cuda")
+ ```
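+
+ If you do not have the original checkpoint files locally yet, you can fetch them from the Hub first, for example with `huggingface_hub`. A sketch, assuming the base checkpoint is stored as `sd_xl_base_0.9.safetensors` in the model repository:
+
+ ```py
+ from huggingface_hub import hf_hub_download
+
+ # Download the original single-file checkpoint and load it directly
+ ckpt_path = hf_hub_download(
+     repo_id="stabilityai/stable-diffusion-xl-base-0.9", filename="sd_xl_base_0.9.safetensors"
+ )
+ pipe = StableDiffusionXLPipeline.from_single_file(ckpt_path, torch_dtype=torch.float16)
+ pipe.to("cuda")
+ ```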
+
+ ### Memory optimization via model offloading
+
+ If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`].
+
+ ```diff
+ - pipe.to("cuda")
+ + pipe.enable_model_cpu_offload()
+ ```
+
+ and
+
+ ```diff
+ - refiner.to("cuda")
+ + refiner.enable_model_cpu_offload()
+ ```
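+
+ With model offloading, each model component stays on the CPU and is only moved to the GPU right before it is needed, which lowers peak memory usage at the cost of some inference speed.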
+
+ ### Speed up inference with `torch.compile`
+
+ You can speed up inference by making use of `torch.compile`. This should give you a speed-up of **ca.** 20%.
+
+ ```diff
+ + pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ + refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
+ ```
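+
+ Note that `torch.compile` requires `torch` >= 2.0 and that the first pipeline call is noticeably slower because the UNet is compiled on the fly; subsequent calls run at the accelerated speed.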
+
+ ### Running with `torch` < 2.0
+
+ **Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers
+ attention:
+
+ ```
+ pip install xformers
+ ```
+
+ ```diff
+ + pipe.enable_xformers_memory_efficient_attention()
+ + refiner.enable_xformers_memory_efficient_attention()
+ ```
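+
+ With `torch` >= 2.0 this is not needed, since `diffusers` then uses PyTorch's native memory-efficient attention by default.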
## StableDiffusionXLPipeline