
Commit 38e563d

Fix SD XL Docs (#3971)
* finish sd xl docs
* make style
* Apply suggestions from code review
* uP
* uP
* Correct
1 parent b8f089c commit 38e563d

4 files changed (+133, -30 lines)

.github/workflows/build_documentation.yml

Lines changed: 7 additions & 11 deletions
```diff
@@ -11,17 +11,13 @@ on:
 jobs:
   build:
     steps:
-      - name: Install dependencies
-        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-
-      - name: Build doc
-        uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
-        with:
-          commit_sha: ${{ github.sha }}
-          package: diffusers
-          notebook_folder: diffusers_doc
-          languages: en ko zh
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+    with:
+      commit_sha: ${{ github.sha }}
+      install_libgl1: true
+      package: diffusers
+      notebook_folder: diffusers_doc
+      languages: en ko zh
 
     secrets:
       token: ${{ secrets.HUGGINGFACE_PUSH }}
```

.github/workflows/build_pr_documentation.yml

Lines changed: 7 additions & 12 deletions
```diff
@@ -9,15 +9,10 @@ concurrency:
 
 jobs:
   build:
-    steps:
-      - name: Install dependencies
-        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-
-      - name: Build doc
-        uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
-        with:
-          commit_sha: ${{ github.event.pull_request.head.sha }}
-          pr_number: ${{ github.event.number }}
-          package: diffusers
-          languages: en ko zh
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      install_libgl1: true
+      package: diffusers
+      languages: en ko zh
```

docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_xl.mdx

Lines changed: 118 additions & 6 deletions
```diff
@@ -12,22 +12,134 @@ specific language governing permissions and limitations under the License.
 
 # Stable Diffusion XL
 
-Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of [Stable Diffusion 1](https://stability.ai/blog/stable-diffusion-public-release).
-The project to train Stable Diffusion 2 was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).
+Stable Diffusion XL was proposed in [SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952) by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.
 
-*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels.
-These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION's NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).*
+The abstract of the paper is the following:
 
-For more details about how Stable Diffusion 2 works and how it differs from Stable Diffusion 1, please refer to the official [launch announcement post](https://stability.ai/blog/stable-diffusion-v2-release).
+*We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.*
 
 ## Tips
 
+- Stable Diffusion XL works especially well with image resolutions between 768 and 1024 pixels.
+- The output of Stable Diffusion XL can be further improved by using a refiner model, as shown below.
+
 ### Available checkpoints:
 
 - *Text-to-Image (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-base-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9) with [`StableDiffusionXLPipeline`]
 - *Image-to-Image / Refiner (1024x1024 resolution)*: [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9) with [`StableDiffusionXLImg2ImgPipeline`]
 
-TODO
```
## Usage Example

Before using SDXL, make sure to have `transformers`, `accelerate`, `safetensors`, and `invisible-watermark` installed.
You can install the libraries as follows:

```
pip install transformers
pip install accelerate
pip install safetensors
pip install "invisible-watermark>=2.0"
```
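
Note that the `invisible-watermark` package is imported under the module name `imwatermark`. As a quick sanity check before loading the pipeline, you can verify that all dependencies are importable; the helper below is a hypothetical sketch, not part of the `diffusers` API:

```python
import importlib.util

def missing_deps(modules):
    """Return the subset of module names that the import system cannot find."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# invisible-watermark is imported as "imwatermark"
required = ["transformers", "accelerate", "safetensors", "imwatermark"]
print(missing_deps(required))  # an empty list means everything is installed
```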

### *Text-to-Image*

You can use SDXL as follows for *text-to-image*:

```py
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
```

### Refining the image output

The image can be refined by making use of [stabilityai/stable-diffusion-xl-refiner-0.9](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9).
In this case, you only have to output the `latents` from the base model.

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Return latents from the base model so the refiner can consume them directly
image = pipe(prompt=prompt, output_type="latent").images[0]
# Add a batch dimension before passing the latents to the refiner
image = refiner(prompt=prompt, image=image[None, :]).images[0]
```
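
The `image[None, :]` indexing above adds a batch dimension to the single latent sample returned by the base pipeline before handing it to the refiner. A minimal NumPy sketch of the same shape manipulation (NumPy standing in for the `torch` latents, with an assumed latent shape):

```python
import numpy as np

# Stand-in for one latent sample of shape (channels, height, width)
latent = np.zeros((4, 128, 128))

# `latent[None, :]` prepends a batch axis: (1, channels, height, width)
batched = latent[None, :]

print(latent.shape)   # (4, 128, 128)
print(batched.shape)  # (1, 4, 128, 128)
```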

### Loading single file checkpoints / original file format

By making use of [`~diffusers.loaders.FromSingleFileMixin.from_single_file`] you can also load the
original file format into `diffusers`:

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9/blob/main/sd_xl_base_0.9.safetensors",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-0.9/blob/main/sd_xl_refiner_0.9.safetensors",
    torch_dtype=torch.float16,
)
refiner.to("cuda")
```
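
The rule of thumb is that a single original-format checkpoint file goes through `from_single_file`, while a `diffusers` repository id or local directory goes through `from_pretrained`. A small illustrative helper capturing that distinction (hypothetical, not a `diffusers` API, assuming the usual `.safetensors`/`.ckpt` suffixes for original checkpoints):

```python
def loader_for(source: str) -> str:
    """Pick the loading method for a model source (illustrative heuristic only)."""
    # Original single-file checkpoints typically end in ".safetensors" or ".ckpt";
    # everything else is treated as a diffusers repo id or local directory.
    if source.endswith((".safetensors", ".ckpt")):
        return "from_single_file"
    return "from_pretrained"

print(loader_for("sd_xl_base_0.9.safetensors"))               # from_single_file
print(loader_for("stabilityai/stable-diffusion-xl-base-0.9"))  # from_pretrained
```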

### Memory optimization via model offloading

If you are seeing out-of-memory errors, we recommend making use of [`StableDiffusionXLPipeline.enable_model_cpu_offload`]:

```diff
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
```

and

```diff
- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
```

### Speed up inference with `torch.compile`

You can speed up inference by making use of `torch.compile`. This should give you a speed-up of roughly 20%:

```diff
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
```

### Running with `torch` < 2.0

**Note** that if you want to run Stable Diffusion XL with `torch` < 2.0, please make sure to enable xformers
attention:

```
pip install xformers
```

```diff
+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()
```

## StableDiffusionXLPipeline

src/diffusers/utils/import_utils.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -504,7 +504,7 @@ def is_invisible_watermark_available():
 
 # docstyle-ignore
 INVISIBLE_WATERMARK_IMPORT_ERROR = """
-{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install git+https://github.com/patrickvonplaten/invisible-watermark.git@remove_onnxruntime_depedency`
+{0} requires the invisible-watermark library but it was not found in your environment. You can install it with pip: `pip install invisible-watermark>=2.0`
 """
```