
[Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora #12074

Open · wants to merge 23 commits into main

Conversation

@linoytsaban (Collaborator) commented Aug 5, 2025

Wan 2.2 has two transformers. The community has found it beneficial to load Wan LoRAs into both transformers, sometimes at different scales (this also applies to Wan 2.1 LoRAs loaded into transformer and transformer_2).
Recently, a new Lightning LoRA was released for Wan 2.2 T2V, with separate weights for transformer (high-noise stage) and transformer_2 (low-noise stage).

This PR adds support for LoRA loading into transformer_2 and adds support for the Lightning LoRA (which has alpha keys).

T2V example:

import torch
from diffusers import WanPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video

dtype = torch.bfloat16
device = "cuda"
vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers", subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers", vae=vae, torch_dtype=dtype)
pipe.to(device)

pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="Wan22-Lightning/Wan2.2-Lightning_T2V-A14B-4steps-lora_HIGH_fp16.safetensors",
    adapter_name="lightning",
)
pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="Wan22-Lightning/Wan2.2-Lightning_T2V-A14B-4steps-lora_LOW_fp16.safetensors",
    adapter_name="lightning_2",
    load_into_transformer_2=True,
)
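# "lightning" (HIGH weights) is loaded into transformer (high-noise stage);
# load_into_transformer_2=True routes "lightning_2" (LOW weights) into transformer_2 (low-noise stage).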
pipe.set_adapters(["lightning", "lightning_2"], adapter_weights=[1., 1.])

height = 480
width = 832

prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=1.0,
    guidance_scale_2=1.0,
    num_inference_steps=4,
    generator=torch.manual_seed(0),
).frames[0]
export_to_video(output, "t2v_out.mp4", fps=16)
Output video: t2v_out-5.mp4


@luke14free

Curious to see an example @linoytsaban, would love to try this out.

@sayakpaul (Member) left a comment

Thanks for working on this. Left some comments.

Comment on lines 1890 to 1905
has_alpha = f"blocks.{i}.cross_attn.{o}.alpha" in original_state_dict
original_key_A = f"blocks.{i}.cross_attn.{o}.{lora_down_key}.weight"
converted_key_A = f"blocks.{i}.attn2.{c}.lora_A.weight"

original_key_B = f"blocks.{i}.cross_attn.{o}.{lora_up_key}.weight"
converted_key_B = f"blocks.{i}.attn2.{c}.lora_B.weight"

if has_alpha:
    down_weight = original_state_dict.pop(original_key_A)
    up_weight = original_state_dict.pop(original_key_B)
    scale_down, scale_up = get_alpha_scales(down_weight, f"blocks.{i}.cross_attn.{o}.alpha")
    converted_state_dict[converted_key_A] = down_weight * scale_down
    converted_state_dict[converted_key_B] = up_weight * scale_up
else:
    if original_key_A in original_state_dict:
        converted_state_dict[converted_key_A] = original_state_dict.pop(original_key_A)
Member

Same as above.

    hotswap=hotswap,
)
load_into_transformer_2 = kwargs.pop("load_into_transformer_2", False)
if load_into_transformer_2:
Member

Should raise in case getattr(self, "transformer_2", None) is None.
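
A minimal sketch of such a guard (illustrative only; the exact error message and placement inside load_lora_weights are assumptions):

load_into_transformer_2 = kwargs.pop("load_into_transformer_2", False)
if load_into_transformer_2:
    # Wan 2.1 pipelines share this loader but have no second transformer,
    # so fail loudly instead of silently loading nothing.
    if getattr(self, "transformer_2", None) is None:
        raise ValueError(
            "load_into_transformer_2=True requires a pipeline with a transformer_2 component."
        )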

@@ -5064,7 +5064,7 @@ class WanLoraLoaderMixin(LoraBaseMixin):
Load LoRA layers into [`WanTransformer3DModel`]. Specific to [`WanPipeline`] and [`WanImageToVideoPipeline`].
"""

_lora_loadable_modules = ["transformer"]
_lora_loadable_modules = ["transformer", "transformer_2"]
Member
Just to note that this loader is shared amongst Wan 2.1 and 2.2 as the pipelines are also one and the same. For Wan 2.1, we won't have any transformer_2.

Comment on lines 5283 to 5293
else:
    self.load_lora_into_transformer(
        state_dict,
        transformer=getattr(self, self.transformer_name) if not hasattr(self, "transformer") else self.transformer,
        adapter_name=adapter_name,
        metadata=metadata,
        _pipeline=self,
        low_cpu_mem_usage=low_cpu_mem_usage,
        hotswap=hotswap,
    )
Member
Why put it under else?

Collaborator Author

My thought process was that, unlike LoRAs that carry weights for, say, the transformer and the text encoder and are loaded in a single load_lora_weights call, here we can have different weights for each transformer while the state_dict keys are identical. Loading into each transformer separately also lets us use different adapter names, which makes it easy to apply different scales to each transformer's LoRA (which has been found to be beneficial for quality). I'm happy to improve this logic, but these are the considerations to keep in mind.

Member

Yeah. So, in case users want to load into both transformers, won't it just load into one if load_into_transformer_2=True?

Collaborator Author

Yep, it would; they would need to load into each one separately.

Member

Can you show some pseudo-code of what's expected from users? This is another way of loading another adapter into transformer_2:
#12040 (comment)

Collaborator Author

I don't feel strongly about keeping it exactly this way, but I do think it should remain possible to load different LoRA weights into the two transformers and at different scales.
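
For illustration, a minimal sketch of the usage pattern described above, assuming the load_into_transformer_2 kwarg added in this PR (the repo id, weight file names, adapter names, and scales are placeholders):

# Load one LoRA into the high-noise transformer and a different one into
# transformer_2, under distinct adapter names.
pipe.load_lora_weights(
    "org/high-noise-lora",                 # placeholder repo id
    weight_name="high_noise.safetensors",  # placeholder file name
    adapter_name="high",
)
pipe.load_lora_weights(
    "org/low-noise-lora",                  # placeholder repo id
    weight_name="low_noise.safetensors",   # placeholder file name
    adapter_name="low",
    load_into_transformer_2=True,
)
# Distinct adapter names make it easy to apply a different scale per transformer.
pipe.set_adapters(["high", "low"], adapter_weights=[1.0, 0.8])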

Member
Makes sense. Let's go with this but with a note in the docstrings saying it's experimental in nature.

@linoytsaban (Collaborator Author)

I2V example: using Wan 2.2 with the Wan 2.1 lightning LoRA

import torch
import numpy as np
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

model_id = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"
dtype = torch.bfloat16
device = "cuda"

pipe = WanImageToVideoPipeline.from_pretrained(model_id, torch_dtype=dtype)
pipe.to(device)

pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
    adapter_name="lightning",
)
pipe.load_lora_weights(
    "Kijai/WanVideo_comfy",
    weight_name="Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors",
    adapter_name="lightning_2",
    load_into_transformer_2=True,
)
pipe.set_adapters(["lightning", "lightning_2"], adapter_weights=[1., 1.])
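# Fuse each adapter into its matching transformer with a different lora_scale,
# then unload the adapters; the scaled LoRA weights stay baked into each transformer.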
pipe.fuse_lora(adapter_names=["lightning"], lora_scale=3., components=["transformer"])
pipe.fuse_lora(adapter_names=["lightning_2"], lora_scale=1., components=["transformer_2"])
pipe.unload_lora_weights()

image = load_image(
    "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/wan_i2v_input.JPG"
)
max_area = 480 * 832
aspect_ratio = image.height / image.width
mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
image = image.resize((width, height))
prompt = "POV selfie video, white cat with sunglasses standing on surfboard, relaxed smile, tropical beach behind (clear water, green hills, blue sky with clouds). Surfboard tips, cat falls into ocean, camera plunges underwater with bubbles and sunlight beams. Brief underwater view of cat’s face, then cat resurfaces, still filming selfie, playful summer vacation mood."

negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
generator = torch.Generator(device=device).manual_seed(42)
output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=1,
    num_inference_steps=4,
    generator=generator,
).frames[0]
export_to_video(output, "i2v_output.mp4", fps=16)
Output video: i2v_output-84.mp4

@luke14free

Thanks a lot for the amazing work @linoytsaban. Just FYI, issue #12047 also applies to this PR: I tried it and got the mismatch error with GGUF models. Reporting since GGUF is the most popular way to run Wan on consumer hardware.

@mayankagrawal10198 commented Aug 6, 2025

@linoytsaban are we sure that if we don't pass the boundary_ratio arg, our generation pipe would still choose transformer_2 for low noise? Because I can see the first Wan 2.2 PR, #12004 by @yiyixuxu, has these lines:

if self.config.boundary_ratio is not None:
    boundary_timestep = self.config.boundary_ratio * self.scheduler.config.num_train_timesteps
else:
    boundary_timestep = None

with self.progress_bar(total=num_inference_steps) as progress_bar:
    for i, t in enumerate(timesteps):
        if self.interrupt:
            continue

        self._current_timestep = t

        if boundary_timestep is None or t >= boundary_timestep:
            # wan2.1 or high-noise stage in wan2.2
            current_model = self.transformer
            current_guidance_scale = guidance_scale
        else:
            # low-noise stage in wan2.2
            current_model = self.transformer_2
            current_guidance_scale = guidance_scale_2

@linoytsaban (Collaborator Author)

Yes @mayankagrawal10198, it should still use transformer_2 for the low-noise stage, since the default config sets the boundary ratio to 0.9 for I2V and 0.875 for T2V (https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers/blob/main/model_index.json, https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers/blob/main/model_index.json). You can pass the boundary ratio explicitly to the pipeline if you wish to experiment with different values.
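
To make the arithmetic concrete, a small illustration, assuming num_train_timesteps=1000 and a hypothetical 4-step timestep schedule (the actual values come from the scheduler):

num_train_timesteps = 1000
boundary_ratio = 0.9  # I2V default from model_index.json
boundary_timestep = boundary_ratio * num_train_timesteps  # 900.0

timesteps = [1000, 750, 500, 250]  # hypothetical 4-step schedule
for t in timesteps:
    stage = "high noise -> transformer" if t >= boundary_timestep else "low noise -> transformer_2"
    print(t, stage)
# Each step is assigned by comparing its timestep to the boundary value, so every
# transformer gets a whole number of steps (here: 1 high-noise, 3 low-noise),
# rather than a fractional split of num_inference_steps.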

@linoytsaban (Collaborator Author)
@bot /style

github-actions bot (Contributor) commented Aug 7, 2025

Style fix is beginning. View the workflow run here.

@linoytsaban (Collaborator Author)
@bot /style

github-actions bot (Contributor) commented Aug 11, 2025

Style bot fixed some files and pushed the changes.

@mayankagrawal10198

Hi @linoytsaban, thanks for the reply. Please correct my understanding of this.
If the boundary ratio is 0.9 by default and we set num_inference_steps to 4 for the Lightx2v LoRA, does the high-noise stage get only 0.4 steps and the low-noise stage 3.6 steps? Can a stage get a fractional number of steps? I thought the high-noise and low-noise stages each need whole numbers of steps, e.g. with num_inference_steps of 8 and a boundary ratio of 0.75, the high-noise stage would get at least 2 steps and the low-noise stage 6 steps.
I really need to understand this.

@linoytsaban linoytsaban requested a review from sayakpaul August 13, 2025 13:47
@innokria

Hey guys, this is amazing work. There is now a new concept that does this in 3 stages.

3-stage approach: the first stage uses the original Wan 2.2 model without the Lightx2v LoRA, which allows faster motions to be generated. The 2nd and 3rd stages use the high and low Lightx2v LoRAs as normal.

I will do some experiments on this :)
