Implement Frequency-Decoupled Guidance (FDG) as a Guider #11976
Conversation
Some notes on the initial implementation:
Thank you for the quick implementation. Regarding your question, I believe it's cleaner to use tuples for the weights, as it allows users to seamlessly apply multiple levels when finer control over the generation is needed.
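For example (hypothetical values, not from the thread), a multi-level call could then look like `FrequencyDecoupledGuidance(guidance_scales=(10.0, 7.5, 5.0))`, with one scale per frequency level.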
@dg845 Thanks for taking it up, implementation looks great!
What you suggested about tuples sounds good, let's do that. We can always update the implementation later if needed to simplify, since modular guiders are experimental at the moment (plus, users can pass their own guider implementations, so if someone wants to simplify, it will be quite easy to take your implementation and make the necessary modifications).
Let's not add kornia as a dependency. Instead, we can do the same thing done in the attention dispatcher (import only if the package is available):
```python
if _CAN_USE_FLASH_ATTN_3:
    ...
```

The flagged import in this PR:

```python
import math
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union

import kornia
```
Could we add an `is_kornia_available` to `diffusers.utils.import_utils` and import only if the user already has it installed? A check could exist in `__init__` as well, so that if a user tries to instantiate the FDG guider, it fails if kornia isn't available.
I have added an `is_kornia_available` function to `utils` and added logic in the FDG guider to only import from `kornia` if it is available, following the Flash Attention 3 example above.
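A minimal sketch of that availability-check pattern, assuming the helper just probes for the installed package (the exact diffusers implementation may differ):

```python
import importlib.util

def is_kornia_available() -> bool:
    # True if the kornia package can be found in the current environment.
    return importlib.util.find_spec("kornia") is not None

_CAN_USE_KORNIA = is_kornia_available()

if _CAN_USE_KORNIA:
    import kornia
```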
Hi @Msadat97, quick question: how should FDG interact with guidance rescaling (from https://arxiv.org/pdf/2305.08891)? Currently, I'm rescaling in frequency space for each frequency level, with different …
It seems more natural to perform a single rescaling at the end (after the FDG prediction), since FDG is meant to replace the CFG output. Rescaling in the frequency domain is also possible, but I can't comment further as we haven't tested FDG with guidance rescaling. Do you have any output comparisons for this?
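For reference, the data-space variant suggested here would amount to applying the standard rescaling step from the paper once, to the final FDG prediction; a sketch (this mirrors the `rescale_noise_cfg` helper used elsewhere in diffusers):

```python
import torch

def rescale_guidance(pred_guided: torch.Tensor, pred_cond: torch.Tensor, guidance_rescale: float = 0.0) -> torch.Tensor:
    # Rescale the guided prediction so its per-sample std matches the conditional
    # prediction's std, then interpolate (Sec. 3.4 of arXiv:2305.08891).
    dims = list(range(1, pred_cond.ndim))
    std_cond = pred_cond.std(dim=dims, keepdim=True)
    std_guided = pred_guided.std(dim=dims, keepdim=True)
    rescaled = pred_guided * (std_cond / std_guided)
    return guidance_rescale * rescaled + (1.0 - guidance_rescale) * pred_guided
```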
Can you share a code snippet for how to use FDG? @dg845
@dg845 I noticed a mistake in the implementation. [screenshot]
Here is a code sample for running the new FDG guider with an SD-XL modular pipeline:

```python
import torch

from diffusers.guiders import FrequencyDecoupledGuidance
from diffusers.modular_pipelines import SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS

modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
num_inference_steps = 50

device = "cuda"
dtype = torch.float16
seed = 42

generator = torch.Generator(device=device)
generator.manual_seed(seed)
init_generator_state = generator.get_state()

# Create default SD-XL text-to-image ModularPipeline (with CFG guider)
t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
sdxl_pipeline = t2i_blocks.init_pipeline(modular_repo_id)

# Load pretrained components
sdxl_pipeline.load_default_components(torch_dtype=dtype)
sdxl_pipeline.to(device)

# Create CFG baseline image
cfg_guider_spec = sdxl_pipeline.get_component_spec("guider")
cfg_guidance_scale = cfg_guider_spec.config["guidance_scale"]
cfg_image = sdxl_pipeline(
    prompt=prompt,
    num_inference_steps=num_inference_steps,
    generator=generator,
    output="images",
)[0]
cfg_image.save(f"cfg_image_{cfg_guidance_scale}.png")

# Swap in new FDG guider
# Guidance scales listed from high frequency to low frequency
fdg_guidance_scales = [10.0, 5.0]
fdg_guider = FrequencyDecoupledGuidance(guidance_scales=fdg_guidance_scales)
sdxl_pipeline.update_components(guider=fdg_guider)
# TODO: is this necessary to instantiate a new guider?
sdxl_pipeline.load_components(names=["guider"], torch_dtype=dtype)

# Create FDG image
# Reset generator state
generator.set_state(init_generator_state)
fdg_image = sdxl_pipeline(
    prompt=prompt,
    num_inference_steps=num_inference_steps,
    generator=generator,
    output="images",
)[0]
fdg_guidance_scale_str = "_".join(f"{scale:.1f}" for scale in fdg_guidance_scales)
fdg_image.save(f"fdg_image_{fdg_guidance_scale_str}.png")
```

Quick question for @a-r-r-o-w or @yiyixuxu: for a …
Here are some samples for the prompt above.

CFG with guidance scale 7.5 (pipeline default):

[image]

FDG with guidance scales …:

[image]

FDG with guidance scales …:

[image]
With the same prompt and number of inference steps, here are some samples using a guidance rescale value of ….

CFG with guidance scale …:

[image]

FDG with guidance scales …:

[image]

FDG with guidance scales …:

[image]

It looks like rescaling in frequency space still produces coherent images, and may preserve high-frequency details better than rescaling in data space (for example, the extra details on the astronaut's square pack).
@dg845 Probably @yiyixuxu will be better able to answer your question here since I haven't played around with the loader much or fully read through the refactored ….

```python
pipeline.update_components(
    guider=ComponentSpec(
        name="cfg",
        type_hint=<GUIDER_CLASS>,
        config=<GUIDER_INIT_KWARGS>,
        default_creation_method="from_config",
    )
)
```

From what I understand, the …

Nice explorations too! In a blind test, I think my preference would definitely lean towards FDG generations :)
Feel free to add another knob (…).
Oh lol, I just looked at the code after writing the above comment and saw you've already added …
Thanks again, the implementation is very clean and easy to understand!
```python
if not _CAN_USE_KORNIA:
    raise ImportError(
        "The `FrequencyDecoupledGuidance` guider cannot be instantiated because the `kornia` library on which"
        " it depends is not available in the current environment."
    )
```
Could you add a simple instruction like `pip install kornia` to the message here?
Added.
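Presumably the updated message then reads something like this (exact wording assumed, not copied from the PR):

```python
if not _CAN_USE_KORNIA:
    raise ImportError(
        "The `FrequencyDecoupledGuidance` guider cannot be instantiated because the `kornia` library on which"
        " it depends is not available in the current environment. You can install `kornia` with `pip install kornia`."
    )
```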
"it depends is not available in the current environment." | ||
) | ||
|
||
```python
# Set start to earliest start for any freq component and stop to latest stop for any freq component
```
Nice!
```python
dtype = v0.dtype
# Assume first dim is a batch dim and all other dims are channel or "spatial" dims
all_dims_but_first = list(range(1, len(v0.shape)))
v0, v1 = v0.double(), v1.double()
```
@dg845 @Msadat97 Just curious whether this must be float64, and whether you've tested the same with float32 or a lower dtype and found it harmful? The operations here are very few, but fp64 is extremely slow, and I wonder if this has any impact on the overall runtime (maybe negligible for images, but it might be worth understanding for when the number of tokens is larger, as in video models, and whether the dtype here could potentially be user-configurable).
The `project` function is called when `parallel_weights` is set (and is not the default value of `1.0`), so the upcasted operations will only be performed sometimes.

For now, I have added an `upcast_to_double` argument which controls whether `project` will upcast to fp64.
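A rough sketch of what such a projection helper might look like, based on the quoted lines above and the similar logic in the APG guider (names and signature are assumptions, not the exact PR code):

```python
import torch

def project(v0: torch.Tensor, v1: torch.Tensor, upcast_to_double: bool = True):
    # Split v0 into components parallel and orthogonal to v1.
    dtype = v0.dtype
    # Assume first dim is a batch dim and all other dims are channel or "spatial" dims
    all_dims_but_first = list(range(1, len(v0.shape)))
    if upcast_to_double:
        v0, v1 = v0.double(), v1.double()
    v1 = torch.nn.functional.normalize(v1, dim=all_dims_but_first)
    v0_parallel = (v0 * v1).sum(dim=all_dims_but_first, keepdim=True) * v1
    v0_orthogonal = v0 - v0_parallel
    return v0_parallel.to(dtype), v0_orthogonal.to(dtype)
```

The parallel component can then be reweighted by `parallel_weights` before recombining.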
Here are some FDG samples which use a guidance scale of `[10.0, 5.0]` and a `parallel_weights` of `1.5`. The `1.5` value is somewhat arbitrary; @Msadat97, what is a reasonable range of values for `parallel_weights`?
FDG with guidance scales `[10.0, 5.0]`, `parallel_weights=1.5`, upcast to double:

[image]

FDG with guidance scales `[10.0, 5.0]`, `parallel_weights=1.5`, no upcast to double (with pipeline at fp16):

[image]
In this case, the images look of similar quality with and without upcasting (with perhaps a slight reduction in quality for the non-upcasted version).
We haven’t specifically tested the FP32 projection part, but I’m not sure how much it affects performance in this case, as the operations involved are quite lightweight and the model still runs in FP16. I just felt it might be safer to use double for normalization and projection to improve numerical accuracy a bit.
Regarding the parallel component, I think it’s best to keep the weight below 1. A value like 0.5 should give a good balance. That said, we used 1 in most parts of the paper and treated it as optional.
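Following that suggestion, a typical configuration might then be something like `FrequencyDecoupledGuidance(guidance_scales=[10.0, 5.0], parallel_weights=0.5)` (assuming `parallel_weights` also accepts a single float applied to all levels).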
@dg845 One last question: are you using the noise prediction (i.e., the model output) for FDG, or the `x_0` prediction? Perhaps using `x_0` might be better, since frequency decomposition is likely more meaningful there.
Currently, I am using the raw model output, whatever prediction type that happens to be.

I believe it would be difficult to use the `x_0` prediction, since we would need the `prediction_type` and the `beta`/`sigma`/etc. schedule to calculate it inside the guider. Moreover, the scheduler's `step` method usually expects a raw model output and will convert to an `x_0` prediction internally based on the `prediction_type`, so working on raw model outputs means the FDG prediction can be used as expected in the scheduler. @yiyixuxu thoughts?
That’s how we implement FDG as well, and it’s similar to how Adaptive Projected Guidance (APG) was handled in the guiders. So I assume it should also be compatible with FDG?
P.S.: btw, this conversion is mainly useful for projection to be more meaningful. Otherwise, it's almost the same for all prediction types, since the frequency operations are linear.
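To spell out the linearity point (notation ours): at a fixed $x_t$, the $x_0$ prediction is an affine function of the model output, e.g. $\hat{x}_0 = (x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon)/\sqrt{\bar{\alpha}_t}$ for `epsilon` prediction, so the guidance difference satisfies $\hat{x}_0^{cond} - \hat{x}_0^{uncond} = -\tfrac{\sqrt{1-\bar{\alpha}_t}}{\sqrt{\bar{\alpha}_t}}\,(\epsilon^{cond} - \epsilon^{uncond})$. Since the frequency decomposition is linear, it commutes with this scaling, and per-band CFG updates agree across prediction types up to the shared affine map; only the projection step, which is nonlinear, is sensitive to the space it operates in.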
I think the `AdaptiveProjectedGuidance` guider is implemented in the same way the FDG guider is currently implemented: the `forward` method takes in `pred_cond` and `pred_uncond` arguments but is agnostic as to whether these inputs are raw model outputs or `x_0` predictions.
My statement above that the FDG guider uses the raw model output is probably a little misleading, in the sense that this assumes that the calling code will supply the denoising model's output to the FDG guider. This is the case in e.g. `StableDiffusionXLLoopDenoiser`:

diffusers/src/diffusers/modular_pipelines/stable_diffusion_xl/denoise.py (lines 232 to 243 in e46e139)
```python
# Predict the noise residual
# store the noise_pred in guider_state_batch so that we can apply guidance across all batches
guider_state_batch.noise_pred = components.unet(
    block_state.scaled_latents,
    t,
    encoder_hidden_states=prompt_embeds,
    timestep_cond=block_state.timestep_cond,
    cross_attention_kwargs=block_state.cross_attention_kwargs,
    added_cond_kwargs=cond_kwargs,
    return_dict=False,
)[0]
components.guider.cleanup_models(components.unet)
```
but we could imagine that the calling `PipelineBlock` (such as `StableDiffusionXLLoopDenoiser`) could instead do the conversion to an `x_0` prediction (it has access to the scheduler through its `expected_components`), and in this case we'd probably want the guider to expose a config like `should_convert_to_sample_prediction` and the scheduler to expose `convert_to_sample_prediction`/`convert_to_prediction_type` methods.
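For concreteness, such a scheduler method might look roughly like this for an `epsilon`-prediction DDPM-style scheduler (`convert_to_sample_prediction` is the hypothetical name from the discussion above, not an existing diffusers API):

```python
import torch

def convert_to_sample_prediction(scheduler, model_output: torch.Tensor, sample: torch.Tensor, timestep: int) -> torch.Tensor:
    # x_0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)
    alpha_prod_t = scheduler.alphas_cumprod[timestep]
    beta_prod_t = 1 - alpha_prod_t
    return (sample - beta_prod_t**0.5 * model_output) / alpha_prod_t**0.5
```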
In general, I think it may make more sense to do the conversion in the `PipelineBlock`, since in the current design `PipelineBlock`s can have access to the scheduler, whereas the guider itself shouldn't be coupled to the scheduler.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @a-r-r-o-w. My current understanding of how …

So my understanding is that the following code from above

```python
fdg_guider = FrequencyDecoupledGuidance(guidance_scales=fdg_guidance_scales)
sdxl_pipeline.update_components(guider=fdg_guider)
```

should correctly tell the pipeline to use the supplied FDG guider instance, and …
@bot /style
Style bot fixed some files and pushed the changes.
@dg845 Could you run …
What does this PR do?
This PR implements frequency-decoupled guidance (FDG) (paper), a new guidance strategy, as a guider. The idea behind FDG is to decompose the CFG prediction into low- and high-frequency components and apply guidance separately to each via a CFG-style update (with separate guidance scales $w_{low}$ and $w_{high}$). The authors find that low guidance scales work better for the low-frequency components while high guidance scales work better for the high-frequency components (i.e., you should set $w_{low} < w_{high}$).
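Schematically (our notation, for the two-level case): given conditional and unconditional predictions decomposed into frequency bands $k \in \{low, high\}$, each band gets its own CFG-style update before the bands are recombined:

$$\tilde{p}_k = p_k^{uncond} + w_k \left( p_k^{cond} - p_k^{uncond} \right), \qquad \tilde{p} = \mathrm{recombine}\left( \{ \tilde{p}_k \}_k \right).$$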
Fixes #11956.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@a-r-r-o-w
@yiyixuxu
@Msadat97