
Conversation

wtomin
Collaborator

@wtomin wtomin commented Oct 8, 2025

What does this PR do?

Adds # (feature)

The inference script, documentation, and demo for CannyEdit, a training-free method for versatile image editing tasks.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the
    documentation guidelines
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SamitHuang @vigo999

Contributor

Summary of Changes

Hello @wtomin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the CannyEdit feature, a training-free framework for versatile image editing. It includes the core inference script, a web-based demo for mask generation, and comprehensive documentation. The system supports region-specific and multi-region image edits, with capabilities for automated prompt generation and advanced attention mechanisms for precise control.

Highlights

  • New Feature: CannyEdit: This PR introduces CannyEdit, a novel training-free framework for versatile image editing tasks. It supports high-quality region-specific edits and multi-region edits in a single generation pass.
  • Comprehensive Implementation: The implementation includes an inference script, a web-based mask generation demo, detailed documentation (README), and all necessary supporting modules for the Flux model, ControlNet, AutoEncoder, and text/image embedders.
  • Automated Prompt Generation: The system can automatically generate source and target prompts using the Qwen2.5-VL-7B-Instruct model if they are not explicitly provided by the user, which is particularly useful for object addition and removal tasks.
  • Interactive Mask Generation: A Flask-based web application (app_mask.py) is provided, allowing users to interactively draw on images to generate masks using either ellipse fitting or SAM2.1, which are then used for editing.
  • Advanced Denoising and Attention Mechanisms: The core CannyEditPipeline integrates ControlNet and employs specialized denoising functions (denoise_cannyedit, denoise_cannyedit_removal) with 'Attention Amplification' and 'Cyclical Blending' techniques for precise regional control during image generation and removal.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds the inference script and demo for CannyEdit, a new image editing method. The scope of the changes is large, introducing a complete example with a web UI, inference logic, and model definitions. The code is generally well-structured, but there are several critical issues related to correctness and maintainability that need to be addressed. Key areas for improvement include fixing a major logic flaw in the main inference script, resolving TypeError bugs in model layers, improving the robustness of the web application, and addressing significant code duplication. I've provided detailed comments and suggestions to resolve these issues.

Comment on lines +380 to +449
print("Running CannyEdit")
# Stage 1: Generation
stage1 = "stage_removal"
result = cannyedit_pipeline(
    prompt_source=args.prompt_source,
    prompt_local1=args.prompt_local[0],
    prompt_target=args.prompt_target,
    prompt_local_addition=args.prompt_local[1:],
    controlnet_image=image,
    local_mask=local_mask,
    local_mask_addition=local_mask_addition,
    width=args.width,
    height=args.height,
    guidance=args.guidance,
    num_steps=args.num_steps,
    seed=args.seed,
    true_gs=args.true_gs,
    control_weight=args.control_weight,
    control_weight2=args.control_weight2,
    neg_prompt=args.neg_prompt,
    # removal_add
    neg_prompt2=args.neg_prompt2,
    timestep_to_start_cfg=args.timestep_to_start_cfg,
    stage=stage1,
    generate_save_path=args.generate_save_path,
    inversion_save_path=args.inversion_save_path,
)

# Save the edited image
if not os.path.exists(args.save_folder):
    os.mkdir(args.save_folder)
ind = len(os.listdir(args.save_folder))
result_save_path = os.path.join(args.save_folder, f"result_{ind}.png")
result.save(result_save_path)

if removal_flag is False:
    # Stage 1: Generation
    stage1 = "stage_generate"
    print("Running CannyEdit")
    result = cannyedit_pipeline(
        prompt_source=args.prompt_source,
        prompt_local1=args.prompt_local[0],
        prompt_target=args.prompt_target,
        prompt_local_addition=args.prompt_local[1:],
        controlnet_image=image,
        local_mask=local_mask,
        local_mask_addition=local_mask_addition,
        width=args.width,
        height=args.height,
        guidance=args.guidance,
        num_steps=args.num_steps,
        seed=args.seed,
        true_gs=args.true_gs,
        control_weight=args.control_weight,
        control_weight2=args.control_weight2,
        neg_prompt=args.neg_prompt,
        neg_prompt2=args.neg_prompt2,
        timestep_to_start_cfg=args.timestep_to_start_cfg,
        stage=stage1,
        generate_save_path=args.generate_save_path,
        inversion_save_path=args.inversion_save_path,
    )

    # Save the edited image
    if not os.path.exists(args.save_folder):
        os.mkdir(args.save_folder)
    ind = len(os.listdir(args.save_folder))
    result_save_path = os.path.join(args.save_folder, f"result_{ind}.png")
    result.save(result_save_path)

Contributor


critical

There is a critical logic flaw in how the cannyedit_pipeline is executed. The current structure can lead to the pipeline being run twice or not at all, depending on the input arguments.

Specifically:

  • If prompts are not provided (args.prompt_source is None), the pipeline is first executed with stage1 = "stage_removal" (lines 382-413). Then, if removal_flag is False, it runs again with stage1 = "stage_generate" (lines 417-448).
  • If prompts are provided and removal_flag is True, the pipeline is never executed.

This leads to incorrect behavior and wasted computation. The logic should be refactored to ensure the pipeline is called only once with the correct stage.

I suggest restructuring the main function as follows (a sketch is given after this list):

  1. Determine the stage based on removal_flag.
  2. Generate prompts if they are not provided.
  3. Execute the pipeline a single time with the correct parameters.
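A minimal sketch of that restructuring, assuming the surrounding variables (`args`, `image`, `local_mask`, `local_mask_addition`, `removal_flag`) are set up as in the current script; `generate_prompts` is a hypothetical placeholder for the existing Qwen2.5-VL prompt-generation logic:

```python
# Sketch only: determine the stage once, fill in missing prompts, then run the
# pipeline a single time and save the result.
stage = "stage_removal" if removal_flag else "stage_generate"

if args.prompt_source is None:
    # Fall back to automatic prompt generation only when prompts are missing.
    args.prompt_source, args.prompt_target = generate_prompts(image, args)  # hypothetical helper

print("Running CannyEdit")
result = cannyedit_pipeline(
    prompt_source=args.prompt_source,
    prompt_local1=args.prompt_local[0],
    prompt_target=args.prompt_target,
    prompt_local_addition=args.prompt_local[1:],
    controlnet_image=image,
    local_mask=local_mask,
    local_mask_addition=local_mask_addition,
    width=args.width,
    height=args.height,
    guidance=args.guidance,
    num_steps=args.num_steps,
    seed=args.seed,
    true_gs=args.true_gs,
    control_weight=args.control_weight,
    control_weight2=args.control_weight2,
    neg_prompt=args.neg_prompt,
    neg_prompt2=args.neg_prompt2,
    timestep_to_start_cfg=args.timestep_to_start_cfg,
    stage=stage,
    generate_save_path=args.generate_save_path,
    inversion_save_path=args.inversion_save_path,
)

# Save the edited image exactly once.
os.makedirs(args.save_folder, exist_ok=True)
ind = len(os.listdir(args.save_folder))
result.save(os.path.join(args.save_folder, f"result_{ind}.png"))
```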

Comment on lines +493 to +497
) -> Tensor:
    if image_proj is None:
        return self.processor(self, x, vec, pe, attention_kwargs)
    else:
        return self.processor(self, x, vec, pe, image_proj, ip_scale)
Contributor


critical

Similar to the DoubleStreamBlock, there is a TypeError here. The construct method of SingleStreamBlock calls its processor with image_proj and ip_scale, but the __call__ method of SingleStreamBlockProcessor does not accept these arguments. The signature of SingleStreamBlockProcessor.__call__ must be updated to resolve this.
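For illustration, a hedged sketch of one possible signature that matches the call site shown above (the default values are assumptions and the attention body is elided):

```python
class SingleStreamBlockProcessor:
    def __call__(self, attn, x, vec, pe, image_proj=None, ip_scale=1.0, **attention_kwargs):
        # When image_proj is None the processor behaves as before; otherwise the
        # image projection is mixed in, scaled by ip_scale. The actual attention
        # computation from the existing implementation goes here.
        ...
```

Note that the image_proj-free branch currently passes attention_kwargs positionally; under a signature like this it should be forwarded as **attention_kwargs at the call site so it does not bind to image_proj.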

Comment on lines +393 to +396
if image_proj is None:
    return self.processor(self, img, txt, vec, pe, attention_kwargs)
else:
    return self.processor(self, img, txt, vec, pe, image_proj, ip_scale)
Contributor


critical

There is a TypeError here. The construct method of DoubleStreamBlock calls self.processor (an instance of DoubleStreamBlockProcessor) with image_proj and ip_scale arguments when image_proj is not None. However, the __call__ method of DoubleStreamBlockProcessor does not accept these arguments. The signature of DoubleStreamBlockProcessor.__call__ needs to be updated to accept image_proj and ip_scale to fix this bug.
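The fix can mirror the single-stream sketch above; for example (defaults are assumptions, body elided):

```python
class DoubleStreamBlockProcessor:
    def __call__(self, attn, img, txt, vec, pe, image_proj=None, ip_scale=1.0, **attention_kwargs):
        # image_proj and ip_scale are only used when an image projection is
        # provided; otherwise the processor behaves exactly as it does today.
        ...
```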

    SAM_AVAILABLE = False
else:
    print("SAM2.1 is not available. Please install segment-anything package.")
    app.run(host="0.0.0.0", port=5000, debug=True)
Contributor


high

Running a Flask application with debug=True in a script that might be deployed is a significant security risk. The debug mode can expose sensitive information and allow arbitrary code execution. It's recommended to disable debug mode by default and make it configurable, for example, through an environment variable or a command-line argument.

    app.run(host="0.0.0.0", port=5000, debug=False)
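A minimal sketch of making debug mode opt-in via an environment variable (the variable name FLASK_DEBUG is illustrative, not something the PR defines):

```python
import os

# Debug stays off unless the caller explicitly opts in, e.g. FLASK_DEBUG=1.
debug = os.environ.get("FLASK_DEBUG", "0") == "1"
app.run(host="0.0.0.0", port=5000, debug=debug)
```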

tqdm==4.67.1
transformers==4.50.0
hydra-core>=1.3.2
torch # load SAM2 pytorch weights
Contributor


high

For reproducibility, it's crucial to pin dependency versions. The torch package is listed without a specific version, which could lead to different behavior or errors if a new version is installed. Please specify a version that is known to work with your project.

torch>=2.1.0 # load SAM2 pytorch weights

Comment on lines +98 to +115
curr_atten = attn_weight[:, :, -image_size:, 512 : 512 * (num_edit_region + 1)].copy()
attn_weight[:, :, -image_size:, 512 : 512 * (num_edit_region + 1)] = mint.where(
    union_mask == 1, curr_atten, curr_atten * (local_t2i_strength)
)
# amplify the attention between the target prompt and the whole image
curr_atten1 = attn_weight[:, :, -image_size:, :512].copy()
attn_weight[:, :, -image_size:, :512] = curr_atten1 * (context_t2i_strength)

for local_mask in local_mask_list:
    # outside the union of masks is 1
    mask1_flat = union_mask.flatten()  # (local_mask).flatten()
    mask1_indices = 512 * (num_edit_region + 1) + mint.nonzero(mask1_flat, as_tuple=True)[0]
    # mask2_flat inside the mask is 1
    mask2_flat = (1 - local_mask).flatten()
    mask2_indices = 512 * (num_edit_region + 1) + mint.nonzero(mask2_flat, as_tuple=True)[0]
    # inside the other masks is 1
    mask3_flat = 1 - mint.logical_or(mask1_flat.bool(), mask2_flat.bool()).int()
    mask3_indices = 512 * (num_edit_region + 1) + mint.nonzero(mask3_flat, as_tuple=True)[0]
Contributor


medium

The magic number 512 is used multiple times in this section. This number likely corresponds to the text embedding dimension. Hardcoding it makes the code harder to understand and maintain. It should be defined as a named constant or passed as a parameter to the function to improve clarity and make it easier to modify if the embedding dimension changes.
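For illustration, a named constant could look like this (TXT_SEQ_LEN is a hypothetical name; ideally the value would be derived from the text encoder's sequence length rather than hardcoded):

```python
TXT_SEQ_LEN = 512  # length of the text token sequence

curr_atten = attn_weight[:, :, -image_size:, TXT_SEQ_LEN : TXT_SEQ_LEN * (num_edit_region + 1)].copy()
mask1_indices = TXT_SEQ_LEN * (num_edit_region + 1) + mint.nonzero(mask1_flat, as_tuple=True)[0]
```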

Comment on lines 215 to 216
def __call__(self, attn, x, pe, **attention_kwargs):
    print("2" * 30)
Contributor


medium

This print statement appears to be a leftover from debugging. Such statements should be removed from the final code to keep the output clean.

Comment on lines +12 to +14
parent_dir = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
sam_dir = os.path.join(parent_dir, "sam2")
sys.path.insert(0, sam_dir)
Contributor


medium

Manipulating sys.path dynamically can lead to fragile and hard-to-maintain code. It makes the script dependent on the directory from which it is run. A more robust approach would be to structure the project as a package and use relative imports, or to set the PYTHONPATH environment variable. This improves code portability and makes dependencies explicit.
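If the path manipulation has to stay for now, anchoring it to the script's own location is at least less fragile than relying on the current working directory; a minimal sketch, assuming sam2 sits one level above this script's directory as in the current layout:

```python
import sys
from pathlib import Path

# Resolve sam2 relative to this file instead of the working directory.
sam_dir = Path(__file__).resolve().parent.parent / "sam2"
sys.path.insert(0, str(sam_dir))
```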

CannyEdit is a novel training-free framework that supports multi-task image editing. It enables high-quality region-specific image edits, which is especially useful in cases where state-of-the-art free-form image editing methods fail to ground edits accurately. In addition, it supports edits on multiple user-specified regions in a single generation pass when multiple masks are given.

<p align="center">
<img src=./assets/page_imgs/grid_image.png width=500 />
Contributor


medium

In HTML, it's a best practice to enclose attribute values in quotes for correctness and readability. The src and width attributes of the <img> tag are missing quotes.

Suggested change
<img src=./assets/page_imgs/grid_image.png width=500 />
<img src="./assets/page_imgs/grid_image.png" width="500" />

parser.add_argument("--num_steps", type=int, default=50, help="The num_steps for diffusion process")
parser.add_argument("--guidance", type=float, default=4, help="The guidance for diffusion process")
parser.add_argument(
    "--seed", type=int, default=random.randint(0, 9999999), help="A seed for reproducible inference"
Contributor


medium

Using a random seed by default makes experiments non-reproducible. For scientific and debugging purposes, it's better to use a fixed default seed (e.g., 42). This ensures that anyone running the script gets the same result. The user can still override it with a specific seed if they want randomness.

Suggested change
"--seed", type=int, default=random.randint(0, 9999999), help="A seed for reproducible inference"
"--seed", type=int, default=42, help="A seed for reproducible inference"

# Invert the mask (object area becomes 0, background becomes 1)
local_mask = 1 - binary_downsampled_mask

# Convert the final mask to a PyTorch tensor
Collaborator


These code comments are inaccurate; please update them to describe what the code actually does.

@wtomin wtomin requested a review from vigo999 October 9, 2025 07:33