CannyEdit: inference script and demo #1346
base: master
Summary of Changes

Hello @wtomin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the CannyEdit feature, a training-free framework for versatile image editing. It includes the core inference script, a web-based demo for mask generation, and comprehensive documentation. The system supports region-specific and multi-region image edits, with capabilities for automated prompt generation and advanced attention mechanisms for precise control.
Code Review
This pull request adds the inference script and demo for CannyEdit, a new image editing method. The scope of the changes is large, introducing a complete example with a web UI, inference logic, and model definitions. The code is generally well-structured, but there are several critical issues related to correctness and maintainability that need to be addressed. Key areas for improvement include fixing a major logic flaw in the main inference script, resolving `TypeError` bugs in model layers, improving the robustness of the web application, and addressing significant code duplication. I've provided detailed comments and suggestions to resolve these issues.

    print("Running CannyEdit")
    # Stage 1: Generation
    stage1 = "stage_removal"
    result = cannyedit_pipeline(
        prompt_source=args.prompt_source,
        prompt_local1=args.prompt_local[0],
        prompt_target=args.prompt_target,
        prompt_local_addition=args.prompt_local[1:],
        controlnet_image=image,
        local_mask=local_mask,
        local_mask_addition=local_mask_addition,
        width=args.width,
        height=args.height,
        guidance=args.guidance,
        num_steps=args.num_steps,
        seed=args.seed,
        true_gs=args.true_gs,
        control_weight=args.control_weight,
        control_weight2=args.control_weight2,
        neg_prompt=args.neg_prompt,
        # removal_add
        neg_prompt2=args.neg_prompt2,
        timestep_to_start_cfg=args.timestep_to_start_cfg,
        stage=stage1,
        generate_save_path=args.generate_save_path,
        inversion_save_path=args.inversion_save_path,
    )

    # Save the edited image
    if not os.path.exists(args.save_folder):
        os.mkdir(args.save_folder)
    ind = len(os.listdir(args.save_folder))
    result_save_path = os.path.join(args.save_folder, f"result_{ind}.png")
    result.save(result_save_path)

    if removal_flag is False:
        # Stage 1: Generation
        stage1 = "stage_generate"
        print("Running CannyEdit")
        result = cannyedit_pipeline(
            prompt_source=args.prompt_source,
            prompt_local1=args.prompt_local[0],
            prompt_target=args.prompt_target,
            prompt_local_addition=args.prompt_local[1:],
            controlnet_image=image,
            local_mask=local_mask,
            local_mask_addition=local_mask_addition,
            width=args.width,
            height=args.height,
            guidance=args.guidance,
            num_steps=args.num_steps,
            seed=args.seed,
            true_gs=args.true_gs,
            control_weight=args.control_weight,
            control_weight2=args.control_weight2,
            neg_prompt=args.neg_prompt,
            neg_prompt2=args.neg_prompt2,
            timestep_to_start_cfg=args.timestep_to_start_cfg,
            stage=stage1,
            generate_save_path=args.generate_save_path,
            inversion_save_path=args.inversion_save_path,
        )

        # Save the edited image
        if not os.path.exists(args.save_folder):
            os.mkdir(args.save_folder)
        ind = len(os.listdir(args.save_folder))
        result_save_path = os.path.join(args.save_folder, f"result_{ind}.png")
        result.save(result_save_path)

There is a critical logic flaw in how the `cannyedit_pipeline` is executed. The current structure can lead to the pipeline being run twice or not at all, depending on the input arguments.

Specifically:
- If prompts are not provided (`args.prompt_source` is `None`), the pipeline is first executed with `stage1 = "stage_removal"` (lines 382-413). Then, if `removal_flag` is `False`, it runs again with `stage1 = "stage_generate"` (lines 417-448).
- If prompts are provided and `removal_flag` is `True`, the pipeline is never executed.

This leads to incorrect behavior and wasted computation. The logic should be refactored to ensure the pipeline is called only once with the correct stage.

I suggest restructuring the `main` function to:
- Determine the `stage` based on `removal_flag`.
- Generate prompts if they are not provided.
- Execute the pipeline a single time with the correct parameters.

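
The three steps suggested above could look roughly like the sketch below. It is not the PR's code: `run_edit`, `select_stage`, and the two stub callables are hypothetical names, and the pipeline is reduced to the three arguments that matter for the control flow. The point it illustrates is that the stage is derived once from `removal_flag`, missing prompts are filled in before the call, and there is a single call site.

```python
from types import SimpleNamespace


def select_stage(removal_flag: bool) -> str:
    """Map the removal flag to exactly one pipeline stage."""
    return "stage_removal" if removal_flag else "stage_generate"


def run_edit(args, removal_flag, pipeline, generate_prompts):
    """Run the pipeline exactly once, whatever the inputs.

    `pipeline` and `generate_prompts` are hypothetical callables standing in
    for `cannyedit_pipeline` and the PR's prompt-generation step.
    """
    # 1. Pick the stage from the flag alone.
    stage = select_stage(removal_flag)

    # 2. Fill in missing prompts before the pipeline runs.
    if args.prompt_source is None:
        args.prompt_source, args.prompt_target = generate_prompts(args)

    # 3. Single call site, so the pipeline can neither run twice nor be skipped.
    return pipeline(
        prompt_source=args.prompt_source,
        prompt_target=args.prompt_target,
        stage=stage,
    )


# Usage with stub callables that only record which stage was requested.
calls = []


def fake_pipeline(**kwargs):
    calls.append(kwargs["stage"])
    return kwargs


def fake_prompts(args):
    return "a source caption", "a target caption"


for has_prompts in (True, False):
    for removal in (True, False):
        args = SimpleNamespace(
            prompt_source="src" if has_prompts else None,
            prompt_target="tgt" if has_prompts else None,
        )
        run_edit(args, removal, fake_pipeline, fake_prompts)
```

With this shape, all four combinations of "prompts given or not" and "removal or not" trigger one pipeline call each, which covers both failure cases described in the comment.
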
    ) -> Tensor:
        if image_proj is None:
            return self.processor(self, x, vec, pe, attention_kwargs)
        else:
            return self.processor(self, x, vec, pe, image_proj, ip_scale)

Similar to the `DoubleStreamBlock`, there is a `TypeError` here. The `construct` method of `SingleStreamBlock` calls its processor with `image_proj` and `ip_scale`, but the `__call__` method of `SingleStreamBlockProcessor` does not accept these arguments. The signature of `SingleStreamBlockProcessor.__call__` must be updated to resolve this.

    if image_proj is None:
        return self.processor(self, img, txt, vec, pe, attention_kwargs)
    else:
        return self.processor(self, img, txt, vec, pe, image_proj, ip_scale)

There is a `TypeError` here. The `construct` method of `DoubleStreamBlock` calls `self.processor` (an instance of `DoubleStreamBlockProcessor`) with `image_proj` and `ip_scale` arguments when `image_proj` is not `None`. However, the `__call__` method of `DoubleStreamBlockProcessor` does not accept these arguments. The signature of `DoubleStreamBlockProcessor.__call__` needs to be updated to accept `image_proj` and `ip_scale` to fix this bug.

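
One possible shape for the fix is sketched below with toy classes. `ToyDoubleStreamProcessor` and `ToyDoubleStreamBlock` are hypothetical stand-ins, not the PR's MindSpore classes, and the processor body returns a plain dict instead of doing attention. The sketch assumes `image_proj` and `ip_scale` are added as optional keyword parameters, so the block can use one call for both cases.

```python
class ToyDoubleStreamProcessor:
    """Toy stand-in for DoubleStreamBlockProcessor (hypothetical, no real attention)."""

    def __call__(self, attn, img, txt, vec, pe, image_proj=None, ip_scale=1.0, **attention_kwargs):
        # The optional parameters make both call styles valid.
        used_ip = image_proj is not None
        return {
            "used_ip": used_ip,
            "ip_scale": ip_scale if used_ip else None,
            "extra": attention_kwargs,
        }


class ToyDoubleStreamBlock:
    """Toy stand-in for DoubleStreamBlock (hypothetical)."""

    def __init__(self):
        self.processor = ToyDoubleStreamProcessor()

    def construct(self, img, txt, vec, pe, image_proj=None, ip_scale=1.0, attention_kwargs=None):
        # One call site for both cases; keyword arguments keep the mapping unambiguous.
        return self.processor(
            self,
            img,
            txt,
            vec,
            pe,
            image_proj=image_proj,
            ip_scale=ip_scale,
            **(attention_kwargs or {}),
        )


block = ToyDoubleStreamBlock()
plain = block.construct("img", "txt", "vec", "pe")
with_ip = block.construct("img", "txt", "vec", "pe", image_proj="proj", ip_scale=0.5)
```

Passing the extra inputs by keyword also means `attention_kwargs` is no longer dropped when `image_proj` is set, which differs from the quoted code and would need to be confirmed as intended. The same pattern would apply to the single-stream block.
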
        SAM_AVAILABLE = False
    else:
        print("SAM2.1 is not available. Please install segment-anything package.")
    app.run(host="0.0.0.0", port=5000, debug=True)

Running a Flask application with `debug=True` in a script that might be deployed is a significant security risk. The debug mode can expose sensitive information and allow arbitrary code execution. It's recommended to disable debug mode by default and make it configurable, for example, through an environment variable or a command-line argument.

Suggested change:

    app.run(host="0.0.0.0", port=5000, debug=False)

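
A minimal sketch of the environment-variable option mentioned in the comment follows. The variable name `CANNYEDIT_DEBUG` and the helper `debug_enabled` are assumptions made for illustration and do not appear in the PR. Debug mode stays off unless the variable is explicitly set to a truthy value.

```python
import os

# Values treated as "on"; anything else, including an unset variable, means off.
TRUE_VALUES = {"1", "true", "yes", "on"}


def debug_enabled(environ=None) -> bool:
    """Return True only when CANNYEDIT_DEBUG (hypothetical name) is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get("CANNYEDIT_DEBUG", "").strip().lower() in TRUE_VALUES


# In the demo script this would feed the existing call, for example:
# app.run(host="0.0.0.0", port=5000, debug=debug_enabled())
```

Taking the mapping as a parameter keeps the helper easy to test without touching the real process environment.
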
    tqdm==4.67.1
    transformers==4.50.0
    hydra-core>=1.3.2
    torch # load SAM2 pytorch weights

For reproducibility, it's crucial to pin dependency versions. The `torch` package is listed without a specific version, which could lead to different behavior or errors if a new version is installed. Please specify a version that is known to work with your project.

Suggested change:

    torch>=2.1.0 # load SAM2 pytorch weights

    curr_atten = attn_weight[:, :, -image_size:, 512 : 512 * (num_edit_region + 1)].copy()
    attn_weight[:, :, -image_size:, 512 : 512 * (num_edit_region + 1)] = mint.where(
        union_mask == 1, curr_atten, curr_atten * (local_t2i_strength)
    )
    # amplify the attention between the target prompt and the whole image
    curr_atten1 = attn_weight[:, :, -image_size:, :512].copy()
    attn_weight[:, :, -image_size:, :512] = curr_atten1 * (context_t2i_strength)

    for local_mask in local_mask_list:
        # outside the union of masks is 1
        mask1_flat = union_mask.flatten()  # (local_mask).flatten()
        mask1_indices = 512 * (num_edit_region + 1) + mint.nonzero(mask1_flat, as_tuple=True)[0]
        # mask2_flat inside the mask is 1
        mask2_flat = (1 - local_mask).flatten()
        mask2_indices = 512 * (num_edit_region + 1) + mint.nonzero(mask2_flat, as_tuple=True)[0]
        # inside the other masks is 1
        mask3_flat = 1 - mint.logical_or(mask1_flat.bool(), mask2_flat.bool()).int()
        mask3_indices = 512 * (num_edit_region + 1) + mint.nonzero(mask3_flat, as_tuple=True)[0]

The magic number `512` is used multiple times in this section. Because it slices the last (key) axis of `attn_weight`, it most likely corresponds to the number of text tokens reserved for each prompt rather than to an embedding dimension. Hardcoding it makes the code harder to understand and maintain. It should be defined as a named constant or passed as a parameter to the function to improve clarity and make it easier to modify if the text sequence length changes.

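
As a rough illustration of the named-constant suggestion, the index arithmetic from the quoted snippet can be moved into small helpers. The names `TXT_SEQ_LEN`, `target_prompt_slice`, `local_prompt_slice`, and `image_token_offset` are hypothetical, and the sketch assumes that `512` is the width of each prompt's block along the last axis of `attn_weight`.

```python
# Assumed meaning of the literal 512: width of one prompt's block on the key axis.
TXT_SEQ_LEN = 512


def target_prompt_slice(txt_seq_len: int = TXT_SEQ_LEN) -> slice:
    """Key positions of the target prompt (the first text block)."""
    return slice(0, txt_seq_len)


def local_prompt_slice(num_edit_region: int, txt_seq_len: int = TXT_SEQ_LEN) -> slice:
    """Key positions of all local prompts, which follow the target prompt."""
    return slice(txt_seq_len, txt_seq_len * (num_edit_region + 1))


def image_token_offset(num_edit_region: int, txt_seq_len: int = TXT_SEQ_LEN) -> int:
    """Index of the first image token, after the target and local prompt blocks."""
    return txt_seq_len * (num_edit_region + 1)


# Two edit regions: one target prompt block plus two local prompt blocks.
local = local_prompt_slice(2)
offset = image_token_offset(2)
```

The quoted expressions would then read as `attn_weight[:, :, -image_size:, local_prompt_slice(num_edit_region)]` and `image_token_offset(num_edit_region) + mint.nonzero(...)[0]`, with the sequence length changed in one place.
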
    def __call__(self, attn, x, pe, **attention_kwargs):
        print("2" * 30)

    parent_dir = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
    sam_dir = os.path.join(parent_dir, "sam2")
    sys.path.insert(0, sam_dir)

Manipulating `sys.path` dynamically can lead to fragile and hard-to-maintain code. It makes the script dependent on the directory from which it is run. A more robust approach would be to structure the project as a package and use relative imports, or to set the `PYTHONPATH` environment variable. This improves code portability and makes dependencies explicit.

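
If packaging or `PYTHONPATH` is not practical right away, a lighter interim option (a different technique from the two named above) is to resolve the directory from the script's own location instead of the current working directory. The sketch below assumes the `sam2` directory sits next to the script's parent directory; `sam2_dir_for` and `add_to_sys_path` are hypothetical helper names.

```python
import sys
from pathlib import Path


def sam2_dir_for(script_path) -> Path:
    """Return <script's grandparent>/sam2, independent of the working directory."""
    return Path(script_path).resolve().parent.parent / "sam2"


def add_to_sys_path(directory: Path) -> None:
    """Prepend the directory to sys.path once, so repeated calls do not duplicate it."""
    entry = str(directory)
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In the script itself this would be called as:
# add_to_sys_path(sam2_dir_for(__file__))
```

This still edits `sys.path`, so it only removes the dependence on where the script is launched from; it does not make the dependency explicit the way a package layout would.
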
    CannyEdit is a novel training-free framework to support multitask image editing. It enables high-quality region-specific image edits, especially useful in cases where SOTA free-form image editing methods fail to ground edits accurately. Besides, it can support edits on multiple user-specific regions at one generation pass when multiple masks are given.

    <p align="center">
    <img src=./assets/page_imgs/grid_image.png width=500 />

    parser.add_argument("--num_steps", type=int, default=50, help="The num_steps for diffusion process")
    parser.add_argument("--guidance", type=float, default=4, help="The guidance for diffusion process")
    parser.add_argument(
        "--seed", type=int, default=random.randint(0, 9999999), help="A seed for reproducible inference"

Using a random seed by default makes experiments non-reproducible. For scientific and debugging purposes, it's better to use a fixed default seed (e.g., `42`). This ensures that anyone running the script gets the same result. The user can still override it with a specific seed if they want randomness.

Suggested change:

    -    "--seed", type=int, default=random.randint(0, 9999999), help="A seed for reproducible inference"
    +    "--seed", type=int, default=42, help="A seed for reproducible inference"

examples/canny_edit/src/util.py
Outdated
    # Invert the mask (object area becomes 0, background becomes 1)
    local_mask = 1 - binary_downsampled_mask

    # Convert the final mask to a PyTorch tensor

wrong comments
What does this PR do?
Adds # (feature)
The inference script, document, and demo for CannyEdit, a training-free method for versatile image editing tasks.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@SamitHuang @vigo999