CannyEdit: inference script and demo #1346
<h1 align="center">CannyEdit: Easy multitask image editing</h1>

<p align="center">
<a href="https://vaynexie.github.io/CannyEdit/">
<img alt="Build" src="https://img.shields.io/badge/Project%20Page-CannyEdit-yellow">
</a>
<a href="https://arxiv.org/abs/2508.06937">
<img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2508.06937-b31b1b.svg">
</a>
</p>

# Overview

This is the official MindSpore implementation of [CannyEdit](https://vaynexie.github.io/CannyEdit/).

CannyEdit is a novel training-free framework for multitask image editing. It enables high-quality, region-specific image edits and is especially useful where state-of-the-art free-form image editing methods fail to ground edits accurately. When multiple masks are given, it can also edit multiple user-specified regions in a single generation pass.

<p align="center">
<img src=./assets/page_imgs/grid_image.png width=500 />
</p>
<p align="center">
<em> Figure 1. Examples of CannyEdit </em>
</p>

## 📦 Requirements

<div align="center">

| MindSpore | Ascend Driver | Firmware | CANN toolkit/kernel |
|:---------:|:-------------:|:-----------:|:-------------------:|
| 2.7.0 | 24.1.RC3 | 7.6.0.1.220 | 8.0.RC3.beta1 |

</div>

1. Install [CANN 8.0.RC3.beta1](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.beta1) and MindSpore according to the [official instructions](https://www.mindspore.cn/install).
2. Install requirements
```shell
pip install -r requirements.txt
```
3. Install mindone
```shell
cd mindone
pip install -e .
```
Try `python -c "import mindone"`. If no error occurs, the installation is successful.

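As a complement to the import check, the following minimal sketch reports which versions of the relevant packages are installed, so they can be compared against the requirements table. It uses only the standard library's `importlib.metadata`. The helper name `installed_versions` is illustrative and not part of the repository, and the distribution names `mindspore` and `mindone` are assumptions that may need adjusting for your environment.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    for name, version in installed_versions(["mindspore", "mindone"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry means the package is not visible to the current interpreter, which usually points to a different virtual environment than the one used for installation.
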
## 🚀 Quick Start
The CannyEdit pipeline consists of three steps:
1. Generate masks (optional; skip if you already have them)
2. Generate prompts (optional; skip if you already have them)
3. Generate the edited image

### Step 1: Generate masks (Optional)
This step requires the model weights of [SAM2](https://github.com/facebookresearch/sam2/). Download them with the script provided in `examples/sam2`.
```bash
cd examples/sam2/checkpoints && \
./download_ckpts.sh
```
The checkpoints are downloaded into `examples/sam2/checkpoints`.

Then set the checkpoint path in the script below and run it to launch the mask generator app.
```bash
cd examples/canny_edit && \
bash run_app_mask.sh
```
Then open http://localhost:5000 in a browser. If the app runs on a remote server and your browser runs on another machine, forward a port from the browser machine to the server:
```bash
ssh -L 8081:localhost:5000 username@ip
```
With this mapping, open http://localhost:8081 on the machine where you ran the `ssh` command.

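If the page does not load, it can help to confirm that the port is reachable before debugging the app itself. The sketch below is a generic TCP reachability check using the standard `socket` module; the function name `port_is_open` is illustrative, and ports 5000 and 8081 are the ones used in the commands above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8081 is the forwarded local port; use 5000 when checking on the server.
    print("mask generator reachable:", port_is_open("localhost", 8081))
```

A `False` result on the browser machine while the server-side check on port 5000 succeeds suggests the SSH tunnel is not running.
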
On the mask generator page, choose the method that matches your editing task:

- Adding task: Circle the target area where you want to add an object or person, then click "Generate Ellipse Mask".
- Replace and removal tasks: Draw a line across the existing object or person, then click "Generate SAM Mask".

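To illustrate what an ellipse mask is, here is a minimal pure-Python sketch that fills an ellipse inscribed in a bounding box. It is not the app's implementation: the function `ellipse_mask`, its `box` parameter, and the convention that 255 marks the region to edit are all assumptions made for illustration.

```python
def ellipse_mask(height, width, box):
    """Return a height x width grid of 0/255 values with a filled ellipse.

    box is (left, top, right, bottom) in pixel coordinates; the ellipse is
    inscribed in that box.
    """
    left, top, right, bottom = box
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0  # ellipse center
    rx, ry = (right - left) / 2.0, (bottom - top) / 2.0  # semi-axes
    if rx <= 0 or ry <= 0:
        raise ValueError("box must have positive width and height")
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Standard ellipse equation: points with value <= 1 lie inside.
            inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
            row.append(255 if inside else 0)
        mask.append(row)
    return mask
```

In practice the bounding box would come from the extent of the circled stroke, and the grid would be saved as a single-channel image.
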
### Step 2: Generate prompts (Optional)
`main.py` checks whether the source prompt (for the input image) or the target prompt (for the edited image) is missing. If so, it calls a vision-language model (VLM) to generate the missing prompts. Here we use [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct).

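The fallback described above can be sketched as follows. This is an illustration of the control flow only, not the code in `main.py`: `resolve_prompts` and `generate_prompt` are hypothetical names, and the VLM call is replaced by a stub.

```python
def resolve_prompts(prompt_source, prompt_target, generate_prompt):
    """Fill in missing prompts by calling generate_prompt(kind).

    generate_prompt stands in for a VLM call; kind is "source" or "target".
    """
    if not prompt_source:
        prompt_source = generate_prompt("source")
    if not prompt_target:
        prompt_target = generate_prompt("target")
    return prompt_source, prompt_target


if __name__ == "__main__":
    def stub_vlm(kind):
        # Placeholder for a real VLM; returns a marker string instead.
        return f"<generated {kind} prompt>"

    print(resolve_prompts("A young girl with red hair.", None, stub_vlm))
```

Prompts supplied by the user are passed through unchanged, so the VLM is only invoked when something is actually missing.
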
### Step 3: Generate edited image
Several examples are listed in `run_infer.sh`. Uncomment one of them to generate the corresponding case.
```bash
bash run_infer.sh
```
Example outputs for each test case:

- **case 1: Replace background with mountains**
```bash
python main.py \
--image_path './assets/imgs/girl33.jpeg' \
--image_whratio_unchange \
--save_folder './results/' \
--prompt_local "A mountain." \
--prompt_source "A young girl with red hair smiles brightly, wearing a red and white checkered shirt." \
--prompt_target "A young girl with red hair smiles brightly, wearing a red and white checkered shirt, sitting on a bench with mountains in the background." \
--mask_path "./assets/mask_temp/mask_209_inverse.png"
```

<div align="center">
<img src=./assets/imgs/girl33.jpeg width="240" height="318" />

<img src=./assets/mask_temp/mask_209_inverse.png width="240" height="318" />

<img src=./assets/example_results/result_338.png width="240" height="318" />

</div>
<p align="center">
<em> From left to right: original image, mask image, and generated edited image. </em>
</p>

- **case 2: Replace the girl with a boy**
```bash
python main.py \
--image_path './assets/imgs/girl33.jpeg' \
--image_whratio_unchange \
--save_folder './results/' \
--prompt_local "A boy smiling." \
--prompt_source "A young girl with red hair smiles brightly, wearing a red and white checkered shirt." \
--prompt_target "A young boy with red hair smiles brightly, wearing a red and white checkered shirt." \
--mask_path "./assets/mask_temp/mask_208.png"
```
<div align="center">
<img src=./assets/imgs/girl33.jpeg width="240" height="318" />

<img src=./assets/mask_temp/mask_208.png width="240" height="318" />

<img src=./assets/example_results/result_339.png width="240" height="318" />

</div>
<p align="center">
<em> From left to right: original image, mask image, and generated edited image. </em>
</p>

- **case 3: Add a monkey**
```bash
python main.py \
--image_path './assets/imgs/girl33.jpeg' \
--image_whratio_unchange \
--save_folder './results/' \
--prompt_local "A monkey playing." \
--prompt_source "A young girl with red hair smiles brightly, wearing a red and white checkered shirt." \
--prompt_target "A young girl with red hair smiles brightly, wearing a red and white checkered shirt, a monkey playing nearby." \
--mask_path "./assets/mask_temp/mask_213.png"
```
<div align="center">
<img src=./assets/imgs/girl33.jpeg width="240" height="318" />

<img src=./assets/mask_temp/mask_213.png width="240" height="318" />

<img src=./assets/example_results/result_346.png width="240" height="318" />

</div>
<p align="center">
<em> From left to right: original image, mask image, and generated edited image. </em>
</p>

- **case 4: Remove the girl**
```bash
python main.py \
--image_path './assets/imgs/girl33.jpeg' \
--image_whratio_unchange \
--save_folder './results/' \
--prompt_local '[remove]' \
--mask_path "./assets/mask_temp/mask_208.png" \
--dilate_mask
```
<div align="center">
<img src=./assets/imgs/girl33.jpeg width="240" height="318" />

<img src=./assets/mask_temp/mask_208.png width="240" height="318" />

<img src=./assets/example_results/result_800.png width="240" height="318" />

</div>
<p align="center">
<em> From left to right: original image, mask image, and generated edited image. </em>
</p>

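The removal case passes `--dilate_mask`. The README does not describe how the flag is implemented, so the following is only a generic sketch of binary dilation, which grows a mask outward by a few pixels so that the boundary of the object is covered as well. The function `dilate`, its `radius` parameter, and the square window are assumptions for illustration.

```python
def dilate(mask, radius=1):
    """Binary dilation of a 0/255 grid with a (2*radius+1)-wide square window."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                # Mark every pixel in the window around a foreground pixel.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 255
    return out
```

A single foreground pixel dilated with `radius=1` becomes a 3x3 block; larger radii extend the margin around the object accordingly.
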
- **case 5: Replace the girl with a boy + add a monkey**
```bash
python main.py \
--image_path './assets/imgs/girl33.jpeg' \
--image_whratio_unchange \
--save_folder './results/' \
--prompt_source "A young girl with red hair smiles brightly, wearing a red and white checkered shirt." \
--prompt_local "A boy smiling." \
--prompt_local "A monkey playing." \
--mask_path "./assets/mask_temp/mask_208.png" \
--mask_path "./assets/mask_temp/mask_215.png" \
--prompt_target "A young boy wearing a red and white checkered shirt, a monkey playing nearby."
```
<div align="center">
<img src=./assets/imgs/girl33.jpeg width="200" height="265" />

<img src=./assets/mask_temp/mask_208.png width="200" height="265" />
<img src=./assets/mask_temp/mask_215.png width="200" height="265" />

<img src=./assets/example_results/result_345.png width="200" height="265" />

</div>
<p align="center">
<em> From left to right: original image, two mask images, and generated edited image. </em>
</p>

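When scripting several edits, the command lines above can be assembled programmatically. The sketch below builds the argument list for `main.py` using only the flags that appear in the examples. The helper `build_command` is hypothetical, and the assumption that the i-th `--prompt_local` pairs with the i-th `--mask_path` is inferred from the ordering in case 5 rather than stated by the README.

```python
import shlex


def build_command(image_path, edits, prompt_source=None, prompt_target=None,
                  save_folder="./results/", dilate_mask=False):
    """Build the argv list for main.py; edits is a list of (prompt_local, mask_path)."""
    argv = ["python", "main.py",
            "--image_path", image_path,
            "--image_whratio_unchange",
            "--save_folder", save_folder]
    if prompt_source:
        argv += ["--prompt_source", prompt_source]
    # Repeat the flags once per region, keeping prompts and masks in the same order.
    for prompt_local, _ in edits:
        argv += ["--prompt_local", prompt_local]
    for _, mask_path in edits:
        argv += ["--mask_path", mask_path]
    if prompt_target:
        argv += ["--prompt_target", prompt_target]
    if dilate_mask:
        argv.append("--dilate_mask")
    return argv


if __name__ == "__main__":
    argv = build_command(
        "./assets/imgs/girl33.jpeg",
        [("A boy smiling.", "./assets/mask_temp/mask_208.png"),
         ("A monkey playing.", "./assets/mask_temp/mask_215.png")],
        prompt_source="A young girl with red hair smiles brightly, "
                      "wearing a red and white checkered shirt.",
        prompt_target="A young boy wearing a red and white checkered shirt, "
                      "a monkey playing nearby.",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with prompts that contain commas or quotes.
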
## Performance

Experiments were run on Ascend Atlas 800T A2 machines in PyNative mode.

- MindSpore 2.7.0

| model | cards | resolution | task | steps | s/Step | s/Image |
|------------|-------|------------|----------------|-------|--------------|---------------|
| CannyEdit | 1 | 768x768 | Replace | 50 | 6.12 | 306 |
| CannyEdit | 1 | 768x768 | Add | 50 | 1.96 | 98 |
| CannyEdit | 1 | 768x768 | Removal | 50 | 6.6 | 330 |
| CannyEdit | 1 | 768x768 | Replace + Add | 50 | 5.7 | 285 |

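The s/Image column is consistent with steps multiplied by s/Step in every row. The short check below reproduces that arithmetic from the table values; the helper name `seconds_per_image` is illustrative, and the calculation ignores anything outside the sampling loop, such as model loading or VLM prompt generation.

```python
# (task, steps, seconds per step, reported seconds per image) from the table.
rows = [
    ("Replace", 50, 6.12, 306),
    ("Add", 50, 1.96, 98),
    ("Removal", 50, 6.6, 330),
    ("Replace + Add", 50, 5.7, 285),
]


def seconds_per_image(steps, seconds_per_step):
    """Total sampling time for one image: number of steps times time per step."""
    return steps * seconds_per_step


for task, steps, s_step, reported in rows:
    computed = seconds_per_image(steps, s_step)
    print(f"{task}: {computed:.0f} s (reported {reported} s)")
```
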
## Acknowledgement
The codebase is adapted from [x-flux](https://github.com/XLabs-AI/x-flux).