Commit 2a81f99

Author: weilong.cwl
Commit message: update segment difference image
1 parent 36e2bb1, commit 2a81f99

File tree: 2 files changed, +17 -7 lines

content/blog/ming-lite-omni-1_5-seg/index.md

Lines changed: 11 additions & 4 deletions
@@ -130,7 +130,12 @@ Against Qwen-Image and Nano Banana, our model:
![Segmentation Comparison 2](https://mdn.alipayobjects.com/huamei_wp0xz6/afts/img/A*yL2MR7vLQdEAAAAAgEAAAAgAevzJAQ/original)
*For the prompt "please segment the girl with red mask," our model (right) is precise. Qwen-Image (second from left) misses the feet, and Nano-banana (third from left) alters the subject's proportions.*

-During evaluation, thanks to the high consistency of non-edited regions in our model, we can directly derive the segmentation mask by calculating the difference between the edited result and the original image. The results show that our model's performance on segmentation is now on par with specialized vision models.
+During evaluation, thanks to the high consistency of non-edited regions in our model, we can directly derive the segmentation mask by calculating the difference between the edited result and the original image.
+
+![Calculating difference on Ming-Lite-Omni1.5, Qwen-Image-Edit, Nano-banana](https://mdn.alipayobjects.com/huamei_wp0xz6/afts/img/A*UJX1RJJpu3cAAAAASyAAAAgAevzJAQ/original)
+
+
+The results show that our model's performance on segmentation is now on par with specialized vision models.

| Model Category | Model Name | RefCOCO (val) | RefCOCO+ (val) | RefCOCOg (val) |
| :--- | :--- | :---: | :---: | :---: |
@@ -140,9 +145,11 @@ During evaluation, thanks to the high consistency of non-edited regions in our m
| | PolyFormer-B | 74.8 | 67.6 | 67.8 |
| **MLLM + Specialist (SAM)** | LISA-7B | 74.1 | 62.4 | 66.4 |
| | PixelLM-7B | 73.0 | 66.3 | 69.3 |
-| **Generative Models** | Qwen-Image-Edit* | 30.3 | 28.8 | 34.0 |
-| | **Ming-Lite-Omni1.5 (Ours)** | **72.4** | **62.8** | **64.3** |
-*<small>Due to its lower metrics, Qwen-Image-Edit was evaluated on a random sample of 500 images per test subset.</small>*
+| **Generative Models** | Nano-banana* | 15.7 | 13.9 | 14.9 |
+| | Qwen-Image-Edit* | 30.3 | 28.8 | 34.0 |
+| | **Ming-Lite-Omni1.5** | **72.4** | **62.8** | **64.3** |
+
+*<small>For each test set, Nano-banana and Qwen-Image-Edit were evaluated on a randomly sampled subset of 500 images to reduce computational cost while preserving the key statistical trends. We observed that Nano-banana frequently fails to grasp the image-segmentation intent during inference, leading to its comparatively lower evaluation metrics. This may be attributed to differences in training objectives and data emphasis.</small>*

#### 2. Sharper, More Controllable Editing

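The paragraph added in this commit derives the segmentation mask by differencing the edited result against the original image. The sketch below illustrates that idea, assuming the mask is recovered by thresholding the per-pixel colour change of a recolour-style edit and then scored with plain IoU; the function names, the threshold value, and the lack of any post-processing (such as morphological clean-up) are illustrative assumptions, not the exact evaluation pipeline used in the blog post.

```python
import numpy as np
from PIL import Image


def mask_from_difference(original_path: str, edited_path: str, threshold: int = 30) -> np.ndarray:
    """Derive a binary segmentation mask from the pixel-level difference between
    the original image and the edited result in which the referred object was recoloured."""
    original = Image.open(original_path).convert("RGB")
    # The generated result may not share the original resolution, so align sizes first.
    edited = Image.open(edited_path).convert("RGB").resize(original.size)
    # Largest per-channel change at each pixel; int16 avoids uint8 wrap-around.
    diff = np.abs(
        np.asarray(edited, dtype=np.int16) - np.asarray(original, dtype=np.int16)
    ).max(axis=-1)
    return diff > threshold  # boolean H x W mask


def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection-over-union between the derived mask and a ground-truth mask."""
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred_mask, gt_mask).sum()) / float(union)


# Example usage (paths are placeholders):
# pred = mask_from_difference("original.jpg", "edited.jpg")
# print(iou(pred, np.load("gt_mask.npy").astype(bool)))
```

In practice, the threshold and any small-component clean-up would be tuned so that the derived masks align with the RefCOCO-style ground-truth annotations before computing the reported metrics.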
content/blog/ming-lite-omni-1_5-seg/index.zh.md

Lines changed: 6 additions & 3 deletions
@@ -78,9 +78,9 @@ show_word_count: true

During evaluation of the referring-segmentation metrics, relying on our model's high consistency in non-edited regions, we obtain the segmentation mask directly by computing the difference between the recolour-edited result and the original image, as illustrated below:

-<!-- ![Ming-Lite-Omni1.5 vs Qwen-Image-Edit difference comparison](placeholder: replace with your figure link here) -->
+![Ming-Lite-Omni1.5, Qwen-Image-Edit, Nano-banana difference comparison](https://mdn.alipayobjects.com/huamei_wp0xz6/afts/img/A*UJX1RJJpu3cAAAAASyAAAAgAevzJAQ/original)
+

-The evaluation results show that our model's segmentation performance has reached a level comparable to models designed specifically for segmentation. Because its evaluation metrics are notably lower, Qwen-Image-Edit was evaluated on only 500 randomly sampled examples per test subset.

| Model Category | Model Name | RefCOCO (val) | RefCOCO+ (val) | RefCOCOg (val) |
| :--- | :--- | :---: | :---: | :---: |
@@ -90,9 +90,12 @@ show_word_count: true
| | PolyFormer-B | 74.8 | 67.6 | 67.8 |
| **MLLM + SAM**<br>(specialist segmentation models) | LISA-7B | 74.1 | 62.4 | 66.4 |
| | PixelLM-7B | 73.0 | 66.3 | 69.3 |
-| **MLLM + DiT**<br>(segmentation with generative models) | Qwen-Image-Edit* | 30.3 | 28.8 | 34.0 |
+| **MLLM + DiT**<br>(segmentation with generative models) | Nano-banana* | 15.7 | 13.9 | 14.9 |
+| | Qwen-Image-Edit* | 30.3 | 28.8 | 34.0 |
| | **Ming-Lite-Omni1.5** | **72.4** | **62.8** | **64.3** |

+The evaluation results show that our model's segmentation performance is close to that of models designed specifically for segmentation. For each test subset, Qwen-Image-Edit and Nano-banana were evaluated on 500 randomly sampled examples to reduce computational cost while keeping the statistical trends stable. During evaluation we found that Nano-banana often fails to accurately grasp the image-segmentation intent, which accounts for its comparatively lower metrics; this may stem from differences in training objectives and data emphasis.
+
### 2. Sharper, More Controllable Editing

The beauty of this approach is that it not only fixes segmentation, a former weak spot, but in turn also greatly strengthens the model's general editing capabilities.
