*Our model (right) accurately locates and segments the target subject. Qwen-Image (second from left) fails to locate the correct target, while Nano-banana (third from left) fails to accurately segment the man's head and leaves loose, imprecise boundaries.*
*For the prompt "please segment the girl with red mask," our model (right) is precise. Qwen-Image (second from left) misses the feet, and Nano-banana (third from left) alters the subject's proportions.*
During evaluation, because our model keeps non-edited regions highly consistent, we can derive the segmentation mask directly by differencing the edited result against the original image. The results show that our model's segmentation performance is now on par with specialized vision models.
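The differencing step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual pipeline; the function name, the channel-wise max reduction, and the threshold value of 16 are all assumptions for the sake of the example.

```python
import numpy as np

def mask_from_edit(original: np.ndarray, edited: np.ndarray,
                   threshold: int = 16) -> np.ndarray:
    """Return a boolean mask of pixels the edit changed.

    original, edited: HxWx3 uint8 arrays of the same size.
    threshold: per-channel intensity change (illustrative value).
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(original.astype(np.int16) - edited.astype(np.int16))
    # A pixel belongs to the mask if any channel moved past the threshold.
    return diff.max(axis=-1) > threshold

# Toy usage: a 4x4 black image where a 2x2 patch is recolored white.
orig = np.zeros((4, 4, 3), dtype=np.uint8)
edit = orig.copy()
edit[1:3, 1:3] = 255  # the "edited" region
mask = mask_from_edit(orig, edit)
print(int(mask.sum()))  # 4 changed pixels
```

In practice a real evaluation would also need some cleanup (e.g. morphological opening) to suppress pixel-level noise, since generative models rarely reproduce untouched regions bit-exactly.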
The beauty of this method is that it not only fixed the segmentation weakness but also lifted the model's general editing ability.
Because the model has learned an unprecedented "respect for boundaries" through thousands of "precise coloring" exercises, this "muscle memory" for fine-grained control has transferred to all editing tasks. Our edit controllability score saw a significant jump from **7.69 to 8.12** across sub-tasks like background, color, and material changes.
*Prompt: "remove the bow tie of the man on the far right." Our model (right) precisely removes only the target bow tie while maintaining background consistency. Qwen (second from left) incorrectly removes multiple bow ties and introduces inconsistencies. Nano-banana (third from left) also struggles with consistency.*
#### 3. Stronger ID Consistency
A core challenge in portrait editing is maintaining identity. Our model excels here as well. Whether changing a hairstyle or adjusting an expression, the model skillfully preserves the person's core features.
*Top Row (Turn head): Our model (right) maintains ID and background consistency, unlike competitors. Middle Row (Smile): Our model (right) correctly follows the prompt while preserving ID, avoiding distortions seen in others. Bottom Row (Change background): Our model (right) excels at preserving the subject's ID and appearance during a background swap.*
We suspect 3D understanding, video generation, and other domains have their own hidden synergies of this kind waiting to be discovered.
**And this is only the overture.**
Try out our open-source model **Ming-lite-omni 1.5** on our [**GitHub Page / Demo Page**](https://github.com/inclusionAI/Ming/blob/main/cookbook.ipynb). Please star our repo if you like it!
<!-- ---