Conversation

@Fzilan Fzilan commented Oct 11, 2025

Add HiDream pipelines and the required modules, comparable with diffusers v0.34. There are currently OOM issues.

@Fzilan Fzilan requested a review from vigo999 as a code owner October 11, 2025 08:21

Summary of Changes

Hello @Fzilan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the mindone.diffusers library by integrating the innovative HiDream-I1 image generation model and its corresponding pipeline. The core of this addition is the HiDreamImageTransformer2DModel, which features a sophisticated Mixture-of-Experts architecture for enhanced performance and leverages multiple text encoders to process diverse textual inputs. The accompanying HiDreamImagePipeline provides a streamlined interface for users to generate high-quality images from text prompts, complete with robust input handling, detailed documentation, and thorough testing to ensure reliability.

Highlights

  • New HiDream-I1 Model Integration: Introduces the HiDreamImageTransformer2DModel, a novel 2D image transformer model, into the library, designed for advanced image generation tasks.
  • Mixture-of-Experts (MoE) Architecture: The HiDreamImageTransformer2DModel incorporates a Mixture-of-Experts (MoE) gating mechanism within its feed-forward networks, which can lead to more efficient and potentially higher-quality processing by selectively activating specialized sub-networks (a minimal gating sketch follows this list).
  • Multi-modal Pipeline with Multiple Text Encoders: A new HiDreamImagePipeline is added, capable of leveraging multiple text encoders (specifically CLIP, T5, and LlamaForCausalLM) to process rich and diverse textual inputs for enhanced text-to-image generation.
  • Comprehensive Documentation and Testing: Dedicated documentation files have been created for both the HiDreamImageTransformer2DModel and HiDreamImagePipeline, alongside new unit tests to ensure the correctness, numerical consistency, and reliability of these new components across various data types.
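
To make the MoE gating idea concrete, here is a minimal top-k gating sketch in MindSpore. It is illustrative only: the layer sizes, num_experts, and top_k values are assumptions rather than HiDream's actual configuration, and the dense per-expert loop trades efficiency for readability.

from mindspore import mint, nn

class TinyMoEFeedForward(nn.Cell):
    def __init__(self, dim, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Dense(dim, num_experts)  # one score per expert
        self.experts = nn.CellList([nn.Dense(dim, dim) for _ in range(num_experts)])

    def construct(self, x):  # x: (num_tokens, dim)
        gate = mint.nn.functional.softmax(self.router(x), dim=-1)
        weights, idx = mint.topk(gate, self.top_k)  # keep each token's top-k experts
        out = mint.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                # select tokens whose k-th choice is expert e, weighted by the gate score
                mask = (idx[:, k] == e).astype(x.dtype).unsqueeze(-1)
                out = out + mask * weights[:, k : k + 1] * expert(x)
        return out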

@Fzilan Fzilan changed the title from Diffusers0.34 dev to feat(diffusers/pipelines): add hidream on Oct 11, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the HiDream-I1 model by introducing the HiDreamImageTransformer2DModel and HiDreamImagePipeline. The changes include the core model and pipeline implementations, along with corresponding documentation and tests. Overall, the implementation looks solid, but I've identified a critical bug in the model's initialization that would prevent it from running. Additionally, there are a few areas with leftover TODO/FIXME comments, and some minor typos in documentation and test files that should be addressed to improve code quality and maintainability.


self.gradient_checkpointing = False

self.patch_size = self.patch_size

critical

This line self.patch_size = self.patch_size is a self-assignment that will raise an AttributeError at runtime because self.patch_size has not been initialized on the instance yet. This is a critical bug that will prevent the model from being instantiated.

Based on the surrounding code, it seems the intention was to assign the value from the configuration. It should be self.patch_size = self.config.patch_size.

Suggested change
self.patch_size = self.patch_size
self.patch_size = self.config.patch_size
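
A standalone repro of the failure mode (assuming no __getattr__ fallback supplies the attribute, as the review above implies):

class Model:
    def __init__(self):
        # the right-hand side is evaluated first, before the attribute exists
        self.patch_size = self.patch_size

Model()  # AttributeError: 'Model' object has no attribute 'patch_size'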

Comment on lines 454 to 456
# FIXME: mindspore lacks tensor.scatter_reduce_
# expert_cache.scatter_reduce_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out, reduce="sum")
expert_cache.scatter_add_(0, exp_token_idx.view(-1, 1).repeat(1, x.shape[-1]), expert_out)

high

A FIXME comment indicates that mindspore lacks tensor.scatter_reduce_. The code uses scatter_add_ as a workaround. This could lead to incorrect behavior if the indices in exp_token_idx are not unique for each operation, potentially affecting model correctness. Please verify if scatter_add_ is a safe replacement here. If it is, the comment should be updated to reflect that. If not, a proper implementation of scatter_reduce with sum reduction should be considered.
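
For reference, with reduce="sum" and the default include_self=True, PyTorch's scatter_reduce_ accumulates into existing values exactly like scatter_add_, even when indices collide, so the workaround should match the commented-out reference numerically. A quick sanity check against the PyTorch reference semantics (shapes are arbitrary):

import torch

# duplicate destination indices on purpose: both ops must sum the colliding rows
idx = torch.tensor([[0], [2], [0]]).repeat(1, 4)
src = torch.ones(3, 4)

a = torch.zeros(4, 4)
a.scatter_reduce_(0, idx, src, reduce="sum")  # include_self=True by default

b = torch.zeros(4, 4)
b.scatter_add_(0, idx, src)

assert torch.equal(a, b)  # identical results for the "sum" reduction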

Comment on lines +1 to +13
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

medium

The license header in this new documentation file uses a mix of HTML comments (<!-- ... -->) and hash comments (#). This is inconsistent with other documentation files and looks like a copy-paste error. For better maintainability and consistency, please use only HTML comments for the entire license block.

Suggested change
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. -->

Comment on lines +98 to +100
# TODO check here, notes: hf use ms.float32 if npu else
# dtype = ms.float32 if (is_mps or is_npu) else ms.float64
dtype = ms.float32

medium

There's a leftover TODO comment and a hardcoded dtype. The comment suggests that the dtype should be conditionally set based on the hardware (e.g., NPU), but it's currently hardcoded to ms.float32. This should be resolved to ensure correct behavior on different hardware platforms.
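
A hedged sketch of one way the TODO could be resolved, mirroring the commented-out upstream logic; the Ascend check shown is an assumption about how this codebase would detect an NPU:

import mindspore as ms

# float32 on backends without reliable float64 support (e.g. Ascend/NPU),
# float64 elsewhere, matching the upstream diffusers behavior
is_npu = ms.get_context("device_target") == "Ascend"
dtype = ms.float32 if is_npu else ms.float64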

>>> import mindspore
>>> import numpy as np
>>> from transformers import AutoTokenizer
>>> form mindone.transformers import LlamaForCausalLM

medium

There is a typo in the example code within the docstring. form should be from. This will cause an error for users who copy and paste the example.

Suggested change
>>> form mindone.transformers import LlamaForCausalLM
>>> from mindone.transformers import LlamaForCausalLM
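
For context, a hedged sketch of how the corrected import fits into a full pipeline setup; the checkpoint IDs follow the upstream diffusers HiDream example, and the output indexing assumes mindone's default tuple return, both of which may differ in this PR:

import mindspore as ms
from transformers import AutoTokenizer
from mindone.transformers import LlamaForCausalLM
from mindone.diffusers import HiDreamImagePipeline

tokenizer_4 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
text_encoder_4 = LlamaForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", mindspore_dtype=ms.bfloat16
)

pipe = HiDreamImagePipeline.from_pretrained(
    "HiDream-ai/HiDream-I1-Full",
    tokenizer_4=tokenizer_4,
    text_encoder_4=text_encoder_4,
    mindspore_dtype=ms.bfloat16,
)
image = pipe("a cat holding a sign that says hello")[0][0]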

# expand the latents if we are doing classifier free guidance
latent_model_input = mint.cat([latents] * 2) if self.do_classifier_free_guidance else latents
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
timestep = t.broadcast_to((latent_model_input.shape[0],)) # .to(latents.dtype) ?

medium

The comment # .to(latents.dtype) ? suggests uncertainty about whether a dtype conversion is needed for the timestep tensor. This should be clarified. If the conversion is not needed, the comment should be removed to improve code clarity. If it is needed, the code should be added.

Suggested change
timestep = t.broadcast_to((latent_model_input.shape[0],)) # .to(latents.dtype) ?
timestep = t.broadcast_to((latent_model_input.shape[0],))
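
If the cast does turn out to be necessary, the resolved line would instead read (assuming MindSpore's Tensor.to(dtype), mirroring the upstream PyTorch code):

timestep = t.broadcast_to((latent_model_input.shape[0],)).to(latents.dtype)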

]


HIDREAM_IMAGE_TRANSFORER2D_CASES = [

medium

There is a typo in the variable name HIDREAM_IMAGE_TRANSFORER2D_CASES. It should be HIDREAM_IMAGE_TRANSFORMER2D_CASES to match the model name and for consistency. Please also update its usage on line 1569.

Suggested change
HIDREAM_IMAGE_TRANSFORER2D_CASES = [
HIDREAM_IMAGE_TRANSFORMER2D_CASES = [
