
Question about GRPO training setup for Qwen-Image-Edit-2509 or 2511 #5

Description

@Weistrass

Hi, thanks for sharing this great repository.

I noticed that your repo supports GRPO training for Qwen-Image-Edit-2509, and that it allows choosing between LoRA and full-parameter training modes. I have a few questions regarding training setup and customization:

Multi-reference image training

(1) I would like to train the model to generate a target image conditioned on 4–5 reference images simultaneously. In this case, do you think a configuration like 8 × 140 is sufficient for either LoRA or full training?

(2) Are there any practical differences in feasibility or stability between LoRA and full training for this multi-reference setting?
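To make the sequence-length concern concrete, here is a rough back-of-envelope sketch of how I am estimating the number of image tokens per sample. The 8× VAE downsampling factor and 2×2 patch size are my assumptions and may not match the actual model or this repo's preprocessing; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def latent_tokens(height: int, width: int, vae_factor: int = 8, patch: int = 2) -> int:
    """Approximate token count for one image.

    ASSUMPTION: the image is downsampled by `vae_factor` in the VAE and then
    split into `patch` x `patch` latent patches. Both values are guesses.
    """
    stride = vae_factor * patch
    return (height // stride) * (width // stride)


def total_sequence_length(
    target_hw: Tuple[int, int],
    reference_hws: Iterable[Tuple[int, int]],
    text_tokens: int = 0,
) -> int:
    """Sum tokens for the target image, all reference images, and the text prompt."""
    sizes = [target_hw, *reference_hws]
    return text_tokens + sum(latent_tokens(h, w) for h, w in sizes)


if __name__ == "__main__":
    # One 1024x1024 target conditioned on five 1024x1024 references.
    refs = [(1024, 1024)] * 5
    print(total_sequence_length((1024, 1024), refs))
```

Under these assumptions, each extra reference image adds as many tokens as the target itself, so I would expect memory to grow quickly with 4–5 references unless the references are downscaled.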

Configuration considerations

For the above setup, which configuration aspects should I pay special attention to? For example:

(1) Image resolution / sequence length

(2) GRPO-specific hyperparameters (e.g., rollout length, reward normalization)

(3) Any model- or data-related constraints specific to Qwen-Image-Edit-2509 or 2511
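For clarity, by "reward normalization" I mean the usual group-relative normalization in GRPO, where each rollout's reward is standardized against the other rollouts for the same prompt. Below is a minimal sketch of my understanding; the function name, the flat reward layout, and the `eps` value are my own assumptions, not taken from this repo.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], group_size: int, eps: float = 1e-6
) -> List[float]:
    """Standardize rewards within each group of rollouts for the same prompt.

    ASSUMPTION: `rewards` is flat, with consecutive blocks of `group_size`
    entries belonging to the same prompt.
    """
    if len(rewards) % group_size != 0:
        raise ValueError("number of rewards must be a multiple of group_size")
    advantages: List[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = fmean(group)
        std = pstdev(group)  # population std over the group
        # eps keeps the division finite when all rewards in a group are equal
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages
```

If the repo normalizes differently (for example, across the whole batch, or without dividing by the standard deviation), that is exactly the kind of detail I would like to know about.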

Adding custom reward functions

(1) If I want to add custom reward functions (e.g., for reference consistency or visual alignment), which part of the codebase should I modify or extend?

(2) Is there a recommended interface or example for registering new reward functions in the GRPO pipeline?
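To illustrate the kind of interface I have in mind, here is a purely hypothetical sketch. None of these names (`REWARD_REGISTRY`, `register_reward`, `combined_reward`) come from this repo, and the "reference consistency" score is a toy pixel-difference stand-in, not a real metric.

```python
from typing import Callable, Dict, List, Sequence

# Placeholder image type: flattened pixel values in [0, 1] (assumption for the sketch).
Image = Sequence[float]
RewardFn = Callable[[Image, List[Image], str], float]

# Hypothetical registry mapping a reward name to its scoring function.
REWARD_REGISTRY: Dict[str, RewardFn] = {}


def register_reward(name: str) -> Callable[[RewardFn], RewardFn]:
    """Decorator that stores a reward function under `name`."""
    def decorator(fn: RewardFn) -> RewardFn:
        if name in REWARD_REGISTRY:
            raise KeyError(f"reward '{name}' is already registered")
        REWARD_REGISTRY[name] = fn
        return fn
    return decorator


@register_reward("reference_consistency")
def reference_consistency(generated: Image, references: List[Image], prompt: str) -> float:
    """Toy score: 1 minus the mean absolute pixel difference, averaged over references."""
    scores = []
    for ref in references:
        diff = sum(abs(g - r) for g, r in zip(generated, ref)) / len(ref)
        scores.append(1.0 - diff)
    return sum(scores) / len(scores)


def combined_reward(
    generated: Image, references: List[Image], prompt: str, weights: Dict[str, float]
) -> float:
    """Weighted sum of the registered rewards named in `weights`."""
    return sum(
        w * REWARD_REGISTRY[name](generated, references, prompt)
        for name, w in weights.items()
    )
```

If the GRPO pipeline already has an equivalent hook (a config entry, a base class, or a function signature that rewards must follow), a pointer to it would be enough.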

Thanks a lot for your time and for open-sourcing this work. Any guidance would be greatly appreciated.
