Abstract
The integration of online reinforcement learning (RL) into diffusion and flow models has recently emerged as a promising approach for aligning generative models with human preferences. Stochastic sampling via stochastic differential equations (SDEs) is employed during the denoising process to generate diverse denoising directions for RL exploration. While existing methods effectively explore potential high-value samples, they suffer from sub-optimal preference alignment due to sparse and narrow reward signals. To address these challenges, we propose a novel Granular-GRPO (G²RPO) framework that achieves precise and comprehensive reward assessments of sampling directions in the reinforcement learning of flow models. Specifically, a Singular Stochastic Sampling strategy is introduced to support step-wise stochastic exploration while enforcing a high correlation between the reward and the injected noise, thereby facilitating a faithful reward assignment for each SDE perturbation. Concurrently, to eliminate the bias inherent in fixed-granularity denoising, we introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales, producing a more comprehensive and robust evaluation of the sampling directions. Experiments conducted on various reward models, including both in-domain and out-of-domain evaluations, demonstrate that our G²RPO significantly outperforms existing flow-based GRPO baselines, highlighting its effectiveness and robustness.
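For intuition, the short PyTorch sketch below shows one plausible reading of the multi-granularity advantage aggregation described above: GRPO-style group-normalized advantages are computed per denoising granularity and then combined. The function names, tensor layout, and the uniform averaging across scales are illustrative assumptions, not the paper's exact formulation.

import torch

def group_normalized_advantage(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # GRPO-style advantage: normalize each reward within its sampling group.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def multi_granularity_advantage(rewards_per_scale: list) -> torch.Tensor:
    # rewards_per_scale[k]: rewards of one group of stochastic samples evaluated
    # at the k-th denoising granularity (hypothetical layout).
    advantages = [group_normalized_advantage(r) for r in rewards_per_scale]
    return torch.stack(advantages).mean(dim=0)  # assumed: simple average over scales

# Example: 3 granularities, a group of 8 stochastic samples each.
rewards = [torch.randn(8) for _ in range(3)]
print(multi_granularity_advantage(rewards))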
G²RPO: Granular GRPO for Precise Reward in Flow Models
Yujie Zhou*,
Pengyang Ling*,
Jiazi Bu*,
Yibin Wang,
Yuhang Zang,
Jiaqi Wang†,
Li Niu†,
Guangtao Zhai
(*Equal Contribution) (†Corresponding Author)
[2025/10/3] Code is available now!
[2025/10/2] The paper and project page have been released!
- Release a Gradio demo.
We show more results on the Project Page.
Granular-GRPO: an online RL framework for precise and comprehensive reward assessments.
git clone https://github.com/bcmi/Granular-GRPO.git
cd Granular-GRPO
conda create -n g2rpo python=3.10
conda activate g2rpo
bash env_setup.sh
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2
pip install -e .
cd ..
The environment dependencies are the same as those of DanceGRPO.
# Download the FLUX.1-dev model.
mkdir ./ckpt/flux
huggingface-cli login
huggingface-cli download --resume-download black-forest-labs/FLUX.1-dev --local-dir ./ckpt/flux
# Download the HPS reward model.
python scripts/huggingface/download_hf.py --repo_id xswu/HPSv2 --local_dir ./ckpt/hps
# Download the CLIP-ViT-H-14-laion2B-s32B-b79K.
python scripts/huggingface/download_hf.py --repo_id laion/CLIP-ViT-H-14-laion2B-s32B-b79K --local_dir ./ckpt/CLIP-ViT-H-14-laion2B-s32B-b79K
# Download the CLIP_Score reward model.
python scripts/huggingface/download_hf.py --repo_id apple/DFN5B-CLIP-ViT-H-14 --local_dir ./ckpt/clip_score
# Obtain the embeddings of the prompt dataset.
bash scripts/preprocess/preprocess_flux_rl_embeddings.sh
# Training with 16 GPUs for the HPS reward.
bash scripts/finetune/finetune_g2rpo_hps.sh
# Training with 16 GPUs for the HPS and CLIP-Score rewards.
bash scripts/finetune/finetune_g2rpo_hps_clip.sh
We provide our G²RPO checkpoint on Hugging Face.
# Download the G2RPO ckpt
mkdir ./ckpt/g2rpo
huggingface-cli login
huggingface-cli download --resume-download yujieouo/G2RPO diffusion_pytorch_model.safetensors --local-dir ./ckpt/g2rpo
# Run inference.
python scripts/inference/infer.py
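For reference, the snippet below is a minimal inference sketch using diffusers, assuming the released checkpoint is a full FLUX transformer state dict that can be loaded over the base FLUX.1-dev weights; the supported entry point remains scripts/inference/infer.py, and the prompt and sampling parameters here are placeholders.

import torch
from diffusers import FluxPipeline
from safetensors.torch import load_file

# Load the base FLUX.1-dev pipeline downloaded above.
pipe = FluxPipeline.from_pretrained("./ckpt/flux", torch_dtype=torch.bfloat16).to("cuda")

# Overwrite the transformer with the G2RPO-finetuned weights
# (assumed to be a compatible state dict).
state_dict = load_file("./ckpt/g2rpo/diffusion_pytorch_model.safetensors")
pipe.transformer.load_state_dict(state_dict)

image = pipe(
    "a photo of an astronaut riding a horse on mars",  # placeholder prompt
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("sample.png")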
If you find our work helpful for your research, please consider giving us a star ⭐ and a citation 📝
@article{zhou2025g2rpo,
title={G$^2$RPO: Granular GRPO for Precise Reward in Flow Models},
author={Zhou, Yujie and Ling, Pengyang and Bu, Jiazi and Wang, Yibin and Zang, Yuhang and Wang, Jiaqi and Niu, Li and Zhai, Guangtao},
journal={arXiv preprint arXiv:2510.01982},
year={2025}
}
The code is built upon the repositories below; we thank all the contributors for open-sourcing their work.