Jaihoon Kim*, Taehoon Yoon*, Jisung Hwang*, Minhyuk Sung
Our inference-time scaling method precisely aligns pretrained flow models with user preferences such as text prompts and object quantities.
- [12/10/25] 📌 Implementation of differentiable reward (aesthetic image generation) has been released.
- [18/09/25] 🎉 Our work has been accepted to NeurIPS 2025.
- [02/07/25] ⚙️ Configuration files for compositional image generation and quantity-aware image generation are released.
- [26/06/25] 📝 Baseline implementations are released.
- [30/04/25] 🚀 The code for quantity-aware image generation has been released.
- [21/04/25] 🔥 We have released the implementation of Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing for compositional image generation.
Create and activate a Conda environment (tested with Python 3.10):
conda create -n rbf python=3.10
conda activate rbf
Clone the repository:
git clone https://github.com/KAIST-Visual-AI-Group/Flow-Inference-Time-Scaling.git
cd Flow-Inference-Time-Scaling
Install PyTorch (tested with version 2.1.0) and required dependencies:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install git+https://github.com/openai/CLIP.git
pip install -r requirements.txt
pip install -e .
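After installation, a quick sanity check (our suggestion, not part of the repository) confirms that PyTorch sees the GPU and that CLIP is importable:

```python
# Environment sanity check (our suggestion, not part of the repository).
import torch
import clip  # installed from the OpenAI CLIP repository above

print(torch.__version__)          # expected: 2.1.0+cu121
print(torch.cuda.is_available())  # should print True on a CUDA machine
print(clip.available_models())    # should include 'ViT-L/14'
```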
For compositional image generation, we use VQAScore, which can be installed as follows:
cd third-party/t2v_metrics/
pip install -e .
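To verify the VQAScore installation, the snippet below follows the usage documented in the `t2v_metrics` repository; the model name and image path are illustrative:

```python
# Minimal VQAScore usage sketch, following the t2v_metrics documentation.
import t2v_metrics

vqa = t2v_metrics.VQAScore(model='clip-flant5-xxl')  # model name from the t2v_metrics docs
score = vqa(images=['sample.png'],                   # illustrative image path
            texts=['a red ball on top of a blue cube'])
print(score)  # higher scores indicate better image-text alignment
```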
Key configuration arguments (overridable from the command line):

- `--text_prompt`: Text prompt to guide generation. Required for reward-based sampling.
- `--filtering_method`: Strategy for selecting or pruning particles (`bon`, `smc`, `code`, `svdd`, `rbf`).
- `--batch_size`: Number of prompts or samples processed in parallel during inference.
- `--n_particles`: Number of particles used per timestep to explore the reward landscape.
- `--block_size`: Number of timesteps grouped together for blockwise updates (set to 1 except in `code.yaml`).
- `--convert_scheduler`: Apply interpolant conversion at inference time (`vp`).
- `--sample_method`: Sampling method (`sde`, `ode`).
- `--diffusion_norm`: Diffusion norm for SDE sampling.
- `--max_nfe`: Total computational budget, in number of function evaluations, available during sampling; see the toy budget sketch after this list.
- `--max_steps`: Number of denoising steps in the generative process.
- `--reward_score`: Reward function used for alignment (`vqa`, `counting`, `aesthetic`).
- `--init_n_particles`: Initial number of particles at the start of generation.
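To make the budget accounting concrete, the toy sketch below (our illustration, not the repository's implementation) shows how a fixed `max_nfe` splits evenly across `max_steps` denoising steps, and how unspent evaluations at one step could roll over to later steps, which is the intuition behind rollover budget forcing:

```python
# Toy illustration of NFE budgeting with rollover (not the repo's implementation).
import random

max_nfe, max_steps = 500, 50
per_step = max_nfe // max_steps   # uniform allocation: 10 evaluations per step

carry = 0
for step in range(max_steps):
    budget = per_step + carry
    # Pretend a good-enough particle appears after a random number of
    # evaluations; in RBF this decision comes from the reward model.
    used = min(budget, random.randint(1, per_step))
    carry = budget - used         # unspent evaluations roll over to later steps
print("evaluations left over at the end:", carry)
```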
To save GPU memory, host the VQAScore VLM on a separate device. By default, the server listens on port 5000:
python rbf/corrector/reward_model/vqa_server.py
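Before launching generation, you can check that the server is actually listening; the generic probe below is ours and does not depend on the server's internal API:

```python
# Generic reachability check for the VQA server (our snippet).
import socket

with socket.socket() as s:
    s.settimeout(2.0)
    reachable = s.connect_ex(("localhost", 5000)) == 0  # default port 5000
print("VQA server reachable" if reachable else "nothing listening on port 5000")
```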
Run compositional image generation using the following command. To prevent out-of-memory (OOM) issues, we recommend running it on a different device from the VQA server.
You may optionally override configuration values by passing arguments directly on the command line:
CUDA_VISIBLE_DEVICES=$DEVICE python main.py --config config/compositional_image/rbf.yaml text_prompt="$TEXT_PROMPT"
Run quantity-aware image generation using the following command. The reward function combines Grounding DINO and Segment Anything for robust object detection (experiments were run on a 48 GB GPU).
CUDA_VISIBLE_DEVICES=$DEVICE python main.py --config config/quantity_aware/rbf.yaml text_prompt="$TEXT_PROMPT"
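For intuition, a counting reward typically penalizes the gap between the detected and requested object counts. The sketch below is a simplified stand-in for the Grounding DINO + Segment Anything pipeline, not the repository's reward implementation:

```python
# Simplified counting reward (our illustration, not the repo's implementation).
# In the actual pipeline, `detections` would come from Grounding DINO boxes
# refined by Segment Anything masks.
def counting_reward(detections: list, target_count: int) -> float:
    """Reward peaks at 0 when the detected count matches the target."""
    return -abs(len(detections) - target_count)

# Example: the prompt asks for 5 apples, but only 3 objects are detected.
boxes = [{"box": (0, 0, 10, 10)}, {"box": (20, 0, 30, 10)}, {"box": (40, 0, 50, 10)}]
print(counting_reward(boxes, target_count=5))  # -2
```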
Download the pretrained aesthetic score model checkpoint (`sac+logos+ava1-l14-linearMSE.pth`) from this link.
Place the checkpoint in the `ckpt` directory and run the following command:
CUDA_VISIBLE_DEVICES=$DEVICE python main.py --config config/aesthetic_image/rbf_dps.yaml text_prompt="$TEXT_PROMPT"
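For reference, `sac+logos+ava1-l14-linearMSE.pth` is the LAION aesthetic predictor: a small MLP regressor on top of L2-normalized CLIP ViT-L/14 image embeddings. The loading sketch below assumes the layer sizes of the public `improved-aesthetic-predictor` code; verify them against the downloaded checkpoint:

```python
# Sketch of loading and applying the aesthetic predictor, assuming the
# LAION improved-aesthetic-predictor architecture (an MLP over CLIP
# ViT-L/14 embeddings). Verify layer sizes against the checkpoint.
import clip
import torch
import torch.nn as nn
from PIL import Image

mlp = nn.Sequential(
    nn.Linear(768, 1024), nn.Dropout(0.2),
    nn.Linear(1024, 128), nn.Dropout(0.2),
    nn.Linear(128, 64), nn.Dropout(0.1),
    nn.Linear(64, 16),
    nn.Linear(16, 1),  # scalar aesthetic score
)
state = torch.load("ckpt/sac+logos+ava1-l14-linearMSE.pth", map_location="cpu")
# The public checkpoint stores weights under a "layers." prefix; strip it.
mlp.load_state_dict({k.removeprefix("layers."): v for k, v in state.items()})
mlp.eval()

clip_model, preprocess = clip.load("ViT-L/14", device="cpu")
image = preprocess(Image.open("sample.png")).unsqueeze(0)  # illustrative path
with torch.no_grad():
    feat = clip_model.encode_image(image).float()
    feat = feat / feat.norm(dim=-1, keepdim=True)  # predictor expects unit-norm features
    print("aesthetic score:", mlp(feat).item())
```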
If you find our code helpful, please consider citing our work:
@article{kim2025inference,
title={Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing},
author={Kim, Jaihoon and Yoon, Taehoon and Hwang, Jisung and Sung, Minhyuk},
journal={arXiv preprint arXiv:2503.19385},
year={2025}
}
This repository incorporates implementations from Flow Matching and FLUX. We sincerely thank the authors for publicly releasing their codebases.
