
Conversation

@bwasti (Contributor) commented on Nov 10, 2025

As in the title; the text can be found in the PR content.

Downloaded GitHub-hosted images to local assets directory and updated
all image references to use local paths. Converted standalone images to
markdown syntax while keeping centered images as HTML img tags for
proper rendering.
Signed-off-by: Bram Wasti <[email protected]>

Added Jekyll frontmatter with layout, title, and author metadata to
properly render the blog post.
Signed-off-by: Bram Wasti <[email protected]>

Restored the original width and height attributes (340x130 and 480x355)
for the two centered images to maintain their fixed sizing.
Signed-off-by: Bram Wasti <[email protected]>

Added horizontal rule and italic formatting to the acknowledgements
section for better visual separation and styling.
Signed-off-by: Bram Wasti <[email protected]>

In the septillions of flops used to pre-train models, this mismatch between values has largely been avoidable. Pre-training typically runs at a fixed batch size, which causes the same reduction kernels to be run, often side-stepping the issue entirely.

Reinforcement learning, on the other hand, seems to almost exclusively run different reduction algorithms due to its inference-heavy (and thus largely latency and memory-bound) nature. Kernels optimized for low-batch size inference typically run reductions all at once, whereas kernels for training models parallelize heavily to reuse data and amp up compute utilization. That means the generators and the trainers are typically running completely different kernels!
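
A minimal sketch of the effect (plain PyTorch with made-up sizes and tile counts, not vLLM's actual kernels): the same elementwise products, reduced in two different association orders, can disagree in the low-order bits.

```python
import torch

torch.manual_seed(0)
prod = torch.randn(4096, dtype=torch.float32) * torch.randn(4096, dtype=torch.float32)

# One association order: sum over the full vector in one reduction.
full_sum = prod.sum()

# A different order: reduce 128-element tiles first, then combine the
# per-tile partial sums (the kind of split a parallelized kernel makes).
tiled_sum = prod.reshape(32, 128).sum(dim=1).sum()

# Same math, different rounding along the way; the results are often
# not bitwise identical.
print(full_sum.item(), tiled_sum.item())
print("bitwise equal:", torch.equal(full_sum, tiled_sum))
```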
Collaborator commented:


"Kernels optimized for low-batch size inference typically run reductions all at once"

I don't understand this part. Are you talking about reductions like in RMS norm?

@bwasti (author) replied:


in the kernels, they don't tile. let me use the word "tile" for clarity
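
A rough illustration of that distinction (illustrative Python with an arbitrary tile size, not the actual GPU kernels): an untiled reduction walks one accumulator over every element, while a tiled reduction produces per-tile partial sums and combines them afterwards.

```python
import numpy as np

def reduce_untiled(values):
    # One accumulator walked over every element, the "all at once"
    # shape described above for low-batch inference reductions.
    acc = np.float32(0.0)
    for v in values:
        acc += np.float32(v)
    return acc

def reduce_tiled(values, tile=128):
    # Each tile is reduced independently (so tiles can be processed in
    # parallel and reuse data), then the partial sums are combined.
    partials = [reduce_untiled(values[i:i + tile])
                for i in range(0, len(values), tile)]
    return reduce_untiled(np.array(partials, dtype=np.float32))

vals = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
# The two orderings agree mathematically but typically differ in the
# last bits, which is exactly the generator/trainer mismatch above.
print(reduce_untiled(vals), reduce_tiled(vals))
```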

Added links to #sig-post-training and #sig-batch-invariant Slack channels
in the blog post to invite readers to contribute to future developments.

Signed-off-by: Bram Wasti <[email protected]>
@youkaichao merged commit b56f9ce into vllm-project:main on Nov 12, 2025 (4 checks passed).
