@finbarrtimbers finbarrtimbers commented Nov 17, 2025

Runs:

  1. Single GPU GRPO: Beaker
  2. Multi-node GRPO: Beaker

@finbarrtimbers finbarrtimbers changed the title from "Refactors grpo_fast.py to pull a lot of complexity into a dataloader class" to "Adds StreamingDataLoader class following the OLMo-core definition, and changes grpo_fast.py to use it." Nov 17, 2025
finbarrtimbers and others added 26 commits November 17, 2025 10:28
Moved work_dir, global_batch_size, dp_world_size, and max_possible_score
from StreamingDataLoaderConfig fields to build() method parameters. These
values are computed at runtime from Args and should not be CLI arguments.

- work_dir comes from args.output_dir
- global_batch_size comes from args.num_unique_prompts_rollout
- dp_world_size comes from the actual world_size (number of PolicyTrainerRayProcess instances)
- max_possible_score is computed in Args.__post_init__

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
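
The split described in the commit above, where user-facing options stay on the config and runtime-derived values are passed to build(), might look roughly like the sketch below. This is an illustration only, not the code from this PR: the class bodies, the default values, and the keyword-only signature are assumptions. Only the names StreamingDataLoaderConfig, build(), work_dir, global_batch_size, dp_world_size, and max_possible_score come from the commit message.

```python
from dataclasses import dataclass, fields


@dataclass
class StreamingDataLoader:
    # Hypothetical loader shell: it only stores what build() hands it.
    config: "StreamingDataLoaderConfig"
    work_dir: str
    global_batch_size: int
    dp_world_size: int
    max_possible_score: float


@dataclass
class StreamingDataLoaderConfig:
    # Only options a user would set on the command line live here.
    # The defaults are placeholders, not values from the PR.
    max_prompt_token_length: int = 256
    response_length: int = 256

    def build(self, *, work_dir, global_batch_size, dp_world_size, max_possible_score):
        # Runtime-derived values arrive as parameters, so they never
        # show up as config fields (and therefore never as CLI arguments).
        return StreamingDataLoader(
            config=self,
            work_dir=work_dir,
            global_batch_size=global_batch_size,
            dp_world_size=dp_world_size,
            max_possible_score=max_possible_score,
        )


config = StreamingDataLoaderConfig()
loader = config.build(
    work_dir="output",        # would come from args.output_dir
    global_batch_size=8,      # would come from args.num_unique_prompts_rollout
    dp_world_size=2,          # would come from the actual world size
    max_possible_score=10.0,  # would be computed in Args.__post_init__
)
config_field_names = {f.name for f in fields(StreamingDataLoaderConfig)}
```

Because an argument parser built from the dataclass fields only sees what is declared on the config, keeping the four runtime values out of the field list is what keeps them off the command line.
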
Moved max_prompt_token_length and response_length to StreamingDataLoaderConfig
and added __post_init__ to validate pack_length assertion there instead of in Args.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
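
As a rough sketch of the validation move described in the commit above: a dataclass can check its own invariants in __post_init__, so the check travels with the fields it depends on. The exact condition below (pack_length must be at least the sum of the other two lengths) and the default values are assumptions made for illustration; the commit message only says that a pack_length assertion now lives in StreamingDataLoaderConfig.

```python
from dataclasses import dataclass


@dataclass
class StreamingDataLoaderConfig:
    # Placeholder defaults, not values from the PR.
    max_prompt_token_length: int = 256
    response_length: int = 256
    pack_length: int = 512

    def __post_init__(self):
        # Assumed form of the check: a packed sequence must have room
        # for the longest prompt plus the longest response.
        assert self.pack_length >= self.max_prompt_token_length + self.response_length, (
            "pack_length must be at least max_prompt_token_length + response_length"
        )


# A consistent configuration constructs normally.
valid_config = StreamingDataLoaderConfig(
    max_prompt_token_length=100, response_length=100, pack_length=200
)

# An inconsistent one fails at construction time, before any training starts.
try:
    StreamingDataLoaderConfig(
        max_prompt_token_length=100, response_length=200, pack_length=200
    )
    invalid_rejected = False
except AssertionError:
    invalid_rejected = True
```
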
…erConfig

Removed these fields from Args to avoid argparse conflicts and updated all
references to use streaming_config instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Added async_steps and num_samples_per_prompt_rollout fields to
StreamingDataLoaderConfig and moved the validation logic there.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…oaderConfig

Refactored to access these values directly from config instead of passing
them as parameters. Updated function signatures to pass streaming_config
where needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Move num_samples_per_prompt_rollout validation to StreamingDataLoaderConfig
and fix references to use self instead of self.config.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Moved the inference_batch_size default calculation from Args.__post_init__
to setup_runtime_variables where we have access to streaming_config.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>