Conversation

@finbarrtimbers finbarrtimbers commented Nov 11, 2025

Runs:

  1. Single GPU script: Beaker
  2. Multi-node: Beaker

Note

Adds a flag to skip loading the reference policy (and KL), centralizes ref-policy loading, refactors eval DeepSpeed config, and updates training flow and scripts accordingly.

  • GRPO training (open_instruct/grpo_fast.py):
    • Add Args.load_ref_policy with validation (requires beta=0.0 when disabled); see the first sketch after this note.
    • Conditionally load, update, and save the reference policy; gate KL computation and metrics on load_ref_policy (second sketch below).
    • Refactor logprob computations via a new compute_logprobs helper (third sketch below); streamline the old-logprobs path.
    • Ensure ref-policy updates run only when enabled; fetch update futures immediately.
  • Utilities & Config:
    • New load_ref_policy in open_instruct/model_utils.py: disables dropout, initializes with DeepSpeed, and optionally loads a checkpoint (fourth sketch below).
    • Refactor get_eval_ds_config in open_instruct/utils.py to return (ds_config, HfDeepSpeedConfig) and accept per_device_train_batch_size (fifth sketch below).
    • Minor typing/import cleanups in model_utils.save_with_accelerate.
  • PPO (open_instruct/ppo.py):
    • Use shared load_ref_policy and updated get_eval_ds_config return signature.
  • Scripts:
    • large_test_script.sh: switch cluster; set --beta 0.0 and --load_ref_policy false.
    • single_gpu_on_beaker.sh: set --beta 0.0 and --load_ref_policy true.

Written by Cursor Bugbot for commit 1974db8.
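
The validation in the first GRPO bullet amounts to roughly the following. This is a minimal sketch: only the names load_ref_policy and beta come from the PR summary; the dataclass layout and error message are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Args:
    beta: float = 0.0             # KL penalty coefficient
    load_ref_policy: bool = True  # set False to skip the reference policy (and KL)

    def __post_init__(self):
        # A nonzero KL penalty needs reference-policy logprobs, so disabling
        # the reference policy is only valid when beta is 0.0.
        if not self.load_ref_policy and self.beta != 0.0:
            raise ValueError("--load_ref_policy false requires --beta 0.0")
```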
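The KL gating in the second bullet follows the pattern below. The loss is deliberately simplified (no ratio clipping) and the function signature is an assumption; only the gating on load_ref_policy is the point.

```python
import torch


def grpo_loss(new_logprobs, old_logprobs, advantages, args, ref_logprobs=None):
    # Simplified policy-gradient term (the real GRPO objective also clips ratios).
    ratio = torch.exp(new_logprobs - old_logprobs)
    loss = -(advantages * ratio).mean()
    metrics = {"pg_loss": loss.item()}

    # The KL term and KL metrics exist only when the reference policy was loaded.
    if args.load_ref_policy:
        kl = (new_logprobs - ref_logprobs).mean()
        loss = loss + args.beta * kl
        metrics["kl"] = kl.item()
    return loss, metrics
```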
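A shared compute_logprobs helper, as in the third bullet, typically gathers per-token logprobs from shifted logits. The signature below is an assumption, not the PR's exact interface.

```python
import torch
import torch.nn.functional as F


def compute_logprobs(model, input_ids, attention_mask, temperature=1.0):
    """Per-token log-probabilities of the given sequence under `model`."""
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Logits at position t predict the token at position t + 1.
    logits = logits[:, :-1, :] / temperature
    labels = input_ids[:, 1:]
    logprobs = F.log_softmax(logits, dim=-1)
    return torch.gather(logprobs, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
```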
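The centralized load_ref_policy helper covers the three steps listed for model_utils: disable dropout, DeepSpeed init, optional checkpoint load. A sketch under assumed argument names; the inline disable_dropout is a stand-in for whatever helper the module actually uses.

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM


def disable_dropout(model):
    # Zero out every dropout probability so reference logprobs are deterministic.
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.p = 0.0


def load_ref_policy(model_name_or_path, ds_config, checkpoint_path=None):
    """Load a frozen reference policy wrapped in a DeepSpeed engine."""
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
    disable_dropout(model)
    # Wrap with the eval DeepSpeed config (no optimizer, inference-style).
    ref_policy, *_ = deepspeed.initialize(model=model, config=ds_config)
    if checkpoint_path is not None:
        ref_policy.load_checkpoint(checkpoint_path)
    ref_policy.eval()
    return ref_policy
```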
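The get_eval_ds_config refactor (and the "dschf" thread below) is about keeping the HfDeepSpeedConfig object alive: constructing it registers the config with transformers so ZeRO-3 can shard weights during from_pretrained, and that only works while the object stays referenced. A sketch of the new return shape; config contents and the extra parameters are illustrative.

```python
from transformers.integrations import HfDeepSpeedConfig


def get_eval_ds_config(stage=3, per_device_train_batch_size=1, bf16=True):
    """Return the eval DeepSpeed config dict plus the live HfDeepSpeedConfig."""
    ds_config = {
        "train_micro_batch_size_per_gpu": per_device_train_batch_size,
        "zero_optimization": {"stage": stage},
        "bf16": {"enabled": bf16},
    }
    # transformers checks for a live HfDeepSpeedConfig instance when deciding
    # whether to load weights ZeRO-3-sharded, so the caller must hold onto it.
    dschf = HfDeepSpeedConfig(ds_config) if stage == 3 else None
    return ds_config, dschf


# Caller keeps dschf in scope for as long as models are being loaded:
# ds_config, dschf = get_eval_ds_config(stage=3, per_device_train_batch_size=1)
```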

@finbarrtimbers changed the title from "Removes the reference policy to save memory" to "Adds a flag to skip loading the reference policy to save memory" Nov 12, 2025
@finbarrtimbers finbarrtimbers marked this pull request as ready for review November 12, 2025 23:12

@hamishivi hamishivi left a comment

LGTM, but need to fix the dschf stuff

see this for dschf


@hamishivi hamishivi left a comment

LGTM!

@finbarrtimbers finbarrtimbers added this pull request to the merge queue Nov 20, 2025
Merged via the queue into main with commit 110fb9e Nov 20, 2025
4 checks passed