Adds a flag to skip loading the reference policy to save memory #1171

finbarrtimbers · 2025-11-11T19:07:50Z

Runs:

Single GPU script: Beaker
Multi-node: Beaker

Note

Adds a flag to skip loading the reference policy (and, with it, the KL penalty), centralizes reference-policy loading, refactors the eval DeepSpeed config, and updates the training flow and scripts accordingly.
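
The rule that `beta` must be 0.0 when the reference policy is skipped follows from the shape of a KL-regularized objective. Assuming the generic form below (the exact GRPO loss and KL estimator in grpo_fast.py may differ), the reference policy $\pi_{\text{ref}}$ enters only through the $\beta$-weighted term:

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{PG}}(\theta) + \beta \, \widehat{\mathrm{KL}}\big(\pi_\theta \,\|\, \pi_{\text{ref}}\big)
$$

With $\beta = 0$ that term is identically zero, so reference logprobs are never needed and the reference model does not have to be loaded at all; any $\beta > 0$ requires them.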

GRPO training (open_instruct/grpo_fast.py):
- Add Args.load_ref_policy with validation (requires beta=0.0 when disabled).
- Conditionally load, update, and save the reference policy; gate the KL computation and KL metrics on load_ref_policy.
- Refactor logprob computations via new compute_logprobs; streamline old-logprobs path.
- Ensure ref-policy updates run only when enabled; fetch update futures immediately.
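
The GRPO changes above can be sketched in plain Python. This is illustrative only: the `Args` field names mirror the description, but the default for `load_ref_policy`, the `kl_penalty` estimator (a mean of log π − log π_ref), the `loss_with_optional_kl` helper, and the `objective/kl` metric key are hypothetical, and the real `compute_logprobs` works on PyTorch tensors rather than Python lists.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Args:
    """Hypothetical subset of the GRPO training arguments."""
    beta: float                   # KL coefficient
    load_ref_policy: bool = True  # assumed default, for illustration

    def __post_init__(self) -> None:
        # Without a reference policy there are no reference logprobs,
        # so a nonzero KL coefficient cannot be honoured.
        if not self.load_ref_policy and self.beta != 0.0:
            raise ValueError("beta must be 0.0 when load_ref_policy is False")


def compute_logprobs(logits: List[List[float]], tokens: List[int]) -> List[float]:
    """Log-probability of each target token: log-softmax over a row, then gather."""
    out = []
    for row, tok in zip(logits, tokens):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[tok] - log_z)
    return out


def kl_penalty(new_logprobs: List[float], ref_logprobs: List[float]) -> float:
    """Hypothetical estimator: mean of log pi_theta - log pi_ref per token."""
    diffs = [n - r for n, r in zip(new_logprobs, ref_logprobs)]
    return sum(diffs) / len(diffs)


def loss_with_optional_kl(
    args: Args,
    pg_loss: float,
    new_logprobs: List[float],
    ref_logprobs: Optional[List[float]],
) -> Tuple[float, Dict[str, float]]:
    """Add the KL term (and its metric) only when a reference policy is loaded."""
    metrics: Dict[str, float] = {}
    loss = pg_loss
    if args.load_ref_policy:
        kl = kl_penalty(new_logprobs, ref_logprobs)
        loss += args.beta * kl
        metrics["objective/kl"] = kl  # hypothetical metric name
    return loss, metrics
```

When `load_ref_policy` is false, callers can pass `None` for the reference logprobs, since that branch never reads them.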
Utilities & Config:
- New load_ref_policy in open_instruct/model_utils.py: disables dropout, initializes the model with DeepSpeed, and optionally loads a checkpoint.
- Refactor get_eval_ds_config in open_instruct/utils.py to return (ds_config, HfDeepSpeedConfig) and accept per_device_train_batch_size.
- Minor typing/import cleanups in model_utils.save_with_accelerate.
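
A minimal sketch of a `get_eval_ds_config` with the new `(ds_config, HfDeepSpeedConfig)` return signature, assuming a config that sets only a few standard DeepSpeed keys (`train_micro_batch_size_per_gpu`, `zero_optimization`, `bf16`); the actual function in open_instruct/utils.py may set more fields and differ in its argument list. The `HfDeepSpeedConfig` import from `transformers.integrations` is only triggered for ZeRO stage 3.

```python
from typing import Any, Dict, Optional, Tuple


def get_eval_ds_config(
    offload: bool,
    stage: int = 0,
    bf16: bool = True,
    per_device_train_batch_size: int = 1,
) -> Tuple[Dict[str, Any], Optional[Any]]:
    """Sketch: inference DeepSpeed config plus, for ZeRO-3, an HfDeepSpeedConfig."""
    ds_config: Dict[str, Any] = {
        "train_micro_batch_size_per_gpu": per_device_train_batch_size,
        "zero_optimization": {
            "stage": stage,
            "offload_param": {"device": "cpu" if offload else "none"},
        },
        "bf16": {"enabled": bf16},
    }
    dschf = None
    if stage == 3:
        # transformers tracks the active HfDeepSpeedConfig through a weak reference,
        # so the caller must keep this object alive until from_pretrained() has run.
        from transformers.integrations import HfDeepSpeedConfig
        dschf = HfDeepSpeedConfig(ds_config)
    return ds_config, dschf
```

A caller would unpack both values (`ds_config, dschf = get_eval_ds_config(...)`) and keep `dschf` referenced while the model is constructed, which appears to be what the review's "dschf" comments are about.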
PPO (open_instruct/ppo.py):
- Use shared load_ref_policy and updated get_eval_ds_config return signature.
Scripts:
- large_test_script.sh: switch cluster; set --beta 0.0 and --load_ref_policy false.
- single_gpu_on_beaker.sh: set --beta 0.0 and --load_ref_policy true.

Written by Cursor Bugbot for commit 1974db8. This will update automatically on new commits.

open_instruct/grpo_fast.py

hamishivi

LGTM, but need to fix the dschf stuff

open_instruct/grpo_fast.py

hamishivi · 2025-11-13T17:27:10Z

open_instruct/grpo_fast.py

see this for dschf

open_instruct/model_utils.py

open_instruct/grpo_fast.py

hamishivi

LGTM!

finbarrtimbers and others added 18 commits November 11, 2025 10:44

First commit with mocked scripts and launcher.

2e13017

Add script

1123baa

Updated code to be longer

c21cf54

Updated script

8c7f5e3

updated code to use 4 nodes

f2d7286

updated script to not die with checkpoint error

d4db1bc

fix init for tokenizer

4990aa9

fixed dataset loader

c8b7626

updated dataset

c39be00

Updated code

e939ee1

updated mock script

b7b4ffa

fixes mock

b1b4422

removed ref policy

9f875a7

removed old files

85c22e5

Uses a flag now

c690375

Updated scripts

379ab73

fixed kl calculation

74b6fd3

conditionally load config

54d2cbb

finbarrtimbers changed the title ~~Removes the reference policy to save memory~~ Adds a flag to skip loading the reference policy to save memory Nov 12, 2025

finbarrtimbers added 8 commits November 12, 2025 09:33

cleaned up PR

2f43921

Cleaned up calculate_ref_Logprobs

59e28fe

Cleaned up code

a0edde4

Cleaned up code.

5b220a9

Merge branch 'main' into finbarr/no-ref-policy

5e6a6ad

Added docstring plus type annotations

80c04eb

updated message

6d02b68

added comment

afa00b2

finbarrtimbers marked this pull request as ready for review November 12, 2025 23:12

finbarrtimbers requested a review from hamishivi November 12, 2025 23:12

cursor bot reviewed Nov 12, 2025


open_instruct/grpo_fast.py (outdated, resolved)

open_instruct/grpo_fast.py (resolved)

finbarrtimbers added 2 commits November 12, 2025 16:24

Fixed bugs

e367e3e

Merge branch 'main' into finbarr/no-ref-policy

6dd533d

mnoukhov approved these changes Nov 13, 2025


hamishivi requested changes Nov 13, 2025


open_instruct/grpo_fast.py (resolved)

open_instruct/grpo_fast.py


hamishivi (Collaborator) · Nov 13, 2025


see this for dschf

finbarrtimbers added 2 commits November 13, 2025 10:55

now dschf is plumbed through

cc453ea

Merge branch 'main' into finbarr/no-ref-policy

5a38087

cursor bot reviewed Nov 13, 2025


open_instruct/model_utils.py (resolved)

finbarrtimbers added 8 commits November 13, 2025 14:38

uses active sampling

1caaa2c

Merge branch 'main' into finbarr/no-ref-policy

93a4171

Another config

3bfb96b

Removed active sampling

e5ee27a

Cleaned up code

51639ef

Added back comments

a992387

Cleaned up code by removing unnecessary code.

65679a5

Updated script

7e2d945

cursor bot reviewed Nov 17, 2025


open_instruct/grpo_fast.py (resolved)

finbarrtimbers added 3 commits November 17, 2025 13:54

set load ref policy true

8e4f325

Updated logprob code

3b24977

Fix double import

53c5b81

finbarrtimbers requested a review from hamishivi November 18, 2025 03:20

Ran linter.

1974db8

hamishivi approved these changes Nov 20, 2025


finbarrtimbers added this pull request to the merge queue Nov 20, 2025

Merged via the queue into main with commit 110fb9e Nov 20, 2025
4 checks passed

4 participants