rlzero template and fixes to rlzero scripts #1216
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,16 +1,14 @@ | ||
| #!/bin/bash | ||
| | ||
| # OLMo 3 model | ||
| MODEL_NAME_OR_PATH="/weka/oe-training-default/ai2-llm/checkpoints/tylerr/long-context/olmo25_7b_lc_64k_6T_M100B_round5-sparkle_6634-pre_s2pdf_gzip2080_cweN-yake-all-olmo_packing_yarn-fullonly_50B-fb13a737/step11921-hf" | ||
| MODEL_NAME_OR_PATH="allenai/Olmo-3-1025-7B" | ||
| DATASETS="allenai/Dolci-RLZero-IF-7B 1.0" | ||
| | ||
| DATASETS="saurabh5/IF_multi_constraints_upto5_filtered_olmo_completions_filtered 13314" | ||
| | ||
| LOCAL_EVALS="hamishivi/IF_multi_constraints_upto5_filtered 8" | ||
| LOCAL_EVALS="allenai/Dolci-RLZero-IF-7B 8" | ||
| LOCAL_EVAL_SPLITS="train" | ||
| | ||
| EVALS="ifeval::hamish_zs_reasoning_deepseek" | ||
| | ||
| EXP_NAME="grpo_if_from_zero" | ||
| EXP_NAME="olmo3_7b_rlzero_if" | ||
| BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name') | ||
| BEAKER_IMAGE="${1:-${BEAKER_USER}/open-instruct-integration-test}" | ||
| shift | ||
| @@ -30,9 +28,8 @@ python mason.py \ | ||
| --env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \ | ||
| --gpus 8 \ | ||
| --budget ai2/oe-adapt \ | ||
| -- \ | ||
| source configs/beaker_configs/ray_node_setup.sh \&\& \ | ||
| python open_instruct/grpo_fast.py \ | ||
| -- source configs/beaker_configs/ray_node_setup.sh \ | ||
| \&\& uv run open_instruct/grpo_fast.py \ | ||
| --exp_name ${EXP_NAME} \ | ||
| --beta 0.0 \ | ||
| --async_steps 4 \ | ||
| @@ -54,8 +51,7 @@ python open_instruct/grpo_fast.py \ | ||
| --response_length 16384 \ | ||
| --pack_length 18432 \ | ||
| --model_name_or_path ${MODEL_NAME_OR_PATH} \ | ||
| --chat_template_name olmo_thinker \ | ||
| --stop_strings "</answer>" \ | ||
| --chat_template_name olmo_thinker_rlzero \ | ||
Bug: Stop string mismatch with new templates. The script uses …
| --non_stop_penalty False \ | ||
| --temperature 1.0 \ | ||
| --total_episodes 10000000 \ | ||
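Both scripts treat an optional first argument as the Beaker image (defaulting to `${BEAKER_USER}/open-instruct-integration-test`), `shift` it off, and forward the remaining arguments to `grpo_fast.py` through the trailing `$@`. The sketch below mirrors that handling under stated assumptions: `resolve_args` is a hypothetical helper standing in for the script body, and the Beaker user is passed in explicitly instead of coming from `beaker account whoami`. It also quotes `"$@"`, which, unlike a bare `$@`, keeps a forwarded argument containing spaces as a single word.

```shell
#!/usr/bin/env bash
# Sketch of the launch scripts' argument handling; resolve_args is hypothetical.
# Assumed convention: script.sh [beaker_image] [extra grpo_fast.py flags...]

resolve_args() {
    # Stand-in for $(beaker account whoami ...): the user is passed in explicitly.
    local beaker_user="$1"
    shift
    # First remaining argument is the image if set and non-empty,
    # otherwise fall back to <user>/open-instruct-integration-test.
    local image="${1:-${beaker_user}/open-instruct-integration-test}"
    # `shift` with nothing to shift only returns a non-zero status;
    # this guard makes the no-argument case explicit.
    if [ "$#" -gt 0 ]; then
        shift
    fi
    echo "image=${image}"
    # Quoted "$@" forwards each remaining argument as a single word.
    local arg
    for arg in "$@"; do
        echo "extra=${arg}"
    done
}

resolve_args alice
resolve_args alice myorg/custom-image --seed 2 --exp_name "rlzero test"
```

The first call falls back to `alice/open-instruct-integration-test`; the second prints `image=myorg/custom-image` followed by one `extra=` line per forwarded word, with `rlzero test` kept intact.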
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| #!/bin/bash | ||
| | ||
| MODEL_NAME_OR_PATH="allenai/Olmo-3-1025-7B" | ||
| DATASETS="allenai/Dolci-RLZero-Code-7B 1.0 allenai/Dolci-RLZero-IF-7B 1.0 allenai/Dolci-RLZero-Code-7B 1.0 allenai/Dolci-RLZero-General-7B 1.0" | ||
Bug: Duplicate Code dataset in mix script. The …
Additional Locations (1)
| | ||
| LOCAL_EVALS="allenai/Dolci-RLZero-Code-7B 8 allenai/Dolci-RLZero-IF-7B 8 allenai/Dolci-RLZero-Code-7B 8 allenai/Dolci-RLZero-General-7B 8" | ||
| LOCAL_EVAL_SPLITS="train train train train train train train train" | ||
| | ||
| EVALS="alpaca_eval_v3::hamish_zs_reasoning_deepseek,agi_eval_english:0shot_cot::hamish_zs_reasoning_deepseek,gpqa:0shot_cot::hamish_zs_reasoning_deepseek" | ||
| | ||
| EXP_NAME="olmo3_7b_rlzero_mix" | ||
| BEAKER_USER=$(beaker account whoami --format json | jq -r '.[0].name') | ||
| BEAKER_IMAGE="${1:-${BEAKER_USER}/open-instruct-integration-test}" | ||
| shift | ||
| | ||
| cluster=ai2/augusta | ||
| | ||
| python mason.py \ | ||
Collaborator: uv run?
Contributor (author): changed
| --task_name ${EXP_NAME} \ | ||
| --cluster ${cluster} \ | ||
| --workspace ai2/olmo-instruct \ | ||
| --priority high \ | ||
| --pure_docker_mode \ | ||
| --image ${BEAKER_IMAGE} \ | ||
| --preemptible \ | ||
| --num_nodes 4 \ | ||
| --env VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \ | ||
| --env VLLM_ATTENTION_BACKEND="FLASH_ATTN" \ | ||
| --gpus 8 \ | ||
| --budget ai2/oe-adapt \ | ||
| -- source configs/beaker_configs/ray_node_setup.sh \ | ||
| \&\& uv run open_instruct/grpo_fast.py \ | ||
| --exp_name ${EXP_NAME} \ | ||
| --beta 0.0 \ | ||
| --async_steps 4 \ | ||
| --inflight_updates \ | ||
| --truncated_importance_sampling_ratio_cap 2.0 \ | ||
| --num_samples_per_prompt_rollout 8 \ | ||
| --num_unique_prompts_rollout 32 \ | ||
| --num_mini_batches 1 \ | ||
| --num_epochs 1 \ | ||
| --learning_rate 1e-6 \ | ||
| --per_device_train_batch_size 1 \ | ||
| --kl_estimator kl3 \ | ||
| --dataset_mixer_list $DATASETS \ | ||
| --dataset_mixer_list_splits train \ | ||
| --dataset_mixer_eval_list $LOCAL_EVALS \ | ||
| --dataset_mixer_eval_list_splits $LOCAL_EVAL_SPLITS \ | ||
| --max_token_length 10240 \ | ||
| --max_prompt_token_length 2048 \ | ||
| --response_length 16384 \ | ||
| --pack_length 18432 \ | ||
| --model_name_or_path ${MODEL_NAME_OR_PATH} \ | ||
| --chat_template_name olmo_thinker_rlzero \ | ||
cursor[bot] marked this conversation as resolved.
Bug: Template mismatch for Code dataset. The script uses …
| --non_stop_penalty False \ | ||
| --temperature 1.0 \ | ||
| --total_episodes 10000000 \ | ||
| --deepspeed_stage 3 \ | ||
| --num_learners_per_node 8 \ | ||
| --vllm_num_engines 32 \ | ||
| --vllm_tensor_parallel_size 1 \ | ||
| --llm_judge_model hosted_vllm/Qwen/Qwen3-32B \ | ||
| --llm_judge_timeout 600 \ | ||
| --llm_judge_max_tokens 2048 \ | ||
| --llm_judge_max_context_length 32768 \ | ||
| --lr_scheduler_type constant \ | ||
| --apply_verifiable_reward true \ | ||
| --seed 1 \ | ||
| --local_eval_every 50 \ | ||
| --save_freq 50 \ | ||
| --checkpoint_state_freq 50 \ | ||
| --gradient_checkpointing \ | ||
| --with_tracking \ | ||
| --vllm_enable_prefix_caching \ | ||
| --clip_higher 0.272 \ | ||
| --keep_last_n_checkpoints -1 \ | ||
| --mask_truncated_completions True \ | ||
| --oe_eval_max_length 16384 \ | ||
| --code_api_url https://p9f1719l7f.execute-api.us-west-2.amazonaws.com/prod/test_program \ | ||
| --try_launch_beaker_eval_jobs_on_weka True \ | ||
| --oe_eval_tasks $EVALS \ | ||
| --eval_on_step_0 True \ | ||
| --oe_eval_beaker_image oe-eval-beaker/oe_eval_olmo2_retrofit_auto \ | ||
| --output_dir /output/olmo3-7b-rlzero-general/checkpoints $@ | ||
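The duplicate-dataset review comment on the mix script points at `DATASETS` (and `LOCAL_EVALS`) listing `allenai/Dolci-RLZero-Code-7B` twice. Assuming the mixer strings keep the `name weight name weight ...` layout used in both scripts, a pre-launch check could catch this. The function below is a hypothetical sketch, not part of open-instruct, and needs bash 4+ for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check: report datasets that appear more than once
# in a "name weight name weight ..." mixer string such as $DATASETS.
# Requires bash 4+ (associative arrays).

find_duplicate_datasets() {
    local -a tokens
    # Split the mixer string on whitespace into an array.
    read -r -a tokens <<< "$1"
    if (( ${#tokens[@]} % 2 != 0 )); then
        echo "error: expected name/weight pairs, got ${#tokens[@]} tokens" >&2
        return 2
    fi
    local -A seen=()
    local i name status=0
    # Dataset names sit at even indices, weights (or sample counts) at odd ones.
    for (( i = 0; i < ${#tokens[@]}; i += 2 )); do
        name="${tokens[i]}"
        if [[ -n "${seen[$name]:-}" ]]; then
            echo "duplicate: ${name}"
            status=1
        fi
        seen[$name]=1
    done
    return "$status"
}

DATASETS="allenai/Dolci-RLZero-Code-7B 1.0 allenai/Dolci-RLZero-IF-7B 1.0 allenai/Dolci-RLZero-Code-7B 1.0 allenai/Dolci-RLZero-General-7B 1.0"
find_duplicate_datasets "$DATASETS" || echo "fix the mixer list before launching"
```

Run against the mix script's `DATASETS`, this reports `allenai/Dolci-RLZero-Code-7B` once and returns status 1; a malformed list with an odd number of tokens returns 2 instead.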