[trainer] fix: dump all outputs in validation in main_ppo_sync #6227
guillemgt wants to merge 1 commit into verl-project:main
Code Review
This pull request updates the validation process in verl/trainer/main_ppo_sync.py to collect and dump all inputs and outputs, rather than just the final samples per session. It introduces new tracking for all keys and sessions, ensuring that the dumped data is sorted by UID, session ID, and index. Feedback was provided regarding the sorting logic, which assumes specific key formats for integer conversion and could lead to a crash if unexpected formats are encountered.
    sort_keys = []
    for key in dump_all_keys:
        parts = key.rsplit("_", 2)
        sort_keys.append((parts[0], int(parts[1]), int(parts[2])) if len(parts) == 3 else (key, 0, 0))
The sorting logic assumes that parts[1] (session_id) and parts[2] (index) are always convertible to integers. While this is generally true for keys generated by AgentLoopWorkerTQ, any unexpected key format in dump_all_keys will cause a ValueError during the validation dump. It is safer to wrap the integer conversion in a try-except block or use a more robust parsing method to avoid crashing the trainer at the end of a validation run.
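One way to follow this suggestion is sketched below. It is a minimal illustration, not the code in the PR: the helper name parse_sort_key and the sample keys are hypothetical, and the only assumption taken from the snippet above is that well-formed keys look like uid_session_index. Any key that does not match falls back to (key, 0, 0) instead of raising.

```python
def parse_sort_key(key: str) -> tuple[str, int, int]:
    """Turn 'uid_session_index' into a sortable tuple without ever raising."""
    parts = key.rsplit("_", 2)
    if len(parts) == 3:
        try:
            return (parts[0], int(parts[1]), int(parts[2]))
        except ValueError:
            pass  # non-numeric session or index: use the fallback below
    return (key, 0, 0)


# Hypothetical keys, including two that do not follow the expected format.
dump_all_keys = ["u1_0_1", "u1_0_0", "odd_key_x", "u0_1_0", "plain"]

# Compute a permutation so parallel lists (inputs, outputs) can be reordered the same way.
order = sorted(range(len(dump_all_keys)), key=lambda i: parse_sort_key(dump_all_keys[i]))
sorted_keys = [dump_all_keys[i] for i in order]
```

With this fallback, malformed keys are still dumped; they are simply ordered by their full string rather than by session and index.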
What does this PR do?
In _validate in main_ppo_sync.py, when validation_data_dir is set, the dump only included the final output per session. For multi-turn or multi-output rollouts, all intermediate and alternative outputs were silently discarded. This PR fixes _validate to fetch prompts/responses for all keys and include every output in the dump, while mapping each entry back to its session's final score/ground-truth. Follows up on #6101.
Checklist Before Starting
- Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
  - {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
  - If this PR involves multiple modules, separate them with commas, like [megatron, fsdp, doc]
  - {type} is in feat, fix, refactor, chore, test
  - If this PR breaks any API, add [BREAKING] to the beginning of the title.
  - Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
This change only affects the file dump path (when trainer.validation_data_dir is set). No CI test covers this code path; it requires a full training run with multi-output rollouts.
API and Usage Example
No API changes.
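The contents of the dump do change, though. The sketch below illustrates how each dumped output can be paired with its session's final score. It is an assumption-laden illustration rather than the actual implementation: the key formats (uid_session for sessions, uid_session_index for outputs), the sample data, and the row layout are all hypothetical; only the names session_to_sample_idx, dump_all_keys, and dump_all_outputs come from this PR.

```python
# Hypothetical data; real keys and field names in verl may differ.
session_keys = ["u0_0", "u1_0"]   # one entry per session
session_scores = [1.0, 0.0]       # final score per session, same order

# Map each session key to its position in the per-session lists.
session_to_sample_idx = {k: i for i, k in enumerate(session_keys)}

# Every output of every session, not just the final one.
dump_all_keys = ["u0_0_0", "u0_0_1", "u1_0_0"]
dump_all_outputs = ["draft", "final answer", "other answer"]

rows = []
for key, output in zip(dump_all_keys, dump_all_outputs):
    session_key = key.rsplit("_", 1)[0]        # strip the per-output index
    idx = session_to_sample_idx[session_key]   # look up the owning session
    rows.append({"key": key, "output": output, "score": session_scores[idx]})
```

Both outputs of session u0_0 end up with that session's score, so intermediate outputs are no longer dropped from the dump.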
Design & Code Changes
- Prompts/responses for all keys are fetched in a kv_batch_get call, so every output is captured in the dump.
- dump_all_inputs, dump_all_outputs, dump_all_keys accumulate all entries across batches.
- session_to_sample_idx maps session keys to their position in the per-session lists, so scores/ground-truths can be looked up correctly when writing the dump.
- Removed prompts/responses from the fields list of the final-key kv_batch_get (fetched separately above).
- The dump is sorted by (uid, session_id, index) so outputs from the same rollout are grouped together.
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always: passed (naming convention failures are pre-existing in third-party packages, not in changed files).
- Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main. Not applicable.