
Conversation

@mnoukhov (Contributor) commented Nov 18, 2025

Measure the training steps between:

  • adding a prompt to the queue
  • starting generation of the prompt
  • using the prompt for training

Rename param_prompt_Q to just prompt_Q since it no longer contains params.


Note

Adds staleness metrics (queue→generation→consume) wired through ActorManager, vLLM actor, batching, and logs; renames param_prompt_Q to prompt_Q.

  • Metrics/Observability:
    • Track training step staleness via GenerationResult (queued_training_step, generation_started_training_step).
    • ActorManager: store/expose current training step; main loop updates it each step.
    • vLLM actor: capture current step when dequeuing, propagate via request metadata; process_completed_request sets fields.
    • BatchStatistics: add staleness fields; compute staleness/queue_to_generation and staleness/generation_to_consume (plus their means) and log them in training metrics (see the sketch after this list).
    • Update tests to validate new fields.
  • Queue API:
    • Rename param_prompt_Q → prompt_Q across code paths (add_prompt_to_generator, data_preparation_thread, run_training, create_model_and_optimizer, main).
  • Misc:
    • Minor dashboard/manager plumbing for KV cache and timing is unchanged; step reporting is integrated with the metrics.
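
For concreteness, here is a minimal sketch of how the new fields could fit together. The field names (queued_training_step, generation_started_training_step, staleness_*) come from the note above, but the dataclass shapes and the metric helper are simplified stand-ins, not the actual open_instruct implementation:

from dataclasses import dataclass, field

@dataclass
class GenerationResult:
    # Simplified: the real result also carries the generated tokens, rewards, etc.
    queued_training_step: int | None = None              # step when the prompt entered prompt_Q
    generation_started_training_step: int | None = None  # step when the vLLM actor dequeued it

@dataclass
class BatchStatistics:
    # Per-prompt step records, plus the single step at which the batch was consumed.
    staleness_queue_steps: list = field(default_factory=list)
    staleness_generation_start_steps: list = field(default_factory=list)
    staleness_consumed_step: int | None = None

def staleness_metrics(stats: BatchStatistics) -> dict:
    # Queue -> generation: how many steps each prompt waited before generation began.
    q2g = [
        g - q
        for q, g in zip(stats.staleness_queue_steps, stats.staleness_generation_start_steps)
        if q is not None and g is not None
    ]
    # Generation -> consume: steps between generation start and training on the batch.
    g2c = [
        stats.staleness_consumed_step - g
        for g in stats.staleness_generation_start_steps
        if g is not None and stats.staleness_consumed_step is not None
    ]
    return {
        "staleness/queue_to_generation_mean": sum(q2g) / len(q2g) if q2g else 0.0,
        "staleness/generation_to_consume_mean": sum(g2c) / len(g2c) if g2c else 0.0,
    }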

Written by Cursor Bugbot for commit ac22910. This will update automatically on new commits.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @mnoukhov, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a comprehensive system for measuring and reporting prompt staleness in the training pipeline. By tracking prompts from initial queuing, through generation, to consumption in training, the changes surface bottlenecks in the prompt processing flow and improve visibility into training efficiency. A key queue was also renamed for clarity and consistency.

Highlights

  • Staleness Metrics Introduction: New metrics track the 'staleness' of prompts throughout their lifecycle within the training pipeline. This includes measuring the time (in training steps) a prompt spends in the queue before generation, and the time from generation start until it is consumed for training.
  • Prompt Lifecycle Tracking: The system now records the current training step at three critical points for each prompt: when it is added to the queue, when its generation begins, and when it is used for training. These recorded steps enable the staleness calculations (see the sketch after this list).
  • Queue Renaming: The param_prompt_Q queue has been renamed to prompt_Q across the codebase to reflect its current role, as it no longer carries parameters.
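
A rough sketch of the step-recording plumbing described above. The method names (set_training_step, get_training_step) and the PromptRequest shape are illustrative assumptions; in the PR the step travels through vLLM request metadata rather than a plain dataclass:

import threading
from dataclasses import dataclass

@dataclass
class PromptRequest:
    prompt: str
    queued_training_step: int | None = None
    generation_started_training_step: int | None = None

class ActorManager:
    # Holds the trainer's current step; the main loop updates it once per step.
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._training_step = 0

    def set_training_step(self, step: int) -> None:
        with self._lock:
            self._training_step = step

    def get_training_step(self) -> int:
        with self._lock:
            return self._training_step

manager = ActorManager()
manager.set_training_step(7)

# Enqueue side: stamp the step at which the prompt entered the queue.
request = PromptRequest("2+2=", queued_training_step=manager.get_training_step())

# Dequeue side (vLLM actor): stamp the step at which generation actually began.
manager.set_training_step(9)
request.generation_started_training_step = manager.get_training_step()
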
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces staleness metrics to measure the number of training steps prompts spend in different stages of the pipeline, and also renames param_prompt_Q to prompt_Q. The refactoring is well executed. The logic for adding staleness metrics is mostly sound, but I've identified a critical bug in open_instruct/grpo_fast.py within the staleness calculation logic that would lead to a runtime error. I've provided a detailed comment and a code suggestion to fix this. Additionally, I've included a medium-severity comment for a minor code simplification in the same file. The other changes in actor_manager.py, queue_types.py, vllm_utils.py, and the test files look good and correctly support the new feature.

Comment on lines +2098 to +2114
# calculate staleness metrics
queue_steps = batch_stats.pop("staleness_queue_steps", [])
generation_steps = batch_stats.pop("staleness_generation_start_steps", [])
consumed_steps = batch_stats.pop("staleness_consumed_step", [])

queue_to_generation = []
generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)

staleness_metrics = {
    "staleness/queue_to_generation": queue_to_generation,
    "staleness/queue_to_generation_mean": np.mean(queue_to_generation),
    "staleness/generation_to_consume": generation_to_consume,
    "staleness/generation_to_consume_mean": np.mean(generation_to_consume),
}

critical

There are a few issues in this block that will cause runtime errors and incorrect behavior:

  1. AttributeError: You are calling .pop() on batch_stats, which is a dataclass instance and does not have a pop method. You should be using batch_metrics, which is the dictionary created from asdict(batch_stats).
  2. TypeError: batch_metrics.pop("staleness_consumed_step", []) will return an integer. The subsequent zip call attempts to iterate over this integer (consumed_steps), which will raise a TypeError. The consumed step is the same for all items in the batch and should be treated as a scalar.
  3. None values: The queue_steps and generation_steps lists can contain None values. The subtraction generation_step - queue_step will raise a TypeError if either value is None. These cases should be handled.
  4. RuntimeWarning from np.mean: If queue_to_generation or generation_to_consume is an empty list (e.g., if all steps were None), np.mean will return NaN and raise a RuntimeWarning. It's better to handle this case and return 0.0 (both this and issue 2 are reproduced in the snippet below).
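
Both failure modes can be reproduced standalone as a quick sanity check, assuming only NumPy:

import warnings
import numpy as np

# Issue 2: zip() rejects the scalar outright.
try:
    list(zip([1, 2], 5))
except TypeError as e:
    print(e)  # zip argument #2 must support iteration

# Issue 4: np.mean of an empty list warns and yields NaN.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(np.mean([]))         # nan
    print(caught[0].category)  # <class 'RuntimeWarning'>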

I've provided a suggestion that fixes these issues by using batch_metrics, handling the scalar consumed_step correctly, checking for None before subtraction, and safely calculating the mean.

Suggested change
-# calculate staleness metrics
-queue_steps = batch_stats.pop("staleness_queue_steps", [])
-generation_steps = batch_stats.pop("staleness_generation_start_steps", [])
-consumed_steps = batch_stats.pop("staleness_consumed_step", [])
-queue_to_generation = []
-generation_to_consume = []
-for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
-    queue_to_generation.append(generation_step - queue_step)
-    generation_to_consume.append(consume_step - generation_step)
-staleness_metrics = {
-    "staleness/queue_to_generation": queue_to_generation,
-    "staleness/queue_to_generation_mean": np.mean(queue_to_generation),
-    "staleness/generation_to_consume": generation_to_consume,
-    "staleness/generation_to_consume_mean": np.mean(generation_to_consume),
-}
+# calculate staleness metrics
+queue_steps = batch_metrics.pop("staleness_queue_steps", [])
+generation_steps = batch_metrics.pop("staleness_generation_start_steps", [])
+consumed_step = batch_metrics.pop("staleness_consumed_step", None)
+queue_to_generation = []
+generation_to_consume = []
+if consumed_step is not None:
+    for q_step, g_step in zip(queue_steps, generation_steps):
+        if q_step is not None and g_step is not None:
+            queue_to_generation.append(g_step - q_step)
+            generation_to_consume.append(consumed_step - g_step)
+staleness_metrics = {
+    "staleness/queue_to_generation": queue_to_generation,
+    "staleness/queue_to_generation_mean": np.mean(queue_to_generation) if queue_to_generation else 0.0,
+    "staleness/generation_to_consume": generation_to_consume,
+    "staleness/generation_to_consume_mean": np.mean(generation_to_consume) if generation_to_consume else 0.0,
+}

Comment on lines +1931 to +1932
    staleness_queue_steps=staleness_queue_steps if staleness_queue_steps else [],
    staleness_generation_start_steps=staleness_generation_start_steps if staleness_generation_start_steps else [],

medium

The conditional assignments here are redundant. The variables staleness_queue_steps and staleness_generation_start_steps are initialized as empty lists. If they remain empty, the expression ... if [] else [] evaluates to []. If they are populated, the expression ... if [1, 2] else [] evaluates to [1, 2]. In both cases, the result is the same as the original variable. You can simplify this by directly assigning the variables.

Suggested change
-staleness_queue_steps=staleness_queue_steps if staleness_queue_steps else [],
-staleness_generation_start_steps=staleness_generation_start_steps if staleness_generation_start_steps else [],
+staleness_queue_steps=staleness_queue_steps,
+staleness_generation_start_steps=staleness_generation_start_steps,

generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)

Bug: Zip Error: Single Value Not Iterable

The staleness_consumed_step field in BatchStatistics is a single int | None value, but the code tries to zip() it with two lists (queue_steps and generation_steps). This causes a runtime error because zip() expects an iterable, not a single integer. The consumed step should be repeated for each prompt or the code should be restructured to handle the single value properly.
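
One way to restructure the loop under that constraint, assuming staleness_consumed_step stays a scalar (a sketch, not the committed fix):

from itertools import repeat

queue_steps = [10, 11, 12]       # per-prompt enqueue steps
generation_steps = [12, 12, 13]  # per-prompt generation-start steps
consumed_step = 15               # one value for the whole training batch

queue_to_generation = []
generation_to_consume = []
# repeat() yields the scalar indefinitely, so zip() can pair it with each prompt.
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, repeat(consumed_step)):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)

print(queue_to_generation)    # [2, 1, 1]
print(generation_to_consume)  # [3, 3, 2]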


generation_to_consume = []
for queue_step, generation_step, consume_step in zip(queue_steps, generation_steps, consumed_steps):
    queue_to_generation.append(generation_step - queue_step)
    generation_to_consume.append(consume_step - generation_step)

Bug: Staleness Calculation: Robustness Against Missing Data

The staleness calculation performs arithmetic on queue_step, generation_step, and consume_step values that can be None. When _get_current_training_step() fails or when metadata keys are missing, these values default to None, causing TypeError: unsupported operand type(s) for -: 'NoneType' and 'int' when attempting subtraction. The code needs to handle None values before performing arithmetic operations.
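
A defensive variant, assuming None marks a failed or missing step lookup (names are illustrative, not the committed fix):

def safe_delta(later, earlier):
    # Return later - earlier, or None when either training step is missing.
    if later is None or earlier is None:
        return None
    return later - earlier

queue_steps = [10, None, 12]
generation_steps = [12, 12, None]

# Drop unmeasurable pairs instead of letting the subtraction raise TypeError.
queue_to_generation = [
    d
    for d in (safe_delta(g, q) for q, g in zip(queue_steps, generation_steps))
    if d is not None
]
print(queue_to_generation)  # [2]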

