Skip to content

Conversation

@AlonKellner-RedHat
Copy link
Collaborator

@AlonKellner-RedHat AlonKellner-RedHat commented Oct 29, 2025

Summary

This PR adds over-saturation stopping to the GuideLLM CLI.
It's based on the OSD (Over-Saturation Detection) algorithm we developed and evaluated at Jounce.
Use --stop-over-saturated or --stop-osd to enable.

Details

This PR adds:

  • Over-saturation stopping (--stop-over-saturated)
  • Comprehensive OSD unit tests

Test Plan


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

@AlonKellner-RedHat AlonKellner-RedHat mentioned this pull request Oct 30, 2025
9 tasks
@AlonKellner-RedHat AlonKellner-RedHat force-pushed the feat/over-saturation-stopping branch 4 times, most recently from 85cf65e to f996254 Compare November 6, 2025 11:11
sjmonson added a commit that referenced this pull request Nov 6, 2025
## Summary

E2E tests which check basic GuideLLM functionality, using vLLM
simulator.

## Details

- [x] Max requests test
- [x] Max duration test
- [ ] Over-saturation stopping test - skipped for now, will be enabled
when #438 lands

## Test Plan

- [x] Local testing
- [x] GitHub action

---

- [x] "I certify that all code in this PR is my own, except as noted
below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
@AlonKellner-RedHat AlonKellner-RedHat force-pushed the feat/over-saturation-stopping branch 4 times, most recently from ce85b1c to 46bc491 Compare November 18, 2025 08:15
Copy link
Collaborator

@markurtz markurtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prompt-python_module_docs.md
Overall looks good. I have a few comments/discussion points to run through around names, types, package layout, etc. I haven't gone through the full logic yet to validate, will do that soon, but adding a writeup of the algorithm would help at the top.

Also, can you ensure all public methods have docs? I'll attach the prompt I use to generate docs that is generally fairly self sufficient and accurate in case you'd like to use that so it doesn't take too much time.

sophisticated benchmark stopping criteria through composable constraint types.
"""

from .base import (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we merge this and protocols into a single file called constraint.py to better merge some other new patterns across the repo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ConstraintInitializer,
SerializableConstraintInitializer,
)
from .standard import (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we break this apart into request.py and error.py or something along those lines? That way we have some clearer delineation of where future constraints might go particularly when looking at in the context of adding over_saturation.py module

OverSaturationConstraintInitializer,
OverSaturationDetector,
)
from .protocols import (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note above on merging

max_global_error_rate: float | None = Field(
default=None, description="Maximum global error rate (0-1) before stopping"
)
stop_over_saturated: bool = Field(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this and retype this a bit? Something along the lines of just over_saturation, detect_saturation, etc? And then can ideally can support either a bool or dict input to resolve with arguments for the constraint/initializer implementations. This way we can keep these configurations local and stored within each benchmark command rather than global envs which can change across systems. This last parte being more important since we are saving this entrypoints class in the output report which can then be reloaded again later for replication / other use

max_errors: int | None,
max_error_rate: float | None,
max_global_error_rate: float | None,
stop_over_saturated: bool | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point here as earlier on the naming and type settings

return (slope > 0) and (margin_of_error < self.moe_threshold)


class OverSaturationDetector(OverSaturationDetectorBase):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to fold this class in with the Constraint class to simplify the flow. That way the initializer instantiates a constraint which actively enforces detection rather than chaining another level down. The initializer would then be responsible for instantiating the correct over saturation constraint based on settings passed in if we add some more later on

@@ -0,0 +1,467 @@
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe rename to just saturation.py for the module if we are looking to include under saturation and other general variants within

@@ -0,0 +1,467 @@
"""
Over-saturation detection constraint implementation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide a short writeup of the method/algorithm here for what's implemented down below and how the classes interface?

help="Maximum global error rate across all benchmarks.",
)
@click.option(
"--stop-over-saturated",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note on the previous comment for entrypoints args for name and type

@@ -0,0 +1,256 @@
# Over-Saturation Feature Test Coverage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doc would become hard to keep updated and reflective of the test cases so they could quickly become out of sync. Also, not sure how much value it adds since it should be able to be fairly reproducible by feeding the code back into an LLM with a prompt to generate an overview/description

AlonKellner-RedHat and others added 14 commits November 20, 2025 11:34
Add over-saturation detection and stopping capability to GuideLLM CLI.

- Implement OverSaturationDetector with statistical slope detection
- Add OverSaturationConstraint for scheduler integration
- Add CLI flags --stop-over-saturated and --stop-osd
- Integrate with benchmark entrypoints and main CLI

Signed-off-by: Alon Kellner <[email protected]>
Add unit tests for over-saturation detection and constraint functionality.

Signed-off-by: Alon Kellner <[email protected]>
Add comprehensive test suite for over-saturation detection algorithm.

Signed-off-by: Alon Kellner <[email protected]>
Enable end-to-end test for over-saturation stopping functionality.

Signed-off-by: Alon Kellner <[email protected]>
Refactor the single constraints.py file into a package structure where each
constraint type has its own file:

- protocols.py: Protocol definitions (Constraint, ConstraintInitializer, SerializableConstraintInitializer)
- factory.py: ConstraintsInitializerFactory for creating and managing constraints
- base.py: Base classes (PydanticConstraintInitializer, UnserializableConstraintInitializer)
- standard.py: Standard constraints (MaxNumber, MaxDuration, MaxErrors, MaxErrorRate, MaxGlobalErrorRate, RequestsExhausted)
- over_saturation.py: Over-saturation detection constraint implementation

This improves code organization and maintainability while preserving backward
compatibility through the package's __init__.py exports.

Signed-off-by: Alon Kellner <[email protected]>
…Args

The field was referenced in __main__.py but missing from the schema definition,
causing ValueError when trying to get default values.

Signed-off-by: Alon Kellner <[email protected]>
- Fix first_iteration -> first_token_iteration attribute name
- Add type ignore for OverSaturationConstraint return type
- Fix validated_kwargs type handling for stop_over_saturated parameter

Signed-off-by: Alon Kellner <[email protected]>
When is_flag=True, Click automatically handles boolean values.
Specifying type=bool can cause Pydantic validation errors.

Signed-off-by: Alon Kellner <[email protected]>
- Fix import paths from advanced_constraints to constraints package
- Fix line length errors (E501) in test files
- Fix type error for request_start None check
- Update test coverage documentation

Signed-off-by: Alon Kellner <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Alon Kellner <[email protected]>
Signed-off-by: Alon Kellner <[email protected]>
Signed-off-by: Alon Kellner <[email protected]>
@AlonKellner-RedHat AlonKellner-RedHat force-pushed the feat/over-saturation-stopping branch from 59ffa16 to 3ebdcaa Compare November 20, 2025 11:35
Signed-off-by: Alon Kellner <[email protected]>
Signed-off-by: Alon Kellner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants