Feat/over saturation stopping #438

AlonKellner-RedHat · 2025-10-29T14:24:29Z

Summary

This PR adds over-saturation stopping to the GuideLLM CLI.
It's based on the OSD (Over-Saturation Detection) algorithm we developed and evaluated at Jounce.
Use --stop-over-saturated or --stop-osd to enable.

Details

This PR adds:

Over-saturation stopping (--stop-over-saturated)
Comprehensive OSD unit tests

Test Plan

Currently, only unit tests
When test(e2e): basic E2Es #440 lands, we'll enable its over-saturation stopping E2E test

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

## Summary
E2E tests which check basic GuideLLM functionality, using vLLM simulator.

## Details
- [x] Max requests test
- [x] Max duration test
- [ ] Over-saturation stopping test - skipped for now, will be enabled when #438 lands

## Test Plan
- [x] Local testing
- [x] GitHub action

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI
- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

markurtz

prompt-python_module_docs.md
Overall looks good. I have a few comments/discussion points to run through around names, types, package layout, etc. I haven't gone through the full logic yet to validate, will do that soon, but adding a writeup of the algorithm would help at the top.

Also, can you ensure all public methods have docs? I'll attach the prompt I use to generate docs, which is generally fairly self-sufficient and accurate, in case you'd like to use it to save time.

markurtz · 2025-11-19T11:04:14Z

src/guidellm/scheduler/constraints/__init__.py

+sophisticated benchmark stopping criteria through composable constraint types.
+"""
+
+from .base import (


Can we merge this and protocols into a single file called constraint.py to better match some other new patterns across the repo?

markurtz · 2025-11-19T11:05:45Z

src/guidellm/scheduler/constraints/__init__.py

+    ConstraintInitializer,
+    SerializableConstraintInitializer,
+)
+from .standard import (


Could we break this apart into request.py and error.py or something along those lines? That way we have a clearer delineation of where future constraints might go, particularly in the context of adding the over_saturation.py module.

markurtz · 2025-11-19T11:06:01Z

src/guidellm/scheduler/constraints/__init__.py

+    OverSaturationConstraintInitializer,
+    OverSaturationDetector,
+)
+from .protocols import (


Note above on merging

markurtz · 2025-11-19T13:19:53Z

src/guidellm/benchmark/schemas/generative/entrypoints.py

    max_global_error_rate: float | None = Field(
        default=None, description="Maximum global error rate (0-1) before stopping"
    )
+    stop_over_saturated: bool = Field(


Can we rename and retype this a bit? Something along the lines of just over_saturation, detect_saturation, etc.? Ideally it would then support either a bool or a dict input, resolved into arguments for the constraint/initializer implementations. This way we can keep these configurations local and stored within each benchmark command rather than in global envs, which can change across systems. This last part is the more important one, since we are saving this entrypoints class in the output report, which can then be reloaded later for replication or other use.
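A minimal sketch of how a bool-or-dict input like the one suggested above might be normalized into keyword arguments for a constraint initializer. The function name `resolve_over_saturation` is hypothetical and not part of GuideLLM, and treating `moe_threshold` as a configurable key is an assumption based only on the attribute visible in the diff.

```python
from __future__ import annotations

from typing import Any


def resolve_over_saturation(
    value: bool | dict[str, Any] | None,
) -> dict[str, Any] | None:
    """Normalize a bool-or-dict setting into constraint kwargs.

    Hypothetical helper: returns None when detection is disabled, an empty
    dict when it is enabled with defaults, or a copy of the given dict.
    """
    # Check None/False/True by identity so that 0 or 1 are not accepted.
    if value is None or value is False:
        return None
    if value is True:
        return {}
    if isinstance(value, dict):
        # Copy so that later changes by the caller do not leak in.
        return dict(value)
    raise TypeError(
        f"expected bool, dict, or None, got {type(value).__name__}"
    )
```

With this shape, `True` would enable detection with default settings, while a dict such as `{"moe_threshold": 2.0}` would carry the settings inside the saved benchmark arguments instead of relying on environment variables.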

markurtz · 2025-11-19T13:20:34Z

src/guidellm/benchmark/entrypoints.py

    max_errors: int | None,
    max_error_rate: float | None,
    max_global_error_rate: float | None,
+    stop_over_saturated: bool | None = None,


Same point here as earlier on the naming and type settings

markurtz · 2025-11-19T13:32:28Z

src/guidellm/scheduler/constraints/over_saturation.py

+        return (slope > 0) and (margin_of_error < self.moe_threshold)
+
+
+class OverSaturationDetector(OverSaturationDetectorBase):


It might make sense to fold this class in with the Constraint class to simplify the flow. That way the initializer instantiates a constraint which actively enforces detection rather than chaining another level down. The initializer would then be responsible for instantiating the correct over saturation constraint based on settings passed in if we add some more later on
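A rough sketch of the folded layout described above, where the initializer builds a constraint that holds its own detection state. The class names come from this PR, but the bodies are illustrative only: `max_consecutive`, `update`, and `create_constraint` are hypothetical names, and the stopping rule here is a placeholder rather than the PR's logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class OverSaturationConstraint:
    """Illustrative constraint that performs detection itself."""

    # Hypothetical setting: positive detections in a row before stopping.
    max_consecutive: int = 3
    _streak: int = field(default=0, init=False)

    def update(self, saturated: bool) -> bool:
        """Record one detection result; return True when the run should stop."""
        self._streak = self._streak + 1 if saturated else 0
        return self._streak >= self.max_consecutive


@dataclass
class OverSaturationConstraintInitializer:
    """Illustrative initializer that picks and configures the constraint."""

    settings: dict[str, Any] = field(default_factory=dict)

    def create_constraint(self) -> OverSaturationConstraint:
        # A fuller version could choose among several constraint variants
        # based on the settings; this sketch has only one.
        return OverSaturationConstraint(**self.settings)
```

The point of the layout is that callers deal with one object per constraint, and adding another saturation variant later would only change the initializer.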

markurtz · 2025-11-19T13:32:59Z

src/guidellm/scheduler/constraints/saturation.py

@@ -0,0 +1,467 @@
+"""


NIT: maybe rename to just saturation.py for the module if we are looking to include under saturation and other general variants within

markurtz · 2025-11-19T13:33:21Z

src/guidellm/scheduler/constraints/saturation.py

@@ -0,0 +1,467 @@
+"""
+Over-saturation detection constraint implementation.


Can we provide a short writeup of the method/algorithm here for what's implemented down below and how the classes interface?
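For reference while the writeup is pending, here is a hedged sketch of the kind of check the diff's `(slope > 0) and (margin_of_error < self.moe_threshold)` line suggests: fit a least-squares line to a metric over time and flag saturation only when the slope is positive and its margin of error is small. This is not the PR's implementation. The function names, the choice of inputs, and the normal approximation (`z = 1.96`) are all assumptions.

```python
from __future__ import annotations

import math
from typing import Sequence


def slope_with_margin(
    xs: Sequence[float], ys: Sequence[float], z: float = 1.96
) -> tuple[float, float]:
    """Return the OLS slope of ys over xs and z times its standard error."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, z * std_err


def is_over_saturated(
    xs: Sequence[float], ys: Sequence[float], moe_threshold: float
) -> bool:
    """Mirror the condition in the diff: rising trend with a tight margin."""
    slope, margin_of_error = slope_with_margin(xs, ys)
    return (slope > 0) and (margin_of_error < moe_threshold)
```

Here `xs` could be elapsed seconds and `ys` a load signal such as in-flight requests or time to first token; which signals the detector actually tracks is not stated in this thread.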

markurtz · 2025-11-19T13:34:21Z

src/guidellm/__main__.py

    help="Maximum global error rate across all benchmarks.",
 )
+@click.option(
+    "--stop-over-saturated",


Note on the previous comment for entrypoints args for name and type

markurtz · 2025-11-19T13:35:55Z

tests/unit/scheduler/OVER_SATURATION_TEST_COVERAGE.md

@@ -0,0 +1,256 @@
+# Over-Saturation Feature Test Coverage


I think this doc would become hard to keep updated and reflective of the test cases so they could quickly become out of sync. Also, not sure how much value it adds since it should be able to be fairly reproducible by feeding the code back into an LLM with a prompt to generate an overview/description

Add over-saturation detection and stopping capability to GuideLLM CLI.
- Implement OverSaturationDetector with statistical slope detection
- Add OverSaturationConstraint for scheduler integration
- Add CLI flags --stop-over-saturated and --stop-osd
- Integrate with benchmark entrypoints and main CLI
Signed-off-by: Alon Kellner <[email protected]>


Refactor the single constraints.py file into a package structure where each constraint type has its own file:
- protocols.py: Protocol definitions (Constraint, ConstraintInitializer, SerializableConstraintInitializer)
- factory.py: ConstraintsInitializerFactory for creating and managing constraints
- base.py: Base classes (PydanticConstraintInitializer, UnserializableConstraintInitializer)
- standard.py: Standard constraints (MaxNumber, MaxDuration, MaxErrors, MaxErrorRate, MaxGlobalErrorRate, RequestsExhausted)
- over_saturation.py: Over-saturation detection constraint implementation
This improves code organization and maintainability while preserving backward compatibility through the package's __init__.py exports.
Signed-off-by: Alon Kellner <[email protected]>


- Fix first_iteration -> first_token_iteration attribute name
- Add type ignore for OverSaturationConstraint return type
- Fix validated_kwargs type handling for stop_over_saturated parameter
Signed-off-by: Alon Kellner <[email protected]>


- Fix import paths from advanced_constraints to constraints package
- Fix line length errors (E501) in test files
- Fix type error for request_start None check
- Update test coverage documentation
Signed-off-by: Alon Kellner <[email protected]>


AlonKellner-RedHat mentioned this pull request Oct 30, 2025

test(e2e): basic E2Es #440

Merged

9 tasks

AlonKellner-RedHat requested review from markurtz and sjmonson November 4, 2025 16:46

AlonKellner-RedHat force-pushed the feat/over-saturation-stopping branch 4 times, most recently from 85cf65e to f996254 Compare November 6, 2025 11:11

AlonKellner-RedHat force-pushed the feat/over-saturation-stopping branch 4 times, most recently from ce85b1c to 46bc491 Compare November 18, 2025 08:15

markurtz requested changes Nov 19, 2025


AlonKellner-RedHat and others added 14 commits November 20, 2025 11:34

test: over-saturation stopping

db51048

Add unit tests for over-saturation detection and constraint functionality. Signed-off-by: Alon Kellner <[email protected]>

test: comprehensive over-saturation detection

5ba3c49

Add comprehensive test suite for over-saturation detection algorithm. Signed-off-by: Alon Kellner <[email protected]>

test(e2e): enable over-saturation test

cd26395

Enable end-to-end test for over-saturation stopping functionality. Signed-off-by: Alon Kellner <[email protected]>

fix: add missing stop_over_saturated field to BenchmarkGenerativeText…

fdbd239

…Args The field was referenced in __main__.py but missing from the schema definition, causing ValueError when trying to get default values. Signed-off-by: Alon Kellner <[email protected]>

fix: remove type=bool from click flag option

d96e813

When is_flag=True, Click automatically handles boolean values. Specifying type=bool can cause Pydantic validation errors. Signed-off-by: Alon Kellner <[email protected]>

fix: over-saturation test coverage mdformat

18a73af

Signed-off-by: Alon Kellner <[email protected]>

Unmask StopIteration

c727dbc

Signed-off-by: Samuel Monson <[email protected]> Signed-off-by: Alon Kellner <[email protected]>

fix: mark review comments

54536b4

Signed-off-by: Alon Kellner <[email protected]>

fix: mark review further comments

e41033f

Signed-off-by: Alon Kellner <[email protected]>

fix: CI errors

3ebdcaa

Signed-off-by: Alon Kellner <[email protected]>

AlonKellner-RedHat force-pushed the feat/over-saturation-stopping branch from 59ffa16 to 3ebdcaa Compare November 20, 2025 11:35

AlonKellner-RedHat added 2 commits November 20, 2025 13:35

Merge branch 'main' into feat/over-saturation-stopping

8a4b343

fix: E2E tests

b5ba4b0

Signed-off-by: Alon Kellner <[email protected]>

AlonKellner-RedHat added 2 commits November 20, 2025 12:03

fix: E2E tests

b967dad

Signed-off-by: Alon Kellner <[email protected]>

fix: E2E tests

9be1d86

Signed-off-by: Alon Kellner <[email protected]>

