Add enforce_max_duration setting #2394
Open
This PR adds a new enforce_max_duration setting to the LoadGen test configuration. The setting lets users control whether exceeding max_duration terminates query issuance early and how minimum query count validation is applied.
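To illustrate the kind of decision the setting governs, here is a minimal Python sketch. It is not LoadGen code: `SketchSettings` and `should_stop_issuing` are hypothetical names, and the behaviour shown when the flag is off (keep issuing until the minimum query count is met) is an assumption for illustration only. The diff in this PR is the source of truth for the actual semantics.

```python
from dataclasses import dataclass


@dataclass
class SketchSettings:
    """Hypothetical stand-in for the test settings; not LoadGen's real type."""
    max_duration_ms: int = 0           # assumed: 0 means "no duration limit"
    min_query_count: int = 0
    enforce_max_duration: bool = True


def should_stop_issuing(settings, elapsed_ms, queries_issued):
    """Return True when the (assumed) issuance loop should stop early."""
    over_time = (
        settings.max_duration_ms > 0
        and elapsed_ms > settings.max_duration_ms
    )
    if not over_time:
        return False
    if settings.enforce_max_duration:
        # Assumed: an enforced limit ends issuance regardless of query count.
        return True
    # Assumed: without enforcement, issuance continues until the minimum
    # query count has been reached.
    return queries_issued >= settings.min_query_count
```

With these assumptions, a run that is past max_duration but short of min_query_count stops only when enforce_max_duration is true.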
Key Changes
The changes are taken from the mobile_update branch (https://github.com/mlcommons/inference/commits/mobile_update/), which has fallen behind master and can no longer be merged without resolving conflicts.
Motivation
Until now, we have maintained this change in a separate branch, mobile_update. Keeping it there makes it difficult to update the LoadGen version, so we want to merge the change into master.
Related issues:
mlcommons/mobile_app_open#798
#1621