-
-
Couldn't load subscription status.
- Fork 10.8k
Remove deprecated fields from CompilationConfig
#27593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Remove deprecated fields from CompilationConfig
#27593
Conversation
Signed-off-by: Harry Mellor <[email protected]>
|
Documentation preview: https://vllm--27593.org.readthedocs.build/en/27593/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request removes the deprecated use_cudagraph and full_cuda_graph fields from the CompilationConfig and updates the config tests accordingly. The removal of these flags simplifies the configuration and aligns with the deprecation schedule. The test updates ensure the config tests are passing.
Signed-off-by: Harry Mellor <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for doing the deprecations!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stamp
| vllm_config = VllmConfig( | ||
| compilation_config=CompilationConfig( | ||
| mode=CompilationMode.VLLM_COMPILE, | ||
| use_cudagraph=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make mode explicitly Piecewise here?
| # de-duplicate the sizes provided by the config | ||
| dedup_sizes = list(set(self.compilation_config.cudagraph_capture_sizes)) | ||
| cudagraph_capture_sizes = dedup_sizes | ||
| # filter out sizes larger than max_cudagraph_capture_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have to be added? Bad merge?
use_cudagraphandfull_cuda_graphhave been deprecated since [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer #20059v0.11.0, which has already been released