[xpu] disable cudagraph for xpu platform #21354
Closed
+16
−7
Purpose
When certain models run in torch.compile mode, vLLM may take the CUDA Graph code path and try to create a torch.cuda.CUDAGraph. CUDA Graphs are not supported on XPU platforms, so this PR disables them for XPU.
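The idea can be sketched as a small platform check that switches graph capture off before any graph object is created. This is a minimal illustration under stated assumptions, not the code changed in this PR: `CompileSettings`, `resolve_settings`, and `PLATFORMS_WITHOUT_CUDAGRAPH` are hypothetical names, and the device type is passed in as a plain string.

```python
from dataclasses import dataclass, replace

# Hypothetical set of device types that lack CUDA Graph support in this sketch.
PLATFORMS_WITHOUT_CUDAGRAPH = frozenset({"xpu"})


@dataclass(frozen=True)
class CompileSettings:
    """Hypothetical stand-in for a compilation config; not a vLLM class."""
    use_cudagraph: bool = True


def resolve_settings(settings: CompileSettings, device_type: str) -> CompileSettings:
    """Return settings with graph capture turned off on unsupported platforms."""
    if settings.use_cudagraph and device_type in PLATFORMS_WITHOUT_CUDAGRAPH:
        # Disable capture up front so no CUDAGraph object is ever constructed.
        return replace(settings, use_cudagraph=False)
    return settings


if __name__ == "__main__":
    print(resolve_settings(CompileSettings(), "xpu"))
    print(resolve_settings(CompileSettings(), "cuda"))
```

Deciding once at configuration time, rather than guarding each capture site, keeps the compiled code path free of platform checks.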
Test Plan
VLLM_USE_V1=1 python examples/offline_inference/audio_language.py -m granite_speech
Test Result
Without this PR:
lora_b_stacked_2_ = None
File "/home/chaojun/vllm/vllm/compilation/cuda_piecewise_backend.py", line 164, in __call__
cudagraph = torch.cuda.CUDAGraph()
File "/home/chaojun/.local/lib/python3.10/site-packages/torch/cuda/graphs.py", line 74, in __new__
return super().__new__(cls, keep_graph)
File "/home/chaojun/.local/lib/python3.10/site-packages/torch/_utils.py", line 996, in err_fn
raise RuntimeError(f"Tried to instantiate dummy base class {class_name}")
RuntimeError: Tried to instantiate dummy base class CUDAGraph
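The error message suggests that, on a build without CUDA support, the graph class is a placeholder that raises as soon as it is instantiated. The pattern can be sketched as follows. This is a simplified illustration, not PyTorch's actual implementation; `make_dummy_type` and `try_capture` are hypothetical names.

```python
def make_dummy_type(name: str) -> type:
    """Build a placeholder class that refuses to be instantiated."""

    def fail_on_new(cls, *args, **kwargs):
        # Python treats __new__ as a static method, so cls is the class being created.
        raise RuntimeError(f"Tried to instantiate dummy base class {cls.__name__}")

    return type(name, (object,), {"__new__": fail_on_new})


# Stand-in for a graph class on a platform without CUDA Graph support.
CUDAGraph = make_dummy_type("CUDAGraph")


def try_capture() -> str:
    """Attempt to create a graph object and report what happened."""
    try:
        CUDAGraph()
    except RuntimeError as exc:
        return str(exc)
    return "captured"


if __name__ == "__main__":
    print(try_capture())
```

Because the failure happens at construction time, the only reliable fix is to avoid reaching the graph code path at all on such platforms.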
With this PR:
Adding requests: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:10<00:00, 10.00s/it]
Fetching 25 files: 100%|████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 23101.48it/s]
Processed prompts: 100%|███████████| 1/1 [00:03<00:00, 3.68s/it, est. speed input: 61.98 toks/s, output: 12.50 toks/s]
the first words i spoke in the original phonograph a little piece of practical poetry mary had a little lamb its fleece was white as snow and everywhere that mary went the lamb was sure to go
(Optional) Documentation Update