Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 & 4.55 #2126
Conversation
```cpp
if (device.find("NPU") != std::string::npos) {
    m_is_npu = true;
}

ov::CompiledModel compiled_model;
if (m_is_npu) {
```
Suggested change:
```diff
- if (device.find("NPU") != std::string::npos) {
-     m_is_npu = true;
- }
- ov::CompiledModel compiled_model;
- if (m_is_npu) {
+ ov::CompiledModel compiled_model;
+ // npu device
+ if (device.find("NPU") != std::string::npos) {
```
The m_is_npu member seems redundant, as it is only used in the constructor.
And why is .find needed? Can't we just compare device == "NPU"?
Fixed
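For illustration, a minimal sketch of what the fixed constructor logic could look like with the check kept local and an exact comparison; the compile call and the npu_properties name are assumptions, not the merged code:

```cpp
ov::Core core;
ov::CompiledModel compiled_model;
// An exact comparison is enough when the caller passes a plain device name;
// .find() would also match composite strings such as "HETERO:NPU,CPU".
if (device == "NPU") {
    // Hypothetical NPU-tuned properties; see the NPUW discussion below.
    compiled_model = core.compile_model(model, device, npu_properties);
} else {
    compiled_model = core.compile_model(model, device, properties);
}
```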
Force-pushed from a378dc9 to 470bc81
src/cpp/src/whisper/whisper.cpp (outdated)
```cpp
// reset input tensor
request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));
auto m_is_npu = true;
```
This needs fixing: how can we detect here that the pipeline is running on NPU?
Maybe this reset isn't needed at all? (I don't remember the implementation details)
Discussed with @as-suvorov. The reset is needed here because the request refers to mel_data, which can be destroyed later (details in #789 (comment)).
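For context, a rough sketch of the lifetime issue being discussed; the variable names and shapes are illustrative assumptions, not the exact pipeline code:

```cpp
// The input tensor can wrap externally owned mel_data without copying it.
ov::Tensor mel_view(ov::element::f32, {1, feature_size, nb_max_frames}, mel_data);
request.set_tensor("input_features", mel_view);
request.infer();

// If mel_data is destroyed while the request still holds mel_view, the request
// would reference freed memory. Resetting the input to a small self-owned
// tensor drops that reference.
request.set_tensor("input_features",
                   ov::Tensor(ov::element::f32, {batch_size, feature_size, nb_max_frames}));
```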
Need to merge these changes first:
Pull Request Overview
This PR switches the Whisper implementation to use the ov::genai::WhisperStatefulImpl variant when running on NPU, and updates the associated pipelines and decoder models accordingly.
- Removed unnecessary tensor reset for non-NPU inference and replaced it with NPU-specific tensor dimensions.
- Added a new encoder reshaping function and updated the decoder APIs to accept an additional shape parameter.
- Enhanced NPU configuration in utils with a dedicated whisper-specific setup.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/cpp/src/whisper/whisper.cpp | Updated tensor creation for NPU usage by setting batch dim to 1. |
| src/cpp/src/whisper/pipeline.cpp | Added static encoder reshaping and modified decoder instantiation. |
| src/cpp/src/whisper/models/statefull_decoder.{hpp,cpp} | Updated decoder constructor to accept the encoder hidden state shape. |
| src/cpp/src/whisper/models/decoder.{hpp,cpp} | Updated from_path API to require additional shape parameter. |
| src/cpp/src/utils.{hpp,cpp} | Modified compile_decoder_for_npu to handle Whisper-specific configurations. |
Comments suppressed due to low confidence (2)

src/cpp/src/whisper/models/statefull_decoder.hpp:11
- [nitpick] The class name 'WhisperStatefullDecoder' has a double 'l' in 'statefull'; consider renaming it to 'WhisperStatefulDecoder' for consistency and clarity.
```cpp
class WhisperStatefullDecoder : public WhisperDecoder {
```
src/cpp/src/whisper/models/decoder.hpp:14
- [nitpick] Consider renaming the 'lhs_shape' parameter (introduced in the modified function signature) to 'encoder_hidden_state_shape' to improve clarity on what shape is being passed.
```cpp
static std::shared_ptr<WhisperDecoder> from_path(const std::filesystem::path& models_path,
```
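A hypothetical completion of the truncated signature above, only to make the review point concrete; the remaining parameters and the shape type are assumptions, not the file's verbatim contents:

```cpp
// Sketch: the extra shape parameter lets the decoder be reshaped to static
// dimensions matching the encoder's hidden state (needed for NPU).
static std::shared_ptr<WhisperDecoder> from_path(const std::filesystem::path& models_path,
                                                 const std::string& device,
                                                 const ov::AnyMap& properties,
                                                 const ov::PartialShape& lhs_shape);
```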
src/cpp/src/whisper/whisper.cpp (outdated)
```cpp
auto m_is_npu = true;
uint8_t batch_size = m_is_npu ? 1 : 0;
```
Copilot AI (Jun 26, 2025)
[nitpick] The variable 'm_is_npu' is hardcoded to true; if the NPU configuration is always expected, consider directly setting the batch size to 1 to simplify the logic.
Suggested change:
```diff
- auto m_is_npu = true;
- uint8_t batch_size = m_is_npu ? 1 : 0;
+ uint8_t batch_size = 1;
```
Fixed: the batch size is now set depending on the device used.
Force-pushed from 6a9be8b to 217d973
Force-pushed from 217d973 to 7153bda
Force-pushed from 7153bda to 7ed0695
```cpp
update_config(config, {"NPUW_FUNCALL_FOR_ALL", "NO"});
update_config(config, {"NPUW_FOLD", "NO"});
```
To enable weight sharing for FP16 models:
```diff
- update_config(config, {"NPUW_FUNCALL_FOR_ALL", "NO"});
- update_config(config, {"NPUW_FOLD", "NO"});
+ update_config(config, {"NPUW_FUNCALL_FOR_ALL", "YES"});
+ update_config(config, {"NPUW_FOLD", "YES"});
+ update_config(config, {"NPUW_WEIGHTS_BANK", "whisper-shared"});
```
On top of that, for INT8-SYM:
```cpp
update_config(config, {"NPUW_DQ", "YES"});
update_config(config, {"NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES"});
```
For asymmetric quantization, we need to consider a third option (CWAI?).
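Putting the suggestions in this thread together, a hypothetical helper might look like the sketch below; update_config is the project's existing utility, the helper name is invented here, and whether these values become the defaults is not decided in this thread:

```cpp
// Sketch only: consolidates the flags suggested above for Whisper on NPU.
void apply_whisper_npuw_config(ov::AnyMap& config, bool int8_sym) {
    // Weight sharing for FP16 models.
    update_config(config, {"NPUW_FUNCALL_FOR_ALL", "YES"});
    update_config(config, {"NPUW_FOLD", "YES"});
    update_config(config, {"NPUW_WEIGHTS_BANK", "whisper-shared"});
    if (int8_sym) {
        // Dynamic quantization path for INT8 symmetric weights.
        update_config(config, {"NPUW_DQ", "YES"});
        update_config(config, {"NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES"});
    }
    // Asymmetric quantization may need a third option (CWAI?), as noted above.
}
```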
Force-pushed from 7ed0695 to 2109ceb
src/cpp/src/whisper/pipeline.cpp (outdated)
```diff
- if (device == "NPU") {
+ if (device == "NPU" && properties.count("STATIC_PIPELINE")) {
```
The STATIC pipeline should remain the default for NPU Whisper for now.
Removed the STATIC_PIPELINE property for now.
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
src/cpp/src/whisper/whisper.cpp (outdated)
```cpp
// reset input tensor
request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));
auto devices = request.get_compiled_model().get_property(ov::execution_devices);
uint8_t batch_size = (devices[0] == "NPU") ? 1 : 0;
```
Copilot AI (Oct 16, 2025)
[nitpick] Using devices[0] assumes the execution_devices list is non-empty; prefer devices.front() and guard empty cases. Also, batch_size should use size_t (or at least unsigned int) for clarity since tensor shape indices are size_t; uint8_t can be confusing and risks unintended narrowing if later arithmetic is applied.
Suggested change:
```diff
- uint8_t batch_size = (devices[0] == "NPU") ? 1 : 0;
+ size_t batch_size = 1;  // Default batch size
+ if (!devices.empty()) {
+     batch_size = (devices.front() == "NPU") ? 1 : 0;
+ }
```
Added an assert and switched to size_t.
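Presumably the fixed code looks roughly like the following; the exact assert message is an assumption:

```cpp
auto devices = request.get_compiled_model().get_property(ov::execution_devices);
OPENVINO_ASSERT(!devices.empty(), "Execution devices list is empty");
// NPU needs a static batch of 1; other devices keep the 0-sized reset tensor.
size_t batch_size = (devices.front() == "NPU") ? 1 : 0;
request.set_tensor("input_features",
                   ov::Tensor(ov::element::f32, {batch_size, feature_size, nb_max_frames}));
```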
Tests for stateless models fail: https://github.com/openvinotoolkit/openvino.genai/actions/runs/18727271892/job/53418373613?pr=2126. They should be added to the original with_past model: https://github.com/openvinotoolkit/openvino.genai/pull/2126/files#diff-e9a19fecb7ef8f410831fddd75ee0c086b6cdf3cb1e32a81dfed1f8387a61cfdR1057. If that is not appropriate, the tests should be removed.
Done! Thanks a lot @as-suvorov for a lot of help!
```cpp
if (!is_initial_step) {
    ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
    cache_position_tensor.set_shape({1});
    cache_position_tensor.data<int64_t>()[0] = m_cache_position;
    if (m_has_cache_position) {
```
Suggested change:
```diff
- if (!is_initial_step) {
-     ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
-     cache_position_tensor.set_shape({1});
-     cache_position_tensor.data<int64_t>()[0] = m_cache_position;
-     if (m_has_cache_position) {
+ if (!is_initial_step && m_has_cache_position) {
```
If you want to address the non-static with_past decoder as well, please revert the removal of the stateless tests: bd178b4
Thanks! Done!
Force-pushed from 6508f95 to 04c89a2
Force-pushed from d3841ed to a303604
To summarize: at this point we support models without the cache_position input, for both static and non-static pipelines and for both with_past and non-with_past models. optimum-intel is pinned to a commit that produces models without cache_position. I expect all Whisper tests to pass now. We wait for the Whisper tests to pass and then revert the optimum-intel and transformers deps.
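For illustration, one plausible way a pipeline can tell whether a decoder exposes the cache_position input (the m_has_cache_position flag seen earlier); this is a sketch, not the merged implementation:

```cpp
// Newer optimum-intel exports may omit the "cache_position" input entirely,
// so probe the model's inputs at load time.
bool has_cache_position = false;
for (const auto& input : model->inputs()) {
    if (input.get_names().count("cache_position")) {
        has_cache_position = true;
        break;
    }
}
```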
@AsyaPronina Whisper tests passed; do you plan to revert the optimum and transformers deps?
@AsyaPronina @Wovchena @as-suvorov What's the matter with the tests here? Are you aware of the issue / looking for a resolution?
@dmatveev Whisper pipelines were aligned with non-cache_position input models in this PR. We updated the optimum-intel and transformers versions as in #2611. We checked only the Whisper tests with the updated deps, and they pass. Now we need to either revert the optimum-intel and transformers versions to master or wait for #2611 to merge.
5b05799
Ticket: 174805