
Conversation

@eshiryae (Contributor) commented Apr 28, 2025:

Ticket: 174805

@github-actions github-actions bot added the category: whisper Whisper pipeline label Apr 28, 2025
@ilya-lavrenov ilya-lavrenov added the category: NPU NPU related topics label May 1, 2025
Comment on lines 19 to 24:

    if (device.find("NPU") != std::string::npos) {
        m_is_npu = true;
    }

    ov::CompiledModel compiled_model;
    if (m_is_npu) {
Collaborator:
Suggested change:

    - if (device.find("NPU") != std::string::npos) {
    -     m_is_npu = true;
    - }
    - ov::CompiledModel compiled_model;
    - if (m_is_npu) {
    + ov::CompiledModel compiled_model;
    + // npu device
    + if (device.find("NPU") != std::string::npos) {

The m_is_npu member seems redundant, as it is used in the ctor only.

@as-suvorov (Collaborator) May 2, 2025:

And why is .find needed? Can't we just compare device == "NPU"?
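
For context, a minimal sketch of the difference between the two checks (hypothetical helpers, not code from this PR):

    #include <string>

    // Substring search: also matches composite device strings such as
    // "HETERO:NPU,CPU" or any device name that merely contains "NPU".
    bool is_npu_substring(const std::string& device) {
        return device.find("NPU") != std::string::npos;
    }

    // Exact comparison: matches only the plain "NPU" device string.
    bool is_npu_exact(const std::string& device) {
        return device == "NPU";
    }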

Contributor Author:

Fixed


    // reset input tensor
    request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));
    auto m_is_npu = true;
Contributor Author:

Need to fix this: how can we detect here that the pipeline is running on NPU?

Collaborator:

Maybe this reset isn't needed at all? (I don't remember the implementation details)

Contributor Author:

Discussed with @as-suvorov: the reset is needed here, as the request refers to mel_data that can be destroyed later (details in #789 (comment)).
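
For context, a minimal sketch of the lifetime issue being guarded against; mel_data, feature_size, and nb_max_frames are names taken from the discussion, the rest is illustrative:

    // The request's input tensor can wrap externally owned memory. If that
    // memory (mel_data) is freed while the request still holds the tensor,
    // the request is left with a dangling pointer.
    ov::Tensor mel_tensor(ov::element::f32, {1, feature_size, nb_max_frames}, mel_data.data());
    request.set_tensor("input_features", mel_tensor);
    request.infer();
    // Resetting to a fresh, empty tensor drops the reference before
    // mel_data is destroyed:
    request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));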

@eshiryae (Contributor Author) commented Jun 26, 2025:

Need to merge these changes first:
openvinotoolkit/openvino#31643

@Wovchena Wovchena requested a review from Copilot June 26, 2025 12:29
Copilot AI left a comment:

Pull Request Overview

This PR switches the Whisper implementation to use the ov::genai::WhisperStatefulImpl variant when running on NPU, and updates the associated pipelines and decoder models accordingly.

  • Removed unnecessary tensor reset for non-NPU inference and replaced it with NPU-specific tensor dimensions.
  • Added a new encoder reshaping function and updated the decoder APIs to accept an additional shape parameter.
  • Enhanced NPU configuration in utils with a dedicated whisper-specific setup.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/cpp/src/whisper/whisper.cpp: updated tensor creation for NPU usage by setting the batch dim to 1.
  • src/cpp/src/whisper/pipeline.cpp: added static encoder reshaping and modified decoder instantiation.
  • src/cpp/src/whisper/models/statefull_decoder.{hpp,cpp}: updated the decoder constructor to accept the encoder hidden state shape.
  • src/cpp/src/whisper/models/decoder.{hpp,cpp}: updated the from_path API to require an additional shape parameter.
  • src/cpp/src/utils.{hpp,cpp}: modified compile_decoder_for_npu to handle Whisper-specific configurations.
Comments suppressed due to low confidence (2)

src/cpp/src/whisper/models/statefull_decoder.hpp:11

  • [nitpick] The class name 'WhisperStatefullDecoder' has a double 'l' in 'statefull'; consider renaming it to 'WhisperStatefulDecoder' for consistency and clarity.

    class WhisperStatefullDecoder : public WhisperDecoder {

src/cpp/src/whisper/models/decoder.hpp:14

  • [nitpick] Consider renaming the 'lhs_shape' parameter (introduced in the modified function signature) to 'encoder_hidden_state_shape' to improve clarity on what shape is being passed.

    static std::shared_ptr<WhisperDecoder> from_path(const std::filesystem::path& models_path,

Comment on lines 215 to 216:

    auto m_is_npu = true;
    uint8_t batch_size = m_is_npu ? 1 : 0;
Copilot AI, Jun 26, 2025:

[nitpick] The variable 'm_is_npu' is hardcoded to true; if the NPU configuration is always expected, consider directly setting the batch size to 1 to simplify the logic.

Suggested change:

    - auto m_is_npu = true;
    - uint8_t batch_size = m_is_npu ? 1 : 0;
    + uint8_t batch_size = 1;

Contributor Author:

Fixed: the batch size is now set depending on the device used.

@eshiryae eshiryae marked this pull request as ready for review July 11, 2025 14:28
@eshiryae eshiryae force-pushed the b_whisper_unification branch from 6a9be8b to 217d973 Compare August 11, 2025 10:39
@github-actions github-actions bot added the category: Whisper samples GenAI Whisper samples label Aug 11, 2025
@eshiryae eshiryae force-pushed the b_whisper_unification branch from 217d973 to 7153bda Compare August 11, 2025 10:43
@github-actions github-actions bot removed the category: Whisper samples GenAI Whisper samples label Aug 11, 2025
@eshiryae eshiryae force-pushed the b_whisper_unification branch from 7153bda to 7ed0695 Compare September 24, 2025 10:08
Comment on lines +104 to +105:

    update_config(config, {"NPUW_FUNCALL_FOR_ALL", "NO"});
    update_config(config, {"NPUW_FOLD", "NO"});
Contributor:

To enable weight sharing for FP16 models:

Suggested change:

    - update_config(config, {"NPUW_FUNCALL_FOR_ALL", "NO"});
    - update_config(config, {"NPUW_FOLD", "NO"});
    + update_config(config, {"NPUW_FUNCALL_FOR_ALL", "YES"});
    + update_config(config, {"NPUW_FOLD", "YES"});
    + update_config(config, {"NPUW_WEIGHTS_BANK", "whisper-shared"});

On top of that, for INT8-SYM:

    update_config(config, {"NPUW_DQ", "YES"});
    update_config(config, {"NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES"});

For asym, we need to consider a third option (CWAI?)
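
A minimal sketch of how the suggested FP16 properties could be assembled into a compile-time config; update_config is assumed to insert a key/value pair into an ov::AnyMap as in the PR's utils, and the INT8-SYM keys are kept as written above:

    #include <openvino/openvino.hpp>

    // Hypothetical helper mirroring the PR's update_config (assumption).
    void update_config(ov::AnyMap& config, const std::pair<std::string, ov::Any>& entry) {
        config[entry.first] = entry.second;
    }

    ov::AnyMap make_whisper_npu_config(bool int8_sym) {
        ov::AnyMap config;
        // Weight sharing for FP16 models, per the suggestion above:
        update_config(config, {"NPUW_FUNCALL_FOR_ALL", "YES"});
        update_config(config, {"NPUW_FOLD", "YES"});
        update_config(config, {"NPUW_WEIGHTS_BANK", "whisper-shared"});
        if (int8_sym) {
            update_config(config, {"NPUW_DQ", "YES"});
            update_config(config, {"NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES"});
        }
        return config;
    }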

@eshiryae eshiryae force-pushed the b_whisper_unification branch from 7ed0695 to 2109ceb Compare October 12, 2025 19:08
Comment on lines 157 to 181:

    - if (device == "NPU") {
    + if (device == "NPU" && properties.count("STATIC_PIPELINE")) {
Contributor:

The STATIC pipeline should remain the default for NPU Whisper for now.

Contributor:

Removed the STATIC_PIPELINE property for now.

@Wovchena Wovchena requested a review from Copilot October 16, 2025 13:41
Copilot AI left a comment:

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.



    // reset input tensor
    request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));
    auto devices = request.get_compiled_model().get_property(ov::execution_devices);
    uint8_t batch_size = (devices[0] == "NPU") ? 1 : 0;
Copilot AI, Oct 16, 2025:

[nitpick] Using devices[0] assumes the execution_devices list is non-empty; prefer devices.front() and guard empty cases. Also, batch_size should use size_t (or at least unsigned int) for clarity since tensor shape indices are size_t; uint8_t can be confusing and risks unintended narrowing if later arithmetic is applied.

Suggested change:

    - uint8_t batch_size = (devices[0] == "NPU") ? 1 : 0;
    + size_t batch_size = 1;  // Default batch size
    + if (!devices.empty()) {
    +     batch_size = (devices.front() == "NPU") ? 1 : 0;
    + }

Contributor:

Added an assert and switched to size_t.
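
A minimal sketch of what the fixed code might look like, based on the reply above (an assumption, not the exact code merged):

    // Query which device actually executes the compiled model, assert the
    // list is non-empty, and pick the batch dimension accordingly.
    auto devices = request.get_compiled_model().get_property(ov::execution_devices);
    OPENVINO_ASSERT(!devices.empty(), "execution_devices is empty");
    size_t batch_size = (devices.front() == "NPU") ? 1 : 0;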

@AsyaPronina AsyaPronina changed the title Switch NPU Whisper to ov::genai::WhisperStatefulImpl Enable NPU in ov::genai::WhisperStatefulImpl, handle missed "cache_position" Oct 17, 2025
@AsyaPronina AsyaPronina changed the title Enable NPU in ov::genai::WhisperStatefulImpl, handle missed "cache_position" Enable NPU in ov::genai::WhisperStatefulImpl, handle removed "cache_position" input Oct 17, 2025
@AsyaPronina AsyaPronina changed the title Enable NPU in ov::genai::WhisperStatefulImpl, handle removed "cache_position" input Enable ov::genai::WhisperStatefulImpl for NPU, fixes of StaticWhisperPipeline for transformers 4.53.3 Oct 18, 2025
@AsyaPronina AsyaPronina changed the title Enable ov::genai::WhisperStatefulImpl for NPU, fixes of StaticWhisperPipeline for transformers 4.53.3 Enable WhisperStatefulImpl for NPU, fixes of StaticWhisperPipeline for transformers 4.53.3 Oct 18, 2025
@AsyaPronina AsyaPronina changed the title Enable WhisperStatefulImpl for NPU, fixes of StaticWhisperPipeline for transformers 4.53.3 Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 Oct 18, 2025
@dmatveev dmatveev added this to the 2025.4 milestone Oct 18, 2025
@as-suvorov (Collaborator):

Tests for stateless models fail: https://github.com/openvinotoolkit/openvino.genai/actions/runs/18727271892/job/53418373613?pr=2126
Do you plan to support stateless models? If yes, I guess

    if (!ov::genai::utils::input_exists(decoder_with_past_model, "cache_position")) {
        add_cache_position_input(decoder_with_past_model);
    }

should be added to the original with_past model: https://github.com/openvinotoolkit/openvino.genai/pull/2126/files#diff-e9a19fecb7ef8f410831fddd75ee0c086b6cdf3cb1e32a81dfed1f8387a61cfdR1057

If not, the appropriate tests should be removed.
@AsyaPronina
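
For context, a minimal sketch of what an input-existence check could look like; ov::genai::utils::input_exists in the PR may be implemented differently:

    #include <openvino/openvino.hpp>

    // Returns true if the model declares an input tensor with this name.
    bool input_exists(const std::shared_ptr<ov::Model>& model, const std::string& name) {
        for (const auto& input : model->inputs()) {
            if (input.get_names().count(name)) {
                return true;
            }
        }
        return false;
    }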

@AsyaPronina (Contributor):

Done! Thanks a lot @as-suvorov for all the help!

Comment on lines 115 to 116:

    - if (!is_initial_step) {
          ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
          cache_position_tensor.set_shape({1});
          cache_position_tensor.data<int64_t>()[0] = m_cache_position;
    + if (m_has_cache_position) {
Collaborator:

Suggested change:

    - if (!is_initial_step) {
    -     ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
    -     cache_position_tensor.set_shape({1});
    -     cache_position_tensor.data<int64_t>()[0] = m_cache_position;
    - if (m_has_cache_position) {
    + if (!is_initial_step && m_has_cache_position) {
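
For readability, the block after applying the suggestion would read as follows (context lines taken from the reviewed snippet):

    if (!is_initial_step && m_has_cache_position) {
        ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
        cache_position_tensor.set_shape({1});
        cache_position_tensor.data<int64_t>()[0] = m_cache_position;
    }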

Collaborator:

If you want to address the non-static with_past decoder as well, please revert the removal of the stateless tests then: bd178b4

Contributor:

Thanks! Done!

@AsyaPronina AsyaPronina force-pushed the b_whisper_unification branch from 6508f95 to 04c89a2 Compare October 23, 2025 12:18
@dmatveev dmatveev changed the title Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 & 4.55 Oct 23, 2025
@AsyaPronina AsyaPronina force-pushed the b_whisper_unification branch from d3841ed to a303604 Compare October 23, 2025 12:39
@as-suvorov (Collaborator):

To summarize: at this point we have support for models without the cache_position input, for both static and non-static pipelines and for both with_past and non-with_past models. optimum-intel is pinned to a commit that produces non-cache_position models. I expect all Whisper tests to pass now. We wait for the Whisper tests to pass and then revert the optimum-intel and transformers deps.

@moslex moslex added the priority: high High piority label Oct 24, 2025
@as-suvorov (Collaborator):

@AsyaPronina the Whisper tests passed; do you plan to revert the optimum and transformers deps?

@dmatveev (Contributor):

@AsyaPronina @Wovchena @as-suvorov what's the matter with the tests here? Are you aware of the issue / looking for a resolution?

@as-suvorov (Collaborator) commented Oct 24, 2025:

@dmatveev Whisper pipelines were aligned with non-cache_position input models in this PR. We updated the optimum-intel and transformers versions as in #2611. We checked only the Whisper tests with these updated deps, and they pass. Now we need to either revert the optimum-intel and transformers versions to master or wait for #2611 to merge.

@Wovchena Wovchena enabled auto-merge October 25, 2025 10:26
@Wovchena Wovchena added this pull request to the merge queue Oct 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 25, 2025
@Wovchena Wovchena added this pull request to the merge queue Oct 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 25, 2025
@Wovchena Wovchena added this pull request to the merge queue Oct 25, 2025
Merged via the queue into openvinotoolkit:master with commit 5b05799 Oct 25, 2025
112 of 114 checks passed