Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 & 4.55 #2126
Conversation
```cpp
if (device.find("NPU") != std::string::npos) {
    m_is_npu = true;
}

ov::CompiledModel compiled_model;
if (m_is_npu) {
```
Suggested change:
```diff
- if (device.find("NPU") != std::string::npos) {
-     m_is_npu = true;
- }
- ov::CompiledModel compiled_model;
- if (m_is_npu) {
+ ov::CompiledModel compiled_model;
+ // npu device
+ if (device.find("NPU") != std::string::npos) {
```
The m_is_npu member seems redundant, as it is only used in the constructor.
And why is .find needed? Can't we just compare device == "NPU"?
Fixed
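For illustration, a minimal sketch of what the fixed constructor logic could look like with the check kept local and an exact comparison; the compile call and the npu_properties name are assumptions, not the merged code:

```cpp
ov::Core core;
ov::CompiledModel compiled_model;
// An exact comparison is enough when the caller passes a plain device name;
// .find() would also match composite strings such as "HETERO:NPU,CPU".
if (device == "NPU") {
    // Hypothetical NPU-tuned properties; see the NPUW discussion below.
    compiled_model = core.compile_model(model, device, npu_properties);
} else {
    compiled_model = core.compile_model(model, device, properties);
}
```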
Force-pushed from a378dc9 to 470bc81
src/cpp/src/whisper/whisper.cpp (outdated)
```cpp
// reset input tensor
request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));
auto m_is_npu = true;
```
This needs fixing: how can we detect here that the pipeline is running on NPU?
Maybe this reset isn't needed at all? (I don't remember the implementation details)
Discussed with @as-suvorov. The reset is needed here because the request refers to mel_data, which can be destroyed later (details in #789 (comment)).
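For context, a rough sketch of the lifetime issue being discussed; the variable names and shapes are illustrative assumptions, not the exact pipeline code:

```cpp
// The input tensor can wrap externally owned mel_data without copying it.
ov::Tensor mel_view(ov::element::f32, {1, feature_size, nb_max_frames}, mel_data);
request.set_tensor("input_features", mel_view);
request.infer();

// If mel_data is destroyed while the request still holds mel_view, the request
// would reference freed memory. Resetting the input to a small self-owned
// tensor drops that reference.
request.set_tensor("input_features",
                   ov::Tensor(ov::element::f32, {batch_size, feature_size, nb_max_frames}));
```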
Need to merge these changes first:
Pull Request Overview
This PR switches the Whisper implementation to use the ov::genai::WhisperStatefulImpl variant when running on NPU, and updates the associated pipelines and decoder models accordingly.
- Removed unnecessary tensor reset for non-NPU inference and replaced it with NPU-specific tensor dimensions.
- Added a new encoder reshaping function and updated the decoder APIs to accept an additional shape parameter.
- Enhanced NPU configuration in utils with a dedicated whisper-specific setup.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/cpp/src/whisper/whisper.cpp | Updated tensor creation for NPU usage by setting batch dim to 1. |
| src/cpp/src/whisper/pipeline.cpp | Added static encoder reshaping and modified decoder instantiation. |
| src/cpp/src/whisper/models/statefull_decoder.{hpp,cpp} | Updated decoder constructor to accept the encoder hidden state shape. |
| src/cpp/src/whisper/models/decoder.{hpp,cpp} | Updated from_path API to require additional shape parameter. |
| src/cpp/src/utils.{hpp,cpp} | Modified compile_decoder_for_npu to handle Whisper-specific configurations. |
Comments suppressed due to low confidence (2)

src/cpp/src/whisper/models/statefull_decoder.hpp:11
- [nitpick] The class name 'WhisperStatefullDecoder' has a double 'l' in 'statefull'; consider renaming it to 'WhisperStatefulDecoder' for consistency and clarity.
```cpp
class WhisperStatefullDecoder : public WhisperDecoder {
```
src/cpp/src/whisper/models/decoder.hpp:14
- [nitpick] Consider renaming the 'lhs_shape' parameter (introduced in the modified function signature) to 'encoder_hidden_state_shape' to improve clarity on what shape is being passed.
```cpp
static std::shared_ptr<WhisperDecoder> from_path(const std::filesystem::path& models_path,
```
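A hypothetical completion of the truncated signature above, only to make the review point concrete; the remaining parameters and the shape type are assumptions, not the file's verbatim contents:

```cpp
// Sketch: the extra shape parameter lets the decoder be reshaped to static
// dimensions matching the encoder's hidden state (needed for NPU).
static std::shared_ptr<WhisperDecoder> from_path(const std::filesystem::path& models_path,
                                                 const std::string& device,
                                                 const ov::AnyMap& properties,
                                                 const ov::PartialShape& lhs_shape);
```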
src/cpp/src/whisper/whisper.cpp (outdated)
```cpp
auto m_is_npu = true;
uint8_t batch_size = m_is_npu ? 1 : 0;
```
Copilot AI (Jun 26, 2025)
[nitpick] The variable 'm_is_npu' is hardcoded to true; if the NPU configuration is always expected, consider directly setting the batch size to 1 to simplify the logic.
Suggested change:
```diff
- auto m_is_npu = true;
- uint8_t batch_size = m_is_npu ? 1 : 0;
+ uint8_t batch_size = 1;
```
Fixed: the batch size is now set depending on the device used.
Force-pushed from 6a9be8b to 217d973
Force-pushed from 217d973 to 7153bda
Force-pushed from 7153bda to 7ed0695
```cpp
update_config(config, {"NPUW_FUNCALL_FOR_ALL", "NO"});
update_config(config, {"NPUW_FOLD", "NO"});
```
To enable weight sharing for FP16 models:
```diff
- update_config(config, {"NPUW_FUNCALL_FOR_ALL", "NO"});
- update_config(config, {"NPUW_FOLD", "NO"});
+ update_config(config, {"NPUW_FUNCALL_FOR_ALL", "YES"});
+ update_config(config, {"NPUW_FOLD", "YES"});
+ update_config(config, {"NPUW_WEIGHTS_BANK", "whisper-shared"});
```
On top of that, for INT8-SYM:
```cpp
update_config(config, {"NPUW_DQ", "YES"});
update_config(config, {"NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES"});
```
For asymmetric quantization, we need to consider a third option (CWAI?).
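Putting the suggestions in this thread together, a hypothetical helper might look like the sketch below; update_config is the project's existing utility, the helper name is invented here, and whether these values become the defaults is not decided in this thread:

```cpp
// Sketch only: consolidates the flags suggested above for Whisper on NPU.
void apply_whisper_npuw_config(ov::AnyMap& config, bool int8_sym) {
    // Weight sharing for FP16 models.
    update_config(config, {"NPUW_FUNCALL_FOR_ALL", "YES"});
    update_config(config, {"NPUW_FOLD", "YES"});
    update_config(config, {"NPUW_WEIGHTS_BANK", "whisper-shared"});
    if (int8_sym) {
        // Dynamic quantization path for INT8 symmetric weights.
        update_config(config, {"NPUW_DQ", "YES"});
        update_config(config, {"NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES"});
    }
    // Asymmetric quantization may need a third option (CWAI?), as noted above.
}
```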
Force-pushed from 7ed0695 to 2109ceb
src/cpp/src/whisper/pipeline.cpp (outdated)
```diff
- if (device == "NPU") {
+ if (device == "NPU" && properties.count("STATIC_PIPELINE")) {
```
The STATIC pipeline should remain the default for NPU Whisper for now.
Removed the STATIC_PIPELINE property for now.
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
src/cpp/src/whisper/whisper.cpp (outdated)
```cpp
// reset input tensor
request.set_tensor("input_features", ov::Tensor(ov::element::f32, {0, feature_size, nb_max_frames}));
auto devices = request.get_compiled_model().get_property(ov::execution_devices);
uint8_t batch_size = (devices[0] == "NPU") ? 1 : 0;
```
Copilot AI (Oct 16, 2025)
[nitpick] Using devices[0] assumes the execution_devices list is non-empty; prefer devices.front() and guard empty cases. Also, batch_size should use size_t (or at least unsigned int) for clarity since tensor shape indices are size_t; uint8_t can be confusing and risks unintended narrowing if later arithmetic is applied.
Suggested change:
```diff
- uint8_t batch_size = (devices[0] == "NPU") ? 1 : 0;
+ size_t batch_size = 1;  // Default batch size
+ if (!devices.empty()) {
+     batch_size = (devices.front() == "NPU") ? 1 : 0;
+ }
```
Added an assert and switched to size_t.
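Presumably the fixed code looks roughly like the following; the exact assert message is an assumption:

```cpp
auto devices = request.get_compiled_model().get_property(ov::execution_devices);
OPENVINO_ASSERT(!devices.empty(), "Execution devices list is empty");
// NPU needs a static batch of 1; other devices keep the 0-sized reset tensor.
size_t batch_size = (devices.front() == "NPU") ? 1 : 0;
request.set_tensor("input_features",
                   ov::Tensor(ov::element::f32, {batch_size, feature_size, nb_max_frames}));
```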
Tests for stateless models fail: https://github.com/openvinotoolkit/openvino.genai/actions/runs/18727271892/job/53418373613?pr=2126. They should be added to the original with_past model: https://github.com/openvinotoolkit/openvino.genai/pull/2126/files#diff-e9a19fecb7ef8f410831fddd75ee0c086b6cdf3cb1e32a81dfed1f8387a61cfdR1057. If that is not appropriate, the tests should be removed.
Done! Thanks a lot @as-suvorov for a lot of help!
```cpp
if (!is_initial_step) {
    ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
    cache_position_tensor.set_shape({1});
    cache_position_tensor.data<int64_t>()[0] = m_cache_position;
    if (m_has_cache_position) {
```
Suggested change:
```diff
- if (!is_initial_step) {
-     ov::Tensor cache_position_tensor = request.get_tensor("cache_position");
-     cache_position_tensor.set_shape({1});
-     cache_position_tensor.data<int64_t>()[0] = m_cache_position;
-     if (m_has_cache_position) {
+ if (!is_initial_step && m_has_cache_position) {
```
If you want to address the non-static with_past decoder as well, please revert the removal of the stateless tests: bd178b4
Thanks! Done!
Force-pushed from 6508f95 to 04c89a2
Force-pushed from d3841ed to a303604
To summarize: at this point we support models without the cache_position input, for both static and non-static pipelines and for both with_past and non-with_past models. optimum-intel is pinned to a commit that produces models without cache_position. I expect all Whisper tests to pass now. We wait for the Whisper tests to pass and then revert the optimum-intel and transformers deps.
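For illustration, one plausible way a pipeline can tell whether a decoder exposes the cache_position input (the m_has_cache_position flag seen earlier); this is a sketch, not the merged implementation:

```cpp
// Newer optimum-intel exports may omit the "cache_position" input entirely,
// so probe the model's inputs at load time.
bool has_cache_position = false;
for (const auto& input : model->inputs()) {
    if (input.get_names().count("cache_position")) {
        has_cache_position = true;
        break;
    }
}
```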
@AsyaPronina Whisper tests passed; do you plan to revert the optimum and transformers deps?
@AsyaPronina @Wovchena @as-suvorov What's the matter with the tests here? Are you aware of the issue / looking for a resolution?
@dmatveev Whisper pipelines were aligned with non-cache_position input models in this PR. We updated the optimum-intel and transformers versions as in #2611. We checked only the Whisper tests with the updated deps, and they pass. Now we need to either revert the optimum-intel and transformers versions to master or wait for #2611 to merge.
5b05799
Ticket: 174805