Conversation

Contributor

@maxnick maxnick commented Oct 16, 2025

Details:

This PR introduces a new operation, "GatherMatmul", which performs gemv operations over the current tokens and their active experts.
As a first step, the gemv is performed using dnnl::inner_product. This solution is suboptimal: it does not give fine-grained control over parallelization, and when many tokens are processed by the same expert (prefill), a gemm operation may be more efficient, since the tokens can be batched and SIMD-level parallelization can be applied across tokens as well.
This PR also contains the essential transformations needed to enable a few common MoE patterns.
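
For readers unfamiliar with the pattern, the "gemv per token and active expert" idea can be sketched as a plain C++ reference loop. This is only an illustration and not the code in this PR: the function name `gather_matmul_ref`, the tensor layouts, and the `expert_ids` routing input are all assumptions made for the example, and it does not use dnnl::inner_product.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference sketch (not OpenVINO or oneDNN API).
// Assumed layouts, all row-major:
//   weights    : [num_experts, N, K]
//   acts       : [num_tokens, K]
//   expert_ids : [num_tokens, top_k]  (active experts per token)
//   result     : [num_tokens, top_k, N]
std::vector<float> gather_matmul_ref(const std::vector<float>& weights,
                                     const std::vector<float>& acts,
                                     const std::vector<int>& expert_ids,
                                     std::size_t num_tokens,
                                     std::size_t top_k,
                                     std::size_t N,
                                     std::size_t K) {
    std::vector<float> out(num_tokens * top_k * N, 0.0f);
    for (std::size_t t = 0; t < num_tokens; ++t) {
        const float* x = acts.data() + t * K;
        for (std::size_t s = 0; s < top_k; ++s) {
            // Gather the weight matrix of the expert selected for this slot.
            const std::size_t e = static_cast<std::size_t>(expert_ids[t * top_k + s]);
            const float* w = weights.data() + e * N * K;
            float* y = out.data() + (t * top_k + s) * N;
            // One gemv: y = W_e * x
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k) {
                    acc += w[n * K + k] * x[k];
                }
                y[n] = acc;
            }
        }
    }
    return out;
}
```

In this formulation every (token, expert) pair is an independent gemv. If many tokens select the same expert, as in prefill, those tokens could instead be grouped and multiplied against that expert's weights in a single gemm, which is the optimization opportunity mentioned above.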

MoE pattern matcher is based on #32183

Related oneDNN fork PR: openvinotoolkit/oneDNN#292

Tickets:

@maxnick maxnick assigned maxnick and v-Golubev and unassigned maxnick Oct 16, 2025
@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: IE Tests OpenVINO Test: plugins and common category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations category: CPP API OpenVINO CPP API bindings labels Oct 16, 2025
@maxnick maxnick force-pushed the cpu_moe_op_support branch from 7d01283 to a6d5711 Compare October 20, 2025 12:09
@maxnick maxnick added this to the 2025.4 milestone Oct 23, 2025

Labels

category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings category: CPU OpenVINO CPU plugin category: IE Tests OpenVINO Test: plugins and common category: transformations OpenVINO Runtime library - Transformations Code Freeze do_not_merge do_not_review

2 participants