Skip to content

Conversation

vshampor
Copy link
Contributor

@vshampor vshampor commented Aug 25, 2025

Details:

  • Implements the XAttention operation using STL and OV reference operations.
  • Meant to be used for testing and debugging purposes only - does not bring performance benefits if used on its own. Actual HW-accelerated implementation would be using this code as reference.
  • [DO NOT MERGE] Example of the XAttention integration in the CPU plugin #31955 contains an example of integrating with the CPU plugin.

Tickets:

@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin labels Aug 25, 2025
@vshampor vshampor force-pushed the xattn_reference branch 2 times, most recently from 9a119ee to e29e853 Compare September 1, 2025 14:26
@github-actions github-actions bot removed the category: CPU OpenVINO CPU plugin label Sep 3, 2025
@vshampor vshampor force-pushed the xattn_reference branch 2 times, most recently from 460a365 to 6be527b Compare September 3, 2025 13:12
@vshampor vshampor marked this pull request as ready for review September 3, 2025 13:13
@vshampor vshampor requested a review from a team as a code owner September 3, 2025 13:13
@yuxu42 yuxu42 requested a review from liubo-intel September 4, 2025 01:03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference tests should be stored in:
src/plugins/template/tests/functional/op_reference/

as others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware of these tests. All of these use utilize unnecessary abstractions of OV opset ops and involve the sample plugin to do unit tests. Seeing that there will never be a counterpart to this operation in the OV opset, there is no reason to proliferate the bad design decision to have all reference ops tested through the template plugin just for the sake of consistency.

Comment on lines +72 to +77
OPENVINO_ASSERT(input_shape.size() == 3); // [num_heads, num_tokens, head_size]
OPENVINO_ASSERT(out_shape.size() == 3);
OPENVINO_ASSERT(input_shape[0] == out_shape[0]);
OPENVINO_ASSERT(input_shape[1] % m_stride == 0);
OPENVINO_ASSERT(input_shape[1] / m_stride == out_shape[1]);
OPENVINO_ASSERT(input_shape[2] * m_stride == out_shape[2]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these assert. the shapes and input validation is done during shape inference in operator implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant since there is no shape inference involved by design

* @param out_shape Shape of the output tensor data. Expected shape is strictly equal to
* `reshaped_qk_product_shape`.
*/
void softmax(const T* reshaped_qk_product_data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is required as it is just call for ov::reference::softmax(reshaped_qk_product_data, out, reshaped_qk_product_shape, {2});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reference operation. It is designed to be understandable and testable first and foremost. I provide a set of functions that each directly map to a phase or sub-phase of the intended HW-accelerated kernel. Softmax is one of these phases and therefore has a separate function. It also has an extra shape check, so having a function wrapper has some utility.

@vshampor vshampor requested a review from l-bat September 5, 2025 09:06
}
}

/** Selects the elements of the input tensor along the last two dimensions, indepently along the first dimension, so
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Selects the elements of the input tensor along the last two dimensions, indepently along the first dimension, so
/** Selects the elements of the input tensor along the last two dimensions, independently along the first dimension, so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vshampor vshampor requested a review from praasz September 24, 2025 09:20
@vshampor vshampor added this pull request to the merge queue Sep 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Core OpenVINO Core (aka ngraph)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants