-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[Snippets][CPU] Support static and dynamic offsets in JIT Gemm and GemmCopyB emitters #31375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[Snippets][CPU] Support static and dynamic offsets in JIT Gemm and GemmCopyB emitters #31375
Conversation
cc4e8fb
to
c1ddb94
Compare
@chenhu-wang May I ask you to review this PR please since you're an author of these JIT emitters? |
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_copy_b_emitter.cpp
Show resolved
Hide resolved
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_copy_b_emitter.cpp
Outdated
Show resolved
Hide resolved
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_copy_b_emitter.cpp
Outdated
Show resolved
Hide resolved
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_copy_b_emitter.cpp
Outdated
Show resolved
Hide resolved
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_emitter.cpp
Show resolved
Hide resolved
744ad13
to
36c75d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_emitter.cpp
Outdated
Show resolved
Hide resolved
@chenhu-wang had to revert gpr related change, because it crashes on some test suites. I have investigated the root cause. The stack should be 16 bytes aligned according to ARM spec:
|
This reverts commit 7dc7b92.
Thank you for sharing. |
|
||
namespace ov::intel_cpu::aarch64::utils { | ||
|
||
size_t get_buffer_cluster_id(const ov::snippets::lowered::ExpressionPort& port) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this helper for aarch64 and helper for x64 are the same, what's about to move them into one common file in emitters/snippets/utils/util.*
for example?
These helpers aren't depended on arch so can be reused on all platforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// TODO: support for long offsets is required for MatMulBias tests with bigger shapes | ||
retVector.emplace_back(R"(smoke_Snippets_MatMulBias.*1023.*)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it functional problem or limitation? Can we handle it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, this problem should be resolved in this PR. Let's do our best to find root cause and fix it 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, no
[ RUN ] smoke_Snippets_MatMulBias/MatMulBias.CompareWithRefImpl/IS[0]=[]_([1.2.95.1023])_IS[1]=[]_([1.2.1023.255])_IS[2]=[]_([1.2.95.255])_T[0]=f32_T[1]=f32_MatMul_#N=1_#S=1_targetDevice=CPU
MEM_USAGE=5129428KB
bad err=15 in Xbyak::Error
src/tests/functional/base_func_tests/src/base/ov_subgraph.cpp:97: Failure
Exception from src/inference/src/cpp/core.cpp:112:
Exception from src/inference/src/dev/plugin.cpp:53:
illegal immediate parameter (range error)
unknown file: Failure
C++ exception with description "Exception from src/inference/src/cpp/compiled_model.cpp:56:
CompiledModel was not initialized.
" thrown in the test body.
} | ||
|
||
// For non-zero offsets, apply the offset and then store | ||
Xbyak_aarch64::XReg temp_reg(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's is the magic number 4
? What's happened if ptr_reg
is XReg(4)
too? We can corrupt initial value in XReg(4)
.
To solve this problem, can we pass temp_reg
as the argument of the helper as it's done in push_ptr_with_runtime_offset_on_stack
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
// For large offsets, use a register to hold the offset value | ||
if (ptr_offset > 4095) { // 12-bit immediate limit for add instruction | ||
Xbyak_aarch64::XReg offset_reg(6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comment: we should avoid initialization of register by explicit numbers in any helpers since they might be called in any code parts. This is unsafe way if somewhere XReg(6)
is already live register during push_ptr_with_static_offset_on_stack
call
Please avoid any initialization of the register by explicit number in helpers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or can we just use TMP_X_0
for example here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
size_t runtime_offset); | ||
|
||
/** | ||
* @brief Push data pointer on stack adding static offset `ptr_offset` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the comment that these helpers don't allocate stack pointer - the user should guarantee allocated space on stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment should be in the both helpers: push_ptr_with_runtime_offset_on_stack
and push_ptr_with_static_offset_on_stack
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_copy_b_emitter.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied comments
} | ||
|
||
// For non-zero offsets, apply the offset and then store | ||
Xbyak_aarch64::XReg temp_reg(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
// For large offsets, use a register to hold the offset value | ||
if (ptr_offset > 4095) { // 12-bit immediate limit for add instruction | ||
Xbyak_aarch64::XReg offset_reg(6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/plugins/intel_cpu/src/emitters/snippets/aarch64/jit_gemm_copy_b_emitter.cpp
Outdated
Show resolved
Hide resolved
1637662
to
9cef4e0
Compare
default: | ||
OV_CPU_JIT_EMITTER_THROW("Uknown type of expression port!"); | ||
} | ||
OV_CPU_JIT_EMITTER_ASSERT(IMPLICATION(ov::snippets::utils::is_dynamic_value(offset), id != SIZE_MAX), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor:
we can reuse existing implication
helper here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
size_t runtime_offset); | ||
|
||
/** | ||
* @brief Push data pointer on stack adding static offset `ptr_offset` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment should be in the both helpers: push_ptr_with_runtime_offset_on_stack
and push_ptr_with_static_offset_on_stack
:)
|
||
// Load the runtime offset from abi_param1 (X0) and add it to the pointer | ||
Xbyak_aarch64::XReg abi_param1(0); | ||
Xbyak_aarch64::XReg offset_reg(4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see magic XReg(4)
here. Please avoid any initializations of registers by const numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
h->mov(temp_reg, ptr_reg); | ||
h->add_imm(temp_reg, temp_reg, ptr_offset, Xbyak_aarch64::XReg(h->X_TMP_1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h->mov(temp_reg, ptr_reg); | |
h->add_imm(temp_reg, temp_reg, ptr_offset, Xbyak_aarch64::XReg(h->X_TMP_1)); | |
h->add_imm(temp_reg, ptr_reg, ptr_offset, Xbyak_aarch64::XReg(h->X_TMP_1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
// Load the offset value from the runtime parameter location | ||
h->add_imm(temp_reg, abi_param1, runtime_offset, Xbyak_aarch64::XReg(h->X_TMP_1)); | ||
h->ldr(offset_reg, Xbyak_aarch64::ptr(temp_reg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to replace offset_reg
with temp_reg
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// TODO: support for long offsets is required for MatMulBias tests with bigger shapes | ||
retVector.emplace_back(R"(smoke_Snippets_MatMulBias.*1023.*)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed offline, this problem should be resolved in this PR. Let's do our best to find root cause and fix it 😊
Details:
m_memory_offsets
,m_buffer_ids
) to AArch64 Gemm/GemmCopyB emittersTickets: