cpu: aarch64: prevent large post-op kernel generation #4857
Conversation
Passing nightly here: https://github.com/uxlfoundation/oneDNN/actions/runs/23305833325
The nightly conv test suite is currently a bit rubbish (I need to fix it), so would you be able to check a couple of shapes from the model files? E.g. yolo or resnet with some random post-ops? The ref->brgconv cases will always be a win, but I note that the only brgconv->brgconv change is a regression (although it is a tiny shape, so this could be noise).
```cpp
Label mb_loop_end;
mov(x16, 0);
L(mb_loop_begin);
cmp(x16, mb);
```
This will fail if mb > 4096 (see the AArch64 CMP immediate encoding). It would be safer to use mov_imm(x16) and subtract down to zero.
Should be fixed now. Thanks for spotting this!
I have rerun the benchmarks. Results are filtered for changes of more than +/- 10%.
The generated post-op kernels were being unrolled excessively, leading to extremely large code sizes. This led to exceptions being thrown during JIT assembly. This patch addresses the defect by using branching and looping instead of unrolling.

Resolves: Issue #4089
Signed-off-by: Siddhartha Menon <siddhartha.menon@arm.com>
Force-pushed 0bb7484 to 852d208
Description
Prevents very large post-op kernels from being created. These were causing exceptions to be thrown.
Fixes: Issue #4089
Benchmarks
Generated the test set with

```
./build/tests/benchdnn/benchdnn --conv --dt=bf16,f32 --stag=axb --dtag=axb --attr-post-ops=sum,sum+tanh,exp,sum+exp,gelu_erf --impl=brgconv:sve --batch=shapes_gemm
```

and filtered for changes of more than +/- 10%.

SVE-256: No changes observed.

SVE-128: Speedups are mostly due to being able to run previously skipped post-ops, as they no longer crash after this patch.
Checklist
General
- Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?

Bug fixes