Conversation
|
I am reviewing this PR. |
| #define SET_VEC_SS(_I_) _I_ ; | ||
| #define ADD_VEC_SS(_I_,_J_) _I_ + _J_ ; | ||
| #define MUL_VEC_SS(_I_,_J_) _I_ * _J_ ; | ||
| #include <arm_fp16.h> |
There was a problem hiding this comment.
Q: Should this be underneath a #if defined?
|
|
||
| typedef __m128 SP_SCALAR_TYPE; | ||
| typedef __m128d DP_SCALAR_TYPE; | ||
| #if defined(AVX512_BF16_AVAIL) |
There was a problem hiding this comment.
Q: It appears that for a defined architecture e.g. x86 you have multiple #if defined's for AVX512_BF16_AVAIL and AVX512_FP16_AVAIL. Why take this approach and not place them all under one AVX512_BF16_AVAIL or AVX512_FP16_AVAIL?
| if ( PAPI_start( EventSet ) != PAPI_OK ) { | ||
| return -1; | ||
| } | ||
| if ( NULL != fp && PAPI_start( EventSet ) != PAPI_OK ) { |
| rB = MUL_VEC_SS(rB,rC); | ||
|
|
||
| /* Stop PAPI counters */ | ||
| if ( NULL != fp && PAPI_stop(EventSet, iterValues) != PAPI_OK ) { |
There was a problem hiding this comment.
Same as above. This is the last instance I will put such that I don't continue to clog up the review, but there are a few more instances that I will not comment on that will need to be changed if you do decide to return the error code returned from PAPI_stop.
| INCFLAGS=-I$(PAPI_DIR)/include | ||
| CFLAGS+=-g -Wall -Wextra | ||
| OPT0=-O0 | ||
| OPT1=-O1 |
There was a problem hiding this comment.
A behavior I noticed in this PR, but appears to occur in master is the following:
- On x86, build CAT for the first time. This will create
cat_collect. - Move to another architecture lets say aarch, go into the
counter_analysis_toolkitrunmake cleanand then runmake. Everything will build successfully. - When I attempt to run
./cat_collectI run into the following:
[tburgess@hopper1 counter_analysis_toolkit]$ ./cat_collect -in event_list.txt -out OUTPUT_DIRECTORY -flops
-bash: ./cat_collect: cannot execute binary file: Exec format error
I have to then remove cat_collect and re run make such that step 3 will run.
eaaa165 to
7f665b3
Compare
This new kernel structure accounts for speculative FLOPs. These changes have been tested on the ARM Neoverse V2 architecture.
This introduces kernels that use dot product intrinsics to trigger BF16 operations. These changes have been tested on the ARM Neoverse V2 and Intel Sapphire Rapids architectures.
7f665b3 to
3b25d11
Compare
Pull Request Description
This pull request:
These changes have been tested on the ARM Neoverse V2 and Intel Sapphire Rapids architectures.
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form:
module: short descriptionCommits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests