Split VORTEXM4 from VORTEX target and fix SGEMM_DIRECT support for SME-capable targets #5423
Open: martin-frbg wants to merge 64 commits into OpenMathLib:develop from martin-frbg:issue5414
+359
−147
Changes from 35 commits
Commits
ca22e28 Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S (martin-frbg)
22c6607 Use ASMNAME to get symbol name from build system; leave x18 unused as… (martin-frbg)
89898fc Add sgemm_direct_performant for switching between direct and regular … (martin-frbg)
08a0032 Build symbol name from build system variables (martin-frbg)
53d3bb5 Get symbol name from build system; change b.first to b.mi for AppleCl… (martin-frbg)
731f4dd Add VORTEXM4 settings (martin-frbg)
e82bcd2 Update ARM64 sgemm_direct object generation (martin-frbg)
0203657 Add sgemm_direct_performant for ARM64 (martin-frbg)
de91afd Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direc… (martin-frbg)
202a7a0 Separate VORTEXM4 from VORTEX and ARMV9SME (martin-frbg)
e76c390 Add sgemm_direct_performant for ARM64 (martin-frbg)
ef0b883 Add sgemm_direct_performant for ARM64 (martin-frbg)
ccfd017 Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list (martin-frbg)
b0a00fb Add minimal compiler flags for VORTEXM4 (martin-frbg)
3097046 Add VORTEXM4 target (martin-frbg)
4e2a8c1 Split VORTEXM4 from VORTEX target due to SME support (martin-frbg)
18f9582 Add VORTEXM4 (martin-frbg)
ca542f3 Add VORTEXM4 (martin-frbg)
a4f5fec Add compiler options for VORTEXM4 (martin-frbg)
c794d0a Add VORTEXM4 (martin-frbg)
4328c91 relax requirements in compiler SME capability check (martin-frbg)
426b5f2 Add compiler options for VORTEXM4 (martin-frbg)
0bc19a1 Update SME kernel details (martin-frbg)
bf98e44 Add VORTEXM4 to DYNAMIC_ARCH list (martin-frbg)
4609732 Relax version number requirement for AppleClang (martin-frbg)
05dbb54 Delete misplaced file (martin-frbg)
107c883 Update SME-related kernels (martin-frbg)
501728a adjust register 20 accesses to 21 after moving x18 (martin-frbg)
edaa73f Hide the local 2VLx2VL symbol as static is insufficient for this with… (martin-frbg)
1ee8879 Add VORTEXM4 (martin-frbg)
7f89c6f smh-based direct sgemm currently requires leading dimensions to be sa… (martin-frbg)
8e50b8d Add d8 to d15 to clobber lists as the code does not expressly save them (martin-frbg)
b4fc09e Add registers d8 to d15 to clobber lists as the code does not express… (martin-frbg)
1b88c9c remove debugging printouts (martin-frbg)
2b5d8c7 remove debugging printout (martin-frbg)
fc516af Merge branch 'develop' into issue5414 (martin-frbg)
ba9d2d2 remove sme from M4 Fortran flags as gfortran couples it with sve (martin-frbg)
b3d0bc4 Update Makefile.L3 (martin-frbg)
4ae3e37 restore 2VLx2VL naming (martin-frbg)
c889558 Rework for DYNAMIC_ARCH use and use of SGEMM functions by SSYMM (martin-frbg)
20f5ed1 Merge branch 'OpenMathLib:develop' into issue5414 (martin-frbg)
47a66ae Update limits based on benchmarking the SME code on Apple M4 (martin-frbg)
9bfc361 Merge branch 'OpenMathLib:develop' into issue5414 (martin-frbg)
8211db6 Don't enable SME for VortexM4 when the compiler is gcc (which does no… (martin-frbg)
2346d0b Add HAVE_SME for VortexM4 only with non-gcc compilers (martin-frbg)
d7b0fcc Enable SME-based kernels for VortexM4 with clang-based compilers only (martin-frbg)
643a0b5 Allow VortexM4 on the direct_SME fast path only for clang-based compi… (martin-frbg)
e01b109 Allow VortexM4 on the same fast path only with non-gcc compilers (martin-frbg)
f4ee3ae Allow VortexM4 on the SME fast path only with non-gcc compilers (martin-frbg)
1b591ea export HAVE_SME setting and exclude VortexM4 from DYNAMIC_ARCH if gcc… (martin-frbg)
83d3e0e fix copy/paste (martin-frbg)
682f61e Add prototype for gotoblas_corename (martin-frbg)
ea85b66 Merge branch 'OpenMathLib:develop' into issue5414 (martin-frbg)
9c0965b Merge branch 'OpenMathLib:develop' into issue5414 (martin-frbg)
8c0b13c Merge branch 'OpenMathLib:develop' into issue5414 (martin-frbg)
7d35bf6 Add cpuid for Apple M5 (from a PR to the archspec project) (martin-frbg)
7e44f62 fix sequence of arm64 sgemm_direct_performance and sgemm_direct_ab (martin-frbg)
b0bd49a Add compiler guard around the M4 HAVE_SME property (martin-frbg)
4af1870 Only add dedicated VORTEXM4 if building with LLVM (martin-frbg)
b185c9a small fixes for separating sme and dummy parts (martin-frbg)
a683287 rework for dynamic_arch (martin-frbg)
705259c remove redundant HAVE_SME (martin-frbg)
7ab8dc1 rework ARM64 SME dependency handling (martin-frbg)
c3c857c fix sequence (martin-frbg)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -111,6 +111,7 @@ THUNDERX2T99 |
| 111 | 111 | TSV110 |
| 112 | 112 | THUNDERX3T110 |
| 113 | 113 | VORTEX |
| | 114 | + VORTEXM4 |
| 114 | 115 | A64FX |
| 115 | 116 | ARMV8SVE |
| 116 | 117 | ARMV9SME |
For RowMajor, shouldn't the leading dimension check be (lda==k && ldb==n && ldc==n)?
Normally yes, but the arguments have already been reshuffled at this point (I think; I'll recheck when I get back to this later this week).
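To make the exchange above concrete, here is a minimal sketch of why a check written in column-major terms can still be correct for a RowMajor call once the arguments have been swapped. It assumes the usual CBLAS trick of treating row-major C = A·B as column-major Cᵀ = Bᵀ·Aᵀ (swap A with B, m with n, lda with ldb) and non-transposed operands. The struct and function names are hypothetical and are not taken from this PR or from OpenBLAS; whether the code under review has in fact already performed this swap at the point of the check is exactly what the author says still needs rechecking.

```c
#include <stdbool.h>

/* Hypothetical argument bundle for illustration only (not an OpenBLAS type). */
typedef struct {
    int m, n, k;        /* C is m x n, inner dimension is k */
    int lda, ldb, ldc;  /* leading dimensions of A, B and C */
} gemm_dims;

/* Row-major, non-transposed: operands are tightly packed when each leading
   dimension equals the row length (A is m x k, B is k x n, C is m x n). */
static bool packed_row_major(gemm_dims d) {
    return d.lda == d.k && d.ldb == d.n && d.ldc == d.n;
}

/* Column-major, non-transposed: tightly packed when each leading dimension
   equals the column length. */
static bool packed_col_major(gemm_dims d) {
    return d.lda == d.m && d.ldb == d.k && d.ldc == d.m;
}

/* Assumed reshuffle: reinterpret a row-major call as a column-major one by
   exchanging m with n and lda with ldb (A and B trade places); ldc is kept. */
static gemm_dims reshuffle_row_major(gemm_dims d) {
    gemm_dims s = { d.n, d.m, d.k, d.ldb, d.lda, d.ldc };
    return s;
}
```

Under these assumptions, `packed_col_major(reshuffle_row_major(d))` agrees with `packed_row_major(d)` for any `d`: substituting the swapped fields into the column-major test gives `ldb == n && lda == k && ldc == n`, which is the condition proposed in the review comment.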