Split VORTEXM4 from VORTEX target and fix SGEMM_DIRECT support for SME-capable targets #5423

martin-frbg · 2025-08-18T08:58:33Z

eventually fixes #5414

matcraje · 2025-08-20T09:56:25Z

kernel/arm64/sgemm_direct_sme1_2VLx2VL.S

-#define C6           x22 //Constant6: N*SVLs
+#define C2           x19 //Constant2: N + SVLs
+#define C3           x20 //Constant3: K*SVLs + SVLs
+#define C4           x21 //Constant4: SVLs-2


Changing x20 to x21 requires the following dependent changes:
At line 65: sub w21, w21, #2
At line 202: cmp w13, w21

martin-frbg · Aug 20, 2025

Oops, sorry, I had already corrected this locally but pushed the wrong version. Unfortunately this correction has no effect on the wrong xscblat3 test results seen for odd M. Contrary to my expectations, this PR also does not fix the divergence between SGEMM and SGEMMT seen in test_sgemmt of utest/openblas_utest_ext that was flagged in #5414.

matcraje · 2025-08-25T05:04:15Z

interface/gemm.c

+if (strcmp(gotoblas_corename(), "armv9sme") == 0 || strcmp(gotoblas_corename(), "vortexm4") == 0)
+// if (support_sme1())
+#endif
+  if (order == CblasRowMajor && m==lda && n ==ldb && k==ldc && beta == 0 && alpha == 1.0 && TransA == CblasNoTrans && TransB == CblasNoTrans&& SGEMM_DIRECT_PERFORMANT(m,n,k)) {


For RowMajor, shouldn't the leading dimension check be (lda==k && ldb==n && ldc==n)?

martin-frbg · Aug 25, 2025

Normally yes, but the arguments have already been reshuffled at this point (I think; I'll recheck when I get back to this later this week).
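To make the "reshuffled arguments" point concrete, here is a minimal sketch of how a row-major GEMM is commonly mapped onto a column-major one by swapping the operands and the m/n dimensions. The function names are hypothetical and this is not the code in interface/gemm.c; it only illustrates why a "tightly packed" test looks different before and after the swap.

```c
#include <stdbool.h>

/* Hypothetical helper: column-major C(m x n) = A(m x k) * B(k x n) is stored
   without padding when each leading dimension equals the operand's row count. */
bool colmajor_is_packed(int m, int n, int k, int lda, int ldb, int ldc)
{
    (void)n; /* n does not constrain any column-major leading dimension */
    return lda == m && ldb == k && ldc == m;
}

/* Hypothetical helper: the same test written directly in row-major terms,
   where each leading dimension equals the operand's column count. */
bool rowmajor_is_packed(int m, int n, int k, int lda, int ldb, int ldc)
{
    (void)m; /* m does not constrain any row-major leading dimension */
    return lda == k && ldb == n && ldc == n;
}

/* A row-major C = A * B occupies the same memory as a column-major
   C^T = B^T * A^T, so a row-major call can be served by swapping A with B
   (and lda with ldb) and m with n, then using the column-major test. */
bool rowmajor_is_packed_after_swap(int m, int n, int k, int lda, int ldb, int ldc)
{
    return colmajor_is_packed(n, m, k, ldb, lda, ldc);
}
```

Under these assumptions the two row-major variants agree for every input, so whether a check should read `lda==k && ldb==n && ldc==n` or the column-major form depends on whether it runs before or after the swap.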

matcraje · 2025-08-25T05:12:37Z

kernel/arm64/sgemm_direct_performant.c

+
+
+
+int CNAME(BLASLONG M, BLASLONG N, BLASLONG K)


So, this helper function checks for when the SME implementation wouldn't be performant?
Are these checks applicable explicitly for Apple M4 only?

martin-frbg · Aug 25, 2025

In principle I'd expect them to be relevant for future SME hardware as well. I think it is unlikely that the direct path will outperform Goto's block algorithm at every matrix size and shape (and we should have an SME GEMM kernel compatible with that at some point; there is already a draft PR that only lacks the TRMM part).
This is just a quick copy of the x86_64 implementation for now, so the numbers will need to be tuned once we're certain that the code is correct.
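As an illustration of what such a "performant" predicate can look like, here is a minimal sketch of a size-based cutoff. The function name, the thresholds and the alignment rule on N are placeholders chosen for the example; they are not the values in this PR and would need tuning on real SME hardware.

```c
#include <stdint.h>

/* Placeholder thresholds: illustrative assumptions, not tuned values. */
#define DIRECT_MAX_MNK           (28ULL * 512 * 512)
#define DIRECT_MAX_MNK_UNALIGNED (8ULL * 512 * 512)

/* Hypothetical predicate: returns 1 when the direct (non-packing) kernel is
   expected to be the better choice, 0 to fall back to the blocked GEMM path. */
int sgemm_direct_performant_sketch(int64_t m, int64_t n, int64_t k)
{
    if (m <= 0 || n <= 0 || k <= 0)
        return 0;

    /* Total multiply-accumulate count as a rough proxy for problem size. */
    uint64_t mnk = (uint64_t)m * (uint64_t)n * (uint64_t)k;

    /* Large problems: packing and blocking are assumed to pay off. */
    if (mnk >= DIRECT_MAX_MNK)
        return 0;

    /* Assumed rule: when N is not a multiple of 4, give up earlier. */
    if ((n & 3) != 0 && mnk >= DIRECT_MAX_MNK_UNALIGNED)
        return 0;

    return 1;
}
```

A caller would use this only as a final gate, after the hardware and argument checks, for example `if (sme_available && sgemm_direct_performant_sketch(m, n, k)) { ... }`, where `sme_available` is likewise a hypothetical flag.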

martin-frbg added 17 commits August 18, 2025 01:25

Rename sgemm_direct_sme1.S to sgemm_direct_sme1_2VLx2VL.S

ca22e28

Use ASMNAME to get symbol name from build system; leave x18 unused as…

22c6607

… reserved on MacOS

Add sgemm_direct_performant for switching between direct and regular …

89898fc

…kernels

Build symbol name from build system variables

08a0032

Get symbol name from build system; change b.first to b.mi for AppleCl…

53d3bb5

…ang compatibility

Add VORTEXM4 settings

731f4dd

Update ARM64 sgemm_direct object generation

e82bcd2

Add sgemm_direct_performant for ARM64

0203657

Move SGEMM_DIRECT after the CBLAS parameter check and add sgemm_direc…

de91afd

…t_performant for ARM64

Separate VORTEXM4 from VORTEX and ARMV9SME

202a7a0

Add sgemm_direct_performant for ARM64

e76c390

Add sgemm_direct_performant for ARM64

ef0b883

Enable SME on MacOS and add VORTEXM4 to DYNAMIC_ARCH list

ccfd017

Add minimal compiler flags for VORTEXM4

b0a00fb

Add VORTEXM4 target

3097046

Split VORTEXM4 from VORTEX target due to SME support

4e2a8c1

Add VORTEXM4

18f9582

martin-frbg added this to the 0.3.31 milestone Aug 18, 2025

martin-frbg mentioned this pull request Aug 18, 2025

Support for SGEMM_DIRECT Kernel based on SME1 #5084

Merged

martin-frbg added 10 commits August 18, 2025 08:41

Add VORTEXM4

ca542f3

Add compiler options for VORTEXM4

a4f5fec

Add VORTEXM4

c794d0a

relax requirements in compiler SME capability check

4328c91

Add compiler options for VORTEXM4

426b5f2

Update SME kernel details

0bc19a1

Add VORTEXM4 to DYNAMIC_ARCH list

bf98e44

Relax version number requirement for AppleClang

4609732

Delete misplaced file

05dbb54

Update SME-related kernels

107c883

matcraje reviewed Aug 20, 2025

martin-frbg added 8 commits August 20, 2025 06:24

adjust register 20 accesses to 21 after moving x18

501728a

Hide the local 2VLx2VL symbol as static is insufficient for this with…

edaa73f

… gcc

Add VORTEXM4

1ee8879

smh-based direct sgemm currently requires leading dimensions to be sa…

7f89c6f

…me as matrix dimension

Add d8 to d15 to clobber lists as the code does not expressly save them

8e50b8d

Add registers d8 to d15 to clobber lists as the code does not express…

b4fc09e

…ly save them

remove debugging printouts

1b88c9c

remove debugging printout

2b5d8c7

matcraje reviewed Aug 25, 2025
