Implementing SVE in `[SD]AXPY` Kernels for `A64FX` and `Graviton3E` #5426

hideaki-motoki · 2025-08-21T13:25:38Z

Resolves #5417.
This change improves the performance of [SD]AXPY on both A64FX and Graviton3E.
The graphs below show the single-thread performance improvement of [D]AXPY on A64FX and Graviton3E, respectively.

Performance improved by a factor of 2.57 on A64FX and 1.13 on Graviton3E.
I have confirmed that this optimization also benefits the Level 2 BLAS kernels that use [SD]AXPY, such as [SD]SPMV and [SD]GER.
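To illustrate why a faster AXPY can carry over to a Level 2 routine, here is a minimal sketch of a rank-1 update (GER) written as one AXPY call per column. It assumes column-major storage and unit strides; the names `daxpy_ref` and `dger_via_axpy` are hypothetical, and this is not the OpenBLAS driver code.

```c
#include <stddef.h>

/* Reference AXPY (hypothetical helper): y[i] += alpha * x[i], unit stride. */
static void daxpy_ref(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}

/*
 * Rank-1 update A += alpha * x * y^T for a column-major m-by-n matrix A
 * with leading dimension lda. Column j of A receives (alpha * y[j]) * x,
 * which is exactly one AXPY of length m.
 */
void dger_via_axpy(size_t m, size_t n, double alpha,
                   const double *x, const double *y,
                   double *a, size_t lda)
{
    for (size_t j = 0; j < n; j++) {
        daxpy_ref(m, alpha * y[j], x, a + j * lda);
    }
}
```

With this structure, all of the floating-point work happens inside the AXPY calls, so any speedup in the AXPY kernel applies to each column update.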

snadampal · 2025-08-21T16:29:36Z

kernel/arm64/KERNEL.NEOVERSEV1

@@ -32,6 +32,10 @@ SGEMVNKERNEL = gemv_n_sve_v1x3.c
 DGEMVNKERNEL = gemv_n_sve_v1x3.c
 SGEMVTKERNEL = gemv_t_sve_v1x3.c
 DGEMVTKERNEL = gemv_t_sve_v1x3.c
+
+SAXPYKERNEL = axpy_sve.c
+DAXPYKERNEL = axpy_sve.c


Since you have used the SVL for the implementation instead of hardcoding the vector width, the kernel should also work on NEOVERSEV2. Please check this on Graviton4 and add it to KERNEL.NEOVERSEV2 too.
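The point about not hardcoding the vector width can be sketched as follows. This is a minimal, unit-stride illustration with the hypothetical name `daxpy_vla_sketch`, not the contents of `axpy_sve.c`: the element count per vector is queried at run time with `svcntd()`, and a predicate from `svwhilelt_b64` handles the tail, so the same code can run on 512-bit (A64FX), 256-bit (Neoverse V1), or 128-bit (Neoverse V2) SVE hardware. A scalar fallback is included so the sketch also compiles without SVE.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical sketch: y[i] += alpha * x[i] for 0 <= i < n, unit stride only. */
void daxpy_vla_sketch(int64_t n, double alpha, const double *x, double *y)
{
    if (n <= 0 || alpha == 0.0) {
        return; /* nothing to update */
    }

#if defined(__ARM_FEATURE_SVE)
    /* Number of 64-bit lanes per vector, determined by the hardware at run time. */
    const int64_t vl = (int64_t)svcntd();

    for (int64_t i = 0; i < n; i += vl) {
        /* Lanes with index i + lane < n are active; the last iteration is partial. */
        svbool_t pg = svwhilelt_b64(i, n);
        svfloat64_t vx = svld1(pg, &x[i]);
        svfloat64_t vy = svld1(pg, &y[i]);
        vy = svmla_x(pg, vy, vx, alpha); /* vy + vx * alpha */
        svst1(pg, &y[i], vy);
    }
#else
    /* Scalar fallback for builds without SVE. */
    for (int64_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
#endif
}
```

Because no loop bound or stride depends on a fixed vector width, nothing in the source needs to change between targets; only the kernel selection files have to list the kernel.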

snadampal · 2025-08-21T16:30:13Z

kernel/arm64/axpy_sve.c

+  BLASLONG sve_size = SV_COUNT();
+
+  if (n < 0) return (0);
+  if (da == 0.0) return (0);


Why can't these two checks be combined into one?

hideaki-motoki · 2025-08-22

Thank you for your comments.
Combining the checks as you suggest would also work, but I followed the existing pattern in kernel/arm/axpy.c#L45-L46.
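For reference, the two forms discussed in this thread behave the same. The sketch below uses hypothetical helper names and a plain `long` in place of `BLASLONG`; it only models the early-exit decision, not the kernel itself.

```c
#include <stdbool.h>

/* Two separate checks, as in the diff hunk above. */
bool skip_separate(long n, double da)
{
    if (n < 0) return true;
    if (da == 0.0) return true;
    return false;
}

/* The same decision as a single condition. */
bool skip_combined(long n, double da)
{
    return n < 0 || da == 0.0;
}
```

Both variants return early for exactly the same inputs, so the choice between them is a matter of consistency with the existing kernels rather than behavior.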

snadampal · 2025-08-21T16:34:56Z

Hi @hideaki-motoki, thanks for the PR! I have added a few comments.

hideaki-motoki added 2 commits August 21, 2025 20:56

Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E

855945b

Merge remote-tracking branch 'upstream/develop' into issue5417_axpy_sve

e23f9c6

martin-frbg added this to the 0.3.31 milestone Aug 21, 2025
