yuanjia111
Contributor

1. The following changes were made:
(1) Changed LMUL from 2 to 8;
(2) Optimized the inc_x = 1 case, mainly by processing more data in parallel along the n dimension;
(3) Cleaned up the code formatting.
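
For readers unfamiliar with the idea in (2), here is a minimal scalar sketch of what "more parallelism along n" can look like for a transposed GEMV with unit-stride x. It is not the code from this PR: the function names `gemv_t_ref` and `gemv_t_blocked`, the block width of 4 columns, the `float` element type, and the column-major layout with leading dimension `lda` are all assumptions made for illustration. The actual kernel uses RVV vector intrinsics rather than scalar loops.

```c
#include <stddef.h>

/* Hypothetical baseline: y[j] += alpha * dot(A[:, j], x), one column at a time.
 * A is m-by-n, column-major with leading dimension lda; x has unit stride. */
static void gemv_t_ref(size_t m, size_t n, float alpha,
                       const float *a, size_t lda,
                       const float *x, float *y)
{
    for (size_t j = 0; j < n; j++) {
        float acc = 0.0f;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}

/* Hypothetical blocked variant: four columns (the n direction) per pass,
 * so each x[i] is loaded once and feeds four independent accumulators. */
static void gemv_t_blocked(size_t m, size_t n, float alpha,
                           const float *a, size_t lda,
                           const float *x, float *y)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const float *a0 = a + (j + 0) * lda;
        const float *a1 = a + (j + 1) * lda;
        const float *a2 = a + (j + 2) * lda;
        const float *a3 = a + (j + 3) * lda;
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        for (size_t i = 0; i < m; i++) {
            float xi = x[i];          /* shared across the four columns */
            acc0 += a0[i] * xi;
            acc1 += a1[i] * xi;
            acc2 += a2[i] * xi;
            acc3 += a3[i] * xi;
        }
        y[j + 0] += alpha * acc0;
        y[j + 1] += alpha * acc1;
        y[j + 2] += alpha * acc2;
        y[j + 3] += alpha * acc3;
    }
    /* Remainder columns when n is not a multiple of 4. */
    for (; j < n; j++) {
        float acc = 0.0f;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}
```

Regarding (1): per the RVV specification, VLMAX = LMUL * VLEN / SEW, so with vlen = 256 and, for example, 32-bit elements, LMUL = 2 covers 16 elements per vector operation and LMUL = 8 covers 64.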

2. All BLAS tests passed:
[images: BLAS test results]

3. Performance was verified on K1 [C908, vlen = 256].
(1) Performance of the optimized kernel, measured with the built-in benchmark:
[images: benchmark results]
(2) Complete performance comparison before and after the optimization:
[images: performance comparison]

@yuanjia111 changed the title from "Optimize the gemv_t_vector.c kernel for RISCV64_ZVL256B targets" to "Optimize the gemv_t_vector.c kernel for RISCV64_ZVL256B target" on Aug 22, 2025
@martin-frbg added this to the 0.3.31 milestone on Aug 24, 2025
@ChipKerchner
Contributor

Great job!

@martin-frbg merged commit da7d0f4 into OpenMathLib:develop on Aug 25, 2025
91 of 95 checks passed
3 participants