Add intel simd #1703

Raimo33 · 2025-07-18T11:26:24Z

This adds SSE2, AVX2 and AVX-512 support across the library, wherever the benchmarks show an improvement.
As discussed in #1700

Raimo33 · 2025-07-18T11:48:16Z

Another CI workflow should be added that compiles with the SIMD flags, one job per flag rather than all of them together:

- `-msse2`
- `-mavx2`
- `-mavx512f`
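Each CI job should be able to confirm which instruction set it actually built for. A minimal sketch, assuming GCC or Clang, which predefine `__SSE2__`, `__AVX2__` and `__AVX512F__` when the matching `-m` flag is active; `simd_level_name` is a hypothetical helper, not part of the library:

```c
/* Hypothetical helper: reports the widest x86 SIMD extension that was
   enabled at compile time, based only on predefined compiler macros. */
const char *simd_level_name(void)
{
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "scalar";
#endif
}
```

Note that on x86-64 SSE2 is part of the baseline ISA, so `__SSE2__` is already defined without `-msse2`, and `-mavx512f` also implies `-mavx2`. A job per flag still exercises a different preprocessor path each time.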

Raimo33 · 2025-07-18T11:58:09Z

ARM has a different SIMD instruction set, NEON (https://developer.arm.com/architectures/instruction-sets/intrinsics/).
It would be nice to implement that in a separate PR as well, perhaps after this one is merged.
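If NEON support lands later, one common shape is a single function with a compile-time branch per instruction set plus a scalar fallback. A sketch under that assumption: `count_eq16` is a hypothetical helper, the NEON branch is limited to AArch64 because it relies on `vaddvq_u8`, and every other target falls through to plain C:

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: counts how many of the 16 bytes at p equal c. */
int count_eq16(const char *p, char c)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(c));
    unsigned mask = (unsigned)_mm_movemask_epi8(eq); /* one bit per matching byte */
    int n = 0;
    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
#elif defined(__ARM_NEON) && defined(__aarch64__)
    uint8x16_t v = vld1q_u8((const uint8_t *)p);
    uint8x16_t eq = vceqq_u8(v, vdupq_n_u8((uint8_t)c)); /* 0xFF where equal */
    return vaddvq_u8(vshrq_n_u8(eq, 7));                  /* sum of 0/1 lanes */
#else
    int n = 0;
    for (int i = 0; i < 16; i++)
        n += (p[i] == c);
    return n;
#endif
}
```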

Raimo33 · 2025-07-18T15:18:09Z

To precompute the SIMD constants once at startup, the best solution I found was something like this:

```c
#ifdef __SSE2__
#include <immintrin.h>  /* only on x86 targets; AVX2/AVX-512 imply SSE2 */
#endif

#ifdef __AVX512F__
  static __m512i _512_vec_ones;
  static __m512i _512_vec_zeros;
  static __m512i _512_vec_equals;  /* was missing, but is assigned below */
#endif

#ifdef __AVX2__
  static __m256i _256_vec_ones;
  static __m256i _256_vec_zeros;
#endif

#ifdef __SSE2__
  static __m128i _128_vec_ones;
  static __m128i _128_vec_zeros;
#endif

/* Broadcasts each ASCII byte into every lane; called once at startup. */
CONSTRUCTOR void ff_deserializer_init(void)
{
#ifdef __AVX512F__
  _512_vec_ones   = _mm512_set1_epi8('1');
  _512_vec_zeros  = _mm512_set1_epi8('0');
  _512_vec_equals = _mm512_set1_epi8('=');
#endif

#ifdef __AVX2__
  _256_vec_ones   = _mm256_set1_epi8('1');
  _256_vec_zeros  = _mm256_set1_epi8('0');
#endif

#ifdef __SSE2__
  _128_vec_ones   = _mm_set1_epi8('1');
  _128_vec_zeros  = _mm_set1_epi8('0');
#endif
}
```

where `CONSTRUCTOR` is `__attribute__((constructor))`, which makes GCC and Clang run the function automatically before `main()`.
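A self-contained version of the same pattern, reduced to SSE2, with a hypothetical `is_binary16` check showing how the precomputed `'0'`/`'1'` vectors might be consumed. It assumes GCC or Clang for the constructor attribute (MSVC does not support it and does not define `__SSE2__`, so it takes the scalar path); `vec_ones`, `vec_zeros` and `init_simd_constants` are illustrative names:

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical names: filled once before main() by the constructor. */
static __m128i vec_ones;
static __m128i vec_zeros;

static __attribute__((constructor)) void init_simd_constants(void)
{
    vec_ones  = _mm_set1_epi8('1');
    vec_zeros = _mm_set1_epi8('0');
}
#endif

/* Hypothetical check: true if all 16 bytes at p are '0' or '1'. */
bool is_binary16(const char *p)
{
#if defined(__SSE2__)
    __m128i v  = _mm_loadu_si128((const __m128i *)p);
    __m128i ok = _mm_or_si128(_mm_cmpeq_epi8(v, vec_ones),
                              _mm_cmpeq_epi8(v, vec_zeros));
    return _mm_movemask_epi8(ok) == 0xFFFF; /* every lane matched */
#else
    for (size_t i = 0; i < 16; i++)
        if (p[i] != '0' && p[i] != '1')
            return false;
    return true;
#endif
}
```

Since `_mm_set1_epi8` with a constant argument is usually folded into a load from read-only data when optimizing, it may be worth benchmarking this against calling it directly at the point of use.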

Raimo33 · 2025-07-18T17:00:38Z

I keep getting these warnings. They should be harmless, since I always use the unaligned `loadu`/`storeu` intrinsics, but `-Wcast-align` warns about the pointer cast itself, regardless of which intrinsic consumes it:

warning: cast increases required alignment of target type [-Wcast-align]
  653 |         _mm256_storeu_si256((__m256i *)r->v, out);

The only fixes I found are:

- align everything to 64 bytes (not feasible: it even breaks some of my AVX logic)
- suppress the warning globally
- suppress the warning inline at each call site
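The third option can be scoped tightly with `#pragma GCC diagnostic push`/`pop`, which both GCC and Clang honor. A sketch with a hypothetical `store_limbs` helper standing in for code like the `r->v` store above; only the lines between `push` and `pop` have `-Wcast-align` silenced:

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copies 16 bytes into a uint64_t[2] limb array. */
void store_limbs(uint64_t out[2], const unsigned char in[16])
{
#if defined(__SSE2__)
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-align"
#endif
    /* loadu/storeu accept any alignment, so these casts are fine here. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, v);
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
#else
    memcpy(out, in, 16);
#endif
}
```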

Raimo33 changed the title from "Add simd" to "Add intel simd" on Jul 18, 2025

Raimo33 and others added 16 commits August 2, 2025 00:35

- Add simd to field5x52 (d4cfad0)
- Fix declaration after statement (47e803d)
- Add MSVC bswap (3b4f8d2)
- Fix undeclared vars (c636aad)
- Fix header conflict in ARM (c963872)
- Add bswap define (63ec51b)
- Remove .vscode (ee535bf)
- Fix endianess dependent code (22e9772)
- Fix endian detection macro (c351b78)
- Optimize write_be and read_be (5d8a8f9)
- Optimize mul_cmp (d00ba2d)
- Optimize sha256_initialize, Add TODOs (58316e9)
- Add simd to scalar_4x64, Add TODOs (f974f03)
- Add TODOs (f16528f)
- Add TODOs [skip ci] (9756db9)
- Remove junk folders [skip ci] (4b21212)

Raimo33 force-pushed the simd branch from 03ddbc2 to 4b21212 on August 1, 2025 22:35
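The bswap and endianness commits suggest a pattern along these lines: pick the compiler's byte-swap intrinsic (`_byteswap_uint64` on MSVC, `__builtin_bswap64` on GCC/Clang), detect byte order at compile time, and go through `memcpy` so unaligned buffers are safe. This is a sketch, not the PR's code: `BSWAP64`, `HOST_IS_BIG_ENDIAN`, `read_be64` and `write_be64` are hypothetical names, and MSVC targets are assumed to be little-endian:

```c
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER)
#include <stdlib.h>
#define BSWAP64(x) _byteswap_uint64(x)
#else
#define BSWAP64(x) __builtin_bswap64(x)
#endif

/* GCC/Clang predefine __BYTE_ORDER__; anything else is assumed
   little-endian (an assumption, true for MSVC's x86/x64/ARM64 targets). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_IS_BIG_ENDIAN 1
#else
#define HOST_IS_BIG_ENDIAN 0
#endif

/* Reads 8 big-endian bytes from a possibly unaligned buffer. */
uint64_t read_be64(const unsigned char *p)
{
    uint64_t x;
    memcpy(&x, p, sizeof x);
#if !HOST_IS_BIG_ENDIAN
    x = BSWAP64(x);
#endif
    return x;
}

/* Writes x as 8 big-endian bytes to a possibly unaligned buffer. */
void write_be64(unsigned char *p, uint64_t x)
{
#if !HOST_IS_BIG_ENDIAN
    x = BSWAP64(x);
#endif
    memcpy(p, &x, sizeof x);
}
```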
