Skip to content

Add intel simd #1703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from
Draft

Add intel simd #1703

wants to merge 16 commits into from

Conversation

Raimo33
Copy link

@Raimo33 Raimo33 commented Jul 18, 2025

This adds sse2, avx2 and avx512 support to the library in general, wherever it yields an improvement as per the benchmarks.
As discussed in #1700

@Raimo33
Copy link
Author

Raimo33 commented Jul 18, 2025

another CI flow should be added for compiling with simd flags. Not all together. one with each:

  1. -msse2
  2. -mavx2
  3. -mavx512f

@Raimo33 Raimo33 changed the title Add simd Add intel simd Jul 18, 2025
@Raimo33
Copy link
Author

Raimo33 commented Jul 18, 2025

arm has different SIMD instruction set, called NEON (https://developer.arm.com/architectures/instruction-sets/intrinsics/).
would be nice to have a separate PR implementing that as well. Maybe after this is merged

@Raimo33
Copy link
Author

Raimo33 commented Jul 18, 2025

To precompute simd constants at the start, the best solution I found was doing something like this:

#ifdef __AVX512F__
  static __m512i _512_vec_ones;
  static __m512i _512_vec_zeros;
#endif

#ifdef __AVX2__
  static __m256i _256_vec_ones;
  static __m256i _256_vec_zeros;
#endif

#ifdef __SSE2__
  static __m128i _128_vec_ones;
  static __m128i _128_vec_zeros;
#endif

CONSTRUCTOR void ff_deserializer_init(void)
{
#ifdef __AVX512F__
  _512_vec_ones   = _mm512_set1_epi8('1');
  _512_vec_zeros  = _mm512_set1_epi8('0');
  _512_vec_equals = _mm512_set1_epi8('=');
#endif

#ifdef __AVX2__
  _256_vec_ones   = _mm256_set1_epi8('1');
  _256_vec_zeros  = _mm256_set1_epi8('0');
#endif

#ifdef __SSE2__
  _128_vec_ones   = _mm_set1_epi8('1');
  _128_vec_zeros  = _mm_set1_epi8('0');
#endif
}

where CONSTRUCTOR is __attribute__((constructor))

@Raimo33
Copy link
Author

Raimo33 commented Jul 18, 2025

I'm constantly getting these warnings. Apparently they're harmless since I always use loadu and storeu, but for some reason the compiler doesn't like them.

warning: cast increases required alignment of target type [-Wcast-align]
  653 |         _mm256_storeu_si256((__m256i *)r->v, out);

The only fixes I found are:

  1. aligning everything to 64bytes (impossible, breaks even some of my avx logic)
  2. suppress the warning globally
  3. suppress the warning inline each time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant