Releases: DLTcollab/sse2neon

v1.9.1

01 Jan 11:47

What's Changed

  • CI: Fix clang-18 installation on ARM64 runner by @jserv in #754
  • Remove defined(_BIG_ENDIAN) from endianness check by @clausecker in #753
  • Hoist AES constants to file scope by @jserv in #755
  • CI: Add Android NDK build verification by @jserv in #756
  • Add libFuzzer-based fuzz testing infrastructure by @jserv in #757
  • Eliminate old-style cast warnings for C++ compilation by @jserv in #758
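
The endianness item (#753) is about how a header decides whether the target is big-endian. As a rough illustration only, and not the code sse2neon actually uses, the sketch below compares the values of `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` (both predefined by GCC and Clang) instead of testing whether a name such as `_BIG_ENDIAN` is merely defined, since some systems define that name as a constant on every target. The helper `sketch_is_little_endian` is a hypothetical name introduced for this example.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time guard: compare macro values rather than testing whether
 * a name is defined. Skipped on compilers that lack these macros. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif
#endif

/* Runtime cross-check (hypothetical helper, not part of sse2neon):
 * returns 1 when the lowest-addressed byte of a 32-bit 1 is 1. */
static int sketch_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host, `sketch_is_little_endian()` returns 1; on a big-endian host the file would not compile with GCC or Clang because of the guard.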

Full Changelog: v1.9.0...v1.9.1

v1.9.0

22 Dec 03:37

New Features

  • Windows Arm64EC support
  • _mm_monitor, _mm_mwait SSE3 intrinsics

Optimizations

  • _mm_aesdec_si128 T-table optimization
  • _mm_crc32_u16/u32/u64 on Armv8-A 32-bit
  • Vectorized equal_any, EQUAL_ORDERED, RANGES aggregation for SSE4.2 String operations
  • _mm_movemask_epi8 rework with vaddv
  • _mm_dp_ps, _mm_cvttpd_pi32 vectorization
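
For context on the `_mm_crc32_*` items, the x86 intrinsics accumulate a CRC-32C (Castagnoli) checksum. The following is a minimal bitwise reference sketch of that arithmetic, useful for checking an optimized path against; it is not the implementation in sse2neon, and every `sketch_` name is hypothetical. It assumes the documented x86 behaviour that each step applies no implicit initial or final inversion.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-32C using the reflected polynomial 0x82F63B78.
 * Modelled on the x86 semantics of _mm_crc32_u8. */
static uint32_t sketch_crc32c_u8(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* A 32-bit step is four byte steps, least significant byte first. */
static uint32_t sketch_crc32c_u32(uint32_t crc, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        crc = sketch_crc32c_u8(crc, (uint8_t) (v >> (8 * i)));
    return crc;
}

/* Conventional CRC-32C of a buffer: the caller-side convention of
 * starting from all ones and inverting the result is applied here. */
static uint32_t sketch_crc32c_buf(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *) data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = sketch_crc32c_u8(crc, *p++);
    return ~crc;
}
```

With the standard check string, `sketch_crc32c_buf("123456789", 9)` yields `0xE3069283`, the published CRC-32C check value.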

Bug Fixes

  • _mm_round_ps/pd preserve infinity/NaN on Arm32
  • Rounding mode handling corrections
  • _mm_cmpestra off-by-one boundary
  • _mm_aesdeclast_si128 bus error on ARMv7
  • _mm_{malloc,free} mismatch with LLVM/MinGW
  • Strict aliasing violations in ARMv7
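
The `_mm_round_ps/pd` item concerns rounding paths that must leave infinity and NaN untouched. The release notes do not say how the fix works, so the scalar sketch below only illustrates one common guard: when rounding is emulated with an integer round trip, inputs that are NaN, infinite, or too large to carry a fraction are returned unchanged. The function name is hypothetical and the example covers truncation toward zero only.

```c
#include <math.h>
#include <stdint.h>

/* Truncate toward zero through int32_t, passing special values through.
 * Hypothetical helper for illustration; not taken from sse2neon. */
static float sketch_trunc_guarded(float x)
{
    /* False comparisons catch NaN; the bounds catch +/-inf and any
     * value with magnitude >= 2^23, which is already integral. */
    if (!(x > -8388608.0f && x < 8388608.0f))
        return x;

    float r = (float) (int32_t) x; /* safe: |x| < 2^23 fits in int32_t */

    /* The integer round trip loses the sign of a zero result. */
    if (r == 0.0f && signbit(x))
        return -0.0f;
    return r;
}
```

Without the guard, converting infinity or NaN to `int32_t` is undefined behaviour in C, which is why the special cases are filtered first.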

What's Changed

  • Fix mismatched _mm_{malloc,free} with LLVM/MinGW by @invertego in #659
  • Enable -Wconversion by @marcin-serwin in #661
  • Fix missing parenthesis by @coco875 in #662
  • Add parentheses for macro parameters by @toxieainc in #663
  • Fix GitHub Action status badge by @Mes0903 in #664
  • Support Windows Arm64EC by @mcfi in #665
  • Fix Arm64EC build related to _mm_prefetch by @mcfi in #666
  • Update README.md with Windows Arm64EC instructions by @mcfi in #667
  • Run more tests when building with modern GCC by @rathann in #672
  • Rename _() macro to _SSE2NEON() to avoid conflict with gettext by @rouault in #675
  • Improve readability of _mm_movemask_epi8 by @ThisAccountHasBeenSuspended in #674
  • Improve readability of _mm_movemask_pi8 & _mm_movemask_ps by @ThisAccountHasBeenSuspended in #677
  • CI: Update dependencies by @jserv in #680
  • Fix bug with _mm_round and _MM_FROUND_NO_EXC by @mg-mburetorp in #679
  • Fix problem with _MM_GET_ROUNDING_MODE on certain platforms by @mg-mburetorp in #678
  • Migrate the code to use C++ style cast operators by @marcin-serwin in #671
  • Fix variable shadowing by @jserv in #681
  • Optimize aesdeclast on Armv7-A by @Cuda-Chen in #682
  • Fix Bus error in _mm_aesdeclast_si128 on ARMv7 by @jserv in #683
  • Fix variable shadowing in _mm_aeskeygenassist_si128 by @jserv in #684
  • Fix missing target pragma for ARMv8-A 32-bit by @jserv in #685
  • Optimize _mm_shuffle_ps_3202 with vtrn by @jserv in #686
  • Bump clang-format version requirement to 20+ by @jserv in #687
  • Extend _mm_setcsr/_mm_getcsr to handle FZ and DAZ modes by @jserv in #688
  • Remove obsolete FIXME in _mm_mulhi_epi16 by @jserv in #689
  • Optimize _mm_aesdec_si128 ARMv7-A with T-table by @jserv in #690
  • Consolidate platform detection by @jserv in #691
  • Reduce redundant vreinterpret calls by @jserv in #692
  • Add comprehensive precision flag documentation by @jserv in #693
  • Add IEEE-754 floating-point edge case tests by @jserv in #694
  • CI: Add UBSan support by @jserv in #695
  • CI: Add clang-tidy static analysis by @jserv in #697
  • Add performance tier analysis by @jserv in #698
  • Enhance tier analysis with Clang AST by @jserv in #699
  • Improve _mm_dp_ps with vectorized AArch64 path by @jserv in #700
  • Vectorize _mm_cvttpd_pi32 for AArch64 by @jserv in #701
  • Implement _SIDD_MASKED_POSITIVE_POLARITY by @jserv in #702
  • Vectorize equal_any aggregation for SSE4.2 string by @jserv in #703
  • Rework _mm_movemask_epi8 with vaddv horizontal by @jserv in #704
  • Add cast validation script with strict aliasing by @jserv in #706
  • Refactor validation calls to array-based macros by @jserv in #705
  • Vectorize EQUAL_ORDERED using vextq diagonal by @jserv in #707
  • Improve RANGES aggregation using vrev for pair-AND by @jserv in #709
  • Refactor AES S-box lookups by @jserv in #710
  • Optimize _mm_crc32_u16 on Armv8-A 32-bit platform by @Cuda-Chen in #708
  • CI: Fix perf-tier comment posting for fork PRs by @jserv in #711
  • Optimize CRC-32C base macro by @Cuda-Chen in #713
  • Fix strict aliasing violations in ARMv7 by @jserv in #714
  • Enforce minimum compiler versions by @jserv in #715
  • Optimize _mm_crc32_u32 on Armv8-A 32-bit platform by @Cuda-Chen in #716
  • Remove obsolete optimization warning by @jserv in #717
  • Fix typo in _mm_movehl_ps architecture check by @jserv in #718
  • Add compile-time constant validation for immediate by @jserv in #719
  • Standardize preprocessor macro usage by @jserv in #720
  • Add compile-time guard for little-endian requirement by @jserv in #722
  • Fix _mm_round_ps to preserve infinity/NaN on Arm32 by @jserv in #724
  • CI: Improve workflows for speed and coverage by @jserv in #725
  • Fix -Wconversion warnings in SSE4.2 by @jserv in #726
  • Fix parenthesis mismatch in CRC32C crypto fallback by @jserv in #727
  • Optimize ARMv7 horizontal reduction in PCMPXSTR by @jserv in #728
  • Add compile-time range validation by @jserv in #729
  • Improve readability of variable shift clamping by @jserv in #730
  • Optimize _mm_crc32_u64 on Armv8-A 32-bit platform by @Cuda-Chen in #723
  • Fix float-to-integer conversion saturation for x86 by @jserv in #731
  • Add denormal flush-to-zero mode tests by @jserv in #732
  • Refine Makefile convenience targets by @jserv in #733
  • Add iOS compatibility wrapper for vcreate_u64 by @jserv in #735
  • Add differential testing for x86/ARM semantic verification by @jserv in #736
  • CI: Add uninitialized variable warning checks by @jserv in #737
  • CI: Skip unnecessary clang installation by @jserv in #738
  • Add Windows ARM64EC CI coverage by @jserv in #734
  • CI: Do differential testing with scalar fallback by @jserv in #739
  • Fix incorrect NaN generation for FP tests by @jserv in #740
  • Add MXCSR exception flag macros for compatibility by @jserv in #741
  • Add coverage verification infrastructure by @jserv in #742
  • Add NIST FIPS 197 AES-256 test vectors by @jserv in #743
  • Add _mm_monitor and _mm_mwait SSE3 intrinsics by @jserv in #744
  • Improve non-temporal store/load stream intrinsics by @jserv in #745
  • Verify IEEE-754 signed zero by @jserv in #746
  • Implement _mm_undefin...
v1.8.0

25 Dec 16:09

What's Changed

  • Fix Clang showing incorrect GCC version warning by @brechtvl in #623
  • Restore options for precision of div/rcp/sqrt/rsqrt by @brechtvl in #626
  • Optimize CRC intrinsics for targets lacking the CRC extension by @Cuda-Chen in #627
  • test: Avoid errors when cross-compiling with gcc-8.3/9.2 by @howjmay in #625
  • Improve unsupported target message by @ankith26 in #630
  • Fix _mm_div_ps when SSE2NEON_PRECISE_DIV=1 by @sergeyvfx in #631
  • Use unaligned data types for unaligned intrinsics by @Logikable in #632
  • fix: Fix uninitialized parameters by @howjmay in #636
  • fix: Disable optimization to avoid potential errors by @howjmay in #640
  • Fix minor typos in the sse2neon header by @ankith26 in #641
  • fix: Fix strict-aliasing errors by @howjmay in #638
  • fix test_mm_dp_pd test by @alexorlov124 in #643
  • Add support for clang-cl on Windows by @anthony-linaro in #633
  • Fix performance regression after OPTNONE changes by @sergeyvfx in #646
  • Allow optimization and use fesetround(), fegetround() by @howjmay in #642
  • CI: Bump dependency by @jserv in #650
  • Allow to specify -DSSE2NEON_SUPPRESS_WARNINGS to avoid the #warning about optimization issues by @rouault in #651
  • Add _MM_SHUFFLE2() macro for shuffle parameter for _mm_shuffle_pd() by @rouault in #652
  • README.md: mention GDAL by @rouault in #654
  • CI: Update Arm GNU Toolchain and use Ubuntu 24.04 by @jserv in #656
  • Fix undefined _mm_{malloc,free} with LLVM/MinGW by @jserv in #657
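
To illustrate the `_MM_SHUFFLE2()` item (#652), here is a scalar model of how the two-bit selector drives `_mm_shuffle_pd()`, following the documented x86 semantics: bit 0 picks the low result lane from the first operand and bit 1 picks the high result lane from the second. The macro is renamed and the vector type replaced by a plain struct so the sketch compiles without any SIMD header; all `sketch_`/`SKETCH_` names are hypothetical.

```c
/* Same bit layout as the x86 _MM_SHUFFLE2 macro, renamed for this sketch. */
#define SKETCH_SHUFFLE2(hi, lo) (((hi) << 1) | (lo))

/* Stand-in for __m128d: two double lanes, lane[0] is the low lane. */
typedef struct {
    double lane[2];
} sketch_m128d;

/* Scalar model of _mm_shuffle_pd(a, b, imm). */
static sketch_m128d sketch_shuffle_pd(sketch_m128d a, sketch_m128d b, int imm)
{
    sketch_m128d r;
    r.lane[0] = a.lane[imm & 1];        /* low lane comes from a  */
    r.lane[1] = b.lane[(imm >> 1) & 1]; /* high lane comes from b */
    return r;
}
```

For example, with `a = {1.0, 2.0}` and `b = {3.0, 4.0}`, the selector `SKETCH_SHUFFLE2(1, 0)` produces `{1.0, 4.0}`.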

Full Changelog: v1.7.0...v1.8.0

v1.7.0

25 Dec 20:41

Full Changelog: v1.6.0...v1.7.0

v1.6.0

26 Dec 08:02
31cb30b

Full Changelog: v1.5.1...v1.6.0

v1.5.1

02 May 21:56

Full Changelog: v1.5.0...v1.5.1

v1.5.0

27 Nov 08:42

Around 94% of the SSE intrinsics are implemented in this release.
The remaining unimplemented intrinsics are:

  • Exception related macros
  • _mm_clflush()
  • Memory barrier intrinsics
  • String comparison intrinsics