Releases · DLTcollab/sse2neon

01 Jan 11:47

jserv

v1.9.1

92f6de1

Latest

What's Changed

CI: Fix clang-18 installation on ARM64 runner by @jserv in #754
Remove defined(_BIG_ENDIAN) from endianness check by @clausecker in #753
Hoist AES constants to file scope by @jserv in #755
CI: Add Android NDK build verification by @jserv in #756
Add libFuzzer-based fuzz testing infrastructure by @jserv in #757
Eliminate old-style cast warnings for C++ compilation by @jserv in #758
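
The endianness change in #753 concerns how byte order is detected at compile time. As a minimal sketch (not the check used in sse2neon.h), the block below relies only on the `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__` macros that GCC and Clang predefine, and cross-checks the result at run time. `SKETCH_LITTLE_ENDIAN` and `sketch_is_little_endian_runtime` are hypothetical names, and the fallback value for other compilers is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time view: GCC and Clang predefine __BYTE_ORDER__. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#define SKETCH_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#else
/* Assumption: compilers without these macros target little-endian Arm. */
#define SKETCH_LITTLE_ENDIAN 1
#endif

/* Run-time cross-check: inspect the first byte in memory of the value 1. */
static int sketch_is_little_endian_runtime(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian target both the macro and the function evaluate to 1, so the two views can be compared in a unit test.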

Full Changelog: v1.9.0...v1.9.1

Contributors

clausecker and jserv

22 Dec 03:37

jserv

v1.9.0

8d1d9f1

Caveats

New Features

Windows Arm64EC support
_mm_monitor, _mm_mwait SSE3 intrinsics

Optimizations

_mm_aesdec_si128 T-table optimization
_mm_crc32_u16/u32/u64 on Armv8-A 32-bit
Vectorized equal_any, EQUAL_ORDERED, RANGES aggregation for SSE4.2 String operations
_mm_movemask_epi8 rework with vaddv
_mm_dp_ps, _mm_cvttpd_pi32 vectorization
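
For context on the `_mm_movemask_epi8` rework, the following scalar sketch shows the x86 semantics that any NEON implementation has to reproduce: bit i of the result is the most significant bit of byte i. It is a reference model only, not the `vaddv`-based code from the release, and `sketch_movemask_epi8` is a hypothetical name.

```c
#include <stdint.h>

/* Scalar reference model: collect the top bit of each of the 16 bytes
 * into the low 16 bits of the result, byte 0 in bit 0. */
static int sketch_movemask_epi8(const uint8_t bytes[16])
{
    int mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (bytes[i] >> 7) << i;
    return mask;
}
```

A model like this can serve as the expected value when testing a vectorized implementation against arbitrary inputs.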

Bug Fixes

_mm_round_ps/pd preserve infinity/NaN on Arm32
Rounding mode handling corrections
_mm_cmpestra off-by-one boundary
_mm_aesdeclast_si128 bus error on ARMv7
_mm_{malloc,free} mismatch with LLVM/MinGW
Strict aliasing violations in ARMv7
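
The `_mm_round_ps/pd` fix is about keeping infinity and NaN intact when a target has no native round-to-integral instruction. The sketch below illustrates the kind of guard involved. It is not the patch from the release, `sketch_round_nearest_even` is a hypothetical name, and it assumes the default round-to-nearest-even floating-point mode.

```c
#include <math.h>

/* Round a float to the nearest integer (ties to even) without going
 * through an integer type, so non-finite inputs survive unchanged. */
static float sketch_round_nearest_even(float x)
{
    /* NaN and infinities have nothing to round; zero keeps its sign. */
    if (isnan(x) || isinf(x) || x == 0.0f)
        return x;

    float ax = x < 0.0f ? -x : x;

    /* Every float with magnitude >= 2^23 is already an integer. */
    if (ax >= 8388608.0f)
        return x;

    /* Adding 2^23 pushes the fraction out of the mantissa, so the
     * hardware rounds it; volatile keeps the compiler from folding
     * the add and subtract together. */
    volatile float shifted = ax + 8388608.0f;
    float r = shifted - 8388608.0f;

    /* Restore the sign, which also yields -0.0f for small negatives. */
    return x < 0.0f ? -r : r;
}
```

The early returns are the point of the example: a float-to-integer round trip would turn NaN, infinity, or very large values into unrelated results.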

What's Changed

Fix mismatched _mm_{malloc,free} with LLVM/MinGW by @invertego in #659
Enable -Wconversion by @marcin-serwin in #661
Fix missing parenthesis by @coco875 in #662
Add parentheses for macro parameters by @toxieainc in #663
Fix GitHub Action status badge by @Mes0903 in #664
Support Windows Arm64EC by @mcfi in #665
Fix Arm64EC build related to _mm_prefetch by @mcfi in #666
Update README.md with Windows Arm64EC instructions by @mcfi in #667
Run more tests when building with modern GCC by @rathann in #672
Rename _() macro to _SSE2NEON() to avoid conflict with gettext by @rouault in #675
Improve readability of _mm_movemask_epi8 by @ThisAccountHasBeenSuspended in #674
Improve readability of _mm_movemask_pi8 & _mm_movemask_ps by @ThisAccountHasBeenSuspended in #677
CI: Update dependencies by @jserv in #680
Fix bug with _mm_round and _MM_FROUND_NO_EXC by @mg-mburetorp in #679
Fix problem with _MM_GET_ROUNDING_MODE on certain platforms by @mg-mburetorp in #678
Migrate the code to use C++ style cast operators by @marcin-serwin in #671
Fix variable shadowing by @jserv in #681
Optimize aesdeclast on Armv7-A by @Cuda-Chen in #682
Fix Bus error in _mm_aesdeclast_si128 on ARMv7 by @jserv in #683
Fix variable shadowing in _mm_aeskeygenassist_si128 by @jserv in #684
Fix missing target pragma for ARMv8-A 32-bit by @jserv in #685
Optimize _mm_shuffle_ps_3202 with vtrn by @jserv in #686
Bump clang-format version requirement to 20+ by @jserv in #687
Extend _mm_setcsr/_mm_getcsr to handle FZ and DAZ modes by @jserv in #688
Remove obsolete FIXME in _mm_mulhi_epi16 by @jserv in #689
Optimize _mm_aesdec_si128 ARMv7-A with T-table by @jserv in #690
Consolidate platform detection by @jserv in #691
Reduce redundant vreinterpret calls by @jserv in #692
Add comprehensive precision flag documentation by @jserv in #693
Add IEEE-754 floating-point edge case tests by @jserv in #694
CI: Add UBSan support by @jserv in #695
CI: Add clang-tidy static analysis by @jserv in #697
Add performance tier analysis by @jserv in #698
Enhance tier analysis with Clang AST by @jserv in #699
Improve _mm_dp_ps with vectorized AArch64 path by @jserv in #700
Vectorize _mm_cvttpd_pi32 for AArch64 by @jserv in #701
Implement _SIDD_MASKED_POSITIVE_POLARITY by @jserv in #702
Vectorize equal_any aggregation for SSE4.2 string by @jserv in #703
Rework _mm_movemask_epi8 with vaddv horizontal by @jserv in #704
Add cast validation script with strict aliasing by @jserv in #706
Refactor validation calls to array-based macros by @jserv in #705
Vectorize EQUAL_ORDERED using vextq diagonal by @jserv in #707
Improve RANGES aggregation using vrev for pair-AND by @jserv in #709
Refactor AES S-box lookups by @jserv in #710
Optimize _mm_crc32_u16 on Armv8-A 32-bit platform by @Cuda-Chen in #708
CI: Fix perf-tier comment posting for fork PRs by @jserv in #711
Optimize CRC-32C base macro by @Cuda-Chen in #713
Fix strict aliasing violations in ARMv7 by @jserv in #714
Enforce minimum compiler versions by @jserv in #715
Optimize _mm_crc32_u32 on Armv8-A 32-bit platform by @Cuda-Chen in #716
Remove obsolete optimization warning by @jserv in #717
Fix typo in _mm_movehl_ps architecture check by @jserv in #718
Add compile-time constant validation for immediate by @jserv in #719
Standardize preprocessor macro usage by @jserv in #720
Add compile-time guard for little-endian requirement by @jserv in #722
Fix _mm_round_ps to preserve infinity/NaN on Arm32 by @jserv in #724
CI: Improve workflows for speed and coverage by @jserv in #725
Fix -Wconversion warnings in SSE4.2 by @jserv in #726
Fix parenthesis mismatch in CRC32C crypto fallback by @jserv in #727
Optimize ARMv7 horizontal reduction in PCMPXSTR by @jserv in #728
Add compile-time range validation by @jserv in #729
Improve readability of variable shift clamping by @jserv in #730
Optimize _mm_crc32_u64 on Armv8-A 32-bit platform by @Cuda-Chen in #723
Fix float-to-integer conversion saturation for x86 by @jserv in #731
Add denormal flush-to-zero mode tests by @jserv in #732
Refine Makefile convenience targets by @jserv in #733
Add iOS compatibility wrapper for vcreate_u64 by @jserv in #735
Add differential testing for x86/ARM semantic verification by @jserv in #736
CI: Add uninitialized variable warning checks by @jserv in #737
CI: Skip unnecessary clang installation by @jserv in #738
Add Windows ARM64EC CI coverage by @jserv in #734
CI: Do differential testing with scalar fallback by @jserv in #739
Fix incorrect NaN generation for FP tests by @jserv in #740
Add MXCSR exception flag macros for compatibility by @jserv in #741
Add coverage verification infrastructure by @jserv in #742
Add NIST FIPS 197 AES-256 test vectors by @jserv in #743
Add _mm_monitor and _mm_mwait SSE3 intrinsics by @jserv in #744
Improve non-temporal store/load stream intrinsics by @jserv in #745
Verify IEEE-754 signed zero by @jserv in #746
Implement _mm_undefin...

Contributors

jserv, rouault, and 10 other contributors

25 Dec 16:09

jserv

v1.8.0

3cf6976

What's Changed

Fix Clang showing incorrect GCC version warning by @brechtvl in #623
Restore options for precision of div/rcp/sqrt/rsqrt by @brechtvl in #626
Optimize CRC intrinsics for targets lacking the CRC extension by @Cuda-Chen in #627
test: Avoid errors when cross compile by gcc-8.3/9.2 by @howjmay in #625
Improve unsupported target message by @ankith26 in #630
Fix _mm_div_ps when SSE2NEON_PRECISE_DIV=1 by @sergeyvfx in #631
Use unaligned data types for unaligned intrinsics by @Logikable in #632
fix: Fix uninitialized parameters by @howjmay in #636
fix: Disable optimization to avoid potential errors by @howjmay in #640
Fix minor typos in the sse2neon header by @ankith26 in #641
fix: Fix strict-aliasing errors by @howjmay in #638
fix test_mm_dp_pd test by @alexorlov124 in #643
Add support for clang-cl on Windows by @anthony-linaro in #633
Fix performance regression after OPTNONE changes by @sergeyvfx in #646
Allow optimization and use fesetround(), fegetround() by @howjmay in #642
CI: Bump dependency by @jserv in #650
Allow to specify -DSSE2NEON_SUPPRESS_WARNINGS to avoid the #warning about optimization issues by @rouault in #651
Add _MM_SHUFFLE2() macro for shuffle parameter for _mm_shuffle_pd() by @rouault in #652
README.md: mention GDAL by @rouault in #654
CI: Update Arm GNU Toolchain and use Ubuntu 24.04 by @jserv in #656
Fix undefined _mm_{malloc,free} with LLVM/MinGW by @jserv in #657
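
The CRC item above concerns targets without the Arm CRC extension, where the `_mm_crc32_*` intrinsics have to be computed in software. As a rough illustration of what such a fallback computes, here is a bit-at-a-time CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). This is the simplest correct form, not the optimized code from the release, and the `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One CRC-32C step over a single byte, with the same contract as
 * _mm_crc32_u8: no initial or final inversion is applied here. */
static uint32_t sketch_crc32c_u8(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int bit = 0; bit < 8; bit++) {
        /* If the low bit is set, shift and XOR in the polynomial. */
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Conventional CRC-32C of a buffer: start from all ones and invert
 * the result at the end. */
static uint32_t sketch_crc32c(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *) data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = sketch_crc32c_u8(crc, *p++);
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value applies: `sketch_crc32c("123456789", 9)` yields 0xE3069283. Table-driven or carry-less-multiply variants trade memory or instruction-set requirements for speed while producing the same result.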

New Contributors

@ankith26 made their first contribution in #630
@sergeyvfx made their first contribution in #631
@Logikable made their first contribution in #632
@alexorlov124 made their first contribution in #643

Full Changelog: v1.7.0...v1.8.0

Contributors

sergeyvfx, brechtvl, and 8 other contributors

25 Dec 20:41

jserv

v1.7.0

1a577cf

What's Changed

refactor: Add missing ARM64 implementation by @howjmay in #576
test: Build/run with crypto and/or crc by @howjmay in #574
doc: Describe the right coverage of SSE2NEON_PRECISE_MINMAX by @howjmay in #578
refactor: Reimplement _mm_movelh_ps for Arm64 by @howjmay in #579
tests: Cover all immediate numbers by @howjmay in #584
test: Use macro for validate results by @howjmay in #585
Improve precision of _mm_{rsqrt,sqrt,rcp,div}_{ps,ss} conversions by @Cuda-Chen in #580
Fix MSVC compile issues by @toxieainc in #588
Tweak MSVC ifdef guard for _BitScanForward64 by @aqrit in #592
Add notice that NEON handles certain IEEE single-precision values by @Cuda-Chen in #593
Add infinity test in test_mm_{max,min}_{pd,sd} by @Cuda-Chen in #594
Remove Kahan algorithm in _mm_dp_ps by @Cuda-Chen in #597
MSVC support by @anthony-linaro in #596
test: Cover all the valid imm range in tests by @howjmay in #586
Add test running for MSVC to CI by @anthony-linaro in #598
Align result to SSE when input is 0.0f/-0.0f in _mm_rsqrt_{ps, ss} by @Cuda-Chen in #599
fix: Fix exceeding width of type warning by @howjmay in #601
docs: Fix the typos by @howjmay in #603
docs: Fix the typos by @spacemiqote in #605
Fix build for gcc-13 and 32 bit arm systems. by @balister in #609
Fix unused parameters warning by @anakinxc in #610
Fixed gcc strict prototype and other build errors by @mnjdhl in #611
Fix _mm_cmplt_sd and _mm_cmpnlt_sd test cases by @Cuda-Chen in #612
disambiguate vector type to avoid errors depending on lax conversion … by @JoachimSchurig in #614
docs: fix typo failback by @howjmay in #616
Introduce fast and deterministic RNG by @Cuda-Chen in #615
fix: Fix typo nand by @howjmay in #617
fix: Fix MSVC warnings by @howjmay in #604
Add A32 support in CI by @Cuda-Chen in #620
Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 by @aqrit in #621
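
For readers unfamiliar with the last item, `_mm_testnzc_si128` and `_mm_test_mix_ones_zeros` share one definition on x86: the result is 1 only when the bits selected by the mask contain at least one 1 and at least one 0. The scalar model below sketches that rule using two 64-bit halves. `sketch_u128` and `sketch_testnzc` are hypothetical names, and this is not the sse2neon implementation.

```c
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} sketch_u128;

/* Returns 1 when the masked bits of `a` are neither all zeros nor
 * all ones, mirroring the ZF == 0 && CF == 0 condition of PTEST. */
static int sketch_testnzc(sketch_u128 a, sketch_u128 mask)
{
    /* ZF: no selected bit of a is set. */
    int zf = ((a.lo & mask.lo) | (a.hi & mask.hi)) == 0;
    /* CF: no selected bit of a is clear. */
    int cf = ((~a.lo & mask.lo) | (~a.hi & mask.hi)) == 0;
    return !zf && !cf;
}
```

Note that the condition spans the whole 128-bit value, so a set bit in one half and a clear bit in the other still counts as mixed.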

New Contributors

@anthony-linaro made their first contribution in #596
@spacemiqote made their first contribution in #605
@anakinxc made their first contribution in #610
@mnjdhl made their first contribution in #611
@JoachimSchurig made their first contribution in #614

Full Changelog: v1.6.0...v1.7.0

Contributors

balister, aqrit, and 8 other contributors

26 Dec 08:02

jserv

v1.6.0

31cb30b

What's Changed

100% intrinsics coverage for SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extension.
Implement _rdtsc by @Cuda-Chen in #532
Improve _mm_srai_epi32 to handle complex arguments by @Developer-Ecosystem-Engineering in #533
Implement _mm_cmpestri and _mm_cmpestrm by @Cuda-Chen in #534
Implement five _mm_cmpestr by @Cuda-Chen in #552
Implement _mm_cmpistri and _mm_cmpistrm by @Cuda-Chen in #553
Implement five _mm_cmpistr by @Cuda-Chen in #555
tests: Fix warnings raised by clang++ by @Cuda-Chen in #540
Exclude _mm_malloc/free definitions on Windows by @invertego in #541
Remove designated initialization of an array by @invertego in #542
Reintroduce ext-based implementations for shift intrinsics by @AymenQ in #543
Improve performance of float-to-integer intrinsics by @AymenQ in #546
Support __builtin_shuffle as an alternative to __builtin_shufflevector by @AymenQ in #545
Improve performance of various intrinsics by @AymenQ in #549
Vectorize _mm_minpos_epu16 by @AymenQ in #551
Align _mm_prefetch behavior to document by @howjmay in #550
Add clang/Windows build by @invertego in #556
Test all valid immediates in _mm_dp_pd by @Cuda-Chen in #557
Optimize _mm_aesenclast_si128 for Arm64 by @howjmay in #561
Implement _mm_aesdec_si128 by @howjmay in #559
Implement _mm_aesdeclast_si128 by @howjmay in #565
Implement _mm_aesimc_si128 by @howjmay in #567
Optimize aeskeygenassist_si128 for Arm64 by @howjmay in #569
Update Intel intrinsics document links by @howjmay in #570
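
As background for the `_mm_minpos_epu16` entry in the list above, the scalar sketch below models what the intrinsic returns on x86: the smallest of eight unsigned 16-bit lanes in bits 15:0 and the index of its first occurrence in bits 18:16, with all other bits zero. It is a reference model, not the vectorized code from #551, and `sketch_minpos_epu16` is a hypothetical name that packs only the low 32 bits of the result.

```c
#include <stdint.h>

/* Scalar reference model: find the minimum lane and its lowest index,
 * then pack them as value | (index << 16). */
static uint32_t sketch_minpos_epu16(const uint16_t lanes[8])
{
    uint16_t min = lanes[0];
    uint32_t idx = 0;
    for (uint32_t i = 1; i < 8; i++) {
        /* Strict comparison keeps the first occurrence on ties. */
        if (lanes[i] < min) {
            min = lanes[i];
            idx = i;
        }
    }
    return (uint32_t) min | (idx << 16);
}
```

For the lanes {5, 3, 9, 3, 7, 8, 6, 4} the model returns 0x00010003: minimum 3, first found at index 1.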