Releases: DLTcollab/sse2neon
v1.9.1
What's Changed
- CI: Fix clang-18 installation on ARM64 runner by @jserv in #754
- Remove defined(_BIG_ENDIAN) from endianness check by @clausecker in #753
- Hoist AES constants to file scope by @jserv in #755
- CI: Add Android NDK build verification by @jserv in #756
- Add libFuzzer-based fuzz testing infrastructure by @jserv in #757
- Eliminate old-style cast warnings for C++ compilation by @jserv in #758
Full Changelog: v1.9.0...v1.9.1
v1.9.0
Caveats
New Features
- Windows Arm64EC support
- _mm_monitor, _mm_mwait SSE3 intrinsics
Optimizations
- _mm_aesdec_si128 T-table optimization
- _mm_crc32_u16/u32/u64 on Armv8-A 32-bit
- Vectorized equal_any, EQUAL_ORDERED, RANGES aggregation for SSE4.2 String operations
- _mm_movemask_epi8 rework with vaddv
- _mm_dp_ps, _mm_cvttpd_pi32 vectorization
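For readers unfamiliar with the _mm_movemask_epi8 item in the list above, the following is a minimal scalar sketch of the semantics any reworked implementation has to reproduce: bit i of the result is the most significant bit of byte i. The helper names `movemask_epi8_ref` and `movemask_epi8_ref_selfcheck` are hypothetical and are not part of sse2neon; the header's actual NEON code path is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of sse2neon): gathers the most
 * significant bit of each of the 16 input bytes into bits 0..15 of the
 * result, which is the documented behaviour of _mm_movemask_epi8. */
static int movemask_epi8_ref(const uint8_t bytes[16])
{
    int mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (bytes[i] >> 7) << i; /* MSB of byte i becomes bit i */
    return mask;
}

/* Small self-check: only bytes 0 and 15 have their top bit set. */
static void movemask_epi8_ref_selfcheck(void)
{
    uint8_t v[16] = {0};
    v[0] = 0x80;
    v[15] = 0xFF;
    assert(movemask_epi8_ref(v) == 0x8001);
}
```

A scalar model like this can serve as an oracle when comparing a vectorized path against expected results.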
Bug Fixes
- _mm_round_ps/pd preserve infinity/NaN on Arm32
- Rounding mode handling corrections
- _mm_cmpestra off-by-one boundary
- _mm_aesdeclast_si128 bus error on ARMv7
- _mm_{malloc,free} mismatch with LLVM/MinGW
- Strict aliasing violations in ARMv7
What's Changed
- Fix mismatched _mm_{malloc,free} with LLVM/MinGW by @invertego in #659
- Enable -Wconversion by @marcin-serwin in #661
- Fix missing parenthesis by @coco875 in #662
- Add parentheses for macro parameters by @toxieainc in #663
- Fix GitHub Action status badge by @Mes0903 in #664
- Support Windows Arm64EC by @mcfi in #665
- Fix Arm64EC build related to _mm_prefetch by @mcfi in #666
- Update README.md with Windows Arm64EC instructions by @mcfi in #667
- Run more tests when building with modern GCC by @rathann in #672
- Rename _() macro to _SSE2NEON() to avoid conflict with gettext by @rouault in #675
- Improve readability of _mm_movemask_epi8 by @ThisAccountHasBeenSuspended in #674
- Improve readability of _mm_movemask_pi8 & _mm_movemask_ps by @ThisAccountHasBeenSuspended in #677
- CI: Update dependencies by @jserv in #680
- Fix bug with _mm_round and _MM_FROUND_NO_EXC by @mg-mburetorp in #679
- Fix problem with _MM_GET_ROUNDING_MODE on certain platforms by @mg-mburetorp in #678
- Migrate the code to use C++ style cast operators by @marcin-serwin in #671
- Fix variable shadowing by @jserv in #681
- Optimize aesdeclast on Armv7-A by @Cuda-Chen in #682
- Fix Bus error in _mm_aesdeclast_si128 on ARMv7 by @jserv in #683
- Fix variable shadowing in _mm_aeskeygenassist_si128 by @jserv in #684
- Fix missing target pragma for ARMv8-A 32-bit by @jserv in #685
- Optimize _mm_shuffle_ps_3202 with vtrn by @jserv in #686
- Bump clang-format version requirement to 20+ by @jserv in #687
- Extend _mm_setcsr/_mm_getcsr to handle FZ and DAZ modes by @jserv in #688
- Remove obsolete FIXME in _mm_mulhi_epi16 by @jserv in #689
- Optimize _mm_aesdec_si128 ARMv7-A with T-table by @jserv in #690
- Consolidate platform detection by @jserv in #691
- Reduce redundant vreinterpret calls by @jserv in #692
- Add comprehensive precision flag documentation by @jserv in #693
- Add IEEE-754 floating-point edge case tests by @jserv in #694
- CI: Add UBSan support by @jserv in #695
- CI: Add clang-tidy static analysis by @jserv in #697
- Add performance tier analysis by @jserv in #698
- Enhance tier analysis with Clang AST by @jserv in #699
- Improve _mm_dp_ps with vectorized AArch64 path by @jserv in #700
- Vectorize _mm_cvttpd_pi32 for AArch64 by @jserv in #701
- Implement _SIDD_MASKED_POSITIVE_POLARITY by @jserv in #702
- Vectorize equal_any aggregation for SSE4.2 string by @jserv in #703
- Rework _mm_movemask_epi8 with vaddv horizontal by @jserv in #704
- Add cast validation script with strict aliasing by @jserv in #706
- Refactor validation calls to array-based macros by @jserv in #705
- Vectorize EQUAL_ORDERED using vextq diagonal by @jserv in #707
- Improve RANGES aggregation using vrev for pair-AND by @jserv in #709
- Refactor AES S-box lookups by @jserv in #710
- Optimize _mm_crc32_u16 on Armv8-A 32-bit platform by @Cuda-Chen in #708
- CI: Fix perf-tier comment posting for fork PRs by @jserv in #711
- Optimize CRC-32C base macro by @Cuda-Chen in #713
- Fix strict aliasing violations in ARMv7 by @jserv in #714
- Enforce minimum compiler versions by @jserv in #715
- Optimize _mm_crc32_u32 on Armv8-A 32-bit platform by @Cuda-Chen in #716
- Remove obsolete optimization warning by @jserv in #717
- Fix typo in _mm_movehl_ps architecture check by @jserv in #718
- Add compile-time constant validation for immediate by @jserv in #719
- Standardize preprocessor macro usage by @jserv in #720
- Add compile-time guard for little-endian requirement by @jserv in #722
- Fix _mm_round_ps to preserve infinity/NaN on Arm32 by @jserv in #724
- CI: Improve workflows for speed and coverage by @jserv in #725
- Fix -Wconversion warnings in SSE4.2 by @jserv in #726
- Fix parenthesis mismatch in CRC32C crypto fallback by @jserv in #727
- Optimize ARMv7 horizontal reduction in PCMPXSTR by @jserv in #728
- Add compile-time range validation by @jserv in #729
- Improve readability of variable shift clamping by @jserv in #730
- Optimize _mm_crc32_u64 on Armv8-A 32-bit platform by @Cuda-Chen in #723
- Fix float-to-integer conversion saturation for x86 by @jserv in #731
- Add denormal flush-to-zero mode tests by @jserv in #732
- Refine Makefile convenience targets by @jserv in #733
- Add iOS compatibility wrapper for vcreate_u64 by @jserv in #735
- Add differential testing for x86/ARM semantic verification by @jserv in #736
- CI: Add uninitialized variable warning checks by @jserv in #737
- CI: Skip unnecessary clang installation by @jserv in #738
- Add Windows ARM64EC CI coverage by @jserv in #734
- CI: Do differential testing with scalar fallback by @jserv in #739
- Fix incorrect NaN generation for FP tests by @jserv in #740
- Add MXCSR exception flag macros for compatibility by @jserv in #741
- Add coverage verification infrastructure by @jserv in #742
- Add NIST FIPS 197 AES-256 test vectors by @jserv in #743
- Add _mm_monitor and _mm_mwait SSE3 intrinsics by @jserv in #744
- Improve non-temporal store/load stream intrinsics by @jserv in #745
- Verify IEEE-754 signed zero by @jserv in #746
- Implement _mm_undefin...
v1.8.0
What's Changed
- Fix Clang showing incorrect GCC version warning by @brechtvl in #623
- Restore options for precision of div/rcp/sqrt/rsqrt by @brechtvl in #626
- Optimize CRC intrinsics for targets lacking the CRC extension by @Cuda-Chen in #627
- test: Avoid errors when cross compile by gcc-8.3/9.2 by @howjmay in #625
- Improve unsupported target message by @ankith26 in #630
- Fix _mm_div_ps when SSE2NEON_PRECISE_DIV=1 by @sergeyvfx in #631
- Use unaligned data types for unaligned intrinsics by @Logikable in #632
- fix: Fix uninitialized parameters by @howjmay in #636
- fix: Disable optimization to avoid potential errors by @howjmay in #640
- Fix minor typos in the sse2neon header by @ankith26 in #641
- fix: Fix strict-aliasing errors by @howjmay in #638
- fix test_mm_dp_pd test by @alexorlov124 in #643
- Add support for clang-cl on Windows by @anthony-linaro in #633
- Fix performance regression after OPTNONE changes by @sergeyvfx in #646
- Allow optimization and use fesetround(), fegetround() by @howjmay in #642
- CI: Bump dependency by @jserv in #650
- Allow to specify -DSSE2NEON_SUPPRESS_WARNINGS to avoid the #warning about optimization issues by @rouault in #651
- Add _MM_SHUFFLE2() macro for shuffle parameter for _mm_shuffle_pd() by @rouault in #652
- README.md: mention GDAL by @rouault in #654
- CI: Update Arm GNU Toolchain and use Ubuntu 24.04 by @jserv in #656
- Fix undefined _mm_{malloc,free} with LLVM/MinGW by @jserv in #657
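As a hedged illustration of the _MM_SHUFFLE2() entry in the list above: the macro conventionally packs two one-bit lane selectors into the 2-bit immediate consumed by _mm_shuffle_pd(). The sketch below models that in plain C. `SHUFFLE2_SKETCH`, `shuffle_pd_ref`, and `shuffle_pd_ref_selfcheck` are hypothetical names introduced only for this example and do not come from the sse2neon header.

```c
#include <assert.h>

/* Illustrative stand-in for _MM_SHUFFLE2 (the name is hypothetical): packs
 * two one-bit lane selectors into a 2-bit immediate. */
#define SHUFFLE2_SKETCH(hi_sel, lo_sel) (((hi_sel) << 1) | (lo_sel))

/* Scalar model of _mm_shuffle_pd: bit 0 of imm selects which lane of a
 * becomes dst[0]; bit 1 selects which lane of b becomes dst[1]. */
static void shuffle_pd_ref(const double a[2], const double b[2], int imm,
                           double dst[2])
{
    dst[0] = a[imm & 1];
    dst[1] = b[(imm >> 1) & 1];
}

/* Self-check: selectors (0, 1) pick the upper lane of a and the lower
 * lane of b. */
static void shuffle_pd_ref_selfcheck(void)
{
    const double a[2] = {1.0, 2.0};
    const double b[2] = {3.0, 4.0};
    double r[2];
    shuffle_pd_ref(a, b, SHUFFLE2_SKETCH(0, 1), r);
    assert(r[0] == 2.0 && r[1] == 3.0);
}
```

In real code the immediate would be passed straight to the intrinsic, for example `_mm_shuffle_pd(a, b, _MM_SHUFFLE2(0, 1))`.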
New Contributors
- @ankith26 made their first contribution in #630
- @sergeyvfx made their first contribution in #631
- @Logikable made their first contribution in #632
- @alexorlov124 made their first contribution in #643
Full Changelog: v1.7.0...v1.8.0
v1.7.0
What's Changed
- refactor: Add missing ARM64 implementation by @howjmay in #576
- test: Build/run with crypto and/or crc by @howjmay in #574
- doc: Describe the right coverage of SSE2NEON_PRECISE_MINMAX by @howjmay in #578
- refactor: Reimplement _mm_movelh_ps for Arm64 by @howjmay in #579
- tests: Cover all immediate numbers by @howjmay in #584
- test: Use macro for validate results by @howjmay in #585
- Improve precision of _mm_{rsqrt,sqrt,rcp,div}_{ps,ss} conversions by @Cuda-Chen in #580
- Fix MSVC compile issues by @toxieainc in #588
- Tweak MSVC ifdef guard for _BitScanForward64 by @aqrit in #592
- Add notice that NEON handles certain IEEE single-precision values by @Cuda-Chen in #593
- Add infinity test in test_mm_{max,min}_{pd,sd} by @Cuda-Chen in #594
- Remove Kahan algorithm in _mm_dp_ps by @Cuda-Chen in #597
- MSVC support by @anthony-linaro in #596
- test: Cover all the valid imm range in tests by @howjmay in #586
- Add test running for MSVC to CI by @anthony-linaro in #598
- Align result to SSE when input is 0.0f/-0.0f in _mm_rsqrt_{ps, ss} by @Cuda-Chen in #599
- fix: Fix exceeding width of type warning by @howjmay in #601
- docs: Fix the typos by @howjmay in #603
- docs: Fix the typos by @spacemiqote in #605
- Fix build for gcc-13 and 32 bit arm systems. by @balister in #609
- Fix unused parameters warning by @anakinxc in #610
- Fixed gcc strict prototype and other build errors by @mnjdhl in #611
- Fix _mm_cmplt_sd and _mm_cmpnlt_sd test cases by @Cuda-Chen in #612
- disambiguate vector type to avoid errors depending on lax conversion … by @JoachimSchurig in #614
- docs: fix typo failback by @howjmay in #616
- Introduce fast and deterministic RNG by @Cuda-Chen in #615
- fix: Fix typo nand by @howjmay in #617
- fix: Fix MSVC warnings by @howjmay in #604
- Add A32 support in CI by @Cuda-Chen in #620
- Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 by @aqrit in #621
New Contributors
- @anthony-linaro made their first contribution in #596
- @spacemiqote made their first contribution in #605
- @anakinxc made their first contribution in #610
- @mnjdhl made their first contribution in #611
- @JoachimSchurig made their first contribution in #614
Full Changelog: v1.6.0...v1.7.0
v1.6.0
What's Changed
- 100% intrinsics coverage for SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extension.
- Implement _rdtsc by @Cuda-Chen in #532
- Improve _mm_srai_epi32 to handle complex arguments by @Developer-Ecosystem-Engineering in #533
- Implement _mm_cmpestri and _mm_cmpestrm by @Cuda-Chen in #534
- Implement five _mm_cmpestr by @Cuda-Chen in #552
- Implement _mm_cmpistri and _mm_cmpistrm by @Cuda-Chen in #553
- Implement five _mm_cmpistr by @Cuda-Chen in #555
- tests: Fix warnings raised by clang++ by @Cuda-Chen in #540
- Exclude _mm_malloc/free definitions on Windows by @invertego in #541
- Remove designated initialization of an array by @invertego in #542
- Reintroduce ext-based implementations for shift intrinsics by @AymenQ in #543
- Improve performance of float-to-integer intrinsics by @AymenQ in #546
- Support __builtin_shuffle as an alternative to __builtin_shufflevector by @AymenQ in #545
- Improve performance of various intrinsics by @AymenQ in #549
- Vectorize _mm_minpos_epu16 by @AymenQ in #551
- Align _mm_prefetch behavior to document by @howjmay in #550
- Add clang/Windows build by @invertego in #556
- Test all valid immediates in _mm_dp_pd by @Cuda-Chen in #557
- Optimize _mm_aesenclast_si128 for Arm64 by @howjmay in #561
- Implement _mm_aesdec_si128 by @howjmay in #559
- Implement _mm_aesdeclast_si128 by @howjmay in #565
- Implement _mm_aesimc_si128 by @howjmay in #567
- Optimize aeskeygenassist_si128 for Arm64 by @howjmay in #569
- Update Intel intrinsics document links by @howjmay in #570
New Contributors
- @Cuda-Chen made their first contribution in #532
- @Developer-Ecosystem-Engineering made their first contribution in #533
- @balister made their first contribution in #535
- @invertego made their first contribution in #541
- @AymenQ made their first contribution in #543
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- fix: Fix dividing zero error in validateFloatError by @howjmay in #515
- Fix compilation with standardized C compilers by @jserv in #516
- Fix _mm_storel_epi64 by @andrewevstyukhin in #517
- Add support for 32-bit targets on ARMv8 architectures by @jonathanhue in #520
- Use CRC and directed rounding intrinsics on A32 by @jonathanhue in #522
- fix: Fix alignment in tests by @howjmay in #523
New Contributors
- @sleepybishop made their first contribution in #508
- @luzpaz made their first contribution in #509
- @andrewevstyukhin made their first contribution in #517
- @jonathanhue made their first contribution in #520
Full Changelog: v1.5.0...v1.5.1