@valadaptive
Contributor

This builds on top of #115. There are no functional changes to the generated code (besides what #115 does), but this PR cleans up the `fearless_simd_gen` code:

  • The Arch trait has been removed. It operated at the wrong level of abstraction--it makes no sense to call e.g. mk_avx2::make_method with any Arch implementation other than X86.

  • Many code generation functions in the AVX2 and SSE4.2 modules used to take the vector type along with its scalar and total bit widths. The vector type already provides both widths, so these functions now take only the vector type.
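
The second point can be sketched roughly as follows. This is a minimal illustration, not the actual `fearless_simd_gen` code: `VecType`, `ScalarKind`, `total_bits`, and `intrinsic_prefix` are hypothetical names chosen for the example, and the real types may be shaped differently.

```rust
/// Hypothetical scalar kinds; illustrative names, not taken from fearless_simd_gen.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScalarKind {
    Float,
    Int,
    Unsigned,
}

/// Hypothetical vector type descriptor that carries its own widths.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VecType {
    scalar: ScalarKind,
    scalar_bits: usize,
    len: usize,
}

impl VecType {
    /// Total width in bits, derived from the descriptor rather than passed separately.
    fn total_bits(&self) -> usize {
        self.scalar_bits * self.len
    }
}

// Before (redundant): fn intrinsic_prefix(ty: &VecType, scalar_bits: usize, total_bits: usize)
// After: the vector type alone is enough.
fn intrinsic_prefix(ty: &VecType) -> &'static str {
    match ty.total_bits() {
        128 => "_mm",
        256 => "_mm256",
        512 => "_mm512",
        bits => panic!("unsupported vector width: {bits}"),
    }
}
```

Passing only the descriptor also removes the possibility of a caller supplying widths that disagree with the vector type.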

@valadaptive
Contributor Author

I've rebased this now that #115 has landed. I've made a couple more changes: in addition to removing the Arch trait, I've now removed the marker structs that used to implement it. I've also made a few more functions pub(crate), fixing most (but not all) unreachable_pub lints.

Member

@DJMcNab DJMcNab left a comment

Thanks!

I've only done a cursory review, but the changes can all be reasoned about locally, and nothing jumped out as a big change.
This PR is much easier to review with whitespace hidden.

@valadaptive valadaptive added this pull request to the merge queue Nov 14, 2025
Merged via the queue into linebender:main with commit 9039b44 Nov 14, 2025
18 checks passed
@valadaptive valadaptive deleted the x86-cleanups branch November 14, 2025 21:57
github-merge-queue bot pushed a commit that referenced this pull request Nov 16, 2025
Stacked on top of #116, because it touches some of the codegen stuff I
cleaned up in that PR. It's unfortunate that GitHub doesn't have stacked
PRs.

We have the `Bytes` trait, which lets us cast SIMD types to and from raw
bytes (currently using `mem::transmute`). We can use its `bitcast`
method instead of pulling in bytemuck for the "reinterpret" operations
on `Fallback`.
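
As a rough illustration of the idea (not the crate's actual definition), a byte-casting trait with a `bitcast` method might look like the sketch below. `BytesSketch`, `F32x4`, and `U32x4` are hypothetical stand-ins; the real `Bytes` trait in `fearless_simd` may have a different shape.

```rust
use std::mem;

// Hypothetical 128-bit vector types standing in for the fallback SIMD types.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
struct F32x4([f32; 4]);

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
struct U32x4([u32; 4]);

/// Hypothetical stand-in for the `Bytes` trait described above.
trait BytesSketch: Copy {
    type Bytes: Copy;
    fn to_bytes(self) -> Self::Bytes;
    fn from_bytes(bytes: Self::Bytes) -> Self;

    /// Reinterpret `self` as any type with the same byte representation.
    fn bitcast<U: BytesSketch<Bytes = Self::Bytes>>(self) -> U {
        U::from_bytes(self.to_bytes())
    }
}

impl BytesSketch for F32x4 {
    type Bytes = [u8; 16];
    fn to_bytes(self) -> [u8; 16] {
        // SAFETY: both types are 16 bytes and every bit pattern is a valid byte array.
        unsafe { mem::transmute(self) }
    }
    fn from_bytes(bytes: [u8; 16]) -> Self {
        // SAFETY: both types are 16 bytes and every bit pattern is a valid f32.
        unsafe { mem::transmute(bytes) }
    }
}

impl BytesSketch for U32x4 {
    type Bytes = [u8; 16];
    fn to_bytes(self) -> [u8; 16] {
        // SAFETY: both types are 16 bytes and every bit pattern is a valid byte array.
        unsafe { mem::transmute(self) }
    }
    fn from_bytes(bytes: [u8; 16]) -> Self {
        // SAFETY: both types are 16 bytes and every bit pattern is a valid u32.
        unsafe { mem::transmute(bytes) }
    }
}
```

With something like this in place, a "reinterpret" operation reduces to `v.bitcast()`, and the associated `Bytes` type ensures at compile time that source and target have the same size.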

On the x86 side, we can use the `_mm_cast[...]` intrinsics. All the x86
integer types are `__m128i` or `__m256i`, so conversions between integer
widths are no-ops.
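
For illustration, here is a small sketch of a float-to-integer reinterpret using one of those cast intrinsics. The function name `reinterpret_f32x4_as_u32x4` is hypothetical and this is not the generated code; the non-x86 branch exists only so the example compiles on other targets.

```rust
#[cfg(target_arch = "x86_64")]
fn reinterpret_f32x4_as_u32x4(v: [f32; 4]) -> [u32; 4] {
    use std::arch::x86_64::{__m128i, _mm_castps_si128, _mm_loadu_ps, _mm_storeu_si128};

    let mut out = [0u32; 4];
    // SAFETY: SSE2 is part of the x86_64 baseline, both pointers cover 16 valid
    // bytes, and the unaligned load/store variants have no alignment requirement.
    unsafe {
        let floats = _mm_loadu_ps(v.as_ptr());
        // The cast only changes the static type (__m128 to __m128i); the bits are untouched.
        let ints: __m128i = _mm_castps_si128(floats);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, ints);
    }
    out
}

#[cfg(not(target_arch = "x86_64"))]
fn reinterpret_f32x4_as_u32x4(v: [f32; 4]) -> [u32; 4] {
    // Portable equivalent so the sketch builds on non-x86 targets.
    v.map(f32::to_bits)
}
```

A reinterpret between two integer vector types (say, 32-bit lanes to 8-bit lanes) would need no intrinsic at all here, since both sides are already `__m128i`.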

While working on this, I noticed that there are "reinterpret signed as
unsigned" ops, but no corresponding "reinterpret unsigned as signed"
ops. Are the reinterpret ops worth it at this point if we have the
`Bytes` trait?