@DJMcNab DJMcNab commented Oct 14, 2025

The core contributions of this PR are:

  • A trait which a struct (which should be zero-sized) can implement, indicating that values of that struct are type-level proof that a set of target features is enabled.
  • The trampoline macro, which validates a #[target_feature(enable = "xxx")] string against values of one or more of these token types, ensuring at compile time that a call to a #[target_feature] function will be safe, and then makes that call.
  • A corresponding struct for each target feature on x86[-64]. These structs are code generated.

The state of this feature is:

  • It is not used for implementing the Fearless SIMD crate.
  • The x86-64-v{1,2,3,4} level implementations do not exist/are extremely incomplete.
  • Some docs are missing (these are however not the most critical docs, it's only docs on the groupings of x86 features).
  • It does not have support for aarch64 in the architecture levels. This is not hard, it's just data wrangling.

There is also an open licensing question, around the docs taken from the Rust reference. My preference would be to copy https://github.com/rust-lang/reference/blob/1d930e1d5a27e114b4d22a50b0b6cd3771b92e31/LICENSE-MIT#L1 into our LICENSE-MIT, which avoids having to make a decision about copyright-ability here.

My proposed next steps are:

  • Discuss this at Renderer Office Hours tomorrow: Done
  • If we decide this is a direction we want to follow, clean up and land this PR.
  • Follow-up with:
    • aarch64 support
    • Automatic selection/an enum of x86-64 levels
    • Using it in the implementation of Fearless SIMD itself

For review:

  • You can mostly ignore the contents of fearless_simd_core/x86/xxx/xxx.rs, as these are entirely automatically generated. The exceptions are the fearless_simd_core/x86/xxx/mod.rs files, which are hand-written but don't contain any logic.

Discussed on Zulip: #simd > Removing `safe-wrappers`

This file (and trampoline.rs) contains the main code needed to understand this PR.

/// See the module level docs [self].
///
/// We require static lifetimes as this is primarily internal to the macro.
pub const fn is_feature_subset<const N: usize>(
This function needs the most careful review, because its correctness is being relied upon for safety.
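
To give reviewers a feel for the kind of check involved, here is a minimal sketch of a compile-time subset test over a comma-separated feature string. It is not the PR's `is_feature_subset`: the names `features_are_covered` and `range_is_available` are hypothetical, and the sketch assumes feature strings contain no whitespace.

```rust
/// Whether `required[start..end]` equals one of the names in `available`.
/// (Hypothetical helper; written with index loops so it works in `const` contexts.)
const fn range_is_available(
    required: &[u8],
    start: usize,
    end: usize,
    available: &[&str],
) -> bool {
    let mut i = 0;
    while i < available.len() {
        let candidate = available[i].as_bytes();
        if candidate.len() == end - start {
            let mut j = 0;
            let mut same = true;
            while j < candidate.len() {
                if candidate[j] != required[start + j] {
                    same = false;
                    break;
                }
                j += 1;
            }
            if same {
                return true;
            }
        }
        i += 1;
    }
    false
}

/// True only if every feature named in the comma-separated `required` string
/// appears in `available`. An empty name (including an empty string) is
/// rejected, which errs on the side of refusing the call.
const fn features_are_covered(required: &str, available: &[&str]) -> bool {
    let bytes = required.as_bytes();
    let mut start = 0;
    let mut i = 0;
    while i <= bytes.len() {
        if i == bytes.len() || bytes[i] == b',' {
            if !range_is_available(bytes, start, i, available) {
                return false;
            }
            start = i + 1;
        }
        i += 1;
    }
    true
}

// Features some hypothetical token type vouches for.
const TOKEN_FEATURES: &[&str] = &["sse", "sse2"];

// Evaluated at compile time: if the check failed, the build would fail.
const _: () = assert!(features_are_covered("sse,sse2", TOKEN_FEATURES));

fn main() {
    assert!(features_are_covered("sse", TOKEN_FEATURES));
    assert!(!features_are_covered("sse,avx2", TOKEN_FEATURES));
}
```

The property that matters for soundness is that the check can only fail closed: any name it does not recognise as covered must cause a rejection.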

DJMcNab commented Oct 15, 2025

The "glamour shot" of this PR is that given:

use core::arch::x86_64::{__m128, _mm_mul_ps};

#[target_feature(enable = "sse")]
fn sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let a: __m128 = bytemuck::must_cast(a);
    let b: __m128 = bytemuck::must_cast(b);
    bytemuck::must_cast(_mm_mul_ps(a, b))
}

You can run:

let Some(sse) = x86::v1::Sse::try_new() else {
    panic!("Example code")
};
let a = [10_f32, 20_f32, 30_f32, 40_f32];
let b = [4_f32, 5_f32, 6_f32, 7_f32];
// Both of these example expansions, the former using the shorthand form:
let res =
    trampoline!(Sse = sse => "sse", sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4]);
assert_eq!(res, [40_f32, 100_f32, 180_f32, 280_f32]);

This uses Rust's SIMD intrinsics entirely safely and soundly.


To help guide review, the core contribution of this PR is a way to talk about target features in the type system. This is implemented through this trait:

/// Token that a set of target features is available.
///
/// Note that this trait is only meaningful when there are values of this type.
/// That is, to enable the target features in `FEATURES`, you *must* have a value
/// of this type.
///
/// Values which implement this trait are used in the second argument to [`trampoline!`],
/// which is a safe abstraction over enabling target features.
///
/// # Safety
///
/// To construct a value of a type implementing this trait, you must have proven that each
/// target feature in `FEATURES` is available.
pub unsafe trait TargetFeatureToken: Copy {
    /// The set of target features which the current CPU has, if
    /// you have a value of this type.
    const FEATURES: &[&str];
    /// Enable the target features in `FEATURES` for a single run of `f`, and run it.
    ///
    /// `f` must be marked `#[inline(always)]` for this to work.
    ///
    /// Note that this does *not* enable the target features on the Rust side (e.g. for calling).
    /// To do so, you should instead use [`trampoline!`] directly - this is a convenience wrapper around `trampoline`
    /// for cases where the dispatch of simd values is handled elsewhere.
    fn vectorize<R>(self, f: impl FnOnce() -> R) -> R;
}
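
As a rough illustration of how a token type could uphold a contract of this shape, the sketch below uses a simplified stand-in trait. `FeatureToken`, `SseToken` and `feature_names` are hypothetical names, the stand-in omits `vectorize`, and the generated structs in the PR may well look different.

```rust
mod token {
    /// Simplified stand-in for the trait above (hypothetical; omits `vectorize`).
    ///
    /// # Safety
    ///
    /// A value of an implementing type may only exist if every feature in
    /// `FEATURES` is available on the current CPU.
    pub unsafe trait FeatureToken: Copy {
        const FEATURES: &'static [&'static str];
    }

    /// Zero-sized proof that SSE is available. The private field means code
    /// outside this module can only obtain a value through `try_new`.
    #[derive(Clone, Copy, Debug)]
    pub struct SseToken {
        _private: (),
    }

    impl SseToken {
        /// Returns a token only if runtime detection reports SSE support.
        pub fn try_new() -> Option<Self> {
            #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
            {
                if is_x86_feature_detected!("sse") {
                    return Some(Self { _private: () });
                }
            }
            None
        }
    }

    // SAFETY: the only constructor first checks that "sse" is detected.
    unsafe impl FeatureToken for SseToken {
        const FEATURES: &'static [&'static str] = &["sse"];
    }
}

use token::{FeatureToken, SseToken};

/// Needs a *value* of the token, not just the type, to hand out the feature list.
fn feature_names<T: FeatureToken>(_token: T) -> &'static [&'static str] {
    T::FEATURES
}

fn main() {
    // The token carries no data; it exists purely as evidence.
    assert_eq!(std::mem::size_of::<SseToken>(), 0);
    match SseToken::try_new() {
        Some(token) => println!("token proves: {:?}", feature_names(token)),
        None => println!("SSE is not available here"),
    }
}
```

The point of the private field is that the trait's safety requirement is discharged once, in the constructor, rather than at every use.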

Implementing TargetFeatureToken indicates that values of a type are tokens representing one or more target features being enabled. One or more of these tokens can be passed to the new trampoline! macro to safely run code in a #[target_feature(enable = "...")] context. The macro validates the user-provided target feature string at compile time, checking that the provided tokens justify calling that function. An example of these being used is:

let a = [10_f32, 20_f32, 30_f32, 40_f32];
let b = [4_f32, 5_f32, 6_f32, 7_f32];
// Both of these example expansions, the former using the shorthand form:
let res =
    trampoline!(Sse = sse => "sse", sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4]);
assert_eq!(res, [40_f32, 100_f32, 180_f32, 280_f32]);

In this example, the call to the SSE multiplication function is proven to be safe at compile time, and is then executed.
The contents of fearless_simd_core/lib.rs are the core contribution of this PR, plus the infrastructure code in trampoline.rs which makes it work.
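
For comparison, this is roughly what a call site has to write by hand without such a macro. It is a sketch under assumptions, not the macro's actual expansion: `SseToken`, `mul_with_token` and `mul_f32s` are hypothetical, the target-feature function is declared `unsafe` so the example also builds on older toolchains, and a scalar fallback keeps it portable.

```rust
#[cfg(target_arch = "x86_64")]
mod x86 {
    use core::arch::x86_64::{_mm_loadu_ps, _mm_mul_ps, _mm_storeu_ps};

    /// Hypothetical zero-sized token: holding one proves SSE was detected.
    #[derive(Clone, Copy)]
    pub struct SseToken {
        _private: (),
    }

    impl SseToken {
        pub fn try_new() -> Option<Self> {
            if is_x86_feature_detected!("sse") {
                Some(Self { _private: () })
            } else {
                None
            }
        }
    }

    #[target_feature(enable = "sse")]
    unsafe fn sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        let mut out = [0.0_f32; 4];
        // SAFETY: the pointers come from live 4-element arrays, and the
        // unaligned load/store intrinsics have no alignment requirement.
        unsafe {
            let product = _mm_mul_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
            _mm_storeu_ps(out.as_mut_ptr(), product);
        }
        out
    }

    /// Hand-written safe wrapper: the token argument is the justification.
    pub fn mul_with_token(_proof: SseToken, a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        // SAFETY: an `SseToken` only exists if "sse" was detected, which is
        // the only feature `sse_mul_f32s` enables.
        unsafe { sse_mul_f32s(a, b) }
    }
}

/// Portable entry point: SIMD path when a token is available, scalar otherwise.
fn mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = x86::SseToken::try_new() {
            return x86::mul_with_token(token, a, b);
        }
    }
    [a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]]
}

fn main() {
    let res = mul_f32s([10.0, 20.0, 30.0, 40.0], [4.0, 5.0, 6.0, 7.0]);
    assert_eq!(res, [40.0, 100.0, 180.0, 280.0]);
}
```

In the hand-written version, nothing ties the safety comment in `mul_with_token` to the feature string on `sse_mul_f32s`; keeping the two in sync is the human's job. Checking that correspondence mechanically is what the macro's validation step is for.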

Separately, in this PR, we have the functionality for properly using this on the x86_64 (and also plain x86) architectures. This is the contents of the x86 folder. This involves:

  • A token struct for each target feature which Rust supports, with the trivially correct safety checks for constructing them.
  • A struct for each of x86-64-v{1,2,3,4}, which are the micro-architecture levels of x86. These levels are in v1/level.rs, etc.
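
To make the idea of levels concrete, here is a sketch of detecting the highest level at runtime using only the standard library. It is not code from this PR: `X86Level` and `detect_level` are hypothetical names, and each level is checked against only a subset of the features that the x86-64 psABI assigns to it, so a real implementation would need the complete lists.

```rust
/// Hypothetical enum of x86-64 micro-architecture levels, lowest first, so the
/// derived ordering matches "supports at least this level".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum X86Level {
    V1,
    V2,
    V3,
    V4,
}

/// Highest level whose (partial) feature list is detected on this CPU.
#[cfg(target_arch = "x86_64")]
pub fn detect_level() -> Option<X86Level> {
    // Partial list for x86-64-v2.
    let v2 = is_x86_feature_detected!("sse3")
        && is_x86_feature_detected!("ssse3")
        && is_x86_feature_detected!("sse4.1")
        && is_x86_feature_detected!("sse4.2")
        && is_x86_feature_detected!("popcnt");
    // Partial list for x86-64-v3; each level includes the previous one.
    let v3 = v2
        && is_x86_feature_detected!("avx")
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("bmi1")
        && is_x86_feature_detected!("bmi2")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("lzcnt");
    // Partial list for x86-64-v4.
    let v4 = v3
        && is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512cd")
        && is_x86_feature_detected!("avx512dq")
        && is_x86_feature_detected!("avx512vl");

    Some(if v4 {
        X86Level::V4
    } else if v3 {
        X86Level::V3
    } else if v2 {
        X86Level::V2
    } else {
        // Every x86-64 CPU supports the baseline level.
        X86Level::V1
    })
}

/// On other architectures there is no x86-64 level to report.
#[cfg(not(target_arch = "x86_64"))]
pub fn detect_level() -> Option<X86Level> {
    None
}

fn main() {
    match detect_level() {
        Some(level) => println!("detected {:?}", level),
        None => println!("not an x86-64 target"),
    }
}
```

A level token in the style of this PR would play the same role as the `bool`s here, but as a value whose existence proves the whole feature set at once.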

Every file in that folder (except for the mod.rs files) is automatically generated by the binary crate of the fearless_simd_core/gen package (after rustfmt is run). As such, there isn't really any significant logic in those files.

@DJMcNab DJMcNab marked this pull request as ready for review October 16, 2025 14:27
@ajakubowicz-canva ajakubowicz-canva self-requested a review October 19, 2025 23:42
@taj-p taj-p self-requested a review October 22, 2025 20:07