[BACKEND] Improved swizzling when there is not enough vectorisation #8240

lezcano · 2025-09-19T20:29:34Z

In view of the corrected computation for bank conflicts discussed in
#8200, we use the fact that
anything that goes into vbasis within the same bank does not create
conflicts to expose asymmetric vectorisation whenever doing so would not
create additional bank conflicts.

The new heuristic avoids PRMTs whenever possible on one of the
directions by choosing registers within bank 0 that are already
contiguous in the register file.

I still need to benchmark and write comprehensive tests. Will do that on
Monday.

Jokeren · 2025-09-19T20:34:18Z

> The new heuristic avoids PRMTs whenever possible on one of the directions by choosing registers within bank 0

Can you add an MLIR test?

chatgpt-codex-connector · 2025-09-19T20:37:13Z

lib/Tools/GenericSwizzling.cpp

+  // Append the extra vectorisation bases
+  vbasis.append(vecSrc ? intersectAfterBank0(regSrc, vbasis, laneDstSet)
+                       : intersectAfterBank0(regDst, vbasis, laneSrcSet));


[P1] Prevent vector basis from exceeding 128‑bit limit

The new getVbasis caps the initial intersection at log2(128/bitwidth) bases but then unconditionally appends more entries from intersectAfterBank0. When the intersection already fills the 128‑bit budget (e.g. three bases for fp16 or two for fp32), this append can grow vbasis beyond maxVecBases, for example to five bases for fp16 or four for fp32, which corresponds to vector widths of 256–512 bits. Downstream lowering assumes loads/stores are at most 128 bits wide and uses vbasis.size() to pick instruction widths, so returning a longer basis will make the swizzling code attempt to emit vector instructions that do not exist. The previous implementation always truncated vbasis after filling it. Consider re-clamping vbasis after the append, or skipping the append once vbasis.size() has reached the maximum.
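The clamping suggested above could look roughly like the following sketch. It is not the code in GenericSwizzling.cpp: the helper names `maxVecBases` and `appendClamped`, and the use of `std::vector<int32_t>` to hold the basis, are assumptions made purely for illustration. The only facts taken from the review are the 128-bit budget and the log2(128/bitwidth) limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of basis vectors that fit in one 128-bit access.
// For power-of-two element widths this equals log2(128 / bitwidth).
inline std::size_t maxVecBases(unsigned bitwidth) {
  assert(bitwidth > 0 && bitwidth <= 128 && (bitwidth & (bitwidth - 1)) == 0);
  std::size_t n = 0;
  for (unsigned elems = 128 / bitwidth; elems > 1; elems >>= 1)
    ++n;
  return n;
}

// Hypothetical stand-in for the append step: extra bases are added only while
// the 128-bit budget allows, so vbasis never grows past maxVecBases(bitwidth).
inline void appendClamped(std::vector<int32_t> &vbasis,
                          const std::vector<int32_t> &extra,
                          unsigned bitwidth) {
  const std::size_t limit = maxVecBases(bitwidth);
  for (int32_t b : extra) {
    if (vbasis.size() >= limit)
      break;
    vbasis.push_back(b);
  }
}
```

With this shape, a basis that already holds three fp16 entries is left unchanged by the append, while a basis with spare budget is filled up to the limit and no further.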

lezcano · 2025-09-19T20:59:52Z

An MLIR test would be difficult, as the PRMTs are created at the PTX level. I plan to add plenty of tests with layouts that exercise this path and check that the number of PRMTs, as computed from the shared memory layout, decreases relative to the previous implementation.

Jokeren · 2025-09-19T21:02:30Z

Maybe we can use Gluon and check the SASS codegen.

lezcano · 2025-09-19T21:47:31Z

I could 100% write those tests in Gluon, good point.
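One way such a check could be phrased is to count byte-permute instructions in the generated assembly before and after the change. The sketch below is only an illustration of that idea: `countPrmt` is a hypothetical helper, not part of Triton or Gluon, and it assumes the assembly is available as plain text in which the mnemonic appears as `prmt` (PTX) or `PRMT` (SASS).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical test helper: count byte-permute instructions in an assembly
// dump. The mnemonic is matched case-insensitively and only when it is not
// preceded by an identifier character, so names that merely end in "prmt"
// are not counted.
inline std::size_t countPrmt(std::string text) {
  std::transform(text.begin(), text.end(), text.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  std::size_t count = 0;
  for (std::size_t pos = text.find("prmt"); pos != std::string::npos;
       pos = text.find("prmt", pos + 4)) {
    const bool startsToken =
        pos == 0 ||
        !(std::isalnum(static_cast<unsigned char>(text[pos - 1])) ||
          text[pos - 1] == '_');
    if (startsToken)
      ++count;
  }
  return count;
}
```

A test could then assert that the count for a given layout does not increase from the old heuristic to the new one.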
