[RISCV] add load/store misched/PostRA subtarget features #149409

Open · wants to merge 3 commits into main
Changes from 2 commits

34 changes: 24 additions & 10 deletions llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -94,14 +94,24 @@ static cl::opt<bool>
 cl::desc("Enable the loop data prefetch pass"),
 cl::init(true));

-static cl::opt<bool> EnableMISchedLoadStoreClustering(
-"riscv-misched-load-store-clustering", cl::Hidden,
-cl::desc("Enable load and store clustering in the machine scheduler"),
+static cl::opt<bool> EnableMISchedLoadClustering(

Collaborator

Should these come from the subtarget instead of the command line?

Contributor

Did I miss something? We don't have subtarget features for these?

Collaborator

That was my point. Should we add subtarget features instead? The PR description suggests that whether this is profitable is CPU-specific.

Contributor

Yeah, makes sense to me.

Author

By subtarget feature, are we talking about a feature like FeatureUnalignedScalarMem, or a tuning feature like TunePostRAScheduler (I took both examples from sifive_p670)? It seems to me that this would be more of a tuning feature, like TunePostRAScheduler.

Also, do we want to keep the default as is (all load/store clustering enabled in both stages) and add features to disable it, or the other way around? Having everything disabled by default and adding subtarget features to enable what you want is cleaner, but we'll change behavior for everyone who doesn't add the new features to their target definitions. Perhaps this is a good thing (we'll force people to make a decision instead of relying on defaults)...

Author

I just checked how TunePostRAScheduler is implemented: it has both a subtarget feature and a flag (post-RA-scheduler) that can be used to override the target definition. I like this design because it gives the flexibility of the command line (for debugging and the like) while also giving each processor a convenient way to set its clustering preferences.

If no one objects I'll go in this direction, but I guess we'll want to change the default to 'false' and let each processor define the clustering it wants.

Author

Changing the default load/store clustering settings breaks 290+ tests in check-llvm, which shows that a LOT of code and logic is built on top of the existing default. I don't want to stir up that viper's nest in this work, so I'll keep the default as is. This means the subtarget features will disable the default settings instead of enabling them (TuneNoDefaultUnroll would be an example of this).

Also, after adding both the command-line options and the subtarget features, I realized I might be overcomplicating things, given that we can override the subtarget features via "mattr=-" anyway, so I'll change the code to use just the subtarget features. I'll send a new version shortly.

+"riscv-misched-load-clustering", cl::Hidden,
+cl::desc("Enable load clustering in the machine scheduler"),
 cl::init(true));

-static cl::opt<bool> EnablePostMISchedLoadStoreClustering(
-"riscv-postmisched-load-store-clustering", cl::Hidden,
-cl::desc("Enable PostRA load and store clustering in the machine scheduler"),
+static cl::opt<bool> EnableMISchedStoreClustering(
+"riscv-misched-store-clustering", cl::Hidden,
+cl::desc("Enable store clustering in the machine scheduler"),
 cl::init(true));
+
+static cl::opt<bool> EnablePostMISchedLoadClustering(
+"riscv-postmisched-load-clustering", cl::Hidden,
+cl::desc("Enable PostRA load clustering in the machine scheduler"),
+cl::init(true));
+
+static cl::opt<bool> EnablePostMISchedStoreClustering(
+"riscv-postmisched-store-clustering", cl::Hidden,
+cl::desc("Enable PostRA store clustering in the machine scheduler"),
+cl::init(true));

 static cl::opt<bool>
@@ -300,12 +310,14 @@ bool RISCVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
 ScheduleDAGInstrs *
 RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
 ScheduleDAGMILive *DAG = createSchedLive(C);
-if (EnableMISchedLoadStoreClustering) {
+
+if (EnableMISchedLoadClustering)
 DAG->addMutation(createLoadClusterDAGMutation(
 DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+
+if (EnableMISchedStoreClustering)
 DAG->addMutation(createStoreClusterDAGMutation(
 DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
-}

 const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
 if (!DisableVectorMaskMutation && ST.hasVInstructions())
@@ -317,12 +329,14 @@ RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
 ScheduleDAGInstrs *
 RISCVTargetMachine::createPostMachineScheduler(MachineSchedContext *C) const {
 ScheduleDAGMI *DAG = createSchedPostRA(C);
-if (EnablePostMISchedLoadStoreClustering) {
+
+if (EnablePostMISchedLoadClustering)
 DAG->addMutation(createLoadClusterDAGMutation(
 DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+
+if (EnablePostMISchedStoreClustering)
 DAG->addMutation(createStoreClusterDAGMutation(
 DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
-}

 return DAG;
 }
41 changes: 39 additions & 2 deletions llvm/test/CodeGen/RISCV/misched-load-clustering.ll
Member

Could you add new tests checking the situation where only one of the flags is enabled?

Author

Sure, I'll add two new tests that disable load and store clustering, respectively.

@@ -1,16 +1,37 @@
 ; REQUIRES: asserts
-; RUN: llc -mtriple=riscv32 -verify-misched -riscv-misched-load-store-clustering=false \
+; RUN: llc -mtriple=riscv32 -verify-misched -riscv-misched-load-clustering=false \
+; RUN: -riscv-misched-store-clustering=false \
 ; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
 ; RUN: | FileCheck -check-prefix=NOCLUSTER %s
-; RUN: llc -mtriple=riscv64 -verify-misched -riscv-misched-load-store-clustering=false \
+; RUN: llc -mtriple=riscv64 -verify-misched -riscv-misched-load-clustering=false \
+; RUN: -riscv-misched-store-clustering=false \
 ; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
 ; RUN: | FileCheck -check-prefix=NOCLUSTER %s
+;
+; RUN: llc -mtriple=riscv32 -verify-misched \
+; RUN: -riscv-misched-load-clustering=false \
+; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
+; RUN: | FileCheck -check-prefix=STCLUSTER %s
+; RUN: llc -mtriple=riscv64 -verify-misched \
+; RUN: -riscv-misched-load-clustering=false \
+; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
+; RUN: | FileCheck -check-prefix=STCLUSTER %s
+;
 ; RUN: llc -mtriple=riscv32 -verify-misched \
+; RUN: -riscv-misched-store-clustering=false \
 ; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
 ; RUN: | FileCheck -check-prefix=LDCLUSTER %s
 ; RUN: llc -mtriple=riscv64 -verify-misched \
+; RUN: -riscv-misched-store-clustering=false \
 ; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
 ; RUN: | FileCheck -check-prefix=LDCLUSTER %s
+;
+; RUN: llc -mtriple=riscv32 -verify-misched \
+; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
+; RUN: | FileCheck -check-prefix=DEFAULTCLUSTER %s
+; RUN: llc -mtriple=riscv64 -verify-misched \
+; RUN: -debug-only=machine-scheduler -o - 2>&1 < %s \
+; RUN: | FileCheck -check-prefix=DEFAULTCLUSTER %s


 define i32 @load_clustering_1(ptr nocapture %p) {
@@ -22,13 +43,29 @@ define i32 @load_clustering_1(ptr nocapture %p) {
 ; NOCLUSTER: SU(4): %4:gpr = LW %0:gpr, 4
 ; NOCLUSTER: SU(5): %6:gpr = LW %0:gpr, 16
 ;
+; STCLUSTER: ********** MI Scheduling **********
+; STCLUSTER-LABEL: load_clustering_1:%bb.0
+; STCLUSTER: *** Final schedule for %bb.0 ***
+; STCLUSTER: SU(1): %1:gpr = LW %0:gpr, 12
+; STCLUSTER: SU(2): %2:gpr = LW %0:gpr, 8
+; STCLUSTER: SU(4): %4:gpr = LW %0:gpr, 4
+; STCLUSTER: SU(5): %6:gpr = LW %0:gpr, 16
+;
 ; LDCLUSTER: ********** MI Scheduling **********
 ; LDCLUSTER-LABEL: load_clustering_1:%bb.0
 ; LDCLUSTER: *** Final schedule for %bb.0 ***
 ; LDCLUSTER: SU(4): %4:gpr = LW %0:gpr, 4
 ; LDCLUSTER: SU(2): %2:gpr = LW %0:gpr, 8
 ; LDCLUSTER: SU(1): %1:gpr = LW %0:gpr, 12
 ; LDCLUSTER: SU(5): %6:gpr = LW %0:gpr, 16
+;
+; DEFAULTCLUSTER: ********** MI Scheduling **********
+; DEFAULTCLUSTER-LABEL: load_clustering_1:%bb.0
+; DEFAULTCLUSTER: *** Final schedule for %bb.0 ***
+; DEFAULTCLUSTER: SU(4): %4:gpr = LW %0:gpr, 4
+; DEFAULTCLUSTER: SU(2): %2:gpr = LW %0:gpr, 8
+; DEFAULTCLUSTER: SU(1): %1:gpr = LW %0:gpr, 12
+; DEFAULTCLUSTER: SU(5): %6:gpr = LW %0:gpr, 16
 entry:
 %arrayidx0 = getelementptr inbounds i32, ptr %p, i32 3
 %val0 = load i32, ptr %arrayidx0
6 changes: 4 additions & 2 deletions llvm/test/CodeGen/RISCV/misched-mem-clustering.mir
@@ -1,10 +1,12 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
 # RUN: llc -mtriple=riscv64 -x mir -mcpu=sifive-p470 -verify-misched -enable-post-misched=false \
-# RUN: -riscv-postmisched-load-store-clustering=false -debug-only=machine-scheduler \
+# RUN: -riscv-postmisched-load-clustering=false \
+# RUN: -riscv-postmisched-store-clustering=false -debug-only=machine-scheduler \
 # RUN: -start-before=machine-scheduler -stop-after=postmisched -misched-regpressure=false -o - 2>&1 < %s \
 # RUN: | FileCheck -check-prefix=NOPOSTMISCHED %s
 # RUN: llc -mtriple=riscv64 -x mir -mcpu=sifive-p470 -mattr=+use-postra-scheduler -verify-misched -enable-post-misched=true \
-# RUN: -riscv-postmisched-load-store-clustering=false -debug-only=machine-scheduler \
+# RUN: -riscv-postmisched-load-clustering=false \
+# RUN: -riscv-postmisched-store-clustering=false -debug-only=machine-scheduler \
 # RUN: -start-before=machine-scheduler -stop-after=postmisched -misched-regpressure=false -o - 2>&1 < %s \
 # RUN: | FileCheck -check-prefix=NOCLUSTER %s
 # RUN: llc -mtriple=riscv64 -x mir -mcpu=sifive-p470 -mattr=+use-postra-scheduler -verify-misched -enable-post-misched=true \