-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[Flang][OpenMP] Enable no-loop kernels #155818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
efdd979
564410d
8efb5e0
91e5e58
b34a653
6176c95
d2e88db
b32ac37
e8a23ce
52ebd55
78165d7
31f87cd
a15711d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2106,6 +2106,29 @@ Operation *TargetOp::getInnermostCapturedOmpOp() { | |||||
}); | ||||||
} | ||||||
|
||||||
/// Check if we can promote SPMD kernel to No-Loop kernel | ||||||
|
/// Check if we can promote SPMD kernel to No-Loop kernel | |
/// Check if we can promote SPMD kernel to No-Loop kernel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// num_teams clause can break no-loop teams/threads assumption | |
// num_teams clause can break no-loop teams/threads assumption. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// reduction kernels are slower in no-loop mode | |
// Reduction kernels are slower in no-loop mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// check if the user allows the promotion of kernels to no-loop mode | |
// Check if the user allows the promotion of kernels to no-loop mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2590,13 +2590,27 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder, | |
} | ||
|
||
builder.SetInsertPoint(*regionBlock, (*regionBlock)->begin()); | ||
|
||
bool noLoopMode = false; | ||
omp::TargetOp targetOp = wsloopOp->getParentOfType<mlir::omp::TargetOp>(); | ||
if (targetOp) { | ||
Operation *targetCapturedOp = targetOp.getInnermostCapturedOmpOp(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we check here that the captured op is the !$omp target teams distribute parallel do
do i=1,10
!$omp do
do j=1,i
end do
end do I'm not actually sure about what is the expected behavior of this, but I imagine that no-loop would just refer to the outer loop, as it's the one for which the trip count can be evaluated in the host. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi Sergio,
Yes. There is assumption in
There are 2 issues:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for the clarifications, Dominik.
What I mean is adding something like this: Operation *targetCapturedOp = targetOp.getInnermostCapturedOmpOp();
if (*loopOp == targetCapturedOp) {
omp::TargetRegionFlags kernelFlags = targetOp.getKernelExecFlags(targetCapturedOp);
...
} We need that because, if not, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed in commit: d2e88db |
||
omp::TargetRegionFlags kernelFlags = | ||
targetOp.getKernelExecFlags(targetCapturedOp); | ||
if (omp::bitEnumContainsAll(kernelFlags, | ||
omp::TargetRegionFlags::spmd | | ||
omp::TargetRegionFlags::no_loop) && | ||
!omp::bitEnumContainsAny(kernelFlags, omp::TargetRegionFlags::generic)) | ||
noLoopMode = true; | ||
} | ||
|
||
llvm::OpenMPIRBuilder::InsertPointOrErrorTy wsloopIP = | ||
ompBuilder->applyWorkshareLoop( | ||
ompLoc.DL, loopInfo, allocaIP, loopNeedsBarrier, | ||
convertToScheduleKind(schedule), chunk, isSimd, | ||
scheduleMod == omp::ScheduleModifier::monotonic, | ||
scheduleMod == omp::ScheduleModifier::nonmonotonic, isOrdered, | ||
workshareLoopType); | ||
workshareLoopType, noLoopMode); | ||
|
||
if (failed(handleError(wsloopIP, opInst))) | ||
return failure(); | ||
|
@@ -5365,6 +5379,12 @@ initTargetDefaultAttrs(omp::TargetOp targetOp, Operation *capturedOp, | |
? llvm::omp::OMP_TGT_EXEC_MODE_GENERIC_SPMD | ||
: llvm::omp::OMP_TGT_EXEC_MODE_GENERIC | ||
: llvm::omp::OMP_TGT_EXEC_MODE_SPMD; | ||
if (omp::bitEnumContainsAll(kernelFlags, | ||
omp::TargetRegionFlags::spmd | | ||
omp::TargetRegionFlags::no_loop) && | ||
!omp::bitEnumContainsAny(kernelFlags, omp::TargetRegionFlags::generic)) | ||
attrs.ExecFlags = llvm::omp::OMP_TGT_EXEC_MODE_SPMD_NO_LOOP; | ||
|
||
attrs.MinTeams = minTeamsVal; | ||
attrs.MaxTeams.front() = maxTeamsVal; | ||
attrs.MinThreads = 1; | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,43 @@ | ||||||
! Check if the first OpenMP GPU kernel is promoted to no-loop mode. | ||||||
|
||||||
! The second cannot be promoted due to the limit on the number of teams. | ||||||
! REQUIRES: flang, amdgpu | ||||||
|
! REQUIRES: flang, amdgpu | |
! REQUIRES: flang |
Shouldn't this still be correct with nvidia GPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be correct with Nvida GPU. I fixed the test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find these HelpTexts more confusing than helpful. Proposal:
(please adapt as you see fit)
The name "assume_threads/teams_oversubscription" seems disconnected from this function at first, but I think it comes from allowing oversubscription, where the "subscription" is the number of available teams/threads as determined by
thread_limit
,num_teams
,OMP_THREAD_LIMIT
, etc., and threfore allowing the optimzer to oversubscribe this amount. Maybe there is a way to make the description more connected to the flag name.Avoid "Flang" in the HelpText. It might also made available in Clang on day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I changed the description.