[MLIR][OpenMP] Add MLIR Lowering Support for dist_schedule #152736
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-openmp Author: Jack Styles (Stylie777) Changes
Some rework was required to ensure that MLIR/LLVM emits the correct schedule type for the `dist_schedule` clause, as it uses a different schedule type in the runtime library than other OpenMP directives/clauses. The patch also adds LLVM loop metadata and updates the implementation to support processing in a workshare loop. Patch is 36.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/152736.diff 9 Files Affected:
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td
index 79f25bb05f20e..4117e112367c6 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMP.td
+++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td
@@ -458,7 +458,8 @@ def OMP_SCHEDULE_Dynamic : EnumVal<"dynamic", 3, 1> {}
def OMP_SCHEDULE_Guided : EnumVal<"guided", 4, 1> {}
def OMP_SCHEDULE_Auto : EnumVal<"auto", 5, 1> {}
def OMP_SCHEDULE_Runtime : EnumVal<"runtime", 6, 1> {}
-def OMP_SCHEDULE_Default : EnumVal<"default", 7, 0> { let isDefault = 1; }
+def OMP_SCHEDULE_Distribute : EnumVal<"distribute", 7, 1> {}
+def OMP_SCHEDULE_Default : EnumVal<"default", 8, 0> { let isDefault = 1; }
def OMPC_Schedule : Clause<[Spelling<"schedule">]> {
let clangClass = "OMPScheduleClause";
let flangClass = "OmpScheduleClause";
@@ -469,6 +470,7 @@ def OMPC_Schedule : Clause<[Spelling<"schedule">]> {
OMP_SCHEDULE_Guided,
OMP_SCHEDULE_Auto,
OMP_SCHEDULE_Runtime,
+ OMP_SCHEDULE_Distribute,
OMP_SCHEDULE_Default
];
}
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index f70659120e1e6..395df392babde 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -1096,11 +1096,13 @@ class OpenMPIRBuilder {
/// \param NeedsBarrier Indicates whether a barrier must be inserted after
/// the loop.
/// \param LoopType Type of workshare loop.
+ /// \param HasDistSchedule Defines if the clause being lowered is dist_schedule as this is handled slightly differently
+ /// \param DistScheduleSchedType Defines the Schedule Type for the Distribute loop. Defaults to None if no Distribute loop is present.
///
/// \returns Point where to insert code after the workshare construct.
InsertPointOrErrorTy applyStaticWorkshareLoop(
DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP,
- omp::WorksharingLoopType LoopType, bool NeedsBarrier);
+ omp::WorksharingLoopType LoopType, bool NeedsBarrier, bool HasDistSchedule = false, omp::OMPScheduleType DistScheduleSchedType = omp::OMPScheduleType::None);
/// Modifies the canonical loop a statically-scheduled workshare loop with a
/// user-specified chunk size.
@@ -1113,13 +1115,20 @@ class OpenMPIRBuilder {
/// \param NeedsBarrier Indicates whether a barrier must be inserted after the
/// loop.
/// \param ChunkSize The user-specified chunk size.
+ /// \param SchedType Optional type of scheduling to be passed to the init function.
+ /// \param DistScheduleChunkSize The size of dist_shcedule chunk considered as a unit when
+ /// scheduling. If \p nullptr, defaults to 1.
+ /// \param DistScheduleSchedType Defines the Schedule Type for the Distribute loop. Defaults to None if no Distribute loop is present.
///
/// \returns Point where to insert code after the workshare construct.
InsertPointOrErrorTy applyStaticChunkedWorkshareLoop(DebugLoc DL,
CanonicalLoopInfo *CLI,
InsertPointTy AllocaIP,
bool NeedsBarrier,
- Value *ChunkSize);
+ Value *ChunkSize,
+ omp::OMPScheduleType SchedType = omp::OMPScheduleType::UnorderedStaticChunked,
+ Value *DistScheduleChunkSize = nullptr,
+ omp::OMPScheduleType DistScheduleSchedType = omp::OMPScheduleType::None);
/// Modifies the canonical loop to be a dynamically-scheduled workshare loop.
///
@@ -1139,6 +1148,8 @@ class OpenMPIRBuilder {
/// the loop.
/// \param Chunk The size of loop chunk considered as a unit when
/// scheduling. If \p nullptr, defaults to 1.
+ /// \param DistScheduleChunk The size of dist_shcedule chunk considered as a unit when
+ /// scheduling. If \p nullptr, defaults to 1.
///
/// \returns Point where to insert code after the workshare construct.
InsertPointOrErrorTy applyDynamicWorkshareLoop(DebugLoc DL,
@@ -1146,7 +1157,8 @@ class OpenMPIRBuilder {
InsertPointTy AllocaIP,
omp::OMPScheduleType SchedType,
bool NeedsBarrier,
- Value *Chunk = nullptr);
+ Value *Chunk = nullptr,
+ Value *DistScheduleChunk = nullptr);
/// Create alternative version of the loop to support if clause
///
@@ -1197,6 +1209,9 @@ class OpenMPIRBuilder {
/// present.
/// \param LoopType Information about type of loop worksharing.
/// It corresponds to type of loop workshare OpenMP pragma.
+ /// \param HasDistSchedule Defines if the clause being lowered is dist_schedule as this is handled slightly differently
+ ///
+ /// \param ChunkSize The chunk size for dist_schedule loop
///
/// \returns Point where to insert code after the workshare construct.
LLVM_ABI InsertPointOrErrorTy applyWorkshareLoop(
@@ -1207,7 +1222,9 @@ class OpenMPIRBuilder {
bool HasMonotonicModifier = false, bool HasNonmonotonicModifier = false,
bool HasOrderedClause = false,
omp::WorksharingLoopType LoopType =
- omp::WorksharingLoopType::ForStaticLoop);
+ omp::WorksharingLoopType::ForStaticLoop,
+ bool HasDistSchedule = false,
+ Value* DistScheduleChunkSize = nullptr);
/// Tile a loop nest.
///
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index ea027e48fa2f1..18da0d772912f 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -136,6 +136,8 @@ static bool isValidWorkshareLoopScheduleType(OMPScheduleType SchedType) {
case OMPScheduleType::NomergeOrderedRuntime:
case OMPScheduleType::NomergeOrderedAuto:
case OMPScheduleType::NomergeOrderedTrapezoidal:
+ case OMPScheduleType::OrderedDistributeChunked:
+ case OMPScheduleType::OrderedDistribute:
break;
default:
return false;
@@ -170,7 +172,7 @@ static const omp::GV &getGridValue(const Triple &T, Function *Kernel) {
/// arguments.
static OMPScheduleType
getOpenMPBaseScheduleType(llvm::omp::ScheduleKind ClauseKind, bool HasChunks,
- bool HasSimdModifier) {
+ bool HasSimdModifier, bool HasDistScheduleChunks) {
// Currently, the default schedule it static.
switch (ClauseKind) {
case OMP_SCHEDULE_Default:
@@ -187,6 +189,9 @@ getOpenMPBaseScheduleType(llvm::omp::ScheduleKind ClauseKind, bool HasChunks,
case OMP_SCHEDULE_Runtime:
return HasSimdModifier ? OMPScheduleType::BaseRuntimeSimd
: OMPScheduleType::BaseRuntime;
+ case OMP_SCHEDULE_Distribute:
+ return HasDistScheduleChunks ? OMPScheduleType::BaseDistributeChunked
+ : OMPScheduleType::BaseDistribute;
}
llvm_unreachable("unhandled schedule clause argument");
}
@@ -255,9 +260,10 @@ getOpenMPMonotonicityScheduleType(OMPScheduleType ScheduleType,
static OMPScheduleType
computeOpenMPScheduleType(ScheduleKind ClauseKind, bool HasChunks,
bool HasSimdModifier, bool HasMonotonicModifier,
- bool HasNonmonotonicModifier, bool HasOrderedClause) {
- OMPScheduleType BaseSchedule =
- getOpenMPBaseScheduleType(ClauseKind, HasChunks, HasSimdModifier);
+ bool HasNonmonotonicModifier, bool HasOrderedClause,
+ bool HasDistScheduleChunks) {
+ OMPScheduleType BaseSchedule = getOpenMPBaseScheduleType(
+ ClauseKind, HasChunks, HasSimdModifier, HasDistScheduleChunks);
OMPScheduleType OrderedSchedule =
getOpenMPOrderingScheduleType(BaseSchedule, HasOrderedClause);
OMPScheduleType Result = getOpenMPMonotonicityScheduleType(
@@ -4637,7 +4643,8 @@ static FunctionCallee getKmpcForStaticInitForType(Type *Ty, Module &M,
OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop(
DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP,
- WorksharingLoopType LoopType, bool NeedsBarrier) {
+ WorksharingLoopType LoopType, bool NeedsBarrier, bool HasDistSchedule,
+ OMPScheduleType DistScheduleSchedType) {
assert(CLI->isValid() && "Requires a valid canonical loop");
assert(!isConflictIP(AllocaIP, CLI->getPreheaderIP()) &&
"Require dedicated allocate IP");
@@ -4693,15 +4700,26 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop(
// Call the "init" function and update the trip count of the loop with the
// value it produced.
- SmallVector<Value *, 10> Args(
- {SrcLoc, ThreadNum, SchedulingType, PLastIter, PLowerBound, PUpperBound});
- if (LoopType == WorksharingLoopType::DistributeForStaticLoop) {
- Value *PDistUpperBound =
- Builder.CreateAlloca(IVTy, nullptr, "p.distupperbound");
- Args.push_back(PDistUpperBound);
+ auto BuildInitCall = [LoopType, SrcLoc, ThreadNum, PLastIter, PLowerBound,
+ PUpperBound, IVTy, PStride, One, Zero,
+ StaticInit](Value *SchedulingType, auto &Builder) {
+ SmallVector<Value *, 10> Args({SrcLoc, ThreadNum, SchedulingType, PLastIter,
+ PLowerBound, PUpperBound});
+ if (LoopType == WorksharingLoopType::DistributeForStaticLoop) {
+ Value *PDistUpperBound =
+ Builder.CreateAlloca(IVTy, nullptr, "p.distupperbound");
+ Args.push_back(PDistUpperBound);
+ }
+ Args.append({PStride, One, Zero});
+ Builder.CreateCall(StaticInit, Args);
+ };
+ BuildInitCall(SchedulingType, Builder);
+ if (HasDistSchedule &&
+ LoopType != WorksharingLoopType::DistributeStaticLoop) {
+ Constant *DistScheduleSchedType = ConstantInt::get(
+ I32Type, static_cast<int>(omp::OMPScheduleType::OrderedDistribute));
+ BuildInitCall(DistScheduleSchedType, Builder);
}
- Args.append({PStride, One, Zero});
- Builder.CreateCall(StaticInit, Args);
Value *LowerBound = Builder.CreateLoad(IVTy, PLowerBound);
Value *InclusiveUpperBound = Builder.CreateLoad(IVTy, PUpperBound);
Value *TripCountMinusOne = Builder.CreateSub(InclusiveUpperBound, LowerBound);
@@ -4740,14 +4758,42 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyStaticWorkshareLoop(
return AfterIP;
}
+static void addAccessGroupMetadata(BasicBlock *Block, MDNode *AccessGroup,
+ LoopInfo &LI);
+static void addLoopMetadata(CanonicalLoopInfo *Loop,
+ArrayRef<Metadata *> Properties);
+
+static void applyParallelAccessesMetadata(CanonicalLoopInfo *CLI, LLVMContext &Ctx, Loop *Loop, LoopInfo &LoopInfo, SmallVector<Metadata *> &LoopMDList) {
+ SmallSet<BasicBlock *, 8> Reachable;
+
+ // Get the basic blocks from the loop in which memref instructions
+ // can be found.
+ // TODO: Generalize getting all blocks inside a CanonicalizeLoopInfo,
+ // preferably without running any passes.
+ for (BasicBlock *Block : Loop->getBlocks()) {
+ if (Block == CLI->getCond() ||
+ Block == CLI->getHeader())
+ continue;
+ Reachable.insert(Block);
+ }
+
+ // Add access group metadata to memory-access instructions.
+ MDNode *AccessGroup = MDNode::getDistinct(Ctx, {});
+ for (BasicBlock *BB : Reachable)
+ addAccessGroupMetadata(BB, AccessGroup, LoopInfo);
+ // TODO: If the loop has existing parallel access metadata, have
+ // to combine two lists.
+ LoopMDList.push_back(MDNode::get(
+ Ctx, {MDString::get(Ctx, "llvm.loop.parallel_accesses"), AccessGroup}));
+}
+
OpenMPIRBuilder::InsertPointOrErrorTy
-OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
- CanonicalLoopInfo *CLI,
- InsertPointTy AllocaIP,
- bool NeedsBarrier,
- Value *ChunkSize) {
+OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(
+ DebugLoc DL, CanonicalLoopInfo *CLI, InsertPointTy AllocaIP,
+ bool NeedsBarrier, Value *ChunkSize, OMPScheduleType SchedType,
+ Value *DistScheduleChunkSize, OMPScheduleType DistScheduleSchedType) {
assert(CLI->isValid() && "Requires a valid canonical loop");
- assert(ChunkSize && "Chunk size is required");
+ assert(ChunkSize || DistScheduleChunkSize && "Chunk size is required");
LLVMContext &Ctx = CLI->getFunction()->getContext();
Value *IV = CLI->getIndVar();
@@ -4761,6 +4807,18 @@ OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
Constant *Zero = ConstantInt::get(InternalIVTy, 0);
Constant *One = ConstantInt::get(InternalIVTy, 1);
+ Function *F = CLI->getFunction();
+ FunctionAnalysisManager FAM;
+ FAM.registerPass([]() { return DominatorTreeAnalysis(); });
+ FAM.registerPass([]() { return PassInstrumentationAnalysis(); });
+ LoopAnalysis LIA;
+ LoopInfo &&LI = LIA.run(*F, FAM);
+ Loop *L = LI.getLoopFor(CLI->getHeader());
+ SmallVector<Metadata *> LoopMDList;
+ if (ChunkSize || DistScheduleChunkSize)
+ applyParallelAccessesMetadata(CLI, Ctx, L, LI, LoopMDList);
+ addLoopMetadata(CLI, LoopMDList);
+
// Declare useful OpenMP runtime functions.
FunctionCallee StaticInit =
getKmpcForStaticInitForType(InternalIVTy, M, *this);
@@ -4783,13 +4841,18 @@ OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
Builder.SetCurrentDebugLocation(DL);
// TODO: Detect overflow in ubsan or max-out with current tripcount.
- Value *CastedChunkSize =
- Builder.CreateZExtOrTrunc(ChunkSize, InternalIVTy, "chunksize");
+ Value *CastedChunkSize = Builder.CreateZExtOrTrunc(
+ ChunkSize ? ChunkSize : Zero, InternalIVTy, "chunksize");
+ Value *CastestDistScheduleChunkSize = Builder.CreateZExtOrTrunc(
+ DistScheduleChunkSize ? DistScheduleChunkSize : Zero, InternalIVTy,
+ "distschedulechunksize");
Value *CastedTripCount =
Builder.CreateZExt(OrigTripCount, InternalIVTy, "tripcount");
- Constant *SchedulingType = ConstantInt::get(
- I32Type, static_cast<int>(OMPScheduleType::UnorderedStaticChunked));
+ Constant *SchedulingType =
+ ConstantInt::get(I32Type, static_cast<int>(SchedType));
+ Constant *DistSchedulingType =
+ ConstantInt::get(I32Type, static_cast<int>(DistScheduleSchedType));
Builder.CreateStore(Zero, PLowerBound);
Value *OrigUpperBound = Builder.CreateSub(CastedTripCount, One);
Builder.CreateStore(OrigUpperBound, PUpperBound);
@@ -4801,12 +4864,25 @@ OpenMPIRBuilder::applyStaticChunkedWorkshareLoop(DebugLoc DL,
Constant *SrcLocStr = getOrCreateSrcLocStr(DL, SrcLocStrSize);
Value *SrcLoc = getOrCreateIdent(SrcLocStr, SrcLocStrSize);
Value *ThreadNum = getOrCreateThreadID(SrcLoc);
- Builder.CreateCall(StaticInit,
- {/*loc=*/SrcLoc, /*global_tid=*/ThreadNum,
- /*schedtype=*/SchedulingType, /*plastiter=*/PLastIter,
- /*plower=*/PLowerBound, /*pupper=*/PUpperBound,
- /*pstride=*/PStride, /*incr=*/One,
- /*chunk=*/CastedChunkSize});
+ auto BuildInitCall =
+ [StaticInit, SrcLoc, ThreadNum, PLastIter, PLowerBound, PUpperBound,
+ PStride, One](Value *SchedulingType, Value *ChunkSize, auto &Builder) {
+ Builder.CreateCall(
+ StaticInit, {/*loc=*/SrcLoc, /*global_tid=*/ThreadNum,
+ /*schedtype=*/SchedulingType, /*plastiter=*/PLastIter,
+ /*plower=*/PLowerBound, /*pupper=*/PUpperBound,
+ /*pstride=*/PStride, /*incr=*/One,
+ /*chunk=*/ChunkSize});
+ };
+ BuildInitCall(SchedulingType, CastedChunkSize, Builder);
+ if (DistScheduleSchedType != OMPScheduleType::None &&
+ SchedType != OMPScheduleType::OrderedDistributeChunked &&
+ SchedType != OMPScheduleType::OrderedDistribute) {
+ // We want to emit a second init function call for the dist_schedule clause
+ // to the Distribute construct. This should only be done however if a
+ // Workshare Loop is nested within a Distribute Construct
+ BuildInitCall(DistSchedulingType, CastestDistScheduleChunkSize, Builder);
+ }
// Load values written by the "init" function.
Value *FirstChunkStart =
@@ -5130,31 +5206,47 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::applyWorkshareLoop(
bool NeedsBarrier, omp::ScheduleKind SchedKind, Value *ChunkSize,
bool HasSimdModifier, bool HasMonotonicModifier,
bool HasNonmonotonicModifier, bool HasOrderedClause,
- WorksharingLoopType LoopType) {
+ WorksharingLoopType LoopType, bool HasDistSchedule,
+ Value *DistScheduleChunkSize) {
if (Config.isTargetDevice())
return applyWorkshareLoopTarget(DL, CLI, AllocaIP, LoopType);
OMPScheduleType EffectiveScheduleType = computeOpenMPScheduleType(
SchedKind, ChunkSize, HasSimdModifier, HasMonotonicModifier,
- HasNonmonotonicModifier, HasOrderedClause);
+ HasNonmonotonicModifier, HasOrderedClause, DistScheduleChunkSize);
bool IsOrdered = (EffectiveScheduleType & OMPScheduleType::ModifierOrdered) ==
OMPScheduleType::ModifierOrdered;
+ OMPScheduleType DistScheduleSchedType = OMPScheduleType::None;
+ if (HasDistSchedule) {
+ DistScheduleSchedType = DistScheduleChunkSize
+ ? OMPScheduleType::OrderedDistributeChunked
+ : OMPScheduleType::OrderedDistribute;
+ }
switch (EffectiveScheduleType & ~OMPScheduleType::ModifierMask) {
case OMPScheduleType::BaseStatic:
- assert(!ChunkSize && "No chunk size with static-chunked schedule");
- if (IsOrdered)
+ case OMPScheduleType::BaseDistribute:
+ assert(!ChunkSize || !DistScheduleChunkSize &&
+ "No chunk size with static-chunked schedule");
+ if (IsOrdered && !HasDistSchedule)
return applyDynamicWorkshareLoop(DL, CLI, AllocaIP, EffectiveScheduleType,
NeedsBarrier, ChunkSize);
// FIXME: Monotonicity ignored?
- return applyStaticWorkshareLoop(DL, CLI, AllocaIP, LoopType, NeedsBarrier);
+ if (DistScheduleChunkSize)
+ return applyStaticChunkedWorkshareLoop(
+ DL, CLI, AllocaIP, NeedsBarrier, ChunkSize, EffectiveScheduleType,
+ DistScheduleChunkSize, DistScheduleSchedType);
+ return applyStaticWorkshareLoop(DL, CLI, AllocaIP, LoopType, NeedsBarrier,
+ HasDistSchedule);
case OMPScheduleType::BaseStaticChunked:
- if (IsOrdered)
+ case OMPScheduleType::BaseDistributeChunked:
+ if (IsOrdered && !HasDistSchedule)
return applyDynamicWorkshareLoop(DL, CLI, AllocaIP, EffectiveScheduleType,
NeedsBarrier, ChunkSize);
// FIXME: Monotonicity ignored?
- return applyStaticChunkedWorkshareLoop(DL, CLI, AllocaIP, NeedsBarrier,
- ChunkSize);
+ return applyStaticChunkedWorkshareLoop(
+ DL, CLI, AllocaIP, NeedsBarrier, ChunkSize, EffectiveScheduleType,
+ DistScheduleChunkSize, DistScheduleSchedType);
case OMPScheduleType::BaseRuntime:
case OMPScheduleType::BaseAuto:
@@ -5230,7 +5322,8 @@ OpenMPIRBuilder::InsertPointOrErrorTy
OpenMPIRBuilder::applyDynamicWorkshareLoop(DebugLoc DL, CanonicalLoopInfo *CLI,
InsertPointTy AllocaIP,
OMPScheduleType SchedType,
- bool NeedsBarrier, Value *Chunk) {
+ bool NeedsBarrier, Value *Chunk,
+ Value *DistScheduleChunk) {
assert(CLI->isValid() && "Requires a valid canonical loop");
assert(!...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
`dist_schedule` was previously supported in Flang/Clang but was not implemented in MLIR; instead, a user would get a "not yet implemented" error. This patch adds support for the `dist_schedule` clause to be lowered to LLVM IR when used in an `omp.distribute` section. Support is also added for `dist_schedule` when the loop nest is embedded within a workshare loop. Some rework was required to ensure that MLIR/LLVM emits the correct schedule type for the clause, as it uses a different schedule type in the runtime library than other OpenMP directives/clauses. The patch also adds LLVM loop metadata and updates the implementation to support processing in a workshare loop.
464cd87 to 08ed236
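For reference, the kind of source-level construct whose lowering this enables looks like the sketch below. This is illustrative only and not a test from the patch; the function and variable names are made up, and only the `dist_schedule` clause itself is standard OpenMP syntax.

// Illustrative only: standard OpenMP dist_schedule usage. Iterations of the
// distributed loop are handed to the teams in the league in chunks of 4; that
// chunk value is what feeds the DistScheduleChunkSize path added by this patch.
void vadd(float *a, const float *b, const float *c, int n) {
#pragma omp teams distribute dist_schedule(static, 4)
  for (int i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}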
Please could you update flang/docs/OpenMPSupport.md to show dist_schedule is now supported.
@@ -458,7 +458,8 @@ def OMP_SCHEDULE_Dynamic : EnumVal<"dynamic", 3, 1> {}
def OMP_SCHEDULE_Guided : EnumVal<"guided", 4, 1> {}
def OMP_SCHEDULE_Auto : EnumVal<"auto", 5, 1> {}
def OMP_SCHEDULE_Runtime : EnumVal<"runtime", 6, 1> {}
def OMP_SCHEDULE_Default : EnumVal<"default", 7, 0> { let isDefault = 1; }
def OMP_SCHEDULE_Distribute : EnumVal<"distribute", 7, 1> {}
Is this the change to ensure the schedule type passed to the openmp runtime matches what clang does?
Yes
LoopType != WorksharingLoopType::DistributeStaticLoop) {
    Constant *DistScheduleSchedType = ConstantInt::get(
        I32Type, static_cast<int>(omp::OMPScheduleType::OrderedDistribute));
    BuildInitCall(DistScheduleSchedType, Builder);
Why are there two init calls in this case?
Clang does have 2 init calls, one for `schedule` and one for `dist_schedule` when they are used together. If `dist_schedule` is used on its own, only 1 init call will be emitted. (I think I may need to add a test for this.)
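As an illustration of the combined case (a sketch only; the names are hypothetical and not taken from the patch or its tests), the construct below carries both clauses: `dist_schedule` divides the iterations among the teams, while `schedule` divides each team's share among its threads, which is why a separate runtime "init" call per schedule can appear.

// Sketch: dist_schedule controls the team-level chunking, schedule controls
// the thread-level chunking of the worksharing loop nested inside it.
void scale(float *a, int n) {
#pragma omp teams distribute parallel for dist_schedule(static, 64) schedule(static, 8)
  for (int i = 0; i < n; ++i)
    a[i] *= 2.0f;
}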
      Builder.CreateZExtOrTrunc(ChunkSize, InternalIVTy, "chunksize");
  Value *CastedChunkSize = Builder.CreateZExtOrTrunc(
      ChunkSize ? ChunkSize : Zero, InternalIVTy, "chunksize");
  Value *CastestDistScheduleChunkSize = Builder.CreateZExtOrTrunc(
Suggested change:
-  Value *CastestDistScheduleChunkSize = Builder.CreateZExtOrTrunc(
+  Value *CastedDistScheduleChunkSize = Builder.CreateZExtOrTrunc(
  // We want to emit a second init function call for the dist_schedule clause
  // to the Distribute construct. This should only be done however if a
  // Workshare Loop is nested within a Distribute Construct
This is surprising to me, although I am no expert on the OpenMP runtime. Is this something you saw in clang's output?
Apologies, this comment is left over from a different approach and I should have removed this before opening the PR.
EDIT: Nope, wrong comment I was thinking of. I am looking into this and seeing how to better match the behaviour of Clang.
I have converted this to a draft for the time being while I investigate some inconsistencies between MLIR generation with this patch and Clang.