Commit b6c5ff8

[OpenMPOpt] Make parallel regions reachable from new DeviceRTL loop functions
This patch updates the OpenMP optimization pass to know about the new DeviceRTL functions for loop constructs. This change marks these functions as potentially containing parallel regions, which fixes a current bug with the state machine rewrite optimization. It previously failed to identify parallel regions located inside of the callbacks passed to these new DeviceRTL functions, causing the resulting code to skip executing these parallel regions. As a result, Generic kernels produced by Flang that contain parallel regions now work properly.

One known related issue not fixed by this patch is that the presence of calls to these functions will prevent the SPMD-ization of Generic kernels by OpenMPOpt. Previously, this was due to assuming there was no parallel region. This is changed by this patch, but instead we now mark it temporarily as unsupported in an SPMD context. The reason is that, without additional changes, code intended for the main thread of the team located outside of the parallel region would not be guarded properly, resulting in race conditions and generally invalid behavior.
1 parent b6e9849 commit b6c5ff8

3 files changed: +191 -0 lines changed

llvm/lib/Transforms/IPO/OpenMPOpt.cpp

Lines changed: 22 additions & 0 deletions
@@ -5020,6 +5020,28 @@ struct AAKernelInfoCallSite : AAKernelInfo {
     case OMPRTL___kmpc_free_shared:
       // Return without setting a fixpoint, to be resolved in updateImpl.
       return;
+    case OMPRTL___kmpc_distribute_static_loop_4:
+    case OMPRTL___kmpc_distribute_static_loop_4u:
+    case OMPRTL___kmpc_distribute_static_loop_8:
+    case OMPRTL___kmpc_distribute_static_loop_8u:
+    case OMPRTL___kmpc_distribute_for_static_loop_4:
+    case OMPRTL___kmpc_distribute_for_static_loop_4u:
+    case OMPRTL___kmpc_distribute_for_static_loop_8:
+    case OMPRTL___kmpc_distribute_for_static_loop_8u:
+    case OMPRTL___kmpc_for_static_loop_4:
+    case OMPRTL___kmpc_for_static_loop_4u:
+    case OMPRTL___kmpc_for_static_loop_8:
+    case OMPRTL___kmpc_for_static_loop_8u:
+      // Parallel regions might be reached by these calls, as they take a
+      // callback argument potentially arbitrary user-provided code.
+      ReachedUnknownParallelRegions.insert(&CB);
+      // TODO: The presence of these calls on their own does not prevent a
+      // kernel from being SPMD-izable. We mark it as such because we need
+      // further changes in order to also consider the contents of the
+      // callbacks passed to them.
+      SPMDCompatibilityTracker.indicatePessimisticFixpoint();
+      SPMDCompatibilityTracker.insert(&CB);
+      break;
     default:
       // Unknown OpenMP runtime calls cannot be executed in SPMD-mode,
       // generally. However, they do not hide parallel regions.
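The effect of the hunk above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `RuntimeFn`, `KernelInfoModel`, and `visitRuntimeCall` are hypothetical names invented for illustration and are not LLVM types or APIs. The two flags stand in for `ReachedUnknownParallelRegions` and `SPMDCompatibilityTracker`, and the `Other` branch follows only what the comment on the `default:` case states.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the types used in OpenMPOpt.cpp.
enum class RuntimeFn {
  FreeShared,              // models __kmpc_free_shared
  DistributeStaticLoop,    // models __kmpc_distribute_static_loop_*
  DistributeForStaticLoop, // models __kmpc_distribute_for_static_loop_*
  ForStaticLoop,           // models __kmpc_for_static_loop_*
  Other                    // any runtime call without a dedicated case
};

struct KernelInfoModel {
  // Stands in for a non-empty ReachedUnknownParallelRegions set.
  bool MayReachUnknownParallelRegion = false;
  // Stands in for the state of SPMDCompatibilityTracker.
  bool SPMDCompatible = true;
  // Names of the calls recorded as blocking SPMD-ization.
  std::vector<std::string> SPMDBlockers;
};

void visitRuntimeCall(KernelInfoModel &KI, RuntimeFn Fn,
                      const std::string &CallName) {
  switch (Fn) {
  case RuntimeFn::FreeShared:
    // Nothing is decided here; the real pass resolves this later.
    return;
  case RuntimeFn::DistributeStaticLoop:
  case RuntimeFn::DistributeForStaticLoop:
  case RuntimeFn::ForStaticLoop:
    // The callback argument may contain a parallel region, so the state
    // machine must not assume that none is reachable.
    KI.MayReachUnknownParallelRegion = true;
    // Temporarily treated as unsupported in an SPMD context.
    KI.SPMDCompatible = false;
    KI.SPMDBlockers.push_back(CallName);
    break;
  case RuntimeFn::Other:
    // Assumed from the comment on the default case: not SPMD-compatible,
    // but no parallel region is hidden behind the call.
    KI.SPMDCompatible = false;
    KI.SPMDBlockers.push_back(CallName);
    break;
  }
}
```

In this model, a call tagged `RuntimeFn::ForStaticLoop` sets `MayReachUnknownParallelRegion` and clears `SPMDCompatible`, whereas `RuntimeFn::Other` only clears `SPMDCompatible`. Before the patch, the loop functions behaved like the latter, which is why the parallel regions inside their callbacks were missed.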
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+! Offloading test for generic target regions containing different kinds of
+! loop constructs inside.
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+program main
+  integer :: i1, i2, n1, n2, counter
+
+  n1 = 100
+  n2 = 50
+
+  counter = 0
+  !$omp target map(tofrom:counter)
+    !$omp teams distribute reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+  !$omp end target
+
+  ! CHECK: 1 100
+  print '(I2" "I0)', 1, counter
+
+  counter = 0
+  !$omp target map(tofrom:counter)
+    !$omp parallel do reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+    !$omp parallel do reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+  !$omp end target
+
+  ! CHECK: 2 200
+  print '(I2" "I0)', 2, counter
+
+  counter = 0
+  !$omp target map(tofrom:counter)
+    counter = counter + 1
+    !$omp parallel do reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+    counter = counter + 1
+    !$omp parallel do reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+    counter = counter + 1
+  !$omp end target
+
+  ! CHECK: 3 203
+  print '(I2" "I0)', 3, counter
+
+  counter = 0
+  !$omp target map(tofrom: counter)
+    counter = counter + 1
+    !$omp parallel do reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+    counter = counter + 1
+  !$omp end target
+
+  ! CHECK: 4 102
+  print '(I2" "I0)', 4, counter
+
+
+  counter = 0
+  !$omp target teams distribute reduction(+:counter)
+  do i1=1, n1
+    !$omp parallel do reduction(+:counter)
+    do i2=1, n2
+      counter = counter + 1
+    end do
+  end do
+
+  ! CHECK: 5 5000
+  print '(I2" "I0)', 5, counter
+
+  counter = 0
+  !$omp target teams distribute reduction(+:counter)
+  do i1=1, n1
+    counter = counter + 1
+    !$omp parallel do reduction(+:counter)
+    do i2=1, n2
+      counter = counter + 1
+    end do
+    counter = counter + 1
+  end do
+
+  ! CHECK: 6 5200
+  print '(I2" "I0)', 6, counter
+
+  counter = 0
+  !$omp target teams distribute reduction(+:counter)
+  do i1=1, n1
+    !$omp parallel do reduction(+:counter)
+    do i2=1, n2
+      counter = counter + 1
+    end do
+    !$omp parallel do reduction(+:counter)
+    do i2=1, n2
+      counter = counter + 1
+    end do
+  end do
+
+  ! CHECK: 7 10000
+  print '(I2" "I0)', 7, counter
+
+  counter = 0
+  !$omp target teams distribute reduction(+:counter)
+  do i1=1, n1
+    counter = counter + 1
+    !$omp parallel do reduction(+:counter)
+    do i2=1, n2
+      counter = counter + 1
+    end do
+    counter = counter + 1
+    !$omp parallel do reduction(+:counter)
+    do i2=1, n2
+      counter = counter + 1
+    end do
+    counter = counter + 1
+  end do
+
+  ! CHECK: 8 10300
+  print '(I2" "I0)', 8, counter
+end program
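The expected `CHECK` values in this test follow from plain counting, which a short serial replay can confirm. The sketch below is an assumption-laden sanity check, not part of the test suite: `loopTrips` and `expectedCounter` are hypothetical helpers, teams and threads are ignored entirely, and each reduction is assumed to sum every increment exactly once.

```cpp
#include <cassert>

const int N1 = 100; // n1 in the Fortran test
const int N2 = 50;  // n2 in the Fortran test

// Serial stand-in for one loop with reduction(+:counter): every iteration
// contributes exactly one increment.
int loopTrips(int Trips) {
  int Count = 0;
  for (int I = 1; I <= Trips; ++I)
    ++Count;
  return Count;
}

// Hypothetical helper: expected final counter for test cases 1 through 8.
// Returns -1 for any other case number.
int expectedCounter(int Case) {
  int Counter = 0;
  switch (Case) {
  case 1: // one distribute loop
    Counter += loopTrips(N1);
    break;
  case 2: // two parallel loops in sequence
    Counter += loopTrips(N1) + loopTrips(N1);
    break;
  case 3: // sequential increments around two parallel loops
    Counter += 1 + loopTrips(N1) + 1 + loopTrips(N1) + 1;
    break;
  case 4: // sequential increments around one parallel loop
    Counter += 1 + loopTrips(N1) + 1;
    break;
  case 5: // parallel loop nested in a distribute loop
    for (int I1 = 1; I1 <= N1; ++I1)
      Counter += loopTrips(N2);
    break;
  case 6: // nested, with sequential increments around the inner loop
    for (int I1 = 1; I1 <= N1; ++I1)
      Counter += 1 + loopTrips(N2) + 1;
    break;
  case 7: // two inner parallel loops per outer iteration
    for (int I1 = 1; I1 <= N1; ++I1)
      Counter += loopTrips(N2) + loopTrips(N2);
    break;
  case 8: // two inner loops interleaved with sequential increments
    for (int I1 = 1; I1 <= N1; ++I1)
      Counter += 1 + loopTrips(N2) + 1 + loopTrips(N2) + 1;
    break;
  default:
    return -1;
  }
  return Counter;
}
```

Cases 3, 4, 6, and 8 are the ones that exercise code outside the parallel region, i.e. the main-thread code that the commit message says would need guarding in an SPMD context.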
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+! Offloading test for generic target regions containing different kinds of
+! loop constructs inside.
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+program main
+  integer :: i1, n1, counter
+
+  n1 = 100
+
+  counter = 0
+  !$omp target parallel do reduction(+:counter)
+  do i1=1, n1
+    counter = counter + 1
+  end do
+
+  ! CHECK: 1 100
+  print '(I2" "I0)', 1, counter
+
+  counter = 0
+  !$omp target map(tofrom:counter)
+    !$omp parallel do reduction(+:counter)
+    do i1=1, n1
+      counter = counter + 1
+    end do
+  !$omp end target
+
+  ! CHECK: 2 100
+  print '(I2" "I0)', 2, counter
+
+  counter = 0
+  !$omp target teams distribute parallel do reduction(+:counter)
+  do i1=1, n1
+    counter = counter + 1
+  end do
+
+  ! CHECK: 3 100
+  print '(I2" "I0)', 3, counter
+end program
