Commit 67da385

[SYCL][NFCI] Refactor device code split implementation once again (#8833)
#### Intro

This is a refactoring of how we perform device code split in `sycl-post-link`, which is intended to solve several existing issues with the current implementation:

1. increased peak RAM consumption by `sycl-post-link`
2. bad scaling as more and more split "dimensions" are added
3. increased test maintenance cost due to non-deterministic order (between commits) of output files produced by `sycl-post-link`

#### A bit more context about the issues above:

(1) Increased peak RAM consumption is caused by the fact that we currently preserve **all** splits in memory, even though we can process them one by one and discard each split as soon as it has been stored to disk. That one-by-one processing was implemented as a memory consumption optimization in #5021, but it got accidentally reverted in #7302 as an attempt to work around (2).

(2) is pretty much summarized in our source code: https://github.com/intel/llvm/blob/afebb2543ccecb89f83c84b68fba7616bbab89ac/llvm/tools/sycl-post-link/sycl-post-link.cpp#L806-L811

(3) is caused by a bad implementation decision made in #7302: because every split is now identified by a hash, every time a new split "dimension" or feature is taken into account, existing tests get different hashes. Just look how many unrelated tests had to be updated in #7512, #8056 and #8167.

#### Now to the PR itself:

It introduces a new infrastructure for categorizing/grouping kernel functions: instead of using hashes, we now build a string description for each kernel function and then group kernels with the same description string together.

The string description is built by a new entity: it accepts a set of rules, where each rule is a simple function which returns a string for the passed `llvm::Function`. Results of all rules are concatenated together, and rules are invoked in the stable order of their registration.

There is a simple API for building those rules. It provides some predefined rules for the most popular use cases, like turning a function attribute or a piece of metadata into a string descriptor for the function. It is also possible to pass a custom callback to implement more complicated logic.

#### How does this PR help with issues above?

(1) and (2) are fixed in conjunction: `sycl-post-link` was refactored to avoid storing more than one split module at a time, which is possible because the PR unifies the per-scope and optional-kernel-features splitters into a single generic splitter. The new API for kernel categorization seems flexible enough to provide that infrastructure, so the merged splitters still look OK code-wise.

(3) is fixed by using string identifiers instead of hashes, as well as by using a data structure which sorts identifiers.

#### Any other benefits from this PR?

About 50 lines of code less to support :)

Extending device code split for more optional features will be even easier than it is now: instead of making several changes in various places around the `UsedOptionalFeatures` structure, it will take just 1-3 added lines of code.

Please also note that `UsedOptionalFeatures` contains tons of inconsistencies in its implementation, which will all be gone with this PR: in `operator==` we don't use the hash and instead compare certain fields directly (and we do miss some of them); the `generateModuleName` method skips some of the optional features.

Cross-module `device_global` usage checks should now work at all split dimensions (except for ESIMD).

#### Any potential downsides?

With the current `UsedOptionalFeatures` it is possible to embed various information (used aspects, `large-grf` flag, etc.) directly during device code split to avoid re-gathering that information later when we generate properties. With the suggested approach that would be harder to do, because it doesn't seem to fit naturally into the proposed infrastructure: see the changes I made around `large-grf` in this PR.

However, we have never actually implemented this, and re-querying some metadata from a function doesn't seem like a bottleneck, so it should really be a very minor and only theoretical downside.
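
The rule-based categorization described under "Now to the PR itself" could look roughly like the sketch below. It is not the code from this PR: `Kernel`, `Categorizer`, `addAttrRule`, `groupKernels` and `formatSplits` are hypothetical names, `Kernel` is a simplified stand-in for `llvm::Function`, and the `-` separator between rule results is an assumption made for readability. Using `sycl-module-id` and `sycl-optlevel` as split rules here is likewise only an illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: a name plus string attributes.
struct Kernel {
  std::string Name;
  std::map<std::string, std::string> Attrs;
};

// Builds a description string for a kernel by concatenating the results of
// all registered rules, in the order in which the rules were registered.
class Categorizer {
public:
  using Rule = std::function<std::string(const Kernel &)>;

  // Custom callback rule for logic that the predefined rules do not cover.
  void addRule(Rule R) { Rules.push_back(std::move(R)); }

  // Predefined rule: the value of a string attribute (empty if absent).
  void addAttrRule(std::string AttrName) {
    addRule([AttrName = std::move(AttrName)](const Kernel &K) -> std::string {
      auto It = K.Attrs.find(AttrName);
      return It == K.Attrs.end() ? std::string() : It->second;
    });
  }

  std::string describe(const Kernel &K) const {
    assert(!Rules.empty() && "register at least one rule");
    std::string Desc;
    for (const Rule &R : Rules)
      Desc += R(K) + "-"; // separator is an assumption of this sketch
    return Desc;
  }

private:
  std::vector<Rule> Rules;
};

using Groups = std::map<std::string, std::vector<std::string>>;

// Groups kernel names by description. std::map keeps its keys sorted, so the
// order of groups depends only on the description strings themselves.
Groups groupKernels(const std::vector<Kernel> &Kernels, const Categorizer &C) {
  Groups Result;
  for (const Kernel &K : Kernels)
    Result[C.describe(K)].push_back(K.Name);
  return Result;
}

// Renders one line per group, numbering groups in iteration (sorted) order.
std::string formatSplits(const Groups &G) {
  std::ostringstream OS;
  int Index = 0;
  for (const auto &[Desc, Names] : G) {
    OS << "split " << Index++ << " [" << Desc << "]:";
    for (const std::string &N : Names)
      OS << ' ' << N;
    OS << '\n';
  }
  return OS.str();
}
```

With `kernel0` attributed to `b.cpp` and `kernel1`, `kernel2` attributed to `a.cpp`, the `a.cpp` group is numbered first regardless of the order in which kernels were visited. A change in split numbering of that kind is what the test updates in this commit reflect: output indices follow the sorted descriptions rather than hash values.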
1 parent 33facd0 commit 67da385

20 files changed: +432 -385 lines changed

llvm/include/llvm/SYCLLowerIR/SYCLUtils.h

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ namespace llvm {
 namespace sycl {
 namespace utils {
 constexpr char ATTR_SYCL_MODULE_ID[] = "sycl-module-id";
+constexpr char ATTR_SYCL_OPTLEVEL[] = "sycl-optlevel";

 using CallGraphNodeAction = ::std::function<void(Function *)>;
 using CallGraphFunctionFilter =

llvm/test/tools/sycl-post-link/assert/property-1.ll

Lines changed: 3 additions & 3 deletions
@@ -12,9 +12,9 @@
 ; RUN: FileCheck %s -input-file=%t_0.prop --implicit-check-not TheKernel2
 ;
 ; RUN: sycl-post-link -split=kernel -symbols -S < %s -o %t.table
-; RUN: FileCheck %s -input-file=%t_0.prop --check-prefixes=CHECK-K1
-; RUN: FileCheck %s -input-file=%t_1.prop --check-prefixes=CHECK-K2
-; RUN: FileCheck %s -input-file=%t_2.prop --check-prefixes=CHECK-K3
+; RUN: FileCheck %s -input-file=%t_0.prop --check-prefixes=CHECK-K3
+; RUN: FileCheck %s -input-file=%t_1.prop --check-prefixes=CHECK-K1
+; RUN: FileCheck %s -input-file=%t_2.prop --check-prefixes=CHECK-K2

 ; SYCL source:
 ; void foo() {

llvm/test/tools/sycl-post-link/device-code-split/per-aspect-split-1.ll

Lines changed: 8 additions & 8 deletions
@@ -10,29 +10,29 @@
 ; RUN: sycl-post-link -split=auto -symbols -S < %s -o %t.table
 ; RUN: FileCheck %s -input-file=%t_0.ll --check-prefixes CHECK-M0-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M1-IR \
+; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M1-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M2-IR \
+; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M2-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
 ; RUN: FileCheck %s -input-file=%t_0.sym --check-prefixes CHECK-M0-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M1-SYMS \
+; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M1-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M2-SYMS \
+; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M2-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1

 ; RUN: sycl-post-link -split=source -symbols -S < %s -o %t.table
 ; RUN: FileCheck %s -input-file=%t_0.ll --check-prefixes CHECK-M0-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M1-IR \
+; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M1-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M2-IR \
+; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M2-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
 ; RUN: FileCheck %s -input-file=%t_0.sym --check-prefixes CHECK-M0-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M1-SYMS \
+; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M1-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M2-SYMS \
+; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M2-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1

 ; RUN: sycl-post-link -split=kernel -symbols -S < %s -o %t.table

llvm/test/tools/sycl-post-link/device-code-split/per-aspect-split-2.ll

Lines changed: 4 additions & 4 deletions
@@ -21,12 +21,12 @@
 ; CHECK-TABLE-NEXT: _2.sym
 ; CHECK-TABLE-EMPTY:

-; CHECK-M0-SYMS: kernel0
+; CHECK-M0-SYMS: kernel3

-; CHECK-M1-SYMS: kernel3
+; CHECK-M1-SYMS: kernel1
+; CHECK-M1-SYMS: kernel2

-; CHECK-M2-SYMS: kernel1
-; CHECK-M2-SYMS: kernel2
+; CHECK-M2-SYMS: kernel0

 target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
 target triple = "spir64-unknown-linux"

llvm/test/tools/sycl-post-link/device-code-split/per-reqd-wg-size-split-1.ll

Lines changed: 8 additions & 8 deletions
@@ -10,29 +10,29 @@
 ; RUN: sycl-post-link -split=auto -symbols -S < %s -o %t.table
 ; RUN: FileCheck %s -input-file=%t_0.ll --check-prefixes CHECK-M0-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M1-IR \
+; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M1-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M2-IR \
+; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M2-IR \
 ; RUN: --implicit-check-not kernel1 --implicit-check-not kernel2
 ; RUN: FileCheck %s -input-file=%t_0.sym --check-prefixes CHECK-M0-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M1-SYMS \
+; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M1-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel2
-; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M2-SYMS \
+; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M2-SYMS \
 ; RUN: --implicit-check-not kernel1 --implicit-check-not kernel2

 ; RUN: sycl-post-link -split=source -symbols -S < %s -o %t.table
 ; RUN: FileCheck %s -input-file=%t_0.ll --check-prefixes CHECK-M0-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M1-IR \
+; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M1-IR \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel2
-; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M2-IR \
+; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M2-IR \
 ; RUN: --implicit-check-not kernel1 --implicit-check-not kernel2
 ; RUN: FileCheck %s -input-file=%t_0.sym --check-prefixes CHECK-M0-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
-; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M1-SYMS \
+; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M1-SYMS \
 ; RUN: --implicit-check-not kernel0 --implicit-check-not kernel2
-; RUN: FileCheck %s -input-file=%t_1.sym --check-prefixes CHECK-M2-SYMS \
+; RUN: FileCheck %s -input-file=%t_2.sym --check-prefixes CHECK-M2-SYMS \
 ; RUN: --implicit-check-not kernel1 --implicit-check-not kernel2

 ; RUN: sycl-post-link -split=kernel -symbols -S < %s -o %t.table

llvm/test/tools/sycl-post-link/device-code-split/per-reqd-wg-size-split-2.ll

Lines changed: 9 additions & 9 deletions
@@ -4,29 +4,29 @@
 ; RUN: sycl-post-link -split=auto -symbols -S < %s -o %t.table
 ; RUN: FileCheck %s -input-file=%t.table --check-prefix CHECK-TABLE
 ;
-; RUN: FileCheck %s -input-file=%t_1.sym --check-prefix CHECK-M0-SYMS \
-; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1 \
+; RUN: FileCheck %s -input-file=%t_0.sym --check-prefix CHECK-M0-SYMS \
+; RUN: --implicit-check-not kernel0 --implicit-check-not kernel2
+;
+; RUN: FileCheck %s -input-file=%t_1.sym --check-prefix CHECK-M1-SYMS \
+; RUN: --implicit-check-not kernel1 --implicit-check-not kernel3 \
 ; RUN: --implicit-check-not kernel2
 ;
 ; RUN: FileCheck %s -input-file=%t_2.sym --check-prefix CHECK-M2-SYMS \
-; RUN: --implicit-check-not kernel0 --implicit-check-not kernel3
-;
-; RUN: FileCheck %s -input-file=%t_0.sym --check-prefix CHECK-M1-SYMS \
 ; RUN: --implicit-check-not kernel1 --implicit-check-not kernel2 \
-; RUN: --implicit-check-not kernel3
+; RUN: --implicit-check-not kernel0

 ; CHECK-TABLE: Code
 ; CHECK-TABLE-NEXT: _0.sym
 ; CHECK-TABLE-NEXT: _1.sym
 ; CHECK-TABLE-NEXT: _2.sym
 ; CHECK-TABLE-EMPTY:

-; CHECK-M0-SYMS: kernel3
+; CHECK-M0-SYMS: kernel1
+; CHECK-M0-SYMS: kernel2

 ; CHECK-M1-SYMS: kernel0

-; CHECK-M2-SYMS: kernel1
-; CHECK-M2-SYMS: kernel2
+; CHECK-M2-SYMS: kernel3

 target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
 target triple = "spir64-unknown-linux"

llvm/test/tools/sycl-post-link/device-code-split/split-with-kernel-declarations.ll

Lines changed: 3 additions & 3 deletions
@@ -8,9 +8,9 @@
 ;
 ; RUN: sycl-post-link -split=kernel -symbols -S < %s -o %t1.table
 ; RUN: FileCheck %s -input-file=%t1.table --check-prefix CHECK-PER-KERNEL-TABLE
-; RUN: FileCheck %s -input-file=%t1_0.sym --check-prefix CHECK-PER-KERNEL-SYM0
-; RUN: FileCheck %s -input-file=%t1_1.sym --check-prefix CHECK-PER-KERNEL-SYM1
-; RUN: FileCheck %s -input-file=%t1_2.sym --check-prefix CHECK-PER-KERNEL-SYM2
+; RUN: FileCheck %s -input-file=%t1_0.sym --check-prefix CHECK-PER-KERNEL-SYM1
+; RUN: FileCheck %s -input-file=%t1_1.sym --check-prefix CHECK-PER-KERNEL-SYM2
+; RUN: FileCheck %s -input-file=%t1_2.sym --check-prefix CHECK-PER-KERNEL-SYM0

 ; With per-source split, there should be two device images
 ; CHECK-PER-SOURCE-TABLE: [Code|Properties|Symbols]

llvm/test/tools/sycl-post-link/device-globals/test_global_variable_many_kernels_in_one_module.ll

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 ; RUN: sycl-post-link --device-globals --split=source -S < %s -o %t.files.table
-; RUN: FileCheck %s -input-file=%t.files_0.ll --check-prefix CHECK-MOD0
-; RUN: FileCheck %s -input-file=%t.files_1.ll --check-prefix CHECK-MOD1
+; RUN: FileCheck %s -input-file=%t.files_0.ll --check-prefix CHECK-MOD1
+; RUN: FileCheck %s -input-file=%t.files_1.ll --check-prefix CHECK-MOD0

 ; This test is intended to check that sycl-post-link generates no errors
 ; when a device global variable with the 'device_image_scope' property

llvm/test/tools/sycl-post-link/device-globals/test_global_variable_many_modules_no_dev_global.ll

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 ; RUN: sycl-post-link --device-globals --split=source -S < %s -o %t.files.table
-; RUN: FileCheck %s -input-file=%t.files_0.ll --check-prefix CHECK-MOD0
-; RUN: FileCheck %s -input-file=%t.files_1.ll --check-prefix CHECK-MOD1
-; RUN: FileCheck %s -input-file=%t.files_2.ll --check-prefix CHECK-MOD2
+; RUN: FileCheck %s -input-file=%t.files_0.ll --check-prefix CHECK-MOD2
+; RUN: FileCheck %s -input-file=%t.files_1.ll --check-prefix CHECK-MOD0
+; RUN: FileCheck %s -input-file=%t.files_2.ll --check-prefix CHECK-MOD1

 ; This test is intended to check that sycl-post-link generates no error if the
 ; 'device_image_scope' property is attached to not a device global variable.

llvm/test/tools/sycl-post-link/device-globals/test_global_variable_many_modules_no_dev_img_scope.ll

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 ; RUN: sycl-post-link --device-globals --split=source -S < %s -o %t.files.table
-; RUN: FileCheck %s -input-file=%t.files_0.ll --check-prefix CHECK-MOD0
-; RUN: FileCheck %s -input-file=%t.files_1.ll --check-prefix CHECK-MOD1
-; RUN: FileCheck %s -input-file=%t.files_2.ll --check-prefix CHECK-MOD2
+; RUN: FileCheck %s -input-file=%t.files_0.ll --check-prefix CHECK-MOD2
+; RUN: FileCheck %s -input-file=%t.files_1.ll --check-prefix CHECK-MOD0
+; RUN: FileCheck %s -input-file=%t.files_2.ll --check-prefix CHECK-MOD1

 ; ModuleID = 'llvm/test/tools/sycl-post-link/device-globals/test_global_variable_many_modules_no_dev_img_scope.ll'
 source_filename = "llvm/test/tools/sycl-post-link/device-globals/test_global_variable_many_modules_no_dev_img_scope.ll"
