Loop analysis tool #1

virnarula · 2022-08-12T16:52:46Z

No description provided.

fhahn

Thanks for sharing this! Could you run clang-format across all the files you added?

fhahn · 2022-08-13T16:50:08Z

llvm/include/llvm/Transforms/IPO/LoopExtractionAnalysis.h

nit: stray new line

llvm/include/llvm/Transforms/IPO/LoopExtractionAnalysis.h

llvm/lib/Passes/PassBuilderPipelines.cpp

fhahn · 2022-08-15T08:00:14Z

llvm/tools/loop-analyzer/loop-analyzer.cpp

Would be good to comment and add a more informative name.

llvm/test/Transforms/LoopExtractionAnalysis/complex.ll

llvm/lib/Transforms/IPO/LoopExtractionAnalysis.cpp

llvm/lib/Passes/PassBuilderPipelines.cpp

Currently MachineCSE forbids PRE when the instruction reads a physical register. Relax this so that it's allowed when the value being read is the same as what would be read in the place the instruction would be hoisted to. This is being done in preparation for adding FPCR handling to the AArch64 backend, to prevent that handling from worsening the generated code, but for targets that already have a similar register it should improve things. This patch affects code generation in several tests. The new code looks better except in Thumb2/LowOverheadLoops/memcall.ll, where we perform PRE but the LowOverheadLoops transformation then undoes it. Also, in AMDGPU/selectcc-opt.ll the CHECK makes things look worse, but the function as a whole is actually better (as a MOV is PRE'd). Differential Revision: https://reviews.llvm.org/D136675

template-template parameters. Although it affects whether a template can be used as an argument for another template, the constraint does not seem to be checked, and other major implementations (GCC, MSVC, et al.) do not check it either. Additionally, Part-A of the document seems to have been implemented. So mark P0857R0 as completed. Differential Revision: https://reviews.llvm.org/D134128

This is copying the code that was added for 'add' with D130075. (That patch removed a fallthrough in the cases, but we can probably still share at least some code again as a follow-up cleanup, but I didn't want to risk it here.) The reasoning is similar to the carry propagation for 'add': if we don't demand low bits of the subtraction and the subtrahend (aka RHS or operand 1) is known zero in those low bits, then there can't be any borrowing required from the higher bits of operand 0, so the low bits don't matter. Also, the no-wrap flags can be propagated (and I think that should be true for add too). Here's an attempt to prove that in Alive2: https://alive2.llvm.org/ce/z/xqh7Pa (can add nsw or nuw to src and tgt, and it should still pass) Differential Revision: https://reviews.llvm.org/D136788
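The borrow argument above can be checked by brute force on a small bit width. The following is a minimal sketch, not code from the patch: the helper names `highBitsMatch` and `lowBitsOfOperand0Irrelevant` are hypothetical, and 8-bit wrapping arithmetic stands in for an `i8` `sub`.

```cpp
#include <cassert>
#include <cstdint>

// Compare the bits above position k of (a - b) with and without the low k
// bits of a. All arithmetic wraps modulo 256, mimicking an i8 'sub'.
bool highBitsMatch(std::uint8_t a, std::uint8_t b, unsigned k) {
  assert(k < 8 && "k must be a valid bit index for an 8-bit value");
  const unsigned lowMask = (1u << k) - 1u;
  const std::uint8_t full = static_cast<std::uint8_t>(a - b);
  const std::uint8_t cleared = static_cast<std::uint8_t>((a & ~lowMask) - b);
  return (full >> k) == (cleared >> k);
}

// Exhaustively check every 8-bit a against every 8-bit b whose low k bits
// are zero (the "subtrahend is known zero in those low bits" premise).
bool lowBitsOfOperand0Irrelevant(unsigned k) {
  const unsigned lowMask = (1u << k) - 1u;
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if ((b & lowMask) == 0 &&
          !highBitsMatch(static_cast<std::uint8_t>(a),
                         static_cast<std::uint8_t>(b), k))
        return false;
  return true;
}
```

When the premise is violated the property can fail: with `a = 1`, `b = 1`, `k = 1`, clearing the low bit of `a` forces a borrow that changes the high bits, so `highBitsMatch(1, 1, 1)` is false.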

…ide loops. When calculating the specialization bonus for a given function argument, we recursively traverse the chain of (certain) users, accumulating the instruction costs. Then we exponentially increase the bonus to account for loop nests. This is problematic for two reasons: (a) the users might not themselves be inside the loop nest, (b) if they are, we are accounting for it multiple times. Instead we should be adjusting the bonus before traversing the user chain. This reduces the instruction count for CTMark (newPM-O3) when Function Specialization is enabled without actually reducing the number of specializations performed (geomean: -0.001% non-LTO, -0.406% LTO). Differential Revision: https://reviews.llvm.org/D136692

It was split into several zb* extensions, and `b` was dropped after 0.93, so it is time to retire it like the other non-ratified zb* extensions. Currently clang accepts it with a warning:

$ clang -target riscv64-elf ~/hello.c -S -march=rv64gcb
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)

Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D136812

readelf --section-details displays ch_type/ch_size/ch_addralign for a SHF_COMPRESSED section. Port the feature. There is a small difference that readelf doesn't display `[<corrupt>]` for an empty section while we do. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D136636

Currently any errors during pipeline parsing are reported to stderr. This adds a new pipeline parsing function to the C api that reports errors through a callback, and updates the python bindings to use it. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D136402

LSR may suggest a less profitable transformation of the loop. This patch adds a check to prevent LSR from generating worse code than what we already have. Since LSR affects nearly all targets, the patch is guarded by the option 'lsr-drop-solution', which is disabled by default for now. The next step should be extending a TTI interface to allow target(s) to enable this enhancement. A debug log is added to remind the user of the choice to skip the LSR solution. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D126043

DXContainer files contain a part that has an MD5 of the generated shader. This adds support to the ObjectYAML tooling to expand the hash part data and the hash itself in preparation for adding hashing support to DirectX code generation. Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D136632

-fstrict-flex-arrays=3 is the most restrictive level for flexible array members (FAMs). No number, including 0, is allowed in the FAM. In the cases where a "0" is used, the resulting size is the same as if a zero-sized object were substituted. This is needed for proper _FORTIFY_SOURCE coverage in the Linux kernel, among other reasons. So while the only reason for specifying a zero-length array at the end of a structure is to specify a FAM, treating it as such will cause _FORTIFY_SOURCE not to work correctly; __builtin_object_size will report -1 instead of 0 for a destination buffer size to keep any kernel internals from using the deprecated members as fake FAMs. For example:

  struct broken { int foo; int fake_fam[0]; struct something oops; };

There have been bugs where the above struct was created because "oops" was added after "fake_fam" by someone not realizing. Under _FORTIFY_SOURCE, doing:

  memcpy(p->fake_fam, src, len);

raises no warnings when __builtin_object_size(p->fake_fam, 1) returns -1 and may stomp on "oops." Omitting a warning when using the (invalid) zero-length array is how GCC treats -fstrict-flex-arrays=3. A warning in that situation is likely an irritant, because requesting this option level is explicitly requesting this behavior. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 Differential Revision: https://reviews.llvm.org/D134902

This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, libomptarget will try to load the NextGen version of each plugin, falling back to the original if it is not present or valid. The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside the plugins-nextgen/common/PluginInterface folder); each specific plugin then defines its own classes inheriting from the common ones. In this way, most logic remains in the common interface while the plugin-specific source code shrinks. It is also beneficial in that most code and behavior are now the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), while each device object is responsible for managing its own resources. Most code in this patch is based on the changes made by @jdoerfert (Johannes Doerfert) Reviewed By: jhuber6, jdoerfert Differential Revision: https://reviews.llvm.org/D134396

This prepares for a subsequent revision that will generalize the insertion code generation. Similar to the support lib, insertions become much easier to perform with some "cursor" bookkeeping. Note that we, in the long run, could perhaps avoid storing the "cursor" permanently and use some restricted-scope solution (alloca?) instead. However, that puts harder restrictions on insertion-chain operations, so for now we follow the more straightforward approach. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D136800

…structions This patch adds the assembly/disassembly for the following instructions:

  pext (predicate) : Set predicate from predicate-as-counter
  ptrue (predicate-as-counter) : Initialise predicate-as-counter to all active

This patch also introduces the predicate-as-counter registers pn8, etc. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Differential Revision: https://reviews.llvm.org/D136678

Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                          SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test   6380.00  6378.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test    6380.00  6378.00  -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test            2023.00  2022.00  -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test                 148.00   146.00  -1.4%

Generated more vector instructions. Differential Revision: https://reviews.llvm.org/D127531

This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides. Fixes llvm#57872 Differential Revision: https://reviews.llvm.org/D136042

Previously an error raised during the expansion of response files (including configuration files) was ignored, and only the fact of its presence was reported to the user with generic error messages. This made it difficult to analyze problems. For example, if a configuration file tried to read a nonexistent file, the error message said that 'configuration file cannot be found', which is wrong and misleading. This change enhances error handling in the expansion so that users get more informative error messages. Differential Revision: https://reviews.llvm.org/D136090

X * ((1 << Z) + 1) --> (X << Z) + X https://alive2.llvm.org/ce/z/P-7WK9 It's possible that we could do better with propagating no-wrap, but this carries over the existing logic and appears to be correct. The naming differences on the existing folds are a result of using getName() to set the final value via Builder. That makes it easier to transfer no-wrap rather than the gymnastics required from the raw create instruction APIs.
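As a companion to the Alive2 link, the fold can also be sanity-checked exhaustively on 8-bit values. This is only an illustrative sketch under wrapping (no `nsw`/`nuw`) semantics; the function names `mulForm`, `shlAddForm`, and `foldHoldsForAllI8` are hypothetical and do not appear in InstCombine.

```cpp
#include <cassert>
#include <cstdint>

// X * ((1 << Z) + 1), truncated to 8 bits (wrapping multiply).
std::uint8_t mulForm(std::uint8_t x, unsigned z) {
  assert(z < 8 && "shift amount must be less than the bit width");
  return static_cast<std::uint8_t>(x * ((1u << z) + 1u));
}

// (X << Z) + X, truncated to 8 bits (wrapping shift and add).
std::uint8_t shlAddForm(std::uint8_t x, unsigned z) {
  assert(z < 8 && "shift amount must be less than the bit width");
  return static_cast<std::uint8_t>((static_cast<unsigned>(x) << z) + x);
}

// Check that both forms agree for every 8-bit X and every valid Z.
bool foldHoldsForAllI8() {
  for (unsigned z = 0; z < 8; ++z)
    for (unsigned x = 0; x < 256; ++x)
      if (mulForm(static_cast<std::uint8_t>(x), z) !=
          shlAddForm(static_cast<std::uint8_t>(x), z))
        return false;
  return true;
}
```

For instance, with `X = 3` and `Z = 2` both forms give 15. The sketch says nothing about when no-wrap flags may be carried over, which is the part the commit message treats cautiously.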

This will allow recognizing Q.31 multiplications on vectors that are multiples of HVX vectors. At the moment this comes at the expense of Q.15 multiplications, which now are handled as 32-bit multiplications with shifts. In the longer term this will likely be replaced by a different scheme of "legalizing" vectors, which is necessary for idiom recognition, at least where using direct HVX intrinsics is desired.

…be overridden Fortran famously allows a generic interface definition to share a scope with a procedure or derived type of the same name. When that shadowed name is accessed via host or USE association, but is also defined by an interface in the generic, then name resolution needs to fix up the representation of the shadowing so that the new interface definition is seen as the shadowed symbol -- the host or USE associated name is not material to the situation. See the new test case for particular examples. Differential Revision: https://reviews.llvm.org/D136891

Replace custom code to check if only the first lane is used by generic helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few additional cases and was suggested in D133760. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136368

Some intrinsic functions can handle NULL() as an actual argument; most can't. Distinguish the two with a new ArgFlag facility in the intrinsic procedure table. Also transform some confusing Optionality codes that were standing in for flags into ArgFlags. Last, return false for a NULL() pointer from the utility IsActuallyConstant(), ensuring that it can still deal with a nested NULL() for components in structure constructors. Differential Revision: https://reviews.llvm.org/D136893

ObjectLinkingLayer attempts to claim responsibility for weak definitions that are present in LinkGraphs, but not present in the corresponding MaterializationResponsibility object. Where such a claim is successful, the symbol should be marked as live to prevent it from being dead stripped. (For the curious: Such "late-breaking" definitions are introduced somewhere in the materialization pipeline after the initial responsibility set is calculated. The usual source is the compiler or assembler. Examples of common late-breaking definitions include personality pointers, e.g. "DW.ref.__gxx_personality_v0", and named constant pool entries, e.g. __realXX..XX.) The failure to mark these symbols live caused few problems in practice because late-breaking definitions are usually anchored by existing live definitions within the graph (e.g. DW.ref.__gxx_personality_v0 is transitively referenced by functions via eh-frame records), and so they usually survived dead-stripping anyway. This accidental persistence isn't a principled solution though, and it fails altogether if a late-breaking definition is not otherwise referenced by the graph, with the result that the now-claimed symbol is stripped, triggering a "Failed to materialize symbols" error in ORC. Marking such symbols live is the correct solution. No testcase, as it's difficult to construct a situation where a late-breaking definition is inserted without being referenced outside the context of new backend bringup or plugin-specific shenanigans. See discussion in https://reviews.llvm.org/D133452 and https://reviews.llvm.org/D136877.

When a multi-statement construct should end with a particular END statement like "END SELECT", and that construct's END statement is missing or unrecognizable, the error recovery productions should not misinterpret a program unit END statement that follows and consume it as a misspelled construct END statement. Doing so leads to cascading errors or a failed parse. Differential Revision: https://reviews.llvm.org/D136896

LEN_TRIM's folding is currently based on VERIFY(), and it is kind of slow for the very large CHARACTER arguments that can show up in artificial test suites. Rewrite in terms of single-character accesses. Differential Revision: https://reviews.llvm.org/D136901
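The single-character-access idea can be illustrated with a short backward scan. This is a hedged sketch only: `lenTrim` is a hypothetical stand-in operating on `std::string`, not flang's folding code, and it assumes the blank character is an ASCII space.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of s without trailing blanks, found by walking back from the end
// one character at a time instead of running a general VERIFY()-style search.
std::size_t lenTrim(const std::string &s) {
  std::size_t n = s.size();
  while (n > 0 && s[n - 1] == ' ')
    --n;
  // Postcondition: either nothing is left or the last kept character is
  // not a blank.
  assert(n == 0 || s[n - 1] != ' ');
  return n;
}
```

The scan touches only the trailing blanks plus one more character, so the cost depends on the amount of padding rather than on a general search over the whole argument.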

…in tests. In D136659 I found a few tests that write through readonly parameters:

* Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it readonly. I removed the readonly annotation.
* CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly %arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and @store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed readonly from all three. Also, I added some CHECK-LABEL directives to make it harder for FileCheck output to be mixed up.
* Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll: @gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the readonly attribute.
* Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes through the readonly %P1 and %P2. Also, the corresponding C code in the comment didn't match the test. I removed the readonly attribute from both parameters and corrected the C code.

Differential Revision: https://reviews.llvm.org/D136880

…odRefInfoMask(). This commit adds some tests in preparation for D136659, which allows alias analysis to treat locally-invariant memory pointed to by readonly noalias pointers the same as globally-invariant memory in some cases. The existing behavior for these tests is marked as expected and will be changed when that diff lands. Differential Revision: https://reviews.llvm.org/D136993

…nter components When a derived type has a procedure pointer component with no interface, we can't do a lot of checking on its call sites, but we can at least require that the same procedure pointer component be used consistently as either a function or as a subroutine, but not both. Differential Revision: https://reviews.llvm.org/D136905

fhahn reviewed Aug 15, 2022

virnarula marked this pull request as draft August 16, 2022 22:57

virnarula force-pushed the loop-analysis-tool branch from 2fe0cd0 to d625d9c Compare October 19, 2022 21:55

llvmgnsyncbot and others added 27 commits November 2, 2022 12:13

[gn build] Port 1705975

fea96eb

[gn build] Port b51b90d

a9d7efa

[LoongArch] Add codegen support for cmpxchg on LA64

1efe678

Differential Revision: https://reviews.llvm.org/D135948

[lldb][test] Remove empty setUp/tearDown methods (NFC)

cfa5790

[lldb][test] Remove explicit mydir definitions (NFC)

8616e90

[LegalizeVectorOps][X86][RISCV] Expand vector S/USHLSAT instead of un…

f90a31c

…rolling. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D136478

[NFC][PhaseOrdering] Add one more test for SROA after partial unroll

b798aed

https://reviews.llvm.org/D136806

[mlir] Fix asan issue in Vectorization.cpp of Linalg.

5bc4f58

Differential Revision: https://reviews.llvm.org/D136852

[libc] add fgets

7ac81c7

This adds the fgets function and its unit tests. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D136785

[mlir][CAPI] Allow specifying pass manager anchor

5bc1fba

This adds a new function for creating pass managers that takes an argument for the anchor string. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D136404

[RISCV] Fix an obvious CSE opportunity in LSR test case. NFC

d4efd39

[mlir][sparse] fix crash when sparsifying broadcast operations.

d0b64de

Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D136866

Fix whitespace introduced by 891aaff

108bb58

RKSimon and others added 29 commits November 2, 2022 12:13

[mlir] Split parser fuzzer for bytecode & text

2f8f78b

Enable fuzzing these independently. Currently still not linking in dialects beyond Builtin.

[ARC] Regenerate ldst.ll

042eacf

Reported by the (experimental) arc buildbot after D136042

[InstCombine] add tests for mul with shl operand; NFC

10c63a3

[InstCombine] reduce code duplication in visitMul(); NFC

2fd5ab9

[X86] Remove 256-bit scheduler classes

064f1e1

SLM (silvermont) doesn't support any AVX instructions

[X86] Remove the WriteDPPSZ schedule pair

b1d1186

There's never been a 512-bit vdpps instruction (and the implementation is so convoluted there probably won't ever be)

[InstCombine] create helper function for mul patterns with 1<<X; NFC

f9d1754

There are at least 2 other potential patterns that could go here.

[Hexagon] Fix vector concatenation

d7f2e5f

Recommit [AArch64] Optimize memcmp when the result is tested for [in]…

e55ff44

…equality with 0 Fixes 1st issue of llvm#58061 Fixes the crash of llvm#58675 Reviewed By: dmgreen, efriedma Differential Revision: https://reviews.llvm.org/D136244

[Hexagon] Report changes in HvxIdioms pass

69e82d6

It can produce some dead code, which is harmless in the end, but breaks expensive checks when unreported. This should be fixed eventually, but it's a low priority.

Revert "Handle errors in expansion of response files"

5a6478a

This reverts commit 17eb198. Reverted for investigation, because ClangDriverTests failed on some builders.

[flang] Fix warning from clang 16 on recent patch

18c699b

Add an explicit empty initializer to a new struct member definition to silence warnings from clang 16 about missing initializers.

[flang] Enforce constraint C911

46e6474

Diagnose attempts to use a non-polymorphic instance of an abstract derived type. Differential Revision: https://reviews.llvm.org/D136902

[llvm-objdump] Add --no-print-imm-hex to tests depending on it.

b3395e5

This prepares for an upcoming change to make --print-imm-hex the default behavior of llvm-objdump. These tests were updated in a semi-automatic fashion. See D136972 for details.

[flang] Recode a line to dodge a clang warning

d897236

Rewrite a correct use of "&" -- conjunction without short-circuiting -- from a recent patch into multiple lines so that clang doesn't warn about it.
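The kind of rewrite described here can be sketched as follows. The actual flang line is not shown on this page, so everything below is an assumption for illustration: `record`, `bothCheckedMultiLine`, and `shortCircuited` are hypothetical names, and the call counter exists only to make the evaluation behavior observable.

```cpp
#include <cassert>

// Stand-in for a check with a side effect that must always run.
bool record(bool value, int &calls) {
  ++calls;
  return value;
}

// Multi-line form: both checks are evaluated unconditionally, which is what
// a deliberate 'lhs & rhs' on the two calls expresses, but without using
// '&' on boolean operands.
bool bothCheckedMultiLine(bool a, bool b, int &calls) {
  const bool lhs = record(a, calls);
  const bool rhs = record(b, calls);
  return lhs && rhs;
}

// For contrast: '&&' on the calls themselves skips the second check when
// the first one is false.
bool shortCircuited(bool a, bool b, int &calls) {
  return record(a, calls) && record(b, calls);
}

// Small self-check of the difference in evaluation counts.
void selfCheck() {
  int multi = 0, shortCount = 0;
  bothCheckedMultiLine(false, true, multi);
  shortCircuited(false, true, shortCount);
  assert(multi == 2 && shortCount == 1);
}
```

Splitting the operands into named locals keeps the "evaluate both" intent explicit, so the compiler has no bitwise-on-boolean expression to flag.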

[flang] Warn about overflow from folding complex ABS()

465d949

Emit a warning when the result of folding a call to ABS() with a complex argument results in an overflow. Differential Revision: https://reviews.llvm.org/D136904

virnarula pushed a commit that referenced this pull request Apr 24, 2024

Merge pull request #1 from virnarula/gpu_optimizations

79dd60e

up-to-date code

110 participants