Loop analysis tool #1
Thanks for sharing this! Could you run clang-format across all the files you added?
nit: stray new line
Would be good to comment and add a more informative name.
2fe0cd0 to d625d9c
Currently MachineCSE forbids PRE when the instruction reads a physical register. Relax this so that it's allowed when the value being read is the same as what would be read in the place the instruction would be hoisted to. This is being done in preparation for adding FPCR handling to the AArch64 backend, in order to prevent it from worsening the generated code, but for targets that already have a similar register it should improve things. This patch affects code generation in several tests. The new code looks better except in Thumb2/LowOverheadLoops/memcall.ll, where we perform PRE but the LowOverheadLoops transformation then undoes it. Also, in AMDGPU/selectcc-opt.ll the CHECK makes things look worse, but actually the function as a whole is better (as a MOV is PRE'd). Differential Revision: https://reviews.llvm.org/D136675
Differential Revision: https://reviews.llvm.org/D135948
template-template parameters. Although it affects whether a template can be used as an argument for another template, the constraint does not seem to be checked, and other major implementations (GCC, MSVC, et al.) do not check it either. Additionally, Part-A of the document seems to have been implemented. So mark P0857R0 as completed. Differential Revision: https://reviews.llvm.org/D134128
This is copying the code that was added for 'add' with D130075. (That patch removed a fallthrough in the cases, but we can probably still share at least some code again as a follow-up cleanup, but I didn't want to risk it here.) The reasoning is similar to the carry propagation for 'add': if we don't demand low bits of the subtraction and the subtrahend (aka RHS or operand 1) is known zero in those low bits, then there can't be any borrowing required from the higher bits of operand 0, so the low bits don't matter. Also, the no-wrap flags can be propagated (and I think that should be true for add too). Here's an attempt to prove that in Alive2: https://alive2.llvm.org/ce/z/xqh7Pa (can add nsw or nuw to src and tgt, and it should still pass) Differential Revision: https://reviews.llvm.org/D136788
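The borrow argument above can be checked on concrete values. The following is only a minimal sketch, not the code from the patch: the helper names `lowMask` and `highBitsOfSubIgnoreLowBitsOfLHS` are hypothetical, and 32-bit unsigned arithmetic stands in for the IR `sub`. It replaces the undemanded low bits of operand 0 with arbitrary bits and confirms that the demanded high bits of the difference do not change when the subtrahend is zero in those low bits.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `n` bits set (n in [0, 32]). Hypothetical helper.
constexpr uint32_t lowMask(unsigned n) {
  return n >= 32 ? ~0u : ((1u << n) - 1u);
}

// Returns true if the high (32 - n) bits of a - b stay the same after the
// low n bits of `a` are replaced by the low n bits of `junk`.
// Assumption of the fold: `b` is known zero in its low n bits.
bool highBitsOfSubIgnoreLowBitsOfLHS(uint32_t a, uint32_t b, uint32_t junk,
                                     unsigned n) {
  assert((b & lowMask(n)) == 0 && "subtrahend must be zero in the low bits");
  const uint32_t demanded = ~lowMask(n);
  // Keep the high bits of `a`, overwrite its low bits with arbitrary data.
  const uint32_t aAltered = (a & demanded) | (junk & lowMask(n));
  // No borrow can cross from the low bits, so the high bits must agree.
  return ((a - b) & demanded) == ((aAltered - b) & demanded);
}
```

For instance, `highBitsOfSubIgnoreLowBitsOfLHS(0x12345678u, 0xABCD0000u, 0xFFFFFFFFu, 16)` is expected to hold because the low 16 bits of the subtrahend are zero.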
…ide loops. When calculating the specialization bonus for a given function argument, we recursively traverse the chain of (certain) users, accumulating the instruction costs. Then we exponentially increase the bonus to account for loop nests. This is problematic for two reasons: (a) the users might not themselves be inside the loop nest, (b) if they are, we are accounting for it multiple times. Instead we should be adjusting the bonus before traversing the user chain. This reduces the instruction count for CTMark (newPM-O3) when Function Specialization is enabled without actually reducing the number of specializations performed (geomean: -0.001% non-LTO, -0.406% LTO). Differential Revision: https://reviews.llvm.org/D136692
It was split into several zb* extensions, and `b` was dropped after 0.93, so it is time to retire it, as was done for the other non-ratified zb* extensions. Currently clang accepts it with a warning:
$ clang -target riscv64-elf ~/hello.c -S -march=rv64gcb
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)
Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D136812
…rolling. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D136478
Differential Revision: https://reviews.llvm.org/D136852
readelf --section-details displays ch_type/ch_size/ch_addralign for a SHF_COMPRESSED section. Port the feature. There is a small difference that readelf doesn't display `[<corrupt>]` for an empty section while we do. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D136636
Currently any errors during pipeline parsing are reported to stderr. This adds a new pipeline parsing function to the C api that reports errors through a callback, and updates the python bindings to use it. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D136402
LSR may suggest a less profitable transformation of the loop. This patch adds a check to prevent LSR from generating worse code than what we already have. Since LSR affects nearly all targets, the patch is guarded by the option 'lsr-drop-solution', which is disabled by default for now. The next step should be extending a TTI interface to allow target(s) to enable this enhancement. A debug log is added to remind the user of the choice to skip the LSR solution. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D126043
This adds the fgets function and its unit tests. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D136785
DXContainer files contain a part that has an MD5 of the generated shader. This adds support to the ObjectYAML tooling to expand the hash part data and the hash itself in preparation for adding hashing support to DirectX code generation. Reviewed By: python3kgae Differential Revision: https://reviews.llvm.org/D136632
This adds a new function for creating pass managers that takes an argument for the anchor string. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D136404
Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D136866
The -fstrict-flex-arrays=3 is the most restrictive type of flex arrays.
No number, including 0, is allowed in the FAM. In the cases where a "0"
is used, the resulting size is the same as if a zero-sized object were
substituted.
This is needed for proper _FORTIFY_SOURCE coverage in the Linux kernel,
among other reasons. So while the only reason for specifying a
zero-length array at the end of a structure is to specify a FAM,
treating it as such will cause _FORTIFY_SOURCE not to work correctly;
__builtin_object_size will report -1 instead of 0 for a destination
buffer size to keep any kernel internals from using the deprecated
members as fake FAMs.
For example:
    struct broken {
        int foo;
        int fake_fam[0];
        struct something oops;
    };
There have been bugs where the above struct was created because "oops"
was added after "fake_fam" by someone not realizing. Under
_FORTIFY_SOURCE, doing:
    memcpy(p->fake_fam, src, len);
raises no warnings when __builtin_object_size(p->fake_fam, 1) returns -1
and may stomp on "oops."
Omitting a warning when using the (invalid) zero-length array is how GCC
treats -fstrict-flex-arrays=3. A warning in that situation is likely an
irritant, because requesting this option level is explicitly requesting
this behavior.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
Differential Revision: https://reviews.llvm.org/D134902
This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, libomptarget will try to load the NextGen version of each plugin, falling back to the original if it is not present or valid. The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside the plugins-nextgen/common/PluginInterface folder), and then each specific plugin defines its own specific classes inheriting from the common ones. In this way, most logic remains in the common interface while reducing the plugin-specific source code. It is also beneficial in the sense that most code and behavior are now the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), each of which is responsible for managing its own resources. Most code in this patch is based on the changes made by @jdoerfert (Johannes Doerfert) Reviewed By: jhuber6, jdoerfert Differential Revision: https://reviews.llvm.org/D134396
This prepares a subsequent revision that will generalize the insertion code generation. Similar to the support lib, insertions become much easier to perform with some "cursor" bookkeeping. Note that we, in the long run, could perhaps avoid storing the "cursor" permanently and use some restricted-scope solution (alloca?) instead. However, that puts harder restrictions on insertion-chain operations, so for now we follow the more straightforward approach. Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D136800
…structions This patch adds the assembly/disassembly for the following instructions: pext (predicate) : Set predicate from predicate-as-counter ptrue (predicate-as-counter) : Initialise predicate-as-counter to all active This patch also introduces the predicate-as-counter registers pn8, etc. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2022-09 Differential Revision: https://reviews.llvm.org/D136678
Should improve compile time for analysis and vectorization.
Metric: SLP.NumVectorInstructions
Program                                                                         SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00  6378.00  -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00  6378.00  -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00  2022.00  -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test                148.00   146.00  -1.4%
Generated more vector instructions. Differential Revision: https://reviews.llvm.org/D127531
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides. Fixes llvm#57872 Differential Revision: https://reviews.llvm.org/D136042
Enable fuzzing these independently. Currently still not linking in dialects beyond Builtin.
Reported by the (experimental) arc buildbot after D136042
Previously an error raised during the expansion of response files (including configuration files) was ignored, and only the fact of its presence was reported to the user with generic error messages. This made it difficult to analyze problems. For example, if a configuration file tried to read a nonexistent file, the error message said that 'configuration file cannot be found', which is wrong and misleading. This change enhances error handling in the expansion so that users get more informative error messages. Differential Revision: https://reviews.llvm.org/D136090
SLM (silvermont) doesn't support any AVX instructions
There's never been a 512-bit vdpps instruction (and the implementation is so convoluted there probably won't ever be)
There are at least 2 other potential patterns that could go here.
X * ((1 << Z) + 1) --> (X << Z) + X https://alive2.llvm.org/ce/z/P-7WK9 It's possible that we could do better with propagating no-wrap, but this carries over the existing logic and appears to be correct. The naming differences on the existing folds are a result of using getName() to set the final value via Builder. That makes it easier to transfer no-wrap rather than the gymnastics required from the raw create instruction APIs.
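The identity behind this fold can be spot-checked outside the compiler. The sketch below is illustrative only: the function names are hypothetical, and wrapping 32-bit unsigned arithmetic stands in for an IR `mul` without no-wrap flags. It compares both sides of the rewrite for every valid shift amount.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: X * ((1 << Z) + 1), with wrapping arithmetic.
uint32_t mulByPow2PlusOne(uint32_t x, unsigned z) {
  assert(z < 32 && "shift amount must be smaller than the bit width");
  return x * ((1u << z) + 1u);
}

// Right-hand side of the fold: (X << Z) + X.
uint32_t shlAdd(uint32_t x, unsigned z) {
  assert(z < 32 && "shift amount must be smaller than the bit width");
  return (x << z) + x;
}

// Checks that both forms agree for the given X over all shift amounts 0..31.
bool foldHoldsFor(uint32_t x) {
  for (unsigned z = 0; z < 32; ++z)
    if (mulByPow2PlusOne(x, z) != shlAdd(x, z))
      return false;
  return true;
}
```

This says nothing about when the no-wrap flags may be carried over; the Alive2 link above is the place to check that.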
…equality with 0 Fixes 1st issue of llvm#58061 Fixes the crash of llvm#58675 Reviewed By: dmgreen, efriedma Differential Revision: https://reviews.llvm.org/D136244
This will allow recognizing Q.31 multiplications on vectors that are multiples of HVX vectors. At the moment this comes at the expense of Q.15 multiplications, which now are handled as 32-bit multiplications with shifts. In the longer term this will likely be replaced by a different scheme of "legalizing" vectors, which is necessary for idiom recognition, at least where using direct HVX intrinsics is desired.
…be overridden Fortran famously allows a generic interface definition to share a scope with a procedure or derived type of the same name. When that shadowed name is accessed via host or USE association, but is also defined by an interface in the generic, then name resolution needs to fix up the representation of the shadowing so that the new interface definition is seen as the shadowed symbol -- the host or USE associated name is not material to the situation. See the new test case for particular examples. Differential Revision: https://reviews.llvm.org/D136891
It can produce some dead code, which is harmless in the end, but breaks expensive checks when unreported. This should be fixed eventually, but it's a low priority.
Replace custom code to check if only the first lane is used by generic helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few additional cases and was suggested in D133760. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D136368
Some intrinsic functions can handle NULL() as an actual argument; most can't. Distinguish the two with a new ArgFlag facility in the intrinsic procedure table. Also transform some confusing Optionality codes that were standing in for flags into ArgFlags. Last, return false for a NULL() pointer from the utility IsActuallyConstant(), ensuring that it can still deal with a nested NULL() for components in structure constructors. Differential Revision: https://reviews.llvm.org/D136893
This reverts commit 17eb198. Reverted for investigation, because ClangDriverTests failed on some builders.
ObjectLinkingLayer attempts to claim responsibility for weak definitions that are present in LinkGraphs, but not present in the corresponding MaterializationResponsibility object. Where such a claim is successful, the symbol should be marked as live to prevent it from being dead stripped. (For the curious: Such "late-breaking" definitions are introduced somewhere in the materialization pipeline after the initial responsibility set is calculated. The usual source is the compiler or assembler. Examples of common late-breaking definitions include personality pointers, e.g. "DW.ref.__gxx_personality_v0", and named constant pool entries, e.g. __realXX..XX.) The failure to mark these symbols live caused few problems in practice because late-breaking definitions are usually anchored by existing live definitions within the graph (e.g. DW.ref.__gxx_personality_v0 is transitively referenced by functions via eh-frame records), and so they usually survived dead-stripping anyway. This accidental persistence isn't a principled solution though, and it fails altogether if a late-breaking definition is not otherwise referenced by the graph, with the result that the now-claimed symbol is stripped, triggering a "Failed to materialize symbols" error in ORC. Marking such symbols live is the correct solution. No testcase, as it's difficult to construct a situation where a late-breaking definition is inserted without being referenced outside the context of new backend bringup or plugin-specific shenanigans. See discussion in https://reviews.llvm.org/D133452 and https://reviews.llvm.org/D136877.
Add an explicit empty initializer to a new struct member definition to silence warnings from clang 16 about missing initializers.
When a multi-statement construct should end with a particular END statement like "END SELECT", and that construct's END statement is missing or unrecognizable, the error recovery productions should not misinterpret a program unit END statement that follows and consume it as a misspelled construct END statement. Doing so leads to cascading errors or a failed parse. Differential Revision: https://reviews.llvm.org/D136896
LEN_TRIM's folding is currently based on VERIFY(), and it is kind of slow for the very large CHARACTER arguments that can show up in artificial test suites. Rewrite in terms of single-character accesses. Differential Revision: https://reviews.llvm.org/D136901
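As a rough illustration of what "single-character accesses" means here, LEN_TRIM can be computed by scanning backwards from the end of the value until a non-blank character is found. This is a minimal sketch and not the code from the patch: `lenTrim` is a hypothetical name, and `std::string` stands in for the folded CHARACTER constant.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of `s` without its trailing blanks, in the spirit of LEN_TRIM.
// Touches each trailing character once, so the cost is linear in the
// number of trailing blanks.
std::size_t lenTrim(const std::string &s) {
  std::size_t n = s.size();
  while (n > 0 && s[n - 1] == ' ')
    --n;
  // Postcondition: the result is zero or ends at a non-blank character.
  assert(n == 0 || s[n - 1] != ' ');
  return n;
}
```

Leading and embedded blanks are kept; only the trailing run is excluded, so `lenTrim(" a b ")` counts the first four characters.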
Diagnose attempts to use a non-polymorphic instance of an abstract derived type. Differential Revision: https://reviews.llvm.org/D136902
…in tests. In D136659 I found a few tests that write through readonly parameters: * Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it readonly. I removed the readonly annotation. * CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly %arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and @store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed readonly from all three. Also, I added some CHECK-LABEL directives to make it harder for FileCheck output to be mixed up. * Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll: @gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the readonly attribute. * Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes through the readonly %P1 and %P2. Also, the corresponding C code in the comment didn't match the test. I removed the readonly attribute from both parameters and corrected the C code. Differential Revision: https://reviews.llvm.org/D136880
…odRefInfoMask(). This commit adds some tests in preparation for D136659, which allows alias analysis to treat locally-invariant memory pointed to by readonly noalias pointers the same as globally-invariant memory in some cases. The existing behavior for these tests is marked as expected and will be changed when that diff lands. Differential Revision: https://reviews.llvm.org/D136993
This prepares for an upcoming change to make --print-imm-hex the default behavior of llvm-objdump. These tests were updated in a semi-automatic fashion. See D136972 for details.
Rewrite a correct use of "&" -- conjunction without short-circuiting -- from a recent patch into multiple lines so that clang doesn't warn about it.
Emit a warning when the result of folding a call to ABS() with a complex argument results in an overflow. Differential Revision: https://reviews.llvm.org/D136904
…nter components When a derived type has a procedure pointer component with no interface, we can't do a lot of checking on its call sites, but we can at least require that the same procedure pointer component be used consistently as either a function or as a subroutine, but not both. Differential Revision: https://reviews.llvm.org/D136905
No description provided.