@virnarula
Owner

No description provided.

@fhahn fhahn left a comment

Thanks for sharing this! Could you run clang-format across all the files you added?

nit: stray new line

Would be good to comment and add a more informative name.

@virnarula virnarula marked this pull request as draft August 16, 2022 22:57
llvmgnsyncbot and others added 27 commits November 2, 2022 12:13
Currently MachineCSE forbids PRE when the instruction reads a physical
register. Relax this so that it's allowed when the value being read is
the same as what would be read in the place the instruction would be
hoisted to.

This is being done in preparation for adding FPCR handling to the
AArch64 backend, in order to prevent it from worsening the
generated code, but for targets that already have a similar register
it should improve things.

This patch affects code generation in several tests. The new code
looks better except for in Thumb2/LowOverheadLoops/memcall.ll where
we perform PRE but the LowOverheadLoops transformation then undoes
it. Also in AMDGPU/selectcc-opt.ll the CHECK makes things look worse,
but actually the function as a whole is better (as a MOV is PRE'd).

Differential Revision: https://reviews.llvm.org/D136675
template-template parameters. Although it affects whether a template can be
used as an argument for another template, the constraint seems not to
be checked, nor do other major implementations (GCC, MSVC, et al.) check it.

Additionally, Part-A of the document seems to have been implemented.
So mark P0857R0 as completed.

Differential Revision: https://reviews.llvm.org/D134128
This is copying the code that was added for 'add' with D130075.
(That patch removed a fallthrough in the cases, but we can
probably still share at least some code again as a follow-up
cleanup, but I didn't want to risk it here.)

The reasoning is similar to the carry propagation for 'add':
if we don't demand low bits of the subtraction and the
subtrahend (aka RHS or operand 1) is known zero in those low
bits, then there can't be any borrowing required from the
higher bits of operand 0, so the low bits don't matter.

Also, the no-wrap flags can be propagated (and I think that
should be true for add too).

Here's an attempt to prove that in Alive2:
https://alive2.llvm.org/ce/z/xqh7Pa
(can add nsw or nuw to src and tgt, and it should still pass)
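
The borrow argument above can be checked on concrete values. The following is a minimal sketch in plain 32-bit unsigned arithmetic, not the InstCombine code; the helper names `subHighBits` and `clearLowBits` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bits of (a - b) above the low k bits, i.e. the bits that are still
// demanded when the low k bits of the result are not.
constexpr uint32_t subHighBits(uint32_t a, uint32_t b, unsigned k) {
  assert(k < 32 && "shift amount must be smaller than the bit width");
  return (a - b) >> k;
}

// Clears the low k bits of v, modelling "the low bits of operand 0
// don't matter".
constexpr uint32_t clearLowBits(uint32_t v, unsigned k) {
  assert(k < 32 && "shift amount must be smaller than the bit width");
  return v & ~((uint32_t{1} << k) - 1);
}

// The subtrahend's low 12 bits are zero, so no borrow can cross bit 12:
// dropping the low bits of operand 0 leaves the demanded bits unchanged.
static_assert(subHighBits(0x12345678u, 0x00ABC000u, 12) ==
                  subHighBits(clearLowBits(0x12345678u, 12), 0x00ABC000u, 12),
              "low bits of operand 0 are irrelevant");

// Counterexample: the subtrahend has a low bit set, so a borrow can
// propagate upward and the low bits of operand 0 do matter.
static_assert(subHighBits(0x1000u, 0x1u, 12) != subHighBits(0x1001u, 0x1u, 12),
              "a borrow changes the high bits");
```

The second assertion shows why the known-zero condition on the subtrahend is needed: without it, a single low bit of operand 0 decides whether a borrow reaches the demanded bits.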

Differential Revision: https://reviews.llvm.org/D136788
…ide loops.

When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest, (b) if they are we are accounting
for it multiple times. Instead we should be adjusting the bonus before
traversing the user chain.
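
To illustrate the double counting, here is a toy model, not the FunctionSpecialization code. `ToyUser`, `kAvgLoopIterations`, and both function names are hypothetical; each node has a cost, a loop depth, and a list of users.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy stand-in for an instruction: its own cost, the depth of the loop nest
// it sits in, and the users reached when following the chain.
struct ToyUser {
  double cost;
  unsigned loopDepth;
  std::vector<ToyUser> users;
};

// Hypothetical average trip count used to weight code inside loops.
constexpr double kAvgLoopIterations = 10.0;

// Weight applied after the traversal: the users' bonuses, which are already
// weighted, get scaled again by this node's loop depth.
double bonusScaledAfterTraversal(const ToyUser &u) {
  assert(u.cost >= 0.0 && "costs are non-negative in this toy model");
  double bonus = u.cost;
  for (const ToyUser &user : u.users)
    bonus += bonusScaledAfterTraversal(user);
  return bonus * std::pow(kAvgLoopIterations, static_cast<double>(u.loopDepth));
}

// Weight applied before the traversal: only this node's own cost is scaled.
double bonusScaledBeforeTraversal(const ToyUser &u) {
  assert(u.cost >= 0.0 && "costs are non-negative in this toy model");
  double bonus =
      u.cost * std::pow(kAvgLoopIterations, static_cast<double>(u.loopDepth));
  for (const ToyUser &user : u.users)
    bonus += bonusScaledBeforeTraversal(user);
  return bonus;
}
```

For a chain of two unit-cost users that both sit at loop depth 1, the first function weights the inner user twice, while the second weights each user once.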

This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled without actually reducing the amount of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).

Differential Revision: https://reviews.llvm.org/D136692
It has been split into several zb* extensions, and `b` was dropped after
0.93, so it is time to retire it, like the other non-ratified zb* extensions.

Currently clang accepts it with a warning:

$ clang -target riscv64-elf ~/hello.c -S  -march=rv64gcb
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)
'+b' is not a recognized feature for this target (ignoring feature)

Reviewed By: asb, luismarques

Differential Revision: https://reviews.llvm.org/D136812
readelf --section-details displays ch_type/ch_size/ch_addralign for
a SHF_COMPRESSED section. Port the feature. There is a small difference
that readelf doesn't display `[<corrupt>]` for an empty section while
we do.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D136636
Currently any errors during pipeline parsing are reported to stderr.
This adds a new pipeline parsing function to the C api that reports
errors through a callback, and updates the python bindings to use it.
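
As a rough illustration of the callback style, here is a self-contained toy parser. It is not the MLIR C API: `ErrorCallback`, `parseToyPipeline`, `collectIntoString`, and the set of known pass names are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Hypothetical callback type: receives a message chunk plus opaque user data.
using ErrorCallback = void (*)(const char *msg, std::size_t len, void *userData);

// Example callback that appends each message to a std::string.
void collectIntoString(const char *msg, std::size_t len, void *userData) {
  static_cast<std::string *>(userData)->append(msg, len);
}

// Parses a comma-separated list of pass names. Instead of printing to stderr,
// the first error is handed to the caller through the callback.
bool parseToyPipeline(const std::string &pipeline, ErrorCallback onError,
                      void *userData) {
  assert(onError && "an error callback is required in this sketch");
  static const std::set<std::string> known = {"canonicalize", "cse"};
  std::istringstream in(pipeline);
  std::string pass;
  while (std::getline(in, pass, ',')) {
    if (known.count(pass) == 0) {
      const std::string msg = "unknown pass '" + pass + "'";
      onError(msg.data(), msg.size(), userData);
      return false;
    }
  }
  return true;
}
```

With this shape, a language binding can turn the collected message into a native exception rather than relying on whatever was written to stderr.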

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D136402
LSR may suggest a less profitable transformation of the loop. This
patch adds a check to prevent LSR from generating worse code than what
we already have.

Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and is disabled by default for now.

The next step should be extending a TTI interface to allow target(s)
to enable this enhancement.

A debug log is added to tell the user when the LSR solution is
skipped.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D126043
This adds the fgets function and its unit tests.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D136785
DXContainer files contain a part that has an MD5 of the generated
shader. This adds support to the ObjectYAML tooling to expand the hash
part data and hash itself in preparation for adding hashing support to
DirectX code generation.

Reviewed By: python3kgae

Differential Revision: https://reviews.llvm.org/D136632
This adds a new function for creating pass managers that takes an
argument for the anchor string.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D136404
The -fstrict-flex-arrays=3 is the most restrictive type of flex arrays.
No number, including 0, is allowed in the FAM. In the cases where a "0"
is used, the resulting size is the same as if a zero-sized object were
substituted.

This is needed for proper _FORTIFY_SOURCE coverage in the Linux kernel,
among other reasons. So while the only reason for specifying a
zero-length array at the end of a structure is to specify a FAM,
treating it as such will cause _FORTIFY_SOURCE not to work correctly;
__builtin_object_size will report -1 instead of 0 for a destination
buffer size to keep any kernel internals from using the deprecated
members as fake FAMs.

For example:

  struct broken {
      int foo;
      int fake_fam[0];
      struct something oops;
  };

There have been bugs where the above struct was created because "oops"
was added after "fake_fam" by someone who did not realize it was a fake
FAM. Under _FORTIFY_SOURCE, doing:

  memcpy(p->fake_fam, src, len);

raises no warnings when __builtin_object_size(p->fake_fam, 1) returns -1
and may stomp on "oops."

Omitting a warning when using the (invalid) zero-length array is how GCC
treats -fstrict-flex-arrays=3. A warning in that situation is likely an
irritant, because requesting this option level is explicitly requesting
this behavior.
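
Why the reported size matters can be shown with a simplified model of a fortified copy check. This is a sketch, not glibc's or the kernel's actual code; `kUnknownObjectSize` and `fortifiedCopyIsRejected` are hypothetical names. The key assumption is that a reported size of (size_t)-1 means "unknown" and therefore disables the check.

```cpp
#include <cstddef>

// Sentinel meaning "object size unknown", i.e. (size_t)-1.
constexpr std::size_t kUnknownObjectSize = static_cast<std::size_t>(-1);

// Simplified model of a fortified copy: the copy is rejected only when the
// destination size is known and smaller than the requested length.
constexpr bool fortifiedCopyIsRejected(std::size_t dstObjectSize,
                                       std::size_t len) {
  if (dstObjectSize == kUnknownObjectSize)
    return false; // size unknown: no check is possible
  return len > dstObjectSize;
}

// Reported size -1 (unknown): the overflowing copy goes through unchecked.
static_assert(!fortifiedCopyIsRejected(kUnknownObjectSize, 16),
              "unknown size disables the check");
// Reported size 0: any non-empty copy into the zero-length array is caught.
static_assert(fortifiedCopyIsRejected(0, 16), "known size 0 rejects the copy");
```

Under this model, reporting 0 for the zero-length member is what lets the memcpy into "fake_fam" be caught before it overwrites "oops".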

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

Differential Revision: https://reviews.llvm.org/D134902
This patch adds a new infrastructure for OpenMP target plugins. It also implements the CUDA and GenericELF64bit plugins under this new infrastructure. We place the sources in a separate directory named plugins-nextgen, and we build the new plugins as different plugin libraries. The original plugins, which remain untouched, will be used by default. However, the user can change this behavior at run-time through the boolean envar LIBOMPTARGET_NEXTGEN_PLUGINS. If enabled, the libomptarget will try to load the NextGen version of each plugin, falling back to the original if they are not present or valid.

The idea of this new plugin infrastructure is to implement the common parts of target plugins in generic classes (defined in files inside the plugins-nextgen/common/PluginInterface folder), and then each specific plugin defines its own classes inheriting from the common ones. In this way, most logic remains in the common interface, which reduces the plugin-specific source code. It is also beneficial in the sense that most code and behavior are now the same across the different plugins. As an example, we define classes for a plugin, a device, a device image, a stream manager, etc. The plugin object (a single instance per plugin library) holds different device objects (i.e., one per available device), while each device object is responsible for managing its own resources.
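
The layering described above might look roughly like the following. This is only a sketch of the shape; every class name here (`ToyDevice`, `ToyPlugin`, `ToyCudaDevice`, `ToyCudaPlugin`) is hypothetical and does not mirror the real PluginInterface classes or their signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Generic device: common behavior would live here; each device object is
// responsible for its own resources.
class ToyDevice {
public:
  virtual ~ToyDevice() = default;
  virtual std::string name() const = 0;
};

// Generic plugin: one instance per plugin library, holding one device object
// per available device.
class ToyPlugin {
public:
  virtual ~ToyPlugin() = default;
  std::size_t numDevices() const { return devices_.size(); }
  const ToyDevice &device(std::size_t i) const {
    assert(i < devices_.size() && "device index out of range");
    return *devices_[i];
  }

protected:
  void addDevice(std::unique_ptr<ToyDevice> d) {
    devices_.push_back(std::move(d));
  }

private:
  std::vector<std::unique_ptr<ToyDevice>> devices_;
};

// A specific plugin only supplies what differs from the generic classes.
class ToyCudaDevice final : public ToyDevice {
public:
  explicit ToyCudaDevice(int id) : id_(id) {}
  std::string name() const override {
    return "toy-cuda:" + std::to_string(id_);
  }

private:
  int id_;
};

class ToyCudaPlugin final : public ToyPlugin {
public:
  explicit ToyCudaPlugin(int deviceCount) {
    for (int i = 0; i < deviceCount; ++i)
      addDevice(std::make_unique<ToyCudaDevice>(i));
  }
};
```

The point of the split is that bookkeeping such as the device list is written once in the generic base, and a new plugin adds only its device-specific overrides.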

Most code on this patch is based on the changes made by @jdoerfert (Johannes Doerfert)

Reviewed By: jhuber6, jdoerfert

Differential Revision: https://reviews.llvm.org/D134396
This prepares for a subsequent revision that will generalize
the insertion code generation. Similar to the support lib,
insertions become much easier to perform with some "cursor"
bookkeeping. Note that we, in the long run, could perhaps
avoid storing the "cursor" permanently and use some
restricted-scope solution (alloca?) instead. However,
that puts harder restrictions on insertion-chain operations,
so for now we follow the more straightforward approach.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D136800
…structions

This patch adds the assembly/disassembly for the following instructions:

pext (predicate) : Set predicate from predicate-as-counter
ptrue (predicate-as-counter) : Initialise predicate-as-counter to all active

This patch also introduces the predicate-as-counter registers pn8, etc.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09

Differential Revision: https://reviews.llvm.org/D136678
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
RKSimon and others added 29 commits November 2, 2022 12:13
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.

Fixes llvm#57872

Differential Revision: https://reviews.llvm.org/D136042
Enable fuzzing these independently. Currently still not linking in
dialects beyond Builtin.
Reported by the (experimental) arc buildbot after D136042
Previously an error raised during an expansion of response files (including
configuration files) was ignored and only the fact of its presence was
reported to the user with generic error messages. This made it difficult to
analyze problems. For example, if a configuration file tried to read an
nonexistent file, the error message said that 'configuration file cannot
be found', which is wrong and misleading.

This change enhances handling errors in the expansion so that users
could get more informative error messages.

Differential Revision: https://reviews.llvm.org/D136090
SLM (silvermont) doesn't support any AVX instructions
There's never been a 512-bit vdpps instruction (and the implementation is so convoluted there probably won't ever be)
There are at least 2 other potential patterns that could go here.
X * ((1 << Z) + 1) --> (X << Z) + X

https://alive2.llvm.org/ce/z/P-7WK9

It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.

The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap rather than the
gymnastics required from the raw create instruction APIs.
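
The fold itself can be sanity-checked in wrapping 32-bit arithmetic. This is a minimal sketch of the identity, not the InstCombine implementation; `mulForm` and `shlAddForm` are illustrative names, and no-wrap flags are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: X * ((1 << Z) + 1), modulo 2^32.
constexpr uint32_t mulForm(uint32_t x, unsigned z) {
  assert(z < 32 && "shift amount must be smaller than the bit width");
  return x * ((uint32_t{1} << z) + 1u);
}

// Right-hand side of the fold: (X << Z) + X, modulo 2^32.
constexpr uint32_t shlAddForm(uint32_t x, unsigned z) {
  assert(z < 32 && "shift amount must be smaller than the bit width");
  return (x << z) + x;
}

static_assert(mulForm(7u, 3) == 63u, "7 * 9");
static_assert(mulForm(7u, 3) == shlAddForm(7u, 3), "small values agree");
// The identity also holds when the arithmetic wraps around.
static_assert(mulForm(0xDEADBEEFu, 31) == shlAddForm(0xDEADBEEFu, 31),
              "wrapping values agree");
```

Both forms compute X * 2^Z + X modulo 2^32, which is why they agree even when the result overflows.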
…equality with 0

Fixes 1st issue of llvm#58061
Fixes the crash of llvm#58675

Reviewed By: dmgreen, efriedma
Differential Revision: https://reviews.llvm.org/D136244
This will allow recognizing Q.31 multiplications on vectors that are
multiples of HVX vectors. At the moment this comes at the expense of
Q.15 multiplications, which now are handled as 32-bit multiplications
with shifts.
In the longer term this will likely be replaced by a different scheme
of "legalizing" vectors, which is necessary for idiom recognition, at
least where using direct HVX intrinsics is desired.
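
For readers unfamiliar with the Q formats, here is a scalar sketch of the two multiplications in their plain truncating form. It is an assumption-laden illustration, not the Hexagon lowering: `q31Mul` and `q15Mul` are made-up names, and neither rounding nor saturation is modelled, although real fixed-point instructions often provide one or both.

```cpp
#include <cstdint>

// Q.31: 32-bit values with 31 fractional bits. The product needs a 64-bit
// intermediate and is shifted right by 31 to return to Q.31.
constexpr int32_t q31Mul(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 31);
}

// Q.15: 16-bit values with 15 fractional bits. Written here as a 32-bit
// multiplication followed by a shift.
constexpr int16_t q15Mul(int16_t a, int16_t b) {
  return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> 15);
}

// 0.5 * 0.5 == 0.25 in both formats.
static_assert(q31Mul(0x40000000, 0x40000000) == 0x20000000, "Q.31 half squared");
static_assert(q15Mul(0x4000, 0x4000) == 0x2000, "Q.15 half squared");
```

Before C++20, right-shifting a negative signed value is implementation-defined, so the sketch is only checked here on non-negative inputs.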
…be overridden

Fortran famously allows a generic interface definition to share a
scope with a procedure or derived type of the same name.  When that
shadowed name is accessed via host or USE association, but is also
defined by an interface in the generic, then name resolution needs
to fix up the representation of the shadowing so that the new interface
definition is seen as the shadowed symbol -- the host or USE associated
name is not material to the situation.  See the new test case for
particular examples.

Differential Revision: https://reviews.llvm.org/D136891
It can produce some dead code, which is harmless in the end, but breaks
expensive checks when unreported. This should be fixed eventually, but
it's a low priority.
Replace custom code to check if only the first lane is used by generic
helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few
additional cases and was suggested in D133760.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136368
Some intrinsic functions can handle NULL() as an actual argument; most
can't.  Distinguish the two with a new ArgFlag facility in the intrinsic
procedure table.  Also transform some confusing Optionality codes that
were standing in for flags into ArgFlags.  Last, return false for a NULL()
pointer from the utility IsActuallyConstant(), ensuring that it can
still deal with a nested NULL() for components in structure constructors.

Differential Revision: https://reviews.llvm.org/D136893
This reverts commit 17eb198.
Reverted for investigation, because ClangDriverTests failed on some builders.
ObjectLinkingLayer attempts to claim responsibility for weak definitions that
are present in LinkGraphs, but not present in the corresponding
MaterializationResponsibility object. Where such a claim is successful, the
symbol should be marked as live to prevent it from being dead stripped.

(For the curious: Such "late-breaking" definitions are introduced somewhere in
the materialization pipeline after the initial responsibility set is calculated.
The usual source is the compiler or assembler. Examples of common late-breaking
definitions include personality pointers, e.g. "DW.ref.__gxx_personality_v0",
and named constant pool entries, e.g. __realXX..XX.)

The failure to mark these symbols live caused few problems in practice because
late-breaking definitions are usually anchored by existing live definitions
within the graph (e.g. DW.ref.__gxx_personality_v0 is transitively referenced by
functions via eh-frame records), and so they usually survived dead-stripping
anyway. This accidental persistence isn't a principled solution though, and it
fails altogether if a late-breaking definition is not otherwise referenced by
the graph, with the result that the now-claimed symbol is stripped triggering a
"Failed to materialize symbols" error in ORC. Marking such symbols live is the
correct solution.

No testcase, as it's difficult to construct a situation where a late-breaking
definition is inserted without being referenced outside the context of new
backend bringup or plugin-specific shenanigans.

See discussion in https://reviews.llvm.org/D133452 and
https://reviews.llvm.org/D136877.
Add an explicit empty initializer to a new struct member definition
to silence warnings from clang 16 about missing initializers.
When a multi-statement construct should end with a particular END statement
like "END SELECT", and that construct's END statement is missing or
unrecognizable, the error recovery productions should not misinterpret
a program unit END statement that follows and consume it as a misspelled
construct END statement. Doing so leads to cascading errors or a failed parse.

Differential Revision: https://reviews.llvm.org/D136896
LEN_TRIM's folding is currently based on VERIFY(), and it is kind of
slow for the very large CHARACTER arguments that can show up in artificial
test suites.  Rewrite in terms of single-character accesses.
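
A scan based on single-character accesses could look like this. It is a minimal sketch, not flang's folding code; `lenTrim` is an illustrative name, and the string is passed as a pointer plus explicit length because Fortran CHARACTER values are not NUL-terminated.

```cpp
#include <cstddef>

// Length of the first n characters of s with trailing blanks removed,
// found by walking backwards one character at a time.
constexpr std::size_t lenTrim(const char *s, std::size_t n) {
  while (n > 0 && s[n - 1] == ' ')
    --n;
  return n;
}

static_assert(lenTrim("abc  ", 5) == 3, "trailing blanks are dropped");
static_assert(lenTrim("a b ", 4) == 3, "embedded blanks are kept");
static_assert(lenTrim("   ", 3) == 0, "an all-blank string trims to zero");
```

The loop touches only the trailing blanks plus one more character, so its cost depends on the amount of padding rather than on the full length of the argument.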

Differential Revision: https://reviews.llvm.org/D136901
Diagnose attempts to use a non-polymorphic instance of an
abstract derived type.

Differential Revision: https://reviews.llvm.org/D136902
…in tests.

In D136659 I found a few tests that write through readonly parameters:

* Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it
readonly. I removed the readonly annotation.

* CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly
%arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and
@store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed
readonly from all three. Also, I added some CHECK-LABEL directives to make it
harder for FileCheck output to be mixed up.

* Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll:
@gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the
readonly attribute.

* Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes
through the readonly %P1 and %P2. Also, the corresponding C code in the comment
didn't match the test. I removed the readonly attribute from both parameters
and corrected the C code.

Differential Revision: https://reviews.llvm.org/D136880
…odRefInfoMask().

This commit adds some tests in preparation for D136659, which allows alias
analysis to treat locally-invariant memory pointed to by readonly noalias
pointers the same as globally-invariant memory in some cases. The existing
behavior for these tests is marked as expected and will be changed when that
diff lands.

Differential Revision: https://reviews.llvm.org/D136993
This prepares for an upcoming change to make --print-imm-hex the default
behavior of llvm-objdump. These tests were updated in a semi-automatic
fashion.

See D136972 for details.
Rewrite a correct use of "&" -- conjunction without short-circuiting --
from a recent patch into multiple lines so that clang doesn't warn
about it.
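
The kind of rewrite described here can be sketched as follows. The functions are hypothetical, and the assumption is that the original one-line form was something like `check(a) & check(b)`, where both operands must always be evaluated.

```cpp
// Hypothetical check that has a side effect (it counts its evaluations), so
// skipping the second call through short-circuiting would change behavior.
constexpr bool isPositive(int v, int &evaluations) {
  ++evaluations;
  return v > 0;
}

// Multi-line form: each operand is evaluated in its own statement, then the
// results are combined. Nothing is short-circuited, and no bitwise operator
// is applied to bool operands.
constexpr bool bothPositive(int a, int b, int &evaluations) {
  const bool lhs = isPositive(a, evaluations);
  const bool rhs = isPositive(b, evaluations);
  return lhs && rhs;
}

// Helper for checking how many operands were evaluated.
constexpr int evaluationsFor(int a, int b) {
  int n = 0;
  bothPositive(a, b, n);
  return n;
}

// Both operands are evaluated even when the first one is already false.
static_assert(evaluationsFor(-1, 5) == 2, "no short-circuiting");
```

Writing `isPositive(a, n) && isPositive(b, n)` instead would skip the second call whenever the first is false, which is the behavior the non-short-circuiting form is meant to avoid.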
Emit a warning when the result of folding a call to
ABS() with a complex argument results in an overflow.
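
The overflow arises because the magnitude of a complex value can exceed the largest finite real even when both components are finite. Here is a minimal sketch using single-precision `std::complex`, which is an assumption standing in for a default-kind COMPLEX; `absOverflows` is an illustrative name, not flang's.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// True when both components are finite but the magnitude is not
// representable as a finite float.
bool absOverflows(std::complex<float> z) {
  // NaN inputs are out of scope for this sketch.
  assert(!std::isnan(z.real()) && !std::isnan(z.imag()));
  const bool finiteInput = std::isfinite(z.real()) && std::isfinite(z.imag());
  return finiteInput && std::isinf(std::abs(z));
}
```

For instance, a value whose real and imaginary parts are both 3e38 has a magnitude of roughly 4.2e38, which is above the single-precision maximum of about 3.4e38.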

Differential Revision: https://reviews.llvm.org/D136904
…nter components

When a derived type has a procedure pointer component with no interface,
we can't do a lot of checking on its call sites, but we can at least require
that the same procedure pointer component be used consistently as either
a function or as a subroutine, but not both.

Differential Revision: https://reviews.llvm.org/D136905
virnarula pushed a commit that referenced this pull request Apr 24, 2024