@abhinavgaba commented Jul 16, 2025

This is the initial clang change to support using the ATTACH map-type for pointer attachment.

This builds upon the following:

For example, given the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang (each entry: base-pointer, begin-pointer, size, map-type):

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) performs no pointer attachment at all, while (C) also maps the
pointer p itself; both are incorrect.
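
To make the (A) layout concrete, here is a minimal C++ sketch that builds the two entries for map(p[lo:len]) on an int *p. MapEntry, attachStyleMaps, and the flag values are hypothetical names chosen for illustration; they are not clang's real map-type encodings or data structures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flag values for illustration; NOT clang's real map-type encodings.
enum MapType : unsigned { TO = 1u << 0, FROM = 1u << 1, ATTACH = 1u << 2 };

// One entry as listed in (A): base-pointer, begin-pointer, size, map-type.
struct MapEntry {
  const void *Base;
  const void *Begin;
  std::size_t Size;
  unsigned Type;
};

// Builds (A)-style entries for map(p[Lo:Len]) where p is an int *.
// P is taken by reference so that &P is the address of the caller's pointer.
std::vector<MapEntry> attachStyleMaps(int *const &P, std::size_t Lo,
                                      std::size_t Len) {
  return {
      // The pointee section itself, copied to and from the device.
      {&P[0], &P[Lo], Len * sizeof(P[Lo]), TO | FROM},
      // The attachment: asks the runtime to make the device copy of p point
      // to the device copy of the section; it does not copy the section.
      {&P, &P[Lo], sizeof(P), ATTACH},
  };
}
```

Note that the ATTACH entry's size is sizeof(p) rather than the section size, which is what distinguishes it from the PTR_AND_OBJ entry in (C).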


With this change, we use ATTACH-style maps, like (A), whenever the map expression has a base pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

We also group the maps for clauses that share the same base declaration, ordering them by increasing complexity of their attach pointers. For example, given:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                // attach-ptr: spp
  map(spp),                   // attach-ptr: N/A

We first map spp, then spp[0], then spp[0][0] and spp[0][0].a.

This also lets us group "struct" allocations by their attach pointers.
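
The ordering and grouping can be sketched as below, assuming each operand has already been annotated with its attach pointer and a "depth" (how many components the attach-pointer expression has). MapOperand, orderByAttachPtrComplexity, and groupByAttachPtr are hypothetical names, and modeling attach pointers as strings is a simplification; this is not clang's implementation.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical summary of one map-clause operand sharing a base declaration.
struct MapOperand {
  std::string Expr;      // the mapped expression, e.g. "spp[0][0].a"
  std::string AttachPtr; // its attach pointer, e.g. "spp[0]"; empty if none
  int AttachPtrDepth;    // 0 = no attach pointer, "spp" = 1, "spp[0]" = 2, ...
};

// Orders operands by increasing attach-pointer complexity. Ties on depth are
// broken by the attach-pointer text so equal attach pointers end up adjacent;
// std::stable_sort keeps clause order among operands with the same attach
// pointer (e.g. spp[0][0] before spp[0][0].a).
void orderByAttachPtrComplexity(std::vector<MapOperand> &Ops) {
  std::stable_sort(Ops.begin(), Ops.end(),
                   [](const MapOperand &A, const MapOperand &B) {
                     if (A.AttachPtrDepth != B.AttachPtrDepth)
                       return A.AttachPtrDepth < B.AttachPtrDepth;
                     return A.AttachPtr < B.AttachPtr;
                   });
}

// Splits the ordered operands into groups that share one attach pointer.
std::vector<std::vector<MapOperand>>
groupByAttachPtr(const std::vector<MapOperand> &Ordered) {
  std::vector<std::vector<MapOperand>> Groups;
  for (const MapOperand &Op : Ordered) {
    if (Groups.empty() || Groups.back().front().AttachPtr != Op.AttachPtr)
      Groups.emplace_back();
    Groups.back().push_back(Op);
  }
  return Groups;
}
```

With the spp operands from the example in clause order, the sort yields spp, spp[0], spp[0][0], spp[0][0].a, and grouping produces three groups, the last holding both spp[0][0] and spp[0][0].a.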

Cases that need handling:

  • When a class member like p is used as a base pointer in a map inside a member function of the same class, p is not privatized. Instead, we still try to create an implicit map of this[0:1] and access p through it, which is incorrect. For example:
      struct S {
        int *p;
        void f1() {
          #pragma omp target data map(p[0:1])
          printf("%p %p\n", &p, p);
        }
      };
  • Attach-style maps for declare mappers; that should be a separate PR.
  • The use_device_addr clause does not work properly: it does not have a proper component list set up, only a single component, so we cannot find the correct attach-ptr. Instead, for use_device_addr we should match existing maps whose attach-ptr matches the attach-ptr of the use_device_addr operand.
  • use_device_ptr handling also has some issues that need debugging.
  • Other issues that haven't been found yet.
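
The proposed matching for use_device_addr (find existing maps whose attach-ptr equals the operand's) can be sketched as a simple filter. MapInfo and findMapsWithAttachPtr are hypothetical, attach pointers are again modeled as strings, and treating an empty attach pointer as "no match" is a design assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of a map that has already been emitted.
struct MapInfo {
  std::string Expr;      // the mapped expression, e.g. "ps->a"
  std::string AttachPtr; // its attach pointer, e.g. "ps"; empty if none
};

// Returns the indices of existing maps whose attach pointer matches that of
// the use_device_addr operand. An empty attach pointer never matches.
std::vector<std::size_t>
findMapsWithAttachPtr(const std::vector<MapInfo> &Maps,
                      const std::string &OperandAttachPtr) {
  std::vector<std::size_t> Matches;
  if (OperandAttachPtr.empty())
    return Matches;
  for (std::size_t I = 0; I < Maps.size(); ++I)
    if (Maps[I].AttachPtr == OperandAttachPtr)
      Matches.push_back(I);
  return Matches;
}
```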

Some tests still haven't been updated. These include:

  Clang :: OpenMP/copy-gaps-1.cpp
  Clang :: OpenMP/copy-gaps-6.cpp
  Clang :: OpenMP/map_struct_ordering.cpp
  Clang :: OpenMP/target_data_use_device_addr_codegen.cpp
  Clang :: OpenMP/target_data_use_device_ptr_codegen.cpp
  Clang :: OpenMP/target_enter_data_codegen.cpp
  Clang :: OpenMP/target_enter_data_depend_codegen.cpp
  Clang :: OpenMP/target_exit_data_codegen.cpp
  Clang :: OpenMP/target_exit_data_depend_codegen.cpp
  Clang :: OpenMP/target_map_codegen_18c.cpp
  Clang :: OpenMP/target_map_codegen_18d.cpp
  Clang :: OpenMP/target_map_codegen_28.cpp
  Clang :: OpenMP/target_map_codegen_29.cpp
  Clang :: OpenMP/target_map_codegen_31.cpp
  Clang :: OpenMP/target_map_codegen_hold.cpp
  Clang :: OpenMP/target_map_deref_array_codegen.cpp
  Clang :: OpenMP/target_map_member_expr_codegen.cpp
  Clang :: OpenMP/target_update_codegen.cpp
  Clang :: OpenMP/target_update_depend_codegen.cpp

@abhinavgaba (author) commented:

The libomptarget code will disappear from this PR once llvm#149036 is merged.

  const ValueDecl *BaseDecl = nullptr, const Expr *MapExpr = nullptr,
  ArrayRef<OMPClauseMappableExprCommon::MappableExprComponentListRef>
      OverlappedElements = {},
  bool AreBothBasePtrAndPteeMapped = false) const {
@abhinavgaba (author) commented:

AreBothBasePtrAndPteeMapped was used to decide whether to emit PTR_AND_OBJ maps for something like map(p, p[0]). We no longer do that: we now map p and p[0] independently and attach them separately.
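
To illustrate what mapping independently and attaching separately means for map(p, p[0]), here is a toy C++ simulation. It is not libomptarget's logic: ToyDevice, mapTo, translate, and attach are hypothetical names, "device memory" is just host-side byte buffers, and only the TO direction is modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

// Toy "device": each mapped host range gets its own byte buffer, keyed by the
// host start address.
struct ToyDevice {
  std::map<std::uintptr_t, std::vector<unsigned char>> Buffers;

  // map(to: ...) for [HostBegin, HostBegin + Size): copies the host bytes.
  void mapTo(const void *HostBegin, std::size_t Size) {
    std::vector<unsigned char> Buf(Size);
    std::memcpy(Buf.data(), HostBegin, Size);
    Buffers.emplace(reinterpret_cast<std::uintptr_t>(HostBegin), std::move(Buf));
  }

  // Returns the "device" address for a host address inside a mapped range,
  // or nullptr if the address is not mapped.
  unsigned char *translate(const void *Host) {
    auto Addr = reinterpret_cast<std::uintptr_t>(Host);
    auto It = Buffers.upper_bound(Addr); // first range starting after Addr
    if (It == Buffers.begin())
      return nullptr;
    --It; // last range starting at or before Addr
    std::uintptr_t Offset = Addr - It->first;
    return Offset < It->second.size() ? It->second.data() + Offset : nullptr;
  }

  // ATTACH: writes the device address of the pointee into the device copy of
  // the pointer. Returns false if either side is not mapped.
  bool attach(const void *HostPtrAddr, const void *HostPointee) {
    unsigned char *DevPtr = translate(HostPtrAddr);
    unsigned char *DevPointee = translate(HostPointee);
    if (!DevPtr || !DevPointee)
      return false;
    std::memcpy(DevPtr, &DevPointee, sizeof(DevPointee));
    return true;
  }
};
```

In this model, mapping p and p[0] as two independent entries and then attaching them replaces the single PTR_AND_OBJ entry that AreBothBasePtrAndPteeMapped used to select. Before the attach step, the device copy of p still holds the host address.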

IgWod-IMG and others added 3 commits December 1, 2025 14:15
preames and others added 30 commits December 2, 2025 12:16

Reducing spurious diff in an upcoming change.

This moves a call inside an assert to avoid a warning about the result
variable being unused in release builds.
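
The pattern looks roughly like this; isSortedAscending and middleElement are hypothetical stand-ins, since the commit message does not name the real call. Moving the call inside assert is only safe because the call has no side effects: with NDEBUG defined, the whole expression disappears.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical side-effect-free check (non-static, so release builds do not
// warn about an unused function either).
bool isSortedAscending(const std::vector<int> &V) {
  for (std::size_t I = 1; I < V.size(); ++I)
    if (V[I - 1] > V[I])
      return false;
  return true;
}

int middleElement(const std::vector<int> &V) {
  // Before: `bool Sorted = isSortedAscending(V); assert(Sorted);` leaves
  // `Sorted` unused when NDEBUG is defined, which triggers a warning.
  // After: the call lives inside the assert and vanishes in release builds.
  assert(isSortedAscending(V) && "expected sorted input");
  assert(!V.empty() && "expected non-empty input");
  return V[V.size() / 2];
}
```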

This reverts commit e719e93.

Revert this since it caused a regression in our internal CI.

Deduction guides with host/device attributes have already been used in

https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocrand/library/src/rng/utils/cpp_utils.hpp#L249

```
template<class V>
__host__ __device__ vec_wrapper(V) -> vec_wrapper<V>;
```

…0358)

These two are very similar and simple, basically identical to
'seq', so this patch adds both of them together.

Adding the following dependencies to PluginScriptedProcess:
- "//lldb:CoreHeaders"
- "//lldb:SymbolHeaders"
- "//llvm:Support"

For c50802c

This upstreams the handler for the BI__builtin_constant_p function.

Commit b262785 introduced a separate `AnalysisFpExc` target to try to
work around the lack of a Bazel equivalent of per-source-file
properties. However, this introduces backref errors when
`--warn-backrefs` is enabled.

Instead, this change simply adds the `-ftrapping-math` copt to the
entire `Analysis` target.

Fix suggested by @rocallahan.

…del (llvm#168270)

The VPlan-based cost model assigns the forced cost once for a whole
VPInterleaveRecipe. Update the legacy cost model to match this behavior.
This fixes a cost-model divergence and assigns the cost in a way that
more accurately matches the generated code.

PR: llvm#168270

This clause is small and trivial: it simply sets a bool on the IR node,
so its implementation is straightforward. We create the Operation with
this value as 'false', and 'nohost' always sets it to true.

Remove a redundant duplicate computeCost call. NFC; this just skips an
unneeded call.

Shared memory for TMA operations needs to be aligned to 16. Add the
ability to set an alignment on the cuf.shared_memory operation.
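
For reference, placing a buffer at an alignment of 16 within a shared-memory region is the usual round-up computation. alignUp is a hypothetical helper working on byte offsets; it does not describe how the cuf.shared_memory operation is actually lowered.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Offset up to the next multiple of Align (Align must be a power of two).
std::uint64_t alignUp(std::uint64_t Offset, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Offset + Align - 1) & ~(Align - 1);
}
```

For example, alignUp(100, 16) returns 112, so a buffer that would otherwise start at offset 100 is placed at offset 112.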

…lvm#170350)

Updates `InitializeRequestArguments` to follow the spec correctly; see
https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Initialize.

This should correct which fields are tracked as optional, and it simplifies
some of the types so that they're meaningful (e.g. an
`optional<bool>` isn't any more helpful than a `bool`, since undefined and
false are basically equivalent, and it forces us to interpret undefined as the default value everywhere we use the `optional<bool>`).
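
A minimal C++ sketch of the `optional<bool>` simplification. ClientCapabilities, SupportsFeatureX, and parseCapabilities are hypothetical; the raw field is modeled as `std::optional<bool>` as it might come out of a JSON parser, and the real protocol types are not shown.

```cpp
#include <optional>

// Hypothetical raw field as parsed from JSON: absent (std::nullopt), true, or false.
using RawBoolField = std::optional<bool>;

// Since undefined and false are treated as equivalent, resolve the default once
// while parsing and store a plain bool, instead of making every consumer
// handle std::nullopt.
struct ClientCapabilities {
  bool SupportsFeatureX = false; // hypothetical capability name
};

ClientCapabilities parseCapabilities(RawBoolField FeatureX) {
  ClientCapabilities Caps;
  Caps.SupportsFeatureX = FeatureX.value_or(false); // absent => false
  return Caps;
}
```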

This change fixes a couple of issues with static resources:
- Enables assignment to static resource or resource array variables (fixes llvm#166458)
- Initializes static resources and resource arrays with default constructor that sets the handle to poison

llvm#170265)

* Add compatibility support for DP and REPORT macros
* Define a set of predefined debug types for libomptarget
* Start to update libomptarget files (OffloadRTL.cpp, device.cpp)

…lure (llvm#169918)

Use the standard GlobalISel error reporting (reportGISelFailure, with the
pass returning false) instead of llvm_unreachable.
This also enables -global-isel-abort=0 or 2 with -global-isel -new-reg-bank-select.
Note: -new-reg-bank-select with abort 0 or 2 runs LCSSA,
while the "intended use" (without abort, or with abort 1) does not run LCSSA.

…FC (llvm#132364)

Add some new test cases (including FIXMEs) to highlight some bugs
related to the lowering of llvm.objectsize.

One special case is getelementptr instructions whose index types
are larger than the index type size of the pointer being
analysed. This adds a couple of tests showing what happens with both
smaller and larger index types, and with out-of-bounds
indices (both too large and negative).
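
Under the LangRef rule that GEP indices are sign-extended or truncated to the width of the pointer's index type before being scaled, an i64 index applied to a pointer with a 32-bit index type behaves modulo 2^32. The hypothetical helper below only illustrates that arithmetic (ignoring inbounds and other flags); it is not code from the objectsize lowering.

```cpp
#include <cstdint>

// Byte offset contributed by an i64 GEP index when the pointer's index type
// is 32 bits wide: truncate the index to 32 bits, then scale by the element
// size with 32-bit wraparound (done in unsigned arithmetic to avoid UB).
std::int32_t gepByteOffset32(std::int64_t Index, std::int32_t ElemSize) {
  std::uint32_t Truncated = static_cast<std::uint32_t>(Index);
  std::uint32_t Offset = Truncated * static_cast<std::uint32_t>(ElemSize);
  return static_cast<std::int32_t>(Offset); // reinterpret as signed
}
```

So an index of 2^32 + 1 into an i32 array yields the same 4-byte offset as an index of 1, and -1 yields -4.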

…8281)

This PR is a follow-up to llvm#167975 and replaces calls to trivial copy
constructors with `cir::CopyOp`.

---------

Co-authored-by: Andy Kaylor
Co-authored-by: Henrich Lauko

Reverts llvm#154069. I pointed out a number of issues
post-merge, most importantly examples of miscompiles:
llvm#154069 (comment).

While the motivation of the change is clear, I think the implementation
approach is flawed. It seems like the goal is to allow elements like
`load <2xi16>` and `load i32` to be vectorized together despite the
current algorithm not grouping them into the same equivalence classes. I
personally think that if we want to attempt this, it should be a more
holistic approach, maybe even redefining the concept of an equivalence
class. The current solution seems like it would be very hard to make
bug-free, and even if it were bug-free, it can only
merge chains that happen to be adjacent to each other after
`splitChainByContiguity`, which leaves it up to
chance whether this optimization kicks in. But we can discuss more in
the re-land. Maybe the broader approach I'm proposing is too difficult,
and a narrow optimization is worthwhile. Regardless, this should be
reverted; it needs more iteration before it is correct.

This PR is part of llvm#167752. It upstreams the codegen and tests for the
shuffle builtins implemented in the incubator, including:
- `vinsert` + `insert`
- `pblend` + `blend`
- `vpermilp`
- `pshuf` + `shufp`
- `palignr`

It does NOT upstream the `perm`, `vperm2`, `vpshuf`, `shuf_i` / `shuf_f`
and `align` builtins, which are not yet implemented in the incubator.

This _is_ a large commit, but most of it is tests.

The `pshufd` / `vpermilp` builtins seem to have no test coverage in the
incubator; what should I do?

We were not marking the `.cfi.jumptable` functions as `naked` on Windows. The referenced bug (https://llvm.org/bugs/show_bug.cgi?id=28641#c3) appears to be fixed:

```bash
build/bin/opt -S -passes=lowertypetests -mtriple=i686-pc-win32 llvm/test/Transforms/LowerTypeTests/function.ll | build/bin/llc -O0
```

```
L_.cfi.jumptable:                       # @.cfi.jumptable
# %bb.0:                                # %entry
        #APP
        jmp     _f.cfi@PLT
        int3
        int3
        int3

        #NO_APP
        #APP
        jmp     _g.cfi@PLT
        int3
        int3
        int3

        #NO_APP
                                        # -- End function
        .section        .rdata,"dr"
        .p2align        4, 0x0                          # @0

```

The spilled registers described in the bug no longer appear.

…accept `const T *` arguments when the key is `T *` (llvm#170377)

Also use `is_contained` to implement `contains`, since this tries the
`contains` member function of the set type first.
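
The signature change can be illustrated with a small wrapper over `std::set<T *>`. The container actually changed by this commit is not named in the truncated title, so PtrSet is hypothetical, and `const_cast` is just one way to do the lookup.

```cpp
#include <set>

// Hypothetical pointer set keyed on T *.
template <typename T> class PtrSet {
  std::set<T *> Elems;

public:
  void insert(T *P) { Elems.insert(P); }

  // Accepts const T * even though the key type is T *: a lookup never writes
  // through the pointer, so dropping const just to match the key is safe.
  bool contains(const T *P) const {
    return Elems.count(const_cast<T *>(P)) != 0;
  }
};
```

Declaring the set as `std::set<T *, std::less<>>` would also allow `count(const T *)` without the cast, since the transparent comparator enables heterogeneous lookup (C++14).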

…68915)

fixes llvm#168737
fixes llvm#168755

This change adds support for matrix truncations via the
ICK_HLSL_Matrix_Truncation enum, which accounts for most of the files
changed.

It also allows matrices in HLSL elementwise casts, as long as the cast
does not perform a shape transformation, e.g. 3x2 to 2x3.

Tests for the new elementwise and truncation behavior were added, as
well as Sema tests to make sure we error on the shape-transforming
cast.

I am punting on ConstExpr matrix support for now; that will need
to be addressed later. I will file a separate issue for it if reviewers
agree it can wait.
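
A small model of the distinction between truncation and shape transformation. HLSL matrix truncation is assumed here to keep the upper-left block, and the check simply formalizes the 3x2 to 2x3 example from the description. Matrix, isTruncation, and truncateMatrix are hypothetical helpers, not Clang's Sema or CodeGen code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (R, C) is Data[R * Cols + C].
struct Matrix {
  std::size_t Rows, Cols;
  std::vector<float> Data;
  float at(std::size_t R, std::size_t C) const { return Data[R * Cols + C]; }
};

// True if Src -> Dst only drops trailing rows and/or columns.
// A 3x2 -> 2x3 conversion is not a truncation: it would reshape the matrix.
bool isTruncation(std::size_t SrcRows, std::size_t SrcCols,
                  std::size_t DstRows, std::size_t DstCols) {
  return DstRows <= SrcRows && DstCols <= SrcCols &&
         (DstRows < SrcRows || DstCols < SrcCols);
}

// Assumed truncation semantics: keep the upper-left NewRows x NewCols block.
// Precondition: isTruncation(M.Rows, M.Cols, NewRows, NewCols).
Matrix truncateMatrix(const Matrix &M, std::size_t NewRows,
                      std::size_t NewCols) {
  Matrix Out{NewRows, NewCols, {}};
  Out.Data.reserve(NewRows * NewCols);
  for (std::size_t R = 0; R < NewRows; ++R)
    for (std::size_t C = 0; C < NewCols; ++C)
      Out.Data.push_back(M.at(R, C));
  return Out;
}
```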