
Conversation

bulasevich
Contributor

@bulasevich bulasevich commented Aug 22, 2025

This reworks the recent update #24696, which fixed a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was incomplete.

The issue reproduces with the HeapByteBufferTest jtreg test on a UBSan-enabled build. The actual trigger is the -XX:+OptoScheduling option used by the test (by default, OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simple java -version run.

This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.

The problem is that the shift count n may be too large here:

class Pipeline_Use_Cycle_Mask {
protected:
  uint _mask;
  ..
  Pipeline_Use_Cycle_Mask& operator<<=(int n) {
    _mask <<= n;
    return *this;
  }
};

The recent change attempted to cap the shift amount at one call site:

class Pipeline_Use_Element {
protected:
  ..
  // Mask of specific used cycles
  Pipeline_Use_Cycle_Mask _mask;
  ..
  void step(uint cycles) {
    _used = 0;
    uint max_shift = 8 * sizeof(_mask) - 1;
    _mask <<= (cycles < max_shift) ? cycles : max_shift;
  }
};

However, there is another site where Pipeline_Use_Cycle_Mask::operator<<= can be called with a too-large shift count:

// The following two routines assume that the root Pipeline_Use entity
// consists of exactly 1 element for each functional unit
// start is relative to the current cycle; used for latency-based info
uint Pipeline_Use::full_latency(uint delay, const Pipeline_Use &pred) const {
  for (uint i = 0; i < pred._count; i++) {
    const Pipeline_Use_Element *predUse = pred.element(i);
    if (predUse->_multiple) {
      uint min_delay = 7;
      // Multiple possible functional units, choose first unused one
      for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
        const Pipeline_Use_Element *currUse = element(j);
        uint curr_delay = delay;
        if (predUse->_used & currUse->_used) {
          Pipeline_Use_Cycle_Mask x = predUse->_mask;
          Pipeline_Use_Cycle_Mask y = currUse->_mask;

          for ( y <<= curr_delay; x.overlaps(y); curr_delay++ )
            y <<= 1;
        }
        if (min_delay > curr_delay)
          min_delay = curr_delay;
      }
      if (delay < min_delay)
        delay = min_delay;
    }
    else {
      for (uint j = predUse->_lb; j <= predUse->_ub; j++) {
        const Pipeline_Use_Element *currUse = element(j);
        if (predUse->_used & currUse->_used) {
          Pipeline_Use_Cycle_Mask x = predUse->_mask;
          Pipeline_Use_Cycle_Mask y = currUse->_mask;

          for ( y <<= delay; x.overlaps(y); delay++ )
            y <<= 1;
        }
      }
    }
  }

  return (delay);
}
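In isolation, the overlap search above can be sketched with a plain 32-bit mask and a capped shift like the one proposed as the fix below. Names here are illustrative, not the generated ones; the sketch shows why a starting delay of 100 (from a fixed_latency(100) declaration) must not feed a raw shift:

```cpp
#include <cstdint>

// Capped left shift: keeps the exponent strictly below the 32-bit width,
// so delay values like 100 are defined (a raw `mask << 100` would be UB).
static uint32_t shl_capped(uint32_t mask, int n) {
  return mask << ((n < 32) ? n : 31);
}

// Toy version of the inner loop of Pipeline_Use::full_latency(): slide the
// successor mask left until it no longer collides with the predecessor mask.
static unsigned search_delay(uint32_t pred, uint32_t curr, unsigned start) {
  unsigned delay = start;
  uint32_t y = shl_capped(curr, (int)delay);
  while (pred & y) {   // still overlapping: try one cycle later
    delay++;
    y = shl_capped(y, 1);
  }
  return delay;
}
```

With pred = 0b0110 and curr = 0b0011 the first non-colliding delay is 3; with a start delay of 100 the capped shift simply saturates instead of invoking undefined behavior.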

Fix: cap the shift inside Pipeline_Use_Cycle_Mask::operator<<= so all call sites are safe:

class Pipeline_Use_Cycle_Mask {
protected:
  uint _mask;
  ..
  Pipeline_Use_Cycle_Mask& operator<<=(int n) {
    int max_shift = 8 * sizeof(_mask) - 1;
    _mask <<= (n < max_shift) ? n : max_shift;
    return *this;
  }
};

class Pipeline_Use_Element {
protected:
  ..
  // Mask of specific used cycles
  Pipeline_Use_Cycle_Mask _mask;
  ..
  void step(uint cycles) {
    _used = 0;
    _mask <<= cycles;
  }
};
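A standalone sketch of the capped operator (CycleMask is a hypothetical name, not the generated class) makes the behavior easy to check: shifts below 32 are unchanged, and larger shifts saturate at 31. The cap is deliberately not bit-exact for n >= 32 — bit 31 can remain set where a true wide shift would give 0 — but removing the undefined behavior is the point of the fix.

```cpp
#include <cstdint>

// Illustrative stand-in for the generated Pipeline_Use_Cycle_Mask.
struct CycleMask {
  uint32_t _mask = 0;

  CycleMask& operator<<=(int n) {
    // Cap the exponent at 31: `_mask << n` with n >= 32 is UB for uint32_t.
    _mask <<= (n < 32) ? n : 31;
    return *this;
  }
};
```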

Note: on platforms where PipelineForm::_maxcycleused > 32 (e.g., ARM32), the Pipeline_Use_Cycle_Mask implementation already handles large shifts, so no additional check is needed:

class Pipeline_Use_Cycle_Mask {
protected:
  uint _mask1, _mask2, _mask3;

  Pipeline_Use_Cycle_Mask& operator<<=(int n) {
    if (n >= 32)
      do {
        _mask3 = _mask2; _mask2 = _mask1; _mask1 = 0;
      } while ((n -= 32) >= 32);

    if (n > 0) {
      uint m = 32 - n;
      uint mask = (1 << n) - 1;
      uint temp2 = mask & (_mask1 >> m); _mask1 <<= n;
      uint temp3 = mask & (_mask2 >> m); _mask2 <<= n; _mask2 |= temp2;
      _mask3 <<= n; _mask3 |= temp3;
    }
    return *this;
  }
};
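The word-by-word shift can be verified with a standalone model (Mask96 is a hypothetical name; the operator body mirrors the generated arm32 code above). Each inner shift stays below 32, so no exponent is ever too large, and bits pushed past bit 95 are simply dropped:

```cpp
#include <cstdint>

// Standalone 96-bit mask built from three 32-bit words; _mask1 holds the
// lowest 32 cycle bits. Mirrors the generated arm32 operator<<= above.
struct Mask96 {
  uint32_t _mask1 = 0, _mask2 = 0, _mask3 = 0;

  Mask96& operator<<=(int n) {
    if (n >= 32)
      do {  // shift by whole 32-bit words first
        _mask3 = _mask2; _mask2 = _mask1; _mask1 = 0;
      } while ((n -= 32) >= 32);

    if (n > 0) {  // then carry the remaining 0..31 bits across word boundaries
      uint32_t m = 32 - n;
      uint32_t mask = (1u << n) - 1;
      uint32_t temp2 = mask & (_mask1 >> m); _mask1 <<= n;
      uint32_t temp3 = mask & (_mask2 >> m); _mask2 <<= n; _mask2 |= temp2;
      _mask3 <<= n; _mask3 |= temp3;
    }
    return *this;
  }
};
```

Bit 0 shifted by 65 lands at bit 1 of the third word, and a shift of 100 pushes everything out of the 96-bit window — without any single shift exponent reaching 32.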

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890
$ git checkout pull/26890

Update a local copy of the PR:
$ git checkout pull/26890
$ git pull https://git.openjdk.org/jdk.git pull/26890/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26890

View PR using the GUI difftool:
$ git pr show -t 26890

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26890.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 22, 2025

👋 Welcome back bulasevich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 22, 2025

@bulasevich This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int'

Reviewed-by: kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 31 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Aug 22, 2025

@bulasevich The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

… is too large for 32-bit type 'unsigned int'
@bulasevich bulasevich marked this pull request as ready for review August 22, 2025 16:29
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 22, 2025
@mlbridge

mlbridge bot commented Aug 22, 2025

Webrevs

Comment on lines 773 to 774
- fprintf(fp_hpp, " _mask <<= n;\n");
+ fprintf(fp_hpp, " int max_shift = 8 * sizeof(_mask) - 1;\n");
+ fprintf(fp_hpp, " _mask <<= (n < max_shift) ? n : max_shift;\n");
Contributor


sizeof(_mask) is known - it is sizeof(uint).
Lines 760-768 should be cleaned up: the <= 32 checks are redundant because of the check at line 758. This is a leftover from the (incomplete) SPARC code removal.

Contributor Author


Good point - I removed the redundant code.

As for sizeof(_mask), shouldn’t it just be max_shift = 31 or _mask <<= (n < 32) ? n : 31;?

Contributor


Yes, if uint is 32 bits on all our platforms.

Hmm, maybe we should use uint32_t for _mask here. Then we can use 32 and 31 without confusion.

Contributor


I mean to use _mask <<= (n < 32) ? n : 31;

Contributor Author


Good! Let me correct both variants then. The resulting code is:

class Pipeline_Use_Cycle_Mask {
protected:
  uint32_t _mask;

public:
  Pipeline_Use_Cycle_Mask() : _mask(0) {}

  Pipeline_Use_Cycle_Mask(uint32_t mask) : _mask(mask) {}

  bool overlaps(const Pipeline_Use_Cycle_Mask &in2) const {
    return ((_mask & in2._mask) != 0);
  }

  Pipeline_Use_Cycle_Mask& operator<<=(int n) {
    _mask <<= (n < 32) ? n : 31;
    return *this;
  }

  void Or(const Pipeline_Use_Cycle_Mask &in2) {
    _mask |= in2._mask;
  }

  friend Pipeline_Use_Cycle_Mask operator&(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);
  friend Pipeline_Use_Cycle_Mask operator|(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);

  friend class Pipeline_Use;

  friend class Pipeline_Use_Element;

};
The code generated for arm32:

class Pipeline_Use_Cycle_Mask {
protected:
  uint32_t _mask1, _mask2, _mask3;

public:
  Pipeline_Use_Cycle_Mask() : _mask1(0), _mask2(0), _mask3(0) {}

  Pipeline_Use_Cycle_Mask(uint32_t mask1, uint32_t mask2, uint32_t mask3) : _mask1(mask1), _mask2(mask2), _mask3(mask3) {}

  Pipeline_Use_Cycle_Mask intersect(const Pipeline_Use_Cycle_Mask &in2) {
    Pipeline_Use_Cycle_Mask out;
    out._mask1 = _mask1 & in2._mask1;
    out._mask2 = _mask2 & in2._mask2;
    out._mask3 = _mask3 & in2._mask3;
    return out;
  }

  bool overlaps(const Pipeline_Use_Cycle_Mask &in2) const {
    return ((_mask1 & in2._mask1) != 0) || ((_mask2 & in2._mask2) != 0) || ((_mask3 & in2._mask3) != 0);
  }

  Pipeline_Use_Cycle_Mask& operator<<=(int n) {
    if (n >= 32)
      do {
        _mask3 = _mask2; _mask2 = _mask1; _mask1 = 0;
      } while ((n -= 32) >= 32);

    if (n > 0) {
      uint m = 32 - n;
      uint32_t mask = (1 << n) - 1;
      uint32_t temp2 = mask & (_mask1 >> m); _mask1 <<= n;
      uint32_t temp3 = mask & (_mask2 >> m); _mask2 <<= n; _mask2 |= temp2;
      _mask3 <<= n; _mask3 |= temp3;
    }
    return *this;
  }

  void Or(const Pipeline_Use_Cycle_Mask &);

  friend Pipeline_Use_Cycle_Mask operator&(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);
  friend Pipeline_Use_Cycle_Mask operator|(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);

  friend class Pipeline_Use;

  friend class Pipeline_Use_Element;

};

@dean-long
Member

I didn't realize we already had code to handle masks for large shifts. So I think the main problem is that _maxcycleused is not being set to the max value of 100. There is a secondary problem that we don't really need values that high, if the units are in pipeline stages.

Contributor

@vnkozlov vnkozlov left a comment


Looks good. I will submit testing.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 25, 2025
@dean-long
Member

diff --git a/src/hotspot/share/adlc/adlparse.cpp b/src/hotspot/share/adlc/adlparse.cpp
index 033e8d26ca7..ca6e8b7ed5e 100644
--- a/src/hotspot/share/adlc/adlparse.cpp
+++ b/src/hotspot/share/adlc/adlparse.cpp
@@ -1770,6 +1770,10 @@ void ADLParser::pipe_class_parse(PipelineForm &pipeline) {
         return;
       }
 
+      if (pipeline._maxcycleused < fixed_latency) {
+        pipeline._maxcycleused = fixed_latency;
+      }
+
       pipe_class->setFixedLatency(fixed_latency);
       next_char(); skipws();
       continue;

I think this also solves the problem, because the 100 is coming from a fixed_latency(100) statement.

@vnkozlov
Contributor

I think this also solves the problem, because the 100 is coming from a fixed_latency(100) statement.

Or we can fix pipe_slow() to use a reasonable fixed_latency instead of the arbitrary 100.
It is used mostly for floating-point instructions and, I think, dates from the time when we used the FPU instead of the current SSE/AVX instructions.

But I think the code in output_h.cpp should be fixed, as proposed, regardless of what we do with fixed_latency.

@dean-long
Member

Also note that the min_delay logic in Pipeline_Use::full_latency() initializes min_delay to _maxcycleused+1, so it does seem to expect _maxcycleused to be set to the max value.

@dean-long
Member

Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.

@adinn
Contributor

adinn commented Aug 26, 2025

Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic.

When I looked into this earlier I thought the obvious thing needed to fix this was to reassign all the latencies so they represented a realizable pipeline delay. A proper fix would sensibly require each latency to be less than the pipeline length declared in the CPU model -- which for most arches is much less than 32. However, I didn't suggest such a rationalization because I believed (perhaps wrongly) that the latencies were also used to pick a preferred choice when we have alternative instruction/operand rule matches. The selection process involves comparing the cumulative latencies for subgraph nodes against the latency of each node defined by a match rule for the subgraph and picking the lowest latency result. After looking at some of the rules I was not sure that it would be easy to reduce all current latencies so they lie in the range 0-31 and still guarantee the current selection order. It would be even harder when the range was correctly reduced to 0 - lengthof(pipeline).

I don't even think most rule authors understand that the latencies are used by the pipeline model; instead they simply use latency as a weight to enforce orderings. That's certainly how I understood it until I ran into this issue. If so, then perhaps we would be better off sticking with the de facto use and fixing the shift issue with a maximum shift bound. The mask tests which rely on this shift count may help with deriving scheduling delays for some instructions with small latencies, but I don't believe they are very reliable even in cases where the accumulated shifts lie within the 32-bit range. If we are to change anything here, then I think we need a review of the accuracy of the pipeline models and their current or potential value before doing so.

@bulasevich
Contributor Author

+      if (pipeline._maxcycleused < fixed_latency) {
+        pipeline._maxcycleused = fixed_latency;
+      }
+

I think this also solves the problem, because the 100 is coming from a fixed_latency(100) statement.

@dean-long Right! I checked, and it makes UBSan quiet.

Please note that 100 isn’t the only triggering value. With an extra trace on macosx-aarch64 I see:

printf("%i -> %i\n", pipeline._maxcycleused, fixed_latency);
6 -> 8
8 -> 16
16 -> 100

If we resolve it at the parse stage, I think we should do the opposite: limit the user-specified value to _maxcycleused.

diff --git a/src/hotspot/share/adlc/adlparse.cpp b/src/hotspot/share/adlc/adlparse.cpp
index 033e8d26ca7..1060f7b18ab 100644
--- a/src/hotspot/share/adlc/adlparse.cpp
+++ b/src/hotspot/share/adlc/adlparse.cpp
@@ -1770,7 +1770,7 @@ void ADLParser::pipe_class_parse(PipelineForm &pipeline) {
         return;
       }

-      pipe_class->setFixedLatency(fixed_latency);
+      pipe_class->setFixedLatency(fixed_latency <= pipeline._maxcycleused ? fixed_latency : pipeline._maxcycleused);
       next_char(); skipws();
       continue;
     }
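The two parse-stage options being weighed — raising _maxcycleused to cover a declared latency (dean-long's diff) versus clamping the latency to _maxcycleused (the alternative above) — can be sketched side by side. The struct and function names here are illustrative; the real code lives in ADLParser::pipe_class_parse:

```cpp
#include <algorithm>

// Minimal model of the pipeline form: only the field relevant here.
struct PipelineModel {
  int maxcycleused;
};

// Option (a): widen the tracked maximum so the generated mask type
// grows to cover the declared latency.
int raise_max(PipelineModel& p, int fixed_latency) {
  p.maxcycleused = std::max(p.maxcycleused, fixed_latency);
  return fixed_latency;              // latency is kept as declared
}

// Option (b): clamp the declared latency so it never exceeds the
// cycles the existing mask can represent.
int clamp_latency(const PipelineModel& p, int fixed_latency) {
  return std::min(fixed_latency, p.maxcycleused);
}
```

Option (a) preserves the declared latencies and pushes more platforms onto the multi-word mask path; option (b) keeps the mask narrow at the cost of silently shortening latencies like 100.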

@vnkozlov
Contributor

vnkozlov commented Aug 26, 2025

My testing passed for version V02.

@dean-long
Member

If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so.

That's a good point. While looking into this, I discovered that the initial masks generated by pipeline_res_mask_initializer() appear wrong. For example, the mask for stage 0 with 1 cycle is computed as 0x80000001, not the 0x1 that I would expect. Stage 2 with 1 cycle is 0x2, not 0x4, etc. I guess if all the masks are wrong in the same way, the problems might mostly cancel out, but it does shed doubt on the usefulness of this code.

We could preserve the large latencies for now, and let them trigger the _maxcycleused > 32 code for more platforms.
