-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8338197: ubsan: ad_x86.hpp:6417:11: runtime error: shift exponent 100 is too large for 32-bit type 'unsigned int' #26890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back bulasevich! A progress list of the required criteria for merging this PR into |
@bulasevich This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 31 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@bulasevich The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
… is too large for 32-bit type 'unsigned int'
1f91d49
to
7e8c282
Compare
Webrevs
|
src/hotspot/share/adlc/output_h.cpp
Outdated
fprintf(fp_hpp, " _mask <<= n;\n"); | ||
fprintf(fp_hpp, " int max_shift = 8 * sizeof(_mask) - 1;\n"); | ||
fprintf(fp_hpp, " _mask <<= (n < max_shift) ? n : max_shift;\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sizeof(_mask) is know - it is sizeof(uint).
Lines 760-768 should be cleaned: <= 32
checks are redundant because of check at line 758. This is leftover from SPARC code (not clean) removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I removed the redundant code.
As for sizeof(_mask)
, shouldn’t it just be max_shift = 31
or _mask <<= (n < 32) ? n : 31;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if sizeof(uint)
is 32 bits on all our platforms.
Hmm, may be we should use uint32_t
for _mask
here. Then we can use 32 and 31 without confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean to use _mask <<= (n < 32) ? n : 31;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! Let me correct both variants then. The resulting code is:
class Pipeline_Use_Cycle_Mask {
protected:
uint32_t _mask;
public:
Pipeline_Use_Cycle_Mask() : _mask(0) {}
Pipeline_Use_Cycle_Mask(uint32_t mask) : _mask(mask) {}
bool overlaps(const Pipeline_Use_Cycle_Mask &in2) const {
return ((_mask & in2._mask) != 0);
}
Pipeline_Use_Cycle_Mask& operator<<=(int n) {
_mask <<= (n < 32) ? n : 31;
return *this;
}
void Or(const Pipeline_Use_Cycle_Mask &in2) {
_mask |= in2._mask;
}
friend Pipeline_Use_Cycle_Mask operator&(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);
friend Pipeline_Use_Cycle_Mask operator|(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);
friend class Pipeline_Use;
friend class Pipeline_Use_Element;
};
// code generated for arm32:
class Pipeline_Use_Cycle_Mask {
protected:
uint32_t _mask1, _mask2, _mask3;
public:
Pipeline_Use_Cycle_Mask() : _mask1(0), _mask2(0), _mask3(0) {}
Pipeline_Use_Cycle_Mask(uint32_t mask1, uint32_t mask2, uint32_t mask3) : _mask1(mask1), _mask2(mask2), _mask3(mask3) {}
Pipeline_Use_Cycle_Mask intersect(const Pipeline_Use_Cycle_Mask &in2) {
Pipeline_Use_Cycle_Mask out;
out._mask1 = _mask1 & in2._mask1;
out._mask2 = _mask2 & in2._mask2;
out._mask3 = _mask3 & in2._mask3;
return out;
}
bool overlaps(const Pipeline_Use_Cycle_Mask &in2) const {
return ((_mask1 & in2._mask1) != 0) || ((_mask2 & in2._mask2) != 0) || ((_mask3 & in2._mask3) != 0);
}
Pipeline_Use_Cycle_Mask& operator<<=(int n) {
if (n >= 32)
do {
_mask3 = _mask2; _mask2 = _mask1; _mask1 = 0;
} while ((n -= 32) >= 32);
if (n > 0) {
uint m = 32 - n;
uint32_t mask = (1 << n) - 1;
uint32_t temp2 = mask & (_mask1 >> m); _mask1 <<= n;
uint32_t temp3 = mask & (_mask2 >> m); _mask2 <<= n; _mask2 |= temp2;
_mask3 <<= n; _mask3 |= temp3;
}
return *this;
}
void Or(const Pipeline_Use_Cycle_Mask &);
friend Pipeline_Use_Cycle_Mask operator&(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);
friend Pipeline_Use_Cycle_Mask operator|(const Pipeline_Use_Cycle_Mask &, const Pipeline_Use_Cycle_Mask &);
friend class Pipeline_Use;
friend class Pipeline_Use_Element;
};
I didn't realize we already had code to handle masks for large shifts. So I think the main problem is that _maxcycleused is not being set to the max value of 100. There is a secondary problem that we don't really need values that high, if the units are in pipeline stages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I will submit testing.
I think this also solves the problem, because the 100 is coming from a |
Or we can fix But I think code in |
Also note that the min_delay logic in Pipeline_Use::full_latency() initializes min_delay to _maxcycleused+1, so it does seem to expect _maxcycleused to be set to the max value. |
Yes, I think we should fix both, output_h.cpp and fixed_latency(100) on all platforms, then we can get rid of the workarounds and arm32-specific logic. |
When I looked into this earlier I thought the obvious thing needed to fix this was to reassign all the latencies so they represented a realizable pipeline delay. A proper fix would sensibly require each latency to be less than the pipeline length declared in the CPU model -- which for most arches is much less than 32. However, I didn't suggest such a rationalization because I believed (perhaps wrongly) that the latencies were also used to pick a preferred choice when we have alternative instruction/operand rule matches. The selection process involves comparing the cumulative latencies for subgraph nodes against the latency of each node defined by a match rule for the subgraph and picking the lowest latency result. After looking at some of the rules I was not sure that it would be easy to reduce all current latencies so they lie in the range 0-31 and still guarantee the current selection order. It would be even harder when the range was correctly reduced to 0 - lengthof(pipeline). I don't even think most rule authors understand that the latencies are used by the pipeline model and instead they simply use latency as a weight to enforce orderings. That's certainly how I understood it until I ran into this issue. If so then perhaps we would be better sticking with the de facto use and fixing the shift issue with a maximum shift bound. The mask tests which rely on this shift count may help with deriving scheduling delays for some instructions with small latencies but I don't believe it is very reliable even in cases where the accumulated shifts lie within the 32 bit range. If we are to change anything here then I think we need a review of the accuracy of pipeline models and their current or potential value before doing so. |
@dean-long Right! I checked that, it makes ubsan quiet. Please note. 100 isn’t the only triggering value. With an extra trace on macosx-aarch64 I see:
If we resolve it at parse stage, I think we should do the opposite: limit the user-specified value to maxcycleused.
|
My testing passed for version V02. |
That's a good point. While looking into this, I discovered that the initial masks generated by pipeline_res_mask_initializer() appear wrong. For example, the mask for stage 0 with 1 cycle is computed as 0x80000001, not the 0x1 that I would expect. Stage 2 with 1 cycle is 0x2, not 0x4, etc. I guess if all the masks are wrong in the same way, the problems might mostly cancel out, but it does shed doubt on the usefulness of this code. We could preserve the large latencies for now, and let them trigger the _maxcycleused > 32 code for more platforms. |
This reworks the recent update #24696 to fix a UBSan issue on aarch64. The problem now reproduces on x86_64 as well, which suggests the previous update was not optimal.
The issue reproduces with a HeapByteBufferTest jtreg test on a UBSan-enabled build. Actually the trigger is
XX:+OptoScheduling
option used by test (by default OptoScheduling is disabled on most x86 CPUs). With the option enabled, the failure can be reproduced with a simplejava -version
run.This fix is in ADLC-generated code. For simplicity, the examples below show the generated fragments.
The problems is that shift count
n
may be too large here:The recent change attempted to cap the shift amount at one call site:
However, there is another site where
Pipeline_Use_Cycle_Mask::operator<<=
can be called with a too-large shift count:Fix: cap the shift inside
Pipeline_Use_Cycle_Mask::operator<<=
so all call sites are safe:Note: on platforms where PipelineForm::_maxcycleused > 32 (e.g., ARM32), the Pipeline_Use_Cycle_Mask implementation already handles large shifts, so no additional check is needed:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26890/head:pull/26890
$ git checkout pull/26890
Update a local copy of the PR:
$ git checkout pull/26890
$ git pull https://git.openjdk.org/jdk.git pull/26890/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26890
View PR using the GUI difftool:
$ git pr show -t 26890
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26890.diff
Using Webrev
Link to Webrev Comment