8359820: Improve handshake/safepoint timeout diagnostic messages #26309
👋 Welcome back toxaart! A progress list of the required criteria for merging this PR into |
@toxaart This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 24 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dholmes-ora, @tstuefe) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@toxaart I'm really looking for something in the fatal error handler so that instead of seeing just:
There is something there that indicates it was a handshake timeout. E.g.
We may need the handshake code to set a flag on the target Thread that the error code can query if it sees a SIGILL.
…in VMError::report()
a8179fc
to
310ef85
Compare
…with-low-handshake-timeout-on-intel-sde
…sde' of https://github.com/toxaart/jdk into JDK-8359820-SIGILL-with-low-handshake-timeout-on-intel-sde
BTW, for artificially generated signals we already have a clear indication in hs_err files. We print the sigaction structure associated with the signal. e.g.
SI_USER => sent via kill command or pthread_kill.
See also: https://pubs.opengroup.org/onlinepubs/007904875/functions/sigaction.html
I have nothing against making this clearer, just saying that the info is already kind of there.
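To illustrate the si_code point, here is a minimal standalone sketch, not HotSpot code. It installs an SA_SIGINFO handler, sends a signal to the current process with kill(), and checks that the handler saw SI_USER. The helper name self_kill_reports_si_user is made up for this example. It only covers the kill() case, and it assumes a single-threaded caller that does not have the signal blocked.

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

// State written by the handler; volatile sig_atomic_t is the type that is
// safe to modify from inside a signal handler.
static volatile sig_atomic_t g_handled = 0;
static volatile sig_atomic_t g_si_code = 0;

// SA_SIGINFO-style handler: it only records si_code from the siginfo_t.
static void record_si_code(int /*sig*/, siginfo_t* info, void* /*ucontext*/) {
  g_si_code = info->si_code;
  g_handled = 1;
}

// Hypothetical helper (not a HotSpot function): sends `sig` to the current
// process with kill() and reports whether the handler saw si_code == SI_USER.
bool self_kill_reports_si_user(int sig) {
  struct sigaction sa = {};
  struct sigaction old_sa = {};
  sa.sa_sigaction = record_si_code;
  sa.sa_flags = SA_SIGINFO;  // ask for the three-argument handler form
  sigemptyset(&sa.sa_mask);
  int rc = sigaction(sig, &sa, &old_sa);
  assert(rc == 0);
  (void)rc;

  g_handled = 0;
  kill(getpid(), sig);
  // Bounded wait, in case delivery is not synchronous with kill().
  for (int i = 0; i < 1000 && !g_handled; i++) {
    usleep(1000);
  }

  sigaction(sig, &old_sa, nullptr);  // restore the previous disposition
  return g_handled && g_si_code == SI_USER;
}
```

Calling self_kill_reports_si_user(SIGUSR1) exercises the path with a signal that is harmless to handle. A hardware-generated SIGILL would instead carry one of the ILL_* codes, which is what lets the two cases be told apart.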
Thank you, @dholmes-ora. I already answered Anton, but I get that now.
Yes, the slow thread started reporting, and I think I also observed the latter message as well. Note that the fatal error is still processed at the end of the timeout handler, but it is not reported by VMError, as it can report only one such error. So yes, we want to improve the reporting for case A: when a slow thread receives a SIGILL and dies while being able to handle the error, we want to know whether the SIGILL came from a handshake/safepoint timeout and print extra info if that is the case.
Thanks, added to the latest change.
I think this would be a more invasive change; we can do it when there is a real need.
Ok thanks.
The structure of this looks good, but I have a few remaining nits. Thanks.
LGTM!
Thanks
/integrate |
/sponsor |
Going to push as commit 6656e76.
Your commit was automatically rebased without conflicts. |
@dholmes-ora @toxaart Pushed as commit 6656e76. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
A few post-integration notes, maybe do a little follow-up cleanup?
Hi, please consider the following changes:
The problem in the issue description is not a problem by itself: the behavior is not unexpected, but it is somewhat difficult to find out what caused the SIGILL to be raised.
We propagate this information from handshake::handle_timeout() to VMError::report() with the help of a global variable. The same mechanism is used to address a similar issue in the safepoint timeout handler.

Tested in tiers 1-3.
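The propagation described above can be sketched roughly as follows. This is a simplified standalone illustration, not the code from this PR. Every name in it (g_handshake_timed_out_thread, note_handshake_timeout, describe_sigill and so on) and the message wording are hypothetical. It assumes thread ids are non-zero integers, with 0 meaning that no timeout has been recorded.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical globals: id of the thread targeted by a timeout handler,
// or 0 if no timeout has been recorded.
static std::atomic<std::uintptr_t> g_handshake_timed_out_thread{0};
static std::atomic<std::uintptr_t> g_safepoint_timed_out_thread{0};

// Would be called by the handshake timeout handler just before it sends
// SIGILL to the unresponsive thread.
void note_handshake_timeout(std::uintptr_t thread_id) {
  assert(thread_id != 0);  // 0 is reserved for "none"
  g_handshake_timed_out_thread.store(thread_id, std::memory_order_release);
}

// Same idea for the safepoint timeout handler.
void note_safepoint_timeout(std::uintptr_t thread_id) {
  assert(thread_id != 0);
  g_safepoint_timed_out_thread.store(thread_id, std::memory_order_release);
}

// Would be called by the error reporter when the crashing thread received
// SIGILL; adds extra text only if that thread was the target of a timeout.
std::string describe_sigill(std::uintptr_t current_thread_id) {
  assert(current_thread_id != 0);
  std::string msg = "SIGILL";
  if (current_thread_id ==
      g_handshake_timed_out_thread.load(std::memory_order_acquire)) {
    msg += " (sent by handshake timeout handler)";
  } else if (current_thread_id ==
             g_safepoint_timed_out_thread.load(std::memory_order_acquire)) {
    msg += " (sent by safepoint timeout handler)";
  }
  return msg;
}
```

In this sketch the timeout handler records the target before raising the signal, so the reporter running on the signalled thread can tell a timeout-induced SIGILL from a genuine illegal instruction. A per-thread flag, as suggested earlier in the conversation, would be an alternative to the globals.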
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26309/head:pull/26309
$ git checkout pull/26309
Update a local copy of the PR:
$ git checkout pull/26309
$ git pull https://git.openjdk.org/jdk.git pull/26309/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26309
View PR using the GUI difftool:
$ git pr show -t 26309
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26309.diff
Using Webrev
Link to Webrev Comment