8364819: Post-integration cleanups for JDK-8359820 #26656
Conversation
👋 Welcome back toxaart! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@toxaart This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 149 new commits pushed to the target branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @dholmes-ora, @albertnetymk) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment.
Webrevs
Also a closing parenthesis in test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java message matcher, I think?
A better name for the PR and ticket is: "8364819: Post-integration cleanups for JDK-8359820"
This looks fine to me, thanks! But @dholmes-ora should take another look, since he reviewed the original patch.
/reviewers 2
Thanks for spotting, yes, that one has to be changed. Done.
volatile Thread* VMError::_handshake_timed_out_thread = nullptr;
volatile Thread* VMError::_safepoint_timed_out_thread = nullptr;
I should have picked this up previously, but `volatile` generally serves no purpose for inter-thread consistency. If we need to guarantee visibility between the signaller and signalee, then we need a fence to ensure that. Or we can skip the fence (and `volatile`) and note that this will usually, but perhaps not always, work fine. (I note that `pthread_kill` is not specified as a memory synchronizing operation, but I strongly suspect it has to have its own internal memory barriers that we would be piggy-backing on.)
I am pretty sure the memory consistency here is irrelevant, given that: a) the `Thread` in question has likely been initialized for a long time already, so it is unlikely we will lose anything release-acquire-wise; b) we only use these for pointer comparisons.
But since we are here, and in the interest of clarity and avoiding future surprises, we can just summarily wrap these with `Atomic::release_store` and `Atomic::load_acquire` to be extra paranoidly safe. This is a failing path, so we don't care about performance, and we would like to avoid a secondary crash in the error handler some time in the future, if anyone reads anything from these threads.
Thanks @dholmes-ora and @shipilev, I addressed the possible issue with atomic commands.
@shipilev The memory consistency is about seeing the writes to these fields in the thread that is signalled. acquire/release has no meaning in this context. I would not add acquire/release just in case someone in the future actually tried to follow those pointers - they are only for identification purposes. If you are worried about that then let's go back to changing them to a value (intptr_t) so it will never be an issue.
It always feels dubious to me to add fences for the single-variable "visibility" reasons. The fences order things relative to other things, not just "flush" or "make this single variable visible". But that's a theoretical quibble. A more pressing problem is reading the thread marker without any acquire/relaxed semantics; that might miss updates as well.
Given how messy the C++ memory model is, and how data races are UB-scary there, I think we should strive to use `Atomic`-s as a matter of course for anything that transfers data between threads. Adjusting `Atomic::release_store` -> `Atomic::release_store_fence` would have satisfied both flavors of concurrency paranoia we are having, I think: it is equivalent to having a fence after the store, and it clearly makes an Atomic release->acquire chain as well :)
acquire/release does not achieve that; it only says what happens if you see the new value of the field.
Note I am describing the semantic model for acquire/release in a general sense - as we describe in orderAccess.hpp. An actual implementation may ensure memory operations actually complete before the `release`, in a similar way to a `fence` (e.g. X86 `mfence`).
A "fence" may only ensure ordering in its abstract description, but that suffices: the store to the field must happen before any of the stores related to raising the signal, which must happen before the signal can actually be delivered, which happens before we load the field. Hence, by transitivity, we are guaranteed to see the field's written value by using the fence.
As I said before, getting too deep into theory and how this case might be special is counter-productive here.
Honestly, I do not understand the aversion to the idea that a field that is accessed by several threads should be nominally wrapped with `Atomic`. (Also note that some fields in `VMError` already do this.) Whatever the current situation is that might allow for a low-level naked fence, the situation can change. I strongly believe we should be erring on the side of caution and future-proofness, unless there are other concerns on the table, like performance. There are no other concerns here, AFAICS. If you want a `fence` after the store for whatever reason, that's fine; there is `Atomic::release_store_fence` that gives it.
It is not the "atomic" that is the issue; it is the inappropriate and unnecessary use of acquire/release semantics. When people look at this code later they will wonder what it is that we are trying to coordinate - when the answer is "nothing". That just causes confusion. Synchronization and concurrency are hard enough without obfuscating things by sprinkling the wrong kind of synchronization constructs around "just to be safe". That isn't future-proofing things. If in the future we have different synchronization requirements then those requirements need to be understood and the correct code inserted. Maybe in the future we will need a mutex - who knows.
You say "inappropriate and unnecessary", I say "conservative". I think this is an opinion stalemate, so I'll just recuse myself from this conversation and let someone else break this tie.
There is no confusion to me: we pass something between the threads without clear additional synchronization in sight (i.e. mutexes) => we do `Atomic`-s. Deciding on the memory ordering for Atomics goes: unless we see the need for performance, we default to conservative ordering (acqrel at minimum for pointers, seqcst if we are paranoid and cannot guarantee it is a one-sided transfer), to avoid future accidents.
If anything, a naked `fence()` raises many more questions for me in comparison with `Atomic` accesses. Mostly because fences are significantly more low-level than Atomics. When I look at this code from the perspective of a bystander, these are the questions that pop into my mind: Why is there only a fence on the write side? Shouldn't there be a fence on the reader side somewhere then? Maybe we are optimizing performance? Are we relying on some other (partial) synchronization? Are we stacking with some other fence? Are there arch-specific details about what fences actually do? Etc.
For completeness, our own conventions say we should use `Atomic::load` and `Atomic::store` to highlight the lock-free access. But only a fence provides the guarantee of visibility; acquire/release does not.
The updates look fine to me. Thanks.
I still think these should be wrapped with `Atomic`-s, but I don't have the energy to argue in favor of them anymore. So I would defer a second review to someone else.
True, but I thought we denote a variable
How about making the ordering explicit by using
Writer side:
Reader side:
I think for this to work, we need ordering between set-field and sending-signal, and between reading-signal and read-field. Therefore, the acquire/release "operation" should be performed on the "signal" variable. (Of course, such a "signal" variable is not directly accessible, hence, the suggested
I have to say it's indeed a bit odd that "setters" (e.g.
I think we have 2 orthogonal problems here:
I came to the conclusion that the 1st issue is not an issue, as signal firing and receiving is very heavy compared to a simple store to a variable. One can safely assume that by the time the signal is received on the target thread the memory operation has already finished and the value will be visible. Hence we do not need a fence. We may have it, but it is not a must.
The 2nd issue is more of a hypothetical one. If we ever see more than one thread timing out, the variable keeping the address of the timed-out thread will be updated more than once. More than one signal will be sent. But, as the comment on top of
I also brought back the
OK, this looks more straightforward. I would still prefer to see `Atomic::load` (can be without acquire) for loads on the reader side to match the `Atomic` RMW on the writer side. This would give us at least coherency for the load.
I was aware of only the first one. Thank you for the clarification. Conventionally, the
Maybe such "heavy" operation contains enough instructions that CPU doesn't/couldn't reorder the important read... Just to be explicit, the following is the problematic scenario I had in mind, and I assume it's the same one Aleksey talked about. litmus source
Okay, I added
I do not think so, getters return a copy of the thread's address.
I am not familiar with the herd7 tool or with ARM nuances, but I do not think the example is relevant. This scenario cannot apply to this particular case, as we will never have an update of the value happening concurrently with the read: there is an expensive operation between the two. Let's not overengineer the solution for this failing code path.
My previous message was probably unclear -- I meant to use
I think it's fragile to rely on surrounding code to maintain the desired memory ordering. An
Our convention for lock-free code is that we should use
@albertnetymk's suggestion to use explicit
Arguably, as stated much much earlier, we don't need any explicit memory ordering here in practice as the signal implementation must provide its own ordering guarantees to actually function - and there is no way a compiler would ever re-order the store with the `pthread_kill` call!
> Arguably, as stated much much earlier, we don't need any explicit memory ordering here in practice as the signal implementation must provide its own ordering guarantees to actually function - and there is no way a compiler would ever re-order the store with the `pthread_kill` call!
I see. I think I misunderstood the problem a bit. Hopefully I get it this time -- the non-inline function call should prevent compiler reordering, and the system call inside `pthread_kill` should prevent CPU reordering. The only thing needed here is to ensure the store to the shared var becomes visible on other CPUs, e.g. not hidden inside a store buffer. Conceptually, `pthread_kill` performs a "store" as well, so the invariant is that the two stores should be ordered. x64 uses FIFO on the store buffer, so everything is fine. OTOH, aarch64 needs some amending to enforce FIFO visibility of these two stores. `Atomic::replace_if_null` should be enough to provide the ordering.
Looks good. Thanks.
FTR use of `volatile` is the preferred convention.
Thanks @dholmes-ora
/integrate
/integrate
/sponsor
Going to push as commit 4ffd2a8.
Your commit was automatically rebased without conflicts.
@albertnetymk @toxaart Pushed as commit 4ffd2a8. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Hi, please consider the following changes:
This is a cleanup after #26309
- `_handshake_timed_out_thread` and `_safepoint_timed_out_thread` are now `Thread*` and not `intptr_t`; no conversions `p2i <-> i2p` needed.
- Added a missed brace in the error message.
- Updates are done with `Atomic::replace_if_null()` to address possible multiple updates and visibility among all threads.

Trivial change.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26656/head:pull/26656
$ git checkout pull/26656
Update a local copy of the PR:
$ git checkout pull/26656
$ git pull https://git.openjdk.org/jdk.git pull/26656/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26656
View PR using the GUI difftool:
$ git pr show -t 26656
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26656.diff
Using Webrev
Link to Webrev Comment