
Conversation

toxaart
Contributor

@toxaart toxaart commented Aug 6, 2025

Hi, please consider the following changes:

This is a cleanup after #26309

  1. _handshake_timed_out_thread and _safepoint_timed_out_thread are now Thread* rather than intptr_t, so no p2i <-> i2p conversions are needed.

  2. Added a missing brace in the error message.

  3. Updates are done with Atomic::replace_if_null() to guard against possible multiple updates and to ensure visibility across all threads (a sketch of the resulting shape follows below).

Trivial change.
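For orientation, a minimal sketch of the shape these changes take, assuming HotSpot's Atomic API and the field/accessor names discussed in the review below; this is not the literal patch:

// Pointer-typed markers instead of intptr_t, so no p2i-style conversions are
// needed when storing or comparing (sketch, not the exact patch).
Thread* volatile VMError::_handshake_timed_out_thread = nullptr;
Thread* volatile VMError::_safepoint_timed_out_thread = nullptr;

void VMError::set_handshake_timed_out_thread(Thread* thread) {
  // CAS against nullptr: only the first timed-out thread is recorded, and the
  // default conservative ordering makes the store visible to other threads.
  Atomic::replace_if_null(&_handshake_timed_out_thread, thread);
}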


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8364819: Post-integration cleanups for JDK-8359820 (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26656/head:pull/26656
$ git checkout pull/26656

Update a local copy of the PR:
$ git checkout pull/26656
$ git pull https://git.openjdk.org/jdk.git pull/26656/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26656

View PR using the GUI difftool:
$ git pr show -t 26656

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26656.diff

Using Webrev

Link to Webrev Comment

@toxaart toxaart marked this pull request as ready for review August 6, 2025 10:06
@bridgekeeper

bridgekeeper bot commented Aug 6, 2025

👋 Welcome back toxaart! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 6, 2025

@toxaart This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8364819: Post-integration cleanups for JDK-8359820

Reviewed-by: dholmes, ayang, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time this comment was updated, there had been 149 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @dholmes-ora, @albertnetymk) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 6, 2025
@openjdk

openjdk bot commented Aug 6, 2025

@toxaart The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.


@shipilev
Member

shipilev commented Aug 6, 2025

Also a closing parenthesis in test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java message matcher, I think?

@shipilev
Member

shipilev commented Aug 6, 2025

A better name for the PR and ticket is: "8364819: Post-integration cleanups for JDK-8359820"

@toxaart toxaart changed the title 8364819: Cleanup handshake_timed_out_thread and safepoint_timed_out_thread 8364819: Post-integration cleanups for JDK-8359820 Aug 6, 2025
Member

@shipilev shipilev left a comment

This looks fine to me, thanks! But @dholmes-ora should take another look, since he reviewed the original patch.

/reviewers 2

@toxaart
Contributor Author

toxaart commented Aug 6, 2025

Also a closing parenthesis in test/hotspot/jtreg/runtime/Safepoint/TestAbortVMOnSafepointTimeout.java message matcher, I think?

Thanks for spotting, yes, that one has to be changed. Done.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 6, 2025
@openjdk

openjdk bot commented Aug 6, 2025

@shipilev
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 6, 2025
Member

@dholmes-ora dholmes-ora left a comment

Thanks for catching these @shipilev. I have to confess I thought there was a reason we stored the thread as just an id rather than a true pointer, but I was mistaken.

@toxaart thanks for the speedy follow up here. I have one other nit I just noticed - @shipilev may want to weigh in on it too.

Comment on lines 107 to 108
volatile Thread* VMError::_handshake_timed_out_thread = nullptr;
volatile Thread* VMError::_safepoint_timed_out_thread = nullptr;
Member

I should have picked this up previously but volatile generally serves no purpose for inter-thread consistency. If we need to guarantee visibility between the signaller and signalee, then we need a fence to ensure that. Or we can skip the fence (and volatile) and note that this will usually, but perhaps not always, work fine. (I note that pthread_kill is not specified as a memory synchronizing operation, but I strongly suspect it has to have its own internal memory barriers that we would be piggy-backing on.)

Member

@shipilev shipilev Aug 6, 2025

I am pretty sure the memory consistency here is irrelevant, given that: a) the Thread in question is likely already initialized for a long time, so it is unlikely we will lose anything release-acquire-wise; b) we only use these for pointer comparisons.

But since we are here, and in the interest of clarity and avoiding future surprises, we can just summarily wrap these with Atomic::release_store and Atomic::load_acquire to be extra paranoidly safe. This is a failing path, so we don't care about performance, and would like to avoid a secondary crash in error handler some time in the future, if anyone reads anything from these threads.
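For reference, the wrapping suggested here would look roughly like this (a sketch against HotSpot's Atomic API, not code from the patch):

// Writer side: release store publishes the pointer...
Atomic::release_store(&_handshake_timed_out_thread, thread);

// ...reader side: acquire load pairs with it, so anything written before the
// store is visible if the loaded pointer is ever dereferenced later.
Thread* t = Atomic::load_acquire(&_handshake_timed_out_thread);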

Contributor Author

Thanks @dholmes-ora and @shipilev, I addressed the possible issue with Atomic operations.

Member

@shipilev The memory consistency is about seeing the writes to these fields in the thread that is signalled. acquire/release has no meaning in this context. I would not add acquire/release just in case someone in the future actually tried to follow those pointers - they are only for identifying purposes. If you are worried about that then let's go back to changing them to a value (intptr_t) so it will never be an issue.

Member

It always feels dubious to me to add fences for single-variable "visibility" reasons. Fences order things relative to other things; they do not just "flush" or "make this single variable visible". But that's a theoretical quibble. A more pressing problem is reading the thread marker without any acquire/relaxed semantics; that might miss updates as well.

Given how messy the C++ memory model is, and how data races are UB-scary there, I think we should strive to use Atomic-s as a matter of course for anything that transfers data between threads. Adjusting Atomic::release_store -> Atomic::release_store_fence would have satisfied both flavors of concurrency paranoia we are having, I think: it is equivalent to having a fence after the store, and it clearly forms an Atomic release->acquire chain as well :)

Member

acquire/release does not achieve that; it only says what happens if you see the new value of the field.

Note I am describing the semantic model for acquire/release in a general sense - as we describe in orderAccess.hpp. An actual implementation may ensure memory operations actually complete before the release in a similar way to a fence (e.g. X86 mfence).

A "fence" may only ensure ordering in its abstract description but that suffices, as the store to the field must happen before any of the stores related to raising the signal, which must happen before the signal can actually be delivered, which happens before we load the field. Hence by transitivity we are guaranteed to see the fields written value, by using the fence.

Member

As I said before, getting too deep into theory and how this case might be special is counter-productive here.

Honestly, I do not understand the aversion to the idea that a field accessed by several threads should be nominally wrapped with Atomic. (Also note that some fields in VMError already do this.) Whatever the current situation is that might allow for a low-level naked fence, the situation can change. I strongly believe we should err on the side of caution and future-proofness, unless there are other concerns on the table, like performance. There are no other concerns here, AFAICS. If you want a fence after the store for whatever reason, that's fine; Atomic::release_store_fence gives you that.

Member

@dholmes-ora dholmes-ora Aug 8, 2025

It is not the "atomic" that is the issue; it is the inappropriate and unnecessary use of acquire/release semantics. When people look at this code later they will wonder what it is that we are trying to coordinate, when the answer is "nothing". That just causes confusion. Synchronization and concurrency are hard enough without obfuscating things by sprinkling the wrong kind of synchronization constructs around "just to be safe". That isn't future-proofing things. If in the future we have different synchronization requirements then those requirements need to be understood and the correct code inserted. Maybe in the future we will need a mutex - who knows.

Member

You say "inappropriate and unnecessary", I say "conservative". I think this is an opinion stalemate, so I'll just recluse myself from this conversation, and let someone else to break this tie.

There is no confusion to me: we pass something between threads without clear additional synchronization in sight (i.e. mutexes) => we do Atomic-s. Deciding on the memory ordering for Atomics goes: unless we see a need for performance, we default to conservative ordering (acqrel at minimum for pointers, seqcst if we are paranoid and cannot guarantee it is a one-sided transfer), to avoid future accidents.

If anything, a naked fence() raises many more questions for me than Atomic accesses do, mostly because fences are significantly more low-level than Atomics. When I look at this code from the perspective of a bystander, these are the questions that pop into my mind: Why is there a fence only on the write side? Shouldn't there be a fence on the reader side somewhere then? Maybe we are optimizing for performance? Are we relying on some other (partial) synchronization? Are we stacking with some other fence? Are there arch-specific details about what the fences actually do? Etc.

Member

For completeness, our own conventions say we should use Atomic::load and Atomic::store to highlight lock-free access. But only a fence provides the guarantee of visibility; acquire/release does not.

Member

@shipilev shipilev left a comment

Comments...

Member

@dholmes-ora dholmes-ora left a comment

The updates look fine to me. Thanks.

@toxaart toxaart requested a review from shipilev August 7, 2025 12:28
Member

@shipilev shipilev left a comment

I still think these should be wrapped with Atomic-s, but I don't have energy to argue in favor of them anymore. So I would defer a second review to someone else.

@albertnetymk
Member

I should have picked this up previously but volatile generally serves no purpose for inter-thread consistency.

True, but I thought we denote a variable volatile if it can be accessed by multiple threads, mostly for documentation purposes. In this case, it should be, e.g., Thread* volatile VMError::_handshake_timed_out_thread.

(I note that pthread_kill is not specified as a memory synchronizing operation, but I strongly suspect it has to have its own internal memory barriers that we would be piggy-backing on.)

How about making the ordering explicit by using OrderAccess::acquire/release on the reader/writer side? Something like:

Writer side:

    VMError::set_handshake_timed_out_thread(target);
    OrderAccess::release();
    if (os::signal_thread(target, SIGILL, "cannot be handshaked")) {

Reader side:

      if (_siginfo != nullptr && os::signal_sent_by_kill(_siginfo)) {
        OrderAccess::acquire();
        if (get_handshake_timed_out_thread() == _thread) {

Here we have a very simple situation:
Thread 1: set field; signal target thread
Target thread: receive signal; read field

I think for this to work, we need ordering between set-field and sending-signal, and between receiving-signal and read-field. Therefore, the acquire/release "operation" should be performed on the "signal" variable. (Of course, such a "signal" variable is not directly accessible, hence the suggested OrderAccess above.)

I have to say it's indeed a bit odd that "setters" (e.g. set_handshake_timed_out_thread) have fence() there for no obvious reason.

@toxaart
Contributor Author

toxaart commented Aug 12, 2025

I think we have 2 orthogonal problems here:

  1. We want the value stored in those global variables to be visible to all threads.

  2. We may have multiple updates of that value caused by (theoretically) more than one thread timing out.

I came to the conclusion that the 1st issue is not an issue, as signal firing and receiving is very heavy compared to a simple store to a variable. One can safely assume that by the time the signal is received on the target thread, the store has already finished and the value will be visible. Hence we do not need a fence. We may have one, but it is not a must.

The 2nd issue is more of a hypothetical one. If we ever see more than one thread timing out, the variable keeping the address of the timed-out thread will be updated more than once, and more than one signal will be sent. But, as the comment on top of VMError::report() says, "Only one thread can call this function, so we don't need to worry about MT-safety." And we do not know which thread will execute this method anyway. Therefore, I suggest using Atomic::replace_if_null() in the setter with the default (conservative) memory order. This will discard all updates except the 1st one and give us the fences addressing the 1st not-really-an-issue. In case of a mismatch between the reporting thread and the thread stored in the variable we fall back to default reporting, but I think that will be an extremely rare case.

I also brought back the volatile keyword for indicative purposes.
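For clarity, the first-write-wins behaviour relied on here is just the CAS-against-null semantics of Atomic::replace_if_null() (a sketch, assuming HotSpot's Atomic API):

// Succeeds (returns true) only if the field is still nullptr, so only the
// first timed-out thread is recorded; later updates are silently discarded.
// The default conservative memory order also supplies the fencing discussed above.
bool recorded = Atomic::replace_if_null(&_handshake_timed_out_thread, thread);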

@openjdk openjdk bot added rfr Pull request is ready for review and removed rfr Pull request is ready for review labels Aug 12, 2025
Member

@shipilev shipilev left a comment

OK, this looks more straightforward. I would still prefer to see Atomic::load (can be without acquire) for loads on the reader side to match the Atomic RMW on the writer side. This would give us at least coherency for the load.
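As a sketch, the matching reader side would then be just:

// Plain Atomic::load (no acquire): documents the lock-free access and gives
// load coherency, matching the Atomic RMW on the writer side (sketch only).
Thread* VMError::get_handshake_timed_out_thread() {
  return Atomic::load(&_handshake_timed_out_thread);
}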

@albertnetymk
Member

I think we have 2 orthogonal problems here:

I was aware of only the first one. Thank you for the clarification.

Conventionally, the Atomic::X is used to access volatile vars, so the getters should be updated as well.

I came to conclusion that the 1st issue is not an issue, as signal firing and receiving is very heavy compared by a simple store to a variable.

Maybe such "heavy" operation contains enough instructions that CPU doesn't/couldn't reorder the important read... Just to be explicit, the following is the problematic scenario I had in mind, and I assume it's the same one Aleksey talked about.

litmus source
C signal

{
[x] = 0;
[y] = 0;
}

P0 (atomic_int* x, atomic_int* y) {
  atomic_store_explicit(x, 1, memory_order_relaxed);

  atomic_thread_fence(memory_order_seq_cst);

  atomic_store_explicit(y, 1, memory_order_seq_cst);
}

P1 (atomic_int* x, atomic_int* y) {
  r0 = atomic_load_explicit(y, memory_order_relaxed);
  // required to prevent loading-x from floating up
  atomic_thread_fence(memory_order_acquire);
  if (r0 == 1) {
    r1 = atomic_load_explicit(x, memory_order_relaxed);
  }
}

exists
(  true
/\ 1:r0 == 1
/\ 1:r1 == 0
)

herd7 -c11 -cat rc11.cat signal.litmus shows that the acquire-fence on the reader side is necessary.

@toxaart
Contributor Author

toxaart commented Aug 12, 2025

OK, this looks more straightforward. I would still prefer to see Atomic::load (can be without acquire) for loads on the reader side to match the Atomic RMW on the writer side. This would give us at least coherency for the load.

Okay, I added Atomic::load() where necessary to support coherency, although, first of all, this is not done in a significant number of places in the codebase that use cmpxchg, and, second, I do think it is not needed in this particular case. We do not need atomicity here, as there is no possible concurrent mutation of that value: there can be only one such mutation, and it happens before the read happens.

@toxaart
Contributor Author

toxaart commented Aug 12, 2025

Conventionally, the Atomic::X is used to access volatile vars, so the getters should be updated as well.

I do not think so; the getters return a copy of the thread's address.

Maybe such "heavy" operation contains enough instructions that CPU doesn't/couldn't reorder the important read... Just to be explicit, the following is the problematic scenario I had in mind, and I assume it's the same one Aleksey talked about.

I am not familiar with the herd7 tool and with ARM nuances, but I do not think the example is relevant. This scenario cannot apply to this particular case, as we will never have an update of the value happening concurrently with the read, because there is an expensive operation between the two. Let's not overengineer the solution for this failing code path.

@albertnetymk
Member

I do not think so; the getters return a copy of the thread's address.

My previous message was probably unclear -- I meant using Atomic::load in the getter, which is exactly what is in the latest revision.

... as we will never have an update of the value happening concurrently with the read, because there is an expensive operation between the two

I think it's fragile to rely on surrounding code to maintain the desired memory ordering. An acquire fence on the reader side makes it more explicit and conveys the intention clearly. Maybe this is subjective; it is up to you and the other reviewers to decide.

@dholmes-ora
Member

dholmes-ora commented Aug 17, 2025

Our convention for lock-free code is that we should use Atomic::load and Atomic::store for all accesses to the shared variable; this serves as clear documentation of the lock-free nature of the code. The use, or not, of volatile on the declaration of the shared variable is not a convention we have firmly established or documented. I am in the camp that volatile should not be present on the declaration, as we have internalised/encapsulated the volatile within the Atomic functions.

@albertnetymk's suggestion to use an explicit release between the store and the pthread_kill does also address the visibility concern, unlike the earlier release_store suggestion, which puts the release in the wrong place. On further reflection, my suggestion about the fence should also make the fence explicit between the store and the pthread_kill rather than placing it inside the setter method (which, again by our own conventions, would need to be renamed to have fence in its name if we did that).

Arguably, as stated much much earlier, we don't need any explicit memory ordering here in practice as the signal implementation must provide its own ordering guarantees to actually function - and there is no way a compiler would ever re-order the store with the pthread_kill call!
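Put together, the convention described here would be shaped roughly like this (a sketch built from the snippets quoted earlier in the thread; not the final patch):

// Writer side: Atomic accessors document the lock-free use; the explicit
// ordering sits between the store and the signal, not inside the setter.
VMError::set_handshake_timed_out_thread(target);             // stores via Atomic
OrderAccess::release();   // or OrderAccess::fence(), per the discussion above
os::signal_thread(target, SIGILL, "cannot be handshaked");

// Reader side, in the signalled thread:
if (get_handshake_timed_out_thread() == _thread) {           // getter uses Atomic::load
  // report the handshake timeout for this thread
}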

Member

@albertnetymk albertnetymk left a comment

Arguably, as stated much much earlier, we don't need any explicit memory ordering here in practice as the signal implementation must provide its own ordering guarantees to actually function - and there is no way a compiler would ever re-order the store with the pthread_kill call!

I see. I think I misunderstood the problem a bit. Hopefully, I get it this time -- the non-inline function call should prevent compiler reordering, and the system call inside pthread_kill should prevent CPU reordering. The only thing needed here is to ensure the store to the shared variable becomes visible on other CPUs, e.g. not hidden inside a store buffer. Conceptually, pthread_kill performs a "store" as well, so the invariant is that the two stores should be ordered. x64 uses a FIFO store buffer, so everything is fine. OTOH, aarch64 needs some amending to enforce FIFO visibility of these two stores.

Atomic::replace_if_null should be enough to provide the ordering.

Member

@dholmes-ora dholmes-ora left a comment

Looks good. Thanks.

FTR, use of volatile is the preferred convention.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 20, 2025
@toxaart
Contributor Author

toxaart commented Aug 20, 2025

Thanks @dholmes-ora

/integrate

@toxaart
Contributor Author

toxaart commented Aug 20, 2025

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Aug 20, 2025
@openjdk

openjdk bot commented Aug 20, 2025

@toxaart
Your change (at version ee4cead) is now ready to be sponsored by a Committer.


@albertnetymk
Member

/sponsor

@openjdk

openjdk bot commented Aug 20, 2025

Going to push as commit 4ffd2a8.
Since your change was applied, there have been 150 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 20, 2025
@openjdk openjdk bot closed this Aug 20, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Aug 20, 2025
@openjdk

openjdk bot commented Aug 20, 2025

@albertnetymk @toxaart Pushed as commit 4ffd2a8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
