
dholmes-ora
Member

@dholmes-ora dholmes-ora commented Jul 30, 2025

After the changes in JDK-8361912 we could "return" the carrier thread from cv_internal_thread_to_JavaThread, but the virtual thread could unmount before we reach the transition disabler. As a result, when we execute this code:

  if (is_virtual) {
    // 1st need to disable mount/unmount transitions
    transition_disabler.init(jthread);

    carrier_thread = Handle(THREAD, java_lang_VirtualThread::carrier_thread(thread_h()));
    if (carrier_thread != nullptr) {
      java_thread = java_lang_Thread::thread(carrier_thread());
    }
  }

we hit the implicit else branch where carrier_thread == nullptr and do nothing, but java_thread still holds the old carrier, with which we then perform the handshake operation:

  void do_thread(Thread* th) override {
    Thread* current = Thread::current();

    bool is_virtual = java_lang_VirtualThread::is_instance(_thread_h());
    if (_java_thread != nullptr) {
      if (is_virtual) {
        // mounted vthread, use carrier thread state
        oop carrier_thread = java_lang_VirtualThread::carrier_thread(_thread_h());
        _thread_status = java_lang_Thread::get_thread_status(carrier_thread);
      } else {

But the virtual thread is no longer mounted on _java_thread (it has no carrier), so java_lang_VirtualThread::carrier_thread returns null, get_thread_status is passed null, and we crash.

The simple fix is to clear java_thread when we find a null carrier oop. I also added an assert to guard against a null carrier oop in the handshake code, plus some additional commentary.
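The race and the fix can be modelled outside HotSpot with a few plain structs. This is only an illustrative sketch: Carrier, VThread, resolve_target_old, resolve_target_fixed and status_in_handshake are hypothetical stand-ins, not HotSpot types or functions. It shows why keeping the previously looked-up carrier is wrong once the carrier oop reads as null, and where the new assert sits.

```cpp
#include <cassert>

// Hypothetical stand-in for a platform (carrier) JavaThread.
struct Carrier {
  int status;  // models the state read by get_thread_status
};

// Hypothetical stand-in for a java.lang.VirtualThread oop.
struct VThread {
  Carrier* carrier;  // nullptr once the virtual thread has unmounted
};

// Before the fix: `looked_up` is the carrier returned earlier by
// cv_internal_thread_to_JavaThread. If the vthread unmounted in between,
// the implicit else leaves the stale carrier in place.
Carrier* resolve_target_old(const VThread& vt, Carrier* looked_up) {
  Carrier* java_thread = looked_up;
  if (vt.carrier != nullptr) {
    java_thread = vt.carrier;
  }  // implicit else: java_thread is still the stale carrier
  return java_thread;
}

// After the fix: clear java_thread when the carrier oop is null, so no
// handshake is attempted with a thread the vthread is no longer on.
Carrier* resolve_target_fixed(const VThread& vt, Carrier* looked_up) {
  Carrier* java_thread = looked_up;
  if (vt.carrier != nullptr) {
    java_thread = vt.carrier;
  } else {
    java_thread = nullptr;
  }
  return java_thread;
}

// Models the handshake closure reading the carrier's state: the assert
// catches a null carrier instead of passing it on to the status query.
int status_in_handshake(const VThread& vt) {
  Carrier* carrier = vt.carrier;
  assert(carrier != nullptr && "mounted vthread must have a carrier");
  return carrier->status;
}
```

With an unmounted virtual thread, resolve_target_old still hands back the old carrier, while resolve_target_fixed returns nullptr and the caller skips the handshake.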

Testing:

  • com/sun/management/HotSpotDiagnosticMXBean/DumpThreads.java
  • tier 5 and 6

Thanks


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8364314: java_lang_Thread::get_thread_status fails assert(base != nullptr) failed: Invalid base (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26544/head:pull/26544
$ git checkout pull/26544

Update a local copy of the PR:
$ git checkout pull/26544
$ git pull https://git.openjdk.org/jdk.git pull/26544/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26544

View PR using the GUI difftool:
$ git pr show -t 26544

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26544.diff

@bridgekeeper

bridgekeeper bot commented Jul 30, 2025

👋 Welcome back dholmes! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 30, 2025

@dholmes-ora This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8364314: java_lang_Thread::get_thread_status fails assert(base != nullptr) failed: Invalid base

Reviewed-by: amenkov, shade, dcubed, pchilanomate, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 56 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 30, 2025
@openjdk

openjdk bot commented Jul 30, 2025

@dholmes-ora The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Jul 30, 2025

Webrevs

Member

@shipilev shipilev left a comment


Looks reasonable.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 30, 2025
Member

@dcubed-ojdk dcubed-ojdk left a comment


Thumbs up. I have a single typo and a suggested rewording for a comment.

@dcubed-ojdk
Member

Ping @mgronlun @egahlin:

In src/hotspot/share/jfr/jni/jfrJavaSupport.cpp, the static get_native(ThreadsListHandle& tlh, jobject thread)
function calls cv_internal_thread_to_JavaThread. There are two other functions in jfrJavaSupport.cpp
that call get_native, and both of them create a ThreadsListHandle before making that call, which
is good.

However, both JfrJavaSupport::exclude and JfrJavaSupport::include do their virtual thread processing
before the creation of the ThreadsListHandle so what is protecting the virtual thread and the underlying
carrier thread?

@dcubed-ojdk
Member

For the calls to tlh.cv_internal_thread_to_JavaThread in src/hotspot/share/prims/jvm.cpp,
the new logic added by:
JDK-8361912 ThreadsListHandle::cv_internal_thread_to_JavaThread does not deal with a virtual thread's carrier thread

will allow the carrier JavaThread to be returned to the caller.
However, NONE of the callers in jvm.cpp use a JvmtiVTMSTransitionDisabler
to prevent the virtual thread from being unmounted from the
carrier thread. So at the time of the remainder of the logic in
the JVM calls, the JavaThread* receiver may be a stale
carrier thread that no longer has a virtual thread mounted.

I don't know if this is an issue or not.

@dcubed-ojdk
Member

In src/hotspot/share/prims/whitebox.cpp, we have three functions:

  • WB_HandshakeReadMonitors
  • WB_HandshakeWalkStack
  • WB_AsyncHandshakeWalkStack

that call tlh.cv_internal_thread_to_JavaThread and each of them
does a handshake operation with the 'target' thread. If the target
is a virtual thread, then we'll do the handshake with the carrier thread
and not with the virtual thread. None of these functions in whitebox.cpp
use a JvmtiVTMSTransitionDisabler to prevent the virtual thread from
being unmounted from the carrier thread. I could be missing it, but I
don't see a way that the virtual thread info that should be mounted
on the carrier thread is being passed to the handshake code. So what
does the handshake code do when it gets down into the guts and
the carrier thread no longer has a virtual thread mounted on it or if
a different virtual thread is mounted on it?

I'm not sure who to ping for answering this query.

@dcubed-ojdk
Member

@dholmes-ora - I went back and took a wider look at all the code that calls
tlh.cv_internal_thread_to_JavaThread and posted three more comments
that are really more related to the original work done with:
JDK-8361912 ThreadsListHandle::cv_internal_thread_to_JavaThread does not deal with a virtual thread's carrier thread

I think for fixing the crash that we're seeing in the CI, this fix is good to go,
but I do think there are some wider questions that need to be addressed
about virtual threads, carrier threads and how things should work. Perhaps
I'm worried about nothing and the other callsites to tlh.cv_internal_thread_to_JavaThread
are just fine...

@dcubed-ojdk
Member

dcubed-ojdk commented Jul 30, 2025

I just finished catching up on this other issue/PR:

JDK-8361103 java_lang_Thread::async_get_stack_trace does not properly protect JavaThread
#26119

And this comment from @sspitsyn stuck out to me w.r.t. this fix:

#26119 (comment)

But, please, note that the JvmtiVTMSTransitionDisabler mechanism is enabled
only when there is a JVMTI agent. Otherwise, it has been disabled for scalability
purposes to exclude potentially high performance overhead at the VTMS
transition points.

The above comment from Serguei calls into question this suggested change that I posted on the PR:

https://github.com/openjdk/jdk/pull/26544/files#r2243188114

If the JvmtiVTMSTransitionDisabler only works when there's an agent attached,
I don't think we're protecting the carrier thread at all since it can become unmounted
at any time when there's no agent.
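The concern can be reduced to a tiny model: a disabler that only takes effect when an agent is present gives no protection otherwise. This is a hypothetical sketch (TransitionDisablerModel and unmount_allowed are illustrative names, not the real JvmtiVTMSTransitionDisabler API), written only to mirror the behaviour Serguei describes.

```cpp
#include <cassert>

// Hypothetical model of a transition disabler that is only armed when a
// JVMTI agent is present, as described in the quoted comment.
struct TransitionDisablerModel {
  bool active = false;

  explicit TransitionDisablerModel(bool agent_present) {
    if (agent_present) {
      active = true;  // would block mount/unmount transitions
    }
    // Otherwise a no-op: transitions proceed concurrently, for scalability.
  }
};

// Whether a virtual thread may unmount while this disabler is "held".
bool unmount_allowed(const TransitionDisablerModel& d) {
  return !d.active;
}
```

Under this model, constructing the disabler without an agent still permits an unmount, which is exactly the window in which the carrier looked up earlier can go stale.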

@alexmenkov

If the JvmtiVTMSTransitionDisabler only works when there's an agent attached, I don't think we're protecting the carrier thread at all since it can become unmounted at anytime when there's no agent.

Right. The issue was discovered several weeks ago: https://bugs.openjdk.org/browse/JDK-8361913 Work in progress

Contributor

@pchilano pchilano left a comment


Looks good to me.

@pchilano
Contributor

For the calls to tlh.cv_internal_thread_to_JavaThread in src/hotspot/share/prims/jvm.cpp, the new logic added by: JDK-8361912 ThreadsListHandle::cv_internal_thread_to_JavaThread does not deal with a virtual thread's carrier thread

will allow the carrier JavaThread to be returned to the caller. However, NONE of the callers in jvm.cpp use a JvmtiVTMSTransitionDisabler to prevent the virtual thread from being unmounted from the carrier thread. So at the time of the remainder of the logic in the JVM calls, the JavaThread* receiver may be a stale carrier thread that no longer has a virtual thread mounted.

I don't know if this is an issue or not.

Those two methods (the ones backing Thread.interrupt and Thread.setPriority) should only be called for platform threads: the VirtualThread class overrides Thread.interrupt, and Thread.setPriority ignores virtual threads. Maybe we should add an assert there after the JDK-8361912 changes.
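The suggested guard could look roughly like the following. It is a hypothetical standalone sketch: ThreadObj and set_priority_model are illustrative names, not the jvm.cpp code, and the assert is the proposal from the comment above rather than something that exists today.

```cpp
#include <cassert>

// Hypothetical thread object; is_virtual distinguishes
// java.lang.VirtualThread instances from platform threads.
struct ThreadObj {
  bool is_virtual;
  int priority;
};

// Models an entry point such as the one behind Thread.setPriority, which
// per the discussion should only ever see platform threads. The assert is
// the suggested (hypothetical) guard against a virtual thread reaching it.
void set_priority_model(ThreadObj& t, int prio) {
  assert(!t.is_virtual && "only platform threads expected here");
  t.priority = prio;
}
```

In a debug build, a virtual thread reaching this path would fail fast instead of silently operating on whatever carrier it happened to be mounted on.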

@pchilano
Contributor

In src/hotspot/share/prims/whitebox.cpp, we have three functions:

  • WB_HandshakeReadMonitors
  • WB_HandshakeWalkStack
  • WB_AsyncHandshakeWalkStack

that call tlh.cv_internal_thread_to_JavaThread and each of them does a handshake operation with the 'target' thread. If the target is a virtual thread, then we'll do the handshake with the carrier thread and not with the virtual thread. None of these functions in whitebox.cpp use a JvmtiVTMSTransitionDisabler to prevent the virtual thread from being unmounted from the carrier thread. I could be missing it, but I don't see a way that the virtual thread info that should be mounted on the carrier thread is being passed to the handshake code. So what does the handshake code do when it gets down into the guts and the carrier thread no longer has a virtual thread mounted on it or if a different virtual thread is mounted on it?

I'm not sure who to ping for answering this query.

Currently these are only used in tests with platform threads. If we were to pass a virtual thread as an argument, then in the cases you mentioned we would read/print the state of some other thread in the handshake closure, but there shouldn’t be a crash.

@sspitsyn
Contributor

sspitsyn commented Jul 31, 2025

Right. The issue was discovered several weeks ago: https://bugs.openjdk.org/browse/JDK-8361913 Work in progress

Also, I think it is safer to install a TransitionDisabler before cv_internal_thread_to_JavaThread() is called. That would also prevent this bug from reproducing.
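The ordering argument can be sketched with a mutex standing in for a disabler that really blocks transitions. This is a hypothetical model (VThreadModel, Carrier and with_carrier are illustrative, and a std::mutex is not how HotSpot implements transition disabling): if transitions are disabled first and the carrier is resolved second, the carrier cannot change underneath the operation.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a carrier JavaThread.
struct Carrier {
  int id;
};

// Hypothetical virtual thread: mount/unmount take the same lock that the
// "disabler" holds, so they cannot run while it is held.
struct VThreadModel {
  std::mutex transitions;
  Carrier* carrier = nullptr;

  void mount(Carrier* c) { std::lock_guard<std::mutex> g(transitions); carrier = c; }
  void unmount()         { std::lock_guard<std::mutex> g(transitions); carrier = nullptr; }
};

// 1. disable transitions, 2. resolve the carrier, 3. operate on it.
// Returns false if the virtual thread is not mounted.
template <typename Op>
bool with_carrier(VThreadModel& vt, Op op) {
  std::lock_guard<std::mutex> disabler(vt.transitions);  // disable first
  Carrier* c = vt.carrier;                                // then look up
  if (c == nullptr) {
    return false;  // unmounted: nothing to operate on
  }
  op(*c);  // carrier cannot change while the guard is held
  return true;
}
```

The design point is simply that the lookup happens inside the protected region; doing the lookup first and disabling afterwards reopens the window this PR had to patch.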

Contributor

@sspitsyn sspitsyn left a comment


This is a reasonable safety fix. Looks good to me modulo the comment wording, which is being discussed.

@AlanBateman
Contributor

If the JvmtiVTMSTransitionDisabler only works when there's an agent attached,
I don't think we're protecting the carrier thread at all since it can become unmounted
at anytime when there's no agent.

I've pushed some initial changes to the loom repo to deal with the transitions. They drop the use of JvmtiVTMSTransitionDisabler, as it requires a JVMTI environment. We have a new stress test that bashes on dumpThreads while many virtual threads are parking and unparking. I need to go over this with @alexmenkov and @sspitsyn before proposing anything for mainline.

@dholmes-ora
Member Author

@dcubed-ojdk I had already examined all the other clients of cv_internal_thread_to_JavaThread and determined that they did not involve virtual threads, as referenced in our internal discussions. Unfortunately I forgot to put a comment to that effect in the JBS issue. In any case, those usages are not relevant to the current PR.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 3, 2025
@dholmes-ora
Member Author

Thanks for the reviews @shipilev, @dcubed-ojdk, @pchilano and @sspitsyn.

As noted above this particular code is problematic for a range of reasons, but for now this fix maintains the pretense that the transition disabler actually works, and @AlanBateman will be fixing that as noted above.

Other clients of cv_internal_thread_to_JavaThread may potentially have issues if they ever have to deal with virtual threads (they currently don't, or are already being addressed), but that is not related to this fix, nor even to the fix applied in JDK-8361912.

Thanks again for the reviews. If someone could re-review the comment changes that would be appreciated. Thanks.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 4, 2025
@AlanBateman
Contributor

As noted above this particular code is problematic for a range of reasons, but for now this fix maintains the pretense that the transition disabler actually works, and @AlanBateman will be fixing that as noted above.

Yes, get_thread_snapshot has a bug tail. Longer term we need to put in infrastructure to interact with a specific virtual thread or all virtual threads. We need this for a second phase of thread dump anyway. Short term we can use the same approach as Thread::getStackTrace: suspend the virtual thread when it is unmounted, have the handshake op check that the virtual thread and its continuation are mounted, and retry if a snapshot is attempted during a transition. I've put a possible change in this draft PR - I need to discuss with Serguei and Alex on how they want to handle this.
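The retry part of that short-term approach amounts to a bounded loop around a snapshot attempt that may report it raced with a transition. This is a hypothetical sketch only: Attempt, TryOnce and snapshot_with_retry are illustrative names, the suspend-when-unmounted path is left to the caller-supplied attempt, and the bound on retries is an assumption, not something stated in the thread.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Outcome of one snapshot attempt (hypothetical).
enum class Attempt { Ok, InTransition };

// One attempt supplied by the caller. It returns InTransition when the
// handshake op found the virtual thread (or its continuation) no longer
// mounted on the expected carrier.
using TryOnce = std::function<Attempt()>;

// Retry until an attempt observes a stable mount state, up to max_tries.
// Returns the number of attempts used on success, or std::nullopt.
std::optional<int> snapshot_with_retry(const TryOnce& try_once, int max_tries) {
  for (int i = 1; i <= max_tries; ++i) {
    if (try_once() == Attempt::Ok) {
      return i;
    }
    // Raced with a mount/unmount transition: try again.
  }
  return std::nullopt;
}
```

For example, an attempt that reports InTransition twice and then succeeds completes on the third try; one that never stabilises gives up after max_tries.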

@dholmes-ora
Member Author

Thanks for the re-review @alexmenkov !

/integrate

@openjdk

openjdk bot commented Aug 4, 2025

Going to push as commit 84a4a36.
Since your change was applied there have been 76 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 4, 2025
@openjdk openjdk bot closed this Aug 4, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 4, 2025
@dholmes-ora dholmes-ora deleted the 8364314-threadSMR branch August 4, 2025 21:48
@openjdk

openjdk bot commented Aug 4, 2025

@dholmes-ora Pushed as commit 84a4a36.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
