Mitigate concurrent classes definition in RunnerClassLoader #43022

mariofusco · 2024-09-04T14:58:47Z

The intent of this pull request is to mitigate the concurrent loading (or define to be more precise) of the same classes by multiple threads as reported here by @gsmet.

Note that I wrote "mitigate" instead of "solve", because I believe that given the concurrent nature of our classloader I believe that it's impossible to structurally solve the problem or at least that a solution consistently preventing ANY concurrent class definition will have a cost in terms of performances and/or memory occupation that will largely overcome its advantages.

As suggested by @franz1981 the main problem that we have at the moment is in the fact that multiple threads could try to define the same class. This behaviour is expected and inherent with the classloader's concurrent mechanisms. When this happens the classloader raises a LinkageError that is properly managed by discarding the attempted duplicated class definition and returning the class defined by the thread that won the race. This means however that we're basically using an exception for the normal control flow, which is a known performance antipattern, and something that should be avoided as much as possible.

To diagnose this problem I simply printed a log statement like

System.out.println("Duplicated class definition: " + name);

in the catch block of that LinkageError. Giving a single run of the fullMicroProfile benchmark of the quarkus-startstop application with this set up I saw that in average that duplicated class definition is performed around 50 times per run:

Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.net.impl.TCPServerBase
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.netty.channel.AbstractChannel$AbstractUnsafe$6
Duplicated class definition: io.netty.buffer.PoolArena$1

As anticipated this pull request allows, with a negligible cost, to largely mitigate this problem, so that, with this fix in place, the typical run of the same application now only has from 2

Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3

to maximum 5

Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.vertx.core.http.impl.HttpServerImpl$HttpStreamHandler
Duplicated class definition: io.quarkus.vertx.http.runtime.VertxHttpRecorder$WebDeploymentVerticle$3
Duplicated class definition: io.netty.channel.AbstractChannel$AbstractUnsafe$6

of those concurrent class definition.

The implementation of this improvement uses a single value cache to store the class currently to be defined from a given jar and blocks the other threads attempting to deifne that same class, making them to wait until the first thread that initiated the class definition process has completed it. Note that for virtual threads this blocking heuristic cannot be used and it is necessary to keep the completely non-blocking behaviour that we had before.

/cc @dmlloyd @Sanne @geoand @gsmet @franz1981

quarkus-bot · 2024-09-04T20:36:27Z

Status for workflow `Quarkus CI`

This is the status report for running Quarkus CI on commit 3cb0952.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

dmlloyd · 2024-09-05T14:39:15Z

As far as I can tell, this change, and the discussion which preceded it, presumes that there is actually a measurable performance deficit associated with the existing algorithm. But I didn't see anything that shows that the overall performance could be improved in this way. I guess the theory is that loading the resources is expensive enough that doing it extra times would have a cost, but that's just my guess.

Is there any way we could measure to see if startup time and/or time to first request is improved with this change?

gsmet · 2024-09-05T14:42:10Z

I think we probably need to test three things:

Quarkus main
Quarkus main + revert the CL patch
Quarkus main + this patch

Because otherwise it's quite hard to compare things.

mariofusco · 2024-09-05T15:34:43Z

The intent of this pull request is reducing the number of concurrent classes definition as reported in the linked issue. Said that reverting the patch making this class loader non-blocking doesn't seem an option to me, unless we don't want to drop virtual threads support, I already tested these 3 options

* Quarkus main
* Quarkus main + revert the CL patch
* Quarkus main + this patch

against the quarkus-startup full-microprofile benchmark and in all honesty I didn't find any relevant difference, or at least no difference above the margin of variability resulting from running that benchmark on my laptop. Is there any other benchmark or in general a better way through which we could test this change?

dmlloyd · 2024-09-05T15:44:33Z

When this was designed, duplicate loading was an expected behavior of the algorithm. I think the discussion on the linked bug lost sight of that pretty quickly. So reducing it as a primary goal doesn't make a lot of sense to me. OTOH if we can show that the duplicate loading does in fact have a negative performance impact (maybe in total CPU time?), then it would make sense to say "this algorithm isn't actually working the way we want" and make this change to the algorithm itself as a way to fix the deficit.

Otherwise my opinion, FWIW, is that we should leave it alone for now.

mariofusco · 2024-09-05T16:02:30Z

When this was designed, duplicate loading was an expected behavior of the algorithm. I think the discussion on the linked bug lost sight of that pretty quickly. So reducing it as a primary goal doesn't make a lot of sense to me.

I totally agree on everything.

OTOH if we can show that the duplicate loading does in fact have a negative performance impact (maybe in total CPU time?), then it would make sense to say "this algorithm isn't actually working the way we want" and make this change to the algorithm itself as a way to fix the deficit.

In theory the duplicated class loading very likely has a negative impact. What I'm saying is that in practice, at least using the benchmark that I mentioned above, this impact is too small (a few tens of duplicated definitions, moreover made concurrently by different threads, on a total of many thousand of classes) for being measured reliably and repeatably.

franz1981 · 2024-09-05T16:04:07Z

OTOH if we can show that the duplicate loading does in fact have a negative performance impact (maybe in total CPU time?), then it would make sense to say "this algorithm isn't actually working the way we want" and make this change to the algorithm itself as a way to fix the deficit.

I didn't measured yet, but the effects here are:

allocating (again - N times more) byte[] to read the resource
using it to enter in the JNI realm in defineClass - which means performing a bunch of malloc/free - likely to fail later: see https://github.com/openjdk/jdk/blob/59c4649be37a387efaf100f368b3e9db06d44f3a/src/java.base/share/native/libjava/ClassLoader.c#L252-L278 (performing an heap -> off-heap copy via GetByteArrayRegion)
throwing a linkage exception which likely cause (local) safepoint to happen in order to collect the stack trace

Having a single check which mitigate all this madness is a win to me, if it won't make the code less maintainable.
@mariofusco as a proof I suggest 2 things:

create some custom JFR events to detect these behaviours - it will help debuggability in prod/containers/our tests
setup the QE startstop test with an appropriate number of event loops and switch the event(s) on to see if it happens and off, to compare RSS vs baseline

And than decide which commits keep e.g. JFR events and/or these changes.

franz1981

I suggest to keep this change simple without adding more states to this complex already state machine: we trade simplicity and some effectiveness embracing volatility.
synchronized if contended is not free, but instead can cause inflation, which increase RSS as well...
I won't introduce any form of waiting, but just checking if the last loaded resource name match what we want to load and in case, check again with findClass before retrying if nothing is found (which shouldn't happen really, if the proper barrier(s) are used)

mariofusco · 2024-09-05T16:19:51Z

I won't introduce any form of waiting, but just checking if the last loaded resource name match what we want to load and in case, check again with findClass before retrying if nothing is found (which shouldn't happen really, if the proper barrier(s) are used)

Checking again with a findClass instead of making the thread performing the concurrent class definition to wait as I'm attempting with this pull request, is indeed another heuristic that I could try. I just don't understand of which barrier(s) you're speaking about since I think that there isn't any at the moment. Can you please clarify?

Also I'm still concerned that whatever change I try in this area, its impact on the overall startup time will be however too small to be measured, but I'm open to try different implementations before taking a decision.

franz1981 · 2024-09-05T16:25:23Z

just don't understand of which barrier(s) you're speaking about since I think that there isn't any at the moment. Can you please clarify?

Let's say you store in a field the name of the last successufully defined class in the jar resource: you have to use setVolatile/setRelease and getVolatile/getAcquire to make sure that there's the right synchronize-with relation (still an happens-before one , see https://preshing.com/20130823/the-synchronizes-with-relation/) with defineClass and findClass

i.e.

loadAcquire lastClassDefined;
// LoadLoad; LoadStore barriers
if (lastClassDefined same as className)  {
     // no need to loop here really
     return findClass 
}
// bla bla
try {
 defineClass(className)
 // LoadStore:StoreStore  barriers
 storeRelease lastClassDefined := className
} catch (LinkageEx ) {
    
}

dmlloyd · 2024-09-05T16:28:28Z

OTOH if we can show that the duplicate loading does in fact have a negative performance impact (maybe in total CPU time?), then it would make sense to say "this algorithm isn't actually working the way we want" and make this change to the algorithm itself as a way to fix the deficit.

I didn't measured yet, but the effects here are:

allocating (again - N times more) byte[] to read the resource

using it to enter in the JNI realm in defineClass - which means performing a bunch of malloc/free - likely to fail later: see https://github.com/openjdk/jdk/blob/59c4649be37a387efaf100f368b3e9db06d44f3a/src/java.base/share/native/libjava/ClassLoader.c#L252-L278 (performing an heap -> off-heap copy via GetByteArrayRegion)

throwing a linkage exception which likely cause (local) safepoint to happen in order to collect the stack trace

Having a single check which mitigate all this madness is a win to me, if it won't make the code less maintainable.

But, that assumes that the tradeoff - holding a lock and wait/notify ping-pong, plus the added cognitive load of tracking resource lifecycle right on the interface - is cheaper. We don't really know that's the case.

If the effect is as small as it seems, then maybe we can just let it be "bad" for now with the idea of not making it harder to improve later, for example by switching to mmap'd archives and changing the way we manage lifecycle. A clean abstraction is worth 1000x its LOC in microoptimizations.

franz1981 · 2024-09-05T16:29:15Z

that assumes that the tradeoff - holding a lock and wait/notify ping-pong

I refer to the idea of PR I got in mind here -> see #43022 (review)

Sorry I didn't materialized the comment earlier ^^

cescoffier · 2024-11-10T17:20:38Z

@mariofusco @dmlloyd @franz1981 Where are we on this one?

franz1981 · 2024-11-10T19:23:33Z

In this current form could be troublesome;
eg

Thread 1 cas(null, clazzA) - win
Thread 2 cas(null, classA) - fail but found that it has to wait (but doesn't start doing it yet)
Thread 1 start define, complete it, cas(classA, null) and try notify - none, since Thread 1 is still not within the synchronized region
Thread 1 enter in synchronized and just wait forever

Using a volatile field with an immutable pair with String and CompletableFuture<?> is easier to make it right imo

dmlloyd · 2024-11-10T21:44:00Z

My intention is ultimately to replace the majority of this logic, so I don't think it matters too much what we do in the near term.

mariofusco · 2024-11-11T08:56:45Z

My intention is ultimately to replace the majority of this logic, so I don't think it matters too much what we do in the near term.

Can you please clarify what you have in mind? Are you planning to read the jars through memory mapping? Can we maybe schedule a call to discuss this in more details?

dmlloyd · 2024-11-11T12:54:46Z

Yeah probably memory mapping. We can chat this week if you want to. It's part of what I want to cover in the class loading WG though so chances are good that we would repeat the discussion when that is kicked off.

WG proposal link: #43749

cescoffier · 2025-09-15T06:17:24Z

@mariofusco @dmlloyd what's the status on this one? Should it be recondiered when we will have a "classloading revamping" WG?

franz1981 · 2025-09-16T02:40:53Z

IDK @cescoffier but we have this problem (and TBH wildfly too):

This is collected via @brunobat benchmark on OTEL and it shows a huge amount of hibernate-related linkage errors due to the worker threads (which far exceed the number of cores, clearly) trying to define the same classes.
This is clearly a limit case since antl4 has become much "fatter" and it's using a worker pool (no virtual threads: with 2-3 FJ workers, things are different, clearly...) - but is such a common scenario that probably is worth to work on a solution before the revamping...
For wildfly, I wish to write to them because in some customer case I observed the same exact scenario - and they perform way more classloading than us so...

dmlloyd · 2025-09-16T14:12:00Z

This is collected via @brunobat benchmark on OTEL and it shows a huge amount of hibernate-related linkage errors due to the worker threads (which far exceed the number of cores, clearly) trying to define the same classes.

Was this before or after a391ab7 was merged (that commit is found in 3.26.0 and later)?

franz1981 · 2025-09-16T14:17:41Z

I think the benchmark from @brunobat was 3.26 RC as platform version

@mariofusco have a test for this IIRC?

dmlloyd · 2025-09-16T15:38:20Z

One idea I'm playing with is to do a limited scan of the class bytes and attempt to preload the superclass (and maybe interfaces). This could mitigate the problem by reducing the window of time that is the "black box" of defineClass (see #50091).

brunobat · 2025-09-22T15:27:47Z

I think the benchmark from @brunobat was 3.26 RC as platform version

@mariofusco have a test for this IIRC?

My app is currently using 3.24.0.CR1.
I can create a new branch with a version of your preference.

Mitigate concurrent classes definition in RunnerClassLoader

3cb0952

quarkus-bot bot added the area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins label Sep 4, 2024

franz1981 requested changes Sep 5, 2024

View reviewed changes

franz1981 mentioned this pull request Jan 8, 2025

Places to potentially find startup performance improvements #45436

Open

franz1981 mentioned this pull request Jul 17, 2025

Attempt to reduce spurious linkage errors #48965

Merged

gastaldi added the triage/needs-rebase This PR needs to be rebased first because it has merge conflicts label Sep 16, 2025

dmlloyd mentioned this pull request Sep 16, 2025

Reduce pinning by preloading superclasses #50091

Closed

Mitigate concurrent classes definition in RunnerClassLoader #43022

Are you sure you want to change the base?

Mitigate concurrent classes definition in RunnerClassLoader #43022

Uh oh!

Conversation

mariofusco commented Sep 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

quarkus-bot bot commented Sep 4, 2024 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Status for workflow Quarkus CI

Uh oh!

dmlloyd commented Sep 5, 2024

Uh oh!

gsmet commented Sep 5, 2024

Uh oh!

mariofusco commented Sep 5, 2024

Uh oh!

dmlloyd commented Sep 5, 2024

Uh oh!

mariofusco commented Sep 5, 2024

Uh oh!

franz1981 commented Sep 5, 2024

Uh oh!

franz1981 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mariofusco commented Sep 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

franz1981 commented Sep 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmlloyd commented Sep 5, 2024

Uh oh!

franz1981 commented Sep 5, 2024

Uh oh!

cescoffier commented Nov 10, 2024

Uh oh!

franz1981 commented Nov 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmlloyd commented Nov 10, 2024

Uh oh!

mariofusco commented Nov 11, 2024

Uh oh!

dmlloyd commented Nov 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cescoffier commented Sep 15, 2025

Uh oh!

franz1981 commented Sep 16, 2025

Uh oh!

dmlloyd commented Sep 16, 2025

Uh oh!

franz1981 commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmlloyd commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brunobat commented Sep 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

mariofusco commented Sep 4, 2024 •

edited

Loading

quarkus-bot bot commented Sep 4, 2024 •

edited by github-actions bot

Loading

Status for workflow `Quarkus CI`

franz1981 left a comment •

edited

Loading

mariofusco commented Sep 5, 2024 •

edited

Loading

franz1981 commented Sep 5, 2024 •

edited

Loading

franz1981 commented Nov 10, 2024 •

edited

Loading

dmlloyd commented Nov 11, 2024 •

edited

Loading

franz1981 commented Sep 16, 2025 •

edited

Loading

dmlloyd commented Sep 16, 2025 •

edited

Loading