Mitigate concurrent classes definition in RunnerClassLoader #43022
As far as I can tell, this change, and the discussion which preceded it, presumes that there is actually a measurable performance deficit associated with the existing algorithm. But I didn't see anything that shows that the overall performance could be improved in this way. I guess the theory is that loading the resources is expensive enough that doing it extra times would have a cost, but that's just my guess. Is there any way we could measure to see if startup time and/or time to first request is improved with this change? |
I think we probably need to test three things:
Because otherwise it's quite hard to compare things. |
The intent of this pull request is to reduce the number of concurrent class definitions, as reported in the linked issue. That said, reverting the patch that made this class loader non-blocking doesn't seem an option to me, unless we want to drop virtual threads support. I already tested these 3 options
against the quarkus-startup |
When this was designed, duplicate loading was an expected behavior of the algorithm. I think the discussion on the linked bug lost sight of that pretty quickly. So reducing it as a primary goal doesn't make a lot of sense to me. OTOH if we can show that the duplicate loading does in fact have a negative performance impact (maybe in total CPU time?), then it would make sense to say "this algorithm isn't actually working the way we want" and make this change to the algorithm itself as a way to fix the deficit. Otherwise my opinion, FWIW, is that we should leave it alone for now. |
I totally agree on everything.
In theory the duplicated class loading very likely has a negative impact. What I'm saying is that in practice, at least using the benchmark that I mentioned above, this impact is too small (a few tens of duplicated definitions, moreover made concurrently by different threads, out of a total of many thousands of classes) to be measured reliably and repeatably. |
I haven't measured yet, but the effects here are:
Having a single check which mitigates all this madness is a win to me, as long as it doesn't make the code less maintainable.
And then decide which commits to keep, e.g. JFR events and/or these changes. |
I suggest keeping this change simple, without adding more states to this already complex state machine: we gain simplicity at the cost of some effectiveness by embracing volatility.
`synchronized`, if contended, is not free: it can cause lock inflation, which increases RSS as well...
I wouldn't introduce any form of waiting, but just check whether the last loaded resource name matches what we want to load and, if so, check again with `findClass` before retrying if nothing is found (which shouldn't really happen if the proper barrier(s) are used).
Checking again with a
Also, I'm still concerned that whatever change I try in this area, its impact on the overall startup time will be too small to be measured, but I'm open to trying different implementations before making a decision. |
Let's say you store in a field the name of the last successfully defined class in the jar resource: you have to use setVolatile/setRelease and getVolatile/getAcquire to make sure that there's the right synchronizes-with relation (still a happens-before one, see https://preshing.com/20130823/the-synchronizes-with-relation/) with i.e.
|
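For illustration, the release/acquire pairing mentioned above could look roughly like the sketch below. `LastDefinedName`, `publish` and `wasJustDefined` are hypothetical names invented for this example and are not part of `RunnerClassLoader`; only the `VarHandle` calls are real JDK API.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder for the name of the last successfully defined class.
// This is a sketch of the idea, not the actual RunnerClassLoader code.
final class LastDefinedName {
    private static final VarHandle LAST;

    static {
        try {
            LAST = MethodHandles.lookup()
                    .findVarHandle(LastDefinedName.class, "last", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unused") // read and written only through LAST
    private String last;

    // Called by the defining thread after the class has been defined.
    // The release store orders everything done before it ahead of the store.
    void publish(String className) {
        LAST.setRelease(this, className);
    }

    // Called by other threads. An acquire load that observes the name
    // published above also observes the writes that preceded the release store.
    boolean wasJustDefined(String className) {
        return className.equals((String) LAST.getAcquire(this));
    }
}
```

A thread that gets `true` from `wasJustDefined` can then look the class up again instead of trying to define it a second time.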
But, that assumes that the tradeoff - holding a lock and wait/notify ping-pong, plus the added cognitive load of tracking resource lifecycle right on the interface - is cheaper. We don't really know that's the case. If the effect is as small as it seems, then maybe we can just let it be "bad" for now with the idea of not making it harder to improve later, for example by switching to mmap'd archives and changing the way we manage lifecycle. A clean abstraction is worth 1000x its LOC in microoptimizations. |
I'm referring to the idea for the PR I had in mind here -> see #43022 (review). Sorry I didn't materialize the comment earlier ^^ |
@mariofusco @dmlloyd @franz1981 Where are we on this one? |
In its current form this could be troublesome;
Using a volatile field holding an immutable pair of String and CompletableFuture<?> is easier to get right imo |
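A minimal sketch of that suggestion, assuming a single-slot cache per jar resource. `InFlightDefinition`, `Pending` and the `definer` callback are hypothetical names standing in for the real loader internals, and the check-then-set is deliberately not atomic, so two threads can still occasionally race (mitigation, not prevention).

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative, not the actual RunnerClassLoader code.
final class InFlightDefinition {

    // Immutable pair: the class name being defined and the future holding its result.
    static final class Pending {
        final String name;
        final CompletableFuture<Class<?>> result = new CompletableFuture<>();

        Pending(String name) {
            this.name = name;
        }
    }

    // Single-value cache; the volatile write publishes the fully built pair.
    private volatile Pending pending;

    Class<?> define(String name, Function<String, Class<?>> definer) {
        Pending current = pending;
        if (current != null && current.name.equals(name)) {
            // Another thread is defining (or has just defined) this class: reuse its result.
            // join() blocks until the future completes.
            return current.result.join();
        }
        Pending mine = new Pending(name);
        pending = mine;
        try {
            Class<?> defined = definer.apply(name);
            mine.result.complete(defined);
            return defined;
        } catch (RuntimeException | Error e) {
            // Wake up any waiter with the same failure, then propagate it.
            mine.result.completeExceptionally(e);
            throw e;
        }
    }
}
```

Because `join()` blocks, a variant for virtual threads would need to skip the wait and fall through to the non-blocking path. Waiters also see failures wrapped in a `CompletionException`, which real code would have to unwrap.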
My intention is ultimately to replace the majority of this logic, so I don't think it matters too much what we do in the near term. |
Can you please clarify what you have in mind? Are you planning to read the jars through memory mapping? Can we maybe schedule a call to discuss this in more detail? |
Yeah probably memory mapping. We can chat this week if you want to. It's part of what I want to cover in the class loading WG though so chances are good that we would repeat the discussion when that is kicked off. WG proposal link: #43749 |
@mariofusco @dmlloyd what's the status on this one? Should it be reconsidered when we have a "classloading revamping" WG? |
IDK @cescoffier but we have this problem (and TBH wildfly too): This is collected via @brunobat benchmark on OTEL and it shows a huge amount of hibernate-related linkage errors due to the worker threads (which far exceed the number of cores, clearly) trying to define the same classes. |
Was this before or after a391ab7 was merged (that commit is found in 3.26.0 and later)? |
I think the benchmark from @brunobat used 3.26 RC as the platform version. @mariofusco has a test for this IIRC? |
One idea I'm playing with is to do a limited scan of the class bytes and attempt to preload the superclass (and maybe interfaces). This could mitigate the problem by reducing the window of time that is the "black box" of |
My app is currently using 3.24.0.CR1. |
The intent of this pull request is to mitigate the concurrent loading (or, to be more precise, definition) of the same classes by multiple threads as reported here by @gsmet.
Note that I wrote "mitigate" instead of "solve", because I believe that, given the concurrent nature of our classloader, it's impossible to structurally solve the problem, or at least that a solution consistently preventing ANY concurrent class definition would have a cost in terms of performance and/or memory occupation that would largely outweigh its advantages.
As suggested by @franz1981, the main problem at the moment is that multiple threads can try to define the same class. This behaviour is expected and inherent in the classloader's concurrent mechanisms. When this happens the classloader raises a `LinkageError`, which is handled by discarding the attempted duplicated class definition and returning the class defined by the thread that won the race. This means, however, that we're basically using an exception for normal control flow, which is a known performance antipattern and something that should be avoided as much as possible.

To diagnose this problem I simply added a log statement like `System.out.println("Duplicated class definition: " + name);` in the catch block of that `LinkageError`. In a single run of the `fullMicroProfile` benchmark of the `quarkus-startstop` application with this setup, I saw that on average a duplicated class definition is performed around 50 times per run.

As anticipated, this pull request largely mitigates the problem at a negligible cost: with this fix in place, a typical run of the same application now has only 2 to at most 5 of those concurrent class definitions.
The implementation of this improvement uses a single-value cache to store the class currently being defined from a given jar, and blocks the other threads attempting to define that same class, making them wait until the thread that initiated the class definition has completed it. Note that for virtual threads this blocking heuristic cannot be used, and it is necessary to keep the completely non-blocking behaviour that we had before.
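To make the `LinkageError` fallback and the duplicate counting used for diagnosis concrete, here is a hedged sketch. `DuplicateDefinitionTracker` is a hypothetical class, and the `define` and `findLoaded` callbacks stand in for the protected `ClassLoader.defineClass` and `ClassLoader.findLoadedClass` calls that a real class loader would make directly.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of "define, and on LinkageError reuse the winner's class".
// Not the actual RunnerClassLoader code.
final class DuplicateDefinitionTracker {
    private final AtomicInteger duplicates = new AtomicInteger();

    Class<?> defineOrReuse(String name,
                           Function<String, Class<?>> define,
                           Function<String, Class<?>> findLoaded) {
        try {
            return define.apply(name);
        } catch (LinkageError e) {
            // Another thread won the race; count the wasted attempt
            // (the PR description logged it with System.out.println instead).
            duplicates.incrementAndGet();
            Class<?> winner = findLoaded.apply(name);
            if (winner == null) {
                // Not a duplicate definition after all: propagate the error.
                throw e;
            }
            return winner;
        }
    }

    int duplicateCount() {
        return duplicates.get();
    }
}
```

Counting instead of printing keeps the diagnostic cheap enough to leave enabled while comparing runs with and without the single-value cache.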
/cc @dmlloyd @Sanne @geoand @gsmet @franz1981