
Conversation

@tstuefe (Member) commented Aug 13, 2025

This provides the following new metrics:

  • ProcessSize event (new, periodic)
    • vsize (for analyzing address-space fragmentation issues)
    • RSS including subtypes (subtypes are useful for excluding atypical issues, e.g. kernel problems that cause large file buffer bloat)
    • peak RSS
    • process swap (if we swap we cannot trust the RSS values, plus it indicates bad sizing)
    • pte size (to quickly see if we run with a super-large working set but an unsuitably small page size)
  • LibcStatistics (new, periodic)
    • outstanding malloc size (important counterpoint to whatever NMT tries to tell me, which alone is often misleading)
    • retained malloc size (super-important for the same reason)
    • number of libc trims HotSpot executed (needed to gauge the usefulness of the retain counter, and to see whether a customer employs native heap auto trimming (-XX:TrimNativeHeapInterval))
  • NativeHeapTrim (new, event-driven) (for both manual and automatic trims)
    • RSS before and RSS after
    • RSS recovered by this trim
    • whether it was an automatic or manual trim
    • duration
  • JavaThreadStatistics (existing event, extended)
    • os thread counter (new field) (useful for understanding the behavior of third-party code in our process when threads are created that bypass the JVM; some custom launchers do that)
    • nonJava thread counter (new field) (needed to interpret the os thread counter)
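
For illustration, here is a minimal consumer-side sketch using the JFR streaming API (jdk.jfr.consumer.RecordingStream). The event and field names (jdk.ProcessSize, jdk.NativeHeapTrim, "vsize", "rss", "swap", "rssRecovered") follow the proposal above and are assumptions that may change during review:

import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

// Minimal streaming consumer. The event and field names below follow
// the proposal in this PR and may differ in the final version.
public final class ProcessSizeWatcher {
    public static void main(String[] args) throws Exception {
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.ProcessSize").withPeriod(Duration.ofSeconds(1));
            rs.enable("jdk.NativeHeapTrim");
            rs.onEvent("jdk.ProcessSize", e ->
                System.out.printf("vsize=%d rss=%d swap=%d%n",
                    e.getLong("vsize"), e.getLong("rss"), e.getLong("swap")));
            rs.onEvent("jdk.NativeHeapTrim", e ->
                System.out.printf("trim recovered %d bytes in %s%n",
                    e.getLong("rssRecovered"), e.getDuration()));
            rs.start(); // blocks; the JVM keeps emitting periodic samples
        }
    }
}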

Notes:

  • we already have a ResidentSetSize event, and the new ProcessSize event is a superset of it. I don't know how such cases are handled; I'd prefer to throw the old event out, but JMC has a hard-coded chart for RSS, so I kept it unless someone tells me to remove it.

  • Obviously, the libc events are very platform-specific. Still, I argue that these metrics are highly useful. We want people to use JFR and JMC, and those people include developers dealing with performance problems that require platform-specific knowledge to understand. See my comment in the JBS issue.

I provided implementations, as far as possible, for Linux, macOS, and Windows.

Testing:

  • ran the new tests manually and as part of GHAs

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8365306: Provide OS Process Size and Libc statistic metrics to JFR (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26756/head:pull/26756
$ git checkout pull/26756

Update a local copy of the PR:
$ git checkout pull/26756
$ git pull https://git.openjdk.org/jdk.git pull/26756/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26756

View PR using the GUI difftool:
$ git pr show -t 26756

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26756.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper (bot) commented Aug 13, 2025

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Aug 13, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk (bot) commented Aug 13, 2025

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@tstuefe force-pushed the JDK-8365306-Provide-OS-Process-Size-and-Libc-statistic-metrics-to-JFR branch 3 times, most recently from 4828a1b to b24c784, on August 15, 2025 12:14
@tstuefe force-pushed the JDK-8365306-Provide-OS-Process-Size-and-Libc-statistic-metrics-to-JFR branch from b24c784 to 39e282a on August 15, 2025 15:06
@tstuefe marked this pull request as ready for review on August 16, 2025 04:16
@openjdk (bot) added the rfr label (Pull request is ready for review) on Aug 16, 2025
@mlbridge (bot) commented Aug 16, 2025

Webrevs

@tstuefe (Member, Author) commented Aug 17, 2025

label /hotspot-jfr

@stefank (Member) commented Aug 18, 2025

/label hotspot-jfr

@openjdk (bot) commented Aug 18, 2025

@stefank The hotspot-jfr label was successfully added.

@egahlin (Member) commented Aug 19, 2025

What is the problem with adding RSS metrics to the existing ResidentSetSize event?

@tstuefe (Member, Author) commented Aug 19, 2025

What is the problem with adding RSS metrics to the existing ResidentSetSize event?

You mean adding the vsize, swap etc fields to ResidentSetSize?

I thought about that, but then it would be weirdly misnamed. RSS has a very specific meaning. So we would have ResidentSetSize.vsize, ResidentSetSize.swap, ResidentSetSize.rss (?)

I also thought about splitting them up and adding one event per value, following the "ResidentSetSize" pattern: one event for "VirtualSize", one for "Swap", etc. Apart from not liking the fine granularity, having these fields grouped in one event has multiple advantages. Mainly, I can build myself graphs in JMC for all these fields in one graph and correlate all the values. It is also cheaper to obtain (just one parsing operation for /proc/meminfo, for instance).
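
For illustration, a minimal sketch of the "one parse, several values" point, written in Java for brevity (the HotSpot implementation is C++). Note that the per-process counters live in /proc/self/status on Linux; /proc/meminfo is system-wide:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// One read of /proc/self/status yields vsize, rss, swap and PTE size
// at once. The kernel reports these values in kB.
public final class ProcStatus {
    public static void main(String[] args) throws Exception {
        Map<String, Long> kb = new HashMap<>();
        for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
            String[] parts = line.split("\\s+"); // e.g. "VmRSS:  123456 kB"
            if (parts[0].startsWith("Vm") && parts.length >= 2) {
                kb.put(parts[0].replace(":", ""), Long.parseLong(parts[1]));
            }
        }
        System.out.println("VmSize=" + kb.get("VmSize") + " kB, VmRSS="
            + kb.get("VmRSS") + " kB, VmSwap=" + kb.get("VmSwap")
            + " kB, VmPTE=" + kb.get("VmPTE") + " kB");
    }
}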

@egahlin (Member) commented Aug 19, 2025

You mean adding the vsize, swap etc fields to ResidentSetSize?

I thought about that, but then it would be weirdly misnamed. RSS has a very specific meaning. So we would have ResidentSetSize.vsize, ResidentSetSize.swap, ResidentSetSize.rss (?)

I was thinking something like this:

<Event name="ResidentSetSize" category="Java Virtual Machine, Memory" ... >
  <Field type="ulong" contentType="bytes" name="size" .../>
  <Field type="ulong" contentType="bytes" name="peak" ..  />
  <Field type="ulong" contentType="bytes" name="anonymous"   />
  <Field type="ulong" contentType="bytes" name="file" />
  <Field type="ulong" contentType="bytes" name="sharedMemory"  />
</Event>

When it comes to non-RSS metrics, there is a Swap event, but I'm not sure it is appropriate. Regarding other metrics, perhaps they should go into other events, or perhaps new events should be created. I haven't had time to look into it.

Mainly, I can build myself graphs in JMC for all these fields in one graph and correlate all the values.

Periodic events can be emitted with the same timestamp. That way they can be grouped together. This is what we do with NativeMemoryUsage and NativeMemoryTotal.
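
For illustration, a minimal sketch of such timestamp-based grouping on the consumer side, using the existing jdk.NativeMemoryUsage event and the event start time as the join key:

import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Sketch: periodic events emitted with the same timestamp can be
// re-grouped after the fact by using the start time as a join key.
public final class GroupByTimestamp {
    public static void main(String[] args) throws Exception {
        Map<Instant, List<RecordedEvent>> groups = new TreeMap<>();
        try (RecordingFile rf = new RecordingFile(Path.of(args[0]))) {
            while (rf.hasMoreEvents()) {
                RecordedEvent e = rf.readEvent();
                if (e.getEventType().getName().equals("jdk.NativeMemoryUsage")) {
                    groups.computeIfAbsent(e.getStartTime(), k -> new ArrayList<>()).add(e);
                }
            }
        }
        // Each list now holds all per-type samples taken in one period.
        groups.forEach((t, list) ->
            System.out.println(t + ": " + list.size() + " samples"));
    }
}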

These events will likely be around for a long time. We shouldn't design them just to match the workflow of one tool as it currently works. New functionality can be added in the future.

It is also cheaper to obtain (just one parsing operation for /proc/meminfo, for instance).

We should not sacrifice the design unless the overhead is significant.

@tstuefe (Member, Author) commented Aug 20, 2025

You mean adding the vsize, swap etc fields to ResidentSetSize?
I thought about that, but then it would be weirdly misnamed. RSS has a very specific meaning. So we would have ResidentSetSize.vsize, ResidentSetSize.swap, ResidentSetSize.rss (?)

I was thinking something like this:

<Event name="ResidentSetSize" category="Java Virtual Machine, Memory" ... >
  <Field type="ulong" contentType="bytes" name="size" .../>
  <Field type="ulong" contentType="bytes" name="peak" ..  />
  <Field type="ulong" contentType="bytes" name="anonymous"   />
  <Field type="ulong" contentType="bytes" name="file" />
  <Field type="ulong" contentType="bytes" name="sharedMemory"  />
</Event>

Hmm, yes, that would be one alternative.

When it comes to non-RSS metrics, there is a Swap event, but I'm not sure it is appropriate. Regarding other metrics, perhaps they should go into other events, or perhaps new events should be created. I haven't had time to look into it.

Mainly, I can build myself graphs in JMC for all these fields in one graph and correlate all the values.

Periodic events can be emitted with the same timestamp. That way they can be grouped together. This is what we do with NativeMemoryUsage and NativeMemoryTotal.

These events will likely be around for a long time. We shouldn't design them just to match the workflow of one tool as it currently works. New functionality can be added in the future.

I spent a lot of the time allotted to this PR deliberating on how exactly to shape the events; I didn't want to just jam them in. And I agree: I dislike how tight the coupling is between the data model and the renderer in the case of JMC. It leads to many UI inconsistencies and gives the wrong motivation.

Unfortunately, we live in a world where JMC is the only serious and freely available tool, and where the JFR protocol is not open, so I fear this is likely to stay that way.

In the case of this PR, an argument can be made for grouping OS-side process-related metrics together into one event. Doing it the way I did is not absurd; this is how these metrics are often presented: vsize+rss+swap, complementing each other and giving a holistic view of the OS side of process memory consumption, especially rss+swap. You need swap to be able to explain rss.

I also agonized about the platform-specific aspects of this. This is rather Linux-centric. But again, reality: Linux is by far the most important platform.

On Windows, for example, we can ask for the size of the commit charge. That is nice complementary information. On Linux, we also have a commit charge (as in, how much of the process memory was committed by the OS, underlaid with swap according to the kernel overcommit rules). That would be useful to know, since it can explain native OOMs when the OS runs out of swap. But I don't see a way to obtain that information on Linux. Since I only have the number on Windows, I left it out. And I realize this is a bit arbitrary, since I included values I only get on Linux.

Shaping events the right way is hard, and I am thankful for any direction from your side.

It is also cheaper to obtain (just one parsing operation for /proc/meminfo, for instance).

We should not sacrifice the design unless the overhead is significant.

Hm, sure.

@roberttoyonaga (Contributor) commented Aug 20, 2025

@tstuefe
I think that putting those metrics (vsize, swap, rss, etc) under a new ProcessSize event (while not changing jdk.ResidentSetSize) makes sense. This allows JMC to continue as normal, and it also keeps the relevant metrics together so they can be easily interpreted in relation to each other.

These events will likely be around for a long time. We shouldn't design them just to match the workflow of one tool as it currently works. New functionality can be added in the future.

@egahlin , is there a procedure for deprecating event types? Or are events permanent once introduced? If you decide to go the above route (leaving all the metrics in ProcessSize), maybe it would be possible to mark the jdk.ResidentSetSize event for deprecation to allow JMC time to eventually switch over to using the new event for its charts?

When it comes to non-rss metrics, there is a Swap event, but not sure it is appropriate?

I think this new swap data is process-specific, so it is not already covered by jdk.SwapSpace, which shows the maximum amount available and how much is currently free to the OS.
I think it could be reasonable to leave the process's swap data in ProcessSize rather than in jdk.SwapSpace, which contains data for the whole OS. The connecting thread is that this info is a "process" metric being grouped with other memory metrics for this specific process.

Another option, in order to avoid duplication, could be to leave the RSS metrics in the jdk.ResidentSetSize event, not put them in the ProcessSize event, and correlate them by giving them the same timestamp (as Erik mentions above).
Something similar was also done with the NativeMemoryUsagePeak and NativeMemoryUsageTotalPeak events (which are GraalVM-specific) to avoid having to modify NativeMemoryUsage and NativeMemoryUsageTotal. Admittedly, this is probably not as clean or convenient.

Either way, I agree that splitting all the ProcessSize metrics into their own individual events is not the best way.

@tstuefe (Member, Author) commented Aug 26, 2025

Thank you, @roberttoyonaga ! Let's see what @egahlin writes.

@egahlin (Member) commented Aug 26, 2025

Events can be viewed as tables in a relational database. Duplicating information or designing tables based on the user interface is not a good idea. Correlating or grouping data from different events is not a new problem. The way we have solved it in the past is by using an explicit ID, similar to a foreign key (gcId, compileId, safepointId etc), or by using the same timestamp (ActiveSetting, NativeMemory* etc.). In this case, I think a timestamp makes more sense. Tools can place the value on a timeline and have memory on the y-axis.

An alternative design is to do something similar to the NativeMemoryUsage event. That is, to split the data per type:

<Event name="ProcessMemoryUsage">
  <Field type="MemoryType" name="type" label="Type" />
  <Field type="ulong" contentType="bytes" name="amount" label="Amount"/>
</Event>

and then emit what is supported on a specific platform and have all those events with the same timestamp.
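
For illustration, a sketch of what this per-type shape looks like as a Java API event definition; the real event would be a native JVM event declared in metadata.xml, and the names here merely mirror the XML above (a String stands in for the MemoryType field type):

import jdk.jfr.Category;
import jdk.jfr.DataAmount;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Per-type design: one small event, emitted once per supported
// memory type, all samples of a period sharing a timestamp.
@Name("jdk.ProcessMemoryUsage")
@Label("Process Memory Usage")
@Category({"Java Virtual Machine", "Memory"})
class ProcessMemoryUsageEvent extends Event {
    @Label("Type") String type;              // e.g. "Resident Set Size"
    @Label("Amount") @DataAmount long amount;
}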

I'm not sure which of the following memory types make sense to include (there is also a maintenance aspect), or whether the list is correct or complete, but the data would be normalized, even if it differs depending on the platform.

Memory Type                 Linux    macOS   Windows Description
-------------------------- ------ -------- --------- ----------------------------------------------------------
Resident Set Size             Yes      Yes         -  Resident memory currently in physical RAM
Private Bytes                   -        -       Yes  Privately allocated memory
Shared Memory                 Yes      Yes       Yes  Memory shared among processes
Virtual Memory Size           Yes      Yes       Yes  Total reserved address space
Commit Charge                   -        -       Yes  Promised/guaranteed memory for process
Mapped File Memory            Yes      Yes       Yes  Memory mapped from files
Page Table Size               Yes        -         -  Memory used for page tables
Swap / Swapped                Yes      Yes       Yes  Memory swapped out to disk
Wired Memory                    -      Yes         -  Locked memory that can't be paged out
Compressed Memory               -      Yes         -  Memory compressed by the system
Paged Pool                      -        -       Yes  Pool for pageable kernel allocations
Non-paged Pool                  -        -       Yes  Pool for non-pageable kernel allocations
Working Set                     -        -       Yes  Memory pages currently in RAM for the process
Stack                         Yes      Yes       Yes  Memory used by call stacks
Heap / Malloc                 Yes      Yes       Yes  Dynamically allocated memory (heap)
Image                         Yes      Yes         -  Memory used for executables and loaded libraries

@bridgekeeper (bot) commented Sep 25, 2025

@tstuefe This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@tstuefe (Member, Author) commented Sep 25, 2025

/keepalive

@openjdk (bot) commented Sep 25, 2025

@tstuefe The pull request is being re-evaluated and the inactivity timeout has been reset.

@openjdk (bot) commented Sep 25, 2025

@tstuefe this pull request cannot be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8365306-Provide-OS-Process-Size-and-Libc-statistic-metrics-to-JFR
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk (bot) added the merge-conflict label (Pull request has merge conflict with target branch) on Sep 25, 2025