-
Notifications
You must be signed in to change notification settings - Fork 6.3k
8365306: Provide OS Process Size and Libc statistic metrics to JFR #26756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8365306: Provide OS Process Size and Libc statistic metrics to JFR #26756
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
4828a1b
to
b24c784
Compare
b24c784
to
39e282a
Compare
label /hotspot-jfr |
/label hotspot-jfr |
@stefank |
What is the problem with adding RSS metrics to the existing ResidentSetSize event? |
You mean adding the vsize, swap etc fields to ResidentSetSize? I thought about that, but then it would be weirdly misnamed. RSS has a very specific meaning. So we would have ResidentSetSize.vsize, ResidentSetSize.swap, ResidentSetSize.rss (?) I also thought about splitting them up and add one event per value, following the "ResidentSetSize" pattern. So, one event for "VirtualSize", one for "Swap" etc. Apart from not liking the fine granularity, having these fields grouped in one event has multiple advantages. Mainly, I can build myself graphs in JMC for all these fields in one graph and correlate all the values. It is also cheaper to obtain (just one parsing operation for /proc/meminfo, for instance). |
I was thinking something like this:
When it comes to non-rss metrics, there is a Swap event, but not sure it is appropriate? Regarding other metrics, perhaps they should go into other events, or perhaps new events should be created. I haven't had time to look into it.
Periodic events can be emitted with the same timestamp. That way they can be grouped together. This is what we do with NativeMemoryUsage and NativeMemoryTotal. These events will likely be around for a long time. We shouldn't design them just to match the workflow of one tool as it currently works. New functionality can be added in the future.
We should not sacrifice the design unless the overhead is significant. |
Hmm, yes, that would be one alternative.
I spent a lot of the time allotted to this PR deliberating on how exactly to shape the events. I didn't want just to jam them in. And I agree. I dislike it how tight the coupling is between the data model and the renderer in the case of JMC. It leads to many UI inconsistencies and gives the wrong motivation. Unfortunately, we live in a world where JMC is the only serious and freely available tool, and where JFR protocol is not open, so I fear this is likely to stay that way. In the case of this PR, an argument can be made for grouping OS-side process-related metrics together into one event. Doing it the way I did is not absurd; this is how these metrics are often presented: vsize+rss+swap, complementing each other and giving a holistic view of the OS side of process memory consumption—especially rss+swap. You need swap to be able to explain rss. I also agonized about the platform-specific aspects of this. This is rather Linux-centric. But again, reality: Linux is by far the most important platform. On Windows, for example, we can ask for the size of the commit charge. That is a nice complementary information. On Linux, we also have a commit charge (as in, how much of the process memory was committed by the OS, underlaid with swap according to the kernel overcommit rules). That would be useful to know, since it can explain native OOMs when the OS runs out of swap. But I don't see a way to obtain that information on Linux. Since I only have the number on Windows, I left it out. And I realize this is a bit arbitrary, since I included values I only get for Linux. Shaping events the right way is hard, and I am thankful for any direction from your side.
Hm, sure. |
@tstuefe
@egahlin , is there a procedure for deprecating event types? Or are events permanent once introduced? If you decide to go the above route (leaving all the metrics in
I think this new swap data is process-specific so is not already covered by Another option, in order to avoid duplication, could be to leave the RSS metrics in the Either way, I agree that putting all the metrics in |
Thank you, @roberttoyonaga ! Let's see what @egahlin writes. |
Events can be viewed as tables in a relational database. Duplicating information or designing tables based on the user interface is not a good idea. Correlating or grouping data from different events is not a new problem. The way we have solved it in the past is by using an explicit ID, similar to a foreign key (gcId, compileId, safepointId etc), or by using the same timestamp (ActiveSetting, NativeMemory* etc.). In this case, I think a timestamp makes more sense. Tools can place the value on a timeline and have memory on the y-axis. An alternative design is to do something similar to the NativeMemoryUsage event. That is, to split the data per type
and then emit what is supported on a specific platform and have all those event with the same timestamp. I'm not sure which of the following memory types make sense to include, there is also a maintenance aspect, or if it the list is correct or complete, but the data would be normalized, even if it differs depending on the platform.
|
@tstuefe This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a |
/keepalive |
@tstuefe The pull request is being re-evaluated and the inactivity timeout has been reset. |
@tstuefe this pull request can not be integrated into git checkout JDK-8365306-Provide-OS-Process-Size-and-Libc-statistic-metrics-to-JFR
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
This provides the following new metrics:
ProcessSize
event (new, periodic)LibcStatistics
(new, periodic)-XX:TrimNativeHeapInterval
)NativeHeapTrim
(new, event-driven) (for both manual and automatic trims)JavaThreadStatistic
Notes:
we already have
ResidentSetSize
event, and the newProcessSize
event is a superset of that. I don't know how these cases are handled. I'd prefer to throw the old event out, but JMC has a hard-coded chart for RSS, so I kept it in unless someone tells me to remove it.Obviously, the libc events are very platform-specific. Still, I argue that these metrics are highly useful. We want people to use JFR and JMC; people include developers that are dealing with performance problems that require platform-specific knowledge to understand. See my comment in the JBS issue.
I provided implementations, as far as possible, to Linux, MacOS and Windows.
Testing:
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26756/head:pull/26756
$ git checkout pull/26756
Update a local copy of the PR:
$ git checkout pull/26756
$ git pull https://git.openjdk.org/jdk.git pull/26756/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26756
View PR using the GUI difftool:
$ git pr show -t 26756
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26756.diff
Using Webrev
Link to Webrev Comment