
Add cgroups CPU quota and throttling metrics #1039


Open
wants to merge 1 commit into
base: master
from



mkapalka commented May 9, 2025

Add metrics related to CPU quotas and CPU throttling (Linux CFS bandwidth control), as well as the total CPU usage from Linux cgroups CPU accounting. These metrics can be useful in multi-tenant cloud environments, in particular on Elastic Cloud nodes that use CPU boosting (vCPU credits).


Signed-off-by: Michal Kapalka <[email protected]>
mkapalka force-pushed the feature/add-cgroups-cpu-stats branch from ad384a0 to 56195f7 on May 14, 2025 09:04
Author

mkapalka commented Jun 2, 2025

@SuperQ @sysadmind it would be great if you could have a look at this PR and tell me if there's anything missing that I should add. Thanks in advance!

Author

mkapalka commented Aug 8, 2025

Update: we have been using this branch successfully in production for quite some time now, and the new metrics are very helpful. Maybe it's worth merging this PR to make it easier for others to benefit from it as well? @SuperQ @sysadmind

Contributor

@sysadmind sysadmind left a comment


I'm not sure why this PR hasn't triggered the CI process. You may need to rebase; otherwise, updating this PR might trigger the CI run.

@@ -286,6 +286,66 @@ func NewNodes(logger *slog.Logger, client *http.Client, url *url.URL, all bool,
},
Labels: defaultNodeLabelValues,
},
{
Contributor


I think these metrics should be converted to seconds. It's typical practice for Prometheus metrics to always use base units: https://prometheus.io/docs/practices/naming/#metric-names
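To illustrate the suggestion, here is a minimal sketch of the kind of conversion involved. The PR's diff is not shown in full here, so this assumes the raw cgroup values arrive in nanoseconds and microseconds; the helper names `nanosToSeconds` and `microsToSeconds` and the sample values are hypothetical and not taken from the exporter's code.

```go
package main

import "fmt"

// nanosToSeconds converts a value reported in nanoseconds to seconds,
// the Prometheus base unit for time. (Hypothetical helper name.)
func nanosToSeconds(nanos float64) float64 {
	return nanos / 1e9
}

// microsToSeconds converts a value reported in microseconds to seconds.
// (Hypothetical helper name.)
func microsToSeconds(micros float64) float64 {
	return micros / 1e6
}

func main() {
	// Illustrative sample values only, not read from a real node.
	throttledNanos := 2_500_000_000.0 // assumed: throttled time in nanoseconds
	periodMicros := 100_000.0         // assumed: CFS period in microseconds

	fmt.Printf("throttled time: %g s\n", nanosToSeconds(throttledNanos))
	fmt.Printf("CFS period: %g s\n", microsToSeconds(periodMicros))
}
```

If the conversion is applied where each metric's value is computed, the metric names would presumably also carry a `_seconds` suffix, in line with the linked naming guidelines.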
