✨ Metrics Summary #2134

Open: dtfranz wants to merge 1 commit into main

@dtfranz (Contributor) commented Aug 4, 2025

Description

Adds a utility to the e2e suite that queries Prometheus for alerts and metrics data at the end of the test run. The data is then rendered as markdown and shown to the contributor when the run finishes.
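For readers unfamiliar with the Prometheus side, here is a minimal sketch of what a range query involves. It is not the code from this PR: `buildRangeURL`, `parseRangeValues`, and the `prometheus.example` address are hypothetical names for illustration. The sketch assumes only the documented `/api/v1/query_range` endpoint and its `matrix` response shape, and it leaves out how the suite actually reaches Prometheus.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// rangeResponse mirrors the documented shape of a Prometheus
// /api/v1/query_range response whose resultType is "matrix".
type rangeResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			// Each entry is [unix timestamp, "sample value as a string"].
			Values [][]any `json:"values"`
		} `json:"result"`
	} `json:"data"`
}

// buildRangeURL assembles a query_range URL. baseURL is a hypothetical
// Prometheus address supplied by the caller.
func buildRangeURL(baseURL, query string, startUnix, endUnix int64, stepSeconds int) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("start", strconv.FormatInt(startUnix, 10))
	params.Set("end", strconv.FormatInt(endUnix, 10))
	params.Set("step", strconv.Itoa(stepSeconds))
	return baseURL + "/api/v1/query_range?" + params.Encode()
}

// parseRangeValues returns the sample values of the first series in body.
func parseRangeValues(body []byte) ([]float64, error) {
	var parsed rangeResponse
	if err := json.Unmarshal(body, &parsed); err != nil {
		return nil, err
	}
	if parsed.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", parsed.Status)
	}
	if len(parsed.Data.Result) == 0 {
		return nil, nil
	}
	var out []float64
	for _, pair := range parsed.Data.Result[0].Values {
		if len(pair) != 2 {
			return nil, fmt.Errorf("unexpected sample %v", pair)
		}
		s, ok := pair[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", pair[1])
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	fmt.Println(buildRangeURL("http://prometheus.example:9090", "up", 60, 120, 15))

	// Hand-written response body in the documented format, for illustration only.
	body := []byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{},"values":[[60,"46477312"],[75,"57483264"]]}]}}`)
	vals, err := parseRangeValues(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vals)
}
```

Prometheus returns sample values as strings, so the sketch parses them explicitly instead of decoding straight into floats.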

Extra: Tuned the Prometheus alerts to be less sensitive to memory growth. The tests naturally add to the memory footprint at the start of the e2e run, so the thresholds need to allow for that. Also pinned explicit tags on a couple of images that were implicitly using 'latest', so nodes won't have to pull them on every test run.

The principal idea here is that if we are to fail a test run based on performance results, then those results need to be easily visible to the contributor to avoid a potentially frustrating development experience.

The markdown will be visible here when the e2e completes.

Example:

---
config:
  xyChart:
    showDataLabel: true
    xAxis:
      showLabel: false
---
xychart-beta
title "Memory Usage"
y-axis "MB"  44.153446 --> 75.057562
x-axis "time (start of test to end)"
line [46.477312,57.483264,64.397312,65.495040,66.863104,66.928640,67.592192,67.600384,67.600384,67.272704,63.877120,63.926272,64.409600,65.273856,66.310144,71.483392,67.194880,66.080768,66.826240,69.464064]
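A rough sketch of how a chart like the example could be generated from a slice of samples. This is not the PR's implementation: `renderXYChart` is a hypothetical name, and the 5% axis padding is an assumption inferred from the example, whose y-axis bounds happen to equal the smallest sample times 0.95 and the largest times 1.05.

```go
package main

import (
	"fmt"
	"strings"
)

// renderXYChart formats samples as a Mermaid xychart-beta line chart.
// The 5% padding on the y-axis is an assumption taken from the example
// chart, not from the PR's code.
func renderXYChart(title, unit string, samples []float64) string {
	if len(samples) == 0 {
		return ""
	}
	lo, hi := samples[0], samples[0]
	points := make([]string, len(samples))
	for i, s := range samples {
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
		points[i] = fmt.Sprintf("%f", s)
	}

	var b strings.Builder
	b.WriteString("xychart-beta\n")
	fmt.Fprintf(&b, "title %q\n", title)
	fmt.Fprintf(&b, "y-axis %q %f --> %f\n", unit, lo*0.95, hi*1.05)
	b.WriteString("x-axis \"time (start of test to end)\"\n")
	fmt.Fprintf(&b, "line [%s]\n", strings.Join(points, ","))
	return b.String()
}

func main() {
	// The first and the largest sample from the example chart, in MB.
	fmt.Print(renderXYChart("Memory Usage", "MB", []float64{46.477312, 71.483392}))
}
```

Padding the axis keeps the line away from the chart edges; any similar margin would do.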

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@dtfranz dtfranz requested a review from a team as a code owner August 4, 2025 10:42
@openshift-ci openshift-ci bot requested review from OchiengEd and tmshort August 4, 2025 10:42
openshift-ci bot commented Aug 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelanford for approval. For more information see the Code Review Process.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify bot commented Aug 4, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit a28cab6
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/6891777d50c3380008fc3ffa
😎 Deploy Preview https://deploy-preview-2134--olmv1.netlify.app

@@ -22,13 +22,13 @@ spec:
 annotations:
 description: "container {{ $labels.container }} of pod {{ $labels.pod }} experienced OOM event(s); count={{ $value }}"
 - alert: operator-controller-memory-growth
-expr: deriv(sum(container_memory_working_set_bytes{pod=~"operator-controller.*",container="manager"})[5m:]) > 50_000
+expr: deriv(sum(container_memory_working_set_bytes{pod=~"operator-controller.*",container="manager"})[5m:]) > 100_000
Contributor Author

These values were too sensitive.

Contributor

I might just note this in the commit message or put in another commit

Contributor Author

Good call out 👍 I'll add a note to the commit message.

Contributor Author

I've added the following to the commit and the PR description:

Extra: Tuned the Prometheus alerts to be less sensitive to memory growth. The tests naturally add to the memory footprint at the start of the e2e run, so the thresholds need to allow for that. Also pinned explicit tags on a couple of images that were implicitly using 'latest', so nodes won't have to pull them on every test run.
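For context on the threshold in this thread: Prometheus documents `deriv()` as the per-second derivative of a series, computed with simple linear regression, so the alert compares a growth rate in bytes per second against the limit. 100_000 bytes/s is roughly 6 MB per minute. The sketch below shows that arithmetic under stated assumptions: `slopePerSecond` is a hypothetical helper, the samples are made up, and any `for:` duration on the alert rule is ignored.

```go
package main

import "fmt"

// sample is one observation: t in seconds, v in bytes.
type sample struct {
	t float64
	v float64
}

// slopePerSecond returns the least-squares slope of v over t. This is the
// same kind of simple linear regression that Prometheus documents for
// deriv(); it is an illustration, not Prometheus's own code.
func slopePerSecond(samples []sample) float64 {
	n := float64(len(samples))
	if n < 2 {
		return 0
	}
	var sumT, sumV, sumTV, sumTT float64
	for _, s := range samples {
		sumT += s.t
		sumV += s.v
		sumTV += s.t * s.v
		sumTT += s.t * s.t
	}
	denom := n*sumTT - sumT*sumT
	if denom == 0 {
		return 0
	}
	return (n*sumTV - sumT*sumV) / denom
}

// linearGrowth builds hypothetical samples one minute apart over five
// minutes, starting at 50 MB and growing by bytesPerSecond.
func linearGrowth(bytesPerSecond float64) []sample {
	var out []sample
	for t := 0.0; t <= 300; t += 60 {
		out = append(out, sample{t: t, v: 50e6 + bytesPerSecond*t})
	}
	return out
}

func main() {
	const oldLimit, newLimit = 50_000, 100_000 // bytes per second, from the diff
	for _, rate := range []float64{80_000, 120_000} {
		slope := slopePerSecond(linearGrowth(rate))
		fmt.Printf("slope=%.0f B/s old=%t new=%t\n", slope, slope > oldLimit, slope > newLimit)
	}
}
```

With these made-up inputs, steady growth of 80,000 bytes/s would exceed the old limit but stay under the new one.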

@@ -129,15 +129,15 @@ func (c *MetricsTestConfig) getServiceAccountToken(t *testing.T) string {
 func (c *MetricsTestConfig) createCurlMetricsPod(t *testing.T) {
 t.Logf("Creating curl pod (%s/%s) to validate the metrics endpoint", c.namespace, c.curlPodName)
 cmd := exec.Command(c.client, "run", c.curlPodName,
-"--image=curlimages/curl",
+"--image=curlimages/curl:8.15.0",
Contributor Author

So we don't force kind to pull this image every time we create the pod.

@@ -58,7 +58,7 @@ spec:
 terminationGracePeriodSeconds: 0
 containers:
 - name: busybox
-image: busybox
+image: busybox:1.36
Contributor Author

Same logic as with the curl pod.
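Background on why pinning helps, as I understand the documented Kubernetes behavior: when a container spec omits `imagePullPolicy`, the API server defaults it to `Always` if the image has no tag or uses `:latest`, and to `IfNotPresent` if it has any other tag or a digest. The sketch below restates that rule. `defaultPullPolicy` is a hypothetical helper for illustration, not a Kubernetes API, and it assumes neither manifest sets `imagePullPolicy` explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy restates the documented defaulting rule for a container
// that does not set imagePullPolicy. It is a simplified illustration and
// does not validate the image reference.
func defaultPullPolicy(image string) string {
	// A digest reference (name@sha256:...) defaults to IfNotPresent.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	// Only look for a tag after the last '/', so that a registry port
	// such as localhost:5000 is not mistaken for a tag.
	name := image
	if i := strings.LastIndex(image, "/"); i >= 0 {
		name = image[i+1:]
	}
	tag := ""
	if i := strings.LastIndex(name, ":"); i >= 0 {
		tag = name[i+1:]
	}
	if tag == "" || tag == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, image := range []string{
		"curlimages/curl",
		"curlimages/curl:8.15.0",
		"busybox",
		"busybox:1.36",
	} {
		fmt.Printf("%s -> %s\n", image, defaultPullPolicy(image))
	}
}
```

Under that rule, the untagged images would contact the registry on every pod start, while the pinned ones can reuse whatever is already cached on the node.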

@trgeiger (Contributor) commented Aug 4, 2025

This is very cool

@jianzhangbjz
That's cool!
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2025
codecov bot commented Aug 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.83%. Comparing base (e0b5c18) to head (a28cab6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2134   +/-   ##
=======================================
  Coverage   72.83%   72.83%           
=======================================
  Files          79       79           
  Lines        7340     7340           
=======================================
  Hits         5346     5346           
  Misses       1645     1645           
  Partials      349      349           
Flag Coverage Δ
e2e 43.50% <ø> (ø)
experimental-e2e 56.12% <ø> (-0.10%) ⬇️
unit 58.22% <ø> (ø)


Adds a utility to the e2e suite that queries Prometheus for alerts and metrics data at the end of the test run. The data is then rendered as markdown and shown to the contributor when the run finishes.

Extra: Tuned the Prometheus alerts to be less sensitive to memory growth. The tests naturally add to the memory footprint at the start of the e2e run, so the thresholds need to allow for that. Also pinned explicit tags on a couple of images that were implicitly using 'latest', so nodes won't have to pull them on every test run.

Signed-off-by: Daniel Franz <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2025
@jianzhangbjz
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 5, 2025
Labels
lgtm Indicates that a PR is ready to be merged.
3 participants