NO-JIRA: Improve case execution time #30329
base: main
Conversation
@gangwgr: This pull request explicitly references no Jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Please update the PR description with a nice, crisp outline of the changes that were made and what the result was. This helps the reviewers quickly get context on what was done, why, and what the outcome was.
"openshift-console-operator", | ||
"openshift-console", | ||
"openshift-operator-lifecycle-manager", | ||
"hypershift", |
Is it possible to use KnownNamespaces from pkg/monitortestlibrary/platformidentification/operator_mapping.go?
Unaddressed comment here.
Re-using the one central list of known namespaces would increase our chances of picking up new namespaces as they are added. You may need to manually add hypershift into your copy of the main list?
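A minimal sketch of that reuse, assuming KnownNamespaces is exported from the platformidentification package as a string set with a List() accessor (the exact type and accessor are assumptions, not verified against the repo):

```go
import (
	"github.com/openshift/origin/pkg/monitortestlibrary/platformidentification"
)

// Hypothetical sketch: derive the namespace list from the shared mapping instead
// of keeping a local copy in this test file.
namespacesToCheck := platformidentification.KnownNamespaces.List()
// hypershift is not in the shared list, so append it manually per the review comment.
namespacesToCheck = append(namespacesToCheck, "hypershift")
```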
"apiserver-watcher", | ||
} | ||
e2e.Logf("Inspect the %s Pod's securityContext.", namespace) | ||
pod, err := oc.AdminKubeClient().CoreV1().Pods(namespace).Get(context.Background(), podName, metav1.GetOptions{}) |
You're doing one API get per pod here, AssertPodToBeReady is also doing one get per pod plus looping for it to be ready, and earlier you did a GetPodsListByLabel which listed all pods but threw away all the data and just returned the names.
Can you do the following (see the sketch after this list):
- list all pods in the namespace once (retain all the pod info, not just the name)
- get rid of AssertPodToBeReady (is it still of any use if we can get rid of the container writes?)
- get rid of this Pods Get
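A rough sketch of that shape, assuming the check only needs data already returned by a single List call per namespace (variable names follow the diff; nothing here is the PR's actual code):

```go
// One List per namespace; no per-pod Get and no readiness polling.
pods, err := oc.AdminKubeClient().CoreV1().Pods(namespace).List(context.Background(), metav1.ListOptions{})
if err != nil {
	e2e.Failf("failed to list pods in namespace %s: %v", namespace, err)
}

for _, pod := range pods.Items {
	for i, container := range pod.Spec.Containers {
		sc := container.SecurityContext
		if sc == nil || sc.ReadOnlyRootFilesystem == nil || !*sc.ReadOnlyRootFilesystem {
			// Config-only check: fail if the flag is not set to true.
			e2e.Failf("container %d (%s) in pod %s/%s does not set readOnlyRootFilesystem=true",
				i, container.Name, namespace, pod.Name)
		}
	}
}
```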
	i, podName, namespace)
} else {
	e2e.Logf("Container %d in pod %s (namespace %s) has readOnlyRootFilesystem=false",
		i, podName, namespace)
Is this supposed to be e2e.Fail?
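If the intent is for a false value to fail the test rather than just be logged, the branch would presumably call the framework's fail helper; a sketch of that change, not the author's confirmed intent:

```go
} else {
	// Fail the case instead of only logging when the flag is disabled.
	e2e.Failf("Container %d in pod %s (namespace %s) has readOnlyRootFilesystem=false",
		i, podName, namespace)
}
```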
deployment, err := oc.AdminKubeClient().AppsV1().Deployments(testNs2).Get(context.Background(), "my-app", metav1.GetOptions{})
// Use comprehensive write operations testing
isMultiContainer := len(pod.Spec.Containers) > 1
err := testContainerWriteOperations(oc, podName, container.Name, namespace, testPaths, isMultiContainer) |
Can you outline why this is necessary and helpful? How much of the time is spent in this portion?
Per Petr's comments on Slack, it seems wasteful to essentially be testing a feature repeatedly on every pod to be sure it worked. I would expect the readonly root fs feature itself to have tests that are sufficient for us to assume it's working, and this test should just be ensuring we're using it where we should.
I'd like to see the timings each phase takes, but I think one single junit test for all namespaces, without the container writes, with much more efficient API use seems like a good approach.
02e58f7 to a689444
Job Failure Risk Analysis for sha: a689444
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: a689444
New tests seen in this PR at sha: a689444
Job Failure Risk Analysis for sha: ad5f58d
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: ad5f58d
/cc
Update all comments
for containerIndex, container := range pod.Spec.Containers {
	if container.SecurityContext != nil && container.SecurityContext.ReadOnlyRootFilesystem != nil {
		isReadOnly := *container.SecurityContext.ReadOnlyRootFilesystem
e2e.Logf("Performing detailed write operations test on container %d (%s) (readOnlyRootFilesystem=%t)", |
Container writes are still here, can you clarify why we would test a feature this deeply on every pod and container? Again, we would assume that we want to check and make sure we're enabling the setting, but checking the setting actually works in kube itself, on every single pod, seems very wasteful.
In your testing, did you find the feature itself was broken?
Here is a view of your tests; OLM itself takes ten minutes.
But unless there's a convincing argument I'm not aware of, at this level I think the test should just ensure we've defined our containers correctly, we do not need to test actually writing to every container in every pod.
I feel it is needed; it is better to check that it works end to end instead of only checking the config.
@dusk125 Please suggest.
Yeah, I agree with Devan; we can assume that the feature will work as intended and that it's sufficient to check whether or not the pod configuration has the field enabled.
for _, pod := range filteredPods {
	// Wait for pod to be ready before testing
	e2e.Logf("Waiting for pod %s in namespace %s to be ready", pod.Name, namespace)
err := exutil.AssertPodToBeReady(oc, pod.Name, namespace) |
This should be removed; I assume it was a pre-check before trying to write to every container, but if that goes away per the comment below, the ready check can also go away.
If we do decide to keep writing to every container (unlikely), I would just skip any container that is not ready, using the data from your initial pod list above. No need for additional API requests and polling.
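If the write path were kept, readiness could come from the pod objects already returned by the earlier List rather than from AssertPodToBeReady's extra get-and-poll; a sketch under that assumption (the write-check call itself is elided):

```go
// Build a readiness map from the status we already have; no extra API calls.
readyByName := map[string]bool{}
for _, cs := range pod.Status.ContainerStatuses {
	readyByName[cs.Name] = cs.Ready
}
for _, container := range pod.Spec.Containers {
	if !readyByName[container.Name] {
		e2e.Logf("skipping container %s in pod %s/%s: not ready", container.Name, namespace, pod.Name)
		continue
	}
	// ... run the (optional) write check against ready containers only
}
```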
@dgoodwin This is necessary; sometimes pods are not ready, and writing to them will fail the case. We should check end to end, not only the config: we should also check that the config actually works, otherwise we risk a regression.
I checked now; the case takes at most 9 minutes.
Have you ever seen a container with the setting on but not working?
This would be a test for the actual feature in kube/kernel, not something we need to spend 30 minutes testing in every CI run of openshift conformance tests, that is extremely expensive to push into the wild. If you believe the actual underlying readonlyfs feature is unreliable, you're going to have to break this portion of the testing out, into a separate job and run it once or twice a week. It doesn't make sense to hammer a feature we should be able to assume is working, so hard and so often at this level, with such a time cost involved.
Earlier we did see it fail; after the bug fixes we didn't, so I have updated the test to check only the config.
Looking good, just curious if we can use the more central list of namespaces rather than having two lists.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dusk125, gangwgr
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@gangwgr: The following tests failed; say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Job Failure Risk Analysis for sha: fdf03da
Risk analysis has seen new tests most likely introduced by this PR. New tests seen in this PR at sha: fdf03da
Refactored the single large readOnlyRootFilesystem test case into 18 individual test cases, each focusing on a specific namespace. This change improves test execution time, provides better isolation, and makes debugging easier.
Before: a single large test case with a 50-minute timeout that covered all namespaces.
After: 18 individual test cases with a 10-minute timeout each, generated with a loop-based approach (sketched below).
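A sketch of what the loop-based split might look like with Ginkgo, generating one case per namespace; the helper name and namespace slice are illustrative, not the PR's actual code:

```go
// One It per namespace, so each namespace gets its own junit result and its own
// shorter, independent timeout instead of one 50-minute monolithic case.
for _, ns := range namespacesToCheck {
	ns := ns // capture the loop variable for the closure
	g.It(fmt.Sprintf("pods in namespace %s should enable readOnlyRootFilesystem", ns), func() {
		verifyReadOnlyRootFilesystem(oc, ns) // hypothetical helper doing the config-only check
	})
}
```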