
CMP-4116: Fix platform scan pod stuck when RawResultStorage is disabled #1097

Open
Vincent056 wants to merge 3 commits into ComplianceAsCode:master from Vincent056:fix-platform-scan-raw-result-storage-disabled


@Vincent056

Summary

  • When RawResultStorage.Enabled=false, addResultsCollectionPods unconditionally added a TLS volume referencing the result-client-cert-{scanName} secret, which is only created when raw result storage is enabled. This caused the platform scan pod to get stuck in Init:0/2.
  • Reuse existing getLogCollectorVolumeMounts() and conditionally append the TLS volume only when RawResultStorage.Enabled=true, matching the existing behavior in getNodeScannerPodVolumes.
  • Add e2e test TestScheduledSuitePlatformNoStorage covering platform scans with disabled raw result storage.
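The conditional described in the summary can be sketched roughly as follows. This is a minimal illustration, not the operator's code: the `Volume` type is a simplified stand-in for the Kubernetes volume type, `buildPlatformScanVolumes` is a hypothetical helper, and the volume names are assumptions. Only the `result-client-cert-{scanName}` secret name comes from the description above.

```go
package main

import "fmt"

// Volume is a simplified stand-in for a Kubernetes pod volume
// (an assumption for illustration; the operator uses the real API types).
type Volume struct {
	Name       string
	SecretName string // empty for emptyDir-style volumes
}

// buildPlatformScanVolumes is a hypothetical helper. It always returns the
// base volumes and appends the TLS volume only when raw result storage is
// enabled, because the backing secret exists only in that case.
func buildPlatformScanVolumes(scanName string, rawResultStorageEnabled bool) []Volume {
	volumes := []Volume{
		{Name: "content-dir"}, // assumed base volume
		{Name: "report-dir"},  // assumed base volume
	}
	if rawResultStorageEnabled {
		volumes = append(volumes, Volume{
			Name:       "tls", // assumed volume name
			SecretName: "result-client-cert-" + scanName,
		})
	}
	return volumes
}

func main() {
	for _, enabled := range []bool{false, true} {
		vols := buildPlatformScanVolumes("ocp4-cis", enabled)
		fmt.Printf("enabled=%v volumes=%d\n", enabled, len(vols))
	}
}
```

With storage disabled, the pod spec never references the missing secret, so the init containers are not blocked waiting for a volume that cannot be mounted.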

Made with Cursor

The addResultsCollectionPods function unconditionally added the TLS
volume and mount referencing the result-client-cert secret, which is
only created when RawResultStorage.Enabled=true. This caused the
platform scan pod to get stuck in Init:0/2 when RawResultStorage was
disabled.

Reuse getLogCollectorVolumeMounts and conditionally append the TLS
volume, matching the existing behavior in getNodeScannerPodVolumes.

Made-with: Cursor
@openshift-ci

openshift-ci bot commented Feb 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Vincent056

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1097-c31c6c686380ba795e3914bd7142de390c521595

Collaborator

@rhmdnd rhmdnd left a comment

I have one recommendation: move the PVC check to after the scan completes. By then the scan must have passed through the aggregating phase successfully, so we know the conditional was actually exercised.

I'm concerned that if we check for PVCs too soon, the check will pass before the operator has had a chance to create anything. We take the same approach in TestScanSettingBindingNoStorage.

defer f.Client.Delete(context.TODO(), testSuite)

pvcList := &corev1.PersistentVolumeClaimList{}
err = f.Client.List(context.TODO(), pvcList, client.InNamespace(f.OperatorNamespace), client.MatchingLabels(map[string]string{

This check runs almost immediately after the scan is created, so the reconciliation loop might not have picked up the change yet, let alone had the opportunity to exercise the conditional.

Would it make sense to move this to after the scan completes?

	for _, pvc := range pvcList.Items {
		t.Fatalf("Found unexpected PVC %s", pvc.Name)
	}
	t.Fatal("Expected not to find PVC associated with the scan.")

If the list is not empty (conditional on line 2356), we will always exit on line 2358, right? I don't think it's possible to reach this code.

@rhmdnd changed the title from "Fix platform scan pod stuck when RawResultStorage is disabled" to "CMP-4116: Fix platform scan pod stuck when RawResultStorage is disabled" on Feb 27, 2026
@openshift-ci-robot
Collaborator

@Vincent056: This pull request references CMP-4116 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@xiaojiey
Collaborator

Verification pass. Really confused

$ oc-compliance bind -N test-cis profile/ocp4-cis profile/ocp4-cis-node
Creating ScanSettingBinding test-cis
$ oc get pod
NAME                                             READY   STATUS    RESTARTS      AGE
compliance-operator-6d56dc6c9f-4nqbs             1/1     Running   2 (45m ago)   45m
ocp4-openshift-compliance-pp-7474f47c7c-h2rpw    1/1     Running   0             44m
rhcos4-openshift-compliance-pp-6cf7d7c49-zfjlk   1/1     Running   0             44m
$ oc get scan
NAME                   PHASE     RESULT
ocp4-cis               RUNNING   NOT-AVAILABLE
ocp4-cis-node-master   RUNNING   NOT-AVAILABLE
ocp4-cis-node-worker   RUNNING   NOT-AVAILABLE
$ oc get pod -n openshift-compliance -l compliance.openshift.io/scan-name=ocp4-cis -o yaml | grep -A20 "volumes:"
    volumes:
    - emptyDir: {}
      name: content-dir
    - name: kube-api-access-n4v9s
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
--
    volumes:
    - emptyDir: {}
      name: tmp-dir
    - emptyDir: {}
      name: fetch-results
    - emptyDir: {}
      name: report-dir
    - emptyDir: {}
      name: content-dir
    - configMap:
        defaultMode: 493
        name: ocp4-cis-openscap-container-entrypoint
      name: ocp4-cis-openscap-container-entrypoint
    - name: kube-api-access-kd6xw
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
$ oc get scan
NAME                   PHASE   RESULT
ocp4-cis               DONE    NON-COMPLIANT
ocp4-cis-node-master   DONE    COMPLIANT
ocp4-cis-node-worker   DONE    COMPLIANT
$ oc get pv
No resources found
$ oc patch ss default --type='merge' -p '{"rawResultStorage":{"enabled":true}}'
scansetting.compliance.openshift.io/default patched
$ oc-compliance rerun-now scansettingbinding test-cis
Rerunning scans from 'test-cis': ocp4-cis, ocp4-cis-node-master, ocp4-cis-node-worker
Re-running scan 'openshift-compliance/ocp4-cis'
Re-running scan 'openshift-compliance/ocp4-cis-node-master'
Re-running scan 'openshift-compliance/ocp4-cis-node-worker'
$ oc get scan -w
NAME                   PHASE     RESULT
ocp4-cis               RUNNING   NOT-AVAILABLE
ocp4-cis-node-master   RUNNING   NOT-AVAILABLE
ocp4-cis-node-worker   RUNNING   NOT-AVAILABLE
ocp4-cis-node-worker   AGGREGATING   NOT-AVAILABLE
ocp4-cis               AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-master   AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-worker   AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-worker   DONE          COMPLIANT
ocp4-cis               AGGREGATING   NOT-AVAILABLE
ocp4-cis               DONE          NON-COMPLIANT
ocp4-cis-node-master   AGGREGATING   NOT-AVAILABLE
ocp4-cis-node-master   DONE          COMPLIANT
$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                       STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
pvc-38052cd7-5490-45e2-b08b-dbc6dfeaf5c1   1Gi        RWO            Delete           Bound    openshift-compliance/ocp4-cis-node-worker   gp3-csi        <unset>                          2m24s
pvc-4e5a2fa7-e77d-4875-b344-1becbacd1fea   1Gi        RWO            Delete           Bound    openshift-compliance/ocp4-cis-node-master   gp3-csi        <unset>                          2m24s
pvc-ec19d635-0ca2-484a-9665-5e41d7cd3a1b   1Gi        RWO            Delete           Bound    openshift-compliance/ocp4-cis               gp3-csi        <unset>   

@github-actions

🤖 To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:1097-58bc304ba7e0a848d24711cc75d94e460ba6a07b

@openshift-ci

openshift-ci bot commented Mar 20, 2026

@Vincent056: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-rosa | 58bc304 | link | true | /test e2e-rosa |
| ci/prow/e2e-aws-parallel-arm | 58bc304 | link | true | /test e2e-aws-parallel-arm |
| ci/prow/e2e-aws-parallel | 58bc304 | link | true | /test e2e-aws-parallel |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
