
@ghost ghost commented Sep 4, 2025

This deploys a ScanSetting and ScanSettingBinding for the profile defined in values.yaml, in a sync-wave that runs before other layer-0, 1, or 2 apps.
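
For readers unfamiliar with the two resources, here is a minimal sketch of what such a pair can look like. The resource names, schedule, and storage values are illustrative assumptions, not the values from this PR (which templates them from values.yaml); `ocp4-cis` stands in for whichever profile is configured.

```yaml
# Sketch only: names, schedule, and sizes are assumptions, not this PR's values.
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: auto-remediate            # hypothetical name
  namespace: openshift-compliance
autoApplyRemediations: true       # ask the operator to apply remediations itself
autoUpdateRemediations: true
schedule: "0 1 * * *"             # assumed daily schedule
roles:
  - worker
  - master
rawResultStorage:
  size: 1Gi
  rotation: 3
---
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: cis-compliance            # hypothetical name
  namespace: openshift-compliance
profiles:
  - name: ocp4-cis                # example profile, assumed
    kind: Profile
    apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: auto-remediate            # must match the ScanSetting above
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
```

Note that on both kinds the settings sit at the top level of the object rather than under a `spec:` key.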

---
# ClusterRole with permissions to manage compliance resources
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
Contributor

Does this need to be a ClusterRole? If all the objects you're touching are in openshift-compliance, it might not need to be.

Author

It may not need to be. That I set it that way shows my unfamiliarity with the remediation controller within the Compliance Operator. I scoped every element to the openshift-compliance namespace, but I don't know whether namespace-scoped authorization is sufficient for something that can reboot nodes during remediation.
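
A namespace-scoped alternative might look like the sketch below. It assumes the job's service account only needs to read and patch ComplianceRemediation objects in openshift-compliance; any node-level work would then be done by the operator under its own permissions, not the job's. The Role, RoleBinding, and ServiceAccount names are hypothetical, and the rule list is an assumption, not the set of permissions from this PR.

```yaml
# Sketch only: all names are hypothetical and the rules are an assumed minimum.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: compliance-remediation-patcher     # hypothetical
  namespace: openshift-compliance
rules:
  - apiGroups: ["compliance.openshift.io"]
    resources: ["complianceremediations"]
    verbs: ["get", "list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: compliance-remediation-patcher     # hypothetical
  namespace: openshift-compliance
subjects:
  - kind: ServiceAccount
    name: compliance-remediation-sa        # hypothetical
    namespace: openshift-compliance
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: compliance-remediation-patcher
```

If the job turns out to need anything cluster-scoped, that would show up as a Forbidden error from the API server, which is a reasonable signal for when to widen the grant.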

  name: {{ .Values.compliance.scanSettingBinding.name }}
  namespace: openshift-compliance
  annotations:
    argocd.argoproj.io/sync-wave: '-10'
Contributor

We generally only use sync-waves when not using them leads to errors or other problems with deployments. I am very unfamiliar with the Compliance Operator, so they may indeed be necessary. But if they aren't, it's better not to have them. (As discussed, we definitely need them in clustergroup, though.)

Author

This was my attempt to ensure that the scan resources are in place before the results viewer and remediation updaters become available.

Author

ghost commented Sep 8, 2025

Compliance scans currently run and remediations are identified, but three problems remain.
Problem #1: setting automated remediation doesn't actually perform a remediation. All the ComplianceRemediation CRs are marked with spec.apply set to false.
Problem #2: when I create a remediation job that looks for those CRs and patches them to spec.apply true, the script fails to exec any kubectl/oc commands (I am using the VP team's imperative-container) because it cannot find a KUBECONFIG to use to connect to the cluster.
Problem #3: when I set Problem #2 aside and test the bash script independently, it does run (because I am logged into the cluster) and changes spec.apply to true, but the remediation status doesn't change from Pending to Applied. The remediation controller never does anything after the patch to spec.apply.
That third problem may be due to me executing it from a local oc login. However, I am logged in as a cluster-admin.
My knowledge of the Compliance Operator scanning features comes only from reading the docs. Per the docs, setting compliance.openshift.io/default-auto-apply: "true" should have been enough to run the remediations without the extra effort of creating a remediation job that runs a shell script inside a container image to oc patch the system. That does not appear to be the case.
Another set of eyes (and mind) would be very much appreciated.
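
To make the patch in Problems #2 and #3 concrete, the change amounts to a merge patch that flips a single field. A minimal sketch of that patch as a YAML file follows; the file name is hypothetical and this is not the script from the PR.

```yaml
# patch-apply.yaml (hypothetical file name)
# Merge patch that marks one ComplianceRemediation for application.
spec:
  apply: true
```

Assuming a remediation named `<name>`, this could be applied with `oc patch complianceremediation <name> -n openshift-compliance --type merge --patch-file patch-apply.yaml`.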

Author

ghost commented Sep 8, 2025

[Screenshot from 2025-09-08 09-09-40]
This shows what things look like with the current PR deployed. Scans have run and remediations are identified. My remediation-job will eventually time out because the bash script fails to get a proper environment with a KUBECONFIG for executing the kubectl commands in the script.

Phil Osip added 2 commits September 8, 2025 09:19
…cussion with VP team and no need to do that in other examples
Author

ghost commented Sep 8, 2025

Updated the PR to remove the KUBECONFIG env. The imperative-container now executes the bash script in the remediation job and can use oc/kubectl commands.
I am still not seeing the remediation controller act on the change to spec.apply true, so while Problem #2 from above is addressed, Problem #3 still exists.
The script runs and the patch is applied to the ComplianceRemediation CRs. Then nothing...
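
For reference, a minimal sketch of a job in this shape, with no KUBECONFIG env set so that oc falls back to the pod's service account credentials. The Job name, service account name, and image reference are placeholders I made up for illustration; the actual PR uses the VP team's imperative-container and its own script.

```yaml
# Sketch only: names and image are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: apply-remediations                   # hypothetical
  namespace: openshift-compliance
spec:
  backoffLimit: 2
  template:
    spec:
      serviceAccountName: compliance-remediation-sa   # hypothetical
      restartPolicy: Never
      containers:
        - name: patch-remediations
          image: example.invalid/imperative-container:latest   # placeholder image
          # No KUBECONFIG env here: inside a pod, oc uses the mounted
          # service account token to reach the API server.
          command:
            - /bin/bash
            - -c
            - |
              set -euo pipefail
              # List every ComplianceRemediation and mark it for application.
              for r in $(oc get complianceremediations -n openshift-compliance -o name); do
                oc patch "$r" -n openshift-compliance --type merge -p '{"spec":{"apply":true}}'
              done
```

The service account needs get, list, and patch on complianceremediations in the namespace for this loop to succeed.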

Author

ghost commented Sep 8, 2025

[Screenshot from 2025-09-08 09-45-50]
All of the ComplianceRemediation CRs have been updated by the remediation job to have spec.apply set to true. However, every status is still Pending after the patch.
[Screenshot from 2025-09-08 09-47-09]

…ussion with compliance operator team. It should get generated at runtime
Author

ghost commented Sep 8, 2025

[Screenshot from 2025-09-08 12-31-40]
It seems I uncovered an edge case where automated remediation fails silently on ROSA clusters. Things work as expected on a non-ROSA cluster. I spun up an AWS 4.18.22 cluster with clusterbot, and remediations ran automatically with the ScanSetting and ScanSettingBinding as defined (I did remove the annotation on the ScanSettingBinding). Many ComplianceRemediations ran, and all show either Applied or Missing Dependencies. Overall, the ComplianceScans still show RESULT as NON-COMPLIANT. Is there a way to automatically trigger another scan after remediations?
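
On the rescan question, one option that may fit is the rescan annotation the Compliance Operator documents for ComplianceScan objects. The fragment below is a sketch of how the metadata of a scan looks once annotated. It is usually set imperatively, for example with `oc -n openshift-compliance annotate compliancescans/<scan-name> compliance.openshift.io/rescan=`, where `<scan-name>` is a placeholder. Whether this suits a GitOps-managed flow is an open question.

```yaml
# Fragment of a ComplianceScan after annotation (sketch, not a full manifest).
metadata:
  annotations:
    compliance.openshift.io/rescan: ""   # empty value; its presence requests a rescan
```

A `schedule` on the ScanSetting is the other documented route: scans then rerun on the cron schedule without any annotation.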

…m ComplianceCheckResults in Compliance Operator
@ghost ghost requested a review from sabre1041 September 8, 2025 18:27
Author

ghost commented Sep 8, 2025

Since 15 of these commits were attempts to figure out why auto remediation wasn't working, I am going to close this PR, rebase, and add only the 5 required files along with the updates to values-hub.yaml.

@ghost ghost closed this Sep 8, 2025