
Default MetricStorage NetworkAttachments to ctlplane #827

Open
vkmc wants to merge 21 commits into main from
OSPRH-22189/use-dataplane-nw-default-nad

@vkmc
Collaborator

@vkmc vkmc commented Jan 5, 2026

Updates the MetricStorage CRD to default NetworkAttachments to ["ctlplane"]. This aligns the field with DataplaneNetwork and removes the need for users to override it manually in the OpenStackControlPlane CR.

Closes: OSPRH-22189

@openshift-ci openshift-ci bot requested review from jlarriba and olliewalsh January 5, 2026 12:05
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f39d1351ff2349968c081ab8464f9879

✔️ openstack-k8s-operators-content-provider SUCCESS in 57m 55s
telemetry-operator-multinode-cloudkitty FAILURE in 39m 57s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 00m 48s
telemetry-operator-multinode-default-telemetry FAILURE in 37m 09s
functional-tests-osp18 FAILURE in 44m 19s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5d35876c89884b759063abdca1fee62a

✔️ openstack-k8s-operators-content-provider SUCCESS in 59m 25s
⚠️ telemetry-operator-multinode-cloudkitty SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master
telemetry-openstack-meta-content-provider-master FAILURE in 16m 13s
telemetry-operator-multinode-default-telemetry FAILURE in 38m 08s
⚠️ functional-tests-osp18 SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master

@SeanMooney

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8ccff3636cb34d07b2913c8f3e082d69

✔️ openstack-k8s-operators-content-provider SUCCESS in 59m 53s
telemetry-operator-multinode-cloudkitty FAILURE in 44m 16s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 05m 50s
telemetry-operator-multinode-default-telemetry FAILURE in 39m 20s
functional-tests-osp18 FAILURE in 46m 24s

@vkmc
Collaborator Author

vkmc commented Jan 9, 2026

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2a9e299272f34edc9e9a641abf210883

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 18m 09s
telemetry-operator-multinode-cloudkitty FAILURE in 42m 33s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 51m 17s
telemetry-operator-multinode-default-telemetry FAILURE in 39m 00s
functional-tests-osp18 FAILURE in 40m 56s

@vkmc
Collaborator Author

vkmc commented Jan 13, 2026

CI issues seem unrelated to the change

@vkmc
Collaborator Author

vkmc commented Jan 13, 2026

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c6ee37a2bfbf41f9bd9e11f1eb695ad9

✔️ openstack-k8s-operators-content-provider SUCCESS in 56m 55s
telemetry-operator-multinode-cloudkitty FAILURE in 43m 07s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 1h 05m 07s
telemetry-operator-multinode-default-telemetry FAILURE in 37m 01s
functional-tests-osp18 FAILURE in 45m 48s

@openshift-ci
Contributor

openshift-ci bot commented Jan 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vkmc
Once this PR has been reviewed and has the lgtm label, please ask for approval from elfiesmelfie. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vkmc
Collaborator Author

vkmc commented Jan 13, 2026

Thanks for the review Emma! I missed that

@vkmc
Collaborator Author

vkmc commented Jan 15, 2026

Kuttl tests are failing because the NAD required for tests/default is not available.

When inspecting the logs, I noticed that the creation of the user-supplied namespace is being skipped for all tests.

I think this is an issue that should be fixed in a follow-up patch.

For this PR, I added the NAD creation as a step in tests/default.

@vkmc vkmc force-pushed the OSPRH-22189/use-dataplane-nw-default-nad branch from 66204d4 to 78d50f6 on January 15, 2026 17:44
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3d70a6fa7e824443bba7f40e0fa9e467

openstack-k8s-operators-content-provider FAILURE in 13m 16s
⚠️ telemetry-operator-multinode-cloudkitty SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master
telemetry-openstack-meta-content-provider-master FAILURE in 13m 00s
⚠️ telemetry-operator-multinode-default-telemetry SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ functional-tests-osp18 SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master

@vkmc
Collaborator Author

vkmc commented Jan 16, 2026

recheck

@vkmc
Collaborator Author

vkmc commented Jan 16, 2026

/retest

@vkmc vkmc force-pushed the OSPRH-22189/use-dataplane-nw-default-nad branch from 78d50f6 to 90f720d on January 16, 2026 10:11
@vkmc
Collaborator Author

vkmc commented Jan 19, 2026

Testing the status of the Kuttl tests on main in #836 to better understand the output in this PR.

For some reason, after adding the NAD in the default test, Prometheus fails to start due to a storage issue.

Logs in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openstack-k8s-operators_telemetry-operator/827/pull-ci-openstack-k8s-operators-telemetry-operator-main-telemetry-operator-build-deploy-kuttl/2013291199812603904/artifacts/telemetry-operator-build-deploy-kuttl/openstack-k8s-operators-gather/artifacts/must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-2af7f286b6453522975b5de70b41aecb541c915047854f0e78afc578e250b844/namespaces/telemetry-kuttl-tests/events.log

LAST SEEN   TYPE      REASON                            OBJECT                                                                                                         MESSAGE
18m         Warning   FailedScheduling                  pod/prometheus-telemetry-kuttl-metricstorage-0                                                                 0/3 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 2 node(s) didn't find available persistent volumes to bind. preemption: 0/3 nodes are available: 1 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
15m         Normal    Scheduled                         pod/cloudkitty-db-sync-6h5c7                                                                                   Successfully assigned telemetry-kuttl-tests/cloudkitty-db-sync-6h5c7 to oko-11-hv7zr-master-2
14m         Normal    Scheduled                         pod/minio                                                                                                      Successfully assigned telemetry-kuttl-tests/minio to oko-11-hv7zr-master-1

If this were a resource issue, it should be happening on main as well.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/41dcda669a0c46b687ddcb124f3c7716

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 10m 50s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 27m 12s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 28m 45s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 27m 57s
functional-tests-osp18 FAILURE in 2h 01m 51s

- script: |
    oc apply -f deps/loki-operator.yaml
    until oc api-resources | grep -q grafana; do sleep 1; done
- script: oc apply -n telemetry-kuttl-tests -f https://raw.githubusercontent.com/openstack-k8s-operators/infra-operator/main/config/samples/network_v1beta1_netconfig.yaml
Collaborator

It seems that you are applying the NetConfig to the wrong namespace. I think our tests are actually running in the telemetry-kuttl-default namespace.

Maybe changing the namespace here fixes the issue you are seeing.

Collaborator Author

@vkmc vkmc Jan 27, 2026

Unfortunately, that didn't do the trick. We set the namespace for the kuttl tests in https://github.com/openstack-k8s-operators/telemetry-operator/blob/main/kuttl-test.yaml#L19. The config for individual kuttl tests (https://github.com/openstack-k8s-operators/telemetry-operator/blob/main/test/kuttl/tests/default/kuttl-test.yaml#L5) is being skipped. For all tests, the logs contain these lines:

=== CONT  kuttl/harness/default
    logger.go:42: 15:54:16 | default | Ignoring deps as it does not match file name regexp: ^(\d+)-(?:[^\.]+)(?:\.yaml)?$
    logger.go:42: 15:54:16 | default | Ignoring kuttl-test.yaml as it does not match file name regexp: ^(\d+)-(?:[^\.]+)(?:\.yaml)?$
    logger.go:42: 15:54:16 | default | Ignoring output as it does not match file name regexp: ^(\d+)-(?:[^\.]+)(?:\.yaml)?$
    logger.go:42: 15:54:16 | default | Skipping creation of user-supplied namespace: telemetry-kuttl-tests
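
To make the "Ignoring ..." log lines concrete, here is a minimal Go sketch that applies the file name regexp quoted in the log to a few names. The regexp is copied from the log output; the function name `isStepFile` and the two numbered file names are hypothetical and only serve as illustration. This is not kuttl's own code.

```go
package main

import (
	"fmt"
	"regexp"
)

// stepFileRE is the pattern quoted in the kuttl log lines:
// a numeric prefix, a dash, a name without dots, and an optional ".yaml".
var stepFileRE = regexp.MustCompile(`^(\d+)-(?:[^\.]+)(?:\.yaml)?$`)

// isStepFile reports whether a directory entry name matches the pattern.
// (Hypothetical helper, named here only for the example.)
func isStepFile(name string) bool {
	return stepFileRE.MatchString(name)
}

func main() {
	// The first three names come from the log; the last two are
	// made-up examples of names that carry a numeric prefix.
	names := []string{"deps", "kuttl-test.yaml", "output", "00-assert.yaml", "01-deploy"}
	for _, name := range names {
		fmt.Printf("%-16s matches=%v\n", name, isStepFile(name))
	}
}
```

Under this pattern, `deps`, `kuttl-test.yaml`, and `output` are not treated as step files because they lack the numeric prefix, which is consistent with the log lines above.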

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/42503d098d974bc88be0b8ee96569d17

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 06m 20s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 32m 11s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 20m 23s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 29m 05s
functional-tests-osp18 FAILURE in 1h 44m 29s

@vkmc vkmc force-pushed the OSPRH-22189/use-dataplane-nw-default-nad branch from 1910d23 to 52eb95a on January 27, 2026 17:42
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3a27506b34424d0f837764ee830aef99

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 47m 11s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 28m 11s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 09m 18s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 28m 31s
functional-tests-osp18 FAILURE in 1h 50m 27s

vkmc added 2 commits February 24, 2026 16:52
Updates the MetricStorage CRD to default NetworkAttachments
to ["ctlplane"]. This aligns the field with DataplaneNetwork and
removes the need for users to override it manually in the
OpenStackControlPlane CR.

Closes: OSPRH-22189
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4d284cab973f4129a9bead3f9f9366f9

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 45m 35s
⚠️ telemetry-operator-multinode-cloudkitty SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master
telemetry-openstack-meta-content-provider-master FAILURE in 5m 46s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 29m 21s
⚠️ functional-tests-osp18 SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master

Add defaulting logic to set NetworkAttachments to ["ctlplane"]
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/64270b4725494555b462db600197aede

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 46m 22s
⚠️ telemetry-operator-multinode-cloudkitty SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master
telemetry-openstack-meta-content-provider-master FAILURE in 5m 52s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 28m 38s
⚠️ functional-tests-osp18 SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/a481a243ad75430684def9e647c97828

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 46m 30s
⚠️ telemetry-operator-multinode-cloudkitty SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master
telemetry-openstack-meta-content-provider-master FAILURE in 6m 21s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 29m 31s
⚠️ functional-tests-osp18 SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0e03dce3e68448dd949f1ec8841593ba

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 54s
⚠️ telemetry-operator-multinode-cloudkitty SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master
telemetry-openstack-meta-content-provider-master FAILURE in 5m 36s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 27m 28s
⚠️ functional-tests-osp18 SKIPPED Skipped due to failed job telemetry-openstack-meta-content-provider-master

Comment on lines +53 to +56
if len(spec.NetworkAttachments) == 0 {
	spec.NetworkAttachments = []string{"ctlplane"}
}

Contributor

As discussed on Slack: as far as I am aware, we do not currently default any NetworkAttachment in the operators, because the NAD name can be anything; it does not have to be ctlplane. I am not against doing it, but in theory, depending on the OCP and routing configuration, you might be able to use just the pod network, routing via the OCP worker node.
IIUC, this code will not allow deploying without any configured NAD, and I am not sure that is a configuration we want to force. I think it is fine to default to ctlplane, but let the user override it with an empty list.
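
The distinction raised here, "unset" versus "explicitly empty", can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `Spec` type, its JSON tag, and the helper names are hypothetical stand-ins, and `internalapi` is just an example NAD name. The sketch relies on `encoding/json` leaving a slice `nil` when the key is absent and producing an empty, non-nil slice for `[]`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Spec is a hypothetical stand-in for the MetricStorage spec; only the
// field discussed in this thread is modeled.
type Spec struct {
	NetworkAttachments []string `json:"networkAttachments,omitempty"`
}

// applyDefault sets the default only when the field was left unset (nil).
// An explicit empty list is kept as-is, so a deployment that relies on the
// pod network remains possible.
func applyDefault(spec *Spec) {
	if spec.NetworkAttachments == nil {
		spec.NetworkAttachments = []string{"ctlplane"}
	}
}

// decodeAndDefault parses a JSON spec and then applies the default.
func decodeAndDefault(raw string) ([]string, error) {
	var spec Spec
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		return nil, err
	}
	applyDefault(&spec)
	return spec.NetworkAttachments, nil
}

func main() {
	inputs := []string{
		`{}`,                         // field unset
		`{"networkAttachments": []}`, // explicit empty list
		`{"networkAttachments": ["internalapi"]}`,
	}
	for _, raw := range inputs {
		nads, err := decodeAndDefault(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-42s -> %q\n", raw, nads)
	}
}
```

By contrast, a `len(...) == 0` check treats the unset case and the explicit empty list identically, which is the behavior questioned in this comment.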

Collaborator Author

Fair enough, just updated the PR to drop this, thanks for looking into it!

vkmc added 2 commits March 6, 2026 11:20
Prow Kuttl is not running with network isolation
(see https://github.com/openshift/release/blob/main/ci-operator/step-registry/openstack-k8s-operators/kuttl/openstack-k8s-operators-kuttl-commands.sh#L88)

Because of this, we cannot get a working NAD to use for the MetricStorage
(we are missing the NNCP required to create the NAD)

By default, we now set 'ctlplane' as the NAD (if no other value is set).
If we don't have a working NAD, the MetricStorage fails to deploy.

To overcome this limitation, we can structure tests as follows:

- For all tests that require a working MetricStorage, we can pass
an empty list (relying on the routingViaHost config)

- In order to test that the NAD annotation is added to the MetricStorage
(without the proper MetricStorage being deployed), we create a non working NAD
and check the corresponding annotation is in place.
We currently do this in the TLS Kuttl scenario.
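
As an illustration of the second kind of test, the following Go sketch checks whether a set of annotations requests a given NAD. It assumes the Multus annotation key `k8s.v1.cni.cncf.io/networks` with a JSON-list value; the helper `hasNetwork`, the `networkSelection` type, and the sample annotation value are hypothetical and are not taken from this PR or from the Kuttl assertions it adds.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networksAnnotation is the Multus annotation key listing requested networks.
const networksAnnotation = "k8s.v1.cni.cncf.io/networks"

// networkSelection models one entry of the JSON form of the annotation value.
type networkSelection struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// hasNetwork reports whether the annotations request the named NAD.
// It only understands the JSON-list form of the annotation value.
func hasNetwork(annotations map[string]string, nad string) (bool, error) {
	raw, ok := annotations[networksAnnotation]
	if !ok || raw == "" {
		return false, nil
	}
	var selections []networkSelection
	if err := json.Unmarshal([]byte(raw), &selections); err != nil {
		return false, fmt.Errorf("parsing %s: %w", networksAnnotation, err)
	}
	for _, s := range selections {
		if s.Name == nad {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical annotations, as they might appear on a pod template.
	annotations := map[string]string{
		networksAnnotation: `[{"name":"ctlplane","namespace":"telemetry-kuttl-tests"}]`,
	}
	found, err := hasNetwork(annotations, "ctlplane")
	if err != nil {
		panic(err)
	}
	fmt.Println("ctlplane requested:", found)
}
```

A check of this shape only confirms that the annotation is present; it says nothing about whether the NAD itself is functional, which matches the "non working NAD" approach described above.
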
We shouldn't force the NetworkAttachments to be set to 'ctlplane'

Allow override for NetworkAttachments to an empty list. This is
required for cases in which the routing is done via the OCP worker node
(as in CI)
@vkmc vkmc force-pushed the OSPRH-22189/use-dataplane-nw-default-nad branch from 9f4b680 to 1c4698c on March 6, 2026 10:53
@vkmc
Collaborator Author

vkmc commented Mar 6, 2026

Let's see if passing an empty list for the NAD gets us to passing Kuttl jobs. Once that works, I will drop the unnecessary resource creation and debugging steps.

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9c368244ab6c4909835a584315bd503f

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 50m 40s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 32m 35s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 08m 58s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 31m 43s
functional-tests-osp18 FAILURE in 1h 50m 01s

@vkmc
Collaborator Author

vkmc commented Mar 9, 2026

recheck

Keep the NAD creation in the test step
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3199f85a2fdd40749f442a6a005bf35e

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 29s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 30m 38s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 01m 10s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 27m 01s
functional-tests-osp18 FAILURE in 1h 43m 16s

@vkmc
Collaborator Author

vkmc commented Mar 12, 2026

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9089f1168c374afca841a88d4b3cba42

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 47m 26s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 20m 47s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 22m 58s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 30m 23s
functional-tests-osp18 FAILURE in 2h 05m 58s

@vkmc
Collaborator Author

vkmc commented Mar 13, 2026

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/00fa6c9bdf024201bf4eb874bb1eac00

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 45m 29s
✔️ telemetry-operator-multinode-cloudkitty SUCCESS in 1h 31m 39s
✔️ telemetry-openstack-meta-content-provider-master SUCCESS in 2h 03m 07s
✔️ telemetry-operator-multinode-default-telemetry SUCCESS in 1h 28m 23s
functional-tests-osp18 FAILURE in 1h 44m 16s

@vkmc
Collaborator Author

vkmc commented Mar 20, 2026

/recheck

@vkmc
Collaborator Author

vkmc commented Mar 23, 2026

/recheck

5 participants