NE-1476: Added network policies for DNS #443

knobunc · 2025-08-08T15:04:30Z

Added the framework for DNS network policies, covering both the operator and the DNS pods.

The operator has a deny-all network policy for the openshift-dns-operator namespace and an allow policy for egress to the API server and DNS ports at any IP.

The operator installs a deny-all network policy for the openshift-dns namespace.

Then, for each DNS that it manages, it installs an allow policy for ingress of DNS traffic and metrics.

It has to allow egress from the DNS pods to any IP because the configuration can set the upstream server and port, so any valid IP and port needs to be allowed.

The DNS pods also need access to the API server, but that is covered by the wildcard allow policy.

https://issues.redhat.com/browse/NE-1476

openshift-ci-robot · 2025-08-09T15:10:25Z

@knobunc: This pull request references NE-1476 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

knobunc · 2025-08-09T15:12:52Z

/jira refresh

openshift-ci-robot · 2025-08-09T15:12:55Z

@knobunc: This pull request references NE-1476 which is a valid jira issue.


knobunc · 2025-08-09T23:41:19Z

/test e2e-aws-ovn

knobunc · 2025-08-11T02:00:23Z

/test unit

knobunc · 2025-08-11T10:45:43Z

/retest-required

knobunc · 2025-08-11T13:05:54Z

/retest


knobunc · 2025-08-11T19:07:35Z

/retest-required

knobunc · 2025-08-11T20:33:54Z

/retest-required

knobunc · 2025-08-12T02:24:30Z

/retest-required

Miciah · 2025-08-13T14:49:54Z

pkg/operator/controller/controller_dns_networkpolicy.go

+	if updated.Labels == nil {
+		updated.Labels = map[string]string{}
+	}
+	for k, v := range expectedLabels {
+		updated.Labels[k] = v


I believe you want to drop labels that are in current but not expected, or else the operator will get in a reconciliation loop.

Suggested change

-	if updated.Labels == nil {
-		updated.Labels = map[string]string{}
-	}
-	for k, v := range expectedLabels {
-		updated.Labels[k] = v
+	updated.Labels = map[string]string{}
+	for k, v := range expectedLabels {
+		updated.Labels[k] = v

Whoops! Thank you
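
To illustrate the reconciliation-loop concern, here is a minimal, self-contained sketch. It is not the operator's code: `mergeLabels`, `replaceLabels`, `needsUpdate`, and the label keys are hypothetical, and it assumes the operator decides whether an update is needed by comparing the object's labels with the expected set.

```go
package main

import "reflect"

// mergeLabels copies the expected labels over the current ones but keeps
// any stale keys, mirroring the original loop (hypothetical helper).
func mergeLabels(current, expected map[string]string) map[string]string {
	updated := map[string]string{}
	for k, v := range current {
		updated[k] = v
	}
	for k, v := range expected {
		updated[k] = v
	}
	return updated
}

// replaceLabels starts from an empty map, so keys that are not expected
// are dropped, as in the suggested change (hypothetical helper).
func replaceLabels(expected map[string]string) map[string]string {
	updated := map[string]string{}
	for k, v := range expected {
		updated[k] = v
	}
	return updated
}

// needsUpdate reports whether the labels still differ from the expected set.
func needsUpdate(labels, expected map[string]string) bool {
	return !reflect.DeepEqual(labels, expected)
}
```

With a current object labeled `app=dns, stale=true` and an expected set of `app=dns`, the merged result still differs from the expected set, so every reconcile would issue another update. The replaced result matches and the loop converges.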

rikatz · 2025-08-13T15:03:39Z

/assign

rikatz · 2025-08-13T16:38:52Z

manifests/0000_70_dns-operator_01-network-policy.yaml

+  ingress:
+  - ports:
+    - protocol: TCP
+      port: 9393


question here: should the metrics access be limited to specific prometheus pods, or is this generic access desired?

this can be marked as solved

rikatz · 2025-08-13T16:43:56Z

pkg/manifests/assets/dns/networkpolicy-allow.yaml

+    - protocol: TCP
+      port: 8080
+    - protocol: TCP
+      port: 8181


IIRC the health check ports don't need to be open, as kubelet should be able to reach the Pods with or without the network policy (unless ovn-kubernetes does differently).

It is worth testing tho

I was getting test errors (which surprised me) until these were opened. I wasn't sure if something else was hitting them to determine when it was live.

hum, weird.

I have just applied the network policy manually, so port 8080 was blocked from a pod in another namespace, but I could still curl it from the node, as kubelet would:

oc debug node/ip-10-0-86-223.us-east-2.compute.internal -- curl -kv http://10.131.0.5:8080/health
....
OK

Then, trying from my pod:

kubectl exec -it nginx-5869d7778c-pl7fq -- curl -m 3 http://10.131.0.5:8080/health
curl: (28) Connection timed out after 3000 milliseconds

rikatz · 2025-08-13T16:56:24Z

pkg/operator/controller/controller_dns_networkpolicy.go

+
+	desired := desiredDNSNetworkPolicy(dns)
+
+	switch {


A minor comment: instead of using a switch here, wouldn't it be better to do something like:

	if !haveNP {
		if err := r.client.Create(context.TODO(), desired); err != nil {
			return false, nil, fmt.Errorf("failed to create dns networkpolicy: %v", err)
		}
		logrus.Infof("created dns networkpolicy: %s/%s", desired.Namespace, desired.Name)
		return r.currentDNSNetworkPolicy(dns)
	}
	updated, err := r.updateDNSNetworkPolicy(current, desired)
	if err != nil {
		return true, current, err
	}
	if updated {
		return r.currentDNSNetworkPolicy(dns)
	}
	return true, current, nil

I did it that way to match the pattern in other files (e.g. controller_dns_configmap.go).

sounds good, let's follow the same pattern then
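
For readers unfamiliar with the switch-based shape being kept, the following is a rough sketch only. It assumes the pattern is a tagless `switch` over the "missing" and "differs" states; `policy`, `fakeClient`, and `ensurePolicy` are hypothetical stand-ins and do not reproduce controller_dns_configmap.go or the real client API.

```go
package main

import "fmt"

// policy is a stand-in for a NetworkPolicy with only the fields used here.
type policy struct {
	Name string
	Spec string
}

// fakeClient is a hypothetical in-memory replacement for the API client.
type fakeClient struct {
	objects map[string]policy
}

// get returns whether the named policy exists, and the policy itself.
func (c *fakeClient) get(name string) (bool, policy) {
	p, ok := c.objects[name]
	return ok, p
}

// ensurePolicy creates the policy if it is missing and updates it if its
// spec differs from desired. It returns whether the policy exists
// afterwards, together with its current state.
func ensurePolicy(c *fakeClient, desired policy) (bool, policy, error) {
	if desired.Name == "" {
		return false, policy{}, fmt.Errorf("desired policy has no name")
	}
	have, current := c.get(desired.Name)
	switch {
	case !have:
		c.objects[desired.Name] = desired // create
		have, current = c.get(desired.Name)
		return have, current, nil
	case current.Spec != desired.Spec:
		c.objects[desired.Name] = desired // update
		have, current = c.get(desired.Name)
		return have, current, nil
	}
	// Already up to date: nothing to do.
	return true, current, nil
}
```

Both shapes behave the same. The switch keeps the create and update branches visually parallel, which helps when several ensure functions are read side by side.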

candita · 2025-08-13T21:26:09Z

/assign @alebedev87

candita · 2025-08-13T21:28:18Z

=== RUN TestDNSForwarding
operator_test.go:694: failed to dig 172.30.224.202: failed to find "1.2.3.4"

/retest required

openshift-ci · 2025-08-13T21:28:22Z

@candita: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test e2e-aws-ovn

/test e2e-aws-ovn-operator

/test e2e-aws-ovn-serial

/test e2e-aws-ovn-upgrade

/test images

/test okd-scos-images

/test unit

/test verify

/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-ovn-single-node

/test e2e-aws-ovn-techpreview

/test okd-scos-e2e-aws-ovn

Use /test all to run all jobs.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci · 2025-08-21T14:33:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rikatz · 2025-08-21T16:55:59Z

pkg/manifests/assets/dns/networkpolicy-allow.yaml

+    from:
+    - namespaceSelector:
+        matchLabels:
+          kubernetes.io/metadata.name: openshift-dns-operatorXXX


Is this something from a previous test that was left here?

+1 on the XXX suffixes.

rikatz · 2025-08-21T16:57:35Z

@knobunc Two comments remain open on the network policy manifests. First, I tested the kubelet theory and it worked (unless the operator also tries to reach the health port, in which case opening it would make sense). Second, some leftover test data is still in the manifest. Otherwise, the network policy manifests LGTM.

lihongan · 2025-08-25T01:49:16Z

cc @melvinjoseph86

alebedev87 · 2025-08-26T12:39:01Z

pkg/manifests/assets/dns/networkpolicy-allow.yaml

+    from:
+    - namespaceSelector:
+        matchLabels:
+          kubernetes.io/metadata.name: openshift-dns-operatorXXX


+1 on the XXX suffixes.

alebedev87 · 2025-08-26T12:40:03Z

pkg/manifests/assets/dns/networkpolicy-allow.yaml

+    - namespaceSelector:
+        matchLabels:
+          kubernetes.io/metadata.name: openshift-monitoring
+  # Allow the dns operator namespaces to hit the healthcheck ports


Why do we have to allow hits to the healthcheck ports from the operator namespace?

alebedev87 · 2025-08-26T12:46:23Z

pkg/manifests/assets/dns/networkpolicy-allow.yaml

+  egress:
+  - to:
+    - ipBlock:
+        cidr: 0.0.0.0/0


The e2e-aws-ovn-operator test is permafailing on TestDNSForwarding:

=== RUN TestDNSForwarding
operator_test.go:694: failed to dig 172.30.74.128: failed to find "1.2.3.4"
--- FAIL: TestDNSForwarding (170.25s)

The test traffic in TestDNSForwarding flows as follows:

1. A test-client pod in the openshift-dns namespace receives the dig +short foo.com command via oc exec.
2. The query hits CoreDNS, whose forward plugin is configured to use a test upstream.
3. The test upstream is another pod in the openshift-dns namespace, which responds with 1.2.3.4 for foo.com.
4. The forward plugin uses the ClusterIP address of the service created for the test upstream pod.

Does ipBlock: 0.0.0.0/0 cover traffic to virtual ips?

a thing to be considered here is that if we want to allow egress to any traffic, we could remove the egress from the deny all rule.

Another thing that can be done here is probably make the rule more permissive:

  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
  - to:
    - podSelector: {}
  - to:
    - namespaceSelector: {}
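
As a side note on the ipBlock question, the address arithmetic alone can be checked with a small sketch. `cidrCovers` is a hypothetical helper, and it deliberately says nothing about how a network plugin evaluates policies. The Kubernetes NetworkPolicy documentation notes that it is not defined whether service IP rewriting happens before or after policy processing, so a CIDR match on the ClusterIP does not by itself settle whether the traffic is allowed.

```go
package main

import "net/netip"

// cidrCovers reports whether addr falls inside the given CIDR block.
// It only does address arithmetic; it does not model how a network
// plugin evaluates NetworkPolicy rules.
func cidrCovers(cidr, addr string) (bool, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, err
	}
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	return prefix.Contains(ip), nil
}
```

Calling `cidrCovers("0.0.0.0/0", "172.30.74.128")` reports that the ClusterIP from the failing test is inside the block, which suggests the failure is more likely about enforcement semantics than about the CIDR itself. That is an inference, not something this thread confirms.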

alebedev87 · 2025-08-26T12:51:00Z

pkg/manifests/assets/networkpolicy-deny-all.yaml

Why is the deny-all network policy outside the dns directory, given that it is a namespaced resource and the openshift-dns namespace manifest lives in the dns directory?

openshift-ci · 2025-10-03T10:47:28Z

@knobunc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-operator | `333c04b` | link | true | `/test e2e-aws-ovn-operator` |
| ci/prow/okd-scos-e2e-aws-ovn | `333c04b` | link | false | `/test okd-scos-e2e-aws-ovn` |
| ci/prow/e2e-aws-ovn-single-node | `333c04b` | link | false | `/test e2e-aws-ovn-single-node` |
| ci/prow/e2e-aws-ovn-serial | `333c04b` | link | true | `/test e2e-aws-ovn-serial` |
| ci/prow/e2e-aws-ovn-serial-2of2 | `333c04b` | link | true | `/test e2e-aws-ovn-serial-2of2` |
| ci/prow/e2e-aws-ovn-serial-1of2 | `333c04b` | link | true | `/test e2e-aws-ovn-serial-1of2` |


rikatz · 2025-10-08T17:48:19Z

pkg/operator/controller/controller.go

 		if _, _, err := r.ensureDNSConfigMap(dns, clusterDomain, cmMap); err != nil {
 			errs = append(errs, fmt.Errorf("failed to create configmap for dns %s: %v", dns.Name, err))
 		}
+		if _, _, err := r.ensureDNSNetworkPolicy(dns); err != nil {


Does it matter if this NetworkPolicy creation fails but the other ensure steps pass?

Network policies can be tricky with respect to the order in which they are created and published on nodes, and the DNS DaemonSet is ensured before the NetworkPolicy (on line 535), so I was wondering whether a NetworkPolicy creation failure should block the rest of the reconciliation.

It may be better to guarantee that the NetworkPolicy was created successfully before ensuring the DNS DaemonSet, retrying directly if it fails.

rikatz · 2025-10-08T19:54:54Z

manifests/0000_70_dns-operator_00-cluster-role.yaml

+  resources:
+  - networkpolicies
+  verbs:
+  - "*"


as commented by @Miciah on https://github.com/openshift/cluster-ingress-operator/pull/1263/files#r2411400476 we need to avoid wildcards here as well

Miciah · 2025-12-02T18:43:16Z

Please also add the new networkpolicies to relatedObjects:

cluster-dns-operator/pkg/operator/controller/status/controller.go

Lines 129 to 146 in 333c04b

    
	related := []configv1.ObjectReference{
		{
			Resource: "namespaces",
			Name:     r.Config.OperatorNamespace,
		},
		{
			Group:    operatorv1.GroupName,
			Resource: "dnses",
			Name:     "default",
		},
	}
	if state.haveNamespace {
		related = append(related, configv1.ObjectReference{
			Resource: "namespaces",
			Name:     state.namespace.Name,
		})
	}
	co.Status.RelatedObjects = related

See also https://issues.redhat.com//browse/OCPBUGS-65498.
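
A minimal sketch of what the requested addition could look like follows. It uses a local `objectReference` type that mirrors the `Group`, `Resource`, `Namespace`, and `Name` fields of `configv1.ObjectReference` so that it compiles on its own; `withNetworkPolicies` and the policy names used with it are hypothetical, since the thread does not state the final object names.

```go
package main

// objectReference mirrors the fields of configv1.ObjectReference used here
// (a local stand-in so the sketch compiles without the OpenShift API module).
type objectReference struct {
	Group     string
	Resource  string
	Namespace string
	Name      string
}

// withNetworkPolicies appends one reference per policy name in the given
// namespace and returns the extended slice.
func withNetworkPolicies(related []objectReference, namespace string, names ...string) []objectReference {
	for _, name := range names {
		related = append(related, objectReference{
			Group:     "networking.k8s.io",
			Resource:  "networkpolicies",
			Namespace: namespace,
			Name:      name,
		})
	}
	return related
}
```

For example, `withNetworkPolicies(related, "openshift-dns", "deny-all", "dns-default")` would add two namespaced references, with both names being placeholders.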

knobunc added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2025

openshift-ci bot requested review from Miciah and candita August 8, 2025 15:06

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 2 times, most recently from af84e8d to 87bf0e3 Compare August 9, 2025 15:06

knobunc changed the title ~~NE-1476 - Added network policies for DNS~~ NE-1476: Added network policies for DNS Aug 9, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 9, 2025

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 9, 2025

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch from 87bf0e3 to 095baff Compare August 9, 2025 16:06

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 5 times, most recently from 3041bba to a564e83 Compare August 11, 2025 01:58

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 4 times, most recently from f9c1b3e to 78192ee Compare August 11, 2025 02:41

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 3 times, most recently from 5695df2 to 834a6e1 Compare August 11, 2025 15:34

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch from 834a6e1 to fb1a7b4 Compare August 11, 2025 16:14

Miciah reviewed Aug 13, 2025


openshift-ci bot assigned rikatz Aug 13, 2025

rikatz reviewed Aug 13, 2025


openshift-ci bot assigned alebedev87 Aug 13, 2025

Updated with review suggestions

333c04b

knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch from 8576975 to 333c04b Compare August 21, 2025 14:43

rikatz reviewed Aug 21, 2025


alebedev87 reviewed Aug 26, 2025


rikatz reviewed Oct 8, 2025
