@knobunc

@knobunc knobunc commented Aug 8, 2025

Added the framework for DNS network policies, covering both the operator and the DNS pods.

The operator has a deny-all network policy for the openshift-dns-operator namespace and an allow policy for egress to the API server and to DNS ports at any IP.

The operator installs a deny-all network policy for the openshift-dns namespace.

Then, for each DNS that it manages, it installs an allow policy for ingress of DNS traffic and metrics.

It also has to allow egress from the DNS pods to any IP, because the upstream server and port are configurable, so any valid IP and port must be allowed.

The DNS pods also need access to the API server, but that is covered by the wildcard allow policy.
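To make the layering concrete, here is a toy Go model of how a deny-all policy and an allow policy combine, assuming standard Kubernetes NetworkPolicy semantics: a pod selected by any policy that lists a direction in its policy types is isolated in that direction, and traffic is then allowed only if some such policy has a matching rule. Peers, protocols and pod selectors are left out, and the policy names and port numbers are illustrative assumptions, not the ones in this PR.

```go
package main

import "fmt"

// Direction of traffic relative to the selected pod.
type Direction int

const (
	Ingress Direction = iota
	Egress
)

// Policy is a toy NetworkPolicy that selects every pod in the namespace.
// Listing a direction in Types with no matching Allow rule means "deny all".
type Policy struct {
	Name  string
	Types []Direction
	Allow []Rule
}

// Rule allows traffic in one direction on one port (0 means any port).
type Rule struct {
	Dir  Direction
	Port int
}

// covers reports whether p lists dir in its policy types.
func covers(p Policy, dir Direction) bool {
	for _, t := range p.Types {
		if t == dir {
			return true
		}
	}
	return false
}

// allowed reports whether traffic in dir on port reaches a pod selected by
// all of the given policies. Rules only count for directions the policy covers.
func allowed(policies []Policy, dir Direction, port int) bool {
	isolated := false
	for _, p := range policies {
		if !covers(p, dir) {
			continue
		}
		isolated = true
		for _, r := range p.Allow {
			if r.Dir == dir && (r.Port == 0 || r.Port == port) {
				return true
			}
		}
	}
	// A pod that no policy isolates in this direction is open by default.
	return !isolated
}

func main() {
	denyAll := Policy{Name: "deny-all", Types: []Direction{Ingress, Egress}}
	allowDNS := Policy{
		Name:  "dns-allow", // hypothetical name
		Types: []Direction{Ingress, Egress},
		Allow: []Rule{{Ingress, 53}, {Egress, 0}}, // DNS in, anything out
	}
	fmt.Println(allowed([]Policy{denyAll}, Ingress, 53))           // false
	fmt.Println(allowed([]Policy{denyAll, allowDNS}, Ingress, 53)) // true
	fmt.Println(allowed([]Policy{denyAll, allowDNS}, Ingress, 22)) // false
	fmt.Println(allowed([]Policy{denyAll, allowDNS}, Egress, 443)) // true
}
```

Because policies are additive, the deny-all policy never has to be edited: each per-DNS allow policy simply widens what the isolated pods accept.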

https://issues.redhat.com/browse/NE-1476

@knobunc knobunc added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2025
@openshift-ci openshift-ci bot requested review from Miciah and candita August 8, 2025 15:06
@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 2 times, most recently from af84e8d to 87bf0e3 Compare August 9, 2025 15:06
@knobunc knobunc changed the title NE-1476 - Added network policies for DNS NE-1476: Added network policies for DNS Aug 9, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 9, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 9, 2025

@knobunc: This pull request references NE-1476 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 9, 2025
@knobunc
Author

knobunc commented Aug 9, 2025

/jira refresh

@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 9, 2025

@knobunc: This pull request references NE-1476 which is a valid jira issue.


@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch from 87bf0e3 to 095baff Compare August 9, 2025 16:06
@knobunc
Author

knobunc commented Aug 9, 2025

/test e2e-aws-ovn

@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 5 times, most recently from 3041bba to a564e83 Compare August 11, 2025 01:58
@knobunc
Author

knobunc commented Aug 11, 2025

/test unit

@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 4 times, most recently from f9c1b3e to 78192ee Compare August 11, 2025 02:41
@knobunc
Author

knobunc commented Aug 11, 2025

/retest-required

@knobunc
Author

knobunc commented Aug 11, 2025

/retest

@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch 3 times, most recently from 5695df2 to 834a6e1 Compare August 11, 2025 15:34
@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch from 834a6e1 to fb1a7b4 Compare August 11, 2025 16:14
@knobunc
Author

knobunc commented Aug 11, 2025

/retest-required

2 similar comments
@knobunc
Author

knobunc commented Aug 11, 2025

/retest-required

@knobunc
Author

knobunc commented Aug 12, 2025

/retest-required

Comment on lines 113 to 117
if updated.Labels == nil {
	updated.Labels = map[string]string{}
}
for k, v := range expectedLabels {
	updated.Labels[k] = v
Contributor

I believe you want to drop labels that are in current but not expected, or else the operator will get in a reconciliation loop.

Suggested change
-if updated.Labels == nil {
-	updated.Labels = map[string]string{}
-}
-for k, v := range expectedLabels {
-	updated.Labels[k] = v
+updated.Labels = map[string]string{}
+for k, v := range expectedLabels {
+	updated.Labels[k] = v
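A minimal, self-contained Go sketch of why merging causes the loop, assuming the update check compares the current and expected label maps for exact equality. The helpers mergeLabels, replaceLabels and needsUpdate are hypothetical stand-ins, not the operator's functions.

```go
package main

import (
	"fmt"
	"reflect"
)

// mergeLabels mimics the original approach: expected labels are copied on
// top of the current ones, so keys that are no longer expected survive.
func mergeLabels(current, expected map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range current {
		out[k] = v
	}
	for k, v := range expected {
		out[k] = v
	}
	return out
}

// replaceLabels mimics the suggested change: the result holds exactly the
// expected labels and nothing else.
func replaceLabels(expected map[string]string) map[string]string {
	out := make(map[string]string, len(expected))
	for k, v := range expected {
		out[k] = v
	}
	return out
}

// needsUpdate is a hypothetical exact-equality check like an update
// comparison might perform.
func needsUpdate(current, expected map[string]string) bool {
	return !reflect.DeepEqual(current, expected)
}

func main() {
	current := map[string]string{"app": "dns", "stale": "true"}
	expected := map[string]string{"app": "dns"}
	fmt.Println(needsUpdate(mergeLabels(current, expected), expected)) // true
	fmt.Println(needsUpdate(replaceLabels(expected), expected))       // false
}
```

With merging, the stale key never goes away, so every reconcile sees a difference and issues another update; replacing the map converges after a single update.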

Author

Whoops! Thank you

@rikatz
Member

rikatz commented Aug 13, 2025

/assign

ingress:
- ports:
  - protocol: TCP
    port: 9393
Member

Question here: should metrics access be limited to specific Prometheus pods, or is generic access desired?

Member

This can be marked as resolved.

Comment on lines +19 to +22
- protocol: TCP
  port: 8080
- protocol: TCP
  port: 8181
Member

IIRC the health check ports don't need to be open, as kubelet should be able to reach the pods with or without the network policy (unless ovn-kubernetes behaves differently).

It is worth testing, though.

Author

I was getting test errors (which surprised me) until these were opened. I wasn't sure if something else was hitting them to determine when it was live.

Member

Hmm, weird.

I just applied the network policy manually: port 8080 was blocked from a pod in another namespace, but I could still curl it from the node where kubelet runs:

 oc debug node/ip-10-0-86-223.us-east-2.compute.internal -- curl -kv http://10.131.0.5:8080/health
....
OK

then trying from my pod:

kubectl exec -it nginx-5869d7778c-pl7fq -- curl -m 3 http://10.131.0.5:8080/health
curl: (28) Connection timed out after 3000 milliseconds


desired := desiredDNSNetworkPolicy(dns)

switch {
Member

@rikatz rikatz Aug 13, 2025

Not a very important comment, but instead of using a switch here, wouldn't it be better to do something like:

if !haveNP {
	if err := r.client.Create(context.TODO(), desired); err != nil {
		return false, nil, fmt.Errorf("failed to create dns networkpolicy: %v", err)
	}
	logrus.Infof("created dns networkpolicy: %s/%s", desired.Namespace, desired.Name)
	return r.currentDNSNetworkPolicy(dns)
}
updated, err := r.updateDNSNetworkPolicy(current, desired)
if err != nil {
	return true, current, err
}
if updated {
	return r.currentDNSNetworkPolicy(dns)
}
return true, current, nil

Author

I did it that way to match the pattern in other files (e.g. controller_dns_configmap.go).

Member

sounds good, let's follow the same pattern then
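For readers comparing the two styles, here is a self-contained sketch of the ensure (create-or-update) flow written as a switch, similar in spirit to the form the thread settled on. A hypothetical in-memory store stands in for the API client, and the names policy, store and ensurePolicy are illustrative, not the operator's helpers.

```go
package main

import (
	"fmt"
	"reflect"
)

// policy is a stand-in for a NetworkPolicy: a name plus a flattened spec.
type policy struct {
	Name string
	Spec map[string]string
}

// store is a hypothetical in-memory replacement for the API client.
type store map[string]policy

// ensurePolicy creates the policy if it is missing, updates it if it has
// drifted from desired, and otherwise leaves it alone. It returns whether
// the object exists afterwards and its current copy. The error is always
// nil here; a real client call can fail and would return it.
func ensurePolicy(s store, desired policy) (bool, policy, error) {
	current, have := s[desired.Name]
	switch {
	case !have:
		s[desired.Name] = desired
		fmt.Printf("created networkpolicy %s\n", desired.Name)
		return true, s[desired.Name], nil
	case !reflect.DeepEqual(current.Spec, desired.Spec):
		s[desired.Name] = desired
		fmt.Printf("updated networkpolicy %s\n", desired.Name)
		return true, s[desired.Name], nil
	default:
		return true, current, nil
	}
}

func main() {
	s := store{}
	desired := policy{Name: "dns-default", Spec: map[string]string{"port": "53"}}
	ensurePolicy(s, desired) // prints "created networkpolicy dns-default"
	ensurePolicy(s, desired) // no change, prints nothing
	desired.Spec = map[string]string{"port": "5353"}
	ensurePolicy(s, desired) // prints "updated networkpolicy dns-default"
}
```

Either shape works; the switch keeps the three outcomes (create, update, no-op) visible side by side, which is presumably why other controllers in the repo use it.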

@candita
Contributor

candita commented Aug 13, 2025

/assign @alebedev87

@candita
Contributor

candita commented Aug 13, 2025

=== RUN TestDNSForwarding
operator_test.go:694: failed to dig 172.30.224.202: failed to find "1.2.3.4"

/retest required

@openshift-ci
Contributor

openshift-ci bot commented Aug 13, 2025

@candita: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test e2e-aws-ovn
/test e2e-aws-ovn-operator
/test e2e-aws-ovn-serial
/test e2e-aws-ovn-upgrade
/test images
/test okd-scos-images
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-aws-ovn-single-node
/test e2e-aws-ovn-techpreview
/test okd-scos-e2e-aws-ovn

Use /test all to run all jobs.


@openshift-ci
Contributor

openshift-ci bot commented Aug 21, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knobunc knobunc force-pushed the NE-1476---Network-Policies-for-DNS branch from 8576975 to 333c04b Compare August 21, 2025 14:43
from:
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: openshift-dns-operatorXXX
Member

Is this something from a previous test that was left here?

Contributor

@alebedev87 alebedev87 Aug 26, 2025

+1 on the XXX suffixes.

@rikatz
Member

rikatz commented Aug 21, 2025

@knobunc Two comments remain on the network policy manifests: first, I tested the kubelet theory and it worked (unless the operator also tries to reach the health port, in which case opening it would make sense); second, some leftover test content is still there. Otherwise, the network policy manifests LGTM.

@lihongan
Contributor

cc @melvinjoseph86

- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: openshift-monitoring
# Allow the dns operator namespaces to hit the healthcheck ports
Contributor

Why do we have to allow traffic to the healthcheck ports from the operator namespace?

Comment on lines +37 to +40
egress:
- to:
  - ipBlock:
      cidr: 0.0.0.0/0
Contributor

The e2e-aws-ovn-operator test is permafailing on TestDNSForwarding:

=== RUN   TestDNSForwarding
    operator_test.go:694: failed to dig 172.30.74.128: failed to find "1.2.3.4"
--- FAIL: TestDNSForwarding (170.25s)

The test traffic of TestDNSForwarding is as follows:

  • a test-client pod in the openshift-dns namespace runs dig +short foo.com via oc exec
  • the query hits CoreDNS, which has the forward plugin configured to use a test upstream
  • the test upstream is another pod in the openshift-dns namespace, which responds 1.2.3.4 for foo.com
  • the ClusterIP address of the Service created for the test upstream pod is used in the forward plugin

Does ipBlock: 0.0.0.0/0 cover traffic to virtual IPs (Service ClusterIPs)?
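At the address level the answer is yes, as this small Go sketch of the ipBlock rule shows (the address must be inside cidr and outside every except range). Note that it only models the CIDR arithmetic: the upstream Kubernetes NetworkPolicy documentation says it is not defined whether IP rewriting such as Service DNAT happens before or after policy processing, so whether OVN-Kubernetes compares the ClusterIP or the backend pod IP against the block cannot be read off from this code. The function name ipBlockMatches is hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipBlockMatches mirrors the ipBlock matching rule: addr must fall inside
// cidr and outside every prefix listed in except.
func ipBlockMatches(cidr string, except []string, addr string) (bool, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, err
	}
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return false, err
	}
	if !p.Contains(a) {
		return false, nil
	}
	for _, e := range except {
		ep, err := netip.ParsePrefix(e)
		if err != nil {
			return false, err
		}
		if ep.Contains(a) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// The Service ClusterIP from the failing run, checked against the policy's CIDR.
	ok, err := ipBlockMatches("0.0.0.0/0", nil, "172.30.74.128")
	fmt.Println(ok, err) // true <nil>
}
```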

Member

One thing to consider here: if we want to allow all egress traffic, we could remove egress from the deny-all rule.

Another option is to make the rule more permissive:

  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
    - to:
        - podSelector: {}
    - to:
        - namespaceSelector: {}

Contributor

What is the reason for putting the deny-all NetworkPolicy outside the dns directory, given that it's a namespaced resource and the openshift-dns namespace manifest lives in the dns directory?

@openshift-ci
Contributor

openshift-ci bot commented Oct 3, 2025

@knobunc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                           Commit   Required  Rerun command
ci/prow/e2e-aws-ovn-operator        333c04b  true      /test e2e-aws-ovn-operator
ci/prow/okd-scos-e2e-aws-ovn        333c04b  false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-ovn-single-node     333c04b  false     /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-serial          333c04b  true      /test e2e-aws-ovn-serial
ci/prow/e2e-aws-ovn-serial-2of2     333c04b  true      /test e2e-aws-ovn-serial-2of2
ci/prow/e2e-aws-ovn-serial-1of2     333c04b  true      /test e2e-aws-ovn-serial-1of2


if _, _, err := r.ensureDNSConfigMap(dns, clusterDomain, cmMap); err != nil {
	errs = append(errs, fmt.Errorf("failed to create configmap for dns %s: %v", dns.Name, err))
}
if _, _, err := r.ensureDNSNetworkPolicy(dns); err != nil {
Member

Does it matter if NetworkPolicy creation fails but the other ensure steps pass?

Since NetworkPolicies can be tricky regarding the order in which they are created and programmed on the nodes, and since you ensure the DNS DaemonSet exists even before the NetworkPolicy (on line 535), I was wondering whether NetworkPolicy creation should be a blocker for the rest of the reconciliation.

Maybe it is better to guarantee that the NetworkPolicy was created successfully before you even ensure the DNS DaemonSet, and to retry directly if it fails.

resources:
- networkpolicies
verbs:
- "*"
Member

As @Miciah commented on https://github.com/openshift/cluster-ingress-operator/pull/1263/files#r2411400476, we need to avoid wildcards here as well.
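A wildcard like this is easy to catch with a small guard. The Go sketch below uses a local struct shaped like an RBAC rule (apiGroups, resources, verbs) and reports any field containing "*". The explicit verb list in main is an assumption about what the operator needs to manage its NetworkPolicies, not a list taken from this PR.

```go
package main

import "fmt"

// policyRule mirrors the shape of an RBAC rule in a ClusterRole manifest.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// wildcards returns the names of the fields of r that contain "*".
func wildcards(r policyRule) []string {
	var found []string
	check := func(field string, values []string) {
		for _, v := range values {
			if v == "*" {
				found = append(found, field)
				return
			}
		}
	}
	check("apiGroups", r.APIGroups)
	check("resources", r.Resources)
	check("verbs", r.Verbs)
	return found
}

func main() {
	current := policyRule{
		APIGroups: []string{"networking.k8s.io"},
		Resources: []string{"networkpolicies"},
		Verbs:     []string{"*"},
	}
	// Assumed explicit verbs; the operator's real needs should drive this list.
	explicit := current
	explicit.Verbs = []string{"get", "list", "watch", "create", "update", "delete"}
	fmt.Println(wildcards(current))  // [verbs]
	fmt.Println(wildcards(explicit)) // []
}
```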

@Miciah
Contributor

Miciah commented Dec 2, 2025

Please also add the new networkpolicies to relatedObjects:

related := []configv1.ObjectReference{
	{
		Resource: "namespaces",
		Name:     r.Config.OperatorNamespace,
	},
	{
		Group:    operatorv1.GroupName,
		Resource: "dnses",
		Name:     "default",
	},
}
if state.haveNamespace {
	related = append(related, configv1.ObjectReference{
		Resource: "namespaces",
		Name:     state.namespace.Name,
	})
}
co.Status.RelatedObjects = related
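A minimal sketch of what the addition could look like. To stay self-contained it uses a local ObjectReference struct with the same fields as configv1.ObjectReference (Group, Resource, Namespace, Name); the helper appendNetworkPolicies and the policy names are hypothetical and should follow whatever the manifests actually define.

```go
package main

import "fmt"

// ObjectReference mirrors the fields of configv1.ObjectReference used here.
type ObjectReference struct {
	Group     string
	Resource  string
	Namespace string
	Name      string
}

// appendNetworkPolicies adds one relatedObjects entry per NetworkPolicy.
// NetworkPolicies are namespaced, so Namespace is set alongside Name.
func appendNetworkPolicies(related []ObjectReference, namespace string, names []string) []ObjectReference {
	for _, n := range names {
		related = append(related, ObjectReference{
			Group:     "networking.k8s.io",
			Resource:  "networkpolicies",
			Namespace: namespace,
			Name:      n,
		})
	}
	return related
}

func main() {
	related := []ObjectReference{{Resource: "namespaces", Name: "openshift-dns-operator"}}
	// Hypothetical policy names for illustration only.
	related = appendNetworkPolicies(related, "openshift-dns-operator", []string{"deny-all", "allow-egress"})
	fmt.Println(len(related)) // 3
}
```

The per-DNS allow policies in openshift-dns would be appended the same way, with that namespace, inside the existing state.haveNamespace branch.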

See also https://issues.redhat.com//browse/OCPBUGS-65498.


Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


7 participants