NE-1476: Added network policies for DNS #443
base: master
Conversation
Force-pushed af84e8d to 87bf0e3 (Compare)
@knobunc: This pull request references NE-1476 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@knobunc: This pull request references NE-1476 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed 87bf0e3 to 095baff (Compare)
/test e2e-aws-ovn
Force-pushed 3041bba to a564e83 (Compare)
/test unit
Force-pushed f9c1b3e to 78192ee (Compare)
/retest-required
/retest
Force-pushed 5695df2 to 834a6e1 (Compare)
Added the framework for network policies for DNS, for both the operator and the dns pods. The operator has a deny-all network policy for the openshift-dns-operator namespace and an allow policy for egress to the apiserver and to dns ports at any IP. The operator installs a deny-all network policy for the openshift-dns namespace. Then, for each dns that it manages, it installs an allow policy for ingress for dns traffic and metrics. It has to allow egress from the dns pods to any IP because the configuration allows setting the upstream server and port, so any valid IP and port needs to be reachable. The pods also need access to the apiserver, but that is covered by the wildcard allow policy. https://issues.redhat.com/browse/NE-1476
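For illustration only, a minimal sketch of the shape these policies take for the openshift-dns namespace; the policy names, the pod label, and the port numbers below are assumptions rather than the exact manifests added by this PR, and the operator namespace pair follows the same deny-all-plus-allow pattern:

```yaml
# Deny-all baseline: selects every pod in the namespace and declares both
# directions, so anything not explicitly allowed elsewhere is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all            # assumed name
  namespace: openshift-dns
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Per-DNS allow policy: DNS traffic and metrics in, plus egress to any IP/port
# so that user-configured upstream resolvers stay reachable.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-default   # assumed name, one per managed DNS
  namespace: openshift-dns
spec:
  podSelector:
    matchLabels:
      dns.operator.openshift.io/daemonset-dns: default   # assumed label
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - ports:
    - protocol: UDP
      port: 5353   # assumed DNS port
    - protocol: TCP
      port: 5353
    - protocol: TCP
      port: 9154   # assumed metrics port
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
```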
Force-pushed 834a6e1 to fb1a7b4 (Compare)
/retest-required
2 similar comments
/retest-required
/retest-required
```go
if updated.Labels == nil {
	updated.Labels = map[string]string{}
}
for k, v := range expectedLabels {
	updated.Labels[k] = v
```
I believe you want to drop labels that are in current but not expected, or else the operator will get in a reconciliation loop.
Suggested change:

```diff
-if updated.Labels == nil {
-	updated.Labels = map[string]string{}
-}
-for k, v := range expectedLabels {
-	updated.Labels[k] = v
+updated.Labels = map[string]string{}
+for k, v := range expectedLabels {
+	updated.Labels[k] = v
```
Whoops! Thank you
/assign
```yaml
ingress:
- ports:
  - protocol: TCP
    port: 9393
```
question here: should the metrics access be limited to specific prometheus pods, or is this generic access desired?
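For comparison, a hedged sketch of what limiting the metrics port to the monitoring stack could look like; the kubernetes.io/metadata.name namespace label is standard, but the Prometheus pod label here is an assumption about how the scraping pods are labelled:

```yaml
ingress:
- from:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-monitoring
    podSelector:
      matchLabels:
        app.kubernetes.io/name: prometheus   # assumed label on the scraping pods
  ports:
  - protocol: TCP
    port: 9393
```

Putting namespaceSelector and podSelector in the same `from` element ANDs them, so only pods with that label in openshift-monitoring would match.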
this can be marked as solved
```yaml
- protocol: TCP
  port: 8080
- protocol: TCP
  port: 8181
```
IIRC the health check ports don't need to be open, as kubelet should be able to reach the pods with or without the network policy (unless ovn-kubernetes does differently).
It is worth testing, though.
I was getting test errors (which surprised me) until these were opened. I wasn't sure if something else was hitting them to determine when it was live.
Hmm, weird.
I have just applied the network policy manually, so port 8080 was blocked from a pod in another namespace, but I could still curl it from the kubelet:

```
$ oc debug node/ip-10-0-86-223.us-east-2.compute.internal -- curl -kv http://10.131.0.5:8080/health
...
OK
```

Then, trying from my pod:

```
$ kubectl exec -it nginx-5869d7778c-pl7fq -- curl -m 3 http://10.131.0.5:8080/health
curl: (28) Connection timed out after 3000 milliseconds
```
```go
desired := desiredDNSNetworkPolicy(dns)
...
switch {
```
Some not really important comment, but instead of using switch here, wouldn't it be better to do something like:

```go
if !haveNP {
	if err := r.client.Create(context.TODO(), desired); err != nil {
		return false, nil, fmt.Errorf("failed to create dns networkpolicy: %v", err)
	}
	logrus.Infof("created dns networkpolicy: %s/%s", desired.Namespace, desired.Name)
	return r.currentDNSNetworkPolicy(dns)
}
updated, err := r.updateDNSNetworkPolicy(current, desired)
if err != nil {
	return true, current, err
}
if updated {
	return r.currentDNSNetworkPolicy(dns)
}
return true, current, nil
```
I did it that way to match the pattern in other files (e.g. controller_dns_configmap.go).
sounds good, let's follow the same pattern then
/assign @alebedev87
/retest required
@candita: The following commands are available to trigger required jobs: … The following commands are available to trigger optional jobs: … Use … In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: … The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files: …
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed 8576975 to 333c04b (Compare)
```yaml
from:
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: openshift-dns-operatorXXX
```
Is this something from a previous test that was left here?
+1 on the XXX suffixes.
@knobunc 2 comments still on the network policy manifests: one is that I have tested the kubelet theory and it worked (unless the operator also tries to reach the health port, which in that case would make sense), and the other is some leftover information. Otherwise, the network policy manifests lgtm.
```yaml
- namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: openshift-monitoring
# Allow the dns operator namespaces to hit the healthcheck ports
```
For what reason do we have to allow hits to the healthcheck ports from the operator namespace?
```yaml
egress:
- to:
  - ipBlock:
      cidr: 0.0.0.0/0
```
e2e-aws-ovn-operator test is permafailing on TestDNSForwarding:

```
=== RUN   TestDNSForwarding
    operator_test.go:694: failed to dig 172.30.74.128: failed to find "1.2.3.4"
--- FAIL: TestDNSForwarding (170.25s)
```

The test traffic of TestDNSForwarding is as follows:
- the test client pod in the openshift-dns namespace gets a `dig +short foo.com` command via `oc exec`
- traffic hits CoreDNS, which has the forward plugin configured to use a test upstream
- the test upstream is another pod in the openshift-dns namespace which responds 1.2.3.4 for foo.com
- the ClusterIP address of the service created for the test upstream pod is used in the forward plugin

Does ipBlock: 0.0.0.0/0 cover traffic to virtual IPs?
A thing to be considered here: if we want to allow egress to any traffic, we could just remove egress from the deny-all rule.
Another thing that can be done here is to make the rule more permissive, since ipBlock is generally meant for cluster-external IPs and may not match pod or service traffic once the service VIP is rewritten:

```yaml
egress:
- to:
  - ipBlock:
      cidr: 0.0.0.0/0
- to:
  - podSelector: {}
- to:
  - namespaceSelector: {}
```
What is the reason for having the denyall networkpolicy outside the dns directory, taking into account that it's a namespaced resource and the openshift-dns namespace is placed in the dns directory?
@knobunc: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests: …
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
```go
if _, _, err := r.ensureDNSConfigMap(dns, clusterDomain, cmMap); err != nil {
	errs = append(errs, fmt.Errorf("failed to create configmap for dns %s: %v", dns.Name, err))
}
if _, _, err := r.ensureDNSNetworkPolicy(dns); err != nil {
```
Does it matter if this NetworkPolicy creation fails but the other ensure steps pass?
As NetworkPolicies can be a bit tricky in their order of creation and publishing on nodes, and as you ensure the DNS DaemonSet exists even before the netpol (on line 535), I was considering whether NetworkPolicy creation shouldn't be a blocker for the rest of reconciliation.
Before you even ensure the DNS DaemonSet, maybe it is better to guarantee that the NetworkPolicy was created successfully, retrying directly in case it fails.
```yaml
resources:
- networkpolicies
verbs:
- "*"
```
As commented by @Miciah on https://github.com/openshift/cluster-ingress-operator/pull/1263/files#r2411400476, we need to avoid wildcards here as well.
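A sketch of the narrowed rule; the exact verb set the operator needs is an assumption here and should be trimmed to what reconciliation actually uses:

```yaml
resources:
- networkpolicies
verbs:
- get
- list
- watch
- create
- update
- delete
```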
Please also add the new networkpolicies to cluster-dns-operator/pkg/operator/controller/status/controller.go, lines 129 to 146 in 333c04b.