-
Notifications
You must be signed in to change notification settings - Fork 559
Support custom list of services to be added to /etc/hosts in cluster DNS operator - RFE-4145 #2435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: t-cas The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hello @t-cas! Some important instructions when contributing to openshift/api: |
Hi @t-cas. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this is a change to an existing v1 API, which means that new fields need to be feature-gated and go through the feature promotion process
operator/v1/types_dns.go
Outdated
// nodeServices specifies K8s services for which entries should be added | ||
// to /etc/hosts by the node resolver. These services will be added in addition to | ||
// the default "image-registry.openshift-image-registry.svc" service. | ||
// Each service should be a relative name following the format "<service>.<namespace>.svc"; | ||
// for each relative name, an alias with the CLUSTER_DOMAIN suffix will also be added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really sounds like a list of references to services on the cluster that we are just asking users to specify using the domain format.
Based on this existing description, I think it might be more intuitive to end users if we follow the pattern of a list of object references instead of requiring users to put it in the format <service>.<namespace>.svc
.
Using a list of object references would also simplify the validations we have to put in place because we don't need a regular expression and there are known validations for namespaces and service names that we can implement via CEL validations.
The resulting end user usage would look something like:
nodeServices:
- name: bar
namespace: foo
Then, in the operator code you can build the domain format using that information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your feedback, that's indeed a better format, PR updated accordingly
Do we have an EP explaining what this new feature is? As @everettraven mentions, this should be behind a feature gate as a new feature extending an existing API |
@JoelSpeed I just added more details about the proposal as part of RFE https://issues.redhat.com/browse/RFE-4145, tell me if you have any questions or need more details. Also, thanks for the suggestion regarding the feature gate but I’d like to clarify the rationale behind not introducing one in this case. |
@t-cas The purpose of the feature-gate process is to ensure that new features:
More specifically for your feature, you mention:
This is the purpose of an enhancement proposal. It allows us to better understand the motivation of the feature, proposed implementation and evaluate the feature for some of those potential concerns. If this wasn't previously configurable, how does making it configurable impact assumptions users and other systems on an OpenShift cluster may have made about the existing behavior? If the linked RFE, you mentioned These are all the kinds of questions I would expect to be answered by an EP. |
@everettraven Thanks for the detailed explanation, this helps clarify the broader purpose of the feature gate process, and I fully appreciate its value in ensuring stability, readiness, and supportability of new features. In this specific case, I’m trying to assess where the feature gate adds value. Regarding the rollout: the daemonset may be restarted when the custom resource changes, but this follows standard Kubernetes rolling update daemonset behavior. The node-resolver does not perform any live processing that could be impacted; it simply sets up /etc/hosts on the node where it runs. So in practice, restarting it has no observable impact on cluster operations. Finally, I initially didn’t expect this small change to require a dedicated enhancement proposal, since it doesn’t really alter the behavior of the node-resolver daemonset, what do you think ? I’ll make sure to include this clarification in linked RFE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@t-cas OpenShift has moved in a direction of gating every new feature, independent of size. We have a defined process which allows features to be developed in tech preview, and gain automated testing to prove out their stability before we add them to the default feature set, and release them to customers.
A benefit of our feature gating mechanics, is that it allows features to be implemented across release boundaries. We are currently about 2 weeks from feature freeze for 4.20. Normally at this point in time, we wouldn't be looking at adding new feature APIs as they risk shipping without the backing implementation. That doesn't matter when it's gated, but does matter when it isn't.
I've left a review for now with some suggestions for improvements to the API, but in general the shape looks fairly simple and I agree this is a fairly minor feature.
That said, there are things we need to consider, which I'll reach out internally about.
On the top of my mind are:
- Concerns about this from the product team who will own the feature in the long term
- Allocating QE and docs capacity for testing and documenting the new feature
- E2E testing, we expect automated E2E testing for all features to prevent regressions in the feature in the future
I'll try and get a conversation going about next steps here, so lets pause the enhancement and feature gating conversation for the moment
// and an alias with the CLUSTER_DOMAIN suffix will also be added. | ||
// | ||
// +optional | ||
// +kubebuilder:validation:MaxItems=20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to get input from the network team about the appropriate maximum for this list. 20 sounds ok to me, but I wonder if we can reasonably expand this to account for a larger use case.
@Miciah could you or one of your team consider the implications of adding new entries, and what an appropriate maximum scale might be?
operator/v1/types_dns.go
Outdated
// nodeServices specifies K8s services for which entries should be added | ||
// to /etc/hosts by the node resolver. These services will be added in addition to | ||
// the default "image-registry.openshift-image-registry.svc" service. | ||
// For each service reference, entries will be created using the format "<name>.<namespace>.svc" | ||
// and an alias with the CLUSTER_DOMAIN suffix will also be added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// nodeServices specifies K8s services for which entries should be added | |
// to /etc/hosts by the node resolver. These services will be added in addition to | |
// the default "image-registry.openshift-image-registry.svc" service. | |
// For each service reference, entries will be created using the format "<name>.<namespace>.svc" | |
// and an alias with the CLUSTER_DOMAIN suffix will also be added. | |
// nodeServices specifies a list of service objects for which host level resolvable entries should be added. | |
// Services in this list will be added to /etc/hosts on each node in the cluster by the node resolver. | |
// When not specified, only the default image registry service is resolvable. | |
// Services in this list will be added in addition to the default "image-registry.openshift-image-registry.svc" service. | |
// The default image registry service cannot be removed. | |
// For each service reference, entries will be created using the format "<name>.<namespace>.svc" | |
// and an alias with the CLUSTER_DOMAIN suffix will also be added. |
Just want to add a little more context
and an alias with the CLUSTER_DOMAIN suffix will also be added
Which CLUSTER_DOMAIN is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems values is currently hardcoded to cluster.local
by operator (see here https://github.com/openshift/cluster-dns-operator/blob/48ebc1269caad1e9ec7f422b24f3bccbe134d0c4/pkg/operator/controller/controller.go#L498-L499)
then operator is adding it as env variable CLUSTER_DOMAIN
in daemonset from this value here https://github.com/openshift/cluster-dns-operator/blob/48ebc1269caad1e9ec7f422b24f3bccbe134d0c4/pkg/operator/controller/controller_dns_node_resolver_daemonset.go#L93-L98
// | ||
// +optional | ||
// +kubebuilder:validation:MaxItems=20 | ||
NodeServices []DNSNodeService `json:"nodeServices,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add
NodeServices []DNSNodeService `json:"nodeServices,omitempty"` | |
// +kubebuilder:validation:MinItems=1 | |
// +listType=map | |
// +listMapKey=name | |
// +listMapKey=namespace | |
NodeServices []DNSNodeService `json:"nodeServices,omitempty"` |
operator/v1/types_dns.go
Outdated
type DNSNodeService struct { | ||
// name is the name of the service. | ||
// +required | ||
// +kubebuilder:validation:Required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is redundant as you have +required
above.
// +kubebuilder:validation:Required |
operator/v1/types_dns.go
Outdated
|
||
// namespace is the namespace of the service. | ||
// +required | ||
// +kubebuilder:validation:Required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is redundant as you have +required
above.
// +kubebuilder:validation:Required |
@@ -163,6 +173,25 @@ var ( | |||
DNSLogLevelTrace DNSLogLevel = "Trace" | |||
) | |||
|
|||
// DNSNodeService represents a Kubernetes service by name and namespace for node services. | |||
type DNSNodeService struct { | |||
// name is the name of the service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Service names are validated as DNS 1035 labels, so we should explain that here
// name is the name of the service. | |
// name is the name of the service. | |
// The name should consist of at most 63 characters, and of only lowercase alphanumeric characters and hyphens, | |
// and should start with an alphabetic character and end with an alphanumeric character. |
operator/v1/types_dns.go
Outdated
// +kubebuilder:validation:Required | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=63 | ||
// +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer to use CEL as opposed to a pattern here
// +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` | |
// +kubebuilder:validation:XValidation:rule=`!format.dns1035Label().validate(self).hasValue()`,message="a DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character" |
// +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` | ||
Name string `json:"name"` | ||
|
||
// namespace is the namespace of the service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace names are DNS 1123 Labels, so we will explain that here too
// namespace is the namespace of the service. | |
// namespace is the namespace of the service. | |
// The namespace should consist of at most 63 characters, and of only lowercase alphanumeric characters and hyphens, | |
// and should start and end with an alphanumeric character. |
operator/v1/types_dns.go
Outdated
// +kubebuilder:validation:Required | ||
// +kubebuilder:validation:MinLength=1 | ||
// +kubebuilder:validation:MaxLength=63 | ||
// +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer CEL to pattern here
// +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` | |
// +kubebuilder:validation:XValidation:rule=`!format.dns1123Label().validate(self).hasValue()`,message="the value must consist of only lowercase alphanumeric characters and hyphens" |
…DNS operator - RFE-4145
@JoelSpeed thanks for all the feedback, PR has been updated to integrate all your comments Please tell me for the next steps how should I proceed with that change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Services in this list will be added in addition to the default "image-registry.openshift-image-registry.svc" service. | ||
// The default image registry service cannot be removed. | ||
// For each service reference, entries will be created using the format "<name>.<namespace>.svc" | ||
// and an alias with the CLUSTER_DOMAIN suffix of cluster.local will also be added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we probably want to shorten this and drop CLUSTER_DOMAIN
if it is not configurable, CC @Miciah to confirm
/assign |
/assign @alebedev87 |
prerequisite for PR openshift/cluster-dns-operator#441 due to new field introduced in DNS CRD