Commit 19fac97

neeraj-laad and aswinayyolath authored and committed

fix: Moved sentences to separate lines

Moved sentences to separate lines to help with reviews

Signed-off-by: neeraj-laad <[email protected]>

1 parent fe01963 commit 19fac97


083-stretch-cluster.md

Lines changed: 56 additions & 24 deletions
@@ -1,44 +1,57 @@
# Stretch Kafka cluster

The Strimzi Kafka operator currently manages Kafka clusters within a single Kubernetes cluster.
This proposal aims to extend support to stretch Kafka clusters, where brokers and controllers of a single Kafka cluster are distributed across multiple Kubernetes clusters.

## Current situation

At present, the availability of Strimzi-managed Kafka clusters is directly tied to the availability of the underlying Kubernetes cluster.
If a Kubernetes cluster experiences an outage, the entire Kafka cluster becomes unavailable, disrupting all connected Kafka clients.

## Motivation

A stretch Kafka cluster allows Kafka nodes to be distributed across multiple Kubernetes clusters, significantly enhancing resilience by enabling the system to tolerate the outage of an entire Kubernetes cluster without disrupting service to clients.
This configuration ensures high availability and seamless client operations, even in the event of cluster-specific failures.

In addition to improving fault tolerance, this approach also facilitates other valuable use cases, such as:

- **Migration Flexibility**: The ability to move Kafka clusters between Kubernetes environments without downtime, supporting maintenance or migrations.
- **Resource Optimization**: Efficiently utilizing resources across multiple clusters, which can be advantageous in environments with varying cluster capacities or during scaling operations.

### Limitations and Considerations
While a stretch Kafka cluster offers several advantages, it also introduces some challenges and considerations:

- **Increased Network Complexity and Costs**: The communication between brokers and controllers across clusters relies on network connectivity, which can be less reliable and more costly than intra-cluster communication.
This necessitates careful consideration of network architecture and associated costs.

- **Latency Requirements**: The stretch Kafka cluster is best suited for environments with low-latency network connections between the Kubernetes clusters.
High latency can adversely affect the performance and synchronization of Kafka nodes, potentially leading to delays or errors in replication and client communication.
Defining the minimal acceptable latency between clusters is crucial to ensure optimal performance.

## Proposal

This proposal seeks to enhance the Strimzi Kafka operator to support stretch Kafka clusters, distributing brokers and controllers across multiple Kubernetes clusters.
The intent is to focus on high availability of the data plane.
The proposal outlines high-level topology and design concepts for such deployments, with a plan to incrementally include finer design and implementation details for various aspects.

### Prerequisites

- **Multiple Kubernetes Clusters**: Stretch Kafka clusters will require multiple Kubernetes clusters.
Ideally, an odd number of clusters (at least three) is needed to maintain quorum in the event of a cluster outage.

- **Low Latency**: Kafka clusters should be deployed in environments that allow low-latency communication between Kafka brokers and controllers.
Stretch Kafka clusters should be deployed in environments such as data centers or availability zones within a single region, and not across distant regions where high latency could impair performance.

- **KRaft**: As Kafka and Strimzi transition towards KRaft-based clusters, this proposal focuses exclusively on enabling stretch deployments for KRaft-based Kafka clusters.
While ZooKeeper-based deployments are still supported, they are outside the scope of this proposal.

### Design

The cluster operator will be deployed in all Kubernetes clusters and will manage the Kafka brokers/controllers running on that cluster.
One Kubernetes cluster will act as the control point for defining the custom resources (Kafka, KafkaNodePool) required for the stretch Kafka cluster.
The KafkaNodePool custom resource will be extended to include information about the Kubernetes cluster where the pool should be deployed.
The cluster operator will create the necessary resources (StrimziPodSets, Services, etc.) on the target clusters specified within the KafkaNodePool resource.

This approach will allow users to specify and manage the definition of a stretch Kafka cluster in a single location.
The operators will then create the necessary resources in the target Kubernetes clusters, which can then be reconciled and managed by the operators on those clusters.

### Reconciling Kafka and KafkaNodePool resources
![Reconciling Kafka and KafkaNodePool resources](./images/083-reconciling-kafka-knp.png)
@@ -47,7 +60,8 @@ This approach will allow users to specify/manage the definition of stretch Kafka
![Reconciling SPS](./images/083-reconciling-sps.png)

#### KafkaNodePool changes
A new optional field (`target`) will be introduced in the KafkaNodePool resource specification to allow users to specify the details of the Kubernetes cluster where the node pool should be deployed.
This section will include the target cluster's URL (the Kubernetes cluster where resources for this node pool will be created) and the secret containing the kubeconfig data for that cluster.

An example of the KafkaNodePool resource with the new fields might look like:

@@ -106,28 +120,41 @@ spec:
type: ingress
```

A new annotation (`stretch-mode: enabled`) will be introduced in the Kafka custom resource to indicate when it represents a stretch Kafka cluster.
This approach is similar to how Strimzi currently enables features like KafkaNodePool (KNP) and KRaft mode.
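
As an illustration, the annotation might be applied to the Kafka resource as shown below. The exact annotation key is not yet finalized in this proposal; the `strimzi.io/` prefix is assumed here only because other Strimzi feature annotations use it.

```yaml
# Hypothetical sketch: the annotation key and prefix are assumptions, not part of this proposal.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-stretch-cluster
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
    strimzi.io/stretch-mode: enabled  # assumed key; the proposal only specifies `stretch-mode: enabled`
spec:
  # ...
```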

In a stretch Kafka cluster, we'll need bootstrap and broker services to be present on each Kubernetes cluster and be accessible from other clusters.
The Kafka reconciler will identify all target clusters from KafkaNodePool resources and create these services in target Kubernetes clusters.
This will ensure that even if the central cluster experiences an outage, external clients can still connect to the stretch cluster and continue their operations without interruption.

#### Cross-cluster communication
Kafka controllers/brokers are distributed across multiple Kubernetes environments and will need to communicate with each other.
Currently, the Strimzi Kafka operator defines Kafka listeners for internal communication (controlplane and replication) between brokers/controllers (Kubernetes services using ports 9090 and 9091).
The user is not able to influence how these services are set up and exposed outside the cluster.
We would remove this limitation and allow users to define how these internal listeners are configured in the Kafka resource, just like they do for Kafka client listeners.
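
A rough sketch of what this could look like is shown below; the listener names, types, and exact schema are illustrative assumptions rather than a finalized design.

```yaml
# Hypothetical sketch: exposing the internal control-plane and replication listeners
# so that brokers/controllers in other Kubernetes clusters can reach them.
# The listener names, types, and the ability to configure these ports are assumptions.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-stretch-cluster
spec:
  kafka:
    listeners:
      - name: controlplane   # today fixed to port 9090 and not user-configurable
        port: 9090
        type: ingress
        tls: true
      - name: replication    # today fixed to port 9091 and not user-configurable
        port: 9091
        type: ingress
        tls: true
      - name: external       # regular client listener, unchanged
        port: 9094
        type: ingress
        tls: true
```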

Users will also be able to override listener configurations in each KafkaNodePool resource if the listeners need to be exposed in different ways (Ingress host names, Ingress annotations, etc.) for each Kubernetes cluster.
This will be similar to how KafkaNodePools are used to override other configuration, such as storage.
To override a listener, the KafkaNodePool will define configuration with the same listener name as in the Kafka resource.
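
A hypothetical override might look like the following; the exact schema for listener overrides in a KafkaNodePool has not been defined yet, so the field names below are assumptions.

```yaml
# Hypothetical sketch: a KafkaNodePool overriding a listener defined in the Kafka
# resource by reusing the same listener name. The `listeners` field under the node
# pool spec is an assumed extension, not existing Strimzi API.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: pool-cluster-b
  labels:
    strimzi.io/cluster: my-stretch-cluster
spec:
  replicas: 3
  roles:
    - broker
    - controller
  listeners:
    - name: controlplane
      configuration:
        class: nginx
        bootstrap:
          host: controlplane.cluster-b.example.com  # cluster-specific Ingress host (placeholder)
        # per-broker hosts omitted for brevity
```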

#### Resource cleanup on remote Kubernetes clusters
As some of the Kubernetes resources will be created on a remote cluster, we will not be able to use the standard Kubernetes approach of deleting resources based on owner references.
The operator will need to delete remote resources explicitly when the owning resource is deleted.

- The exact mechanism that will be used for such cleanup in various scenarios has not been detailed yet and will be added here before the proposal is complete.

#### Network policies
In a stretch Kafka cluster, some network policies will be relaxed to allow communication from the other Kubernetes clusters that are specified as targets in the various KafkaNodePool resources.
This will allow brokers/controllers on separate Kubernetes clusters to communicate effectively.
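
As an illustration only, the relaxed policy for the Kafka pods might admit traffic to the internal listener ports from the peer clusters' networks; the policy name, selector, and CIDR below are placeholders.

```yaml
# Hypothetical sketch: allowing the control-plane (9090) and replication (9091)
# ports to accept traffic originating from other Kubernetes clusters.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-stretch-cluster-network-policy-kafka
spec:
  podSelector:
    matchLabels:
      strimzi.io/name: my-stretch-cluster-kafka
  ingress:
    - from:
        - ipBlock:
            cidr: 10.100.0.0/16  # placeholder for a peer cluster's node/ingress network
      ports:
        - port: 9090
          protocol: TCP
        - port: 9091
          protocol: TCP
```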

#### Secrets
We need to create Kubernetes Secrets in the central cluster that will store the credentials required for creating resources on the target clusters.
These secrets will be referenced in the KafkaNodePool custom resource.
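
For illustration, such a secret might simply hold a kubeconfig for the target cluster; the key name and layout below are assumptions.

```yaml
# Hypothetical sketch: credentials for one target cluster, stored in the central
# cluster and referenced from the KafkaNodePool resource.
apiVersion: v1
kind: Secret
metadata:
  name: cluster-b-kubeconfig
  namespace: kafka
type: Opaque
stringData:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: cluster-b
        cluster:
          server: https://cluster-b.example.com:6443  # placeholder API server URL
          certificate-authority-data: <base64-ca-cert>
    # users and contexts omitted for brevity
```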

#### Entity operator
We would recommend that all KafkaTopic and KafkaUser resources are managed from the cluster that holds the Kafka and KafkaNodePool resources, and that this should be the cluster where the entity operator is enabled.
This will allow all resource management and configuration from a central place.
The entity operator should not be impacted by the changes in this proposal.
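
For reference, enabling the entity operator uses the existing Kafka resource configuration and is unchanged by this proposal; it would simply be set only in the Kafka resource on the central cluster.

```yaml
# Existing Strimzi configuration (unchanged by this proposal): the entity operator
# is enabled in the Kafka resource held on the central cluster.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-stretch-cluster
spec:
  # ...
  entityOperator:
    topicOperator: {}
    userOperator: {}
```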

## Additional considerations

@@ -140,7 +167,8 @@ Once the general approach is agreed, this proposal will be updated to include an
140167

141168
## Affected/not affected projects
142169

143-
This proposal only impacts strimzi-kafka-operator project.
170+
This proposal only impacts strimzi-kafka-operator project.
171+
144172

145173
## Rejected alternatives
146174

@@ -150,9 +178,13 @@ This proposal only impacts strimzi-kafka-operator project.
![Synchronized ClusterOperator](./images/083-synchronized-clusteroperator.png)

An alternative approach considered was setting up a stretch Kafka cluster with synchronized `KafkaStretchCluster` and `Kafka` custom resources (CRs).
The idea was to introduce a new CR called `KafkaStretchCluster`, which would contain details of all the clusters involved in the stretch Kafka deployment.
The spec would include information such as cluster names, secrets for connecting to each Kubernetes cluster, and a list of node pools across the entire stretch cluster.
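
Since this CR was never specified in detail, the following is only a rough sketch of what its spec might have contained; the kind, apiVersion, and all field names are hypothetical.

```yaml
# Hypothetical sketch of the rejected KafkaStretchCluster CR; illustrative only.
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaStretchCluster
metadata:
  name: my-stretch-cluster
spec:
  clusters:
    - name: cluster-a
      secretRef: cluster-a-kubeconfig  # credentials for connecting to this Kubernetes cluster
    - name: cluster-b
      secretRef: cluster-b-kubeconfig
    - name: cluster-c
      secretRef: cluster-c-kubeconfig
  nodePools:
    - name: pool-a
      cluster: cluster-a
    - name: pool-b
      cluster: cluster-b
    - name: pool-c
      cluster: cluster-c
```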

The Kafka CR could be created in any of the Kubernetes clusters, and it would be propagated to the remaining clusters through coordinated actions by the Cluster Operator.
Similarly, changes to the Kafka CR could be made in any Kubernetes cluster, and once detected by the Cluster Operator, the changes would be propagated to the CRs in the other clusters.
The `KafkaNodePool` resources would be deployed to individual Kubernetes clusters, requiring users to apply the KafkaNodePool CR in each cluster separately.

### Pros
