
Commit 528d716

Address Multi-Cluster Health Check Configuration Inconsistency (#789)
* Task 1: Enhance target group manager to resolve TargetGroupPolicy for ServiceExport target groups
* Task 2: Enhance policy helper to support service-based policy resolution
* Task 3: Update target group synthesizer to use policy-derived health check configuration
* Task 4: Implement health check configuration resolution logic
* Task 5: Update ServiceExport controller to watch TargetGroupPolicy changes
* Task 6: Add unit tests for policy resolution logic
* Task 7: Add integration tests for TargetGroupPolicy application to ServiceExport target groups
* Task 8: Add end-to-end tests for health check configuration consistency
* Task 9: Update documentation to reflect multi-cluster health check changes
* Task 10: Address propagation delay in tests
* Address PR comments.
1 parent f9979cd commit 528d716

31 files changed: +3194 −93 lines

.gitignore

Lines changed: 4 additions & 1 deletion

```diff
@@ -20,4 +20,7 @@ mocks/controller-runtime/client/gomock_reflect_*
 pkg/**/prog.*
 
 # Image build tarballed bundles
-*.tgz
+*.tgz
+
+# Kiro
+.kiro
```

Makefile

Lines changed: 2 additions & 2 deletions

```diff
@@ -120,12 +120,12 @@ e2e-test: ## Run e2e tests against cluster pointed to by ~/.kube/config
 	cd test && go test \
 	-p 1 \
 	-count 1 \
-	-timeout 90m \
+	-timeout 120m \
 	-v \
 	./suites/integration/... \
 	--ginkgo.focus="${FOCUS}" \
 	--ginkgo.skip="${SKIP}" \
-	--ginkgo.timeout=90m \
+	--ginkgo.timeout=120m \
 	--ginkgo.v
 
 .SILENT:
```

docs/api-types/service-export.md

Lines changed: 46 additions & 0 deletions

````diff
@@ -12,6 +12,10 @@ for example, using target groups in the VPC Lattice setup outside Kubernetes.
 Note that ServiceExport is not the implementation of Kubernetes [Multicluster Service APIs](https://multicluster.sigs.k8s.io/concepts/multicluster-services-api/);
 instead AWS Gateway API Controller uses its own version of the resource for the purpose of Gateway API integration.
 
+### TargetGroupPolicy Integration
+
+ServiceExport resources can be targeted by [`TargetGroupPolicy`](target-group-policy.md) to configure protocol, protocol version, and health check settings. When a TargetGroupPolicy is applied to a ServiceExport, the configuration is automatically propagated to all target groups across all clusters that participate in the multi-cluster service mesh, ensuring consistent behavior regardless of which cluster contains the route resource.
+
 ### Annotations (Legacy Method)
 
 * `application-networking.k8s.aws/port`
@@ -69,3 +73,45 @@ spec:
 This configuration will:
 1. Export port 80 to be used with HTTP routes
 2. Export port 8081 to be used with gRPC routes
+
+### ServiceExport with TargetGroupPolicy
+
+The following example shows how to combine ServiceExport with TargetGroupPolicy for consistent multi-cluster health check configuration:
+
+```yaml
+# ServiceExport
+apiVersion: application-networking.k8s.aws/v1alpha1
+kind: ServiceExport
+metadata:
+  name: inventory-service
+spec:
+  exportedPorts:
+    - port: 8080
+      routeType: HTTP
+---
+# TargetGroupPolicy for the ServiceExport
+apiVersion: application-networking.k8s.aws/v1alpha1
+kind: TargetGroupPolicy
+metadata:
+  name: inventory-health-policy
+spec:
+  targetRef:
+    group: "application-networking.k8s.aws"
+    kind: ServiceExport
+    name: inventory-service
+  protocol: HTTP
+  protocolVersion: HTTP2
+  healthCheck:
+    enabled: true
+    intervalSeconds: 10
+    timeoutSeconds: 5
+    healthyThresholdCount: 2
+    unhealthyThresholdCount: 3
+    path: "/health"
+    port: 8080
+    protocol: HTTP
+    protocolVersion: HTTP1
+    statusMatch: "200-299"
+```
+
+This configuration ensures that all target groups created for the `inventory-service` across all clusters will use the same health check configuration, providing consistent health monitoring in multi-cluster deployments.
````

docs/api-types/target-group-policy.md

Lines changed: 44 additions & 2 deletions

````diff
@@ -12,6 +12,10 @@ When attaching a policy to a resource, the following restrictions apply:
 - A policy can be attached to `ServiceExport`.
 - The attached resource should exist in the same namespace as the policy resource.
 
+### Multi-Cluster Health Check Configuration
+
+In multi-cluster deployments, TargetGroupPolicy health check configurations are automatically propagated across all clusters that participate in the service mesh. When a TargetGroupPolicy is applied to a ServiceExport, all target groups created for that service across different clusters will use the same health check configuration, ensuring consistent health monitoring regardless of which cluster contains the route resource.
+
 The policy will not take effect if:
 - The resource does not exist
 - The resource is not referenced by any route
@@ -32,12 +36,15 @@ However, the policy will not take effect unless the target is valid.
   of VPC Lattice TargetGroup resource, except for health check updates.
 - Attaching TargetGroupPolicy to an existing ServiceExport will result in a replacement of VPC Lattice TargetGroup resource, except for health check updates.
 - Removing TargetGroupPolicy of a resource will roll back protocol configuration to default setting. (HTTP1/HTTP plaintext)
+- In multi-cluster deployments, TargetGroupPolicy changes will automatically propagate to all clusters participating in the service mesh, ensuring consistent configuration across the deployment.
+
+## Example Configurations
 
-## Example Configuration
+### Single Cluster Configuration
 
 This will enable HTTPS traffic between the gateway and Kubernetes service, with customized health check configuration.
 
-```
+```yaml
 apiVersion: application-networking.k8s.aws/v1alpha1
 kind: TargetGroupPolicy
 metadata:
@@ -61,3 +68,38 @@ spec:
     protocolVersion: HTTP1
     statusMatch: "200"
 ```
+
+### Multi-Cluster Configuration
+
+This example shows how to configure health checks for a ServiceExport in a multi-cluster deployment. The health check configuration will be automatically applied to all target groups across all clusters that participate in the service mesh.
+
+```yaml
+apiVersion: application-networking.k8s.aws/v1alpha1
+kind: TargetGroupPolicy
+metadata:
+  name: multi-cluster-policy
+spec:
+  targetRef:
+    group: "application-networking.k8s.aws"
+    kind: ServiceExport
+    name: inventory-service
+  protocol: HTTP
+  protocolVersion: HTTP2
+  healthCheck:
+    enabled: true
+    intervalSeconds: 10
+    timeoutSeconds: 5
+    healthyThresholdCount: 2
+    unhealthyThresholdCount: 3
+    path: "/health"
+    port: 8080
+    protocol: HTTP
+    protocolVersion: HTTP1
+    statusMatch: "200-299"
+```
+
+In this multi-cluster example:
+- The policy targets a `ServiceExport` named `inventory-service`
+- All clusters with target groups for this service will use HTTP/2 for traffic and the specified health check configuration
+- Health checks will use HTTP/1 on port 8080 with the `/health` endpoint
+- The configuration ensures consistent health monitoring across all participating clusters
````
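The resolution behavior this documentation describes (commit Task 4) amounts to preferring policy-derived health check settings over the controller's defaults when a TargetGroupPolicy targets the ServiceExport. A minimal, self-contained Go sketch of that idea, assuming a simplified `HealthCheckConfig` struct and `resolveHealthCheck` function that are illustrative stand-ins, not the controller's actual types:

```go
package main

import "fmt"

// HealthCheckConfig is a simplified stand-in for the TargetGroupPolicy
// healthCheck fields; the controller's real types live in its API package.
type HealthCheckConfig struct {
	Enabled                 bool
	IntervalSeconds         int
	TimeoutSeconds          int
	HealthyThresholdCount   int
	UnhealthyThresholdCount int
	Path                    string
	Port                    int
}

// resolveHealthCheck prefers the policy-derived configuration when a
// TargetGroupPolicy targets the ServiceExport, and falls back to the
// controller defaults when no policy is attached.
func resolveHealthCheck(policy *HealthCheckConfig, defaults HealthCheckConfig) HealthCheckConfig {
	if policy == nil {
		return defaults
	}
	return *policy
}

func main() {
	defaults := HealthCheckConfig{Enabled: true, IntervalSeconds: 30, TimeoutSeconds: 5, Path: "/", Port: 80}
	policy := &HealthCheckConfig{Enabled: true, IntervalSeconds: 10, TimeoutSeconds: 5,
		HealthyThresholdCount: 2, UnhealthyThresholdCount: 3, Path: "/health", Port: 8080}

	fmt.Println(resolveHealthCheck(nil, defaults).Path)    // no policy attached: defaults apply
	fmt.Println(resolveHealthCheck(policy, defaults).Path) // policy attached: policy-derived config wins
}
```

Because each cluster resolves its target group configuration from the same TargetGroupPolicy attached to the ServiceExport, every cluster converges on identical health check settings, which is the consistency property the bullets above describe.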

docs/concepts/concepts.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -10,6 +10,7 @@ AWS Gateway API Controller integrates with Amazon VPC Lattice and allows you to:
 * Discover VPC Lattice services spanning multiple Kubernetes clusters.
 * Implement a defense-in-depth strategy to secure communication between those services.
 * Observe the request/response traffic across the services.
+* Ensure consistent health check configuration across multi-cluster deployments through automatic policy propagation.
 
 This documentation describes how to set up the AWS Gateway API Controller, provides example use cases, development concepts, and API references. AWS Gateway API Controller will provide developers the ability to publish services running on Kubernetes cluster and other compute platforms on AWS such as AWS Lambda or Amazon EC2. Once the AWS Gateway API controller deployed and running, you will be able to manage services for multiple Kubernetes clusters and other compute targets on AWS through the following:
```

docs/concepts/overview.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -52,6 +52,8 @@ In the context of Kubernetes, Amazon VPC Lattice helps to simplify the following
 - **Kubernetes multi-cluster connectivity**: Architecting multiple clusters across multiple VPCs.
   After configuring your services with the AWS Gateway API Controller, you can facilitate advanced traffic management and application layer routing between services on those clusters without dealing with the underlying infrastructure.
   VPC Lattice handles a lot of the details for you without needing things like sidecars.
+
+  **Multi-cluster health check consistency**: When using TargetGroupPolicy with ServiceExport resources, health check configurations are automatically propagated across all clusters participating in the service mesh. This ensures consistent health monitoring behavior regardless of which cluster contains the route resource, eliminating configuration drift and improving reliability in multi-cluster deployments.
 - **Cross-platform access**: VPC Lattice allows access to serverless and Amazon EC2 features, as well as Kubernetes cluster features.
   This gives you a way to have a consistent interface to multiple types of platforms.
 - **Implement a defense-in-depth strategy**: Secure communication between services and networks.
```

docs/faq.md

Lines changed: 5 additions & 1 deletion

```diff
@@ -16,4 +16,8 @@ Your AWS VPC CNI must be v1.8.0 or later to work with VPC Lattice.
 
 **Which versions of Gateway API are supported?**
 
-AWS Gateway API Controller supports Gateway API CRD bundle versions `v1.1` or greater. Not all features of Gateway API are supported - for detailed features and limitation, please refer to individual API references. Please note that users are required to install Gateway API CRDs themselves as these are no longer bundled as of release `v1.1.0`. The latest Gateway API CRDs are available [here](https://gateway-api.sigs.k8s.io/). Please [follow this installation](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) process.
+AWS Gateway API Controller supports Gateway API CRD bundle versions `v1.1` or greater. Not all features of Gateway API are supported - for detailed features and limitation, please refer to individual API references. Please note that users are required to install Gateway API CRDs themselves as these are no longer bundled as of release `v1.1.0`. The latest Gateway API CRDs are available [here](https://gateway-api.sigs.k8s.io/). Please [follow this installation](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) process.
+
+**How do health checks work in multi-cluster deployments?**
+
+In multi-cluster deployments, when you apply a TargetGroupPolicy to a ServiceExport, the health check configuration is automatically propagated to all target groups across all clusters that participate in the service mesh. This ensures consistent health monitoring behavior regardless of which cluster contains the route resource.
```

docs/guides/advanced-configurations.md

Lines changed: 27 additions & 0 deletions

````diff
@@ -48,6 +48,33 @@ The `{index}` in the annotation corresponds to the zero-based index of the rule
 
 Higher priority values indicate higher precedence, so requests to `/api/v2` will be matched by the first rule (priority 200) before the second rule (priority 100) is considered.
 
+#### Configuring Health Checks for ServiceExport
+
+When you apply a TargetGroupPolicy to a ServiceExport, the health check configuration is automatically propagated to all target groups across all clusters that participate in the service mesh:
+
+```yaml
+apiVersion: application-networking.k8s.aws/v1alpha1
+kind: TargetGroupPolicy
+metadata:
+  name: multi-cluster-health-policy
+spec:
+  targetRef:
+    group: "application-networking.k8s.aws"
+    kind: ServiceExport
+    name: my-service
+  healthCheck:
+    enabled: true
+    intervalSeconds: 10
+    timeoutSeconds: 5
+    healthyThresholdCount: 2
+    unhealthyThresholdCount: 3
+    path: "/health"
+    port: 8080
+    protocol: HTTP
+    protocolVersion: HTTP1
+    statusMatch: "200-299"
+```
+
 ### IPv6 support
 
 IPv6 address type is automatically used for your services and pods if
````

docs/guides/getstarted.md

Lines changed: 58 additions & 0 deletions

````diff
@@ -280,6 +280,64 @@ This section builds on the previous one. We will be migrating the Kubernetes `in
    kubectl apply -f files/examples/inventory-ver2-export.yaml
    ```
 
+### Configuring Health Checks for Multi-Cluster Services (Optional)
+
+When deploying services across multiple clusters, you can ensure consistent health check configuration by applying a TargetGroupPolicy to your ServiceExport. This ensures that all target groups created for the service across different clusters use the same health check settings.
+
+For example, to configure custom health checks for the inventory-ver2 service:
+
+```yaml
+apiVersion: application-networking.k8s.aws/v1alpha1
+kind: TargetGroupPolicy
+metadata:
+  name: inventory-health-policy
+spec:
+  targetRef:
+    group: "application-networking.k8s.aws"
+    kind: ServiceExport
+    name: inventory-ver2
+  healthCheck:
+    enabled: true
+    intervalSeconds: 10
+    timeoutSeconds: 5
+    healthyThresholdCount: 2
+    unhealthyThresholdCount: 3
+    path: "/health"
+    port: 80
+    protocol: HTTP
+    protocolVersion: HTTP1
+    statusMatch: "200-299"
+```
+
+Apply this policy in the same cluster where the ServiceExport is created:
+
+```bash
+kubectl apply -f - <<EOF
+apiVersion: application-networking.k8s.aws/v1alpha1
+kind: TargetGroupPolicy
+metadata:
+  name: inventory-health-policy
+spec:
+  targetRef:
+    group: "application-networking.k8s.aws"
+    kind: ServiceExport
+    name: inventory-ver2
+  healthCheck:
+    enabled: true
+    intervalSeconds: 10
+    timeoutSeconds: 5
+    healthyThresholdCount: 2
+    unhealthyThresholdCount: 3
+    path: "/health"
+    port: 80
+    protocol: HTTP
+    protocolVersion: HTTP1
+    statusMatch: "200-299"
+EOF
+```
+
+This configuration will be automatically applied to all target groups for the inventory-ver2 service across all clusters in your multi-cluster deployment.
+
 **Switch back to the first cluster**
 
 1. Switch context back to the first cluster
````

pkg/controllers/eventhandlers/service.go

Lines changed: 55 additions & 0 deletions

```diff
@@ -39,6 +39,14 @@ func (h *serviceEventHandler) MapToServiceExport() handler.EventHandler {
 func (h *serviceEventHandler) mapToServiceExport(ctx context.Context, obj client.Object) []reconcile.Request {
 	var requests []reconcile.Request
 
+	// Handle TargetGroupPolicy changes more directly for ServiceExport
+	if tgp, ok := obj.(*v1alpha1.TargetGroupPolicy); ok {
+		requests = h.mapTargetGroupPolicyToServiceExport(ctx, tgp)
+		if len(requests) > 0 {
+			return requests
+		}
+	}
+
 	svc := h.mapToService(ctx, obj)
 	svcExport := h.mapper.ServiceToServiceExport(ctx, svc)
 	if svcExport != nil {
@@ -65,6 +73,53 @@ func (h *serviceEventHandler) mapToService(ctx context.Context, obj client.Objec
 	return nil
 }
 
+func (h *serviceEventHandler) mapTargetGroupPolicyToServiceExport(ctx context.Context, tgp *v1alpha1.TargetGroupPolicy) []reconcile.Request {
+	var requests []reconcile.Request
+
+	targetRef := tgp.GetTargetRef()
+	if targetRef == nil {
+		return requests
+	}
+
+	// Check if the policy directly targets a ServiceExport
+	if targetRef.Kind == "ServiceExport" && (targetRef.Group == "" || targetRef.Group == v1alpha1.GroupName) {
+		svcExport := &v1alpha1.ServiceExport{}
+		key := client.ObjectKey{
+			Name:      string(targetRef.Name),
+			Namespace: tgp.Namespace,
+		}
+		if err := h.client.Get(ctx, key, svcExport); err == nil {
+			requests = append(requests, reconcile.Request{
+				NamespacedName: k8s.NamespacedName(svcExport),
+			})
+			h.log.Infow(ctx, "TargetGroupPolicy change triggered ServiceExport update",
+				"policyName", tgp.Namespace+"/"+tgp.Name,
+				"serviceExportName", svcExport.Namespace+"/"+svcExport.Name)
+		}
+		return requests
+	}
+
+	// Check if the policy targets a Service that has a corresponding ServiceExport
+	if targetRef.Kind == "Service" && (targetRef.Group == "" || targetRef.Group == corev1.GroupName) {
+		svcExport := &v1alpha1.ServiceExport{}
+		key := client.ObjectKey{
+			Name:      string(targetRef.Name),
+			Namespace: tgp.Namespace,
+		}
+		if err := h.client.Get(ctx, key, svcExport); err == nil {
+			requests = append(requests, reconcile.Request{
+				NamespacedName: k8s.NamespacedName(svcExport),
+			})
+			h.log.Infow(ctx, "TargetGroupPolicy change for Service triggered ServiceExport update",
+				"policyName", tgp.Namespace+"/"+tgp.Name,
+				"serviceName", string(targetRef.Name),
+				"serviceExportName", svcExport.Namespace+"/"+svcExport.Name)
+		}
+	}
+
+	return requests
+}
+
 func (h *serviceEventHandler) mapToRoute(ctx context.Context, obj client.Object, routeType core.RouteType) []reconcile.Request {
 	svc := h.mapToService(ctx, obj)
 	routes := h.mapper.ServiceToRoutes(ctx, svc, routeType)
```
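The handler change above keys off the policy's targetRef kind and API group before looking up the ServiceExport. That matching rule can be isolated in a self-contained Go sketch; the `targetRef` struct and group constants here are simplified stand-ins for `v1alpha1.GroupName`, `corev1.GroupName`, and the real policy types, and the actual handler additionally verifies via `client.Get` that a matching ServiceExport exists in the policy's namespace:

```go
package main

import "fmt"

// Stand-ins for v1alpha1.GroupName and corev1.GroupName.
const (
	latticeGroup = "application-networking.k8s.aws"
	coreGroup    = "" // the Kubernetes core API group name is the empty string
)

// targetRef is a pared-down stand-in for the policy's target reference.
type targetRef struct {
	Group string
	Kind  string
	Name  string
}

// shouldRequeueServiceExport mirrors the handler's matching rule: a
// TargetGroupPolicy maps to a ServiceExport reconcile request when it
// targets a ServiceExport (lattice group, or group omitted) or a
// same-named Service (core group) in the policy's namespace.
func shouldRequeueServiceExport(ref *targetRef) bool {
	if ref == nil {
		return false
	}
	switch ref.Kind {
	case "ServiceExport":
		return ref.Group == "" || ref.Group == latticeGroup
	case "Service":
		return ref.Group == coreGroup
	}
	return false
}

func main() {
	refs := []*targetRef{
		{Kind: "ServiceExport", Group: latticeGroup, Name: "inventory-service"},
		{Kind: "Service", Name: "inventory-service"},
		{Kind: "HTTPRoute", Group: "gateway.networking.k8s.io", Name: "inventory"},
	}
	for _, r := range refs {
		fmt.Printf("%s/%s -> requeue=%v\n", r.Kind, r.Name, shouldRequeueServiceExport(r))
	}
}
```

Mapping Service-targeted policies as well as ServiceExport-targeted ones is what lets a policy written against a plain Service still reach the ServiceExport's target groups, matching the commit's "service-based policy resolution" task.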
