Description
Describe the bug
We use Cilium, and it's configured in such a way that the IP addresses of Completed Pods of Jobs can be immediately reused by newly scheduled Pods. So we can end up in the following situation:
NAMESPACE   NAME              STATUS      IP
jobs        job-12345-abcde   Completed   100.80.196.33
gateway     shared-gw-lm979   Running     100.80.196.33
Log message from nginx-gateway-controller pod:
shared-gw-lm979 nginx time=2025-07-24T08:12:54.771Z level=ERROR msg="Failed to create connection" error="rpc error: code = Internal desc = expected one Pod to have IP address 100.80.196.33, found 2" correlation_id=3dd8925c-6863-11f0-aa7f-26b317e658a7
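I think the problem lies in this line, since it lists all Pods with the matching IP regardless of their current status, so a Completed Pod that still reports the reused IP is counted alongside the Running one.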
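To illustrate the kind of filtering I have in mind, here is a minimal sketch. It is not the actual Nginx Gateway Fabric code: the `Pod` struct and the `findPodByIP` function are hypothetical stand-ins for the real Kubernetes types and lookup, and the error text is only copied from the log above. The one thing it relies on from Kubernetes itself is that a Pod shown as `Completed` by kubectl has phase `Succeeded`. The idea is to ignore Pods in a terminal phase before checking that exactly one Pod owns the IP.

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes Pod status.phase values.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded" // displayed as "Completed" by kubectl
	PodFailed    PodPhase = "Failed"
)

// Pod is a hypothetical, stripped-down stand-in for the real Pod type.
type Pod struct {
	Namespace string
	Name      string
	Phase     PodPhase
	IP        string
}

// findPodByIP (hypothetical helper) returns the single non-terminated Pod
// that owns ip. Pods in the Succeeded or Failed phase are skipped, because
// their IP may already have been handed out to a new Pod.
func findPodByIP(pods []Pod, ip string) (Pod, error) {
	var matches []Pod
	for _, p := range pods {
		if p.IP != ip {
			continue
		}
		if p.Phase == PodSucceeded || p.Phase == PodFailed {
			continue // terminal Pod: no longer owns the address
		}
		matches = append(matches, p)
	}
	if len(matches) != 1 {
		return Pod{}, fmt.Errorf(
			"expected one Pod to have IP address %s, found %d", ip, len(matches))
	}
	return matches[0], nil
}
```

With the two Pods from the listing above, this lookup would return only `shared-gw-lm979`, because `job-12345-abcde` is in the `Succeeded` phase and is skipped.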
To Reproduce
- Run a job on a specific node.
- Wait until the Job's Pod is in the Completed state.
- Run nginx-gateway on the same node after the pod's completion.
- The nginx-gateway pod shows error messages and traffic isn't served through nginx-gateway.
Expected behavior
No errors, and traffic is served normally, when the nginx-gateway Pod uses an IP address that was previously used by a Completed Job Pod.
Your environment
Nginx Gateway Fabric 2.0.2
Cilium 1.15.14, AWS ENI IPAM mode with prefix delegation
Kubernetes 1.31.7-eks-473151a