Incorrect check of allocated IP addresses #3657

@ikarlashov

Description

Describe the bug
We use Cilium, configured so that the IP address of a Completed Job Pod can be reused immediately by a newly scheduled Pod. This can leave us in the following situation:

NAMESPACE   NAME              STATUS      IP
jobs        job-12345-abcde   Completed   100.80.196.33
gateway     shared-gw-lm979   Running     100.80.196.33

Log message from nginx-gateway-controller pod:
shared-gw-lm979 nginx time=2025-07-24T08:12:54.771Z level=ERROR msg="Failed to create connection" error="rpc error: code = Internal desc = expected one Pod to have IP address 100.80.196.33, found 2" correlation_id=3dd8925c-6863-11f0-aa7f-26b317e658a7

I think the problem lies in this line, since it lists all Pods regardless of their current status, so a Completed Pod that still reports the reused IP is counted alongside the Running one.
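
To illustrate the kind of check I have in mind, here is a minimal sketch in Go. It is not the NGINX Gateway Fabric code: the `Pod` type, `isTerminal`, and `FindPodByIP` are hypothetical stand-ins written for this example, and the sketch assumes the lookup works on a plain slice of Pods. The only Kubernetes facts it relies on are that a Pod shown as `Completed` by kubectl has phase `Succeeded`, and that `Succeeded` and `Failed` are the terminal phases.

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes Pod phase strings.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded" // shown as "Completed" by kubectl
	PodFailed    PodPhase = "Failed"
)

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod object.
type Pod struct {
	Namespace string
	Name      string
	Phase     PodPhase
	IP        string
}

// isTerminal reports whether the Pod has finished and will not run again,
// which means its IP may already belong to another Pod.
func isTerminal(p Pod) bool {
	return p.Phase == PodSucceeded || p.Phase == PodFailed
}

// FindPodByIP returns the single non-terminal Pod that owns ip.
// Terminal Pods are skipped before counting matches, so a Completed
// Job Pod with a reused IP no longer causes a "found 2" error.
func FindPodByIP(pods []Pod, ip string) (Pod, error) {
	var matches []Pod
	for _, p := range pods {
		if p.IP != ip || isTerminal(p) {
			continue
		}
		matches = append(matches, p)
	}
	if len(matches) != 1 {
		return Pod{}, fmt.Errorf("expected one Pod to have IP address %s, found %d", ip, len(matches))
	}
	return matches[0], nil
}
```

With the two Pods from the listing above, calling `FindPodByIP(pods, "100.80.196.33")` would skip `job-12345-abcde` and return `shared-gw-lm979`. Whether the filtering belongs in the lookup itself or in the query that lists the Pods is a design choice for the maintainers.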

To Reproduce

  • Run a Job on a specific node.
  • Wait until the Job's Pod is in the Completed state.
  • Run nginx-gateway on the same node after the Pod has completed, so that it receives the same IP address.
  • The nginx-gateway Pod logs error messages and traffic isn't served through nginx-gateway.

Expected behavior
No errors when the nginx-gateway Pod uses an IP address that was previously used by the Completed Pod of a Job.

Your environment
NGINX Gateway Fabric 2.0.2
Cilium 1.15.14, AWS ENI IPAM mode with prefix delegation
Kubernetes 1.31.7-eks-473151a

Metadata

Labels: bug, community, refined

Status: ✅ Done