Retry socket connection failures to Kubernetes service #26

Occasionally a job fails with this error:

> SocketError - Failed to open TCP connection to kubernetes.default.svc:443 (getaddrinfo: Name or service not known)

The cause is that `kube-dns` did not respond with an address for `kubernetes.default.svc`. This can happen when the node's CPU is under extreme load.

Since we already retry `KubeClient::HttpError` errors whose message matches `/Timed out/`, we should add a pattern for this `SocketError` to the list of errors we retry.
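A minimal sketch of the proposed change, assuming the job keeps its retryable errors as a table of (error class, message pattern) pairs. `HttpError`, `RETRYABLE_ERRORS`, `retryable?` and `with_retries` are hypothetical names, not the job's actual code; `HttpError` stands in for `KubeClient::HttpError` so the example runs without the client gem. The exact pattern for the new entry is also an assumption.

```ruby
require 'socket' # defines SocketError

# Hypothetical stand-in for KubeClient::HttpError so this runs without the gem.
class HttpError < StandardError; end

# Hypothetical retry table: an error is retried only if it is an instance of
# the class AND its message matches the pattern.
RETRYABLE_ERRORS = [
  [HttpError, /Timed out/],
  # Assumed new entry: transient failures (e.g. kube-dns not answering)
  # when opening a connection to the in-cluster API service.
  [SocketError, /Failed to open TCP connection to kubernetes\.default\.svc/]
].freeze

def retryable?(error)
  RETRYABLE_ERRORS.any? do |klass, pattern|
    error.is_a?(klass) && pattern.match?(error.message)
  end
end

# Runs the block, retrying retryable errors up to max_attempts total attempts
# with exponential backoff (base_delay, 2*base_delay, ...). Non-retryable
# errors, or a retryable error on the last attempt, are re-raised.
def with_retries(max_attempts: 3, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError => e
    raise unless retryable?(e) && attempt < max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

The `sleeper` parameter is only there so the backoff can be observed or skipped in tests; in the job itself the default `sleep` would be used, e.g. `with_retries { client.get_pods }` (where `client` is whatever Kubernetes client the job already holds).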
