Retry socket connection failures to Kubernetes service #26

@jeremywadsack

We see this once in a while for a job that fails:

SocketError - Failed to open TCP connection to kubernetes.default.svc:443 (getaddrinfo: Name or service not known)

The issue is that kube-dns didn't respond with an address. This can happen when the CPU on the node is under extreme load.

Since we already retry Kubeclient::HttpError when the message matches /Timed out/, we should add this SocketError pattern to the list of errors that we retry.
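A minimal sketch of what that could look like, not the project's actual retry code. The names `RETRIABLE_ERRORS`, `retriable?`, and `with_retries`, the attempt count, and the backoff delays are all hypothetical. The existing HttpError /Timed out/ entry is left out so the example runs without the kubeclient gem.

```ruby
require "socket" # defines SocketError

# Hypothetical list of [exception class, message pattern] pairs to retry.
# Assumption: the existing HttpError /Timed out/ check would be another
# entry in this list; it is omitted here to avoid depending on kubeclient.
RETRIABLE_ERRORS = [
  [SocketError, /getaddrinfo/]
].freeze

# True when the error matches one of the retriable class/pattern pairs.
def retriable?(error)
  RETRIABLE_ERRORS.any? do |klass, pattern|
    error.is_a?(klass) && error.message.match?(pattern)
  end
end

# Runs the block, retrying retriable errors with exponential backoff.
# Re-raises the error once `attempts` is exhausted or if it is not retriable.
def with_retries(attempts: 3, base_delay: 0.5)
  tries = 0
  begin
    yield
  rescue StandardError => e
    tries += 1
    raise if tries >= attempts || !retriable?(e)

    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end
```

Any call to the Kubernetes API could then be wrapped as `with_retries { ... }`. The backoff gives kube-dns some time to recover when the node is under load, rather than retrying immediately.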
