Occasionally a job fails with this error:
SocketError - Failed to open TCP connection to kubernetes.default.svc:443 (getaddrinfo: Name or service not known)
The cause is that kube-dns did not return an address for kubernetes.default.svc. This can happen when the node's CPU is under extreme load.
Since we already retry KubeClient::HttpError when its message matches /Timed out/, we should add this error pattern to the list of errors that we retry.
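A minimal sketch of how such a retry list could work, assuming a table that maps exception classes to the message patterns worth retrying. `RETRYABLE_PATTERNS`, `retryable?`, `with_retries`, and `FakeKubeHttpError` are hypothetical names for illustration only. `FakeKubeHttpError` stands in for KubeClient::HttpError so the sketch runs without the gem, and the project's actual retry code may be structured differently.

```ruby
require 'socket' # defines SocketError

# Hypothetical stand-in for KubeClient::HttpError so this sketch is self-contained.
class FakeKubeHttpError < StandardError; end

# Hypothetical table: exception class => message patterns that are safe to retry.
RETRYABLE_PATTERNS = {
  FakeKubeHttpError => [/Timed out/],
  # New entry: transient DNS lookup failure when kube-dns does not answer.
  SocketError => [/getaddrinfo: Name or service not known/]
}.freeze

# True only if the error's class AND message match a configured pattern.
def retryable?(error)
  RETRYABLE_PATTERNS.any? do |klass, patterns|
    error.is_a?(klass) && patterns.any? { |re| re.match?(error.message) }
  end
end

# Runs the block, retrying retryable errors up to max_attempts total tries.
# Sleeps base_delay * attempt seconds between tries (assumed linear backoff).
def with_retries(max_attempts: 3, base_delay: 0.0)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    # Re-raise errors that are not on the list, or when out of attempts.
    raise unless retryable?(e) && attempt < max_attempts
    sleep(base_delay * attempt)
    retry
  end
end
```

Matching on both class and message keeps other SocketErrors (for example, a refused connection) failing fast, so only the DNS lookup failure is retried. The backoff delay is also an assumption; giving kube-dns a moment to recover is likely more useful than retrying immediately.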