Description
We have multiple Vault instances (in different clusters) using Zookeeper as a backend that freeze completely after network issues. The symptoms are always the same: Vault stops responding, and requests (using the CLI or curl) hang for dozens of seconds. netstat shows that Vault no longer has any connection to Zookeeper.
The issue always begins during an episode of network issues. Because of the network instability, Vault suffers multiple connection failures with Zookeeper and at some point, for no apparent reason, it seems to give up entirely, leaving the instance without any connection to its backend.
Below is information about one (dev) instance. I've also attached logs (from the beginning of the network issues until we restarted it).
Finally, on the same server we have a small agent (a sidecar) that registers Vault in our service discovery system. This agent uses Zookeeper for registration (the same instance as Vault) and also uses the same Go library for the Zookeeper connection (https://github.com/samuel/go-zookeeper). I've also attached the logs of this agent.
What's interesting is that the agent also suffers connection failures with Zookeeper but eventually recovers and continues as usual once the network is stable again.
Environment:
Key             Value
---             -----
Seal Type       shamir
Sealed          false
Total Shares    5
Threshold       2
Version         0.9.3
Cluster Name    dev-cluster
Cluster ID      439ec622-58c4-0199-c815-4a7798985862
HA Enabled      true
HA Mode         active
HA Cluster      https://vault.dev.net:8201
- Vault Version: 0.9.3
- Operating System/Architecture: CentOS 7
Vault Config File:
backend "zookeeper" {
  path = "vault"
  address = "zookeeper.dev.net"
  znode_owner = "digest:vault:************"
  auth_info = "digest:vault:*******"
  redirect_addr = "http://vault.dev.net:8200"
}
listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = 1
}