Vault suffering complete freeze after connection issues with the backend #3896

@MrArtichaut

We have multiple Vault instances (in different clusters) using Zookeeper as a backend that freeze completely after network issues. The symptoms are always the same: Vault stops responding, and requests (using the CLI or curl) hang for dozens of seconds. netstat shows that Vault no longer has any connection to Zookeeper.

The issue always begins during a period of network instability. Vault suffers multiple connection failures with Zookeeper and at some point, for no apparent reason, seems to give up entirely, leaving the instance without any connection to its backend.

Below is information about one (dev) instance. I've also attached logs (from the beginning of the network issues until we restarted it).

Finally, on the same server we have a small agent (a sidecar) that registers Vault in our service discovery system. This agent uses Zookeeper for registration (the same instance as Vault) and the same Go library for the Zookeeper connection (https://github.com/samuel/go-zookeeper). I've also attached the logs of this agent.

What's interesting is that the agent also suffers connection failures with Zookeeper but eventually recovers and continues as usual once the network is stable again.

vault-logs.txt

agent-logs.txt

Environment:

Key             Value
---             -----
Seal Type       shamir
Sealed          false
Total Shares    5
Threshold       2
Version         0.9.3
Cluster Name    dev-cluster
Cluster ID      439ec622-58c4-0199-c815-4a7798985862
HA Enabled      true
HA Mode         active
HA Cluster      https://vault.dev.net:8201
  • Vault Version: 0.9.3
  • Operating System/Architecture: CentOS 7

Vault Config File:

backend "zookeeper" {
	path = "vault"
	address = "zookeeper.dev.net"
	znode_owner = "digest:vault:************"
	auth_info = "digest:vault:*******"
	redirect_addr ="http://vault.dev.net:8200"
}

listener "tcp" {
	address = "0.0.0.0:8200"
	tls_disable = 1
}
