| 1 | +--- |
| 2 | +description: Learn to use the built-in network debugger to debug overlay networking problems |
| 3 | +keywords: network, troubleshooting, debug |
| 4 | +title: Debug overlay or swarm networking issues |
| 5 | +--- |
| 6 | + |
| 7 | +**WARNING** |
| 8 | +This tool can change the internal state of the libnetwork API. Use it with care |
| 9 | +and read the following guide carefully. Improper use can damage or permanently |
| 10 | +destroy the network configuration. |
| 11 | + |
| 12 | + |
| 13 | +Docker CE 17.12 and higher introduce a network debugging tool designed to help |
| 14 | +debug issues with overlay networks and swarm services running on Linux hosts. |
| 15 | +When enabled, a network diagnostic server listens on the specified port and |
| 16 | +provides diagnostic information. The network debugging tool should only be |
| 17 | +started to debug specific issues, and should not be left running all the time. |
| 18 | + |
| 19 | +Information about networks is stored in the database, which can be examined using |
| 20 | +the API. Currently the database contains information about the overlay network |
| 21 | +as well as the service discovery data. |
| 22 | + |
| 23 | +The Docker API exposes endpoints to query and control the network debugging |
| 24 | +tool. CLI integration is provided as a preview, but the implementation is not |
| 25 | +yet considered stable and commands and options may change without notice. |
| 26 | + |
| 27 | +The tool is available in two forms: |
| 28 | +1) client only: `dockereng/network-diagnostic:onlyclient` |
| 29 | +2) Docker-in-Docker version: `dockereng/network-diagnostic:17.12-dind` |
| 30 | +The latter lets you use the tool with a cluster running an engine older than 17.12. |
| 31 | + |
| 32 | +## Enable the diagnostic server |
| 33 | + |
| 34 | +The tool currently only works on Docker hosts running on Linux. To enable it on a node, |
| 35 | +follow the steps below. |
| 36 | + |
| 37 | +1. Set the `network-diagnostic-port` to a port which is free on the Docker |
| 38 | + host, in the `/etc/docker/daemon.json` configuration file. |
| 39 | + |
| 40 | + ```json |
| 41 | + "network-diagnostic-port": <port> |
| 42 | + ``` |
| 43 | + |
| 44 | +2. Get the process ID (PID) of the `dockerd` process. It is the second field in |
| 45 | + the output, and is typically a number from 2 to 6 digits long. |
| 46 | + |
| 47 | + ```bash |
| 48 | + $ ps aux |grep dockerd | grep -v grep |
| 49 | + ``` |
| 50 | + |
| 51 | +3. Reload the Docker configuration without restarting Docker, by sending the |
| 52 | + `HUP` signal to the PID you found in the previous step. |
| 53 | + |
| 54 | + ```bash |
| 55 | + kill -HUP <pid-of-dockerd> |
| 56 | + ``` |
| 57 | + |
| 58 | +If the host uses systemd, running `systemctl reload docker` is enough. |
| 59 | + |
| 60 | + |
| 61 | +A message like the following will appear in the Docker host logs: |
| 62 | + |
| 63 | +```none |
| 64 | +Starting the diagnostic server listening on <port> for commands |
| 65 | +``` |
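Steps 2 and 3 above read the PID by eye from the second field of the `ps aux` output. A minimal sketch of doing the same with `awk` follows. The function name `dockerd_pids` is hypothetical, and the sketch assumes the usual Linux `ps aux` column layout, where the command path is the 11th field.

```bash
# Hypothetical helper: print the PID (second field) of every ps-style line
# whose command (assumed to be the 11th field of `ps aux` output) is dockerd.
# It reads from stdin, so it can be tried on sample text before real use.
dockerd_pids() {
  awk '$11 ~ /(^|\/)dockerd$/ { print $2 }'
}
```

With this in place, `ps aux | dockerd_pids` should print the PID to pass to `kill -HUP`. Because the match is on the command field, the `grep -v grep` filter is not needed.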
| 66 | + |
| 67 | +## Disable the diagnostic tool |
| 68 | + |
| 69 | +Repeat these steps for each node participating in the swarm. |
| 70 | + |
| 71 | +1. Remove the `network-diagnostic-port` key from the `/etc/docker/daemon.json` |
| 72 | + configuration file. |
| 73 | + |
| 74 | +2. Get the process ID (PID) of the `dockerd` process. It is the second field in |
| 75 | + the output, and is typically a number from 2 to 6 digits long. |
| 76 | + |
| 77 | + ```bash |
| 78 | + $ ps aux |grep dockerd | grep -v grep |
| 79 | + ``` |
| 80 | + |
| 81 | +3. Reload the Docker configuration without restarting Docker, by sending the |
| 82 | + `HUP` signal to the PID you found in the previous step. |
| 83 | + |
| 84 | + ```bash |
| 85 | + kill -HUP <pid-of-dockerd> |
| 86 | + ``` |
| 87 | + |
| 88 | +A message like the following will appear in the Docker host logs: |
| 89 | + |
| 90 | +```none |
| 91 | +Disabling the diagnostic server |
| 92 | +``` |
| 93 | + |
| 94 | +## Access the diagnostic tool's API |
| 95 | + |
| 96 | +The network diagnostic tool exposes its own RESTful API. To access the API, |
| 97 | +send an HTTP request to the port where the tool is listening. The following |
| 98 | +commands assume the tool is listening on port 2000. |
| 99 | + |
| 100 | +Examples are not given for every endpoint. |
| 101 | + |
| 102 | +### Get help |
| 103 | + |
| 104 | +```bash |
| 105 | +$ curl localhost:2000/help |
| 106 | + |
| 107 | +OK |
| 108 | +/updateentry |
| 109 | +/getentry |
| 110 | +/gettable |
| 111 | +/leavenetwork |
| 112 | +/createentry |
| 113 | +/help |
| 114 | +/clusterpeers |
| 115 | +/ready |
| 116 | +/joinnetwork |
| 117 | +/deleteentry |
| 118 | +/networkpeers |
| 119 | +/ |
| 120 | +/join |
| 121 | +``` |
| 122 | + |
| 123 | +### Join or leave the network database cluster |
| 124 | + |
| 125 | +```bash |
| 126 | +$ curl "localhost:2000/join?members=ip1,ip2,..." |
| 127 | +``` |
| 128 | + |
| 129 | +```bash |
| 130 | +$ curl "localhost:2000/leave?members=ip1,ip2,..." |
| 131 | +``` |
| 132 | + |
| 133 | +`ip1`, `ip2`, ... are the IP addresses of swarm nodes (usually one is enough). |
| 134 | + |
| 135 | +### Join or leave a network |
| 136 | + |
| 137 | +```bash |
| 138 | +$ curl "localhost:2000/joinnetwork?nid=<network id>" |
| 139 | +``` |
| 140 | + |
| 141 | +```bash |
| 142 | +$ curl "localhost:2000/leavenetwork?nid=<network id>" |
| 143 | +``` |
| 144 | + |
| 145 | +Retrieve the network ID on the manager with `docker network ls --no-trunc`. It must be |
| 146 | +the full-length identifier. |
| 147 | + |
| 148 | +### List cluster peers |
| 149 | + |
| 150 | +```bash |
| 151 | +$ curl localhost:2000/clusterpeers |
| 152 | +``` |
| 153 | + |
| 154 | +### List nodes connected to a given network |
| 155 | + |
| 156 | +```bash |
| 157 | +$ curl "localhost:2000/networkpeers?nid=<network id>" |
| 158 | +``` |
| 159 | +Retrieve the network ID on the manager with `docker network ls --no-trunc`. It must be |
| 160 | +the full-length identifier. |
| 161 | + |
| 162 | +### Dump database tables |
| 163 | + |
| 164 | +The tables are called `endpoint_table` and `overlay_peer_table`. |
| 165 | +The `overlay_peer_table` contains all the overlay forwarding information. |
| 166 | +The `endpoint_table` contains all the service discovery information. |
| 167 | + |
| 168 | +```bash |
| 169 | +$ curl "localhost:2000/gettable?nid=<network id>&tname=<table name>" |
| 170 | +``` |
| 171 | + |
| 172 | +### Interact with a specific database table |
| 173 | + |
| 174 | +The tables are called `endpoint_table` and `overlay_peer_table`. `<method>` is one of the entry endpoints listed by `/help`, such as `getentry`. |
| 175 | + |
| 176 | +```bash |
| 177 | +$ curl "localhost:2000/<method>?nid=<network id>&tname=<table name>&key=<key>[&value=<value>]" |
| 178 | +``` |
| 179 | + |
| 180 | +Note: |
| 181 | +Table operations are owned by the node that performs them. Entries remain persistent |
| 182 | +only as long as the node that inserted them is part of the cluster. |
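The request URLs in this section contain `?` and `&`, which the shell interprets unless the URL is quoted. As a sketch of one way to avoid quoting mistakes, the helper below assembles the URL from its parts. The name `diag_url` is hypothetical and not part of the tool. The sketch assumes the server is reached on `localhost`, as in the examples, and it does not URL-encode values.

```bash
# Hypothetical helper: build a diagnostic-server URL from a port, an endpoint
# name, and any number of key=value pairs. Values are used as given, with no
# URL encoding, and the host is assumed to be localhost.
diag_url() {
  local port="$1"
  local endpoint="$2"
  shift 2
  local query=""
  local pair
  for pair in "$@"; do
    # Join the pairs with "&"; the first pair gets no leading separator.
    query="${query:+$query&}$pair"
  done
  printf 'http://localhost:%s/%s%s\n' "$port" "$endpoint" "${query:+?$query}"
}
```

For example, `curl "$(diag_url 2000 gettable "nid=<network id>" tname=endpoint_table)"` sends the table dump request without any hand-written `&` on the command line.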
| 183 | + |
| 184 | +## Access the diagnostic tool's CLI |
| 185 | + |
| 186 | +The CLI is provided as a preview and is not yet stable. Commands or options may |
| 187 | +change at any time. |
| 188 | + |
| 189 | +The CLI executable is called `diagnosticClient` and is made available using a |
| 190 | +standalone container. |
| 191 | + |
| 192 | +`docker run --net host dockereng/network-diagnostic:onlyclient -v -net <full network id> -t sd` |
| 193 | + |
| 194 | +The following flags are supported: |
| 195 | + |
| 196 | +| Flag | Description | |
| 197 | +|---------------|-------------------------------------------------| |
| 198 | +| -t <string>   | The table to query: one of `sd` or `overlay`.   | |
| 199 | +| -ip <string>  | The IP address to query. Defaults to 127.0.0.1. | |
| 200 | +| -net <string> | The target network ID.                          | |
| 201 | +| -port <int>   | The target port. Defaults to 2000.              | |
| 202 | +| -v | Enable verbose output. | |
| 203 | + |
| 204 | +### Container version of the diagnostic tool |
| 205 | + |
| 206 | +The CLI is provided as a container with a 17.12 engine that needs to run in privileged mode. |
| 207 | +*NOTE* |
| 208 | +Remember that table operations have ownership, so anything added by a `create entry` operation |
| 209 | +persists only as long as the diagnostic container is part of the swarm. |
| 210 | + |
| 211 | +1. Make sure that the node where the diagnostic client will run is not part of the swarm. If it is, run `docker swarm leave -f`. |
| 212 | + |
| 213 | +2. To run the container, use a command like the following: |
| 214 | + |
| 215 | + ```bash |
| 216 | + $ docker container run --name net-diagnostic -d --privileged --network host dockereng/network-diagnostic:17.12-dind |
| 217 | + ``` |
| 218 | + |
| 219 | +3. Connect to the container using `docker exec -it <container-ID> sh`, |
| 220 | + and start the server using the following command: |
| 221 | + |
| 222 | + ```bash |
| 223 | + $ kill -HUP 1 |
| 224 | + ``` |
| 225 | + |
| 226 | +4. Join the diagnostic container to the swarm, then run the diagnostic CLI within the container. |
| 227 | + |
| 228 | + ```bash |
| 229 | + $ ./diagnosticClient <flags>... |
| 230 | + ``` |
| 231 | + |
| 232 | +5. When finished debugging, leave the swarm and stop the container. |
| 233 | + |
| 234 | +### Examples |
| 235 | + |
| 236 | +The following commands dump the service discovery table and verify node |
| 237 | +ownership. |
| 238 | + |
| 239 | +*NOTE* |
| 240 | +Remember to use the full network ID, which you can find with `docker network ls --no-trunc`. |
| 241 | + |
| 242 | +**Service discovery and load balancer:** |
| 243 | + |
| 244 | +```bash |
| 245 | +$ diagnosticClient -t sd -v -net n8a8ie6tb3wr2e260vxj8ncy4 |
| 246 | +``` |
| 247 | + |
| 248 | +**Overlay network:** |
| 249 | + |
| 250 | +```bash |
| 251 | +$ diagnosticClient -port 2001 -t overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4 |
| 252 | +``` |