Commit 20dd462

Author: Flavio Crisciani
Merge pull request #2032 from fcrisciani/debug-client
Diagnostic client
2 parents df78639 + be91c3e commit 20dd462

File tree

15 files changed, +614 -148 lines changed

Makefile

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ build-local:
2828
@mkdir -p "bin"
2929
go build -tags experimental -o "bin/dnet" ./cmd/dnet
3030
go build -o "bin/docker-proxy" ./cmd/proxy
31+
GOOS=linux go build -o "./cmd/diagnostic/diagnosticClient" ./cmd/diagnostic
3132

3233
clean:
3334
@echo "🐳 $@"

agent.go

Lines changed: 2 additions & 2 deletions
@@ -297,8 +297,8 @@ func (c *controller) agentInit(listenAddr, bindAddrOrInterface, advertiseAddr, d
297297
return err
298298
}
299299

300-
// Register the diagnose handlers
301-
c.DiagnoseServer.RegisterHandler(nDB, networkdb.NetDbPaths2Func)
300+
// Register the diagnostic handlers
301+
c.DiagnosticServer.RegisterHandler(nDB, networkdb.NetDbPaths2Func)
302302

303303
var cancelList []func()
304304
ch, cancel := nDB.Watch(libnetworkEPTable, "", "")

cmd/diagnostic/Dockerfile.client

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
FROM alpine
2+
RUN apk add --no-cache curl
3+
COPY diagnosticClient /usr/local/bin/diagnosticClient
4+
ENTRYPOINT ["/usr/local/bin/diagnosticClient"]

cmd/diagnostic/Dockerfile.dind

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
FROM docker:17.12-dind
2+
RUN apk add --no-cache curl
3+
COPY daemon.json /etc/docker/daemon.json
4+
COPY diagnosticClient /usr/local/bin/diagnosticClient

cmd/diagnostic/README.md

Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
1+
---
2+
description: Learn to use the built-in network debugger to debug overlay networking problems
3+
keywords: network, troubleshooting, debug
4+
title: Debug overlay or swarm networking issues
5+
---
6+
7+
**WARNING**
8+
This tool can change the internal state of the libnetwork API. Use it with care
9+
and read this guide thoroughly first. Improper use can damage or permanently
10+
destroy the network configuration.
11+
12+
13+
Docker CE 17.12 and higher introduce a network debugging tool designed to help
14+
debug issues with overlay networks and swarm services running on Linux hosts.
15+
When enabled, a network diagnostic server listens on the specified port and
16+
provides diagnostic information. The network debugging tool should only be
17+
started to debug specific issues, and should not be left running all the time.
18+
19+
Information about networks is stored in the database, which can be examined using
20+
the API. Currently the database contains information about the overlay network
21+
as well as the service discovery data.
22+
23+
The Docker API exposes endpoints to query and control the network debugging
24+
tool. CLI integration is provided as a preview, but the implementation is not
25+
yet considered stable and commands and options may change without notice.
26+
27+
The tool is available in two forms:
28+
1) client only: dockereng/network-diagnostic:onlyclient
29+
2) Docker-in-Docker version: dockereng/network-diagnostic:17.12-dind
30+
The latter makes it possible to use the tool with a cluster running an engine older than 17.12.
31+
32+
## Enable the diagnostic server
33+
34+
The tool currently only works on Docker hosts running on Linux. To enable it on a node,
35+
follow the steps below.
36+
37+
1. Set the `network-diagnostic-port` to a port which is free on the Docker
38+
host, in the `/etc/docker/daemon.json` configuration file.
39+
40+
```json
41+
"network-diagnostic-port": <port>
42+
```
43+
44+
2. Get the process ID (PID) of the `dockerd` process. It is the second field in
45+
the output, and is typically a number from 2 to 6 digits long.
46+
47+
```bash
48+
$ ps aux |grep dockerd | grep -v grep
49+
```
50+
51+
3. Reload the Docker configuration without restarting Docker, by sending the
52+
`HUP` signal to the PID you found in the previous step.
53+
54+
```bash
55+
kill -HUP <pid-of-dockerd>
56+
```
57+
58+
If systemd is used, the command `systemctl reload docker` is sufficient.
59+
60+
61+
A message like the following will appear in the Docker host logs:
62+
63+
```none
64+
Starting the diagnostic server listening on <port> for commands
65+
```
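
As a quick sanity check after the reload, the following sketch reads the configured port back out of the configuration file. The helper name `diag_port_from_config` is hypothetical, and the sketch assumes the key and its integer value sit on a single line, as in a simple `daemon.json`; it is not a general JSON parser.

```bash
# Print the value of "network-diagnostic-port" from a daemon.json file.
# Hypothetical helper: assumes the key and its integer value are on one line.
diag_port_from_config() {
  local config="${1:-/etc/docker/daemon.json}"
  # Match the key, skip the colon and any whitespace, and print only the digits.
  sed -n 's/.*"network-diagnostic-port"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$config"
}
```

The printed port can then be used to probe the server, for example with `curl "localhost:$(diag_port_from_config)/ready"`.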
66+
67+
## Disable the diagnostic tool
68+
69+
Repeat these steps for each node participating in the swarm.
70+
71+
1. Remove the `network-diagnostic-port` key from the `/etc/docker/daemon.json`
72+
configuration file.
73+
74+
2. Get the process ID (PID) of the `dockerd` process. It is the second field in
75+
the output, and is typically a number from 2 to 6 digits long.
76+
77+
```bash
78+
$ ps aux |grep dockerd | grep -v grep
79+
```
80+
81+
3. Reload the Docker configuration without restarting Docker, by sending the
82+
`HUP` signal to the PID you found in the previous step.
83+
84+
```bash
85+
kill -HUP <pid-of-dockerd>
86+
```
87+
88+
A message like the following will appear in the Docker host logs:
89+
90+
```none
91+
Disabling the diagnostic server
92+
```
93+
94+
## Access the diagnostic tool's API
95+
96+
The network diagnostic tool exposes its own RESTful API. To access the API,
97+
send an HTTP request to the port where the tool is listening. The following
98+
commands assume the tool is listening on port 2000.
99+
100+
Examples are not given for every endpoint.
101+
102+
### Get help
103+
104+
```bash
105+
$ curl localhost:2000/help
106+
107+
OK
108+
/updateentry
109+
/getentry
110+
/gettable
111+
/leavenetwork
112+
/createentry
113+
/help
114+
/clusterpeers
115+
/ready
116+
/joinnetwork
117+
/deleteentry
118+
/networkpeers
119+
/
120+
/join
121+
```
122+
123+
### Join or leave the network database cluster
124+
125+
```bash
126+
$ curl "localhost:2000/join?members=ip1,ip2,..."
127+
```
128+
129+
```bash
130+
$ curl "localhost:2000/leave?members=ip1,ip2,..."
131+
```
132+
133+
`ip1`, `ip2`, ... are the IP addresses of swarm nodes (usually one is enough).
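
When the member list comes from a script, the URL can be assembled as in the following sketch. The function name `join_url` and the addresses in the usage line are hypothetical; only the `/join?members=` form comes from the commands above.

```bash
# Build the /join URL from a port and one or more member IP addresses.
# Hypothetical helper: it only prints the URL and does not contact the server.
join_url() {
  local port="$1"
  shift
  # With IFS set to a comma, "$*" joins the remaining arguments with commas.
  local IFS=,
  printf 'localhost:%s/join?members=%s\n' "$port" "$*"
}
```

A call such as `curl "$(join_url 2000 10.0.0.1 10.0.0.2)"` keeps the URL quoted, so the shell does not treat `?` as a glob character.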
134+
135+
### Join or leave a network
136+
137+
```bash
138+
$ curl "localhost:2000/joinnetwork?nid=<network id>"
139+
```
140+
141+
```bash
142+
$ curl "localhost:2000/leavenetwork?nid=<network id>"
143+
```
144+
145+
`network id` can be retrieved on the manager with `docker network ls --no-trunc` and must
146+
be the full-length identifier.
147+
148+
### List cluster peers
149+
150+
```bash
151+
$ curl localhost:2000/clusterpeers
152+
```
153+
154+
### List nodes connected to a given network
155+
156+
```bash
157+
$ curl "localhost:2000/networkpeers?nid=<network id>"
158+
```
159+
`network id` can be retrieved on the manager with `docker network ls --no-trunc` and must
160+
be the full-length identifier.
161+
162+
### Dump database tables
163+
164+
The tables are called `endpoint_table` and `overlay_peer_table`.
165+
The `overlay_peer_table` contains all the overlay forwarding information.
166+
The `endpoint_table` contains all the service discovery information.
167+
168+
```bash
169+
$ curl "localhost:2000/gettable?nid=<network id>&tname=<table name>"
170+
```
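
The following sketch builds the `gettable` URL and rejects table names other than the two listed above. The function name `table_url` is hypothetical, and the network ID in the usage line is only a sample value.

```bash
# Build the /gettable URL for a given port, network ID, and table name.
# Hypothetical helper: it only prints the URL and does not contact the server.
table_url() {
  local port="$1" nid="$2" tname="$3"
  case "$tname" in
    endpoint_table|overlay_peer_table) ;;
    *)
      echo "unknown table: $tname" >&2
      return 1
      ;;
  esac
  printf 'localhost:%s/gettable?nid=%s&tname=%s\n' "$port" "$nid" "$tname"
}
```

For example, `curl "$(table_url 2000 n8a8ie6tb3wr2e260vxj8ncy4 endpoint_table)"` passes the whole URL as one quoted argument, so the `&` is not interpreted by the shell as a background operator.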
171+
172+
### Interact with a specific database table
173+
174+
The tables are called `endpoint_table` and `overlay_peer_table`.
175+
176+
```bash
177+
$ curl "localhost:2000/<method>?nid=<network id>&tname=<table name>&key=<key>[&value=<value>]"
178+
```
179+
180+
Note:
181+
Operations on tables have node ownership: entries remain in the table only as long as
182+
the node that inserted them is part of the cluster.
183+
184+
## Access the diagnostic tool's CLI
185+
186+
The CLI is provided as a preview and is not yet stable. Commands or options may
187+
change at any time.
188+
189+
The CLI executable is called `diagnosticClient` and is made available using a
190+
standalone container.
191+
192+
`docker run --net host dockereng/network-diagnostic:onlyclient -v -net <full network id> -t sd`
193+
194+
The following flags are supported:
195+
196+
| Flag | Description |
197+
|---------------|-------------------------------------------------|
198+
| -t <string> | The table to query: `sd` or `overlay`. |
199+
| -ip <string> | The IP address to query. Defaults to 127.0.0.1. |
200+
| -net <string> | The target network ID. |
201+
| -port <int> | The target port. Defaults to 2000. |
202+
| -v | Enable verbose output. |
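
As an illustration of how these flags combine, the sketch below prints (rather than runs) a client invocation for a given network and table. The function name `client_cmd` is hypothetical; the image name and flags are the ones shown above.

```bash
# Print the docker command that runs the standalone client for one table.
# Hypothetical dry-run helper: it prints the command and does not execute it.
client_cmd() {
  local net="$1" table="$2" port="${3:-2000}"
  case "$table" in
    sd|overlay) ;;
    *)
      echo "table must be sd or overlay" >&2
      return 1
      ;;
  esac
  printf 'docker run --net host dockereng/network-diagnostic:onlyclient -v -net %s -t %s -port %s\n' \
    "$net" "$table" "$port"
}
```

Printing the command first makes it easy to review the network ID and port before running it against a live cluster.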
203+
204+
### Container version of the diagnostic tool
205+
206+
The CLI is also provided as a container with a 17.12 engine, which must run in privileged mode.
207+
*NOTE*
208+
Remember that table operations have ownership, so any `create entry` persists only as long as
209+
the diagnostic container is part of the swarm.
210+
211+
1. Make sure that the node where the diagnostic client will run is not part of the swarm. If it is, run `docker swarm leave -f`.
212+
213+
2. To run the container, use a command like the following:
214+
215+
```bash
216+
$ docker container run --name net-diagnostic -d --privileged --network host dockereng/network-diagnostic:17.12-dind
217+
```
218+
219+
3. Connect to the container using `docker exec -it <container-ID> sh`,
220+
and start the server using the following command:
221+
222+
```bash
223+
$ kill -HUP 1
224+
```
225+
226+
4. Join the diagnostic container to the swarm, then run the diagnostic CLI within the container.
227+
228+
```bash
229+
$ diagnosticClient <flags>...
230+
```
231+
232+
5. When finished debugging, leave the swarm and stop the container.
233+
234+
### Examples
235+
236+
The following commands dump the service discovery table and verify node
237+
ownership.
238+
239+
*NOTE*
240+
Remember to use the full network ID, which you can find with `docker network ls --no-trunc`.
241+
242+
**Service discovery and load balancer:**
243+
244+
```bash
245+
$ diagnosticClient -t sd -v -net n8a8ie6tb3wr2e260vxj8ncy4
246+
```
247+
248+
**Overlay network:**
249+
250+
```bash
251+
$ diagnosticClient -port 2001 -t overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4
252+
```

cmd/diagnostic/daemon.json

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
{
2+
"debug": true,
3+
"network-diagnostic-port": 2000
4+
}
