Commit 191a459

H-Huang authored and facebook-github-bot committed
Allow ports to be reused in gloo
Summary: ProcessGroupGloo and gloo appear to open and close sockets without allowing the port to be reused. In larger training jobs this surfaces as "Address already in use" errors, which we assume are caused by exhaustion of the ephemeral ports. This diff allows ports to be reused; with it, we see fewer ports in the `TIME_WAIT` state.

Context: https://fb.workplace.com/groups/319878845696681/permalink/5988899781205532/
Another issue: https://fb.workplace.com/groups/319878845696681/permalink/958768178474408/

Differential Revision: D44029927
fbshipit-source-id: 45e7305df8c5fae764a5d93478ac007f604be7dd
1 parent 56b221c commit 191a459

2 files changed: +12 -2 lines changed


gloo/transport/tcp/device.cc

Lines changed: 10 additions & 0 deletions
@@ -101,6 +101,16 @@ static void lookupAddrForHostname(struct attr& attr) {
   struct addrinfo* rp;
   for (rp = result; rp != nullptr; rp = rp->ai_next) {
     auto fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
+
+    // Set SO_REUSEADDR to signal that reuse of the listening port is OK.
+    printf("in tcp/device.cc");
+    int on = 1;
+    rv = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
+    if (rv == -1) {
+      close(fd);
+      GLOO_ENFORCE_NE(rv, -1);
+    }
+
     if (fd == -1) {
       continue;
     }
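
In the hunk above, `setsockopt` runs before the `fd == -1` check, and a debug `printf` is left in place. A minimal sketch of one possible ordering, validating the descriptor first and only then enabling `SO_REUSEADDR`, might look like the following. The helper names `openReusableSocket` and `isReuseAddrSet` are hypothetical and are not part of gloo; the sketch returns `-1` instead of using gloo's `GLOO_ENFORCE_NE` so that it stays self-contained.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Hypothetical helper (not part of gloo): create a socket and enable
// SO_REUSEADDR on it. Returns the descriptor, or -1 on any failure.
int openReusableSocket(int family, int socktype, int protocol) {
  int fd = socket(family, socktype, protocol);
  if (fd == -1) {
    // Nothing to configure or close; the caller can try the next address.
    return -1;
  }

  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}

// Hypothetical helper: read the option back to confirm it took effect.
bool isReuseAddrSet(int fd) {
  int value = 0;
  socklen_t len = sizeof(value);
  if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) == -1) {
    return false;
  }
  assert(len == sizeof(value));
  return value != 0;
}
```

A caller iterating over `getaddrinfo` results could call `openReusableSocket(rp->ai_family, rp->ai_socktype, rp->ai_protocol)` and `continue` on `-1`, which keeps the original loop's behavior for addresses where `socket` fails.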

gloo/transport/tcp/pair.cc

Lines changed: 2 additions & 2 deletions
@@ -162,9 +162,9 @@ void Pair::listen() {
     signalAndThrowException(GLOO_ERROR_MSG("socket: ", strerror(errno)));
   }
 
-  // Set SO_REUSEADDR to signal that reuse of the listening port is OK.
+  // Set SO_REUSEPORT to signal that reuse of the listening port is OK.
   int on = 1;
-  rv = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
+  rv = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on));
   if (rv == -1) {
     ::close(fd);
     signalAndThrowException(GLOO_ERROR_MSG("setsockopt: ", strerror(errno)));
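
The hunk above switches the listening socket's option from `SO_REUSEADDR` to `SO_REUSEPORT`. As a rough illustration of what that option permits, the sketch below binds two TCP sockets to the same loopback address and port, which Linux (3.9 and later) allows when every socket sets `SO_REUSEPORT` before `bind` and the sockets share an effective user ID. The helpers `bindWithReusePort` and `boundPort` are hypothetical names introduced for this example; they are not gloo functions, and behavior on other operating systems may differ.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper: bind a TCP socket to 127.0.0.1:port with
// SO_REUSEPORT enabled. Passing port 0 lets the kernel pick a port.
// Returns the descriptor, or -1 on failure.
int bindWithReusePort(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) {
    return -1;
  }

  // The option must be set before bind() for it to affect the bind.
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) == -1) {
    close(fd);
    return -1;
  }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}

// Hypothetical helper: return the locally bound port in host byte order,
// or 0 if it cannot be determined.
uint16_t boundPort(int fd) {
  sockaddr_in addr{};
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == -1) {
    return 0;
  }
  return ntohs(addr.sin_port);
}
```

For example, calling `bindWithReusePort(0)`, reading the chosen port with `boundPort`, and then calling `bindWithReusePort` again with that port should yield a second valid descriptor on Linux, whereas without `SO_REUSEPORT` the second `bind` would typically fail with "Address already in use".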
