Skip to content

Conversation

@bthery
Copy link

@bthery bthery commented May 17, 2019

Here is a PR that adds to Mooshika some kind of support for applications that forks.
It adds two things:

  • msk_fork_init() : an helper that exposes ibv_fork_init() in the Mooshika API

  • msk_lib_reset() : a function a process can call to reset the library global state (to be called by child process after fork())

mks_lib_reset() doesn't free the rdma resources that may have been allocated by parent and haven't been freed before fork (they are leaked in the child process), but prevents some leaks, hangs and crashes if the process doesn't exec() after fork(). It is a "better than nothing" solution.

msk_reset_lib() also calls Mellanox-specific routine rdma_lib_reset() to reset librdmacm state if Mooshika has been compiled with Mellanox's librdmacm.

In our product (IO interception via a LD_PRELOAD'ed library and redirection to a proxy via IB), we need these interfaces, because we don't know in advance what is the behavior of the processes we're instrumenting. We don't if they will fork() or not, and we must be prepared for this case.

bthery added 4 commits May 9, 2019 18:48
Add msk_reset_lib() interface that allows to reinitialize the
library global structure in a child process after a fork.

It doesn't free the rdma resources that may have been allocated
by parent and haven't been freed before fork (they are leaked in the
child process), but prevents some hangs and crashes if the process
doesn't exec() after fork(). Better than nothing.

msk_reset_lib() also calls Mellanox-specific routine rdma_lib_reset()
to reset librdmacm state if Mooshika has been compiled with
Mellanox's librdmacm.
Leak less fd's in child process by closing epoll fd stored in global
structure.
/* Brutal re-initialization of global state */
memset(msk_global_state, 0, sizeof(*msk_global_state));

msk_global_state->run_threads = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do that beforefreeing the various items/closing fd and try to join threads like in msk_internals_fini

Actually we might be able to reuse these directly, it will free/realloc msk_global_state but the cost is negligible compared to thread joins etc so just something like msk_internals_fini(); rdma_lib_reset(); msk_internals_init(); is probably cleanest?

free(msk_global_state->worker_pool.thrids);

if (msk_global_state->worker_pool.wd_queue)
free(msk_global_state->worker_pool.wd_queue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(these are missing in msk_internals_fini and should be added there regardless of reusing the functions; thanks for finding these)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants