Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 10 additions & 3 deletions src/trans_rdma.c
Original file line number Diff line number Diff line change
Expand Up @@ -1186,7 +1186,8 @@ static void *msk_cq_thread(void *arg) {
msk_mutex_lock(trans->debug & MSK_DEBUG_CM_LOCKS, &trans->cm_lock);
if (trans->state >= MSK_CLOSING) { /* CLOSING, CLOSED, ERROR */
// closing trans, skip this, will be done on flush
msk_cq_delfd(trans);
if (trans->comp_channel)
msk_cq_delfd(trans);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I realize the old code (the other place where we check comp_channel before calling msk_cq_delfd) has the same problem, but I'm wondering if this is safe race-wise...
If the application calls msk_destroy_trans themselves for a reason or another I believe this could happen; I'd call msk_destroy_qp with trans->cm_lock taken and move the other msk_cq_delfd within the locked section as well. What do you think?

(In the first place we probably don't need a different comp_channel for each child trans now that I'm reading this code again, if we call ibv_create_cq with the "parent's" comp_channel it should just work.. We'd need to shuffle code a bit and call ibv_get_cq_event to grab the cq and get the trans from its cq_context though)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've just realized I left this PR on its own for a long time.

I don't remember exactly how the hang occurs. I think it was related with processes that fork.
(I've some code for adding some kind of support for fork in mooshika (this is something we need to support in our app even if fork should be avoided when using verbs))

Concerning the crash, I don't know the internals of mooshika very well, but I tend to agree with you about doing the msk_destroy_qp() and msk_cq_delfd() with the cm_lock held.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder how ib deals with forking when memory is locked and already registered... I'll have to admit I never tried.

Anyway, how do you want to proceed? Can you still reproduce somehow to test if holding the cm_lock also fixes the problem?
Given I don't have the reproducer I can't really go about adding the lock and closing this if I don't know if it really helps; but I'd rather not merge this one if the race gets fixed with a lock.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the fork() + ib, the general advice is: don't fork!
Anyway there is the interface ibv_fork_init() that's supposed to help with fork() support. My understanding is it keeps track of registered memory regions and mark them as "don't fork".

I've a branch in my mooshika repo which adds an helper to expose ibv_fork_init() (via msk_fork_init(), clever name, isn't it) and adds a new interface msk_lib_reset() similar to what is proposed by the Mellanox version of librdmacm: rdma_lib_reset(), which can be called to reset the library global state in a child process. I will create a PR for it sometime.

About the reproducer for the hang, I think it happens while I was doing tests around the fork(), and msk_destroy_trans() was called twice for the same transport (probably once in the parent and once in the child). It was in some faulty code.

msk_mutex_unlock(trans->debug & MSK_DEBUG_CM_LOCKS, &trans->cm_lock);
continue;
}
Expand Down Expand Up @@ -1333,6 +1334,7 @@ static void msk_destroy_qp(struct msk_trans *trans) {
*/
void msk_destroy_trans(struct msk_trans **ptrans) {
struct msk_trans *trans = *ptrans;
int ret = 0;

if (trans) {
trans->destroy_on_disconnect = 0;
Expand All @@ -1341,8 +1343,13 @@ void msk_destroy_trans(struct msk_trans **ptrans) {
if (trans->state != MSK_CLOSED && trans->state != MSK_LISTENING && trans->state != MSK_ERROR)
trans->state = MSK_CLOSING;

if (trans->cm_id && trans->cm_id->verbs)
rdma_disconnect(trans->cm_id);
if (trans->cm_id && trans->cm_id->verbs) {
ret = rdma_disconnect(trans->cm_id);
if (ret) {
ERROR_LOG("rdma_disconnect() failed: %s", strerror(errno));
return;
}
}

while (trans->state != MSK_CLOSED && trans->state != MSK_LISTENING && trans->state != MSK_ERROR) {
INFO_LOG(trans->debug & MSK_DEBUG_SETUP, "we're not closed yet, waiting for disconnect_event");
Expand Down