glusterd with high cpu usage and no response for gluster commands #4601

@harleyxu-xhl

Description

int
glusterd_shdsvc_restart()
{
...

pthread_mutex_lock(&conf->volume_lock);
cds_list_for_each_entry_safe(volinfo, tmp, &conf->volumes, vol_list)
{
    glusterd_volinfo_ref(volinfo);
    pthread_mutex_unlock(&conf->volume_lock);
    /* Start per volume shd svc */
    if (volinfo->status == GLUSTERD_STATUS_STARTED) {
        svc = &(volinfo->shd.svc);
        ret = svc->manager(svc, volinfo, PROC_START_NO_WAIT);
        if (ret) {
            gf_msg(this->name, GF_LOG_ERROR, 0, GD_MSG_SHD_START_FAIL,
                   "Couldn't start shd for "
                   "vol: %s on restart",
                   volinfo->volname);
            gf_event(EVENT_SVC_MANAGER_FAILED, "volume=%s;svc_name=%s",
                     volinfo->volname, svc->name);
            glusterd_volinfo_unref(volinfo);
            goto out;
        }
    }
    glusterd_volinfo_unref(volinfo);
    pthread_mutex_lock(&conf->volume_lock);
}
pthread_mutex_unlock(&conf->volume_lock);

out:
return ret;
}

Using gdb, I found glusterd stuck in this for loop:

Thread 18 "glfs_sproc3" hit Breakpoint 2, glusterd_shdsvc_restart () at glusterd-shd-svc.c:636
636 glusterd_volinfo_ref(volinfo);
(gdb) p tmp
$17 = (glusterd_volinfo_t *) 0x560bf8bbc1f0
(gdb) p tmp->vol_list
$18 = {next = 0x560bf8bbc248, prev = 0x560bf8bbc248}
(gdb) p volinfo
$19 = (glusterd_volinfo_t *) 0x560bf8bbc1f0
(gdb) p volinfo->vol_list
$20 = {next = 0x560bf8bbc248, prev = 0x560bf8bbc248}
(gdb) p &volinfo->vol_list
$21 = (struct cds_list_head *) 0x560bf8bbc248

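If, while the loop is visiting volinfo x and volume_lock is released, another thread
deletes volinfo x from the list and reinitializes its list node (i.e. its next and
prev pointers point to itself), the traversal can never get back to the list head
and an endless loop may occur.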
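
The failure mode can be reproduced outside glusterd with a minimal sketch. Everything below is hypothetical illustration code: `struct node`, `list_del_init`, `walk` and `demo` are stand-ins written for this example, not the liburcu or glusterd implementations. The walk mirrors a "safe" traversal that saves the next node before running the loop body, and the deletion that another thread would perform while the lock is dropped is simulated inline. The iteration cap exists only so the demo terminates.

```c
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct node {
    struct node *next, *prev;
};

static void list_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the node and reinitialize it: afterwards it points to itself. */
static void list_del_init(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/*
 * "Safe"-style walk: tmp holds the next node before the body runs.
 * While visiting the first node, optionally delete `victim`, which
 * simulates another thread acting while the lock is released.
 * Returns the number of iterations, capped at `limit`.
 */
static int walk(struct node *head, struct node *victim, int limit)
{
    int steps = 0;
    struct node *pos = head->next;
    struct node *tmp = pos->next;

    while (pos != head && steps < limit) {
        if (steps == 0 && victim != NULL)
            list_del_init(victim);
        steps++;
        pos = tmp;       /* advance to the saved next node */
        tmp = pos->next; /* a self-pointing node yields itself again */
    }
    return steps;
}

/* Build head -> a -> b -> c and walk it, optionally deleting b. */
int demo(int delete_saved_next)
{
    struct node head, a, b, c;

    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    list_add_tail(&c, &head);
    return walk(&head, delete_saved_next ? &b : NULL, 1000);
}
```

Calling `demo(0)` visits the three nodes and stops at the head. Calling `demo(1)` deletes the saved next node during the first iteration, after which `pos` and `tmp` both stay on that node and only the cap ends the loop. This matches the gdb output above, where `tmp` and `volinfo` are the same pointer and `vol_list.next` equals `&volinfo->vol_list`.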
