Subsamples sometimes have more clusters than the full dataset #20

@Xuyuch

Thank you for providing this useful pipeline; it has been helpful for my data analysis. I ran into a problem when I subsample 70% of the data and compute clusters using louvain_communities detection. When I bootstrap with Nboot=100, a subsample sometimes has one more cluster than the full dataset.
For example, with the same K and resolution settings, the full dataset gives 10 clusters, while subsamples sometimes give 10 clusters and sometimes 11. The chance of getting more clusters decreases as I increase the subsampling percentage.
It makes sense to me that a smaller subset of the data could show higher variance, which could increase the number of clusters. I am writing to ask whether you have encountered this problem before and whether you know of a way to solve it. Any information is appreciated. Thank you.
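
To make the observation easier to reproduce, here is a minimal sketch of how the cluster counts across subsamples could be tabulated. It is not the pipeline's own code: `cluster_count_distribution`, `cluster_fn`, and `toy_cluster_fn` are hypothetical names introduced for illustration, and the toy clustering step is only a stand-in for a real kNN graph plus Louvain step with fixed K and resolution.

```python
import random
from collections import Counter


def cluster_count_distribution(items, cluster_fn, frac=0.7, n_boot=100, seed=0):
    """Tabulate how many clusters each subsample yields.

    cluster_fn is a hypothetical callable: it takes a list of items and
    returns one cluster label per item (for example, a wrapper around a
    kNN graph plus Louvain with fixed K and resolution).
    """
    rng = random.Random(seed)
    n_sub = max(1, int(round(frac * len(items))))
    counts = Counter()
    for _ in range(n_boot):
        subsample = rng.sample(items, n_sub)  # draw without replacement
        labels = cluster_fn(subsample)
        counts[len(set(labels))] += 1  # key: number of distinct clusters
    return counts


def toy_cluster_fn(values):
    # Toy stand-in for a real clustering step: bucket values by integer part.
    return [int(v) for v in values]


data = [i / 10 for i in range(100)]  # values 0.0 .. 9.9
full_k = len(set(toy_cluster_fn(data)))
dist = cluster_count_distribution(data, toy_cluster_fn, frac=0.7, n_boot=100)
print("full data:", full_k, "clusters")
print("subsamples (n_clusters -> n_runs):", dict(sorted(dist.items())))
```

The toy step can never produce more clusters than the full data, so it only demonstrates the bookkeeping. Swapping in the real clustering call as `cluster_fn` would show how often a subsample gives 11 clusters instead of 10 at each subsampling fraction.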
