Description
Thank you for providing this useful pipeline; it has been helpful for my data analysis. I ran into a problem when subsampling 70% of the data and computing clusters with louvain_communities detection. When I bootstrap with Nboot=100, some subsamples yield one more cluster than the full dataset.
For example, with the same K and resolution settings, the full dataset gives 10 clusters, while the subsamples sometimes give 10 clusters and sometimes 11. The chance of getting an extra cluster decreases as I increase the percentage of samples selected.
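To make the setup concrete, here is a minimal sketch of how the number of clusters can be tallied across subsamples. It is not the pipeline's code: `tally_cluster_counts`, `cluster_fn`, and `toy_cluster_fn` are hypothetical names, and the toy clustering only stands in for the real step (building the graph with a fixed K and running Louvain at a fixed resolution). It also assumes the subsamples are drawn without replacement.

```python
import random
from collections import Counter


def tally_cluster_counts(n_samples, fraction, n_boot, cluster_fn, seed=0):
    """Draw n_boot subsamples (without replacement) and count how often
    each number of clusters occurs.

    cluster_fn is a hypothetical callable: it takes a sorted list of sample
    indices and returns one cluster label per index.
    """
    rng = random.Random(seed)
    size = max(1, round(fraction * n_samples))
    counts = Counter()
    for _ in range(n_boot):
        idx = sorted(rng.sample(range(n_samples), size))
        labels = cluster_fn(idx)
        counts[len(set(labels))] += 1
    return counts


def toy_cluster_fn(indices):
    # Stand-in for the real clustering step (assumption, for illustration only):
    # sample i is assigned to cluster i % 10.
    return [i % 10 for i in indices]


counts = tally_cluster_counts(
    n_samples=1000, fraction=0.7, n_boot=100, cluster_fn=toy_cluster_fn
)
print(dict(counts))  # maps "number of clusters" -> "number of subsamples"
```

Running the same tally at several values of `fraction` would show how often the extra cluster appears as the subsample size changes.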
It makes sense to me that a smaller subset of the data could show higher variability, which could in turn produce an extra cluster. I am opening this issue to ask whether you have encountered this problem before and whether you have a way to solve it. Any information is appreciated. Thank you.