library(SuperLearner)
library(ranger)
library(xgboost)
library(parallel)
data(Boston, package = "MASS")
y <- as.numeric(Boston$medv > 22)
x <- subset(Boston, select = -medv)
options(mc.cores = 2)
getOption("mc.cores")
## Multicore works with ranger
set.seed(1, "L'Ecuyer-CMRG")
system.time({
  cv_sl <- CV.SuperLearner(Y = y, X = x, family = binomial(),
                           cvControl = list(V = 10),
                           parallel = "multicore",
                           SL.library = c("SL.mean", "SL.ranger"))
})
summary(cv_sl)
## xgboost works sequentially
tune <- list(ntrees = c(10, 20),
             max_depth = 2,
             shrinkage = c(0.01))
learners <- create.Learner("SL.xgboost",
                           tune = tune,
                           detailed_names = TRUE,
                           name_prefix = "xgb")
cv_sl2 <- CV.SuperLearner(Y = y,
                          X = x,
                          family = binomial(),
                          cvControl = list(V = 3),
                          verbose = TRUE,
                          parallel = "seq",
                          SL.library = c(learners$names, "SL.ranger"))
summary(cv_sl2)
## xgboost with parallel = "multicore" hangs
set.seed(1, "L'Ecuyer-CMRG")
cv_sl3 <- CV.SuperLearner(Y = y,
                          X = x,
                          family = binomial(),
                          cvControl = list(V = 3),
                          verbose = TRUE,
                          parallel = "multicore",
                          SL.library = c(learners$names, "SL.ranger"))
summary(cv_sl3)
## Snow cluster. This works
cluster <- parallel::makeCluster(2)
cluster
## Load each package separately, to check that each call succeeds
parallel::clusterEvalQ(cluster, library(SuperLearner))
parallel::clusterEvalQ(cluster, library(ranger))
parallel::clusterEvalQ(cluster, library(xgboost))
parallel::clusterExport(cluster, learners$names)
parallel::clusterSetRNGStream(cluster, 1)
cv_sl4 <- CV.SuperLearner(Y = y,
                          X = x,
                          family = binomial(),
                          cvControl = list(V = 3),
                          verbose = TRUE,
                          parallel = cluster,
                          SL.library = c(learners$names, "SL.ranger"))
summary(cv_sl4)
## Shut down the worker processes once finished
parallel::stopCluster(cluster)
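I was trying to reproduce example 15 in the "Guide to SuperLearner" (https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html#xgboost-hyperparameter-exploration), but it hangs when xgboost learners are run with parallel = "multicore". It works fine with ranger under multicore, with xgboost run sequentially (parallel = "seq"), and with a snow cluster.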
Am I doing something wrong? From searching around, it seems that xgboost (in particular the xgb.DMatrix operations) may not work well with fork clusters (e.g., https://stackoverflow.com/questions/52080209/xgb-dmatrix-hangs-in-mclapply , ck37/varimpact#20).
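To check whether the hang comes from xgboost itself rather than from SuperLearner, one option is to fork a single child that only builds an xgb.DMatrix and give it a time limit. This is a minimal sketch, not a confirmed diagnosis: the 30-second limit and the variable names (x_mat, job, res) are my own choices, and building a DMatrix in the parent before forking is an assumption about what triggers the problem.

```r
library(parallel)
library(xgboost)

data(Boston, package = "MASS")
x_mat <- as.matrix(subset(Boston, select = -medv))
y <- as.numeric(Boston$medv > 22)

## Use xgboost once in the parent process before forking
## (assumption: prior use in the parent is what matters here).
invisible(xgb.DMatrix(data = x_mat, label = y))

## Fork one child that builds a DMatrix and returns its row count.
job <- mcparallel(nrow(xgb.DMatrix(data = x_mat, label = y)))

## Wait up to 30 seconds for a result; NULL means nothing came back.
res <- mccollect(job, wait = FALSE, timeout = 30)

if (is.null(res)) {
  message("Forked child did not return within 30 s; killing it.")
  tools::pskill(job$pid)
  invisible(mccollect(job))  # reap the killed child
} else {
  message("Forked child returned: ", res[[1]])
}
```

If the child times out here, the problem is reproducible without SuperLearner, which would point at xgboost under fork rather than at the CV.SuperLearner call.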
The reproducible example above was tested on two different machines running Linux.