Speeding up random forest with R

If you want to build a random forest model with 500 trees and your computer has 2 cores, you can run the randomForest function in parallel on both cores, with the ntree argument set to 250 on each, and then combine the resulting randomForest objects into a single forest.
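The 500-trees-over-2-cores split generalizes to other tree and core counts. The sketch below shows one way to compute the per-worker tree counts; `split_trees` is a hypothetical helper name introduced here for illustration, not part of randomForest or foreach.

```r
# Hypothetical helper: spread total_trees as evenly as possible over n_workers.
split_trees <- function(total_trees, n_workers) {
  base  <- total_trees %/% n_workers   # trees every worker gets
  extra <- total_trees %% n_workers    # leftover trees, one each for the first workers
  rep(base, n_workers) + c(rep(1, extra), rep(0, n_workers - extra))
}

split_trees(500, 2)  # 250 250
split_trees(500, 3)  # 167 167 166
```

A vector like this could be used as the `ntree` iterator in a `foreach` call in place of `rep(250, 2)`.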
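# Load the required libraries (install them first with install.packages() if needed)
library(randomForest)
library(doSNOW)
library(foreach)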

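# Register a cluster with one worker per core on your machine. In this case, it is 2.
# Keeping the cluster in cl lets you shut it down later with stopCluster(cl).
cl <- makeCluster(2, type="SOCK")
registerDoSNOW(cl)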

# Loading data
mydata = iris

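# Find the mtry value with the lowest out-of-bag (OOB) error
mtry <- tuneRF(mydata[,-5], mydata[,5], stepFactor=0.5)
# which.min() returns a single row even if several mtry values tie for the lowest error
best.m <- mtry[which.min(mtry[, 2]), 1]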

# Main random forest code. Grow 250 trees on each of the 2 cores in parallel and then combine them.
# Note: combine() does not carry over the OOB error estimate or the confusion matrix.
rf <- foreach(ntree = rep(250, 2), .combine = combine, .packages = "randomForest") %dopar%
  randomForest(Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)

# Check rf object
print(rf)

# Check variable importance (type=1 is the mean decrease in accuracy)
importance(rf, type=1)
varImpPlot(rf, type=1)
Output of print(rf):

Call:
 randomForest(formula = Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2
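To see what the combine step does on its own, here is a minimal serial sketch, assuming only that the randomForest package is installed. It grows two 250-tree forests one after the other (no cluster involved) and merges them. The object names `rf_a`, `rf_b` and `rf_all` and the seed are arbitrary choices for illustration.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Two independent forests of 250 trees each, grown one after the other
rf_a <- randomForest(Species ~ ., data = iris, ntree = 250)
rf_b <- randomForest(Species ~ ., data = iris, ntree = 250)

# Merge them into a single forest, as .combine = combine does inside foreach
rf_all <- randomForest::combine(rf_a, rf_b)

rf_all$ntree              # total number of trees after merging
is.null(rf_all$err.rate)  # OOB error rates are not carried over by combine()

# The merged forest can still be used for prediction
head(predict(rf_all, newdata = iris))
```

Because the merged object has no OOB error estimate, the printed summary shows only the forest type, the number of trees and mtry. If you need an error estimate, evaluating predictions on a held-out set is one option.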