Speeding up random forest with R

If you want to build a random forest model with 500 trees and your computer has 2 cores, you can run the randomForest function in parallel on both cores with the ntree argument set to 250, and then combine the resulting randomForest objects into a single forest.
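The combining step can be tried without any parallelism first. Below is a minimal sketch, assuming the randomForest package is installed: two forests of 250 trees each are grown one after the other on iris and then merged. The object names (rf_a, rf_b, rf_all) and the seed values are arbitrary choices for illustration.

```r
library(randomForest)

# Grow two independent forests of 250 trees each (sequentially, for illustration)
set.seed(1)
rf_a <- randomForest(Species ~ ., data = iris, ntree = 250)
set.seed(2)
rf_b <- randomForest(Species ~ ., data = iris, ntree = 250)

# combine() merges the trees of both forests into one randomForest object
rf_all <- combine(rf_a, rf_b)

# Total number of trees in the merged forest
rf_all$ntree
```

The parallel version does the same thing, except that each 250-tree forest is grown on its own core.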
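# Load the required libraries (install them first with install.packages() if needed)
library(randomForest)
library(foreach)
library(doSNOW)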

# Register a cluster with one worker per core. In this case, it is 2
registerDoSNOW(makeCluster(2, type="SOCK"))

# Loading data
mydata = iris

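# Optimal mtry: tuneRF returns a matrix of mtry values and their OOB errors
mtry <- tuneRF(mydata[, -5], mydata[, 5], stepFactor = 0.5)
# which.min picks a single row even when several mtry values tie on OOB error
best.m <- mtry[which.min(mtry[, 2]), 1]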

# Main Random Forest Code. Grow 250 trees on each of the 2 cores in parallel and then combine them
rf <- foreach(ntree = rep(250, 2), .combine = combine, .packages = "randomForest") %dopar%
  randomForest(Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)

# Check rf object
rf

# Check variable importance
importance(rf, type=1)
varImpPlot(rf, type=1)

Printing the rf object gives:

Call: randomForest(formula = Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
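Two practical details are worth handling in real use: the number of cores differs between machines, and the worker processes should be shut down when the job is done. The sketch below is one way to do this, not the only one. The names n_cores, total_trees, trees_per_worker, cl and rf_par are illustrative, and the core count is capped at 2 only to keep the example small.

```r
library(randomForest)
library(foreach)
library(doSNOW)

# Detect the available cores (detectCores() can return NA), capped at 2 here
n_cores <- max(1, min(2, parallel::detectCores(), na.rm = TRUE))
total_trees <- 500

# Split the trees as evenly as possible across the workers
trees_per_worker <- rep(total_trees %/% n_cores, n_cores)
trees_per_worker[1] <- trees_per_worker[1] + total_trees %% n_cores

# Keep a handle on the cluster so it can be stopped later
cl <- makeCluster(n_cores, type = "SOCK")
registerDoSNOW(cl)

rf_par <- foreach(ntree = trees_per_worker, .combine = combine,
                  .packages = "randomForest") %dopar%
  randomForest(Species ~ ., data = iris, ntree = ntree, importance = TRUE)

# Release the worker processes once the forest is built
stopCluster(cl)

rf_par$ntree
```

Note that combine() leaves the err.rate and confusion components of the merged object as NULL, which is why the printed output of a combined forest shows no OOB error estimate or confusion matrix. To measure accuracy, score the combined forest on a held-out test set with predict().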
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.
