Speeding up random forest with R

If you want to build a random forest model with 500 trees and your computer has 2 cores, you can run the randomForest function in parallel on both cores, with the ntree argument set to 250 on each, and then combine the resulting randomForest objects into a single 500-tree forest.
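As a minimal sketch of the combine step on its own, assuming only the randomForest package is installed, the snippet below grows two 250-tree forests one after the other (no parallelism yet) and merges them with combine(). The object names rf_a, rf_b and rf_all are illustrative and do not appear in the rest of the post.

```r
library(randomForest)

set.seed(42)  # make the two forests reproducible

# Grow two independent forests of 250 trees each (sequentially here)
rf_a <- randomForest(Species ~ ., data = iris, ntree = 250)
rf_b <- randomForest(Species ~ ., data = iris, ntree = 250)

# Merge them into a single forest
rf_all <- randomForest::combine(rf_a, rf_b)

rf_all$ntree              # total number of trees in the merged forest
is.null(rf_all$err.rate)  # TRUE: combine() does not carry over the OOB error rates
```

The parallel version does the same thing, except that each of the smaller forests is grown on a separate core.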
# Load the required libraries (install them first with install.packages() if needed)
library("foreach")
library("doSNOW")
library(randomForest)

# Set the number of cores to use on your machine. In this case, it is 2.
# Keeping the cluster in cl lets you shut it down later with stopCluster(cl).
cl <- makeCluster(2, type="SOCK")
registerDoSNOW(cl)

# Loading data
data(iris)
mydata = iris

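# Optimal mtry: tuneRF returns a matrix with columns mtry and OOBError
mtry <- tuneRF(mydata[,-5], mydata[,5], stepFactor=0.5)
print(mtry)
# which.min returns a single row even if several mtry values tie on OOB error
best.m <- mtry[which.min(mtry[, "OOBError"]), "mtry"]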

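# Main random forest code. Grow 250 trees on each of the 2 cores in parallel, then combine them.
# randomForest::combine is spelled out so that a combine() from another attached package cannot mask it.
rf <- foreach(ntree = rep(250, 2), .combine = randomForest::combine, .packages = "randomForest") %dopar%
  randomForest(Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)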

# Check rf object
rf

# Check variable importance
importance(rf, type=1)
varImpPlot(rf, type=1)
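Printing rf gives the output below. The OOB error estimate and confusion matrix are not shown because combine() does not carry them over to the merged forest.

Call: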
randomForest(formula = Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)

Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
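
The same pattern can be extended to other tree counts and core counts. The following is a sketch under stated assumptions, not a definitive implementation: it assumes foreach, doSNOW and randomForest are installed, uses a hypothetical helper split_trees() (not part of any package) to divide the trees as evenly as possible among the workers, and shuts the cluster down once the forest is built. The names n_workers, tree_counts and rf_par are illustrative.

```r
library(foreach)
library(doSNOW)
library(randomForest)

# Hypothetical helper: divide `total` trees as evenly as possible among workers
split_trees <- function(total, workers) {
  base  <- total %/% workers
  extra <- total %% workers
  # the first `extra` workers each grow one additional tree
  base + as.integer(seq_len(workers) <= extra)
}

n_workers   <- 2                          # assumed number of cores, as in the post
tree_counts <- split_trees(500, n_workers)

# Start the workers and register them as the foreach backend
cl <- makeCluster(n_workers, type = "SOCK")
registerDoSNOW(cl)

# Each worker grows its share of the trees; the pieces are merged with combine()
rf_par <- foreach(ntree = tree_counts,
                  .combine = randomForest::combine,
                  .packages = "randomForest") %dopar%
  randomForest(Species ~ ., data = iris, ntree = ntree, importance = TRUE)

stopCluster(cl)                           # release the worker processes

rf_par$ntree                              # total number of trees in the merged forest
```

With a tree count that does not divide evenly, for example split_trees(500, 3), the helper hands the leftover trees to the first workers so the total is preserved.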
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and Human Resource.
