Speeding up random forest with R

If you want to build a random forest model with 500 trees and your computer has 2 cores, you can run the randomForest function in parallel on both cores, with the ntree argument set to 250 on each, and then combine the resulting randomForest objects into a single forest.
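The split of 500 trees into 2 x 250 generalizes to any number of cores. As a minimal sketch, the helper below (split_trees is a hypothetical name, not part of randomForest or foreach) divides a total tree count as evenly as possible across workers, so the result could be passed to foreach in place of rep(250, 2):

```r
# Hypothetical helper for illustration; not part of any package.
# Splits total_trees across n_workers as evenly as possible.
split_trees <- function(total_trees, n_workers) {
  stopifnot(n_workers >= 1, total_trees >= n_workers)
  base  <- total_trees %/% n_workers   # trees every worker gets
  extra <- total_trees %% n_workers    # leftover trees to hand out
  # The first `extra` workers each grow one additional tree
  rep(base, n_workers) + c(rep(1, extra), rep(0, n_workers - extra))
}

split_trees(500, 2)  # 250 250
split_trees(500, 3)  # 167 167 166
```

On a machine with an unknown number of cores, parallel::detectCores() can supply n_workers, although it is often sensible to leave one core free.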

# Load the required libraries (install them first with install.packages() if needed)
library("foreach")
library("doSNOW")
library(randomForest)

# Register a cluster with one worker per core. In this case, it is 2
# Keeping the cluster in cl lets you call stopCluster(cl) once the forest is built
cl <- makeCluster(2, type="SOCK")
registerDoSNOW(cl)

# Loading data
data(iris)
mydata = iris

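# Optimal mtry: tuneRF returns a matrix with columns mtry and OOBError

mtry <- tuneRF(mydata[,-5], mydata[,5], stepFactor=0.5)
print(mtry)
# which.min returns a single row even when several mtry values tie for the lowest OOB error
best.m <- mtry[which.min(mtry[, 2]), 1]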
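Selecting the best mtry depends on how ties in the OOB error are handled, which can easily happen on a small dataset such as iris. The sketch below uses a made-up matrix in the shape tuneRF returns (the numbers are invented for illustration, not real tuning results) to compare an equality filter with which.min:

```r
# Made-up tuning results: columns mtry and OOBError, as in tuneRF output
tune_res <- cbind(mtry = c(1, 2, 4), OOBError = c(0.06, 0.04, 0.04))

# Equality filter: returns every mtry that ties for the minimum
tied <- tune_res[tune_res[, 2] == min(tune_res[, 2]), 1]
length(tied)   # 2, because mtry = 2 and mtry = 4 share the lowest error

# which.min: returns the first row with the minimum, so always one value
best_single <- tune_res[which.min(tune_res[, 2]), 1]
best_single    # 2
```

A single value matters here because randomForest expects mtry to be one number.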

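# Main Random Forest Code. Grow 250 trees on each of the 2 cores in parallel and then combine them
# randomForest::combine is spelled out so that a combine() from another loaded package is not picked up
rf <- foreach(ntree = rep(250, 2), .combine = randomForest::combine,
              .packages = "randomForest") %dopar%
  randomForest(Species~., data=mydata, ntree=ntree, mtry=best.m, importance=TRUE)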

# Check rf object
# (combine() sets the OOB error rate and confusion matrix to NULL, so they are not printed)

rf

# Check variable importance

importance(rf, type=1)

varImpPlot(rf, type=1)

# Output of printing rf

Call:
 randomForest(formula = Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2
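The printed summary above has no OOB error estimate, because combine() does not carry the err.rate and confusion components over to the merged forest. One way to still get an accuracy figure is a hold-out set. The sketch below is an illustration under stated assumptions: the 70/30 split, the seed, and mtry = 2 (taken from the output above) are arbitrary choices, and the two forests are grown sequentially to keep the example self-contained.

```r
library(randomForest)

set.seed(42)  # assumption: fixed seed so the split is reproducible
data(iris)

# Hold out 30% of the rows for testing (arbitrary choice for illustration)
test_idx <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Two 250-tree forests, merged the same way as the parallel version
rf_a <- randomForest(Species ~ ., data = train, ntree = 250, mtry = 2)
rf_b <- randomForest(Species ~ ., data = train, ntree = 250, mtry = 2)
rf_all <- randomForest::combine(rf_a, rf_b)

# Score the merged 500-tree forest on the held-out rows
pred <- predict(rf_all, newdata = test)
holdout_accuracy <- mean(pred == test$Species)

print(table(predicted = pred, actual = test$Species))
print(holdout_accuracy)
```

The same predict() call works on the rf object built with foreach, as long as the model was trained on data that excludes the test rows.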

About Author:
Deepanshu Bhalla

