Speeding up random forest with R

Deepanshu Bhalla Add Comment , , , , ,
If you want to create a random forest model with 500 trees, and your computer has 2 cores, you can execute the randomForest function parallely with 2 cores, with the ntree argument set to 250. and then combine the resulting randomForest objects.
# Installed the required libraries

# Setting number of cores in your machine. In this case, it is 2
registerDoSNOW(makeCluster(2, type="SOCK"))

# Loading data
mydata = iris

# Optimal mtry
mtry <- tuneRF(iris[,-5],iris[,5], stepFactor=0.5)
best.m <- mtry[mtry[, 2] == min(mtry[, 2]), 1]

# Main Random Forest Code. Run 250 trees on 2 cores parallely and then combine them
rf <- foreach(ntree = rep(250, 2), .combine = combine, .packages = "randomForest") %dopar% randomForest(Species~.,data=mydata,ntree=ntree, mtry=best.m, importance=TRUE)

# Check rf object

# Check variable importance
importance(rf, type=1)
varImpPlot(rf, type=1)
randomForest(formula = Species ~ ., data = mydata, ntree = ntree, mtry = best.m, importance = TRUE)

Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
Related Posts
Spread the Word!
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 0 Response to "Speeding up random forest with R"
Next → ← Prev