Missing Value Imputations with Random Forest

Most predictive modeling techniques require missing values to be imputed before a model can be trained. The randomForest package in R offers two ways to impute missing data.

I. Impute missing values in predictor data using proximity from randomForest.

rfImpute starts by filling the NAs with a rough imputation (na.roughfix) and then fits a random forest on the completed data. The proximity matrix from that forest is used to update the imputation of the NAs. For continuous predictors, the imputed value is the weighted average of the non-missing observations, where the weights are the proximities. For categorical predictors, the imputed value is the category with the largest average proximity. This process is repeated iter times, where iter is an argument of rfImpute.
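
To make the update rule concrete, here is a minimal sketch of a single proximity-weighted update for one observation. It is an illustration of the description above, not the package's internal code, and the proximity matrix and the vectors x_num and x_cat are made-up toy values.

```r
# Toy illustration of one proximity-weighted update (hypothetical values,
# not the randomForest internals).
# prox[i, j] is the proximity between observations i and j.
prox <- matrix(c(1.0, 0.6, 0.3, 0.1,
                 0.6, 1.0, 0.2, 0.2,
                 0.3, 0.2, 1.0, 0.5,
                 0.1, 0.2, 0.5, 1.0), nrow = 4, byrow = TRUE)

x_num <- c(NA, 10, 20, 30)             # continuous predictor, row 1 missing
x_cat <- factor(c(NA, "a", "b", "a"))  # categorical predictor, row 1 missing

# Continuous: weighted average of the observed values,
# weighted by their proximity to row 1.
obs <- !is.na(x_num)
w <- prox[1, obs]
imputed_num <- sum(w * x_num[obs]) / sum(w)

# Categorical: the category whose observed members have the
# largest average proximity to row 1.
obs_cat <- !is.na(x_cat)
avg_prox <- tapply(prox[1, obs_cat], x_cat[obs_cat], mean)
imputed_cat <- names(which.max(avg_prox))

print(imputed_num)
print(imputed_cat)
```

In rfImpute the proximities come from a forest fitted on the currently completed data, so each iteration can change both the proximities and the imputed values.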

Defaults: 5 iterations (iter = 5) and 300 trees (ntree = 300). The example below keeps iter at 5 and sets ntree to 500.

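library(randomForest)

data(iris)
str(iris)
iris.na <- iris
set.seed(111)
## artificially drop some data values: up to 20 rows in each of the 4 predictors.
for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA
## impute the NAs using proximities; the response (Species) must not contain NAs.
set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na, iter=5, ntree=500)
## fit a random forest on the completed data.
set.seed(333)
iris.rf <- randomForest(Species ~ ., iris.imputed)
print(iris.rf)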
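
Because the missing values here were removed artificially, the imputed values can be compared with the true ones. The following is a rough sketch of such a check, assuming the randomForest package is installed. The per-predictor RMSE is just one possible measure, and the names num_cols and rmse are introduced only for this illustration.

```r
library(randomForest)

data(iris)
iris.na <- iris
set.seed(111)
# Drop up to 20 values in each of the 4 predictors.
for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA

set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na, iter = 5, ntree = 500)

# The imputed data frame should have no NAs left.
stopifnot(!anyNA(iris.imputed))

# rfImpute puts the response in the first column, so look columns up by name.
num_cols <- names(iris)[1:4]

# Root mean squared error over the imputed cells only, per predictor.
rmse <- sapply(num_cols, function(v) {
  miss <- is.na(iris.na[[v]])
  sqrt(mean((iris.imputed[[v]][miss] - iris[[v]][miss])^2))
})
print(round(rmse, 3))
```

The same check can be run on the output of na.roughfix to compare the two imputation methods on identical missing cells.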


II. Impute missing values in predictor data using median / mode.

library(randomForest)

data(iris)
iris.na <- iris
set.seed(111)
## artificially drop some data values.
for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA
## option 1: fill the NAs up front and keep the completed data frame.
iris.roughfix <- na.roughfix(iris.na)
## option 2: let randomForest apply na.roughfix while fitting.
set.seed(333)
iris.narf <- randomForest(Species ~ ., iris.na, na.action=na.roughfix)
print(iris.narf)

Note: na.roughfix replaces NAs with the median for numeric variables and with the mode (the most frequent level) for categorical variables.
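
The median / mode rule can be sketched in base R as follows. The helper rough_fix and the toy data frame are hypothetical and only mimic the rule described in the note. They are not the package's implementation, and ties for the mode simply go to the first level here.

```r
# Hypothetical helper that mimics the median / mode rule (base R only).
rough_fix <- function(df) {
  for (v in names(df)) {
    miss <- is.na(df[[v]])
    if (!any(miss)) next
    if (is.numeric(df[[v]])) {
      # Numeric column: fill with the median of the observed values.
      df[[v]][miss] <- median(df[[v]], na.rm = TRUE)
    } else if (is.factor(df[[v]])) {
      # Factor column: fill with the most frequent level.
      counts <- table(df[[v]])
      df[[v]][miss] <- names(counts)[which.max(counts)]
    }
  }
  df
}

toy <- data.frame(
  x = c(1, 2, NA, 10),
  g = factor(c("a", "b", "b", NA))
)
fixed <- rough_fix(toy)
print(fixed)
```

This fill ignores relationships between variables, which is why rfImpute uses it only as a starting point before refining the values with proximities.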
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.