Shortcomings in Random Forest Variable Importance

Random forest is very popular as a variable selection technique. However, its variable importance measure has some drawbacks, listed below:

1. If the independent variables are of different types (for example, some continuous and some categorical), the random forest (randomForest package in R) variable importance measure can be misleading. To overcome this problem, we should use a conditional inference forest, i.e. cforest (party package).

2. If the independent variables are correlated, the random forest (randomForest package in R) variable importance measure can be misleading. Even a conditional inference forest does not remove the multicollinearity problem completely; it only reduces it to some extent. When using a conditional inference forest, i.e. cforest (party package), we can compute conditional importance with varimp(obj, conditional = TRUE), which can add to the understanding of the data.

3. If the independent variables are all categorical but have different numbers of categories, the random forest (randomForest package in R) variable importance measure can be misleading, because variables with more categories offer more candidate splits and tend to be favored. To overcome this problem, we should use a conditional inference forest, i.e. cforest (party package).
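
The three points above can be explored with a small experiment. The following is only a minimal sketch, assuming the randomForest and party packages are installed. The simulated data set and its variable names (x1, x2, g2, g10) are hypothetical and chosen for illustration: x1 and x2 are correlated continuous predictors, g2 and g10 are factors with 2 and 10 levels, and only x1 actually drives the response.

```r
# Assumes the randomForest and party packages are installed.
library(randomForest)
library(party)

set.seed(42)
n <- 200

# Hypothetical simulated data (not from the original post):
# x1 and x2 are correlated continuous predictors,
# g2 is a 2-level factor, g10 is a 10-level factor.
# Only x1 has a real effect on y.
x1  <- rnorm(n)
x2  <- x1 + rnorm(n, sd = 0.5)
g2  <- factor(sample(letters[1:2], n, replace = TRUE))
g10 <- factor(sample(letters[1:10], n, replace = TRUE))
y   <- 2 * x1 + rnorm(n)
dat <- data.frame(y, x1, x2, g2, g10)

# Standard random forest importance (randomForest package)
rf     <- randomForest(y ~ ., data = dat, ntree = 100, importance = TRUE)
rf_imp <- importance(rf)

# Conditional inference forest (party package) with unbiased settings
cf <- cforest(y ~ ., data = dat,
              controls = cforest_unbiased(ntree = 50, mtry = 2))

# Permutation importance, first unconditional, then conditional
cf_imp      <- varimp(cf)
cf_imp_cond <- varimp(cf, conditional = TRUE)

print(rf_imp)
print(cf_imp)
print(cf_imp_cond)
```

Comparing the three outputs side by side shows how the ranking of x2 and g10 shifts between the methods. Note that conditional importance can be slow on larger data sets, so the small values of n and ntree here are a convenience for the sketch, not a recommendation.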
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.
