Shortcomings in Random Forest Variable Importance

Random Forest is a very popular variable selection technique. However, its variable importance measures have a few drawbacks, listed below:

1. If the independent variables are of different types (for example, some continuous and some categorical), the variable importance measure from random forest (the randomForest package in R) can be misleading. To overcome this problem, use a conditional inference forest, i.e. cforest from the party package.

2. If the independent variables are correlated, the variable importance measure from random forest (the randomForest package in R) can be misleading. Even a conditional inference forest does not remove the multicollinearity problem completely; it only mitigates it. When using cforest (party package), the conditional permutation importance obtained with varimp(obj, conditional = TRUE) can add to the understanding of the data.

3. If the independent variables are all categorical but have different numbers of categories, the variable importance measure from random forest (the randomForest package in R) can be biased towards variables with more categories. To overcome this problem, use a conditional inference forest, i.e. cforest (party package). A code sketch illustrating these points follows this list.
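
The R sketch below is a minimal illustration of the workaround described above, not code from the original post. The data frame and variable names (mydata, y, x1, x2, x3) are invented placeholders: x1 is the only informative predictor, while x2 and x3 are noise factors with different numbers of categories. The sketch compares the standard randomForest importance with the (conditional) permutation importance from party::cforest.

## Illustrative sketch: standard vs. conditional variable importance
## (data and variable names are made up for demonstration)

library(randomForest)
library(party)

set.seed(123)
n <- 300

# Simulated predictors of mixed type with different numbers of categories;
# x1 drives the response, x2 and x3 are pure noise
x1 <- rnorm(n)                                          # continuous, informative
x2 <- factor(sample(letters[1:10], n, replace = TRUE))  # 10-level noise factor
x3 <- factor(sample(c("a", "b"), n, replace = TRUE))    # 2-level noise factor
y  <- factor(ifelse(x1 + rnorm(n) > 0, "yes", "no"))

mydata <- data.frame(y, x1, x2, x3)

# Standard random forest: importance can be inflated for x2 simply
# because it offers more possible split points
rf <- randomForest(y ~ ., data = mydata, importance = TRUE, ntree = 500)
importance(rf)

# Conditional inference forest with unbiased split selection
cf <- cforest(y ~ ., data = mydata,
              controls = cforest_unbiased(ntree = 500, mtry = 2))

# Unconditional permutation importance
varimp(cf)

# Conditional permutation importance: also adjusts for correlated predictors
varimp(cf, conditional = TRUE)

Because cforest_unbiased() selects splits via conditional inference tests rather than raw impurity reduction, it avoids the bias towards predictors with many categories or many possible split points, and varimp(cf, conditional = TRUE) additionally accounts for correlation among the predictors.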
