Shortcomings in Random Forest Variable Importance

Random Forest is very popular as a variable selection technique. However, its variable importance measure has some drawbacks, listed below:

1. If the independent variables are of different types (for example, some continuous and some categorical), the random forest (randomForest package in R) variable importance measure can be misleading, because variables that offer more possible split points tend to be favored. To overcome this problem, we should use a conditional inference forest, i.e. cforest (party package).

2. If the independent variables are correlated, the random forest (randomForest package in R) variable importance measure can be misleading. Even a conditional inference forest does not remove the multicollinearity problem completely; it only solves it to some extent. When using a conditional inference forest, i.e. cforest (party package), we can call varimp(obj, conditional = TRUE) to compute conditional permutation importance, which can add to our understanding of the data.

3. If the independent variables are all categorical but have different numbers of categories, the random forest (randomForest package in R) variable importance measure can be misleading. To overcome this problem, we should use a conditional inference forest, i.e. cforest (party package).
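
The three points above can be explored with a small experiment. The following is only a minimal sketch, assuming the randomForest and party packages are installed. The simulated data set, the variable names (x1 to x4, dat) and the settings (ntree, mtry, the seed) are made up for illustration and do not come from any real analysis. Only x1 drives the response; x2 is correlated with x1, and x3 and x4 are pure-noise factors with 2 and 20 levels.

```r
# Assumption: the randomForest and party packages are installed.
library(randomForest)
library(party)

set.seed(42)
n <- 300

# Hypothetical simulated data: y depends only on x1.
x1 <- rnorm(n)                                          # continuous, true signal
x2 <- x1 + rnorm(n, sd = 0.3)                           # continuous, correlated with x1
x3 <- factor(sample(letters[1:2],  n, replace = TRUE))  # noise factor, 2 levels
x4 <- factor(sample(letters[1:20], n, replace = TRUE))  # noise factor, 20 levels
y  <- 2 * x1 + rnorm(n)
dat <- data.frame(y, x1, x2, x3, x4)

# Standard random forest. With importance = TRUE, importance() returns
# permutation importance (%IncMSE) and impurity importance (IncNodePurity).
rf <- randomForest(y ~ ., data = dat, ntree = 100, importance = TRUE)
imp_rf <- importance(rf)
print(imp_rf)

# Conditional inference forest with the "unbiased" control settings.
cf <- cforest(y ~ ., data = dat,
              controls = cforest_unbiased(ntree = 100, mtry = 2))

# Marginal permutation importance.
imp_marginal <- varimp(cf)
print(imp_marginal)

# Conditional permutation importance: each predictor is permuted within
# groups defined by the predictors it is associated with. This can be slow.
imp_cond <- varimp(cf, conditional = TRUE)
print(imp_cond)
```

When comparing the three outputs, it is worth checking how the noise factor with many levels (x4) and the correlated variable (x2) are ranked by each measure. Exact numbers will vary with the seed and settings, so treat the result as an illustration rather than a benchmark.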
