Shortcomings in Random Forest Variable Importance

Random Forest is a very popular variable selection technique. However, it has some drawbacks as well, listed below:

1. If independent variables are of different types (for example, some continuous and some categorical), the random forest variable importance measure (randomForest package in R) can be misleading. To overcome this problem, we should use a conditional inference forest, i.e. cforest from the party package (see the sketch below).
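A minimal sketch of the two approaches, assuming a purely illustrative toy data set in which the response depends only on a continuous predictor while a many-level categorical predictor is pure noise (all variable names here are made up for illustration):

# install.packages(c("randomForest", "party"))  # if not already installed
library(randomForest)
library(party)

set.seed(42)
# Toy data: y depends only on x_cont; x_cat is an irrelevant 20-level factor
n      <- 500
x_cont <- rnorm(n)
x_cat  <- factor(sample(letters[1:20], n, replace = TRUE))
y      <- 2 * x_cont + rnorm(n)
df     <- data.frame(y, x_cont, x_cat)

# Standard random forest importance (can be inflated for x_cat because of its many levels)
rf <- randomForest(y ~ ., data = df, importance = TRUE, ntree = 500)
importance(rf)

# Conditional inference forest with unbiased settings
cf <- cforest(y ~ ., data = df,
              controls = cforest_unbiased(ntree = 500, mtry = 2))
varimp(cf)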

2. If independent variables are correlated, the random forest variable importance measure (randomForest package in R) can be misleading. Even a conditional inference forest does not remove the multicollinearity problem completely; it only mitigates it. When using a conditional inference forest, i.e. cforest from the party package, we can compute conditional permutation importance with varimp(obj, conditional = TRUE), which can add to the understanding of the data (see the sketch below).
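A brief sketch of the conditional importance call, assuming an illustrative pair of strongly correlated predictors where only one of them truly drives the response (variable names are hypothetical):

library(party)

set.seed(1)
# Toy data: x2 is almost a copy of x1, but only x1 drives y
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- x1 + rnorm(n)
df <- data.frame(y, x1, x2)

cf <- cforest(y ~ ., data = df,
              controls = cforest_unbiased(ntree = 500, mtry = 2))

# Marginal (unconditional) permutation importance
varimp(cf)

# Conditional permutation importance: each variable is permuted within a grid
# defined by its correlated covariates, reducing the inflated score of x2
# (note: this can be considerably slower than the unconditional version)
varimp(cf, conditional = TRUE)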

3. If independent variables are all categorical but have different numbers of categories, the random forest variable importance measure (randomForest package in R) can be misleading, as it tends to favour predictors with more categories. To overcome this problem, we should use a conditional inference forest, i.e. cforest from the party package (see the sketch below).
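A small sketch of that bias, assuming two purely illustrative factor predictors with 2 and 30 levels, neither of which is related to the response:

library(randomForest)
library(party)

set.seed(7)
n       <- 500
f_small <- factor(sample(c("a", "b"), n, replace = TRUE))         # 2 levels
f_large <- factor(sample(paste0("lvl", 1:30), n, replace = TRUE)) # 30 levels
y       <- rnorm(n)                                               # pure noise outcome
df      <- data.frame(y, f_small, f_large)

# randomForest tends to rank f_large higher even though both predictors are irrelevant
rf <- randomForest(y ~ ., data = df, importance = TRUE, ntree = 500)
importance(rf)

# cforest with unbiased settings keeps both importances near zero
cf <- cforest(y ~ ., data = df,
              controls = cforest_unbiased(ntree = 500, mtry = 2))
varimp(cf)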