Correcting Multicollinearity with R

Suppose you want to remove multicollinearity from a regression model in R. In this post, any variable with a variance inflation factor (VIF) higher than 2.5 is treated as having a multicollinearity problem. This is a fairly strict rule of thumb; cutoffs of 5 or 10 are also commonly used. The R code below repeatedly removes the variable with the largest VIF until no remaining variable has a VIF above 2.5.
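For reference, the VIF of a predictor is usually defined from the R-squared obtained by regressing that predictor on all the other predictors. The formula below is the standard textbook definition rather than anything specific to this post; the subscript j is just notation for "the j-th predictor".

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition, a threshold of 2.5 corresponds to R_j^2 = 0.6, since 1 / (1 - 0.6) = 2.5. In other words, a variable is flagged once the other predictors explain more than 60% of its variance.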

# Read the data from a saved R object
mydata = readRDS("logistic.rds")

# Check the number of rows and columns in the data
dim(mydata)

# Load the required package (car provides vif())
library(car)

# Set the dependent variable as numeric
mydata$Ins = as.numeric(mydata$Ins)

# Fit a linear model to the data
fit = lm(Ins ~ AcctAge + DDA + DDABal + CashBk, data = mydata)

# Calculate the VIF for each independent variable
vif(fit)

# Set a VIF threshold. All variables with a VIF higher than the threshold
# are dropped from the model
threshold = 2.5

# Sequentially drop the variable with the largest VIF until
# no variable has a VIF higher than the threshold.
# Note: car's vif() needs at least two terms in the model.
drop = TRUE
aftervif = list()   # VIFs recorded at each step
while (drop == TRUE) {
  vfit = vif(fit)
  aftervif[[length(aftervif) + 1]] = vfit
  if (max(vfit) > threshold) {
    fit = update(fit, as.formula(paste(".", "~", ".", "-", names(which.max(vfit)))))
  } else {
    drop = FALSE
  }
}

# Model after removing correlated variables
print(fit)

# How the variables were removed sequentially (VIFs at each step)
print(aftervif)

# Final variables with their VIFs
vfit_d = as.data.frame(vfit)

# Export the final VIFs
write.csv(vfit_d, "C:\\Users\\Deepanshu Bhalla\\Desktop\\VIF.csv")
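The same procedure can be wrapped in a reusable function. The sketch below is one way to do it in base R, without the car package: it computes each VIF as 1 / (1 - R-squared) from an auxiliary regression. The helper names vif_manual and drop_high_vif are hypothetical, and the built-in mtcars data stands in for logistic.rds, which is not available here. For models with an intercept and only numeric predictors, these values should agree with car's vif().

```r
# Hypothetical helper: VIF of every column in a data frame of numeric
# predictors, computed as 1 / (1 - R^2) from regressing each column
# on all the other columns.
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: drop the predictor with the largest VIF, one at a
# time, until no remaining predictor has a VIF above the threshold.
# Stops early if only one predictor is left (VIF is undefined then).
drop_high_vif <- function(X, threshold = 2.5) {
  while (ncol(X) > 1) {
    v <- vif_manual(X)
    if (max(v) <= threshold) break
    worst <- names(v)[which.max(v)]
    X <- X[, names(X) != worst, drop = FALSE]
  }
  X
}

# Illustration on built-in data (an assumption, not the post's dataset)
X <- mtcars[, c("disp", "hp", "wt", "qsec", "drat")]
kept <- drop_high_vif(X, threshold = 2.5)
print(names(kept))
if (ncol(kept) > 1) print(vif_manual(kept))
```

Working on a data frame of predictors rather than a fitted model keeps the function independent of the response variable, which is reasonable because the VIF depends only on the predictors.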


About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.

1 Response to "Correcting Multicollinearity with R"
  1. How do you select the threshold? What is the underlying statistical method to determine the threshold?

