Correcting Collinearity with Correlation Matrix in R

This article explains how to correct a multicollinearity problem using a correlation matrix.

The caret package provides a function called findCorrelation that helps identify highly correlated variables.

How it works - 
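The absolute values of pair-wise correlations are considered. If two variables are highly correlated (their absolute correlation is above the chosen cutoff), the function looks at the mean absolute correlation of each variable, that is, the average of the absolute values of its correlations with the other variables, and flags the variable with the larger mean absolute correlation for removal. In a cluster of mutually correlated variables, this typically leaves only the variable with the smallest mean absolute correlation.

Example - Correlation Matrix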

     X1   X2   X3   X4   X5
X1 1.00 0.95 0.89 0.85 0.10
X2 0.95 1.00 0.85 0.81 0.09
X3 0.89 0.85 1.00 0.78 0.10
X4 0.85 0.81 0.78 1.00 0.09
X5 0.10 0.09 0.10 0.09 1.00

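Variables to remove from the X1 to X4 cluster - "X1" "X2" "X3", as each has a larger mean absolute correlation than X4 (0.6975, 0.675 and 0.655 versus 0.6325, averaging each variable's correlations with the other four). X5 is not highly correlated with any other variable, so it is kept.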
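
The mean absolute correlations behind this result can be checked with a few lines of base R. This is only a minimal sketch: the names corMat, meanAbsCor, cluster and keep are illustrative, and excluding the diagonal from the average is an assumption of the sketch rather than a statement about caret's internals.

```r
# Example correlation matrix from the text above
vars <- paste0("X", 1:5)
corMat <- matrix(c(
  1.00, 0.95, 0.89, 0.85, 0.10,
  0.95, 1.00, 0.85, 0.81, 0.09,
  0.89, 0.85, 1.00, 0.78, 0.10,
  0.85, 0.81, 0.78, 1.00, 0.09,
  0.10, 0.09, 0.10, 0.09, 1.00
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Mean absolute correlation of each variable with the other variables.
# Assumption: the diagonal (self-correlation of 1) is excluded from the average.
absCor <- abs(corMat)
diag(absCor) <- NA
meanAbsCor <- colMeans(absCor, na.rm = TRUE)
print(round(meanAbsCor, 4))

# Within the X1 to X4 cluster, the variable with the smallest mean is the one kept
cluster <- c("X1", "X2", "X3", "X4")
keep <- names(which.min(meanAbsCor[cluster]))
print(keep)
```

Under these assumptions the means come out as 0.6975, 0.675, 0.655, 0.6325 and 0.095 for X1 to X5, so X4 is the member of the cluster that survives.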

R Code - 

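# Load caret, which provides findCorrelation
library(caret)
# Identify numeric variables
numericData <- dat2[sapply(dat2, is.numeric)]
# Calculate the correlation matrix
descrCor <- cor(numericData)
# Find attributes that are highly correlated (absolute correlation above 0.7)
highlyCorrelated <- findCorrelation(descrCor, cutoff = 0.7)
# Drop the flagged columns; findCorrelation returns column indices
# (filteredData is an illustrative name)
filteredData <- if (length(highlyCorrelated) > 0) numericData[, -highlyCorrelated, drop = FALSE] else numericData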
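
For readers who want to trace the removal rule by hand, here is a simplified base-R sketch applied to the example matrix. It is not caret's implementation: dropCorrelated is a hypothetical helper that repeatedly takes the most correlated remaining pair and drops the member with the larger mean absolute correlation, with the means computed once on the full matrix (an assumption of this sketch).

```r
# Example correlation matrix from the article
vars <- paste0("X", 1:5)
corMat <- matrix(c(
  1.00, 0.95, 0.89, 0.85, 0.10,
  0.95, 1.00, 0.85, 0.81, 0.09,
  0.89, 0.85, 1.00, 0.78, 0.10,
  0.85, 0.81, 0.78, 1.00, 0.09,
  0.10, 0.09, 0.10, 0.09, 1.00
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Hypothetical helper: a simplified greedy filter, NOT caret's findCorrelation
dropCorrelated <- function(corMat, cutoff = 0.7) {
  absCor <- abs(corMat)
  diag(absCor) <- NA                       # ignore self-correlations
  meanAbsCor <- colMeans(absCor, na.rm = TRUE)
  dropped <- character(0)
  repeat {
    remaining <- setdiff(colnames(absCor), dropped)
    if (length(remaining) < 2) break
    sub <- absCor[remaining, remaining, drop = FALSE]
    # Stop once no remaining pair exceeds the cutoff
    if (!any(sub > cutoff, na.rm = TRUE)) break
    # Locate the most correlated remaining pair
    idx <- which(sub == max(sub, na.rm = TRUE), arr.ind = TRUE)[1, ]
    pair <- remaining[idx]
    # Drop the member of the pair with the larger mean absolute correlation
    dropped <- c(dropped, pair[which.max(meanAbsCor[pair])])
  }
  dropped
}

toDrop <- dropCorrelated(corMat, cutoff = 0.7)
print(toDrop)
```

On the example matrix this sketch flags "X1", "X2" and "X3", matching the result described in the article. On real data the output of findCorrelation may differ from this simplified version, so treat it as an illustration of the idea only.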
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and Human Resource.

1 Response to "Correcting Collinearity with Correlation Matrix in R"
  1. Hi Deepanshu, I was trying to understand your explanation and wanted to ask: what do you mean by mean absolute correlation?
