This article explains how to address a multicollinearity problem using a correlation matrix.
The caret package provides a function called findCorrelation that helps identify highly correlated variables.
How it works -
The absolute values of the pairwise correlations are considered. If two variables have a correlation above the cutoff, the function looks at the mean absolute correlation of each variable with all the other variables and flags the one with the larger mean absolute correlation for removal. The variable with the smaller mean absolute correlation is kept.
Example - Correlation Matrix
Variables to remove from the X1 to X4 cluster - "X1", "X2", "X3", as they have larger mean absolute correlations than X4.
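To make the rule concrete, here is a minimal base R sketch of the idea. It is a simplified illustration, not caret's internal implementation: the helper name drop_by_mean_abs_cor and the toy correlation values are made up for this example, and the diagonal is excluded from the means as an assumption of the sketch.

```r
# Simplified illustration of the "drop the variable with the larger
# mean absolute correlation" rule. NOT caret's actual implementation.
drop_by_mean_abs_cor <- function(cor_mat, cutoff = 0.7) {
  abs_cor <- abs(cor_mat)
  diag(abs_cor) <- NA                          # ignore self-correlation
  mean_abs <- colMeans(abs_cor, na.rm = TRUE)  # mean |r| per variable
  vars <- colnames(cor_mat)
  dropped <- character(0)
  p <- ncol(cor_mat)
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      # Skip pairs where one variable has already been dropped
      if (vars[i] %in% dropped || vars[j] %in% dropped) next
      if (abs_cor[i, j] > cutoff) {
        # Drop whichever member of the pair has the larger mean |r|
        loser <- if (mean_abs[i] > mean_abs[j]) vars[i] else vars[j]
        dropped <- c(dropped, loser)
      }
    }
  }
  dropped
}

# Hand-written toy correlation matrix (made-up values):
# X1 to X4 form a correlated cluster, X5 is nearly unrelated.
vars <- paste0("X", 1:5)
toy_cor <- diag(5)
dimnames(toy_cor) <- list(vars, vars)
toy_cor[upper.tri(toy_cor)] <- c(0.95,
                                 0.90, 0.92,
                                 0.80, 0.78, 0.75,
                                 0.10, 0.20, 0.10, 0.05)
toy_cor[lower.tri(toy_cor)] <- t(toy_cor)[lower.tri(toy_cor)]  # make symmetric

to_drop <- drop_by_mean_abs_cor(toy_cor, cutoff = 0.7)
print(sort(to_drop))
```

On this toy matrix, X4 has the smallest mean absolute correlation within the cluster, so the sketch flags X1, X2 and X3 and keeps X4 and X5.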
R Code -
# Load caret, which provides findCorrelation
library(caret)
# Keep only the numeric columns of dat2
numericData <- dat2[sapply(dat2, is.numeric)]
# Calculate correlation matrix
descrCor <- cor(numericData)
# Find the column indices of highly correlated variables (absolute correlation above 0.7)
highlyCorrelated <- findCorrelation(descrCor, cutoff = 0.7)
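By default findCorrelation returns a vector of column indices, so a natural next step is to drop those columns. The sketch below shows one cautious way to do that. The helper drop_flagged and the toy data frame are hypothetical, and the index 2L stands in for a real findCorrelation result so that the example runs without caret. The guard matters because negative indexing with an empty vector selects zero columns rather than all of them.

```r
# Hypothetical helper: drop the columns at the given indices and
# return the data unchanged when nothing was flagged.
drop_flagged <- function(data, idx) {
  if (length(idx) == 0) return(data)
  data[, -idx, drop = FALSE]
}

# Toy stand-in for numericData (made-up values)
numericData <- data.frame(
  X1 = c(1, 2, 3, 4),
  X2 = c(1.1, 2.0, 3.2, 3.9),
  X3 = c(4, 1, 3, 2)
)

# As if findCorrelation had flagged column 2
reduced <- drop_flagged(numericData, 2L)
print(names(reduced))

# Nothing flagged: all columns are kept
unchanged <- drop_flagged(numericData, integer(0))
print(ncol(unchanged))
```

With the article's objects, the equivalent call would be drop_flagged(numericData, highlyCorrelated).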