Ridge Regression with SAS

Ridge Regression

Ridge regression is an alternative to ordinary multiple regression that helps alleviate multicollinearity (i.e. high correlation between independent variables). Multicollinearity does not bias the least squares estimates, but it inflates their variance and hence the standard errors of the coefficients. Inflated standard errors mean that some variables come out statistically insignificant when they might be significant without multicollinearity. Ridge regression accepts a small amount of bias in exchange for smaller standard errors, which makes the estimates more stable.

Assumptions
1. Linear relationship between the independent variables and the dependent variable
2. Constant variance (homoscedasticity)
3. No outliers
Note : Since ridge regression does not provide confidence limits, normality need not be assumed.

Step I : Standardize Variables
In ridge regression, the first step is to standardize the variables (both dependent and independent) by subtracting their means and dividing by their standard deviations.
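
The standardization described above can be sketched with PROC STANDARD. This is only an illustration: the input dataset mydata and the variables churn, var1, var2 and var3 are borrowed from the ridge regression code later in this post, and the output dataset name mydata_std is a hypothetical choice.

```sas
/* Sketch: rescale each listed variable to mean 0 and standard deviation 1. */
/* mydata_std is a hypothetical name for the standardized output dataset.   */
proc standard data=mydata mean=0 std=1 out=mydata_std;
   var churn var1 var2 var3;
run;

/* Check the result: every variable should now show mean 0 and std 1. */
proc means data=mydata_std mean std;
   var churn var1 var2 var3;
run;
```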

Step II : Add a Bias (k-value) to the Diagonal of the Correlation Matrix
The diagonal elements of the correlation matrix of the independent variables, which would normally be 1, are increased by a small bias, the k-value. This is where the name ridge regression comes from, since you are creating a “ridge” in the correlation matrix by adding a bit to the diagonal values.
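
In matrix terms, the step above can be written as follows. The symbols are introduced here for illustration only: R_XX is the correlation matrix of the independent variables, r_XY is the vector of correlations between each independent variable and the dependent variable, I is the identity matrix, and k is the bias.

$$
\hat{\beta}^{*}(k) = \left( R_{XX} + kI \right)^{-1} r_{XY}, \qquad k \ge 0
$$

With k = 0 this reduces to the ordinary least squares estimate for the standardized variables. Any k > 0 raises each diagonal element from 1 to 1 + k, which keeps the matrix well away from singular even when the independent variables are highly correlated.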

How to choose K-value
1. Use a ridge trace plot to visualize where the regression coefficients stabilize.
2. Choose the smallest value of k at which the regression coefficients have stabilized.
3. Choose the value of k at which the variance inflation factors (VIF) are close to 2.
4. Choose a value of k at which R-Square has not changed significantly.
Note : Increasing k will eventually drive the regression coefficients toward zero. Hence, we should avoid large k-values.
SAS Code : Ridge Regression
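/* Fit ridge regressions for k = 0 to 0.05 in steps of 0.002 */
proc reg data=mydata outvif
         outest=b ridge=0 to 0.05 by 0.002;
   model churn = var1 var2 var3;
   /* Ridge trace plot; suppress the model and statistics text on the plot */
   plot / ridgeplot nomodel nostat;
run;
quit;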
Explanation
1. outest = b : It creates a dataset called b with the model estimates.
2. outvif : It tells SAS to write the VIFs to the outest dataset b.
3. ridge = 0 to 0.05 by 0.002 : It performs ridge regression with k-values starting at 0 and going up to 0.05 in increments of 0.002.
To see the estimates and VIFs corresponding to different k-values, run the following code:
proc print data=b;
run;
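
The full listing of b mixes several row types, so it can help to look at the VIF rows on their own. The sketch below assumes the usual layout of the outest dataset when ridge= and outvif are used: a _TYPE_ variable that marks the VIF rows as 'RIDGEVIF' and a _RIDGE_ variable that holds the k-value. If your SAS release names these differently, adjust accordingly.

```sas
/* Sketch: list only the VIF rows, one row per k-value.                */
/* Assumes _TYPE_ = 'RIDGEVIF' marks VIF rows and _RIDGE_ holds the k. */
proc print data=b noobs;
   where _TYPE_ = 'RIDGEVIF';
   var _RIDGE_ var1 var2 var3;
run;
```

Reading down the _RIDGE_ column shows how quickly the VIFs fall as k grows, which is the information needed for the VIF-based rule for choosing k.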
Choose the value of K and see the estimates corresponding to it.
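
Once a k-value has been picked, one way to get the corresponding estimates is to refit the model with that single value. In the sketch below, k = 0.02 is only a placeholder, not a recommendation, and b_final is a hypothetical dataset name. As before, the ridge rows are assumed to carry _TYPE_ = 'RIDGE'.

```sas
/* Sketch: refit at a single chosen k (0.02 is a placeholder value). */
proc reg data=mydata outest=b_final ridge=0.02 noprint;
   model churn = var1 var2 var3;
run;
quit;

/* Print the ridge estimates for the chosen k. */
proc print data=b_final noobs;
   where _TYPE_ = 'RIDGE';
run;
```

Refitting with one value avoids having to match a decimal k exactly against the _RIDGE_ column of the larger dataset b.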