Ridge Regression
Ridge Regression is an alternative to ordinary (multiple) least squares regression. It helps alleviate the problem of multicollinearity, i.e. high correlation among the independent variables. Multicollinearity does not bias the least squares estimates, but it inflates their variance and therefore the standard errors of the coefficients. Inflated standard errors mean that some variables appear statistically insignificant when they would be significant without multicollinearity. Ridge regression accepts a small amount of bias in the coefficients in exchange for a large reduction in their standard errors, which makes the estimates more reliable.
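The reason this trade-off can pay off: for each coefficient, the mean squared error of an estimator decomposes into squared bias plus variance,

$$\mathrm{MSE}(\hat{\beta}) = \mathrm{Bias}(\hat{\beta})^2 + \mathrm{Var}(\hat{\beta}),$$

so a slightly biased ridge estimate with much smaller variance can have a lower MSE than the unbiased least squares estimate.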
Assumptions
- Linear relationship between the independent variables and the dependent variable
- Constant variance (homoscedasticity)
- No outliers
Step I : Standardize Variables
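One way to carry out this step in SAS is PROC STANDARD, which rescales the listed variables to mean 0 and standard deviation 1. A minimal sketch, assuming the dataset and variable names used in the model code further below (mydata, var1-var3); the output dataset name mydata_std is only illustrative:

proc standard data=mydata mean=0 std=1 out=mydata_std;
   /* rescale the predictors to mean 0 and standard deviation 1 */
   var var1 var2 var3;
run;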
Step II : Change the diagonal elements of the correlation matrix, which would normally be 1, by adding a small bias or k-value to each. This is where the name ridge regression comes from: you create a "ridge" in the correlation matrix by adding a bit to the diagonal values.
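In matrix terms, with standardized variables the ridge estimates can be written as

$$\hat{\beta}_{ridge} = (X'X + kI)^{-1} X'y,$$

where X'X is the correlation matrix of the predictors; adding kI raises each diagonal element from 1 to 1 + k, and setting k = 0 gives back ordinary least squares.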
How to choose the k-value
- Use a ridge trace plot to help visualize where the regression coefficients stabilize.
- Choose the smallest value of k after which the regression coefficients have stabilized.
- Choose the value of k at which the variance inflation factors (VIF) are close to 2.
- Choose the value of k at which R-Square has not changed significantly.
Note : Increasing k will eventually drive the regression coefficients to zero; hence, we should avoid large values of k.
SAS Code : Ridge Regression
/* Ridge regression over a grid of k values from 0 to 0.05 in steps of 0.002;
   coefficient estimates and VIFs for each k are written to the dataset b */
proc reg data=mydata outvif
         outest=b ridge=0 to 0.05 by 0.002;
   model churn = var1 var2 var3;
   plot / ridgeplot nomodel nostat;  /* ridge trace plot of the coefficients */
run;
Explanation
- outest = b - It creates a dataset called b with the model estimates.
- outvif - It tells SAS to also write the variance inflation factors (VIF) to the outest = b dataset.
- ridge = 0 to 0.05 by 0.002 - It performs the ridge regression, with the k-value starting at 0 and going up to 0.05 in increments of 0.002.
To see the estimates and VIFs corresponding to different k-values, run the following code:
proc print data=b;
run;
Choose a value of k and read off the estimates corresponding to it.
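To inspect a single value of k instead of the whole grid, the outest = b dataset can be filtered. A sketch, assuming PROC REG stores the ridge constant in a variable named _RIDGE_ in that dataset; the value k = 0.02 below is just an example:

proc print data=b;
   /* keep only the rows for one chosen ridge constant, e.g. k = 0.02;  */
   /* round() guards against floating-point noise in the generated grid */
   where round(_RIDGE_, 0.001) = 0.02;
run;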