Ridge Regression with SAS

Deepanshu Bhalla
Ridge Regression

Ridge regression is an alternative to ordinary multiple regression that helps alleviate multicollinearity (i.e. high correlation between independent variables). Multicollinearity does not bias the least squares estimates, but it inflates the standard errors of the coefficients. Inflated standard errors mean that some variables come out statistically insignificant when they might be significant without multicollinearity. Ridge regression accepts a small amount of bias in exchange for smaller standard errors, which makes the estimates more stable.

Assumptions
  1. Linear relationship between independent and dependent variable
  2. Constant variance (homoscedasticity)
  3. No outliers
Note : Since ridge regression does not provide confidence limits, normality need not be assumed.

Step I : Standardize Variables
In ridge regression, the first step is to standardize the variables (both dependent and independent) by subtracting their means and dividing by their standard deviations.
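The standardization step can be sketched in SAS with PROC STANDARD. This is only an illustration: the dataset mydata and the variables churn, var1, var2 and var3 are taken from the PROC REG example in this post, and the output dataset name mydata_std is a hypothetical name chosen here. As far as I know, PROC REG with the RIDGE= option works on the correlation scale internally, so this step mainly helps to see what standardizing does to the data.

```sas
/* Standardize the dependent and independent variables to mean 0, std 1. */
/* mydata_std is a hypothetical output dataset name.                     */
proc standard data=mydata mean=0 std=1 out=mydata_std;
   var churn var1 var2 var3;
run;

/* Check the result: means should be about 0 and standard deviations 1 */
proc means data=mydata_std mean std;
   var churn var1 var2 var3;
run;
```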

Step II : Add a k-value to the Diagonal of the Correlation Matrix
The second step is to change the diagonal elements of the correlation matrix, which would normally be 1, by adding a small bias constant, the k-value. This is where the name ridge regression comes from: adding a bit to the diagonal values creates a "ridge" in the correlation matrix.
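In formula terms, the two steps can be written as follows. The symbols are notation introduced here for illustration, not taken from the post: Z is the n x p matrix of standardized independent variables, y* is the standardized dependent variable, R_xx is the correlation matrix of the independent variables, and r_xy is the vector of their correlations with the dependent variable.

```latex
% Correlation matrix of the predictors and their correlations with the response
R_{xx} = \frac{1}{n-1} Z^{\top} Z, \qquad r_{xy} = \frac{1}{n-1} Z^{\top} y^{*}

% Ridge estimate of the standardized coefficients for a given k >= 0:
% k is added to every diagonal element of R_xx before inverting
\hat{\beta}^{*}(k) = \left( R_{xx} + k I_p \right)^{-1} r_{xy}
```

With k = 0 this reduces to the ordinary least squares estimate on the standardized variables. A larger k makes the diagonal dominate the matrix being inverted, which stabilizes the inverse and shrinks the coefficients.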

How to Choose the k-value
  1. Use a ridge trace plot to help visualize where the regression coefficients stabilize.
  2. Choose the smallest value of k after the regression coefficients stabilize.
  3. Choose the value of k where the variance inflation factor (VIF) is close to 2.
  4. Choose the value of k where R-square does not change significantly.
Note : Increasing k will eventually drive the regression coefficients to zero. Hence, we should avoid large k-values.
SAS Code : Ridge Regression
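/* Fit ridge regressions over a grid of k-values; OUTEST= stores the estimates */
proc reg data=mydata outvif
         outest=b ridge=0 to 0.05 by 0.002;
   model churn = var1 var2 var3;
   plot / ridgeplot nomodel nostat;   /* ridge trace plot */
run;
quit;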
Explanation
  1. outest=b : Creates a dataset called b containing the model estimates.
  2. outvif : Tells SAS to write the VIFs to the outest=b dataset.
  3. ridge=0 to 0.05 by 0.002 : Performs ridge regression for k-values starting at 0 and going up to 0.05 in increments of 0.002.
To see the estimates and VIFs corresponding to the different k-values, run the following code:
proc print data=b;
run;
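The printed dataset has several rows per k-value, so it can help to summarize the VIFs before applying the rule of thumb of a VIF close to 2. The sketch below assumes the dataset b produced by the PROC REG call in this post, where the row type is held in _TYPE_ and the k-value in _RIDGE_. The names vif_by_k and max_vif are hypothetical names chosen here.

```sas
/* Keep only the VIF rows and record the largest VIF at each k-value. */
/* vif_by_k and max_vif are hypothetical names.                       */
data vif_by_k;
   set b;
   where _TYPE_ = 'RIDGEVIF';
   max_vif = max(of var1-var3);
   keep _RIDGE_ max_vif;
run;

proc print data=vif_by_k noobs;
run;
```

Scanning this table for the smallest k at which max_vif drops to about 2 gives a candidate k to compare against the ridge trace plot.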
Choose the value of K and see the estimates corresponding to it.
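For example, the rows for one chosen k-value can be pulled out of b as sketched below. The value 0.02 is only a placeholder, not a recommendation; it must be one of the values on the ridge= grid. The tolerance comparison is a precaution against floating-point differences in the stored k-values.

```sas
/* Print the ridge estimates and VIFs for one chosen k (0.02 is a placeholder) */
proc print data=b noobs;
   where _TYPE_ in ('RIDGE', 'RIDGEVIF')
         and abs(_RIDGE_ - 0.02) < 1e-8;
run;
```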
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

3 Responses to "Ridge Regression with SAS"
  1. i think the second line needs correction. " Multicollinearity makes least squares estimates unbiased".. should be "Multicollinearity makes least squares estimates biased".

  2. I am running a ridge regression to identify the contributions of different types of contacts (data contains number of contacts made by sales head to their subordinates, we have 8-10 types of contacts) towards product sales. The scales for the # contact types are similar. The dependent variable is sales (dollar amount). Should I standardize both the independent and dependent variables in this case? Also, what is the idea behind the standardization?
