Checking Homoscedasticity with SAS

Deepanshu Bhalla
In a linear regression model, the residuals should have homogeneous variance. In other words, the variance of the residuals should be approximately equal across all predicted values of the dependent variable.

Example

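The variation in income increases with years of work experience.

Income at 4 years of work experience: 30, 40, 60. Measured against the lowest income, the absolute differences are 10 and 30, the relative differences are 33% and 100%, and the log differences are 0.29 and 0.69.

Income at 8 years of work experience: 90, 120, 180. Measured against the lowest income, the absolute differences are 30 and 90, the relative differences are 33% and 100%, and the log differences are 0.29 and 0.69.

Note : A log transformation of the dependent variable often makes the variance constant. Here the absolute spread triples between 4 and 8 years, while the log differences stay the same.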
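The arithmetic in this example can be checked with a short sketch. It is written in Python rather than SAS purely for illustration; the function name spread_measures is an assumption and does not come from the original post.

```python
import math

def spread_measures(incomes):
    """Return absolute, relative and log differences from the lowest income."""
    ordered = sorted(incomes)
    base, others = ordered[0], ordered[1:]
    absolute = [v - base for v in others]
    relative = [(v - base) / base for v in others]
    log_diff = [math.log(v) - math.log(base) for v in others]
    return absolute, relative, log_diff

# Income groups from the example above, keyed by years of work experience.
for years, incomes in [(4, [30, 40, 60]), (8, [90, 120, 180])]:
    absolute, relative, log_diff = spread_measures(incomes)
    print(years, absolute,
          [f"{r:.0%}" for r in relative],
          [round(d, 2) for d in log_diff])
```

The absolute differences grow with experience, while the relative and log differences are the same in both groups, which is why modelling log(income) can stabilise the variance.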

Consequences of Heteroscedasticity
Under heteroscedasticity, the OLS coefficient estimates remain unbiased and consistent but become inefficient: they are no longer the Best Linear Unbiased Estimators (BLUE). The usual standard errors are also biased, so the hypothesis tests (t-test and F-test) are no longer valid.

How to check Homoscedasticity

1. White Test - The test statistic is asymptotically distributed as chi-square with k-1 degrees of freedom, where k is the number of regressors in the auxiliary regression of the squared residuals, including the constant term.

2. Breusch-Pagan test

3. Lagrange multiplier (LM) test

With PROC AUTOREG (LM Test and Supports CLASS Statement)
proc autoreg data= bhalla.GLMSELECT;
model crime = yr_rnd mealcat some_col / archtest;
output out=r r=yresid;
run;
Note : Check the p-values of the Q statistics and LM tests. A p-value greater than .05 means the null hypothesis of homoscedasticity (no ARCH effects) is not rejected.

With PROC MODEL (White and PAGAN Test)
proc model data= bhalla.GLMSELECT;
parms a1 b1 b2 b3;
api00 = a1 + b1*yr_rnd + b2*mealcat + b3*some_col;
fit api00 / white pagan=(1 yr_rnd mealcat some_col)
out=resid1 outresid;
run;
quit; 
If the p-values of both the White test and the Breusch-Pagan test are greater than .05, the null hypothesis of homoscedasticity is not rejected, so the homogeneity of variance of the residuals can be treated as met.

Note : PROC AUTOREG supports CLASS statement.


Remedy : 

1. Box-Cox transformations of the dependent variable

Box-Cox transformations are used to find potentially nonlinear transformations of a dependent variable.
PROC TRANSREG DATA = bhalla.GLMSELECT  TEST;
MODEL BOXCOX(api00) = IDENTITY(yr_rnd mealcat some_col);
RUN;
Note : Categorical variables can be included with the CLASS transformation, for example CLASS(mealcat), instead of IDENTITY.
Check the best lambda reported by PROC TRANSREG and choose the matching transformation:

Transformation         Best Lambda
Square                 1.5 to 2.5
None                   0.75 to 1.5
Square-root            0.25 to 0.75
Natural log            -0.25 to 0.25
Inverse square-root    -0.75 to -0.25
Reciprocal             -1.5 to -0.75
Inverse square         -2.5 to -1.5
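To see why each lambda range maps to a familiar transformation, here is a minimal sketch of the standard Box-Cox formula, (y^lambda - 1) / lambda, with log(y) at lambda = 0. It is written in Python for illustration only; the function name box_cox is an assumption and is not part of PROC TRANSREG.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)          # limiting case as lambda approaches 0
    return (y ** lam - 1) / lam

# Lambda values at the centre of each range in the table above.
for lam in (2, 1, 0.5, 0, -0.5, -1, -2):
    print(lam, [round(box_cox(y, lam), 3) for y in (1, 4, 9)])
```

Apart from a shift and a rescaling, lambda = 0.5 behaves like a square root, lambda = 0 like a natural log, and lambda = -1 like a reciprocal, which is why the simpler named transformation is normally used when the best lambda falls in the corresponding range.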

2. Weighted Least Squares
If transforming the variable does not solve the problem, we can use weighted least squares.
How to construct weights :

  1. Compute the absolute and squared residuals from the OLS fit
  2. Regress the absolute residuals and the squared residuals on the independent variables. The fitted values estimate the standard deviation and the variance, respectively
  3. Compute the weights as 1 / (estimated standard deviation)^2 or 1 / (estimated variance)
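The steps above can be sketched for a single predictor using the absolute-residual route. This is a minimal Python illustration, not the SAS implementation: the data are made up so that the spread grows with x, and the helper fit_line is a hypothetical name.

```python
def fit_line(x, y, w=None):
    """Weighted least squares for y = a + b*x; w=None gives ordinary least squares."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

# Made-up data (assumption): errors alternate in sign and grow with x.
x = list(range(1, 11))
y = [2 + 1.5 * xi + (-1) ** xi * (0.3 + 0.2 * xi) for xi in x]

# Step 1: OLS fit, then absolute residuals.
a, b = fit_line(x, y)
abs_resid = [abs(yi - (a + b * xi)) for xi, yi in zip(x, y)]

# Step 2: regress the absolute residuals on x; fitted values estimate the std. deviation.
c, d = fit_line(x, abs_resid)
s_hat = [c + d * xi for xi in x]

# Step 3: weights are 1 / s_hat**2.
# Assumes every s_hat is positive; zero or negative fitted values would need handling.
weights = [1 / s ** 2 for s in s_hat]

# Refit with the weights (weighted least squares).
a_w, b_w = fit_line(x, y, weights)
print("OLS:", round(a, 3), round(b, 3))
print("WLS:", round(a_w, 3), round(b_w, 3))
```

Observations with a larger estimated standard deviation receive smaller weights, so the noisier points at high x influence the weighted fit less than they influence the OLS fit.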


SAS Code (Source)

proc reg data=Prob7870.Blood_pr;
   model Y=X;
   output out=WORK.PRED r=residual;
run;

data work.resid;
  set work.pred;
  absresid=abs(residual);
  sqresid=residual**2;
run;

** regress the absolute and squared residuals on X **;
proc reg data=work.resid;
   model absresid=X;
   output out=WORK.s_weights p=s_hat;
   model sqresid=X;
   output out=WORK.v_weights p=v_hat;
run;
quit;

** compute the weights using the estimated standard deviations**;
data work.s_weights;
  set work.s_weights;
  s_weight=1/(s_hat**2);
  label s_weight = "weights using absolute residuals";
run;

** compute the weights using the estimated variances**;
data work.v_weights;
  set work.v_weights;
  v_weight=1/v_hat;
  label v_weight = "weights using squared residuals";
run;


** Do the weighted least squares using the weights from the estimated standard deviation**;
proc reg data=work.s_weights;
weight s_weight;
model Y = X;
run;

** Do the weighted least squares using the weights from the estimated variances**;
proc reg data=work.v_weights;
weight v_weight;
model Y = X;
run;
quit;

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
