In a linear regression model, the residuals should have homogeneous variance (homoscedasticity). In other words, the variance of the residuals is approximately equal for all predicted values of the dependent variable.
Heteroscedasticity is the violation of this assumption. For instance, the variation in income increases with years of work experience.
Example
Income at 4 years of work experience: 30, 40, 60, with absolute differences 10 and 30, relative differences 33% and 100%, and log differences 0.29 and 0.69.
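Income at 8 years of work experience: 90, 120, 180, with absolute differences 30 and 90, relative differences 33% and 100%, and log differences 0.29 and 0.69.
The absolute spread triples between the two groups, while the relative and log differences stay the same.
Note : A log transformation of the dependent variable often makes the variance constant.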
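The differences quoted in the example can be recomputed with a short DATA step. This is only a sketch: the data set names work.income_example and work.income_diff and the variable names are made up for illustration, and each difference is taken against the lowest income in the group, which is an assumption about how the figures were derived.

```sas
/* Income values from the example, grouped by years of experience */
data work.income_example;
    input years income;
    datalines;
4 30
4 40
4 60
8 90
8 120
8 180
;
run;

data work.income_diff;
    set work.income_example;
    by years;
    retain base;
    if first.years then base = income;        /* lowest income in the group */
    else do;
        abs_diff = income - base;             /* absolute difference */
        rel_diff = abs_diff / base;           /* relative difference */
        log_diff = log(income) - log(base);   /* difference of natural logs */
        output;
    end;
run;

proc print data=work.income_diff noobs;
    var years income abs_diff rel_diff log_diff;
    format rel_diff percent8.0 log_diff 5.2;
run;
```

The absolute differences grow with experience, while the relative and log columns should match across the two groups, which is why modeling log income tends to stabilize the variance in a case like this.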
Consequences of Heteroscedasticity
Under heteroscedasticity, the OLS coefficient estimates remain unbiased and consistent but become inefficient: they are no longer the Best Linear Unbiased Estimators (BLUE). In addition, the usual standard errors are biased, so the hypothesis tests (t-test and F-test) are no longer valid.
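To see why the tests break down, it may help to compare the variance formula that standard OLS output relies on with the one that holds when error variances differ. The matrix notation here is an assumption and does not come from the post: $X$ is the design matrix and $\Omega$ is the diagonal matrix of error variances $\sigma_i^2$.

```latex
\begin{aligned}
\text{assumed (homoscedastic):}\quad & \operatorname{Var}(\hat\beta) = \sigma^2 (X'X)^{-1} \\
\text{actual (heteroscedastic):}\quad & \operatorname{Var}(\hat\beta) = (X'X)^{-1} X' \Omega X (X'X)^{-1},
\qquad \Omega = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)
\end{aligned}
```

The two expressions coincide only when every $\sigma_i^2$ equals a common $\sigma^2$. Otherwise the reported standard errors, and the t and F statistics built on them, are computed from the wrong formula.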
How to check Homoscedasticity
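1. White Test - The statistic is n times the R-square from a regression of the squared residuals on the regressors, their squares, and their cross-products. It is asymptotically distributed as chi-square with degrees of freedom equal to the number of regressors in that auxiliary regression, excluding the constant term.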
2. Breusch-Pagan test
3. Lagrange multiplier (LM) test
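As a rough sketch of how the White statistic is built, take a model with two regressors. The symbols are notation assumed here rather than taken from the post: $e_i$ are the OLS residuals, $\alpha_j$ the auxiliary coefficients, and $q$ the number of auxiliary regressors excluding the constant.

```latex
\begin{aligned}
e_i^2 &= \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \alpha_3 x_{1i}^2 + \alpha_4 x_{2i}^2 + \alpha_5 x_{1i} x_{2i} + u_i \\
\text{LM} &= n R^2 \;\overset{a}{\sim}\; \chi^2_{q}, \qquad q = 5 \text{ in this example}
\end{aligned}
```

A large statistic (small p-value) is evidence against homoscedasticity. The Breusch-Pagan test follows the same idea, but the squared residuals are regressed on a set of variables chosen by the analyst.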
With PROC AUTOREG (LM Test and Supports CLASS Statement)
proc autoreg data=bhalla.GLMSELECT;
    model crime = yr_rnd mealcat some_col / archtest;
    output out=r r=yresid;
run;
Note : Check the p-values of the Q statistics and LM tests. A p-value greater than .05 indicates homoscedasticity.
With PROC MODEL (White and PAGAN Test)
proc model data=bhalla.GLMSELECT;
    parms a1 b1 b2 b3;
    api00 = a1 + b1*yr_rnd + b2*mealcat + b3*some_col;
    fit api00 / white pagan=(1 yr_rnd mealcat some_col)
        out=resid1 outresid;
run;
quit;
If the p-values of the White test and the Breusch-Pagan test are greater than .05, the homogeneity of variance of the residuals has been met (homoscedasticity).
Note : PROC AUTOREG supports the CLASS statement.
Remedy :
1. Box-Cox transformations of the dependent variable
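Box-Cox transformations are used to find potentially nonlinear transformations of a dependent variable.
PROC TRANSREG DATA=bhalla.GLMSELECT TEST;
    MODEL BOXCOX(api00) = IDENTITY(yr_rnd mealcat some_col);
RUN;
Note : Categorical variables can be specified with CLASS instead of IDENTITY in the MODEL statement.
Check the lambda value reported by PROC TRANSREG against the table below to choose a transformation.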
Transformation | Best Lambda |
---|---|
Square | 1.5 to 2.5 |
None | 0.75 to 1.5 |
Square-root | 0.25 to 0.75 |
Natural log | -0.25 to 0.25 |
Inverse square-root | -0.75 to -0.25 |
Reciprocal | -1.5 to -0.75 |
Inverse square | -2.5 to -1.5 |
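Once a lambda has been chosen from the table, the transformation can be applied in a DATA step before refitting the model. The following is only a sketch: the value 0.5 is a placeholder rather than a result from this data, and the names work.boxcox and api00_bc are hypothetical.

```sas
/* Placeholder lambda; replace with the value reported by PROC TRANSREG */
%let lambda = 0.5;

data work.boxcox;
    set bhalla.GLMSELECT;
    /* Box-Cox transform: log(y) when lambda is 0, (y**lambda - 1)/lambda otherwise */
    if (&lambda) = 0 then api00_bc = log(api00);
    else api00_bc = (api00**(&lambda) - 1) / (&lambda);
run;

/* Refit the regression on the transformed dependent variable */
proc reg data=work.boxcox;
    model api00_bc = yr_rnd mealcat some_col;
run;
quit;
```

After refitting, the heteroscedasticity tests can be rerun on the transformed model to check whether the variance has stabilized.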
2. Weighted Least Squares
If variable transformation does not solve the problem, we can use weighted least squares.
How to construct weights :
- Compute the absolute and squared residuals from the original regression
- Regress the absolute and squared residuals on the independent variables; the fitted values are the estimated standard deviation and variance
- Compute the weights as the reciprocal of the squared estimated standard deviation, or as the reciprocal of the estimated variance
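In symbols, the steps above can be sketched as follows for a single regressor. The notation is assumed here for illustration: $e_i$ are the residuals from the original fit, and $\hat s_i$ and $\hat v_i$ are the fitted values from the two auxiliary regressions.

```latex
\begin{aligned}
\hat s_i &= \hat\gamma_0 + \hat\gamma_1 x_i \quad (\text{fit of } |e_i| \text{ on } x_i), & w_i^{(s)} &= \frac{1}{\hat s_i^{\,2}} \\
\hat v_i &= \hat\delta_0 + \hat\delta_1 x_i \quad (\text{fit of } e_i^2 \text{ on } x_i), & w_i^{(v)} &= \frac{1}{\hat v_i} \\
\hat\beta_{\text{WLS}} &= \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} w_i \,(y_i - \beta_0 - \beta_1 x_i)^2
\end{aligned}
```

Both choices give less weight to observations with larger estimated variance. A fitted $\hat v_i$ that is zero or negative does not yield a usable weight, so it is worth inspecting the fitted values before running the weighted regression.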
SAS Code (Source)
** fit the original model and save the residuals**;
proc reg data=Prob7870.Blood_pr;
    model Y=X;
    output out=WORK.PRED r=residual;
run;
** compute the absolute and squared residuals**;
data work.resid;
    set work.pred;
    absresid=abs(residual);
    sqresid=residual**2;
run;
** regress the absolute and squared residuals on X to estimate the standard deviation and variance**;
proc reg data=work.resid;
    model absresid=X;
    output out=WORK.s_weights p=s_hat;
    model sqresid=X;
    output out=WORK.v_weights p=v_hat;
run;
** compute the weights using the estimated standard deviations**;
data work.s_weights;
    set work.s_weights;
    s_weight=1/(s_hat**2);
    label s_weight = "weights using absolute residuals";
run;
** compute the weights using the estimated variances**;
data work.v_weights;
    set work.v_weights;
    v_weight=1/v_hat;
    label v_weight = "weights using squared residuals";
run;
** Do the weighted least squares using the weights from the estimated standard deviation**;
proc reg data=work.s_weights;
    weight s_weight;
    model Y = X;
run;
** Do the weighted least squares using the weights from the estimated variances**;
proc reg data=work.v_weights;
    weight v_weight;
    model Y = X;
run;
quit;
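The data set Prob7870.Blood_pr is not included with the post. To try the code, one option is to simulate data whose error spread grows with X. Everything in this sketch is invented for illustration: the data set name work.sim, the seed, the sample size, and the coefficients.

```sas
/* Simulated data: the error standard deviation is proportional to X */
data work.sim;
    call streaminit(12345);                       /* fixed seed for reproducibility */
    do i = 1 to 200;
        X = 1 + 9*rand('uniform');                /* X between 1 and 10 */
        Y = 2 + 3*X + rand('normal', 0, 0.5*X);   /* noise grows with X */
        output;
    end;
    drop i;
run;
```

Pointing the first PROC REG step at work.sim instead of Prob7870.Blood_pr should let the remaining steps run unchanged, since they only rely on the variables Y and X.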