In a linear regression model, there should be **homogeneity of variance** of the residuals (homoscedasticity). In other words, the variance of the residuals is approximately equal for all predicted values of the dependent variable.

**Example**

The variation in income increases with years of work experience.

Income at 4 years of work experience: 30, 40, 60. Relative to the lowest value, the absolute differences are 10 and 30, the relative differences are 33% and 100%, and the log differences are 0.29 and 0.69.

Income at 8 years of work experience: 90, 120, 180. Relative to the lowest value, the absolute differences are 30 and 90, the relative differences are 33% and 100%, and the log differences are 0.29 and 0.69.

The absolute spread triples between the two groups, while the relative and log differences stay the same.

**Note :** A log transformation of the dependent variable often makes the variance constant. More generally, Box-Cox transformations are used to find potentially nonlinear transformations of a dependent variable.
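The arithmetic in the income example can be reproduced with a short DATA step. This is only a sketch: the data set names `work.income` and `work.income_diff` and the variable names are made up for illustration, and the differences are taken from the lowest income in each experience group.

```sas
/* Illustrative data: the two income groups from the example */
data work.income;
    input experience income;
    datalines;
4 30
4 40
4 60
8 90
8 120
8 180
;
run;

/* Differences from the first (lowest) income within each experience group */
data work.income_diff;
    set work.income;
    by experience;
    retain base;
    if first.experience then base = income;
    abs_diff = income - base;             /* grows with experience */
    rel_diff = (income - base) / base;    /* same in both groups */
    log_diff = log(income) - log(base);   /* same in both groups */
run;

proc print data=work.income_diff noobs;
    format rel_diff percent8. log_diff 5.2;
run;
```

The first row of each group is the baseline, so its differences are zero; the remaining rows carry the values quoted in the example.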

**Consequences of Heteroscedasticity**

The OLS coefficient estimates remain unbiased and consistent, but they are inefficient: they are no longer the Best Linear Unbiased Estimators (BLUE). The usual standard errors are also biased, so the hypothesis tests (t-test and F-test) are no longer valid.

**How to check Homoscedasticity**

1. **White test -** The test statistic is asymptotically distributed as chi-square with k-1 degrees of freedom, where k is the number of regressors in the auxiliary regression of the squared residuals, including the constant term.

2. **Breusch-Pagan test**

3. **Lagrange multiplier (LM) test**

**With PROC AUTOREG (LM Test and Supports CLASS Statement)**

proc autoreg data= bhalla.GLMSELECT;

model crime = yr_rnd mealcat some_col /archtest;

output out=r r=yresid;

run;

**Note : Check the p-values of the Q statistics and LM tests. A p-value greater than .05 means the null hypothesis of homoscedasticity is not rejected.**

**With PROC MODEL (White and PAGAN Test)**

proc model data= bhalla.GLMSELECT;

parms a1 b1 b2 b3;

api00 = a1 + b1*yr_rnd + b2*mealcat + b3*some_col;

fit api00 / white pagan=(1 yr_rnd mealcat some_col)

out=resid1 outresid;

run;

quit;

**If the p-values of the White test and the Breusch-Pagan test are greater than .05, the null hypothesis of homoscedasticity is not rejected, i.e. the homogeneity of variance of the residuals has been met.**

**Note : PROC AUTOREG supports CLASS statement.**
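As a complement to the tests above, a residual-versus-predicted plot gives a quick visual check. The sketch below is an addition rather than part of the original code: it assumes ODS Graphics is available and reuses the data set and model from the PROC MODEL example. The SPEC option asks PROC REG for its test of first and second moment specification, which is a White-type test.

```sas
ods graphics on;

/* Residuals vs. predicted values, plus the SPEC (White-type) test */
proc reg data=bhalla.GLMSELECT plots(only)=residualbypredicted;
    model api00 = yr_rnd mealcat some_col / spec;
run;
quit;

ods graphics off;
```

A funnel shape in the plot (residual spread widening or narrowing with the predicted value) suggests heteroscedasticity; a roughly even band around zero is what a homoscedastic fit tends to look like.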

**Remedy :**

**1. Box-Cox transformations of the dependent variable**

PROC TRANSREG DATA = bhalla.GLMSELECT TEST;

MODEL BOXCOX(api00) = IDENTITY(yr_rnd mealcat some_col);

RUN;

**Note :** Categorical variables can be specified with **CLASS** instead of **IDENTITY** in the MODEL statement.

Check the lambda score generated from PROC TRANSREG and pick the matching transformation:

Transformation | Best Lambda |
---|---|
Square | 1.5 to 2.5 |
None | 0.75 to 1.5 |
Square-root | 0.25 to 0.75 |
Natural log | -0.25 to 0.25 |
Inverse square-root | -0.75 to -0.25 |
Reciprocal | -1.5 to -0.75 |
Inverse square | -2.5 to -1.5 |
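Once a lambda has been chosen, the transformation can be applied in a DATA step and the model refitted. The following is a minimal sketch, not the original author's code: the macro variable `lambda`, its value 0.5, and the names `work.boxcox` and `api00_bc` are placeholders, and it uses the standard Box-Cox form, which requires a positive dependent variable.

```sas
/* Hypothetical value: replace with the lambda reported by PROC TRANSREG */
%let lambda = 0.5;

data work.boxcox;
    set bhalla.GLMSELECT;
    /* Box-Cox transform; only defined for api00 > 0 */
    if api00 > 0 then do;
        if abs(&lambda) < 1e-8 then api00_bc = log(api00);      /* lambda = 0 */
        else api00_bc = (api00**(&lambda) - 1) / (&lambda);
    end;
run;

/* Refit the regression on the transformed dependent variable */
proc reg data=work.boxcox;
    model api00_bc = yr_rnd mealcat some_col;
run;
quit;
```

Rounding lambda to the nearest convenient value in the table (for example 0.5 for a square root) keeps the transformed variable easier to interpret than an arbitrary exponent.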

**2. Weighted Least Squares**

If variable transformation does not solve the problem, we can use weighted least squares.

**How to construct weights :**

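- Compute the absolute and squared residuals from the ordinary least squares fit
- Regress the absolute and squared residuals on the independent variables; the predicted values are the estimated standard deviation and variance, respectively
- Compute the weights as the reciprocal of the squared estimated standard deviation, or as the reciprocal of the estimated variance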
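In symbols, the steps above can be read as follows. The notation is introduced here for illustration and is not from the original post: $e_i$ are the OLS residuals, $x_i$ the independent variable, and $\hat{s}_i$, $\hat{v}_i$ the predicted values from the two auxiliary regressions.

$$
|e_i| = \gamma_0 + \gamma_1 x_i + u_i
\;\;\Rightarrow\;\;
\hat{s}_i = \hat{\gamma}_0 + \hat{\gamma}_1 x_i,
\qquad
w_i^{(s)} = \frac{1}{\hat{s}_i^{\,2}}
$$

$$
e_i^2 = \delta_0 + \delta_1 x_i + \varepsilon_i
\;\;\Rightarrow\;\;
\hat{v}_i = \hat{\delta}_0 + \hat{\delta}_1 x_i,
\qquad
w_i^{(v)} = \frac{1}{\hat{v}_i}
$$

$$
\hat{\beta}_{\text{WLS}} = \arg\min_{\beta_0,\beta_1} \sum_i w_i \,(y_i - \beta_0 - \beta_1 x_i)^2
$$

Either set of weights gives less influence to observations whose estimated error variance is large.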

**SAS Code (Source)**

proc reg data=Prob7870.Blood_pr;

model Y=X;

output out=WORK.PRED r=residual;

run;

data work.resid;

set work.pred;

absresid=abs(residual);

sqresid=residual**2;

proc reg data=work.resid;

model absresid=X;

output out=WORK.s_weights p=s_hat;

model sqresid=X;

output out=WORK.v_weights p=v_hat;

run;

** compute the weights using the estimated standard deviations**;

data work.s_weights;

set work.s_weights;

s_weight=1/(s_hat**2);

label s_weight = "weights using absolute residuals";

** compute the weights using the estimated variances**;

data work.v_weights;

set work.v_weights;

v_weight=1/v_hat;

label v_weight = "weights using squared residuals";

** Do the weighted least squares using the weights from the estimated standard deviation**;

proc reg data=work.s_weights;

weight s_weight;

model Y = X;

run;

** Do the weighted least squares using the weights from the estimated variances**;

proc reg data=work.v_weights;

weight v_weight;

model Y = X;

run;

quit;
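One caveat, offered as an addition rather than something the original code handles: a linear regression of the squared residuals on X can produce fitted values at or below zero, and those cannot be turned into valid weights. A minimal guard, assuming the `work.v_weights` data set from the code above and a hypothetical output name `work.v_weights_checked`, might look like this:

```sas
data work.v_weights_checked;
    set work.v_weights;
    /* A fitted variance at or below zero cannot be inverted into a valid weight */
    if v_hat > 0 then v_weight = 1 / v_hat;
    else v_weight = .;   /* missing weight: the observation drops out of the weighted fit */
run;

/* Count how many observations lost their weight */
proc means data=work.v_weights_checked n nmiss min max;
    var v_hat v_weight;
run;
```

If many observations end up with a missing weight, the weights from the absolute residuals or a different variance model may be a better choice.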
