The question arises: "What's special about PROC GLMSELECT? Why not use PROC REG or PROC GLM to build a linear regression model?"
- PROC GLMSELECT supports selection among categorical variables through the CLASS statement, whereas PROC REG does not support the CLASS statement.
- PROC GLMSELECT supports the BACKWARD, FORWARD, and STEPWISE selection techniques, whereas PROC GLM does not support these algorithms.
/*******************************************************************/
/**Categorical Variables with CLASS Statement and PARAM option************/
/*******************************************************************/
PROC GLMSELECT data=bhalla.GLMSELECT;
class mealcat / param=ref order=data;
model api00 = yr_rnd mealcat some_col
/ selection=stepwise(select=SL) showpvalues stats=all STB;
run;
The SELECT= option specifies the criterion that PROC GLMSELECT uses to determine the order in which effects enter and/or leave at each step of the specified selection method.
SELECT=SL requests the traditional approach, in which effects enter and leave the model based on their significance level.
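To make the role of SELECT= concrete, here is a minimal sketch that runs the same stepwise selection twice: once by significance level with explicit entry and stay thresholds, and once by an information criterion. It assumes the same bhalla.GLMSELECT data set and variables used in this post. The 0.05 thresholds and the choice of SBC are illustrative assumptions, not recommendations. The suboptions are written in parentheses after SELECTION=, which is how the SAS documentation specifies them.

```sas
/* Stepwise selection by significance level.                      */
/* SLENTRY= and SLSTAY= set the entry and stay thresholds;        */
/* the 0.05 values are illustrative assumptions.                  */
proc glmselect data=bhalla.GLMSELECT;
   class mealcat / param=ref order=data;
   model api00 = yr_rnd mealcat some_col
         / selection=stepwise(select=SL slentry=0.05 slstay=0.05)
           showpvalues stats=all stb;
run;

/* Same candidate effects, but effects enter and leave by SBC. */
proc glmselect data=bhalla.GLMSELECT;
   class mealcat / param=ref order=data;
   model api00 = yr_rnd mealcat some_col
         / selection=stepwise(select=SBC)
           stats=all stb;
run;
```

Comparing the selection summaries of the two runs shows whether the order of entry, or the final set of effects, depends on the criterion.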
/*******************************************************************/
/************** Scoring with SCORE statement **************************/
/*******************************************************************/
PROC GLMSELECT data=ABCD.elemapi2;
class mealcat / param=ref order=data;
model api00 = yr_rnd mealcat some_col
/ selection=stepwise(select=SL) showpvalues stats=all STB;
score data = validation out= pred_val;
run;
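Once a data set has been scored, a natural follow-up is to measure how well the selected model predicts it. The sketch below is one way to do that, assuming the validation data set also contains the observed response api00. The names pred, resid, val_err, and sq_err are hypothetical and chosen only for this illustration.

```sas
/* Sketch only: assumes validation contains the observed api00.   */
proc glmselect data=ABCD.elemapi2;
   class mealcat / param=ref order=data;
   model api00 = yr_rnd mealcat some_col
         / selection=stepwise(select=SL);
   /* Name the scored columns explicitly rather than relying on   */
   /* default variable names.                                     */
   score data=validation out=pred_val predicted=pred residual=resid;
run;

/* Squared prediction error for each scored observation. */
data val_err;
   set pred_val;
   sq_err = resid**2;
run;

/* The mean of sq_err is the mean squared error on the validation data. */
proc means data=val_err n mean;
   var sq_err;
run;
```

Observations with a missing response get a missing residual, so PROC MEANS leaves them out of the average.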
/*******************************************************************/
/******* Choosing the Best Linear Regression Model with CHOOSE***********/
/*******************************************************************/
PROC GLMSELECT data=bhalla.GLMSELECT;
class mealcat / param=ref order=data;
model api00 = yr_rnd mealcat some_col
/ selection=stepwise(choose=ADJRSQ) showpvalues stats=all STB;
run;
The CHOOSE= option selects, from the list of models at the steps of the selection process, the model that yields the best value of the specified criterion. If the optimal value of the specified criterion occurs for models at more than one step, the model with the smallest number of parameters is chosen. If you do not specify the CHOOSE= option, the selected model is the model at the final step of the selection process.
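The difference between the criterion that drives each step and the criterion that picks the final model is easier to see when both are given explicitly. In this minimal sketch, effects enter and leave by significance level, while the final model is the step with the best SBC. SBC stands in for ADJRSQ only to show that the criterion is interchangeable. DETAILS=STEPS is added so that each step is printed for inspection.

```sas
/* SELECT= drives which effect enters or leaves at each step;     */
/* CHOOSE= picks the final model from among the steps.            */
/* Using SBC here is an illustrative assumption.                  */
proc glmselect data=bhalla.GLMSELECT;
   class mealcat / param=ref order=data;
   model api00 = yr_rnd mealcat some_col
         / selection=stepwise(select=SL choose=SBC)
           details=steps showpvalues stats=all stb;
run;
```

If the chosen step is not the last one, the selection summary shows the later steps as well, which makes it clear that CHOOSE= can pick a smaller model than the one at which the process stopped.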
/*******************************************************************/
/* Assumption I : Errors (Residuals) should be normally distributed*/
/*******************************************************************/
PROC GLMSELECT data=bhalla.GLMSELECT;
class mealcat / param=ref order=data;
model api00 = yr_rnd mealcat some_col
/ selection=stepwise(choose=ADJRSQ) showpvalues stats=all STB;
output out=stdres p= predict r = resid;
run;
proc univariate data=stdres normal;
var resid;
run;
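The normality tests above can be backed up with a visual check. The following sketch assumes the stdres data set and its resid variable created by the OUTPUT statement above. It draws a histogram with a fitted normal curve and a normal Q-Q plot.

```sas
/* Assumes stdres (with variable resid) from the OUTPUT statement above. */
/* NOPRINT suppresses the summary tables so only the plots are produced. */
proc univariate data=stdres noprint;
   var resid;
   histogram resid / normal;                 /* overlay a fitted normal density   */
   qqplot resid / normal(mu=est sigma=est);  /* reference line from estimated     */
                                             /* mean and standard deviation       */
run;
```

Points that stay close to the reference line in the Q-Q plot are consistent with normally distributed residuals.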
/*****************************************************************/
/*********** Assumption II : Checking Heteroscedasticity *****************/
/****************************************************************/
proc autoreg data= bhalla.GLMSELECT;
class mealcat;
model api00 = yr_rnd mealcat some_col / archtest;
output out=r r=yresid;
run;
Note: Check the p-values of the Q statistics and LM tests. P-values greater than 0.05 mean that the null hypothesis of no ARCH effects is not rejected, which is consistent with homoscedasticity.
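A plot of residuals against predicted values is a common informal complement to these tests. The sketch below assumes the stdres data set, with the variables predict and resid, created by the OUTPUT statement in the normality check earlier in this post.

```sas
/* Assumes stdres with variables predict and resid from the earlier */
/* PROC GLMSELECT OUTPUT statement.                                 */
proc sgplot data=stdres;
   scatter x=predict y=resid;   /* residuals against fitted values   */
   refline 0 / axis=y;          /* horizontal reference line at zero */
run;
```

A roughly constant vertical spread around the zero line is consistent with homoscedasticity, while a funnel shape suggests that the error variance changes with the fitted value.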