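When building a predictive model, missing values usually need to be imputed before fitting, because many modeling procedures drop any observation that has a missing predictor. There are several ways to treat missing data.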
The following options are commonly used to impute missing values:
- Fill missing values with the mean of a continuous (real-valued numeric) variable that has no outliers.
- Fill missing values with the median of a continuous (real-valued numeric) variable that has outliers.
- Fill missing values with the median of an ordinal categorical variable.
- Fill missing values with the mode of a nominal categorical variable.
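Before choosing between the mean and the median, it helps to check how many values are missing and how skewed each variable is. The following is a minimal sketch, assuming a dataset named readin1 with numeric variables dbp and scl (the names used in the example call at the end of this post; substitute your own).

```sas
/* Summarize missingness and distribution shape before choosing a statistic. */
/* readin1, dbp and scl are assumed names; replace them with your own.       */
proc means data=readin1 n nmiss mean median skewness min max;
   var dbp scl;
run;
```

A mean that sits far from the median, or a skewness far from zero, suggests outliers or a skewed distribution, in which case the median is usually the safer fill value.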
SAS Macro
The following macro replaces missing values with the mean, median, or mode of each variable passed to it and saves the result in a new data set.
*****************************************************************/;
************* Imputing Missing Data **************************/;
*****************************************************************/;
*Input : Specify your input dataset name (raw data).
*Stats : Specify mean, median or mode for replacing missing data.
*Vars : Specify the variables in which missing values exist.
- Multiple variables should be separated by a space.
- A list of variables can be referred to as var1-var25.
- For all numeric variables, use the _numeric_ keyword.
*Output : Specify the dataset where you want the output file to be saved.
/****************************************************************/;
%macro replace (input=prac.file1, stats=median, vars=Q1-Q5, output=replaced);

   /* Compute the requested statistic: one row, one column per variable */
   proc univariate data=&input noprint;
      var &vars;
      output out=dummy &stats= &vars;
   run;

   /* Convert to vertical: one row per variable, statistic stored in COL1 */
   proc transpose data=dummy out=dummy;
   run;

   /* Replace each missing value with the statistic for its variable */
   data &output;
      set &input;
      array vars &vars;
      do i = 1 to dim(vars);
         /* Read the i-th statistic directly by observation number */
         set dummy(keep=col1) point=i;
         vars(i) = coalesce(vars(i), col1);
      end;
      drop col1;
   run;

%mend replace;
Options mprint nosymbolgen;
%replace (input= readin1,stats= mode,vars= dbp scl,output=replaced);
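To see the macro at work, here is a small sketch with made-up data. The dataset name demo, its values, and the output name demo_filled are hypothetical; the %replace macro is the one defined above and must be compiled first.

```sas
/* Hypothetical toy data: dbp and scl each have one missing value. */
data demo;
   input id dbp scl;
   datalines;
1 80 220
2 . 180
3 90 .
4 80 180
5 70 200
;
run;

/* Fill missing values with the median of each variable. */
%replace(input=demo, stats=median, vars=dbp scl, output=demo_filled);

/* Confirm that no missing values remain in the imputed variables. */
proc means data=demo_filled n nmiss;
   var dbp scl;
run;

/* Inspect the imputed rows. */
proc print data=demo_filled noobs;
run;
```

With these values the median of dbp is 80 and the median of scl is 190, so the missing dbp in row 2 should become 80 and the missing scl in row 3 should become 190. The NMISS column from PROC MEANS should show 0 for both variables.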