In this article, we explain how to impute missing values using PROC STDIZE in SAS.
Let's create two different sample datasets for training and validation for demonstration purposes.
data training;
  input v1 v2 v3;
  datalines;
10 20 30
. 25 35
5 . 25
20 30 .
;
run;

data validation;
  input v1 v2 v3;
  datalines;
15 18 20
15 . 25
. 21 30
10 30 .
;
run;
Replace Missing Values with Zero
The REPONLY option in PROC STDIZE tells SAS to replace missing values only, without standardizing the data. You specify the numeric replacement value with the MISSING= option. The following code replaces missing values in the variables v1, v2, and v3 of the "training" dataset with zero (0) and writes the filled data to "training2".
PROC STDIZE DATA=TRAINING OUT=TRAINING2 MISSING=0 REPONLY;
  VAR V1 V2 V3;
RUN;
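One way to confirm that the replacement worked is to count missing values before and after the step. The following is a minimal sketch, assuming the PROC STDIZE step above has already run; it uses the N and NMISS statistics of PROC MEANS, which is not part of the original example.

```sas
/* Count non-missing (N) and missing (NMISS) values before imputation */
proc means data=training n nmiss;
  var v1 v2 v3;
run;

/* Same check after imputation: NMISS should be 0 for every variable */
proc means data=training2 n nmiss;
  var v1 v2 v3;
run;
```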
Replace Missing Values with Mean
The following code replaces missing values in the variables v1, v2, and v3 of the "training" dataset with the mean of the non-missing values and writes the filled data to "training2". The "outstat=info" option saves the location and scale statistics computed from the training data (here, the variable means) to a dataset named "info".
proc stdize data=training out=training2 method=mean reponly outstat=info;
  var v1 v2 v3;
run;
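To see what was stored, you can print the "info" dataset. This is a small sketch that assumes the step above has run. The exact set of rows may vary by SAS version, but the row with _TYPE_ equal to LOCATION should hold the means used for imputation. For the sample training data these are about 11.67 for v1, 25 for v2, and 30 for v3.

```sas
/* Inspect the statistics saved by OUTSTAT=; the LOCATION row holds the means */
proc print data=info noobs;
run;
```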
You can also impute with the median instead of the mean by specifying method=median.
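A minimal sketch of the median variant follows. The output dataset names "training_med" and "info_med" are hypothetical and chosen only so that the earlier results are not overwritten. With the sample training data, the median of v1 is 10, compared with a mean of about 11.67.

```sas
/* Replace missing values with the median of the non-missing values.     */
/* training_med and info_med are illustrative names, not from the article */
proc stdize data=training out=training_med method=median reponly outstat=info_med;
  var v1 v2 v3;
run;
```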
Impute Missing Values in Validation Data Using Training Data
To fill missing values in the validation data using statistics computed from the training data, use the method=in(dataset) option in PROC STDIZE, where the dataset is the one saved earlier with outstat=.
proc stdize data=validation out=validation2 reponly method=in(info);
  var v1 v2 v3;
run;
The resulting dataset "validation2" contains the validation data with its missing values filled with the mean values from the training dataset.
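To check the result, you can print "validation2". The sketch below assumes the steps above have run on the sample data. In that case the missing v2 in the second row should become 25, the missing v1 in the third row about 11.67, and the missing v3 in the fourth row 30. These are the training means, not the validation means.

```sas
/* Print the imputed validation data; filled values come from the training means */
proc print data=validation2 noobs;
  var v1 v2 v3;
run;
```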