In this tutorial we will cover how to standardize variables in SAS using PROC STDIZE.
Standardization
Standardization refers to subtracting a variable's mean from each of its values and dividing the result by the variable's standard deviation. The purpose of standardization is to transform numerical variables to a common scale, making them more easily comparable. Standardization removes the original units of measurement and centers the data around zero, with a standard deviation of one.
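Written as a formula, the transformation looks like this. The symbols are notation introduced here for illustration: x_i is an observed value, x-bar is the sample mean, s is the sample standard deviation (assumed to use the n-1 divisor), and z_i is the standardized value.

```latex
z_i = \frac{x_i - \bar{x}}{s},
\qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2}
```

A value equal to the mean maps to 0, and a value one standard deviation above the mean maps to 1.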
How to Use PROC STDIZE to Standardize Data
By using PROC STDIZE with the METHOD=STD option, we can standardize variables using the sample mean and the sample standard deviation. In the example below, we use the sashelp.class dataset and standardize the Height and Weight variables.
proc stdize data=sashelp.class out=readin method=std;
  var Height Weight;
run;
How to Validate Standardization
To confirm that standardization has been applied correctly to variables, you can calculate the mean and standard deviation of the standardized variables. Verify that the mean of the standardized variables is approximately zero and the standard deviation is approximately one.
PROC MEANS is a SAS procedure used for calculating means and standard deviations for one or more variables in a dataset. Here we are using the output dataset readin generated by PROC STDIZE.
proc means data=readin Mean StdDev maxdec=2;
  var Height Weight;
run;
In the PROC MEANS output, both standardized variables have a mean of 0 and a standard deviation of 1.
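As an additional cross-check, the z-scores can also be computed by hand and compared with the PROC STDIZE result. The following is a minimal sketch and not part of the original example: the table name manual and the column names Height_z and Weight_z are hypothetical, and it assumes PROC SQL's default behavior of remerging summary statistics with each row (SAS writes a note to the log when it does this).

```sas
/* Hand-computed z-scores; 'manual', 'Height_z' and 'Weight_z' are illustrative names */
proc sql;
  create table manual as
  select Name,
         /* mean() and std() with one argument act as summary functions here */
         (Height - mean(Height)) / std(Height) as Height_z,
         (Weight - mean(Weight)) / std(Weight) as Weight_z
  from sashelp.class;
quit;

/* Summarize the hand-computed columns */
proc means data=manual mean std maxdec=2;
  var Height_z Weight_z;
run;
```

If the sketch's assumptions hold, Height_z and Weight_z should agree with the standardized Height and Weight that PROC STDIZE produced with METHOD=STD, up to rounding.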
Standardization by Group
To standardize within groups, we can use the BY statement in PROC STDIZE. In this case, the variable Sex serves as the grouping variable with two distinct categories: male (M) and female (F). Each variable is then standardized using the mean and standard deviation of its own group.
Make sure the data is sorted by the grouping variable before using the BY statement in PROC STDIZE. You can sort the data using PROC SORT. Sorting is not necessary if the data is already ordered by the grouping variable.
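/* Sort by the grouping variable so the BY statement can be used */
proc sort data=sashelp.class out=students;
  by Sex;
run;

/* Standardize Height and Weight separately within each Sex group */
proc stdize data=students out=readin method=std;
  var Height Weight;
  by Sex;
run;

/* Check the mean and standard deviation within each group */
proc means data=readin Mean StdDev maxdec=2;
  class Sex;
  var Height Weight;
run;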
Within each group, both Height and Weight have a mean of 0 and a standard deviation of 1.