In this tutorial we will cover how to standardize variables in SAS using PROC STDIZE.

## Standardization

Standardization means subtracting a variable's mean from each of its values and dividing the result by the variable's standard deviation. Its purpose is to put numerical variables on a common scale so they are easier to compare. Standardization removes the original units of measurement: the transformed variable is centered at zero and has a standard deviation of one.
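Written as a formula, with $\bar{x}$ the sample mean and $s$ the sample standard deviation (assuming the usual $n-1$ divisor, which is the default for SAS variance calculations):

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_j - \bar{x}\right)^2}
$$

As an illustrative example with made-up numbers, a value of 70 in a variable with mean 60 and standard deviation 5 becomes $z = (70 - 60)/5 = 2$, meaning it lies two standard deviations above the mean.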

## How to Use PROC STDIZE to Standardize Data

By using `PROC STDIZE` with the `METHOD=STD` option, we can standardize variables using the sample mean and the sample standard deviation. In the example below, we use the `sashelp.class` dataset and standardize the **Height** and **Weight** variables.

    proc stdize data=sashelp.class out=readin method=std;
        var Height Weight;
    run;
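To make the arithmetic behind `METHOD=STD` concrete, the same z-scores can be computed by hand. The following is only a sketch for cross-checking, not a replacement for PROC STDIZE. The table name `manual` and the column names `Height_z` and `Weight_z` are hypothetical names chosen for illustration.

```sas
/* Illustrative cross-check: compute z-scores manually with PROC SQL.  */
/* "manual", "Height_z" and "Weight_z" are hypothetical names.         */
/* MEAN() and STD() are summary functions, so PROC SQL remerges the    */
/* overall statistics with each row (it writes a NOTE to the log).     */
proc sql;
    create table manual as
    select Name,
           (Height - mean(Height)) / std(Height) as Height_z,
           (Weight - mean(Weight)) / std(Weight) as Weight_z
    from sashelp.class;
quit;
```

The values in `Height_z` and `Weight_z` should match the standardized Height and Weight columns that PROC STDIZE writes to its output dataset, up to floating-point rounding.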

### How to Validate Standardization

To confirm that standardization has been applied correctly to variables, you can calculate the mean and standard deviation of the standardized variables. Verify that the **mean of the standardized variables is approximately zero and the standard deviation is approximately one**.

PROC MEANS is a SAS procedure that calculates summary statistics, such as the mean and standard deviation, for one or more variables in a dataset. Here we run it on `readin`, the output dataset generated by PROC STDIZE.

    proc means data=readin Mean StdDev ndec=2;
        var Height Weight;
    run;

In the PROC MEANS output, both standardized variables have **mean=0** and **standard deviation=1**.

## Standardization by Group

To apply the standardization by group, we can use the **BY statement** in PROC STDIZE. In this case, the variable "**Sex**" serves as the grouping variable with two distinct categories: Male and Female.

Make sure the data is sorted by the grouping variable before using the BY statement in PROC STDIZE. You can sort the data with PROC SORT. Sorting is not necessary if the data is already ordered by the grouping variable.

    proc sort data=sashelp.class out=students;
        by Sex;
    run;

    proc stdize data=students out=readin method=std;
        var Height Weight;
        by Sex;
    run;

    proc means data=readin Mean StdDev ndec=2;
        class Sex;
        var Height Weight;
    run;

Each group has **mean=0** and **standard deviation=1** for both variables Height and Weight.
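To see what by-group standardization does to each row, here is a hand-computed sketch in PROC SQL. It is an illustration under stated assumptions rather than the recommended approach. The table name `manual_by_sex` and the `_z` column names are hypothetical.

```sas
/* Illustrative only: by-group z-scores computed manually.              */
/* "manual_by_sex", "Height_z" and "Weight_z" are hypothetical names.   */
/* With GROUP BY Sex, MEAN() and STD() are evaluated within each group  */
/* and remerged with the detail rows, so every student is scaled by     */
/* the statistics of their own group. No prior sort is needed here.     */
proc sql;
    create table manual_by_sex as
    select Name,
           Sex,
           (Height - mean(Height)) / std(Height) as Height_z,
           (Weight - mean(Weight)) / std(Weight) as Weight_z
    from sashelp.class
    group by Sex;
quit;
```

Because each row is centered and scaled with its own group's statistics, a standardized value of zero means "equal to the average for that sex", not "equal to the overall average".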
