PROC SUMMARY in SAS: Learn with Examples

Deepanshu Bhalla

This tutorial explains how to use PROC SUMMARY in SAS, along with examples.

PROC SUMMARY is a powerful SAS procedure that can be used to calculate descriptive statistics for variables either across all observations or within specific groups of observations.


How to use PROC SUMMARY?

Below is the syntax of PROC SUMMARY.

PROC SUMMARY DATA=input_dataset;
   BY variable;
   CLASS variable(s) </ options>;
   VAR variable(s);
   OUTPUT OUT=output_dataset </ options>;
RUN;

Please refer to the explanation of the PROC SUMMARY statements below.

  • DATA=input_dataset: Specifies the input dataset containing the variables you want to summarize.
  • BY variable(s): Specifies the variable(s) that define BY groups. PROC SUMMARY calculates separate statistics for each BY group and reports each group in its own table. The input dataset must already be sorted (or indexed) by the BY variables.
  • CLASS variable(s): Specifies the list of classification variables. It calculates summary statistics for each level of the variables listed here and reports all groups in a single table. Unlike BY, it does not require sorted input.
  • VAR variable(s): Specifies the list of variables for which you want to calculate summary statistics. You can include multiple variables separated by spaces.
  • OUTPUT OUT=output_dataset: Specifies the output dataset where the summarized results will be stored. You can choose a name for the output dataset.

We are using the SAS built-in dataset SASHELP.CARS, which contains information about various cars, including their specifications and attributes.

Variable Type: Variable Names

  • Continuous variables: MSRP, INVOICE, LENGTH, Horsepower, MPG_City, MPG_Highway
  • Classification variables: Type, DriveTrain, Origin, Make

In the code below, we are summarising three numeric variables MSRP, INVOICE and LENGTH. The variable "MSRP" refers to the Manufacturer's suggested retail price of the car. The variable "INVOICE" refers to the invoice price of the car. The variable "LENGTH" refers to the length of the car.

PROC SUMMARY DATA = SASHELP.CARS PRINT;
VAR MSRP INVOICE LENGTH;
RUN;

By default, PROC SUMMARY produces N, Mean, Standard Deviation, Minimum and Maximum statistics.

The PRINT option prints the results in the output window. By default, PROC SUMMARY does not print results, so you must include the PRINT option to view them.

Common Statistical Options of PROC SUMMARY

Below is a list of common statistical options of PROC SUMMARY.

  • N: Number of non-missing values
  • MEAN: Mean
  • STD: Standard Deviation
  • MIN: Minimum value
  • MAX: Maximum value
  • NMISS: Number of missing values
  • SUM: Sum of observations
  • MEDIAN: Middle value (50th percentile)
  • P1: 1st percentile
  • P5: 5th percentile
  • P10: 10th percentile
  • P90: 90th percentile
  • P95: 95th percentile
  • P99: 99th percentile
  • Q1: First Quartile
  • Q3: Third Quartile

In the SAS program below, we are only showing 3 descriptive statistics - Number of missing observations, Mean and Median.

PROC SUMMARY DATA = SASHELP.CARS NMISS MEAN MEDIAN PRINT;
VAR MSRP INVOICE LENGTH;
RUN;

Since none of these three variables has missing values, the "NMISS" column shows 0 for each of them.

To remove the label column from the PROC SUMMARY output, you can use the NOLABELS option.

PROC SUMMARY DATA = SASHELP.CARS NMISS MEAN MEDIAN PRINT NOLABELS;
VAR MSRP INVOICE LENGTH;
RUN;

How to group rows using PROC SUMMARY?

To summarize a numeric variable by the levels of a categorical variable, you can use the CLASS statement. It is similar to GROUP BY in SQL.

PROC SUMMARY DATA = SASHELP.CARS MEAN PRINT NOLABELS;
CLASS TYPE;
VAR MSRP INVOICE;
RUN;

In the code above, we are calculating the mean of "MSRP" and "INVOICE" for each value of the variable "TYPE".
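
For comparison, the same group-wise means can be requested with a BY statement instead of CLASS. The sketch below is only illustrative: it assumes the SASHELP.CARS sample dataset is available, and WORK.CARS_SORTED is a hypothetical name chosen for the sorted copy. BY-group processing expects the input to be sorted by the BY variable, so a PROC SORT step comes first.

```sas
/* Sort a copy of the data by the grouping variable.          */
/* WORK.CARS_SORTED is a hypothetical name for this example.  */
PROC SORT DATA = SASHELP.CARS OUT = WORK.CARS_SORTED;
BY TYPE;
RUN;

/* BY produces one table per value of TYPE. */
PROC SUMMARY DATA = WORK.CARS_SORTED MEAN PRINT NOLABELS;
BY TYPE;
VAR MSRP INVOICE;
RUN;
```

With BY, each value of TYPE is reported in its own table, whereas CLASS reports all values in a single table and does not need the sort step.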


To remove the "N Obs" column from the output, we can use the NONOBS option.

PROC SUMMARY DATA = SASHELP.CARS MEAN PRINT NOLABELS NONOBS;
CLASS TYPE;
VAR MSRP INVOICE;
RUN;

How to change order of categorical variable?

By default, PROC SUMMARY returns results in ascending order of the classification variable. You can use the DESCENDING option in the CLASS statement to arrange them in descending order.

PROC SUMMARY DATA = SASHELP.CARS MEAN PRINT NOLABELS NONOBS;
CLASS TYPE  / DESCENDING;
VAR MSRP INVOICE;
RUN;
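
Besides DESCENDING, the CLASS statement also accepts an ORDER= option. As a minimal sketch, assuming the same SASHELP.CARS dataset, ORDER=FREQ lists the categories starting with the most frequent one. NONOBS is left out here so that the "N Obs" column remains visible and the ordering can be checked.

```sas
/* Order the levels of TYPE by descending frequency count. */
PROC SUMMARY DATA = SASHELP.CARS MEAN PRINT NOLABELS;
CLASS TYPE / ORDER=FREQ;
VAR MSRP INVOICE;
RUN;
```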

How to save output of PROC SUMMARY in a dataset?

You can use OUTPUT OUT=output_dataset to save the output of PROC SUMMARY in a SAS dataset. In the code below, the output is stored in the dataset named READIN. Here we are saving the MEAN and MEDIAN of each of the 3 variables - MSRP, INVOICE and LENGTH. The AUTONAME option automatically names the variables in the output dataset by appending the statistic name to the original variable name (for example, MSRP_Mean).

PROC SUMMARY DATA = SASHELP.CARS;
VAR MSRP INVOICE LENGTH;
OUTPUT OUT = READIN MEAN= MEDIAN = /AUTONAME;
RUN;

We can use the AUTOLABEL option to automatically assign labels to the variables in the output dataset.

PROC SUMMARY DATA = SASHELP.CARS;
VAR MSRP INVOICE LENGTH;
OUTPUT OUT = READIN MEAN= MEDIAN = /AUTONAME AUTOLABEL;
RUN;

To assign variable names of your choice, you can use the syntax MEAN(original_variable)=new_variable_name.

PROC SUMMARY DATA = SASHELP.CARS;
VAR MSRP INVOICE;
OUTPUT OUT = READIN MEAN(MSRP)=Avg_MSRP MEAN(INVOICE)=Avg_INVOICE;
RUN;
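
The same syntax also accepts a list of variables with a matching list of new names, and several statistics can be requested in one OUTPUT statement. The following is a sketch under the assumption that SASHELP.CARS is available. The dataset name WORK.CUSTOM_NAMES and the Med_ names are hypothetical, chosen only for illustration.

```sas
/* One OUTPUT statement, two statistics, custom names for each. */
PROC SUMMARY DATA = SASHELP.CARS;
VAR MSRP INVOICE;
OUTPUT OUT = WORK.CUSTOM_NAMES
   MEAN(MSRP INVOICE)   = Avg_MSRP Avg_INVOICE
   MEDIAN(MSRP INVOICE) = Med_MSRP Med_INVOICE;
RUN;

/* Inspect the resulting single-row dataset. */
PROC PRINT DATA = WORK.CUSTOM_NAMES;
RUN;
```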

How to interpret _TYPE_ column?

By default, SAS creates two variables named _TYPE_ and _FREQ_ in the output dataset.

PROC SUMMARY DATA = SASHELP.CARS;
CLASS TYPE;
VAR MSRP INVOICE;
OUTPUT OUT = READIN MEAN(MSRP)=Avg_MSRP MEAN(INVOICE)=Avg_INVOICE;
RUN;

  • _TYPE_=0 refers to the entire dataset, which means descriptive statistics such as frequency and mean are calculated across all observations.
  • _TYPE_=1 refers to descriptive statistics for each unique category of the classification variable TYPE. If the CLASS statement lists more than one classification variable, _TYPE_ takes a distinct value for each combination of classification variables used for grouping, ranging from 0 to 2^n - 1 for n classification variables.

The _FREQ_ variable contains the number of rows (frequency) in each group.
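
To see how _TYPE_ behaves with more than one classification variable, here is a small sketch with two CLASS variables, assuming the SASHELP.CARS dataset. The output dataset name WORK.TYPE_DEMO is hypothetical. With CLASS ORIGIN TYPE, the rightmost variable (TYPE) corresponds to the lowest bit of _TYPE_.

```sas
PROC SUMMARY DATA = SASHELP.CARS;
CLASS ORIGIN TYPE;
VAR MSRP;
OUTPUT OUT = WORK.TYPE_DEMO MEAN = Avg_MSRP;
RUN;

/* _TYPE_ = 0 : overall                          */
/* _TYPE_ = 1 : grouped by TYPE only             */
/* _TYPE_ = 2 : grouped by ORIGIN only           */
/* _TYPE_ = 3 : grouped by both ORIGIN and TYPE  */
PROC PRINT DATA = WORK.TYPE_DEMO;
WHERE _TYPE_ = 3;
RUN;
```

If only the rows for the full combination of classification variables are needed, adding the NWAY option to the PROC SUMMARY statement keeps just the highest _TYPE_ value in the output dataset, so the WHERE filter is not required.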

How to select or remove variables in PROC SUMMARY?

We can use the DROP= dataset option to remove the variables _TYPE_ and _FREQ_ from the output dataset. Similarly, we can use the KEEP= dataset option to retain only specific variables.

PROC SUMMARY DATA = SASHELP.CARS;
VAR MSRP INVOICE;
OUTPUT OUT = READIN (DROP = _TYPE_ _FREQ_) MEAN(MSRP)=AVG_MSRP MEAN(INVOICE)=AVG_INVOICE;
RUN;
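
The KEEP= option works the same way. As an illustrative sketch, assuming SASHELP.CARS, the code below keeps only the classification variable and one of the computed means. The dataset name WORK.READIN_KEEP is hypothetical.

```sas
/* Keep only TYPE and Avg_MSRP; _TYPE_, _FREQ_ and Avg_INVOICE are not written. */
PROC SUMMARY DATA = SASHELP.CARS;
CLASS TYPE;
VAR MSRP INVOICE;
OUTPUT OUT = WORK.READIN_KEEP (KEEP = TYPE Avg_MSRP)
   MEAN(MSRP) = Avg_MSRP MEAN(INVOICE) = Avg_INVOICE;
RUN;
```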

How to format and summarise using PROC SUMMARY?

Suppose you need to categorise the MSRP column into the following bands and then calculate the average horsepower for each band.

  1. Values from 0 to 20,000: Displayed as 'Up to 20K'
  2. Values from 20,000 to 50,000: Displayed as '20K-50K'
  3. Values from 50,000 to 100,000: Displayed as '50K-100K'
  4. Values greater than 100,000: Displayed as '100K or more'

The FORMAT statement in PROC SUMMARY tells SAS to apply the user-defined format to the CLASS variable, so observations are grouped by their formatted values before the summary table is produced.

proc format;
/* "<-" excludes the lower endpoint, so adjacent bands do not overlap */
value MSRP
0 - 20000 = 'Up to 20K'
20000 <- 50000 = '20K-50K'
50000 <- 100000 = '50K-100K'
100000 <- high = '100K or more';
run;

PROC SUMMARY DATA = SASHELP.CARS MEAN PRINT NOLABELS;
CLASS MSRP;
FORMAT MSRP MSRP.;
VAR HORSEPOWER;
RUN;

Difference between PROC SUMMARY and PROC MEANS

PROC SUMMARY and PROC MEANS are very similar, but they have a few differences. Here are the main differences between PROC SUMMARY and PROC MEANS.

  • PROC MEANS produces printed output in the output window by default, whereas PROC SUMMARY does not. PROC SUMMARY requires the PRINT option to print results in the output window.
  • If you omit the VAR statement in PROC MEANS, it analyses all the numeric variables, whereas if you omit the VAR statement in PROC SUMMARY, it produces a simple count of observations.

Compare the results of PROC MEANS and PROC SUMMARY when the VAR statement is omitted.

PROC MEANS DATA=SASHELP.CARS;
CLASS MSRP;
RUN;

PROC SUMMARY DATA=SASHELP.CARS PRINT;
CLASS MSRP;
RUN;