PROC SORT: Guide to Sorting Data in SAS

This tutorial explains how to use PROC SORT in SAS for sorting data. It includes practical examples that cover different data scenarios.

In SAS, the PROC SORT procedure is used to sort datasets based on one or more variables.

Syntax of PROC SORT

The syntax of the PROC SORT procedure in SAS is as follows:

PROC SORT DATA=input_dataset OUT=output_dataset;
  BY variable(s);
RUN;

DATA: Specifies the input dataset that needs to be sorted.
OUT: Specifies the output dataset where the sorted data will be stored.
BY statement: Used to specify the variable(s) by which the dataset should be sorted. You can list multiple variables separated by spaces to perform sorting based on multiple criteria.

Sample Dataset

Let's create a sample dataset for the examples in this tutorial. Here we have 3 columns and 10 rows.

data mydata;
input product $ transactions sale;
datalines;
A 84 158
A 75 118
A 64 421
A 12 592
B 75 206
B 17 855
B 46 360
C 87 650
C 96 922
C 40 860
;
run;

Example 1: Sort Data in Ascending Order

The following code sorts data in ascending order (smallest to largest) based on a column named transactions. The sorted data will be stored in the dataset named "newdata".

proc sort data=mydata out=newdata;
by transactions;
run;

Example 2: Sort Data in Descending Order

To sort the dataset in descending order, we can use the DESCENDING keyword before the variable name in the BY statement in PROC SORT procedure. The following code sorts the data from largest to smallest based on a column named transactions. The sorted data will be stored in the dataset named "newdata".

proc sort data=mydata out=newdata;
by descending transactions;
run;

Example 3: Filter and Sort Data

If you want to filter the data before sorting, you can use WHERE= option in PROC SORT. The following code selects data for product A only and then sorts the filtered data based on "transactions" column.

proc sort data=mydata (where=(product = 'A')) out=newdata;
by transactions;
run;

Example 4: FIRSTOBS= AND OBS= Options in PROC SORT

The FIRSTOBS= option tells SAS to start reading from a specified observation. Whereas, the OBS= option specifies the observation at which SAS processing ends. In the code below, we are reading data from 5th row till 8th row (inclusive).

proc sort data=mydata (firstobs= 5 obs=8) out=newdata;
by transactions;
run;

Example 5: Format in PROC SORT

In the code below, a format named $PRODUCT is defined using the proc format procedure. This format labels the values 'A', 'B', and 'C' to their corresponding labels 'Product A', 'Product B', and 'Product C'.

proc format;
value $PRODUCT 'A'='Product A' 'B'='Product B' 'C'='Product C';
run;

proc sort data=mydata out=newdata;
format product $PRODUCT.;
by transactions;
run;

Example 6: Sort Data by multiple columns

Here the sorting is performed based on two columns - "product" and "transactions".

proc sort data=mydata out=newdata;
by product transactions;
run;

Example 7: NODUP AND NODUPKEY in PROC SORT

In PROC SORT, the NODUP and NODUPKEY options are used to remove duplicate observations from the dataset.

The NODUPKEY option is used to remove duplicates based on the variable(s) specified in BY statement. Whereas the NODUP option is used to remove duplicates based on values of all variables in a dataset.

In the code below, 7 observations with duplicates based on "product" column were deleted.

proc sort data=mydata nodupkey out=newdata;
by product;
run;

In the code below, 0 duplicate observations were deleted as there is no duplicate rows in the dataset.

proc sort data=mydata nodup out=newdata;
by product;
run;

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

While I love having friends who agree, I only learn from those who don't
Let's Get Connected Email LinkedIn

Post Comment 2 Responses to "PROC SORT: Guide to Sorting Data in SAS"

AnonymousMarch 29, 2024 at 8:31 AM
The output shown for example 4 is wrong I believe. The code first reads from row 5 to 8, and then does the sorting. However the output shown assumes that all of the data is shown, all of it sorted, and then rows 5 to 8 are shown. I used the following code to the get the output shown for example 4:

proc sort data=mydata out=newdata;
by transactions;
proc print data=newdata (firstobs=5 obs=8);
run;