This tutorial explains how to use PROC SORT in SAS for sorting data. It includes practical examples that cover different data scenarios.
In SAS, the PROC SORT
procedure is used to sort datasets based on one or more variables.
Syntax of PROC SORT
The syntax of the PROC SORT procedure in SAS is as follows:
PROC SORT DATA=input_dataset OUT=output_dataset; BY variable(s); RUN;
DATA
: Specifies the input dataset that needs to be sorted.OUT
: Specifies the output dataset where the sorted data will be stored.BY
statement: Used to specify the variable(s) by which the dataset should be sorted. You can list multiple variables separated by spaces to perform sorting based on multiple criteria.
Let's create a sample dataset for the examples in this tutorial. Here we have 3 columns and 10 rows.
data mydata; input product $ transactions sale; datalines; A 84 158 A 75 118 A 64 421 A 12 592 B 75 206 B 17 855 B 46 360 C 87 650 C 96 922 C 40 860 ; run;
Example 1: Sort Data in Ascending Order
The following code sorts data in ascending order (smallest to largest) based on a column named transactions. The sorted data will be stored in the dataset named "newdata".
proc sort data=mydata out=newdata; by transactions; run;
Example 2: Sort Data in Descending Order
To sort the dataset in descending order, we can use the DESCENDING
keyword before the variable name in the BY statement in PROC SORT procedure. The following code sorts the data from largest to smallest based on a column named transactions. The sorted data will be stored in the dataset named "newdata".
proc sort data=mydata out=newdata; by descending transactions; run;
Example 3: Filter and Sort Data
If you want to filter the data before sorting, you can use WHERE=
option in PROC SORT. The following code selects data for product A only and then sorts the filtered data based on "transactions" column.
proc sort data=mydata (where=(product = 'A')) out=newdata; by transactions; run;
Example 4: FIRSTOBS= AND OBS= Options in PROC SORT
The FIRSTOBS= option tells SAS to start reading from a specified observation. Whereas, the OBS= option specifies the observation at which SAS processing ends. In the code below, we are reading data from 5th row till 8th row (inclusive).
proc sort data=mydata (firstobs= 5 obs=8) out=newdata; by transactions; run;
Example 5: Format in PROC SORT
In the code below, a format named $PRODUCT is defined using the proc format procedure. This format labels the values 'A', 'B', and 'C' to their corresponding labels 'Product A', 'Product B', and 'Product C'.
proc format; value $PRODUCT 'A'='Product A' 'B'='Product B' 'C'='Product C'; run; proc sort data=mydata out=newdata; format product $PRODUCT.; by transactions; run;
Example 6: Sort Data by multiple columns
Here the sorting is performed based on two columns - "product" and "transactions".
proc sort data=mydata out=newdata; by product transactions; run;
Example 7: NODUP AND NODUPKEY in PROC SORT
In PROC SORT, the NODUP and NODUPKEY options are used to remove duplicate observations from the dataset.
The NODUPKEY option is used to remove duplicates based on the variable(s) specified in BY statement. Whereas the NODUP option is used to remove duplicates based on values of all variables in a dataset.
In the code below, 7 observations with duplicates based on "product" column were deleted.
proc sort data=mydata nodupkey out=newdata; by product; run;
In the code below, 0 duplicate observations were deleted as there is no duplicate rows in the dataset.
proc sort data=mydata nodup out=newdata; by product; run;
The output shown for example 4 is wrong I believe. The code first reads from row 5 to 8, and then does the sorting. However the output shown assumes that all of the data is shown, all of it sorted, and then rows 5 to 8 are shown. I used the following code to the get the output shown for example 4:
ReplyDeleteproc sort data=mydata out=newdata;
by transactions;
proc print data=newdata (firstobs=5 obs=8);
run;
Thanks for reporting the issue. I fixed it. Thanks!
Delete