This tutorial explains how to calculate the rank for one or more variables using **PROC RANK**.

In SAS, there are multiple ways to calculate overall rank or rank by a grouping variable. In data step, it can be done via RETAIN statement. SAS made it easy to compute rank with **PROC RANK**.

The following SAS program creates a dataset which will be used to explain examples in this tutorial.

data temp; input ID Gender $ Score; cards; 1 M 33 2 M 94 3 M 66 4 M 46 5 F 92 6 F 95 7 F 18 8 F 11 ; run;

In the following SAS Program, we are calculating the rank of the variable "Score".

proc rank data= temp out = result; var Score; ranks ranking; run;

**Notes :**

- The
**OUT**option is used to store output of the rank procedure. - The
**VAR**option is used to specify numeric variable (s) for which you want to calculate rank - The
**RANKS**option tells SAS to name the rank variable - By default, it calculates rank in ascending order.

Suppose you need to assign the largest value of a variable as rank 1 and the last rank to the lowest value. The **descending **keyword tells SAS to sort the data in descending order and assign rank to the variable accordingly.

proc rank data= tempdescendingout = result; var Score; ranks ranking; run;

Suppose you need to split the variable into **four **parts, you can use the **groups option** in PROC RANK. It means you are telling SAS to assign only 4 ranks to a variable.

proc rank data= temp descendinggroups = 4out = result; var Score; ranks ranking; run;

Use GROUPS=4 for **quartile ranks**, and GROUPS=10 for **decile ranks**, GROUPS = 100 for **percentile ranks**.

Suppose you need to calculate rank by a grouping variable (Gender). To accomplish this task, you can use the **by statement** in proc rank. Please note that it is required to sort the data before using by statement.

proc sort data = temp; by gender; run; proc rank data= temp descending out = result; var Score; ranks ranking;by Gender;run;

Let's create a sample dataset. See the variable "score" having same values **(33 appearing twice).**

data temp2; input ID Gender $ Score; cards; 1 M 33 2 M 33 3 M 66 4 M 46; run;

To handle duplicate values in ranking, you can use option `TIES = HIGH | LOW | MEAN | DENSE`

in PROC RANK.

**See the comparison between these options in the image below -**

proc rank data= temp2ties = denseout = result; var Score; ranks rank_dense; run;

**LOW -**assigns the smallest of the corresponding ranks.**HIGH -**assigns the largest of the corresponding ranks.**MEAN -**assigns the mean of the corresponding ranks**(Default Option).****DENSE -**assigns the smallest of the corresponding rank and add +1 to the next rank (don't break sequence)

super good..keep going..

ReplyDeletevery good and easy explanation....

ReplyDeletePlease add predictive modelling steps...step by step.

I didn't get the concept of ties that you have explained in the last!

ReplyDeleteThank you for your feedback. I have added more description to the concept of ties. Hope it helps.

DeleteNice explanation... keep going guys.. Thanks!!

ReplyDeleteNice explanation..

ReplyDeleteHow can we get rank for character variables

ReplyDeleteHow can I get the cuts of deciles along with ranks and frequency as provided by proc rank?

ReplyDeleteCan you please the proc rank concept along with proc sql?

ReplyDeletegracias, muy buena explicaciÃ³n

ReplyDelete