SAS : First. and Last. Variables

This tutorial explains how to identify first and last observations within a group. It is a common data cleaning challenge to remove duplicates or store unique values. In SQL, we use window functions such as rank over() to generate serial numbers among a group of rows. In SAS, we can create first. and last. variables to achieve this task.

First. and Last. Variables

FIRST.VARIABLE assigns the value of 1 for the first observation in a BY group and the value of 0 for all other observations in the BY group.
LAST.VARIABLE assigns the value of 1 for the last observation in a BY group and the value of 0 for all other observations in the BY group.
Note : Data set must be sorted BY group before applying FIRST. and LAST. Variables.

SAMPLE DATA SET

Suppose you have a dataset consisting 3 variables and 12 observations. The variables are ID, Name and Score. The variable ID is a grouping variable and it contains duplicates.

ID Name Score
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72

Create this data set in SAS

The program below creates the dataset in SAS. Copy the program below and paste it into SAS program editor and run/submit it.



Use PROC SORT to sort the data set by ID. It is required to sort the data before using first. and last. variables.


Note : FIRST./LAST. variables are temporary variables. That means they are not visible in the newly created data set. To make them visible, we need to create two new variables. In the program above, i have created First_ID and Last_ID variables.
SAS : FIRST. vs. LAST. Variables
How it works

FIRST.variable = 1 when an observation is the first observation in each group values of variable ID.
FIRST.variable = 0 when an observation is not the first observation in each group values of variable ID.

LAST.variable = 1 when an observation is the last observation in each group values of variable ID.
LAST.variable = 0 when an observation is not the last observation in each group values of variable ID.
When FIRST.variable = 1 and LAST.VARIABLE = 1, it means there is only a single value in the group. (See ID = 4 in the above data for reference)

Selecting First Observation within a Group

Suppose you need to select only the first observation among a group of observations. It is very easy to do it with IF statement. The IF statement subsets data when IF is not used in conjunction with THEN or ELSE statements.
PROC SORT DATA = READIN;
BY ID;
RUN; 
DATA READIN1;
SET READIN;
BY ID;
IF FIRST.ID;
PROC PRINT;
RUN;
Output : First. Variable

Note : It returns first observation among values of a group (total 7 observations).

Selecting Last Observation within a Group

Suppose you are asked to include only last observation from a group. Like the previous example, we can use last. variable to subset data.
PROC SORT DATA = READIN;
BY ID;
RUN; 
DATA READIN1;
SET READIN;
BY ID;
IF LAST.ID;
PROC PRINT;
RUN;
SAS : Last. Variable

Q. Can we use WHERE instead of IF with First. and Last. Variables?
No. WHERE statement cannot be used with First. and Last. Variables. It is because WHERE statement requires variables already be created in the dataset before processing.

Generate Serial Number in a Group

Suppose you need to create serial numbers among a group of observations. See the snapshot below -

Generate Serial Number in a Group
Data temp;
set readin;
by ID;
if first.id then N = 1;
else N +1;
proc print;
run;
In the above program, we are setting N=1 when it is the first value of a group i.e. ID. Otherwise adding 1 to N. The N+1 implies N = N + 1 in BY group processing. When there is a second observation in a group, N+1 adds 1 to N=1 so N becomes 2. It further increments by 1 when there is third observation in the group and so on.

Calculate Cumulative Score by Group

Suppose you need to calculate running cumulative score by variable ID.
Cumulative Score by Group
Data temp;
set readin;
by ID;
if first.id then CumScore = Score;
else CumScore + Score;
proc print;
run;
In the above program, we are setting Cumscore = Score when it is the first value of a group i.e. ID. Otherwise adding Score to Cumscore. The Cumscore+Score implies CumScore = CumScore + Score in BY group processing.

STORING UNIQUE AND DUPLICATE VALUES
data unique duplicates;
set readin;
by id;
if first.id = 1 and last.id = 1 then output unique;
else output duplicates;
run;
-The DATA statement creates two temporary SAS data sets: DUPLICATES AND UNIQUE.

-The SET statement reads observations from data set READIN 

-The BY statement tells SAS to process observations by ID. Variables FIRST.ID and LAST.ID are created. 

-If the first and last observation is the same value, it implies it is a unique value else the value is duplicate.

Case Studies

1. Identify and select only records having maximum Score among a group of observations of variable ID

2. Select unique observations plus second observation from duplicate observations of variable ID

Solution 1

First we need to build a logic how we can select records having max score within variable ID. We can do it via PROC SORT. In this case, we need to sort data by 2 variables - first sorting on variable ID and then next sorting on Score by descending order. The DESCENDING keyword is used in PROC SORT to arrange data from largest to smallest. Sorting on descending order is used to place the max value at first observation in each group of ID.
proc sort data= readin;
by ID descending score;
run;

data readin1;
set readin;
by ID;
if first.id;
run;
After sorting, we retain records having maximum value by using FIRST. and IF statement. The IF  FIRST.ID keeps the first record among a group of values of variable ID. The output is shown in the image below.
Output
Solve second case study yourself and post your answer in the comment box below. Make sure it should be solved in one data step code. Hint : BOTH FIRST. and LAST. variables would be used.

SAS Tutorials : 100 Free SAS Tutorials

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

8 Responses to "SAS : First. and Last. Variables"

  1. nice explaination

    ReplyDelete
  2. good explanation and good examples

    ReplyDelete
  3. data test8;
    set readin;
    by ID;
    if first.ID =1 and last.ID = 1 then output test8;
    else if mod(_n_,2) = 0 then output test8;
    run;
    proc print; run;

    ReplyDelete
    Replies
    1. data uniq_or_Dup2;
      set readlin;
      if (first.Id=1 and last.id=1) or (lag1(first.id)=1 and first.id=0) then output uniq_or_Dup2;
      proc print
      run;

      Delete
  4. proc sort data=data;
    by ID;
    run;

    data data1;
    set data;
    by ID;
    if first.ID=1 then serial=1;
    else serial+1;
    run;

    data data2;
    set data1;
    by id;
    if first.ID=1 and last.ID=1 or serial=2;
    run;

    ReplyDelete
  5. if we have four duplicates how can we get second one ?

    ReplyDelete
  6. data test8;
    set readin;
    by ID;
    if first.ID =1 and last.ID = 1 then output test8;
    if first.id and last.id=0 then N=1;
    else N+1;
    if N=2 then output test8;
    drop N;
    run;

    ReplyDelete
  7. data nk;
    set readin;
    by id;
    if first.id then N=1;
    else N+1;
    if N=2 then output;
    if first.id and last.id then output;
    drop N;
    proc print;
    run;

    ReplyDelete

Next → ← Prev