SAS : First. and Last. Variables

Live Online Training : SAS Programming with 50+ Case Studies

- Explain Programming Concepts in Simple English
- Live Projects
- Scenario Based Questions
- Job Placement Assistance
- Get 10% off till Sept 25, 2017
- Batch starts from October 8, 2017

This tutorial explains how to identify first and last observations within a group. It is a common data cleaning challenge to remove duplicates or store unique values. In SQL, we use window functions such as rank over() to generate serial numbers among a group of rows. In SAS, we can create first. and last. variables to achieve this task.

First. and Last. Variables

FIRST.VARIABLE assigns the value of 1 for the first observation in a BY group and the value of 0 for all other observations in the BY group.
LAST.VARIABLE assigns the value of 1 for the last observation in a BY group and the value of 0 for all other observations in the BY group.
Note : Data set must be sorted BY group before applying FIRST. and LAST. Variables.

SAMPLE DATA SET

Suppose you have a dataset consisting 3 variables and 12 observations. The variables are ID, Name and Score. The variable ID is a grouping variable and it contains duplicates.

ID Name Score
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72

Create this data set in SAS

The program below creates the dataset in SAS. Copy the program below and paste it into SAS program editor and run/submit it.



Use PROC SORT to sort the data set by ID. It is required to sort the data before using first. and last. variables.


Note : FIRST./LAST. variables are temporary variables. That means they are not visible in the newly created data set. To make them visible, we need to create two new variables. In the program above, i have created First_ID and Last_ID variables.
SAS : FIRST. vs. LAST. Variables
How it works

FIRST.variable = 1 when an observation is the first observation in each group values of variable ID.
FIRST.variable = 0 when an observation is not the first observation in each group values of variable ID.

LAST.variable = 1 when an observation is the last observation in each group values of variable ID.
LAST.variable = 0 when an observation is not the last observation in each group values of variable ID.
When FIRST.variable = 1 and LAST.VARIABLE = 1, it means there is only a single value in the group. (See ID = 4 in the above data for reference)

Selecting First Observation within a Group

Suppose you need to select only the first observation among a group of observations. It is very easy to do it with IF statement. The IF statement subsets data when IF is not used in conjunction with THEN or ELSE statements.
PROC SORT DATA = READIN;
BY ID;
RUN; 
DATA READIN1;
SET READIN;
BY ID;
IF FIRST.ID;
PROC PRINT;
RUN;
Output : First. Variable

Note : It returns first observation among values of a group (total 7 observations).

Selecting Last Observation within a Group

Suppose you are asked to include only last observation from a group. Like the previous example, we can use last. variable to subset data.
PROC SORT DATA = READIN;
BY ID;
RUN; 
DATA READIN1;
SET READIN;
BY ID;
IF LAST.ID;
PROC PRINT;
RUN;
SAS : Last. Variable

Q. Can we use WHERE instead of IF with First. and Last. Variables?
No. WHERE statement cannot be used with First. and Last. Variables. It is because WHERE statement requires variables already be created in the dataset before processing.

Generate Serial Number in a Group

Suppose you need to create serial numbers among a group of observations. See the snapshot below -

Generate Serial Number in a Group
Data temp;
set readin;
by ID;
if first.id then N = 1;
else N +1;
proc print;
run;
In the above program, we are setting N=1 when it is the first value of a group i.e. ID. Otherwise adding 1 to N. The N+1 implies N = N + 1 in BY group processing. When there is a second observation in a group, N+1 adds 1 to N=1 so N becomes 2. It further increments by 1 when there is third observation in the group and so on.

Calculate Cumulative Score by Group

Suppose you need to calculate running cumulative score by variable ID.
Cumulative Score by Group
Data temp;
set readin;
by ID;
if first.id then CumScore = Score;
else CumScore + Score;
proc print;
run;
In the above program, we are setting Cumscore = Score when it is the first value of a group i.e. ID. Otherwise adding Score to Cumscore. The Cumscore+Score implies CumScore = CumScore + Score in BY group processing.

STORING UNIQUE AND DUPLICATE VALUES
data unique duplicates;
set readin;
by id;
if first.id = 1 and last.id = 1 then output unique;
else output duplicates;
run;
-The DATA statement creates two temporary SAS data sets: DUPLICATES AND UNIQUE.

-The SET statement reads observations from data set READIN 

-The BY statement tells SAS to process observations by ID. Variables FIRST.ID and LAST.ID are created. 

-If the first and last observation is the same value, it implies it is a unique value else the value is duplicate.

Case Studies

1. Identify and select only records having maximum Score among a group of observations of variable ID

2. Select unique observations plus second observation from duplicate observations of variable ID

Solution 1

First we need to build a logic how we can select records having max score within variable ID. We can do it via PROC SORT. In this case, we need to sort data by 2 variables - first sorting on variable ID and then next sorting on Score by descending order. The DESCENDING keyword is used in PROC SORT to arrange data from largest to smallest. Sorting on descending order is used to place the max value at first observation in each group of ID.
proc sort data= readin;
by ID descending score;
run;

data readin1;
set readin;
by ID;
if first.id;
run;
After sorting, we retain records having maximum value by using FIRST. and IF statement. The IF  FIRST.ID keeps the first record among a group of values of variable ID. The output is shown in the image below.
Output
Solve second case study yourself and post your answer in the comment box below. Make sure it should be solved in one data step code. Hint : BOTH FIRST. and LAST. variables would be used.

SAS Tutorials : 100 Free SAS Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

23 Responses to "SAS : First. and Last. Variables"

  1. nice explaination

    ReplyDelete
  2. good explanation and good examples

    ReplyDelete
  3. data test8;
    set readin;
    by ID;
    if first.ID =1 and last.ID = 1 then output test8;
    else if mod(_n_,2) = 0 then output test8;
    run;
    proc print; run;

    ReplyDelete
    Replies
    1. data uniq_or_Dup2;
      set readlin;
      if (first.Id=1 and last.id=1) or (lag1(first.id)=1 and first.id=0) then output uniq_or_Dup2;
      proc print
      run;

      Delete
  4. proc sort data=data;
    by ID;
    run;

    data data1;
    set data;
    by ID;
    if first.ID=1 then serial=1;
    else serial+1;
    run;

    data data2;
    set data1;
    by id;
    if first.ID=1 and last.ID=1 or serial=2;
    run;

    ReplyDelete
  5. if we have four duplicates how can we get second one ?

    ReplyDelete
  6. data test8;
    set readin;
    by ID;
    if first.ID =1 and last.ID = 1 then output test8;
    if first.id and last.id=0 then N=1;
    else N+1;
    if N=2 then output test8;
    drop N;
    run;

    ReplyDelete
  7. data nk;
    set readin;
    by id;
    if first.id then N=1;
    else N+1;
    if N=2 then output;
    if first.id and last.id then output;
    drop N;
    proc print;
    run;

    ReplyDelete
  8. data new2;
    set readin1;
    retain cnt;
    if first_id=1 then cnt=1;
    else cnt+1;
    if cnt le 2;
    run;

    ReplyDelete
  9. data ad;
    set abcd_s;
    by x;
    if first.x and last.x then output;
    if first.x then n=1;
    else n+1;
    if n=2 then output;
    drop n;
    run;

    ReplyDelete
  10. Data Readin2;
    set readin;
    by id;
    if first.id then n =1;
    Else n+1;
    if first.id = 1 and last.id= 1 or n = 2;
    drop n
    Run;

    ReplyDelete
  11. Its very helpful, I have come across many questions in my interviews

    ReplyDelete
  12. This comment has been removed by the author.

    ReplyDelete
  13. data gaurav;
    set readin;
    by id;
    if first.id then N=1;
    else N+1;
    if (first.id=1 and last.id=1) then output;/* unique observation*/
    if N=2 then output;/* second observation from duplicate*/
    run;
    proc print;
    run;

    ReplyDelete
  14. If we have 4 or more than 4 duplicate ids then what would be the logic to find out 2nd observation?
    Thanks!

    ReplyDelete
  15. proc sort data= readin;
    by ID descending score;
    run;

    data readin1;
    set readin;
    by ID;
    if last.id;
    run;

    ReplyDelete
  16. if i want to calculate average per person then how to calculate??

    ReplyDelete
    Replies
    1. proc sort data= readin;
      by ID Score;
      run;

      data temp;
      set readin;
      by ID;
      if first.ID then
      do;
      N=1;
      cumScore = Score;
      avgScore = Score;
      end;
      else
      do;
      N+1;
      cumScore + Score;
      avgScore = cumScore/N;
      end;
      if last.ID = 1;

      run;

      Delete
  17. Doesn't generating serial numbers as well as cumulative rank require the newly created variables - N and Cum Score to be retained. Else, SAS will not be able to store their values jumping from one obs to the next ?

    ReplyDelete
  18. Below code is working for case study 2. Let me know if anybody still facing any problem in solving or understanding this question.


    proc sort data=READIN;
    BY ID Descending score;
    run;

    DATA READIN2;
    SET READIN;
    BY ID ;
    First_ID= First.ID;
    Last_ID= Last.ID;
    IF first.id then N=1;
    ELSE N+1;
    if (FIRST_ID=1 and LAST_ID=1) or N=2 then output READIN2 ;
    drop First_ID Last_ID N;
    proc print;
    run;

    ReplyDelete
  19. How can we use ELSE N+1 in data step with using retain? Data step works obs by obs din't N loses its previous value in every next iteration?
    I know, I am missing something, but what?

    ReplyDelete
    Replies
    1. Retain can be used in two ways:
      1. Implicitly : We can use retain implicitly by using +1 notation. If we use retain implicitly then we don't need to write retain statement. just like above code.
      Another example :
      data a;
      set xyz;
      by id;
      if first.id then x=1;
      else x+1 : /which means x=x+1 but we are using retain implicitly so no need to mention x= here;
      run;

      2. Explicitly

      Using the same example where we are using retain explicitly so need to mention retain statement.
      data a;
      set xyz;
      by id;
      retain x;
      if first.id then x=1;
      else x=x+1 : /here we are using x= so need to mention retain statement;
      run;

      Delete

Next → ← Prev