How to Filter Data in SAS

Deepanshu Bhalla 1 Comment

This tutorial explains multiple ways to filter data in SAS, along with examples.

Sample Dataset

Let's create a sample SAS dataset for demonstration purposes. The following code creates a sample SAS dataset which will contain 5 observations and 4 variables.

/* Sample SAS dataset */
data mydata;
  input ID Age Gender $ Score;
  datalines;
1 24 Male 85
2 30 Female 90
3 22 Male 78
4 28 Female 95
5 31 Male 80
;
run;

IF Statement: Filtering Data in SAS

In SAS, the IF statement is used to filter data. The IF statement is used within the data step to conditionally execute statements based on a specified condition. In the example below, we want dataset that will only contain observations with Age greater than or equal to 25.

data filtered_data;
  set mydata;
  if Age >= 25;
run;
How to Filter Data in SAS

Look at the SAS log after filtering.

 NOTE: There were 5 observations read from the data set WORK.MYDATA.
 NOTE: The data set WORK.FILTERED_DATA has 3 observations and 4 variables.

The OUTPUT statement is used when you need to filter data into multiple datasets based on certain criteria. The following SAS code is creating two new datasets, age1 and age2, by filtering the original dataset "mydata" based on the condition Age >= 25. Observations that satisfy the condition will be output to the age1 dataset, while those that do not meet the condition will be output to the age2 dataset.

data age1 age2;
  set mydata;
  if Age >= 25 then output age1;
  else output age2;
run;

Result : The data set WORK.AGE1 has 3 observations and 4 variables. The data set WORK.AGE2 has 2 observations and 4 variables.

How to Use Multiple IF statement

In SAS, you can use multiple IF statements using ELSE IF statement to implement more complex logic within a data step. The following SAS code is creating three new datasets, namely male, female, and invalid, based on the values of the variable Gender in the existing dataset mydata.

data male female invalid;
  set mydata;
  if Gender = "Male" then output male;
  else if Gender = "Female" then output female;
  else output invalid;
run;

WHERE Statement: Filtering Data in SAS

In SAS, the WHERE statement can also be used to filter data. The WHERE statement can be used within both the data step and SAS Procedures. In the example below, we are using the condition of Age greater than or equal to 25 and we want filtered dataset in a new dataset named "filtered_data".

data filtered_data;
  set mydata;
  where Age >= 25;
run;

You can also use the WHERE statement in the SET statement.

data filtered_data;
  set mydata (where=(Age >= 25));
run;

Difference between IF and WHERE Statement

Here are some of the key differences between the IF and WHERE Statement in SAS.

  • Efficiency: The WHERE statement is more efficient than the IF statement as it tells SAS to read only those observations of the input dataset that meets criteria. Whereas, the IF statement reads all observations of the input dataset.
  • Filtering based on New Variables: If you create a new variable before filtering in the data step, you can't use the WHERE statement to filter based on this new variable. Whereas, the IF statement can filter data based on new variables.
  • Split Data into Multiple Datasets: The IF statement along with OUTPUT statement can be used to filter and output data into multiple datasets. Whereas, the WHERE statement does not allow to filter and send data to multiple datasets.
  • The WHERE statement can be used in SAS procedures (PROC) to filter data while the IF statement cannot be used in procedures.

How to Use Multiple Conditions

Please refer to the table below showing how to use logical operators in SAS.

Symbol Keyword Example
& AND Gender = "Female" AND Age > 29
| OR Gender = "Female" OR Age > 29
^= NOT NOT (Gender = "Female")

The following code filters the "mydata" dataset based on two conditions: Gender = "Female" or Age > 29.

data filtered_data;
  set mydata;
  where Gender = "Female" OR Age > 29;
run;

You can also write the above condition using "|" instead of OR operator like this :
where Gender = "Female" | Age > 29;

In the example below, the condition NOT (Gender = 'Female') means it will select all rows where the Gender is NOT equal to "Female".

data filtered_data;
  set mydata;
  where NOT (Gender = "Female");
run;

Text Filters in SAS

Text filters are a type of filter that can be used to find text that matches a specific pattern. This can be useful for selecting rows having specific words, phrases in a character variable.

data sample_data;
   input name $;
   datalines; 
Sam
Saina
Dave
Riana
DIANA
Hiana
David
davia
;
run;
Filter: Contains Some Text

The CONTAINS operator checks if the specified substring is present anywhere within the character variable. The following code creates a dataset that will only contain observations with names that contain the substring 'IANA' in any case (upper or lower).

  
data filtered_data;
  set sample_data;
  where upcase(name) contains 'IANA';
run;  

The filtered dataset will look like this:

  
name
--------
Riana
DIANA
Hiana
Filter: Does Not Contain Some Text

The following SAS program creates a dataset that will only contain observations that do not meet the specified condition, i.e., those where the 'name' variable does not contain the substring 'IANA' in any case.

  
data filtered_data;
  set sample_data;
  where upcase(name) not contains 'IANA';
run; 
Filter: Starts with Some Text

The following SAS program creates a dataset that will only contain observations those where the 'name' variable starts with 'D' or 'd'. The result will include all names such as 'Dave', 'David', 'DIANA', 'davia' as long as they start with 'D' or 'd'.

data filtered_data;
  set sample_data;
  where upcase(name) like 'D%';
run;
Filter: Ends with Some Text

The following SAS program creates a dataset that will only contain observations those where the 'name' variable ends with 'NA' or 'na'.

data filtered_data;
  set sample_data;
  where upcase(name) like '%NA';
run;
Filter Data based on First N Characters

The following SAS program filters data where the first two characters of the 'name' variable are 'DA' or 'da'. The SUBSTR function is used to extract the first two characters of the name variable, and the UPCASE function converts them to uppercase so that we can select both 'DA' and 'da'.

data filtered_data;
  set sample_data;
  where upcase(substr(name,1,2)) = 'DA';
run;
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 1 Response to "How to Filter Data in SAS"
  1. Your contents just hit the target everytime without any miss. Thanks for your simple but elegant presentation. Keep going. All the best.

    ReplyDelete
Next → ← Prev