SAS : Where Statement and Dataset Options

Deepanshu Bhalla 8 Comments

The WHERE statement is an alternative to the IF statement for subsetting (filtering) a data set.

Syntax of WHERE Statement :

WHERE condition ;

For example : where Age > 25; means selecting rows wherein the value of "age" variable is greater than 25.

Comparison Operators :
Symbolic Mnemonic Meaning Example
= EQ equals WHERE gender = ‘M’; or WHERE gender EQ ‘M’;
^= or ~= NE not equal WHERE salary NE . ;
> GT greater than WHERE salary GT 4500;
< LT less than WHERE salary LT 4500;
>= GE greater than or equal WHERE salary GE 4500;
<= LE less than or equal WHERE salary LE 4500;
in IN selecting multiple values WHERE country IN(‘US’ ’IN’);

Example 1 : Suppose you want to select only section A students. You know the variable Section contains information for students' sections.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul A 77
Priya B 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul A 95
Sahul C 84
Lahul C 44
;
run;

data readin1;
set readin;a
where Section EQ "A";
run;

where section EQ "A" => This would tell SAS to select only section A values.

You can also write where section = "A". This statement serves the same purpose.

The output is shown below :

WHERE statement in SAS
Logical Operators :
Symbolic Mnemonic Meaning Example
& AND Both conditions true WHERE gender =’M’ and age =1;
| OR Either condition true WHERE gender =’M’ or age =1;
~ or ^ NOT Reverse the statement WHERE country not IN(‘US’,’IN’);

Example 2 : Suppose you want to select section A and B students. You know the variable Section contains information for students' sections.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul A 77
Priya B 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul A 95
Sahul C 84
Lahul C 44
;
run;

data readin1;
set readin;
where Section IN ("A" "B");
run;

where section IN ("A" "B") => This would tell SAS to select section A and B values.

However, you can also write ...

The output is shown below :

BETWEEN-AND Operator : Between Two Numbers

Example 3 : Suppose you want to select scores whose values are greater than or equal to 50 and less than or equal to 75.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul A 77
Priya B 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul A 95
Sahul C 84
Lahul C 44
;
run;

data readin1;
set readin;
where Score between 50 and 75;
run;

where Score between 50 and 75 => This would tell SAS to select values through 50 and 75 (not 51 to 74).

This can also be written like :

where Score GE 50 and Score LE 75;


IS MISSING Operator : Selecting Missing Values

Example 4 : Suppose you want to select only those observations in which students did not fill their section information.

The dataset is modified to include missing values in SECTION variable.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul . 77
Priya . 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
;
run;
data readin1;
set readin;
where Section is missing;
run;

Where section is missing => This would tell SAS to select missing values for variable SECTION.

The output is shown below :

IS NOT MISSING Operator : Selecting Non-Missing Values

Example 5 : Suppose you want to select only those observations in which students filled their section information.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul . 77
Priya . 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
;
run;
data readin1;
set readin;
where section is not missing;
run;

Where section is not missing => This would tell SAS to select non-missing values.

The output is shown below :

The NOT operator can be used within WHERE statement in many ways :


1. where section is missing and score is not missing;
2. where not (score in (34,44,84));
3. where not (Score between 50 and 75);
4. where NOT(Section EQ "A"); 

CONTAINS Operator : Searching specific character

Example 6 : Suppose you want to select only those observations in which students' name contain 'hul'.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul . 77
Priya . 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
;
run;
data readin1;
set readin;
where name contains 'hul';
run;

where name contains 'hul'; => This would tell SAS to select observations having the values Rahul, Sahul and Lahul for the variable NAME.

Note : The CONTAINS operator is case sensitive.

The output is shown below :

Since the CONTAINS operator is case sensitive, where Name contains 'HUL' would not select any observation. The log for this statement is shown below :


LIKE Operator : Pattern Matching

The LIKE operator selects observations by comparing the values of a character variable to a specified pattern. It is case sensitive.

Example 7 : To select all students with a name that starts with the letter S.

There are two special characters available for specifying a pattern:

1. percent sign (%) - Wildcard Character

2. underscore ( _ ) - Fill in the blanks

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul . 77
Priya . 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
Pahulk A 81
;
run;
data readin1;
set readin;
where name like 'S%';
run;

where name like 'S%';
OR
where name like 'Sa%';

In this dataset, the above statements would produce the same result :

Examples :

1. where name like '_am';

You can also write this statement like : where name like '%am';

2. where name like '_ahu_';

This would not select PAHULK from the variable NAME.

3. where name like '_ahu__';

This would select PAHULK as double underscore (__) is stated.


Sounds-like Operator : Selecting sound like characters

Example 8 : To select names that sound like 'Ram'.

data readin;
input name $ Section $ Score;
cards;
Tom  A 84
Raj  A 80
Ram  B 71
Atul . 77
Priya . 45
Sandy A 67
Sam  A 57
David B 39
Wolf B 34
Rahul . 95
Sahul C 84
Lahul C 44
Pahulk A 81
Rama A 84
;
run;

data readin1;
set readin;
where name = *'Ram';
run;

The output is shown below :


WHERE = Data Set Option

1. In the example shown below, the WHERE= data set option is used to select only section A data.


data readin1 (where = (section ='A'));
set readin;
run;

2. The following example shows how to use WHERE= data set option in procedures


proc print data=readin (where=(section='A'));
run;

In this case, you can also use WHERE statement....


proc print data=readin;
where section='A';
run;

Difference between WHERE and IF conditions - WHERE Vs. IF .

Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

8 Responses to "SAS : Where Statement and Dataset Options"
  1. Hi Deepanshu!
    I have collected data on change in educational practices using a questionaire and wish to do the analysis. What all types of analysis can be done using SAS?

    ReplyDelete
  2. Hello ! What is the difference between below 2 snippets of code?
    /*code 1*/
    data readin1 (where = (section ='A'));
    set readin;
    run;

    /*code 2*/
    data readin1 ;
    set readin (where = (section ='A'));
    run;

    I think code2 is more efficient while dealing with large datasets as it subsets the dataset at reading stage only.

    ReplyDelete
    Replies
    1. yes your understanding is correct, instead of reading whole dataset and then filtered out , run your logic in options like code 2. It will only read where condition is satisfied.

      Delete
  3. Hey in one of your examples of where name = "_ahu__" would Rahul be selected because after l there is no other character available.

    ReplyDelete
  4. Can you send me sas back end process how data reads in sas

    ReplyDelete
  5. Hi may i know how to use between operator for character data.

    ReplyDelete
  6. I think we can't use between operator for character data because we can't range it from this to this..

    ReplyDelete
Next → ← Prev