SAS : WHERE vs. IF Statements

Deepanshu Bhalla 38 Comments

The WHERE statement is an alternative to IF statement when it comes to subsetting a dataset. It is important to know the difference between these two statements.

Difference between WHERE and IF statements

WHERE statement wins against IF statement in the following cases :

  1. The WHERE statement can be used in procedures to subset data while IF statement cannot be used in procedures.

    Look at the log of WHERE and IF statements shown below :
    WHERE statement in log

    IF statement in log

  2. WHERE can be used as a data set option while IF cannot be used as a data set option.

    WHERE as a data set option in log

    IF as a data set option in log

  3. The WHERE statement is more efficient than IF statement. It tells SAS not to read all observations from the data set.

    Look at the LOG (shown below) after using WHERE statement, you only see a count of the number of observations that meet the criteria in the WHERE statement.

    Only 6 observations were read from the dataset READIN. In actual, the dataset READIN contains 14 observations.

    WHERE statement efficiency in log

    All 14 observations are read, and the 6 that meet the IF criteria are placed in the new data set.

    IF statement efficiency in log

    NOTE: Both statements produced the same result.

    The where clause sends only those records that meet condition to PDV, the IF statement sends all the records to PDV and removes the records that do not meet condition before they get sent to the output buffer.
  4. The WHERE statement can be used to search for all similar character values that sound alike while IF statement cannot be used.

    For example, you want to filter out all the names that sound alike 'Sam'.

The IF statement outperforms the WHERE statement in the following scenarios:

1. When reading data using INPUT statement.

IF Statement

IF Statement can be used when specifying an INPUT statement.







WHERE Statement
WHERE statement can not be used when specifying an input statement.




2. When it is required to execute multiple conditional statements

Suppose, you have data for college students’ mathematics scores. You want to rate them on the basis of their scores.

Conditions :
1. If a score is less than 40, create a new variable named “Rating” and give “Poor” rating to these students.
2. If a score is greater than or equal to 40 but less than 75, give “Average” rating to these students.
3. If a score is greater than or equal to 75 but less than or equal to 100, give “Excellent” rating to these students.

This can be easily done using IF-THEN-ELSE IF statements. However, WHERE statement requires variables to exist in the data set.


3. When it is required to use newly created variables in data set.

IF statement can be applied on a newly created variable whereas WHERE statement cannot be applied on a newly created variable. In the below example, IF statement doesn't require variables to exist in the READIN data set while WHERE statement requires variable to exist in the data set.



4. When to Use _N_, FIRST., LAST. Variables

WHERE statement cannot be applied on automatic variables such as _N_, First., Last. Variables. While IF statement can be applied on automatic variables.
Difference between IF and WHERE Conditions

Difference : WHERE and IF when merging data sets
WHERE statement applies the subset condition before merging the data sets, Whereas, IF statement applies the subset condition after merging the data sets.

Create 2 Sample Datasets for Merging
data ex1;
input ID Score;
cards;
1 25
2 28
3 35
4 45
;
run;
data ex2;
input ID Score;
cards;
1 95
2 97
;
run;

Merge with WHERE Condition
data comb;
merge ex1 ex2;
by ID;
where score <= 30;
run;
It returns 2 observations. WHERE condition applied before merging. It applies separately on each of the 2 data sets before merging.

Merge with IF Condition
data comb;
merge ex1 ex2;
by ID;
if score <= 30;
run;
It returns 0 observation as IF condition applied after merging. Since there is no observation in which value of score is less than or equal to 30, it returns zero observation.
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

38 Responses to "SAS : WHERE vs. IF Statements"
  1. Very Informative.

    ReplyDelete
  2. Very helpful for beginners like me

    ReplyDelete
  3. thanks for the information

    ReplyDelete
  4. Really worth reading information

    ReplyDelete
  5. While merging the datasets if the subsetting variable is not available in all the datasets listed in the merge statement we should use if subsetting statement instead of where..

    ReplyDelete
  6. Special where operators like is null, is missing, contains,like , between and cannot be used along with if statement.

    ReplyDelete
  7. Very good points. Now the differences between these two options are clear. Thanks

    ReplyDelete
  8. useful information and keep some more logic based information thanks

    ReplyDelete
  9. very useful. thanks

    ReplyDelete
  10. Great ... Keep up the good work

    ReplyDelete
    Replies
    1. Thank you for your appreciation. Cheers!

      Delete
    2. Please give me some tasks about where if and daytime formats

      Delete
  11. best site to learn sas.. great job!!

    ReplyDelete
  12. Great job and nicely documented the differences

    ReplyDelete
  13. Very precise and briefly explained. Thank you

    ReplyDelete
  14. Very precise and briefly explained. Thank you

    ReplyDelete
  15. data a;
    input id hiredate mmddyy10.;
    format hiredate mmddyy10.;
    cards;
    101 01/01/2015
    102 01/20/2015
    103 02/20/2015
    ;
    run;
    data aa;
    set a;
    where hiredate = '20JAN2015'd ;
    run;
    why we can not '01/20/2015'd;

    ReplyDelete
    Replies
    1. 'd format is available for date9 or date7 formats and not for any other date format.

      Delete
  16. Thanks a lot for such a useful documentation. Clarifies every bit of doubt.

    ReplyDelete
  17. This comment has been removed by the author.

    ReplyDelete
  18. Hi Diptanshu,
    Very-2 Nice and helpful topics and interview question too. Request you could you please upload PDF for these topics and interview questions too so that can be read in offline as well. Please let us know your input on this suggestion.

    Thanks,
    Sandeep

    ReplyDelete
  19. Thank you for a very clear difference.

    ReplyDelete
  20. While merging by ID how are the scores merged? I mean, for a particular ID Which Score is going to come in the output window? I tried it, but got few from one set while few from another. Please clarify on this!!

    ReplyDelete
  21. i need more differences like this

    ReplyDelete
  22. Very clear and useful. Thank you very much!

    ReplyDelete
  23. WOW, extremely useful and simple to learn.

    ReplyDelete
  24. Thank you very much sir.

    ReplyDelete
Next → ← Prev