In this tutorial, we will see how to use CASE WHEN statement in SAS using PROC SQL.
In PROC SQL, you can use the CASE WHEN
statement to perform conditional logic and manipulate data based on specified conditions. Let's understand the CASE WHEN statement in PROC SQL with examples.
In simple words, the CASE WHEN statement in SAS is similar to IF-ELSE statements in terms of conditional logic. Both allow you to perform different actions based on specified conditions.
Syntax of CASE WHEN statement
Below is the syntax of CASE WHEN statement in PROC SQL.
PROC SQL; SELECT column1, CASE WHEN condition1 THEN expression1 WHEN condition2 THEN expression2 ... ELSE expressionN END AS new_column FROM table; QUIT;
Example 1: Categorizing numerical values
Suppose you have users' data and you want to categorize them into "Minor" or "Adult". A minor is referred to as someone under the age of 18.
In this example, we used the CASE WHEN statement within the SELECT statement to create a new column called "AgeGroup" based on the specified conditions.
DATA Users; INPUT UserID $ Age; DATALINES; AA01 15 AA02 40 AA03 17 AA04 60 AA05 30 ; RUN;
PROC SQL; SELECT Age, CASE WHEN Age < 18 THEN 'Minor' WHEN Age >= 18 AND Age < 65 THEN 'Adult' ELSE 'Senior' END AS AgeGroup FROM Users; QUIT;
To store the result in a new table, use CREATE TABLE
statement. Please refer the code below where we have created a new table (dataset) called Users2.
PROC SQL; CREATE TABLE Users2 as SELECT Age, CASE WHEN Age < 18 THEN 'Minor' WHEN Age >= 18 AND Age < 65 THEN 'Adult' ELSE 'Senior' END AS AgeGroup FROM Users; QUIT;
Example 2: Group categorical variables
Here we are categorizing products based on their category as follows:
- If the category is 'Laptop' or 'TV', the product belongs to the 'Electronics' group.
- If the category is 'Clothing' or 'Beauty', the product belongs to the 'Fashion' group.
- Otherwise, the product belongs to the 'Home' group.
/* Create the Products dataset */ DATA Products; INPUT ProductID $ Category $12.; DATALINES; P001 Laptop P002 Clothing P003 Home P004 TV P005 Beauty ; RUN;
PROC SQL; CREATE TABLE Products_Grouped AS SELECT ProductID, Category, CASE WHEN Category IN ('Laptop', 'TV') THEN 'Electronics' WHEN Category IN ('Clothing', 'Beauty') THEN 'Fashion' ELSE 'Home' END AS ProductGroup FROM Products; QUIT;
Example 3: How to use CALCULATED component in CASE WHEN Statement
The calculated
component in SAS is used within a PROC SQL query to refer to a newly created variable for further calculation.
Here we are creating a new variable called "ExperienceBand" which categorizes employees based on their experience, indicating whether it is below 5 years, between 5 and 10 years, or 10 years and above.
data Employees; input DateofJoining :yymmdd10.; format DateofJoining yymmddd10.; datalines; 2019-04-25 2009-05-26 2023-03-02 2017-05-11 ; run;
PROC SQL; CREATE TABLE Employees2 AS SELECT DateofJoining, intck('year', DateofJoining, today()) AS Experience, CASE WHEN calculated Experience < 5 THEN 'Below 5' WHEN calculated Experience BETWEEN 5 AND 10 THEN '5-10' ELSE '10+' END AS ExperienceBand FROM Employees; QUIT;
- The expression
intck('year', DateofJoining, today()) AS Experience
calculates the number of years of experience for each employee by counting the number of years between their date of joining and the current date. The result is stored in a variable called "Experience." - The
CASE
statement evaluates the value of the "Experience" variable and assigns a corresponding band to the "ExperienceBand" variable. The categories are as follows:- If the calculated experience is less than 5 years, the "ExperienceBand" is set as 'Below 5'.
- If the calculated experience is between 5 and 10 years (inclusive), the "ExperienceBand" is set as '5-10'.
- If the calculated experience is greater than 10 years, the "ExperienceBand" is set as '10+'.
Example 4: How to use CASE WHEN with Aggregate Functions
Below is an example of using CASE WHEN with aggregate functions in PROC SQL.
We want to calculate the total count of products, the count of expensive products (those with a price greater than 100), and the average price of the expensive products for each category.
data products; length category $20.; input category $ price; datalines; Electronics 150 Electronics 80 Clothing 120 Clothing 50 Electronics 200 Clothing 90 ; run;
PROC SQL; SELECT category, COUNT(*) AS total_count, SUM(CASE WHEN price > 100 THEN 1 ELSE 0 END) AS count_expensive, AVG(CASE WHEN price > 100 THEN price END) AS avg_expensive_price FROM products GROUP BY category; QUIT;
The CASE WHEN statement is used inside the SUM and AVG functions to count and calculate the average of the expensive products. If the price is greater than 100, it returns 1. Otherwise, it returns 0. The purpose of the SUM(CASE WHEN ...) is to count the number of rows that meet a specific condition. In this case, it counts the number of times the price column is greater than 100 by summing the values of 1 generated by the CASE WHEN statement. By grouping the results by category, we get the count of expensive products, and average price of expensive products for each category.
Share Share Tweet