Intermediate Proc SQL Tutorial

Live Online Training : SAS Programming with 50+ Case Studies

- Explain Programming Concepts in Simple English
- Live Projects
- Scenario Based Questions
- Job Placement Assistance
- Get 10% off till Oct 26, 2017
- Batch starts from October 28, 2017

This article explains practical application of SQL queries with examples.

Example 1 : Count Cases Where Condition is TRUE

The input data is shown below. Suppose you are asked to calculate the number of Ys and Ns of column Z by column X.
Intermediate SQL Tutorial

data xyz;
input x y$ z$;
cards;
1 23 Y
1 24 N
1 25 Y
2 21 Y
2 22 Y
3 25 N
3 36 Y
;
run;
proc sql noprint;
create table tt as
select x,
sum(case when z= "Y" then 1 else 0 end) as z_Y,
sum(case when z= "N" then 1 else 0 end) as z_N
from xyz
group by x;
quit;

Example 2 : Creating trend variables
Sample Dataset
data example1;
input ID Months Revenue Balance;
cards;
101 1 3 90
101 2 33 68
101 3 22 51
101 4 3 90
101 5 33 65
101 6 22 54
102 1 100 18
102 2 58 62
102 3 95 97
102 4 100 18
102 5 58 65
102 6 95 92
;
Task : Calculate total revenue and total balance accumulated in the first 3 months and how much the data spreads in the first 3 months time period.
proc sql noprint;
create table output1 as
select ID,
sum(case when 1 <= Months <= 3 then Revenue else . end) as Rev_1_3,
sum(case when 1 <= Months <= 3 then Balance else . end) as Bal_1_3,
var(case when 1 <= Months <= 3 then Revenue else . end) as Var_Rev_1_3,
var(case when 1 <= Months <= 3 then Balance else . end) as Var_Bal_1_3
from example1
group by ID;
quit;

Output : 
Output
Detailed Code for Multiple Variables

Example 3 : Check the Status Flag between the start date and end date

DATA dates;
INPUT ID Period : date9. Status $;
FORMAT Period date9.;
CARDS;
1 13may2000 Y
1 17oct1999 Y
1 03feb2001 N
1 28feb2001 N
2 10nov2000 Y
2 25apr2001 N
3 03jun1997 Y
3 14jan2001 Y
3 24oct1998 N
3 27aug2000 N
;
RUN;
proc sql;
select a.ID , a.Period as Start_Date, b.Period as End_Date,
a.Status as Status_Start, b.Status as Status_End
from dates a left join
(select * from dates group by ID having Period = max(Period)) b
on a.ID = b.ID
group by a.ID
having a.Period = min(a.Period);
quit;

Example 4 : Calculate Percentage Change between the first and last row
data temp;
input ID time $ x1-x3;
cards;
1 Y1 85 85 86
1 Y2 80 79 70
1 Y3 78 77 87
2 Y1 79 79 79
2 Y2 83 83 85
;
run; 
data temp2;
set temp;
retain Serial 0;
If first.ID then Serial = 1;
else Serial = Serial + 1;
by ID;
run;
proc sql noprint;
create table t2 as
select a.ID, (a.x1-b.x1)/b.x1 *100 as change
from temp2 a left join
(select ID, x1 from temp2
 group by ID
 having serial = min(serial)) b
on a.ID = b.ID
group by a.ID
having serial = max(serial);
quit;

Example 5 : Extract First and Last Observation within a Group

PROC SQL : Alternative to First. Statement
proc sql;
create table output2 (drop=n) as
select *, monotonic() as n
from example1
group by ID
having min(n) = n;
quit;
PROC SQL : Alternative to Last. Statement
proc sql;
create table output2 (drop=n) as
select *, monotonic() as n
from example1
group by ID
having max(n) = n;
quit;
Example 6 : Self Join
Suppose you have data for employees. It comprises of employees' name, ID and manager ID. You need to find out manager name.

Proc SQL : Self Join
data example2;
input Name $ ID ManagerID;
cards;
Smith 123 456
Robert 456 .
William 222 456
Daniel 777 222
Cook 383 222
;
run;
SQL Query : Self Join
proc sql;
create table want as
select a.*, b.Name as Manager
from example2 as a left join example2 as b
on a.managerid = b.id;
quit;
Example 7 : Capping Extreme Values
Suppose you need to cap values of a column.

data have1;
input x y z;
cards;
101  1 10
102  2 20
103  3 45
104  1 23
105  2 42
106  3 46
107  1 61
109  2 22
110  3 28
111  1 30
112  2 32
113  3 39
;
run;
proc sql noprint;
create table output2 (drop=z rename= (z1=z))  as
select *, case when z >=  40 then 40 else z
end as z1 from have1;
quit;
First cap values of column z and then sum all the z values by column y.
proc sql;
select sum(z1) as OutCol
from (select *, case when z >=  40 then 40 else z
end as z1 from have1)
group by y;
quit;

Example 8 : Select All Excluding 1 Variable
proc sql;
create table want (drop=x) as
select a.*,b.*
from dat1 a, dat2 (rename=(id=x)) b
where a.id = b.x;
quit;
Example 9 : Sub Query - ANY and ALL Operators

1. ANY operator selects values that pass the comparison test with any of the values that are returned by the sub-query. 

2. ALL operator selects values that pass the comparison test with all of the values that are returned by the sub-query. 
data jansale;
input sale id;
cards;
100 1
105 2
108 3
110 4
;
run;
data febsale;
input sale id;
cards;
120 1
105 2
118 3
117 4
;
run;
proc sql;
select *
from jansale
where sale < all (select sale from febsale);
quit;
proc sql;
select *
from jansale
where sale < any (select sale from febsale);
quit;

Proc SQL Tutorials : 15 Proc SQL Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

2 Responses to "Intermediate Proc SQL Tutorial"

  1. This comment has been removed by the author.

    ReplyDelete
  2. what is the use of 'any' then...... 'all' will return least value of sale between two sale values as per the requirement. 'any' will compare any value of sale from jansale to any vlaue of febsale. and when 'any' sale in jansale is less than 'any' sale in febsale it ll show up in output.what wwe will get from the output?? !!!!!!!!!!!

    ReplyDelete

Next → ← Prev