This article explains practical applications of SQL queries using PROC SQL along with examples. PROC SQL is a SAS programming procedure that allows users to execute SQL queries within SAS programs. The article showcases how SQL queries can be applied in real-world scenarios using PROC SQL.
Example 1 : Count Cases Where Condition is TRUE
The input data is shown below. Suppose you are asked to calculate the number of Ys and Ns of column Z by column X.
Intermediate PROC SQL Tutorial |
data xyz; input x y$ z$; cards; 1 23 Y 1 24 N 1 25 Y 2 21 Y 2 22 Y 3 25 N 3 36 Y ; run;
proc sql noprint; create table tt as select x, sum(case when z= "Y" then 1 else 0 end) as z_Y, sum(case when z= "N" then 1 else 0 end) as z_N from xyz group by x; quit;
Example 2 : Creating trend variables
Sample Dataset |
data example1; input ID Months Revenue Balance; cards; 101 1 3 90 101 2 33 68 101 3 22 51 101 4 3 90 101 5 33 65 101 6 22 54 102 1 100 18 102 2 58 62 102 3 95 97 102 4 100 18 102 5 58 65 102 6 95 92 ;Task : Calculate total revenue and total balance accumulated in the first 3 months and how much the data spreads in the first 3 months time period.
proc sql noprint; create table output1 as select ID, sum(case when 1 <= Months <= 3 then Revenue else . end) as Rev_1_3, sum(case when 1 <= Months <= 3 then Balance else . end) as Bal_1_3, var(case when 1 <= Months <= 3 then Revenue else . end) as Var_Rev_1_3, var(case when 1 <= Months <= 3 then Balance else . end) as Var_Bal_1_3 from example1 group by ID; quit;Output :
Output |
Example 3 : Check the Status Flag between the start date and end date
DATA dates; INPUT ID Period : date9. Status $; FORMAT Period date9.; CARDS; 1 13may2000 Y 1 17oct1999 Y 1 03feb2001 N 1 28feb2001 N 2 10nov2000 Y 2 25apr2001 N 3 03jun1997 Y 3 14jan2001 Y 3 24oct1998 N 3 27aug2000 N ; RUN;
proc sql; select a.ID , a.Period as Start_Date, b.Period as End_Date, a.Status as Status_Start, b.Status as Status_End from dates a left join (select * from dates group by ID having Period = max(Period)) b on a.ID = b.ID group by a.ID having a.Period = min(a.Period); quit;
Example 4 : Calculate Percentage Change between the first and last row
data temp; input ID time $ x1-x3; cards; 1 Y1 85 85 86 1 Y2 80 79 70 1 Y3 78 77 87 2 Y1 79 79 79 2 Y2 83 83 85 ; run;
data temp2; set temp; retain Serial 0; If first.ID then Serial = 1; else Serial = Serial + 1; by ID; run;
proc sql noprint; create table t2 as select a.ID, (a.x1-b.x1)/b.x1 *100 as change from temp2 a left join (select ID, x1 from temp2 group by ID having serial = min(serial)) b on a.ID = b.ID group by a.ID having serial = max(serial); quit;
Example 5 : Extract First and Last Observation within a Group
PROC SQL : Alternative to First. Statement
proc sql; create table output2 (drop=n) as select *, monotonic() as n from example1 group by ID having min(n) = n; quit;PROC SQL : Alternative to Last. Statement
proc sql; create table output2 (drop=n) as select *, monotonic() as n from example1 group by ID having max(n) = n; quit;
Example 6 : Self Join
Suppose you have data for employees. It comprises of employees' name, ID and manager ID. You need to find out manager name.
Proc SQL : Self Join |
data example2; input Name $ ID ManagerID; cards; Smith 123 456 Robert 456 . William 222 456 Daniel 777 222 Cook 383 222 ; run;SQL Query : Self Join
proc sql; create table want as select a.*, b.Name as Manager from example2 as a left join example2 as b on a.managerid = b.id; quit;
Example 7 : Capping Extreme Values
Suppose you need to cap values of a column.
data have1; input x y z; cards; 101 1 10 102 2 20 103 3 45 104 1 23 105 2 42 106 3 46 107 1 61 109 2 22 110 3 28 111 1 30 112 2 32 113 3 39 ; run; proc sql noprint; create table output2 (drop=z rename= (z1=z)) as select *, case when z >= 40 then 40 else z end as z1 from have1; quit;First cap values of column z and then sum all the z values by column y.
proc sql; select sum(z1) as OutCol from (select *, case when z >= 40 then 40 else z end as z1 from have1) group by y; quit;
Example 8 : Select All Excluding 1 Variable
proc sql; create table want (drop=x) as select a.*,b.* from dat1 a, dat2 (rename=(id=x)) b where a.id = b.x; quit;
Example 9 : Sub Query - ANY and ALL Operators
- ANY operator: Selects values that pass the comparison test with any of the values returned by the sub-query.
- ALL operator: Selects values that pass the comparison test with all of the values returned by the sub-query.
data jansale; input sale id; cards; 100 1 105 2 108 3 110 4 ; run; data febsale; input sale id; cards; 120 1 105 2 118 3 117 4 ; run;
proc sql; select * from jansale where sale < all (select sale from febsale); quit; proc sql; select * from jansale where sale < any (select sale from febsale); quit;
hello , in code 2 , what does revenue else . end means ? Also i am unable to download detaled code file , as it is password protected .can you please share the password.
ReplyDeleteUnder "Detailed code for multiple variables" the link shared is downloading the excel file which is password protected .. Kindly let us know the password
ReplyDelete