PROC SQL Joins: A Step-by-Step Guide

This tutorial is designed for beginners who want to get started with PROC SQL Joins. It explains different types of joins and the equivalent data step merge code for these joins. This tutorial includes several examples to help you practice and become proficient in PROC SQL Joins.


Advantages of PROC SQL Joins over Data Step Merging

PROC SQL joins do not require sorted tables (data sets), whereas both data sets must be sorted by the BY variable before you use the MERGE statement.
PROC SQL joins do not require the common variable to have the same name in the data sets you are joining, whereas the MERGE statement needs the common variable to have the same name in every data set so that it can be listed in the BY statement.
PROC SQL joins can use comparison operators other than the equal sign (=).
PROC SQL handles many-to-many relationships well, whereas the Data Step MERGE does not.
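
As a rough illustration of the first two points, the sketch below joins two unsorted tables whose key columns have different names. The tables customers and orders and the column CustID are made up for this example and are not used elsewhere in the tutorial.

```sas
/* Hypothetical data: the key is called ID in one table and CustID in the other */
data customers;
input ID Name $;
cards;
2 Ben
1 Amy
3 Cal
;
run;

data orders;
input CustID Amount;
cards;
3 250
1 100
2 175
;
run;

/* Neither table is sorted, and the key names differ; the ON clause handles both */
proc sql;
create table matched as
select c.ID, c.Name, o.Amount
from customers as c inner join orders as o
on c.ID = o.CustID;
quit;
```

A Data Step MERGE on the same data would first need a PROC SORT on each table and a RENAME= option so that both keys share one name.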


1. Cross Join / Cartesian product

The Cartesian product returns a number of rows equal to the product of the row (observation) counts of all the tables (data sets) being joined. For example, if the first table has 10 rows and the second table has 10 rows, the joined table (data set) has 100 rows (10 * 10).

Create Sample Datasets

Let's create the two sample datasets that will be used in this tutorial to explain how to use JOINS in SAS.

Data A;
Input ID Name$ Height;
cards;
1 A 1
3 B 2
5 C 2
7 D 2
9 E 2
;
run;

Data B;
Input ID Name$ Weight;
cards;
2 A 2
4 B 3
5 C 4
7 D 5
;
run;

The following code shows how to apply Cartesian Product using PROC SQL in SAS.

PROC SQL;
Create table dummy as
Select * from A as x cross join B as y;
Quit;

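[Figure: Cartesian or Cross Product]

Key takeaways

Since the first data set has 5 rows and the second data set has 4 rows, there are 20 rows (5 * 4) in the joined data set.
The 'as' keyword assigns a table a temporary name (an alias), here x and y.
Both data sets contain the columns ID and Name, but a table cannot hold two columns with the same name, so SELECT * keeps only the copies from the first data set (A) and SAS writes a warning to the log. Because the ID values of the two data sets differ, the ID shown in the joined data set is misleading: it describes only the row that came from A.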
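
One way to avoid the misleading ID column is to name the columns explicitly instead of using SELECT *. The following is a minimal sketch on the sample data sets A and B; the output names ID_A, Name_A, ID_B and Name_B are arbitrary choices made for this example.

```sas
proc sql;
create table dummy as
/* Give each copy of the shared columns its own name so that none is dropped */
select x.ID as ID_A, x.Name as Name_A, x.Height,
       y.ID as ID_B, y.Name as Name_B, y.Weight
from A as x cross join B as y;
quit;
```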

2. Inner Join

The INNER JOIN returns only the rows whose key values are found in both tables (data sets). If we use SELECT * in a CREATE TABLE query, the number of columns in the final joined table equals (common columns in both data sets + uncommon columns from data set A + uncommon columns from data set B), because each common column is kept only once.

[Figure: Venn Diagram : Inner Join]

PROC SQL;
Create table dummy as
Select * from A as x, B as y
where x.ID = y.ID;
Quit;

[Figure: Inner Join]

Explanation

Since the above case is an INNER JOIN, the combined table contains only the ID values 5 and 7, as these are the two values common to both data sets A and B.

Another way to write the above code -

PROC SQL;
Create table dummy as
Select * from A as x inner join B as y
On x.ID = y.ID;
Quit;

Both queries produce the same result.

Inner Join : Data Step Code

Data dummy;         
Merge A (IN = X) B (IN=Y);
by ID;
If X and Y;
run;
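
The MERGE statement with a BY statement expects both data sets to be sorted by ID. The sample data sets A and B are already in ID order. For data that is not, a sort step along these lines would be needed first (a minimal sketch that sorts A and B in place):

```sas
/* Sort both inputs by the BY variable before merging */
proc sort data=A;
by ID;
run;

proc sort data=B;
by ID;
run;
```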

3. Left Join

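The LEFT JOIN returns all rows from the left table, together with the matching rows from the right table. Rows of the left table that have no match get missing values in the columns that come from the right table.

[Figure: Left Join Venn Diagram]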

PROC SQL;
Create table dummy as
Select * from A as x left join B as y
On x.ID = y.ID;
Quit;

[Figure: Left Join]

Explanation

Since the above case is a LEFT JOIN, it returns all rows from the table (data set) A with the matching rows from data set B. The IDs 1, 3 and 9 have no match in B, so Weight is missing for those rows.

Left Join : Data Step Code

Data dummy;         
Merge A (IN = X) B (IN=Y);
by ID;
If X ;
run;

4. Right Join

The RIGHT JOIN returns all rows from the right table, together with the matching rows from the left table. Rows of the right table that have no match get missing values in the columns that come from the left table.

[Figure: Right Join Venn Diagram]

PROC SQL;
Create table dummy as
Select * from A as x right join B as y
On x.ID = y.ID;
Quit;

[Figure: Right Join]

Note : The ID values of the unmatched right-hand rows are missing in the joined table. With SELECT *, the ID column is taken from the left-hand table (A), so rows of B that have no match in A show a missing ID. To fill in these ID values in a right join, you can use the SQL COALESCE function, which returns the first non-missing argument.

proc sql;
create table dummy as
select coalesce (x.ID,y.ID) as ID, coalesce (x.name,y.name) as name,height,weight
from a as x right join b as y
on x.id = y.id;
quit;

[Figure: Right Join with Coalesce]

Explanation

Since the above case is a RIGHT JOIN, it returns all rows from the table (data set) B with the matching rows from data set A. The IDs 2 and 4 have no match in A, so Height is missing for those rows.

Right Join : Data Step Code

Data dummy;         
Merge A (IN = X) B (IN=Y);
by ID;
If Y ;
run;

5. Full Join

The FULL JOIN returns all rows from the left table and from the right table, whether or not they have a match in the other table.
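
A minimal sketch of the basic full join query on the sample data sets A and B, following the same pattern as the left and right join examples:

```sas
proc sql;
create table dummy as
/* Keep every row of A and every row of B; match rows where the IDs are equal */
select * from A as x full join B as y
on x.ID = y.ID;
quit;
```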

[Figure: Full Join]

Key takeaway : The FULL JOIN suffers from the same difficulty as the RIGHT JOIN. Namely, with SELECT * the values of the common variables are lost for rows that exist only in the right-hand data set. The COALESCE function solves this difficulty.

proc sql;
create table dummy as
select coalesce (x.ID,y.ID) as ID, coalesce (x.name,y.name) as name,height,weight
from a as x full join b as y
on x.id = y.id;
quit;

[Figure: Full Join with Coalesce]

Explanation

Since the above case is a FULL JOIN, it returns all rows from both tables (data sets) A and B.

Full Join : Data Step Code

Data dummy;         
Merge A B;
by ID;
run;

By default, the MERGE statement keeps all observations from both data sets, which is equivalent to a full join, so IN= variables are not required.

One to Many Relationship : Duplicate Values in Primary Key

[Figure: Join : One to Many Relationship]

A SQL join returns a Cartesian product of the matching rows when the primary key (common column) contains duplicate values. In this example, the duplicates are missing values in the "ID" column, which PROC SQL treats as equal to each other. Since data set A has 3 missing values and data set B has 1 missing value, there are 3 (3 * 1) rows with a missing ID in the joined data set.

The Data Step MERGE statement returns, for each key value, the larger of the two row counts. In this case, it also returns 3 rows with a missing ID, i.e. max(3,1).
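
The data behind this example is not listed in the text, so the sketch below uses made-up data sets A2 and B2 (hypothetical names and values) with three missing IDs in the first and one in the second. It runs the same inner join both ways so the row counts can be compared.

```sas
/* Hypothetical data: a period (.) is read as a missing numeric value */
data A2;
input ID Name $;
cards;
. P
. Q
. R
5 S
;
run;

data B2;
input ID Weight;
cards;
. 10
5 20
;
run;

/* PROC SQL: missing IDs compare as equal, giving 3 * 1 rows with a missing ID */
proc sql;
create table sql_out as
select x.ID, x.Name, y.Weight
from A2 as x inner join B2 as y
on x.ID = y.ID;
quit;

/* Data Step: the BY group of missing IDs yields max(3,1) rows */
data merge_out;
merge A2 (in=X) B2 (in=Y);
by ID;
if X and Y;
run;
```

With this data, both sql_out and merge_out contain three rows with a missing ID and one row for ID 5.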

Example 2 : One to Many Relationship

[Figure: PROC SQL : One to Many Relationship]

Why are there six rows with the value 5 in the combined table?

When the key contains duplicates, PROC SQL returns the Cartesian product of the matching rows from both tables. Data set A has two rows with ID 5 and data set B has three, so the combined table has 2 x 3 = 6 rows with ID 5.
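
The data for this example is not listed in the text either. As a rough sketch, the made-up data sets A3 and B3 below (hypothetical names and values) repeat ID 5 twice and three times, and the same pair is then combined with PROC SQL and with a Data Step MERGE.

```sas
/* Hypothetical data: ID 5 appears twice in A3 and three times in B3 */
data A3;
input ID Height;
cards;
5 1
5 2
;
run;

data B3;
input ID Weight;
cards;
5 10
5 20
5 30
;
run;

/* PROC SQL pairs every A3 row with every B3 row: 2 x 3 = 6 rows */
proc sql;
create table sql_dup as
select x.ID, x.Height, y.Weight
from A3 as x inner join B3 as y
on x.ID = y.ID;
quit;

/* The Data Step pairs rows in sequence within the BY group: 3 rows */
data merge_dup;
merge A3 B3;
by ID;
run;
```

The Data Step also writes a note to the log saying that more than one data set has repeats of BY values, which is a hint that MERGE is not producing the Cartesian product.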

How to Refer to a Permanent Library in PROC SQL Joins

In PROC SQL, you can refer to tables in permanent libraries when performing joins by using two-level names of the form library_name.table_name. See the example below.

PROC SQL;
Create table dummy as
Select * from readin.A as x left join readin.B as y
On x.ID = y.ID;
Quit;
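
The query above assumes that the libref readin has already been assigned. A minimal sketch of that step, where the folder path is a placeholder to replace with your own:

```sas
/* Placeholder path: point this at the folder that holds the permanent data sets A and B */
libname readin "C:\your\folder";
```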

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.


43 Responses to "PROC SQL Joins: A Step-by-Step Guide"

Unknown, June 30, 2014 at 8:21 AM
Its Very Useful... Thanks a lot sharing..!
Ajay, June 30, 2014 at 9:29 AM
Well explained! Keep up the good work :-)
Unknown, January 12, 2015 at 11:55 PM
Really useful :-)
abhi, April 3, 2015 at 2:49 PM
Thanks Deepanshu for sharing this valuable knowledge. Its really a treasure for beginners like me.
Anonymous, April 22, 2015 at 10:39 AM
Well explained..Its really useful especially for beginners. Thank u for sharing.
Unknown, April 25, 2015 at 1:21 AM
really very helpful ! i have one doubt explain the ten procedures which were used in SQL?
Anonymous, May 28, 2015 at 5:24 AM
Very Helpful and easy to understand
Thanks!
Unknown, August 13, 2015 at 10:57 PM
Extremely helpful, thank you very much !
Moumita Chakraborty, August 22, 2015 at 9:08 PM
Really very helpful but I found a small typo. In the full join SQL code wont it be 'full join' instead of 'right join'?
Unknown, December 9, 2015 at 10:25 AM
Love your site...I'm glad that I found this site...
Anonymous, December 14, 2015 at 12:13 AM
Excellent! good explanation with codes . . . very useful!
Olga Kozlova, January 16, 2016 at 2:57 PM
thank you so much!!
Sanjeev kuridi, February 3, 2016 at 2:34 AM
Loved your explanation..Very useful for SAS beginners..Keep going
mansoorali, March 1, 2016 at 4:31 AM
very usefull information..
Unknown, June 24, 2016 at 12:42 AM
Hi Deepanshu,

in all the joins it is mentioned

(Common columns in both the data sets + uncommon columns from data set A + uncommon columns from data set B).

is this true????
ParveenSharma, July 21, 2016 at 6:03 AM
full join code should be like this
PROC SQL;
Create table dummy as
Select coalesce (x.ID,y.ID) as ID,
coalesce (x.name,y.name) as name,height,weight
from A as x full join B as y
on x.ID = y.ID;
Quit;

ParveenSharma, July 21, 2016 at 6:07 AM
full join Data Step code should be

Data dummy;
Merge A (IN = X) B (IN=Y);
by ID;
if x=1 or y=1;
run;
RAO, September 21, 2016 at 10:01 AM
Well explained about joins. Very use full for the beginners.
Deepanshu Bhalla, December 19, 2016 at 10:33 AM
Did you read the first paragraph 'Advantages of PROC SQL Joins over Data Step Merging'? That's answer of your first question. Second question is also discussed in the article. If you have duplicates in any of the tables, you would get cartesian product of the duplicate records.
chaithra, February 6, 2017 at 4:35 AM
Was very useful.!! Well explained.
Anil, February 20, 2017 at 9:18 AM
What about Outer joins? How it will be prepared using PROC SQL?
Anonymous, March 3, 2017 at 9:16 PM
This is really great website. I'm glad I found it.
Unknown, October 12, 2017 at 12:57 AM
It helpful for my understanding, Thank you
Anonymous, March 12, 2018 at 8:11 AM
How many datasets can be merged in proc sql (sas 9.4)?

Anonymous, March 20, 2018 at 4:44 PM
I love your site to refer any concept in SAS or SQL
Unknown, September 24, 2018 at 7:57 PM
Superb description about SAS
Tom Little, November 12, 2018 at 3:43 PM
Great, thanks for posting this.
Unknown, November 15, 2018 at 11:10 PM
excellent explanation
Unknown, December 2, 2018 at 3:24 AM
Hi Deepanshu,
I appreciate your work and thanks for creating a decent tool. If you'll mention the rules that would be great. Also add interesting scenarios bit logical concepts.
and Syntax of every function statement.
Thanks
Unknown, February 27, 2019 at 2:04 AM
I dont think ur explaination about inner join is accurate, ud better compare that with Proc SQL outter join.
Ashutosh Gupta, September 6, 2020 at 12:20 AM
Why missing values are coming in right join but not in left join?
MIchał, September 20, 2020 at 2:44 AM
Great Job!
Fah, October 23, 2023 at 5:42 PM
Hi, Deepanshu! It's been years since you last answered to any comments here, so this is a long shot. What if I want to use PROC SQL to join various tables that are listed in sequence? For instance, BIRTHS12, BIRTHS13... BIRTHS21? Is there a way to join tables in a range? [BIRTHS12-BIRTHS21]?