SAS : Many to Many Merge

In SAS, many-to-many merges are handled very differently via Data Step MERGE and PROC SQL JOIN.

Let's take an example -

Suppose you have two data sets. You want to merge both the data sets but there are duplicate values in the common variable (ie. primary key) of any or both of the datasets.

Many to Many Merging

Data Step : Many to Many Merge

The DATA step Merge does not handle many-to-many matching very well. When we perform many to many merges. the result should be a cartesian (cross) product of matching observations. For example, if there are three records that match from one contributing data set to two records from the other, the resulting data set should have 3 × 2 = 6 records.

Data Step MERGE does not create a cartesian product in case of a many-to-many relationship. It will return number of records for a duplicate value equal to maximum number of the duplicate value in both the table.

SAS Code -

data dat1;
input ID Info;
cards ;
1 3123
1 1234
2 7482
2 8912
3 1284
;
run;

data dat2;
input ID Info2;
cards ;
1 4444
1 5555
1 8989
2 9099
2 8888
3 8989
;
run;

data combined;
merge dat1 dat2 ;
by ID;
run;

Output : Merge

Note : In this example, we have 2 1s in dat1 and 3 1s in dat2. The maximum number of 1s in both the tables is 3. So it would return 3 1s in the merged dataset.

PROC SQL JOIN : Many to Many Merge

PROC SQL JOIN creates all possible combinations of matching observations in case of a many-to-many relationship. Cartesian product is a collection of all pairs of two given sets. For example, In ID variable, there are 2 1's in dat1 dataset and 3 1's in dat2 dataset, the cartesian product would be (3*2 = 6 Observations) in the final result.

proc sql noprint;
create table combined2 as
select * from dat1 a
join dat2 b
on a.ID = b.ID;
quit;

See the output shown in the image below -

SAS : Many to Many Merging

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

While I love having friends who agree, I only learn from those who don't
Let's Get Connected Email LinkedIn

Post Comment 13 Responses to "SAS : Many to Many Merge"

AnonymousJanuary 17, 2016 at 1:15 AM
super work..keep continuing..
AnonymousMarch 7, 2016 at 12:49 AM
Nice work!!!
AmritaJune 23, 2016 at 12:46 PM
why the sql code says left join? Shouldn't it be cross join?
AnonymousJanuary 11, 2017 at 12:21 PM
Hi Deepanshu,

Superb work & many many thanks... Can I get a detail explanation on different kinds of join (left, right, full, inner, cross) with PROC SQL as well as DATA MERGE. Again keep the good work up :)
AnonymousJune 4, 2017 at 7:00 PM
Hi,

can you please explain how it writes the value 1234 in info for last 1 in output : Merge ?
AnonymousAugust 8, 2017 at 11:34 AM
Hi
Have a query
I have 10 observations in dataset A and 20 observations in dataset B
How to find whether the 10 observations in dataset A are present in dataset B
AnonymousMay 14, 2019 at 10:56 AM
What if i did not want all 6 obs shown in example but needed only the ones in first dataset merged with the variables in 2nd dataset without extra observations? What i mean is if I have 10 obs(with repeats) and 6 variables in dataset A and 12 obs(with repeats) and 3 variables in dataset B; i want the product to be Dataset C with 10 obs from A but will all 9 variables coming from A and B?
UnknownJune 21, 2019 at 10:27 PM
Can u please tell me in case of many to many match which would be more perfect Data Step Merge or Proc SQL join and why????
UnknownNovember 17, 2019 at 8:57 AM
What is the difference in Cartesian Product and Cross join. If they are one and the same, why haven't you used keyword cross in the above query?
Balaji ChDecember 29, 2019 at 10:10 AM
Well explained..