SAS : Many to Many Merge

Deepanshu Bhalla 13 Comments

In SAS, many-to-many merges are handled very differently via Data Step MERGE and PROC SQL JOIN.

Let's take an example -

Suppose you have two data sets. You want to merge both the data sets but there are duplicate values in the common variable (ie. primary key) of any or both of the datasets.

SAS Many to Many Merge
Many to Many Merging

Data Step : Many to Many Merge

The DATA step Merge does not handle many-to-many matching very well. When we perform many to many merges. the result should be a cartesian (cross) product of matching observations. For example, if there are three records that match from one contributing data set to two records from the other, the resulting data set should have 3 × 2 = 6 records.

Data Step MERGE does not create a cartesian product in case of a many-to-many relationship. It will return number of records for a duplicate value equal to maximum number of the duplicate value in both the table.
SAS Code - 
data dat1;
input ID Info;
cards ;
1 3123
1 1234
2 7482
2 8912
3 1284
;
run;

data dat2;
input ID Info2;
cards ;
1 4444
1 5555
1 8989
2 9099
2 8888
3 8989
;
run;

data combined;
merge dat1 dat2 ;
by ID;
run;
Output : Merge
Note : In this example, we have 2 1s in dat1 and 3 1s in dat2. The maximum number of 1s in both the tables is 3. So it would return 3 1s in the merged dataset.

PROC SQL JOIN : Many to Many Merge

PROC SQL JOIN creates all possible combinations of matching observations in case of a many-to-many relationship. Cartesian product is a collection of all pairs of two given sets. For example, In ID variable, there are 2 1's in dat1 dataset and 3 1's in dat2 dataset, the cartesian product would be (3*2 = 6 Observations) in the final result.

proc sql noprint;
create table combined2 as
select * from dat1 a
join dat2 b
on a.ID = b.ID;
quit;
See the output shown in the image below -
SAS : Many to Many Merging
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

13 Responses to "SAS : Many to Many Merge"
  1. super work..keep continuing..

    ReplyDelete
  2. why the sql code says left join? Shouldn't it be cross join?

    ReplyDelete
    Replies
    1. Here we are talking about all possible combinations of matching observations (not all possible combinations of all the values). Primary key exists here. Hope it helps.

      Delete
  3. Hi Deepanshu,

    Superb work & many many thanks... Can I get a detail explanation on different kinds of join (left, right, full, inner, cross) with PROC SQL as well as DATA MERGE. Again keep the good work up :)

    ReplyDelete
  4. Hi,

    can you please explain how it writes the value 1234 in info for last 1 in output : Merge ?

    ReplyDelete
  5. Hi
    Have a query
    I have 10 observations in dataset A and 20 observations in dataset B
    How to find whether the 10 observations in dataset A are present in dataset B

    ReplyDelete
  6. What if i did not want all 6 obs shown in example but needed only the ones in first dataset merged with the variables in 2nd dataset without extra observations? What i mean is if I have 10 obs(with repeats) and 6 variables in dataset A and 12 obs(with repeats) and 3 variables in dataset B; i want the product to be Dataset C with 10 obs from A but will all 9 variables coming from A and B?

    ReplyDelete
  7. Can u please tell me in case of many to many match which would be more perfect Data Step Merge or Proc SQL join and why????

    ReplyDelete
  8. What is the difference in Cartesian Product and Cross join. If they are one and the same, why haven't you used keyword cross in the above query?

    ReplyDelete
Next → ← Prev