SAS : Many to Many Merge

In SAS, many-to-many merges are handled very differently via Data Step MERGE and PROC SQL JOIN.

Let's take an example -

Suppose you have two data sets. You want to merge both the data sets but there are duplicate values in the common variable (ie. primary key) of any or both of the datasets.
Many to Many Merging

Data Step Merge

The DATA step Merge does not handle many-to-many matching very well. When we perform many to many merges. the result should be a cartesian (cross) product of matching observations. For example, if there are three records that match from one contributing data set to two records from the other, the resulting data set should have 3 × 2 = 6 records.
Data Step MERGE does not create a cartesian product in case of a many-to-many relationship. It will return number of records for a duplicate value equal to maximum number of the duplicate value in both the table.
SAS Code - 
data dat1;
input ID Info;
cards ;
1 3123
1 1234
2 7482
2 8912
3 1284
;
run;


data dat2;
input ID Info2;
cards ;
1 4444
1 5555
1 8989
2 9099
2 8888
3 8989
;
run;

data combined;
merge dat1 dat2 ;
by ID;
run;
Output : Merge
Note : In this example, we have 2 1s in dat1 and 3 1s in dat2. The maximum number of 1s in both the tables is 3. So it would return 3 1s in the merged dataset.

PROC SQL JOIN

PROC SQL JOIN creates all possible combinations of matching observations in case of a many-to-many relationship. Cartesian product is a collection of all pairs of two given sets. For example, In ID variable, there are 2 1's in dat1 dataset and 3 1's in dat2 dataset, the cartesian product would be (3*2 = 6 Observations) in the final result.
proc sql noprint;
create table combined2 as
select * from dat1 a
join dat2 b
on a.ID = b.ID;
quit;
See the output shown in the image below -
SAS : Many to Many Merging

SAS Tutorials : 100 Free SAS Tutorials

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

5 Responses to "SAS : Many to Many Merge"

  1. super work..keep continuing..

    ReplyDelete
  2. Nice work!!!

    ReplyDelete
  3. why the sql code says left join? Shouldn't it be cross join?

    ReplyDelete
    Replies
    1. Here we are talking about all possible combinations of matching observations (not all possible combinations of all the values). Primary key exists here. Hope it helps.

      Delete
  4. Hi Deepanshu,

    Superb work & many many thanks... Can I get a detail explanation on different kinds of join (left, right, full, inner, cross) with PROC SQL as well as DATA MERGE. Again keep the good work up :)

    ReplyDelete

Next → ← Prev