In SAS, many-to-many merges are handled very differently via Data Step MERGE and PROC SQL JOIN.
Let's take an example -Suppose you have two data sets. You want to merge both the data sets but there are duplicate values in the common variable (ie. primary key) of any or both of the datasets.
Many to Many Merging |
Data Step : Many to Many Merge
The DATA step Merge does not handle many-to-many matching very well. When we perform many to many merges. the result should be a cartesian (cross) product of matching observations. For example, if there are three records that match from one contributing data set to two records from the other, the resulting data set should have 3 × 2 = 6 records.
Data Step MERGE does not create a cartesian product in case of a many-to-many relationship. It will return number of records for a duplicate value equal to maximum number of the duplicate value in both the table.SAS Code -
data dat1; input ID Info; cards ; 1 3123 1 1234 2 7482 2 8912 3 1284 ; run; data dat2; input ID Info2; cards ; 1 4444 1 5555 1 8989 2 9099 2 8888 3 8989 ; run; data combined; merge dat1 dat2 ; by ID; run;
Output : Merge |
PROC SQL JOIN : Many to Many Merge
PROC SQL JOIN creates all possible combinations of matching observations in case of a many-to-many relationship. Cartesian product is a collection of all pairs of two given sets. For example, In ID variable, there are 2 1's in dat1 dataset and 3 1's in dat2 dataset, the cartesian product would be (3*2 = 6 Observations) in the final result.
proc sql noprint;See the output shown in the image below -
create table combined2 as
select * from dat1 a
join dat2 b
on a.ID = b.ID;
quit;
SAS : Many to Many Merging |
super work..keep continuing..
ReplyDeleteNice work!!!
ReplyDeletewhy the sql code says left join? Shouldn't it be cross join?
ReplyDeleteHere we are talking about all possible combinations of matching observations (not all possible combinations of all the values). Primary key exists here. Hope it helps.
DeleteHi Deepanshu,
ReplyDeleteSuperb work & many many thanks... Can I get a detail explanation on different kinds of join (left, right, full, inner, cross) with PROC SQL as well as DATA MERGE. Again keep the good work up :)
Hi,
ReplyDeletecan you please explain how it writes the value 1234 in info for last 1 in output : Merge ?
Hi
ReplyDeleteHave a query
I have 10 observations in dataset A and 20 observations in dataset B
How to find whether the 10 observations in dataset A are present in dataset B
Use Innerjoin
DeleteUse intersect operator
DeleteWhat if i did not want all 6 obs shown in example but needed only the ones in first dataset merged with the variables in 2nd dataset without extra observations? What i mean is if I have 10 obs(with repeats) and 6 variables in dataset A and 12 obs(with repeats) and 3 variables in dataset B; i want the product to be Dataset C with 10 obs from A but will all 9 variables coming from A and B?
ReplyDeleteCan u please tell me in case of many to many match which would be more perfect Data Step Merge or Proc SQL join and why????
ReplyDeleteWhat is the difference in Cartesian Product and Cross join. If they are one and the same, why haven't you used keyword cross in the above query?
ReplyDeleteWell explained..
ReplyDelete