SAS : Combining and Appending Datasets

Deepanshu Bhalla 3 Comments

This tutorial explains various methods to combine or append datasets in SAS, along with examples.

In SAS, datasets can be combined using PROC APPEND, SET statement and PROC SQL UNION methods. For example, you have multiple human records files from various departments of your company and you are asked to combine them so that there would be a single file containing information of all the departments.

Sample Dataset

Let's create two sample datasets named 'Dataset1' and 'Dataset2' to explain examples in this tutorial.

Data Dataset1;
Input Name $ Score;
cards;
David 30
Ram 25
Sam 74
Sandy 36
;
run;

Data Dataset2;
Input Name $ Score;
cards;
Ken 36
Obama 74
Raj 30
Shyam 25
;
run;

1. Concatenate Datasets

In this method, we specify the datasets in the SET statement that need to be combined.

Data Stack;
Set Dataset1 Dataset2;
Run;
Output
Append Datasets in SAS
Note : The stacked data set is not sorted because we have not used BY statement.

2. Sorted Stacked Dataset

Here we are concatenating sorted datasets into one sorted dataset. You do this by using a SET statement and a BY statement in a DATA step.

Note : Make sure data sets are sorted before using BY statement. Datasets can be sorted with PROC SORT.

proc sort data = dataset1;
by name;
run;

proc sort data = dataset2;
by name;
run;
Data Stack1;
Set Dataset1 Dataset2;
By Name;
Run;
Output

3. PROC SQL for Appending Datasets

OUTER UNION CORR keyword is used in PROC SQL to concatenate two data sets. The CORR tells SAS to append data sets by name (not by column position).

PROC SQL;
CREATE TABLE stackk AS
SELECT *
FROM Dataset1
OUTER UNION CORR 
SELECT *
FROM Dataset2
ORDER BY Name;
QUIT;
The output of PROC SQL is same as the output of previous example.

4. PROC APPEND

In PROC APPEND, the data set specified in BASE= option refers to a data set in which other data set would be added or appended. In log, it writes 'Appending Dataset2 to Dataset1' for our example. After running this code, the dataset1 contains 8 records ( 4 from the original 'dataset1' file and 4 from the dataset2)

proc append base=dataset1 data=dataset2;
run;

If you want to append data and store it to another dataset, you can run PROC APPEND twice to do it. In the first PROC APPEND, it would create a base table ALLDATA (as specfied in the code below). If the dataset ALLDATA does not already exist, it would be automatically created by SAS.

proc append base=alldata data=dataset1;
run;

proc append base=alldata data=dataset2;
run;
What is the fastest method for combining datasets?

PROC APPEND is faster than using the SET statement or PROC SQL UNION because it only reads the dataset being added (specified by the 'DATA=' option).

Uses of PROC APPEND

PROC APPEND is most useful when you use it in a macro. For example, you create a macro in which there is a loop for calculation of some metrics of multiple variables. In each iteration, it returns a data set which needs to be appended with the output of subsequent iterations so that once loop completes all the iterations, we would have complete data set.

Important Point

Suppose you have two datasets having same variable names but the length of the common variable is different, It would throw a warning and it would not append datasets. To workaround this issue, we can use FORCE option to append the data sets.

proc append base=dataset1 data=dataset2 force;
run;

5. Uses of Multiple SET Statements

Multiple SET statements combine datasets by adding column from the other dataset. Unlike INNER/LEFT Joins, it does not require any primary or unique key to join two datasets.

Data Dataset3;
Input Section $;
cards;
A
B
C
D
;
run;
Data Stack3;
Set Dataset1;
Set Dataset3;
Run;

For example, if we use multiple SET statements instead of specifying multiple datasets in a single SET statement, the result would be completely different. See the output shown below:

Data Stack3;
Set Dataset1;
Set Dataset2;
Run;

Warning : It overwrites data.

Use of Multiple SET statement
Many to Many Relationship While Appending Data

Multiple SET statement can also produce cartesian product of two tables. It can be done with POINT= option.

Data Stack2;
set Dataset1;
do i=1 to num;
set Dataset3 nobs=num point=i;
if i=i then output;
end;
run;
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 3 Responses to "SAS : Combining and Appending Datasets"
  1. Hi,
    what is the effect if we don't sort the data sets before appending?

    ReplyDelete
    Replies
    1. If you are using BY statement in merging but dont sort the data then the code will error out.

      Delete
  2. What does i=i do in the last code? Won't it always result in TRUE?

    ReplyDelete
Next → ← Prev