This tutorial explains how to use PROC APPEND in SAS, with examples.
The basic syntax of PROC APPEND is as follows:
PROC APPEND BASE=output-dataset DATA=source-dataset;
RUN;
BASE= : Specify the name of the dataset to which the source dataset will be added.
DATA= : Specify the name of the dataset that needs to be added.
The examples in this tutorial use the two sample datasets created below.
data mydata1;
input Product $ Sales Profit;
datalines;
A 110 23
B 147 31
C 238 51
D 207 42
;
run;
data mydata2;
input Product $ Sales Profit;
datalines;
E 81 13
F 87 17
G 69 10
;
run;
The following SAS code appends the data from the 'MYDATA2' dataset to the 'MYDATA1' dataset.
PROC APPEND BASE=mydata1 DATA=mydata2;
RUN;
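To confirm the result, you can print the BASE dataset. This is a minimal sketch that assumes the append above has been run exactly once, so mydata1 should hold its original four rows followed by the three rows from mydata2.

```sas
/* Print the BASE dataset after the append:
   rows A-D come first, followed by the appended rows E-G */
proc print data=mydata1;
run;
```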
In this example, 'final_data' is a new dataset to which mydata1 and mydata2 are appended in turn. Because final_data does not exist yet, SAS creates it during the first append.
PROC APPEND BASE=final_data DATA=mydata1;
RUN;
PROC APPEND BASE=final_data DATA=mydata2;
RUN;
PROC APPEND returns an error if a variable is longer in the source (DATA=) dataset than in the target (BASE=) dataset, because appending could truncate values. See the example below.
data mydata3;
input Product $ Sales Profit;
datalines;
A 110 23
B 147 31
C 238 51
D 207 42
;
run;
data mydata4;
length Product $ 10;
input Product $ Sales Profit;
datalines;
E 81 13
F 87 17
G 69 10
;
run;
PROC APPEND BASE=mydata3 DATA=mydata4;
RUN;
WARNING: Variable Product has different lengths on BASE and DATA files (BASE 8 DATA 10).
ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files.
Use the FORCE option in PROC APPEND to append the datasets despite the length mismatch. The BASE dataset keeps its own variable length (8 here), so longer values from the DATA dataset are truncated.
PROC APPEND BASE=mydata3 DATA=mydata4 force;
RUN;
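FORCE can silently shorten data: values longer than the BASE variable's length are cut to fit. The sketch below illustrates this with two hypothetical datasets, base_short and new_long, which are not part of the examples above and exist only to make the truncation visible.

```sas
/* Hypothetical datasets for illustration only */
data base_short;
    length Product $ 3;
    input Product $ Sales;
    datalines;
AAA 10
;
run;

data new_long;
    length Product $ 8;
    input Product $ Sales;
    datalines;
BBBBBBBB 20
;
run;

/* FORCE lets the append run; Product keeps the BASE length of 3 */
proc append base=base_short data=new_long force;
run;

/* The appended row shows Product truncated to 3 characters */
proc print data=base_short;
run;
```

If truncation is not acceptable, one option is to rebuild the BASE dataset with a longer length before appending.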
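When the DATA dataset contains variables that are not in the BASE dataset, PROC APPEND stops with an error unless you specify FORCE. With FORCE the datasets are combined, but the extra variables (Players and Goals below) are dropped, and the BASE variables with no match (Names and Scores) are set to missing in the appended rows.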
data team1;
input Names $ Scores Matches;
datalines;
A 110 23
B 147 31
C 238 51
D 207 42
;
run;
data team2;
input Players $ Goals Matches;
datalines;
E 81 13
F 87 17
G 69 10
;
run;
PROC APPEND BASE=team1 DATA=team2 force;
RUN;
A better solution is to rename the variables in the DATA dataset with the RENAME= dataset option so that they match the BASE dataset. FORCE is then no longer needed:
PROC APPEND BASE=team1 DATA=team2 (rename=(players=names goals=scores));
RUN;
We can use the WHERE= dataset option to filter observations while appending datasets. WHERE= is applied after RENAME=, so the condition uses the new variable name. In this example, only observations with scores greater than 80 are appended.
PROC APPEND BASE=team1 DATA=team2 (rename=(players=names goals=scores) where=(scores>80));
RUN;
PROC APPEND can handle only two datasets at a time. If you have more than two datasets to append, you have to run PROC APPEND multiple times.
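When there are many datasets, the repeated calls can be wrapped in a macro. The following is a minimal sketch, not a built-in feature: the macro name append_all, its parameters base= and list=, and the target name all_data are hypothetical and chosen only for illustration.

```sas
/* Hypothetical macro: appends each dataset named in a
   space-separated list to the BASE dataset, one at a time */
%macro append_all(base=, list=);
    %local i ds;
    %do i = 1 %to %sysfunc(countw(&list, %str( )));
        %let ds = %scan(&list, &i, %str( ));
        proc append base=&base data=&ds;
        run;
    %end;
%mend append_all;

/* Usage: builds all_data from the two sample datasets */
%append_all(base=all_data, list=mydata1 mydata2);
```

Each iteration is still an ordinary PROC APPEND step, so the same rules about variable names and lengths apply to every dataset in the list.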
PROC APPEND is usually faster than concatenating with a SET statement because it does not read the observations in the BASE dataset; it only reads the DATA dataset and adds its observations to the end of BASE. With a SET statement, SAS reads every observation from both datasets and writes a new dataset, which can take a long time for large datasets.
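For comparison, here is a sketch of the DATA step approach using the sample datasets from this tutorial. The output name combined is hypothetical. Unlike PROC APPEND, this step reads both input datasets in full and leaves mydata1 unchanged.

```sas
/* DATA step concatenation: reads every row of mydata1 and mydata2
   and writes them to a new dataset */
data combined;   /* hypothetical output name */
    set mydata1 mydata2;
run;
```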