This tutorial explains how to transpose data by converting multiple variables from long to wide format in SAS using the 'Double Transpose' method.
The code below creates a sample data set named 'temp' that contains five variables: ID, TIME, X1, X2 and X3. This data set is used in all the examples in this tutorial.
data temp;
input ID time $ x1-x3;
cards;
1 Y1 85 85 86
1 Y2 80 79 70
1 Y3 78 77 87
2 Y1 79 79 79
2 Y2 83 83 85
;
run;
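We want the output to have one row per ID, with one column for each combination of variable and time, as in the table below (ID 2 has no Y3 record, so those values are missing) -
ID | x1_Y1 | x2_Y1 | x3_Y1 | x1_Y2 | x2_Y2 | x3_Y2 | x1_Y3 | x2_Y3 | x3_Y3 |
---|---|---|---|---|---|---|---|---|---|
1 | 85 | 85 | 86 | 80 | 79 | 70 | 78 | 77 | 87 |
2 | 79 | 79 | 79 | 83 | 83 | 85 | . | . | . |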
proc sort data=temp;
by ID time;
run;
proc transpose data=temp out=out1;
by ID time;
var x1-x3;
run;
proc transpose data=out1 delimiter=_ out=new2(drop=_name_);
by ID;
var col1;
id _name_ time;
run;
1. First, sort the data by 'ID' and 'time' with PROC SORT. PROC TRANSPOSE expects the input to be sorted by the variables listed in its BY statement.
2. The first PROC TRANSPOSE stacks x1-x3: the variable names are stored in one variable (_NAME_) and the corresponding values in another (COL1). We do not want to transpose ID and time, so they are listed in the BY statement. This step generates the following output -
ID | Time | _NAME_ | COL1 |
---|---|---|---|
1 | Y1 | x1 | 85 |
1 | Y1 | x2 | 85 |
1 | Y1 | x3 | 86 |
1 | Y2 | x1 | 80 |
1 | Y2 | x2 | 79 |
1 | Y2 | x3 | 70 |
1 | Y3 | x1 | 78 |
1 | Y3 | x2 | 77 |
1 | Y3 | x3 | 87 |
2 | Y1 | x1 | 79 |
2 | Y1 | x2 | 79 |
2 | Y1 | x3 | 79 |
2 | Y2 | x1 | 83 |
2 | Y2 | x2 | 83 |
2 | Y2 | x3 | 85 |
3. The second PROC TRANSPOSE reshapes the data from long to wide format and generates the desired output. The ID statement lists two variables (_NAME_ and time), and the DELIMITER= option places a separator between their values when the new variable names are built (for example, x1_Y1).
thanks for your information
plz provide more examples.....Tx..
Hi, when I run your last example code above, I got error messages:
proc transpose data=out1 delimiter=_ out=new2(drop=_name_);
---------
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, (, DATA, LABEL, LET, NAME, OUT,
PREFIX.
ERROR 76-322: Syntax error, statement will be ignored.
328 by ID;
329 var col1;
330 id _name_ time;
-----
22 200
ERROR 22-322: Expecting ;.
ERROR 200-322: The symbol is not recognized and will be ignored.
331 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.05 seconds
cpu time 0.04 seconds
Why is that? Thank you very much for your help!
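A likely cause, stated here as an assumption because the SAS release is not given: the DELIMITER= option was added to PROC TRANSPOSE in SAS 9.2, and the list of accepted options in the log above does not include it. A minimal workaround sketch for older releases is to build the combined name in a DATA step and use a single ID variable. The names out1b and newid are made up for illustration.

```sas
/* Build the combined column name (e.g. x1_Y1) by hand.        */
/* out1 is the output of the first PROC TRANSPOSE above.       */
data out1b;
  set out1;
  length newid $32;
  newid = catx('_', _name_, time);
run;

/* Second transpose with a single ID variable, no DELIMITER=.  */
proc transpose data=out1b out=new2(drop=_name_);
  by ID;
  var col1;
  id newid;
run;
```

This should give the same column names as the DELIMITER=_ version, since the separator is now inserted by CATX instead of by PROC TRANSPOSE.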
I have a transaction dataset with a column of expenses. I want to keep all the expenses for each account ID side by side, separated by commas. Below is a small example.
Acct_Id gender expenses
101 M 20000
102 F 20000
103 F 50000
101 M 10000
103 F 18000
102 F 21000
102 F 11000
103 F 49000
101 M 20000
I want all expenses in one column, side by side, using a comma as the delimiter, as shown below. Can anyone please help me do this in SAS? It would be a great help.
101 M 20000,10000,20000
102 F 20000,21000,11000
103 F 50000,18000,49000
Thanks and regards,
Swarupa
First sort the data by acct_id and gender, then
proc transpose data=example out=example1(drop=_name_) prefix=expenses;
var expenses;
by acct_id gender;
run;
Then use CATX to join the transposed columns
data ex2(drop=expenses1-expenses3);
set example1;
expenses=catx(',', of expenses1-expenses3);
run;
Respect+
Hello,
I have this dataset:
CONTRACT IND MONTH1 MONTH2 MONTH3
1 100 10 20 30
1 200 30 10 10
2 100 20 20 20
2 300 10 20 30
I need this dataset:
CONTRACT MONTH IND100 IND200 IND300
1 month1 10 30 0
1 month2 20 10 0
1 month3 30 10 0
2 month1 20 0 10
2 month2 20 0 20
2 month3 20 0 30
Can you help me?
Thank you
data mydata;
input CONTRACT IND MONTH1 MONTH2 MONTH3;
cards;
1 100 10 20 30
1 200 30 10 10
2 100 20 20 20
2 300 10 20 30
;
run;
proc transpose data=mydata prefix=IND out=out1(rename=(_name_=Month));
by CONTRACT;
var MONTH1-MONTH3;
id IND;
run;
If you want zeros instead of missing values in the PROC TRANSPOSE output, you can run the following program after PROC TRANSPOSE.
data out1;
set out1;
array replace _numeric_;
do over replace;
if replace=. then replace=0;
end;
run;
Alternatively, use
options missing=0;
This makes SAS display missing numeric values as 0 in printed output; the stored values remain missing. You can specify another character as per your requirement.
Thank you very much for the answer.
I have read on several sites that PROC TRANSPOSE is very slow for large data files, which is my case. Is there another way to do this in SAS code?
Regards,
natàlia
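One commonly used alternative is a single DATA step with arrays and RETAIN. The sketch below applies it to the 'temp' data set from the tutorial. It assumes that temp is sorted by ID and that time takes only the values Y1, Y2 and Y3. The data set name wide and the helper variables t and j are made up for illustration. Whether it is actually faster than PROC TRANSPOSE depends on the data, so it is worth timing both.

```sas
/* Long to wide in one pass, assuming time is one of Y1, Y2, Y3 */
data wide(drop=time x1-x3 t j);
  set temp;
  by ID;
  /* w[t, j] holds variable xj at time Yt (elements fill row by row) */
  array w[3,3] x1_Y1 x2_Y1 x3_Y1
               x1_Y2 x2_Y2 x3_Y2
               x1_Y3 x2_Y3 x3_Y3;
  array x[3] x1-x3;
  retain w;                                /* keep values across rows of one ID */
  if first.ID then call missing(of w[*]);  /* reset at the start of each ID     */
  t = input(substr(time, 2), 8.);          /* 'Y2' -> 2                         */
  do j = 1 to 3;
    w[t, j] = x[j];
  end;
  if last.ID then output;                  /* one row per ID                    */
run;
```

Unlike the double transpose, the column names are hard-coded here, so the array definitions have to be adjusted whenever the set of time values or variables changes.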
/* PROC SQL in SAS has no GROUP_CONCAT function, so we use a RETAIN statement here */
DATA XYZ(KEEP=ID GENDER NEW_COL);
SET A;
BY ID GENDER;
LENGTH NEW_COL $500;
RETAIN NEW_COL;
IF FIRST.ID THEN NEW_COL=CATS(SALARY);
ELSE NEW_COL=CATX(',',NEW_COL,SALARY);
IF LAST.ID THEN OUTPUT;
RUN;
Thank you so much. This post is so valuable. I tried several other ways, but they needed too much coding. I tried yours with a minor modification and it works perfectly. In the 1st proc transpose, I used all the variables that I need to reshape, and in the 2nd proc transpose, I used "by id time" rather than only "by id" as you showed above.
I really appreciate your sharing.
SAS Error - Variable name truncation; 32-character restriction
Hello, I am currently using SAS Studio through On Demand for Academics to complete a project.
The current technical issue arises when creating a SAS table whose column headers are given the names of distinct values of a single column from another SAS table.
This is the basic SAS code I am using to turn ALLTAGS into a table with the values as column headers:
__________________________________________________
/* turn distinct tags into column headers*/
proc sql;
create table alltags
as
select distinct
tag, count(distinct tag) as count
from phase2.q1_10q;
quit;
proc transpose
data = work.alltags
out=tagsAsFields
;
ID tag;
var count;
I receive the error:
ERROR: The ID value "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" occurs twice in the input data set.
Where "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" contains the first 32 characters of the first instance encountered where the first 32 characters of the string are not distinct from the first 32 characters of a previously generated column.
e.g.
Both
“APICShareBasedPaymentArrangementAcquisitionsIncreaseForCostRecognition”
And
“APICShareBasedPaymentArrangementIncreaseDecreaseForCostRecognition”
Become
“APICShareBasedPaymentArrangement”
So they cannot both be made column names.
____________________________________________
I understand that I can create a table with headers >32 characters outside of SAS, then when I import the table to SAS specify variable names (32 char or less) in the data step.
However, the data set I am transposing contains >8000 distinct entries, ~5000 of which are >32 characters. Manually renaming each field in the data step is not a feasible task (unless properly automated in some way).
The final data set is to be imported into SAS Enterprise Miner 15.1 Where analysis will take place. It is my current understanding that the 32-character limit will be imposed again when the data set is imported into Enterprise Miner. Can this restriction be bypassed?
Our research also suggested that variable labels can be used to store strings >32 character and remain permanently associated with the variable/column/field in the metadata. Unless there is a method of automating such label assignment, the size of the dataset prohibits this approach. Some support.sas threads suggest these labels may be deprecated in some way.
If there is a known solution to bypassing the 32-character limit in column names for SAS tables(in Studio and/or EM), please provide details that would assist me in completing this task. Thank you for your time.
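One way to automate the label approach, sketched here under assumptions: give every tag a short generated name and let PROC TRANSPOSE attach the full tag as the variable label through its IDLABEL statement. The names alltags_short and shortname are made up for illustration. Variable labels can hold up to 256 characters, which covers the example tags shown above. How Enterprise Miner treats these labels is not verified here.

```sas
/* Generate a short, unique, valid variable name for each tag */
data alltags_short;
  set alltags;
  length shortname $32;
  shortname = cats('tag', _n_);    /* tag1, tag2, ... */
run;

/* ID supplies the column name, IDLABEL supplies its label */
proc transpose data=alltags_short out=tagsAsFields(drop=_name_);
  id shortname;
  idlabel tag;
  var count;
run;
```

Running PROC CONTENTS on tagsAsFields afterwards lists each generated name next to its full tag, which can serve as a lookup table.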
This is great and very helpful. thanks Deepanshu :)