What does Proc Tabulate do?
Proc Tabulate is mainly used to create a professional looking table.Terminologies
VAR : The Var statement tells SAS that these variables are analysis variables. They must be numeric. They are used to create summary statistics.
CLASS : The Class statement tells SAS that these variables are categorical variables.
TABLE : The Table statement tells SAS which variables are row expressions, which are column expressions.
Table Salary; - If there are no commas in the TABLE statement, SAS assumes you are only defining the column expression.
Table Gender, Salary; - If there is one comma, then it is a row expression before comma and column expression after comma.
Create a dataset
Data test;Simple Table
Input T1 T2 T3 T4 T5 Age BU;
Cards;
1 5 2 3 4 3 3
4 5 2 1 2 1 3
3 4 4 3 2 3 2
4 3 2 5 3 3 3
1 2 4 2 1 2 2
;
Run;
Proc tabulate data = test;The output is shown in the image below -
Var T1;
Table T1;
Run;
By default, it calculates SUM for variables.
How to add different Statistics Options
The asterisk * is used to add statistical keywords.
Suppose you want to calculate COUNT for T1 variable :
Proc tabulate data = test;
Var T1;
Table T1 * N;
Run;
Suppose you want to calculate both COUNT and SUM for T1 variable :
Proc tabulate data = test;
Var T1;
Table T1 * (N SUM);
Run;
Statistics Options
Cross Tab
In the TABLE statement, row expression is specified first then followed by comma and then column expression.
Proc Tabulate Data = test;
Class Age;
Var T1;
Table Age, T1 * (N COLPCTN);
Run;
Transposed Format
Proc Tabulate Data = test;
Class Age;
Var T1;
Table T1, Age * (N ROWPCTN);
Run;
Label Variables and Statistics
The "=" equal sign is an operator most commonly used for formatting.
Proc Tabulate Data = test;The above SAS program can be written as follows :
Class Age;
Var T1;
Table Age, T1 = "Group I" * (N="Count" COLPCTN="%");
Run;
Proc Tabulate Data = test;
Class Age;
Var T1;
Keylabel N="Count" COLPCTN="%";
Table Age, T1 = "Group I" * (N COLPCTN);
Run;
The Keylabel statement will change the label of keywords. Both the above programs produce same output.
Shift the row header up
In order to hide variable or labels, you leave the label specification blank (i.e. =‘ ‘ ).
Proc Tabulate Data = test;
Class Age;
Var T1;
Table Age=" ", T1 = "Group I" * (N="Count" COLPCTN="%") / box="Age";
Run;
The ALL keyword is used to generate a sum total for rows or columns.
Proc Tabulate Data = test;
Class Age;
Var T1;
Table Age ALL = "Grand Total" , T1 = "Group I" * (N="Count" COLPCTN="%");
Run;
Two level data cuts
Proc tabulate data = test;
Class Age BU;
Var T1;
Table T1="Group I",(Age * BU="Business Unit") * (N="Count" ROWPCTN="%");
Run;
In order to hide variable or statistic labels, you leave the label specification blank (i.e. =‘ ‘ ).
Proc tabulate data = test;
Class Age BU;
Var T1;
Table T1="Group I",(Age=" " * BU="Business Unit") * (N="Count" ROWPCTN="%");
Run;
Cell Formats
You can use the asterisk * to associate the format modifier “F=” to the summary statistic.
Proc tabulate data = test;
Class Age BU;
Var T1;
Keylabel N="Count" ROWPCTN="%";
Table T1="Group I",(Age * BU="Business Unit") * (N ROWPCTN * F=6.0);
Run;
Format Statement with Proc Tabulate
Proc format;
value agefmt
1 = 'Under 18'
2 = '18 - 25'
3 = 'Over 25';
Run;
Proc format;
value bufmt
1 = 'Analytics'
2 = 'Technology'
3 = 'Others';
Run;
Proc tabulate data = test;
Format Age agefmt. BU bufmt.;
Class Age BU;
Var T1;
Keylabel N="Count" ROWPCTN="%";
Table T1="Group I",(Age * BU="Business Unit") * (N ROWPCTN * F=6.0);
Run;
Three Dimensional Table
Data test;
length BU $18.;
Input Location$ BU$ Gender$ Income;
Cards;
Delhi Analytics Male 5000
Mumbai Tech Female 45000
Delhi Analytics Male 37000
Chennai Tech Male 33000
Delhi Tech Male 5000
Chennai Analytics Male 15000
Mumbai Analytics Female 440000
Delhi Analytics Female 5000
Mumbai Tech Male 45000
Delhi Analytics Female 37000
Chennai Tech Female 33000
Delhi Tech Female 5000
Chennai Analytics Male 15000
;
Run;
Proc tabulate data = test F=6.0;
Class Location BU Gender;
Var Income;
Table Location=" "*BU=" ", Gender * Income=" "*(N="Count" ROWPCTN="%") / Box="Location BU";
Run;
Proc tabulate data = test out=test1;
Format Age agefmt. BU bufmt.;
Class Age BU;
Var T1;
Keylabel N="Count" ROWPCTN="%";
Table T1="Group I",(Age * BU="Business Unit") * (N ROWPCTN * F=6.0);
Run;
You are awesome buddy... very useful
ReplyDeleteGreat!!!! very easy to understand
ReplyDeleteKeep up the good work, bro.
ReplyDeletesir ,i have a doubt Iam attending the interview on next month ,what type of questions will ask for 1 year experience data analytists, i was learning sas,r,python pllllllllllllz tell me some guidence for us
ReplyDeletehave u got the job ??
Deleteyes
DeleteThanks.very useful
ReplyDeleteI find listendata to be an extraordinary platform when it comes to- understanding any SAS/SQL related stuffs. The examples are presented beautifully. The progression of the complexity is very well maintained, which helps a user to understand the basic logic and relevant implementations first and progressively the more complex ones. Thank you for making life simpler :)
ReplyDeleteeasy to understand, crisp and to the point, thanks a lot
ReplyDeletethank you! great explanation
ReplyDeleteThank you for this detailed and simple explanation of the proc tabulate procedure
ReplyDelete