Proc Tabulate is mainly used to create a professional looking table.
The basic syntax of PROC TABULATE is as follows:
proc tabulate data = mydata; class categorical_variable; var analysis_variable; table mycolumn; run;
VAR : The Var statement tells SAS that these variables are analysis variables. They must be numeric. They are used to create summary statistics.
CLASS : The Class statement tells SAS that these variables are categorical variables.
TABLE : The Table statement tells SAS which variables are row expressions, which are column expressions.
Examples : Table Salary; - If there are no commas in the TABLE statement, SAS assumes you are only defining the column expression. Table Gender, Salary; - If there is one comma, then it is a row expression before comma and column expression after comma.
The following SAS program creates a dataset which will be used to explain examples in this tutorial.
Data test; Input T1 T2 T3 T4 T5 Age BU; Cards; 1 5 2 3 4 3 3 4 5 2 1 2 1 3 3 4 4 3 2 3 2 4 3 2 5 3 3 3 1 2 4 2 1 2 2 ; Run;
The following SAS code uses the PROC TABULATE procedure to create a table using the variable "T1" from the dataset "test".
Proc tabulate data = test; Var T1; Table T1; Run;
The output is shown in the image below -
By default, it calculates SUM for variables.
The asterisk * is used to add statistical keywords.
Suppose you want to calculate COUNT for T1 variable :
Proc tabulate data = test; Var T1; Table T1 * N; Run;
Suppose you want to calculate both COUNT and SUM for T1 variable :
Proc tabulate data = test; Var T1; Table T1 * (N SUM); Run;
In the TABLE statement, row expression is specified first then followed by comma and then column expression.
Proc Tabulate Data = test; Class Age; Var T1; Table Age, T1 * (N COLPCTN); Run;
In this SAS code, we generated a table that shows the frequency count and column percentage of variable T1 across different categories of Age in the dataset test. It helps to understand the distribution of variable T1 within each age group.
Proc Tabulate Data = test; Class Age; Var T1; Table T1, Age * (N ROWPCTN); Run;
The "=" equal sign is an operator most commonly used for formatting.
Proc Tabulate Data = test; Class Age; Var T1; Table Age, T1 = "Group I" * (N = "Count" COLPCTN ="%"); Run;
The above SAS program can be written as follows :
Proc Tabulate Data = test; Class Age; Var T1; Keylabel N="Count" COLPCTN="%"; Table Age, T1 = "Group I" * (N COLPCTN); Run;
The Keylabel statement will change the label of keywords. Both the above programs produce same output.
In order to hide variable or labels, you leave the label specification blank (i.e. =‘ ‘ ).
Proc Tabulate Data = test; Class Age; Var T1; Table Age=" ", T1= "Group I"* (N="Count" COLPCTN="%") / box="Age"; Run;
The ALL keyword is used to generate a sum total for rows or columns.
Proc Tabulate Data = test; Class Age; Var T1; Table Age ALL= "Grand Total", T1= "Group I"* (N="Count" COLPCTN="%"); Run;
Proc tabulate data = test; Class Age BU; Var T1; Table T1="Group I",(Age * BU="Business Unit") * (N="Count" ROWPCTN="%"); Run;
In order to hide variable or statistic labels, you leave the label specification blank (i.e. =‘ ‘ ).
Proc tabulate data = test; Class Age BU; Var T1; Table T1="Group I",(Age=" " * BU="Business Unit") * (N="Count" ROWPCTN="%"); Run;
You can use the asterisk * to associate the format modifier F= to the summary statistic.
Proc tabulate data = test; Class Age BU; Var T1; Keylabel N="Count" ROWPCTN="%"; Table T1="Group I",(Age * BU="Business Unit") * (N ROWPCTN * F=6.0); Run;
The FORMAT statement within the PROC TABULATE procedure is used to apply custom formats to variables in your tables. It shows data in a more readable way.
Proc format; value agefmt 1 = 'Under 18' 2 = '18 - 25' 3 = 'Over 25'; Run; Proc format; value bufmt 1 = 'Analytics' 2 = 'Technology' 3 = 'Others'; Run; Proc tabulate data = test; Format Age agefmt. BU bufmt.; Class Age BU; Var T1; Keylabel N="Count" ROWPCTN="%"; Table T1="Group I",(Age * BU="Business Unit") * (N ROWPCTN * F=6.0); Run;
Data test; length BU $18.; Input Location$ BU$ Gender$ Income; Cards; Delhi Analytics Male 5000 Mumbai Tech Female 45000 Delhi Analytics Male 37000 Chennai Tech Male 33000 Delhi Tech Male 5000 Chennai Analytics Male 15000 Mumbai Analytics Female 440000 Delhi Analytics Female 5000 Mumbai Tech Male 45000 Delhi Analytics Female 37000 Chennai Tech Female 33000 Delhi Tech Female 5000 Chennai Analytics Male 15000 ; Run;
Proc tabulate data = test F=6.0; Class Location BU Gender; Var Income; Table Location=" "*BU=" ", Gender * Income=" "*(N="Count" ROWPCTN="%") / Box="Location BU"; Run;
The OUT= option within the PROC TABULATE procedure is used to save output to a new dataset.
Proc tabulate data = test out=test1; Format Age agefmt. BU bufmt.; Class Age BU; Var T1; Keylabel N="Count" ROWPCTN="%"; Table T1="Group I",(Age * BU="Business Unit") * (N ROWPCTN * F=6.0); Run;
You are awesome buddy... very useful
ReplyDeleteGreat!!!! very easy to understand
ReplyDeleteKeep up the good work, bro.
ReplyDeletesir ,i have a doubt Iam attending the interview on next month ,what type of questions will ask for 1 year experience data analytists, i was learning sas,r,python pllllllllllllz tell me some guidence for us
ReplyDeletehave u got the job ??
Deleteyes
DeleteThanks.very useful
ReplyDeleteI find listendata to be an extraordinary platform when it comes to- understanding any SAS/SQL related stuffs. The examples are presented beautifully. The progression of the complexity is very well maintained, which helps a user to understand the basic logic and relevant implementations first and progressively the more complex ones. Thank you for making life simpler :)
ReplyDeleteeasy to understand, crisp and to the point, thanks a lot
ReplyDeletethank you! great explanation
ReplyDeleteThank you for this detailed and simple explanation of the proc tabulate procedure
ReplyDelete