SAS : Count Distinct Values of Variables

Deepanshu Bhalla 1 Comment
This tutorial explains how to count distinct values of variables using PROC SQL and PROC FREQ. We will also check the performance of these two approaches.

PROC SQL : Count Distinct Values
proc sql;
create table t as
select count(distinct make) as n_make, count(distinct type) as n_type
,count(distinct origin) as n_origin
from sashelp.cars;
quit;

PROC FREQ : Count Distinct Values
ods output nlevels = charlevels;
proc freq data=sashelp.cars (keep = make type origin) nlevels;
tables make type origin / nopercent nocol nocum nofreq noprint;
run;
ods output close;

Performance Testing : PROC SQL vs. PROC FREQ

I have stacked raw data set multiple times (500 to 5000 times) to check the performance of these two procedures. In addition, i have added NOPRINT and ODS OUTPUT CLOSE to improve efficiency of the PROC FREQ code.
Count Distinct Values of a Variable
%macro stack (inputdata = sashelp.cars, iterations=, out=);

data temp;
length x $45.;
do i = 1 to &iterations.;
if i then x = "&inputdata.";
output;
end;
drop i;
run;

proc sql noprint;
select x into: n separated by ' '
from temp;
quit;

data &out.;
set &n;
run;

ods output nlevels = charlevels;
proc freq data=outdata (keep = make type origin) nlevels;
tables make type origin / nopercent nocol nocum nofreq noprint;
run;
ods output close;

proc sql;
create table t as
select count(distinct make) as n_make, count(distinct type) as n_type
,count(distinct origin) as n_origin
from outdata;
quit;

%mend;

%stack (inputdata = sashelp.cars, iterations= 500, out= outdata);
%stack (inputdata = sashelp.cars, iterations= 1000, out= outdata);
%stack (inputdata = sashelp.cars, iterations= 2000, out= outdata);
%stack (inputdata = sashelp.cars, iterations= 5000, out= outdata);
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

1 Response to "SAS : Count Distinct Values of Variables"
  1. Nice post. Your PROC FREQ solution could also use the built-in variables _ALL_, _CHARACTER_, _NUMERIC_ to avoid having to list (or even know the names of) individual variables. I don't know of a way to do the same via PROC SQL.

    ReplyDelete
Next → ← Prev