This tutorial explains how to count the number of missing values for each variable in SAS, along with examples.
The following SAS program creates a dataset which will be used to explain examples in this tutorial.
DATA mydata; INPUT ID $ Name $ Gender $ Age Height Weight; DATALINES; 001 John M 25 180 . 002 . F 30 170 70 003 Mary F . 160 60 004 James M 35 . 80 005 Emma . 40 155 . ; RUN;
Count Missing Values of Numeric Columns
In PROC MEANS, the NMISS
counts missing values and N
option counts non-missing values for each numeric variable in a SAS dataset. In the example below, we have a dataset named "mydata".
proc means data=mydata NMISS N; run;
To calculate the number of missing values for specific numeric variables, you can define them in the VAR statement.
proc means data=mydata NMISS N; var Height Weight; run;
If you have a grouping (categorical) variable which contains missing values and you want to include them in the summary table, you can use the MISSING option within the PROC MEANS procedure.
Proc Means data = mydata N NMISS MISSING; Class Gender; Var Height Weight; Run;
As you can see in the table above, the variable "gender" has one missing value.
Count Missing Values of Character Columns
In PROC SQL, the NMISS function returns the number of missing values for both character and numeric variables.
proc sql; select nmiss(gender) as n_missing_gender, nmiss(name) as n_missing_name, nmiss(weight) as n_missing_weight from mydata; quit;
Count Missing Values of All Variables in a Dataset
We can specify format to label missing and non-missing values in a table and then we can use it in the PROC FREQ procedure to print the number of missing and non-missing values for each variable in a dataset.
proc format; value $missfmt ' '='Missing' other='Non-Missing'; value missfmt . ='Missing' other='Non-Missing'; run; proc freq data=mydata; format _CHAR_ $missfmt.; tables _CHAR_ / missing missprint nocum nopercent; format _NUMERIC_ missfmt.; tables _NUMERIC_ / missing missprint nocum nopercent; run;
The keyword _CHAR_ denotes all the character variables and _NUMERIC_ denotes all the numeric variables.
How to Save Count of Missing Values in a Dataset
You can use the ods output
statement with stackods
option within PROC MEANS to store the count of missing values into a SAS dataset. The code below saves the result into a new dataset named "newData".
proc means data=mydata NMISS N stackods; ods output summary=newData; run;
SAS Macro : Count of Missing Values in a Dataset
The following macro returns the number of missing values for each variable in a SAS dataset and stores the output in a new dataset named "missingData".
%macro missing_vars(dataset); /* Get the variable names */ proc contents data=&dataset out=contents(keep=name) noprint; run; /* Create macro variables for each variable name */ data _null_; set contents end=last; call symputx(cats('var', _n_), name); if last then call symputx('nvars', _n_); run; /* Generate the PROC SQL code */ proc sql noprint; select catx(' ', 'nmiss(', name, ') as', cat('nmiss_', name)) into :counts separated by ',' from contents; quit; proc sql; create table missingData as select &counts from &dataset; quit; %mend missing_vars; %missing_vars(mydata);
This code first gets the variable names from the dataset using PROC CONTENTS then creates a macro variable for each variable name. It then generates the PROC SQL code to calculate the count for each variable and finally runs the PROC SQL code.
Share Share Tweet