This article shows how to calculate the number of missing (blank) and non-missing (non-blank) values in each observation (row) in SAS.
Counting missing values is a common task for SAS programmers, but it usually means counting rows or non-missing values down a column rather than across a row. Remember this SAS rule: use PROCs to aggregate a column, but use functions to aggregate a row.
Let's create some sample data for demonstration.
The program below creates a sample dataset named TEMP, which is stored in the WORK library.
data temp;
input x y z a b$;
cards;
1 23 24 50 AA
1 . 24 50 AC
1 13 . 50 AB
1 23 . 50 .
;
run;
The sample data looks like this:
Sample Data
The SAS function N counts the number of non-missing numeric values in each row. To count the number of missing numeric values, use the NMISS function.
data outdata;
set temp;
nvalues = n(of x--a);
nmiss = nmiss(of x--a);
run;

proc print;
run;
Output
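For comparison, the column-wise side of the rule (use PROCs to aggregate a column) can be sketched with PROC MEANS. This is a minimal example that assumes the TEMP dataset created earlier; it reports the non-missing and missing counts for each numeric variable rather than for each row.

```sas
/* Column-wise counts: one result line per numeric variable. */
/* N = number of non-missing values, NMISS = number of missing values. */
proc means data=temp n nmiss;
  var x y z a;
run;
```

PROC MEANS only handles numeric variables, so the character variable b is left out of the VAR statement here.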
Suppose you need to count both missing and non-missing values across character and numeric variables together.
Functions such as N and CMISS do not return the total number of variables, so we use PROC CONTENTS to get it. Its OUT= option writes a dataset with one observation per variable, so the number of observations in that dataset equals the number of variables in TEMP. We then store this count in a macro variable named totvar.
The function CMISS counts the number of missing values in each row. It considers missing values of both numeric and character variables.
proc contents data=temp out=cols noprint;
run;

data _null_;
set cols nobs=total;
call symputx('totvar', total);
run;

data outdata;
set temp;
totalvar = &totvar;
totmiss = cmiss(of x--b);
totnonmiss = totalvar - cmiss(of x--b);
run;

proc print;
run;
SAS : Output
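As an alternative sketch, the same three counts can be computed in a single DATA step by loading the variables into arrays and using DIM to count them. This is not the method used above, only another option; the array names nums and chars and the output dataset name outdata2 are made up for illustration, and the example assumes TEMP has at least one numeric and one character variable.

```sas
data outdata2;
  set temp;
  /* Declare the arrays BEFORE creating any new variables, so that   */
  /* totalvar, totmiss and totnonmiss are not counted as part of them. */
  array nums {*} _numeric_;     /* all numeric variables read from TEMP   */
  array chars {*} _character_;  /* all character variables read from TEMP */
  totalvar = dim(nums) + dim(chars);
  totmiss = cmiss(of nums[*], of chars[*]);
  totnonmiss = totalvar - totmiss;
run;

proc print data=outdata2;
run;
```

The trade-off is that the order of statements matters: if a new variable is created before the ARRAY statements, it is picked up by _numeric_ or _character_ and the counts change.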
In the above program there are 4 observations, and NOBS is used to calculate the number of observations, so how can it give the number of variables? I mean, when we use "set cols nobs=total;" the value of total should be 4 and the value of &totvar should also be 4, but the output shows 5. How? I am confused.
Hey, it is because the OUT= option in PROC CONTENTS creates a new dataset with one observation per variable, so its number of observations equals the number of variables.
Hi, if I want to list the names of the variables that are missing in each row, how can I do that?
How can we find the number of null values in all the columns and rows? Also, how can we find the percentage of null values in each column, rounded to two decimal places?
I want to export data to two different XLS files. How am I supposed to do that? The data has more than 8 lakh (800,000) rows, which is why I'm not getting the full output.
Code I used:
Proc SQL;
Create table test as
Select * from work
Where month = sep21 ;
Quit;
Proc export data = test
Outfile = "path"
DBMS = xlsx replace;
Run;