In SAS, Numeric and Character missing values are represented differently.
Numeric Missing Values
SAS stores 28 missing values in a numeric variable. They are as follows :
The following code checks for dot missing value only. It does not check for other 27 special numeric missing values (._ , .A through .Z)
Character missing values are represented by a single blank enclosed in quotes ' '.
Suppose we have a data set containing three variables - X, Y and Z. They all have some missing values. We wish to compute sum of all the variables.
N : The N() function returns the number of non-missing values in a list of numeric variables.
CALL MISSING
SAS provides the statement CALL MISSING() to explicitly initialise or set a variable value to be missing.
How missing values are handled in SAS procedures
1. PROC FREQ
By default, missing values are excluded and percentages are based on the number of non-missing values. If you use the "/ MISSING" option on the tables statement, the percentages are based on the total number of observations (non-missing and missing) and the percentage of missing values are reported in the table.
Numeric Missing Values
SAS stores 28 missing values in a numeric variable. They are as follows :
- dot-underscore . _
- dot .
- .A through .Z ( Not case sensitive)
Sorting Order : dot- underscore is the lowest valued missing value. After the dot-underscore, comes the dot, and then the dot-A. The dot-Z is the highest valued missing value.
Run the following code and see how SAS treats them missing value
data temp;
input x;
cards;
1
2
3
.
.A
.X
.Z
._
4
;
run;
proc freq;
table x;
run;
Check for missing numeric values
If x =. then PUT "x is missing";
The following code checks for all 28 numeric missing values (. , ._ , .A through .Z)
If x <=.z then PUT "x is missing";
The MISSING function accepts either a character or numeric variable as the argument and returns the value 1 if the argument contains a missing value or zero otherwise.
If missing(x) then PUT "x is missing";
Character missing values
If y = ' ' then put "y is missing";
If missing(y) then put "y is missing";
Working with Missing Values
Suppose we have a data set containing three variables - X, Y and Z. They all have some missing values. We wish to compute sum of all the variables.
data mydata2;SUM function returns the sum of non-missing arguments whereas “+” operator returns a missing value if any of the arguments are missing.
set mydata;
a=sum(x,y,z);
p=x+y+z;
run;
Functions that handle MISSING data
NMISS : The NMISS() function will return the number of missing values in the specified list of numeric variables. The NMISS() function will convert any character values to numeric before assessing if the argument value is missing.
IF NMISS(x,y,z) = 0 then PUT " All variables have non-missing values";CMISS : The CMISS() function introduced in SAS 9.2 is similar to the NMISS() function that it counts the number arguments that are missing, but for both character and numeric variables without requiring character values to be converted to numeric.
N : The N() function returns the number of non-missing values in a list of numeric variables.
CALL MISSING
SAS provides the statement CALL MISSING() to explicitly initialise or set a variable value to be missing.
data _null_;
call missing( num_1, num_2, x ); num_3 = x;
put num_1 = / num_2 = / num_3 = ;
run;
Delete empty rows
options missing = ' ';
data readin;
set outdata;
if missing(cats(of _all_)) then delete;
run;
How missing values are handled in SAS procedures
1. PROC FREQ
By default, missing values are excluded and percentages are based on the number of non-missing values. If you use the "/ MISSING" option on the tables statement, the percentages are based on the total number of observations (non-missing and missing) and the percentage of missing values are reported in the table.
PROC FREQ DATA= TEST;
TABLES X / MISSING;
RUN;
2. PROC MEANS
It produces statistics on non-missing data only. The NMISS option is used to calculate number of missing values.
By default, correlations are computed based on the number of pairs with non-missing data (pairwise deletion of missing data). The nomiss option can be used on the proc corr statement to request that correlations be computed only for observations that have non-missing data for all variables on the var statement (listwise deletion of missing data).
4. PROC REG
If any of the variables on the model or var statement are missing, they are excluded from the analysis (i.e., listwise deletion of missing data)
5. PROC LOGISTIC
If any of the variables on the model or var statement are missing, they are excluded from the analysis (i.e., listwise deletion of missing data)
6. PROC FACTOR
Missing values are deleted listwise, i.e., observations with missing values on any of the variables in the analysis are omitted from the analysis.
It produces statistics on non-missing data only. The NMISS option is used to calculate number of missing values.
Proc Means Data = test N NMISS;To see number of observations having a missing value for the classification variable, type MISSING option in PROC MEANS.
Var q1 - q5 ;
Run;
Proc Means data = test N NMISS MISSING;3. PROC CORR
Class Age ;
Var q1 - q5;
Run;
By default, correlations are computed based on the number of pairs with non-missing data (pairwise deletion of missing data). The nomiss option can be used on the proc corr statement to request that correlations be computed only for observations that have non-missing data for all variables on the var statement (listwise deletion of missing data).
4. PROC REG
If any of the variables on the model or var statement are missing, they are excluded from the analysis (i.e., listwise deletion of missing data)
5. PROC LOGISTIC
If any of the variables on the model or var statement are missing, they are excluded from the analysis (i.e., listwise deletion of missing data)
6. PROC FACTOR
Missing values are deleted listwise, i.e., observations with missing values on any of the variables in the analysis are omitted from the analysis.
Thanks a lot! This blog has been very helpful.
ReplyDeleteThanks, the information was really helpful.
ReplyDeleteI tried this code with the temp data but its not removing any rows.
ReplyDeleteoptions missing = ' ';
data readin;
set outdata;
if missing(cats(of _all_)) then delete;
run;
Output is same as the data set. Is it working for any one else ?
i tried the same code
Deletedata work.missvalue;
input x y z;
cards;
1 2 4
7 8 .
7 - 7
6 9 .
9 8 _
9 9 7
;
run;
options missing='';
data readin;
set work.missvalue;
IF MISSING(CATS(OF_ALL_))THEN DELETE;
RUN;
but in my above code out is blank : Rows 4 and Column 0
please guide someone
Please edit this chapter based on the fact that some of the datasets are dependent on other chapters and we can't run the code to see what's going on (unless we still have prior datasets)
ReplyDeleteBrO,Why dont you help in doing the same.Rather then pointing someone,do some good for others as well.
DeleteMr.Deepanshu is giving his best for all beginners.
Hi....,
ReplyDeleteI would like to get the variables has to be printed which have couple of missing values.
For ex: if the name variable has missing record the name variable should be printed.
if id variables has a missing record the id variables has to be printed.
thanks for the excellent tutorials , please keep datasets and examples in same lesson. it will be helpful for indevedual topic readers.
ReplyDeletedata nidhi;
ReplyDeleteinput x y z;
cards;
22 5 56
. 25 96
56 . 56
66 52 41
96 52 .
85 . 52
. . 44
;
run;
I want to assign 0 to the all missing values in one statment.
can anyone help?
Try this out ....
ReplyDeletedata nidhi1;set nidhi;
array new _numeric_;
do over new;
if new=. then new=0;
end;
run;
I want to treat 43% missing with value=1,27% missing with value=2, 16% missing with value=3, 14% missing with value=4.
ReplyDeleteHow can I do this?
Can anyone please help?
Thanks for sharing great post
ReplyDelete