Suppose you want to delete empty rows from a dataset in SAS. It generally happens when we import data from external sources such as excel / csv files. It loads additional rows that are totally blank. Sometimes blank observations also affect the desired output so it's necessary to check missing cases and treat them.
Sample Dataset
The sample dataset looks like below. In the dataset, we have four variables - 1 character and 3 numeric. It would be used further in the example to demonstrate how to remove empty rows.
Create a SAS dataset
The following program creates a sample dataset in SAS.
Method I : Remove rows where all variables having missing / blank values
Notes :
Sample Dataset
The sample dataset looks like below. In the dataset, we have four variables - 1 character and 3 numeric. It would be used further in the example to demonstrate how to remove empty rows.
Name | Score1 | Score2 | Score3 |
---|---|---|---|
Sam | 77 | 68 | 66 |
Deepanshu | 50 | 89 | |
Shane | 55 | 78 | 89 |
Roger | 50 | 97 | 86 |
Priya | 88 | 68 | 93 |
Create a SAS dataset
The following program creates a sample dataset in SAS.
data outdata;
length Name $12.;
input Name $ Score1 Score2 Score3 ;
infile datalines missover;
datalines;
Sam 77 68 66
Deepanshu 50 . 89
Shane 55 78 89
Roger 50 97 86
Priya 88 68 93
;
run;
Method I : Remove rows where all variables having missing / blank values
options missing = ' ';
data readin;
set outdata;
if missing(cats(of _all_)) then delete;
run;
Notes :
1. The MISSING= system option is used to display the missing values as a single space rather than as the default period (.) options missing = ' ';
2. The CATS function concatenates the values. It also removes leading and trailing blanks. cats(of _all_) - Concatenate all the variables
3. missing(cats(of _all_)) - Identifies all the rows in which missing values exist in all the variables.
Output
In the output dataset, we have 5 rows. One row is deleted from the original dataset.
In the output dataset, we have 5 rows. One row is deleted from the original dataset.
SAS : Delete empty rows from a dataset |
Method 2
In this program, NMISS function checks the numeric of missing numeric values and CMISS checks the number of missing character values. In this code, we are telling SAS to delete records wherein both the character and numeric values are missing.
data readin;
set outdata;
if nmiss( of _numeric_ ) and cmiss(of _character_) then delete ;
run;
Method 3
In the code below, we are using COALESCE and COALESCEC functions to return non-empty rows. These functions return first non-missing value. If all values are missing, it returns missing. Later we are checking whether these functions return missing or not.
data readin;
set outdata;
if missing(coalescec(of _character_)) and missing(coalesce(of _numeric_)) then delete;
run;
Example : Delete rows where any variable has missing values
data readin;
set outdata;
if nmiss(of _numeric_) OR cmiss(of _character_) > 0 then delete;
run;
In this case, we are using OR operator to check if any of the variable has missing values. It returns 4 observations. Check out the output below -
Output |
Practice Questions - Try it yourself
Q1. Remove records wherein all numeric columns having empty rows.
Q2. Remove records wherein all character variables having empty rows.
Post your answer in the comment box below -
Because Name is your 'key', your code could be written more succinctly as:
ReplyDeletedata readin;
set outdata;
where not missing( Name);
run;
But, if you got records with a Name, but no Scores, you could write it as:
data readin;
set outdata;
where n( of Score1-Score3) > 0 ;
run;
Cheers, :)
But if i want to delete rows only when missing values exist in all the variables ;)
ReplyDeleteAnytime you change the Options, please do take care to change it back so that later code doesn't change its behavior.
ReplyDeleteoptions missing = ' ';
ReplyDeletedata readin;
set outdata;
if missing(cats(of _all_)) then delete;
run;
when we run this code i am getting an error that _all_ is uninitialise
It is working fine at my end. Which version of SAS you are using?
Deletemy question--
ReplyDeletedata abc;
input name$ sex$ age;
datalines;
tushar m 26
dhiru m 27
twinkle f 20
ankita f 22
goldi f 24
nikhil m 25
;
i want to arrange observations by sex alternatively like 1st m 2nd f 3rd m 4th f
5th m 6th f.
how i do it ????
plz tell me ans.
proc sort data = abc;
Deleteby sex age;
run;
data abc1;
set abc;
retain counter;
if first.sex then counter = 1;else counter = counter + 1;
by sex;
ranking = strip(input(counter,z9.) || sex);
drop counter;
run;
proc sort data = abc;
Deleteby sex age;
run;
data abc1;
set abc;
by sex;
retain counter;
if first.sex then counter = 1;else counter = counter + 1;
run;
proc sort data = abc1;
by descending sex ;
run;
proc sort data = abc1;
by counter;
run;
@ashwani
To delete blank rows in any dataset-
ReplyDeletedata a;
set outdata;
if cmiss(of _all_) = 4 then delete; // 4 is number of variables in data
run;
options missing = ' ';
ReplyDeletedata readin;
set outdata;
if missing(cats(of _numeric_)) then delete;
run;
this is to delete rows where all numeric columns have missing values.
please comment if the answer is right or wrong
What can I do if want to delete all columns from dataset having blank values
ReplyDeleteI want to know how to reduce the length of particular column based on the maximum length of the value in that particular column. e.g if i assigned column a as 200 length but i have maximum length of the value present in it as 20 .
ReplyDeleteThe below query will give the max length of the variable data.
Deleteproc sql; select max(length(forename)) from have; quit;
data readin;
ReplyDeleteset outdata;
if missing(coalescec(of _char_)) then delete;
run;
data readin1;
set outdata;
if missing(coalesce(of _numeric_)) then delete;
run;