7 Ways to Remove Rows with NA in R

This tutorial demonstrates several ways to remove rows with missing (NA) values in R, with examples. In R, NA stands for "Not Available" and represents a missing value.

Below is a list of 7 different methods to remove rows with NA values in R.

Method 1: Removing Rows with NAs using na.omit() Function
newdf <- na.omit(df)
Method 2: Removing Rows with NAs using complete.cases() Function
newdf <- df[complete.cases(df), ]
Method 3: Removing Rows with NAs using rowSums() Function
newdf <- df[rowSums(is.na(df)) == 0, ]
Method 4: Removing Rows with NAs using drop_na() Function
library(tidyr)
newdf <- df %>% drop_na()
Method 5: Removing Rows with Only NAs in a Row using subset() & rowSums() Functions
newdf <- subset(df, rowSums(is.na(df)) != ncol(df))
Method 6: Removing Rows with Only NAs in a Row using filter() & rowSums() Functions
library(dplyr)
newdf <- filter(df, rowSums(is.na(df)) != ncol(df))
Method 7: Removing Rows with Only NAs in a Row using rowSums() & ncol() Functions
newdf <- df[rowSums(is.na(df)) != ncol(df), ]
Sample Data

Here we create a data frame named df for demonstration purposes. It has 6 observations (rows) and 4 columns: name, sex, score and address.

df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

Data are shown in the table below.

name   sex     score  address
deeps  Male    50     London
sandy  Male    100    Bangalore
david  NA      45     NA
NA     NA      100    NA
preet  Female  90     NA
NA     NA      NA     NA
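
Before removing anything, it can help to confirm the shape of the data and see where the missing values are. The following is a minimal sketch using only base R functions; it rebuilds the sample df so that it runs on its own.

```r
# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

# Number of rows and columns
dim(df)

# Number of missing values in each column
na_per_column <- colSums(is.na(df))
na_per_column

# Total number of missing values in the data frame
total_na <- sum(is.na(df))
total_na
```

For the sample data, this reports 2, 3, 1 and 4 missing values in name, sex, score and address respectively.
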
Example 1: Removing Rows with NAs using na.omit() Function

Here we are using na.omit() function to remove rows that contain any NA values. This function checks each row and removes any row that contains one or more NA values. It returns a subset of the original data frame without the rows that have missing values.

Syntax
newdf <- na.omit(df)
Output
   name  sex score   address
1 deeps Male    50    London
2 sandy Male   100 Bangalore

We have created a new data frame called newdf by removing rows that contain any NA (missing) values from the original data frame df.
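
na.omit() also records which rows it dropped, in an attribute named na.action on the result. The sketch below rebuilds the sample df so it can run on its own and shows one way to inspect that attribute.

```r
# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

newdf <- na.omit(df)

# Positions of the rows that na.omit() removed, stored on the result
removed <- attr(newdf, "na.action")
removed_rows <- as.integer(removed)
removed_rows

# Number of rows that remain
nrow(newdf)
```

This can be useful when you need to report or review the rows that were discarded.
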

Example 2: Removing Rows with NAs using complete.cases() Function

In this example, we will see how to use complete.cases() function to remove rows that contain any NA values.

In R, the complete.cases() function returns a logical vector with TRUE for each row of a data frame that is complete (has no missing values). df[complete.cases(df), ] keeps only those complete rows of the original data frame df, effectively removing rows with any missing values.

Syntax
newdf <- df[complete.cases(df), ]
Output
   name  sex score   address
1 deeps Male    50    London
2 sandy Male   100 Bangalore
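
complete.cases() can also be given only some of the columns, which keeps rows that are complete in those columns and ignores NAs elsewhere. A minimal sketch, assuming we only care about missing values in name and score (a choice made here purely for illustration):

```r
# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

# TRUE where both name and score are present; NAs in sex/address are ignored
keep <- complete.cases(df[, c("name", "score")])

newdf <- df[keep, ]
newdf
```
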
Example 3: Removing Rows with NAs using rowSums() Function

By using the combination of the rowSums() and is.na() functions, we can remove rows that have at least one NA value.

Syntax
newdf <- df[rowSums(is.na(df)) == 0, ]
Output
   name  sex score   address
1 deeps Male    50    London
2 sandy Male   100 Bangalore
Explanation

Let's understand how the code works:

  1. is.na(df) returns a logical matrix with the same dimensions as df, where TRUE marks an element that is NA and FALSE marks one that is not.
  2. rowSums(is.na(df)) calculates the sum of TRUE values in each row. This gives us a numeric vector with the number of missing values (NAs) in each row of df.
  3. rowSums(is.na(df)) == 0 compares each element of the numeric vector with zero. This results in a logical vector where TRUE indicates that the corresponding row has no missing values (NAs).
  4. df[rowSums(is.na(df)) == 0, ] selects only the rows without any missing values.
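
The steps above can be run one at a time to see the intermediate values. A small sketch in base R, rebuilding the sample df so it runs on its own (the variable names na_matrix, na_count and no_na are introduced here for illustration):

```r
# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

# Step 1: logical matrix, TRUE where a value is NA
na_matrix <- is.na(df)

# Step 2: number of NAs in each row
na_count <- rowSums(na_matrix)
na_count

# Step 3: TRUE for rows that have no NAs at all
no_na <- na_count == 0

# Step 4: keep only those rows
newdf <- df[no_na, ]
newdf
```

For the sample data, the per-row NA counts are 0, 0, 2, 3, 1 and 4, so only the first two rows are kept.
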
Example 4: Removing Rows with NAs using drop_na() Function

To remove rows with any missing values (NAs) from a data frame using the tidyr package, you can use the drop_na() function.

Syntax

If the tidyr package is not already installed, you can install it with this command - install.packages("tidyr")

library(tidyr)
newdf <- df %>% drop_na()
Output
   name  sex score   address
1 deeps Male    50    London
2 sandy Male   100 Bangalore

In the code above, the %>% operator is used to pipe the data frame df into the drop_na() function. drop_na() returns the data frame without the rows that contain NAs, and the result is assigned to the new data frame newdf.
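
drop_na() also accepts column names, in which case only NAs in those columns cause a row to be dropped. A minimal sketch, assuming tidyr is installed; the choice of the score column is for illustration only:

```r
library(tidyr)

# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

# Drop rows only when score is NA; NAs in the other columns are kept
newdf <- drop_na(df, score)
newdf
```
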

Example 5: Removing Rows with Only NAs in a Row using subset() & rowSums() Functions

In this example, we will see how to remove rows from a data frame where all values in a row are missing (NA).

This example differs from the previous ones: it deletes only rows in which every value is missing, rather than rows with at least one missing value.

Syntax
newdf <- subset(df, rowSums(is.na(df)) != ncol(df))
Output
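   name    sex score   address
1 deeps   Male    50    London
2 sandy   Male   100 Bangalore
3 david   <NA>    45      <NA>
4  <NA>   <NA>   100      <NA>
5 preet Female    90      <NA>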

In the code above, the subset() function is used to filter the data frame df based on a specific condition. The condition rowSums(is.na(df)) != ncol(df) checks, for each row of the data frame, whether the number of missing values differs from the total number of columns. Rows that meet this condition, i.e., rows with at least one non-missing value, are kept in the new data frame newdf, while rows in which every value is missing are removed.
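
To see which rows the condition flags, the logical vectors can be inspected directly. A small base R sketch that rebuilds the sample df and contrasts "all values missing" with "any value missing" (the variable names are introduced here for illustration):

```r
# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

na_count <- rowSums(is.na(df))

# TRUE only when every column in the row is NA
all_missing <- na_count == ncol(df)
which(all_missing)

# TRUE when at least one column in the row is NA
any_missing <- na_count > 0
which(any_missing)

# Only the all-NA rows are removed
newdf <- subset(df, rowSums(is.na(df)) != ncol(df))
nrow(newdf)
```

For the sample data, only row 6 is entirely missing, whereas rows 3 to 6 each contain at least one NA.
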

Example 6: Removing Rows with Only NAs in a Row using filter() & rowSums() Functions

In this example, we will see how to remove rows from a data frame where all values in a row are missing (NA) using filter(), rowSums() & is.na() functions.

Syntax

Make sure the dplyr package is installed. If not, you can install it with this command - install.packages("dplyr")

library(dplyr)
newdf <- filter(df, rowSums(is.na(df)) != ncol(df))
Output
   name    sex score   address
1 deeps   Male    50    London
2 sandy   Male   100 Bangalore
3 david   <NA>    45      <NA>
4  <NA>   <NA>   100      <NA>
5 preet Female    90      <NA>

This method is very similar to the previous method, with the only difference being that we use the filter() function from the dplyr package instead of the subset() function from Base R. The logic in this method remains the same as the previous method.
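
The same filter can also be written without rowSums(), using dplyr's if_any() helper. This is an alternative formulation rather than the method shown above, and it assumes dplyr 1.0.4 or later, where if_any() is available:

```r
library(dplyr)

# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

# Keep a row if at least one of its columns is not NA
newdf <- filter(df, if_any(everything(), ~ !is.na(.x)))
newdf
```
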

Example 7: Removing Rows with Only NAs in a Row using rowSums() & ncol() Functions

To remove rows with only NAs (missing values) in a data frame using the rowSums(), is.na() and ncol() functions, you can use the following code:

Syntax
newdf <- df[rowSums(is.na(df)) != ncol(df), ]
Output
   name    sex score   address
1 deeps   Male    50    London
2 sandy   Male   100 Bangalore
3 david   <NA>    45      <NA>
4  <NA>   <NA>   100      <NA>
5 preet Female    90      <NA>
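
One caveat with bracket indexing: when a data frame has a single column, df[rows, ] returns a plain vector instead of a data frame. A minimal sketch of a helper that adds drop = FALSE to avoid this (the name drop_all_na_rows is hypothetical, introduced here for illustration):

```r
# Hypothetical helper: remove rows in which every value is NA,
# always returning a data frame
drop_all_na_rows <- function(data) {
  keep <- rowSums(is.na(data)) != ncol(data)
  data[keep, , drop = FALSE]
}

# Recreate the sample data frame from this tutorial
df <- data.frame(name = c('deeps','sandy', 'david', NA,'preet',NA),
                 sex   = c('Male', 'Male', NA, NA, 'Female',NA),
                 score = c(50, 100, 45, 100, 90, NA),
                 address = c('London', 'Bangalore', NA, NA, NA,NA))

newdf <- drop_all_na_rows(df)
newdf

# Single-column case: the result is still a data frame
one_col <- data.frame(x = c(1, NA, 3))
result <- drop_all_na_rows(one_col)
result
```
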
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
