Impute Missing Values in R with Examples

Deepanshu Bhalla

This post explains how to replace missing values in R, along with examples.

Create a Sample Dataset

Let's create a sample data frame with missing values.

set.seed(123)
sample_data <- data.frame(
  col1 = sample(c(1:10, NA), 10, replace = TRUE),
  col2 = sample(c(20:30, NA), 10, replace = TRUE),
  col3 = sample(c(100:110, NA), 10, replace = TRUE),
  col4 = sample(c(1:10, NA), 10, replace = TRUE),
  col5 = sample(c(5:15, NA), 10, replace = TRUE),
  col6 = sample(LETTERS[1:6], 10, replace = TRUE)
)
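
Before imputing, it can help to see how many values are missing in each column. The snippet below is a minimal sketch of that check. It uses a small hypothetical data frame named "toy" (not part of the original post) so the result is easy to verify by hand; the same two calls should work on sample_data as well.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, 20, 30, 40),
  c = c("x", "y", NA, "z")
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows that contain no missing values at all
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

Here every column has exactly one missing value, and only the last row is complete.
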
R Function to Impute Missing Values

The following R function, "impute", fills in missing values in a data frame. It accepts two arguments: "data" (the dataset) and "method" (the imputation method, which can be either a function such as mean or median, or a constant value such as 0). The function loops over the numeric columns only, so non-numeric columns such as col6 are left unchanged. It is useful when you want to quickly fill missing values with the mean, the median or a simple value like 0.

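impute <- function(data, method) {
  # Loop over the positions of the numeric columns only
  for (i in which(sapply(data, is.numeric))) {
    missing_rows <- is.na(data[[i]])
    if (is.function(method)) {
      # method is a function (e.g. mean, median): compute it on the observed values
      data[missing_rows, i] <- method(data[[i]], na.rm = TRUE)
    } else {
      # method is a constant (e.g. 0 or 99): use it directly
      data[missing_rows, i] <- method
    }
  }
  return(data)
}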
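
To make the two branches of the function concrete, here is a rough sketch of what happens for a single column. The data frame "toy" and the names "filled_median" and "filled_zero" are hypothetical and used only for illustration; they are not part of the impute function itself.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  score = c(4, NA, 10, NA, 1),
  label = c("a", "b", NA, "d", "e")
)

# Step 1: positions of the numeric columns (the loop visits only these)
numeric_cols <- which(sapply(toy, is.numeric))
print(numeric_cols)

# Step 2a: fill NAs with a statistic computed from the observed values
filled_median <- toy$score
filled_median[is.na(filled_median)] <- median(filled_median, na.rm = TRUE)
print(filled_median)

# Step 2b: fill NAs with a constant instead
filled_zero <- toy$score
filled_zero[is.na(filled_zero)] <- 0
print(filled_zero)
```

Only the "score" column is selected, so the missing value in "label" would stay as NA. The median of the observed scores (4, 10, 1) is 4, which is the value used to fill the gaps in step 2a.
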

How to impute missing values with median

mydata <- impute(sample_data, median)

How to impute missing values with mean

mydata <- impute(sample_data, mean)
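
One thing to keep in mind with the mean: the columns of sample_data are built from integer sequences such as 1:10, and the mean of the observed values is usually not a whole number, so the imputed column ends up as a decimal (double) column. The sketch below shows this on a single hypothetical vector "x". Rounding afterwards is only one possible choice, shown here as an assumption rather than a recommendation.

```r
# Hypothetical integer vector with one missing value
x <- c(2L, NA, 6L, 3L)
class_before <- class(x)
print(class_before)   # "integer"

# Fill the NA with the mean of the observed values (11 / 3)
x[is.na(x)] <- mean(x, na.rm = TRUE)
class_after <- class(x)
print(class_after)    # "numeric"
print(x)

# Optional: round back to whole numbers if the column must stay integer
rounded <- as.integer(round(x))
print(rounded)
```
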

How to impute missing values with zero in R

mydata <- impute(sample_data, 0)

Similarly, if you want to replace missing values with 99, pass 99 as the second argument of the function.

mydata <- impute(sample_data, 99)
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
