This post explains how to replace missing values in R, along with examples.
Create a Sample Dataset
Let's create a sample data frame with missing values.
set.seed(123)
sample_data <- data.frame(
  col1 = sample(c(1:10, NA), 10, replace = TRUE),
  col2 = sample(c(20:30, NA), 10, replace = TRUE),
  col3 = sample(c(100:110, NA), 10, replace = TRUE),
  col4 = sample(c(1:10, NA), 10, replace = TRUE),
  col5 = sample(c(5:15, NA), 10, replace = TRUE),
  col6 = sample(LETTERS[1:6], 10, replace = TRUE)
)
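Before imputing, it can help to check how many values are missing in each column. The snippet below is a minimal sketch on a small toy data frame; "df" and "na_counts" are hypothetical names used only for illustration, not part of the example above.

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c("a", "b", NA, "d")
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(df))
print(na_counts)
```

The same call, colSums(is.na(sample_data)), should work on the sample data frame created above.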
The following R function, "impute", fills in missing values in a data frame. It accepts two arguments: "data" (the dataset) and "method" (the imputation method, which can be either a function such as mean or median, or a constant such as 0). The function uses a loop that iterates over the numeric columns only, so non-numeric columns such as col6 are left unchanged. It is useful when you want to quickly fill missing values with the mean, the median or a simple value like 0.
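impute <- function(data, method) {
  # Loop over the positions of the numeric columns only
  for (i in which(sapply(data, is.numeric))) {
    if (!is.function(method)) {
      # method is a constant (e.g. 0 or 99): use it as the fill value
      data[is.na(data[, i]), i] <- method
    } else {
      # method is a function (e.g. mean, median): compute the fill value
      # from the non-missing values of the column
      data[is.na(data[, i]), i] <- method(data[, i], na.rm = TRUE)
    }
  }
  return(data)
}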
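The same idea can also be written without an explicit loop. The sketch below is an alternative, not the function from this post: "impute_cols" and the toy data frame "df" are hypothetical names, and it assumes "method" is either a function that accepts na.rm or a single constant value.

```r
# Hypothetical loop-free variant, shown only as a sketch
impute_cols <- function(data, method) {
  # Logical vector marking which columns are numeric
  num_cols <- vapply(data, is.numeric, logical(1))
  data[num_cols] <- lapply(data[num_cols], function(x) {
    # Fill value: computed from the column if method is a function,
    # otherwise the constant itself
    fill <- if (is.function(method)) method(x, na.rm = TRUE) else method
    x[is.na(x)] <- fill
    x
  })
  data
}

# Toy data frame, used only for illustration
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", NA, "z")
)

out_median <- impute_cols(df, median)  # fills the NA in column a with the median
out_zero   <- impute_cols(df, 0)       # fills the NA in column a with 0
print(out_median)
print(out_zero)
```

In both versions, the missing value in the non-numeric column "b" is left as it is, because only numeric columns are processed.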
How to impute missing values with median
mydata <- impute(sample_data, median)
How to impute missing values with mean
mydata <- impute(sample_data, mean)
How to impute missing values with zero in R
mydata <- impute(sample_data, 0)
Similarly, if you want to replace missing values with 99, pass 99 as the second argument of the function.
mydata <- impute(sample_data, 99)