Missing Value Imputation with the mice Package in R

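In R, the mice package (Multivariate Imputation by Chained Equations) imputes missing values in mixed data, that is, datasets that contain both continuous and categorical variables.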

Imputation Methods by Variable Type
  1. For Continuous Data - Predictive mean matching ("pmm"), Bayesian linear regression ("norm"), Linear regression ignoring model error ("norm.nob"), Unconditional mean imputation ("mean") etc.
  2. For Binary Data - Logistic regression ("logreg"), Logistic regression with bootstrap ("logreg.boot")
  3. For Categorical Data (More than 2 categories) - Polytomous logistic regression ("polyreg"), Proportional odds model ("polr") etc.
  4. For Mixed Data (Can work for both Continuous and Categorical) - CART ("cart"), Random Forest ("rf"), Sample ("sample", a random sample from the observed values)
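
library(mice)

# Introduce missing values into the built-in anscombe dataset
anscombe <- within(anscombe, {
  y1[1:3] <- NA
  y4[3:5] <- NA
})
imp <- mice(anscombe)    # run the imputation with default settings
imp1 <- complete(imp)    # extract a completed dataset
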
Important Points:
  1. By default, the "mice" function performs multiple imputation and creates m = 5 imputed datasets.
  2. The "complete" function returns the final data with the missing values filled in. By default, it returns the first of the imputed datasets.
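
The two points above can be checked with a short sketch. It assumes the nhanes dataset that ships with mice (chosen here only for illustration, it is not part of the original example); the seed and printFlag = FALSE are likewise illustrative choices to keep the run reproducible and quiet.

```r
library(mice)

# nhanes ships with mice; bmi, hyp and chl contain missing values
imp <- mice(nhanes, seed = 123, printFlag = FALSE)

imp$m                          # number of imputed datasets (default setting)

first <- complete(imp)         # no action given: the first imputed dataset
third <- complete(imp, 3)      # pick the third imputed dataset instead
long  <- complete(imp, "long") # stack all imputed datasets on top of each other

nrow(long) == imp$m * nrow(nhanes)
```

Picking a single completed dataset is convenient for a quick look, while the "long" format keeps every imputed dataset in one data frame.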

Custom mice call

# Create a single imputed dataset (m = 1) instead of the default five
imp <- mice(anscombe, m = 1)
imp1 <- complete(imp, 1)

Default settings in the mice package

If the method argument is not specified (as in the examples above), mice checks the type of each variable and applies a default imputation method accordingly:
  1. Predictive mean matching (continuous data)
  2. Logistic regression imputation (binary data, factor with 2 levels)
  3. Polytomous regression imputation for unordered categorical data (factor with more than 2 levels)
  4. Proportional odds model (ordered factor with more than 2 levels)
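
To see which default each variable receives, one option is a dry run with maxit = 0 followed by a look at the method element of the result. The sketch below assumes the nhanes2 dataset that ships with mice (it mixes numeric columns with factors); the dataset, the seed and the switch of bmi to "norm" are illustrative choices, not part of the original example.

```r
library(mice)

# Dry run: maxit = 0 sets up the imputation without running any iterations
ini <- mice(nhanes2, maxit = 0, printFlag = FALSE)
print(ini$method)            # one method per column, chosen from the variable type

# Override the default for a single variable and keep the rest
meth <- ini$method
meth["bmi"] <- "norm"        # Bayesian linear regression instead of the default
imp <- mice(nhanes2, method = meth, m = 1, seed = 42, printFlag = FALSE)
filled <- complete(imp)
```

Passing a named vector this way changes the method for one column only, so the remaining variables still use their type-based defaults.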

CART : Imputation Algorithm

# minbucket is the minimum number of observations in any terminal node of the tree
imp <- mice(anscombe, method = "cart", minbucket = 5)
imp1 <- complete(imp)

Random Forest : Imputation Algorithm

Simulations by Shah (Feb 13, 2014) suggested that the quality of the imputation for 10 and 100 trees was identical, so mice 2.22 changed the default number of trees from ntree = 100 to ntree = 10.
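
# ntree is the number of trees grown for each imputation
imp <- mice(anscombe, method = "rf", ntree = 10)
imp1 <- complete(imp)

Important Note : You can omit minbucket and ntree in the above code. The values shown (minbucket = 5 and ntree = 10) are the package defaults, so leaving them out gives the same settings.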
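
As a minimal sketch of the note above, the call below leaves out minbucket and relies on the default. It assumes the nhanes2 dataset that ships with mice; the dataset, m = 1, the seed and printFlag = FALSE are illustrative choices only.

```r
library(mice)

# minbucket is not passed, so the CART method uses its default value
imp_cart <- mice(nhanes2, method = "cart", m = 1, seed = 1, printFlag = FALSE)
filled <- complete(imp_cart)

colSums(is.na(nhanes2))   # missing values per column before imputation
colSums(is.na(filled))    # missing values per column after imputation
```

The same pattern applies to method = "rf" with ntree left out.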

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.
