In R, the mice package has features of imputing missing values on mixed data.
Variable Type with Missing Imputation Methods
If nothing is specified in the method option (as shown in the above example), it checks, by default, the variable type and applies missing imputation method based on the type of variable.
Variable Type with Missing Imputation Methods
- For Continuous Data - Predictive mean matching, Bayesian linear regression, Linear regression ignoring model error, Unconditional mean imputation etc.
- For Binary Data - Logistic Regression, Logistic regression with bootstrap
- For Categorical Data (More than 2 categories) - Polytomous logistic regression, Proportional odds model etc,
- For Mixed Data (Can work for both Continuous and Categorical) - CART, Random Forest, Sample (Random sample from the observed values)
anscombe <- within(anscombe, {Important Points:
y1[1:3] <- NA
y4[3:5] <- NA
})
imp = mice(anscombe)
imp1 = complete(imp)
- By default, the "mice" function creates multiple level (k=5) imputation.
- The "complete" function is used to prepare your final data with imputation. By default, it picks first level imputation scores.
Custom mice function
imp = mice(anscombe, m=1)Default settings in the mice package
imp1 = complete(imp, 1)
If nothing is specified in the method option (as shown in the above example), it checks, by default, the variable type and applies missing imputation method based on the type of variable.
- Predictive mean matching (continuous data)
- Logistic regression imputation (binary data, factor with 2 levels)
- Polytomous regression imputation for unordered categorical data (factor>= 2 levels)
- Proportional odds model (ordered, >= 2 levels)
CART : Imputation Algorithm
imp = mice(anscombe, meth = "cart", minbucket = 5)Random Forest : Imputation Algorithm
imp1 = complete(imp)
Simulations by Shah (Feb 13, 2014) suggested that the quality of the imputation for 10 and 100 trees was identical, so mice 2.22 changed the default number of trees from ntree = 100 to ntree = 10.
imp = mice(anscombe, meth = "rf", ntree = 10)Important Note : You can ignore minbucket and ntree in the above code. The package can take default values.
imp1 = complete(imp)
Share Share Tweet