Missing Value Imputation based on Clustering

Clustering-based imputation assigns each observation to a cluster and replaces its missing values with the corresponding cluster means.

Example 

Suppose the variable X1 is Cost and X2 is Salary. The non-missing cases have been clustered into three clusters.

Now you have a case with a value for Cost but not for Salary.

  • Determine which cluster's mean Cost is closest to the Cost value of this case.
  • Suppose this value of Cost is closest to the mean Cost of the second cluster.
  • To impute the missing value of Salary, replace it with the mean value of Salary for the second cluster.
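
The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not what PROC FASTCLUS runs internally: the cluster means are invented numbers, and `impute_salary` is a hypothetical helper introduced for this example.

```python
# Hypothetical cluster means for (Cost, Salary); the numbers are invented.
cluster_means = [
    {"cost": 100.0, "salary": 2000.0},  # cluster 1
    {"cost": 250.0, "salary": 3500.0},  # cluster 2
    {"cost": 400.0, "salary": 5000.0},  # cluster 3
]

def impute_salary(cost, means):
    """Return (cluster number, imputed Salary) for a case with a known Cost."""
    # Find the cluster whose mean Cost is closest to the observed Cost.
    idx = min(range(len(means)), key=lambda i: abs(means[i]["cost"] - cost))
    # Use that cluster's mean Salary as the imputed value.
    return idx + 1, means[idx]["salary"]

cluster, salary = impute_salary(230.0, cluster_means)
print(cluster, salary)  # 2 3500.0
```

A Cost of 230 is nearest to the second cluster's mean Cost of 250, so the missing Salary is filled with that cluster's mean Salary.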

/* Cluster the training data and impute missing values from the cluster seeds */
proc fastclus data=training impute outiter outseed=seed out=training1 maxclusters=5;
   var outcome survrate prognos amttreat gsi avoid intrus;
run;

Options in PROC FASTCLUS

  1. IMPUTE requests imputation of missing values after the final assignment of observations to clusters. 
  2. OUTITER outputs information from the iteration history to the OUTSEED= data set, including the cluster seeds at each iteration.
  3. OUTSEED= is another name for the MEAN= data set, provided because the data set can contain location estimates other than means.
  4. MAXCLUSTERS= specifies the maximum number of clusters permitted. If you omit the MAXCLUSTERS= option, a value of 100 is assumed.

PROC FASTCLUS is then run on the validation data set to replace its missing values with the cluster means computed from the training data set at the first iteration.

/* Impute the validation data using the training seeds from iteration 1 */
proc fastclus data=valid impute seed=seed(where=(_iter_=1)) replace=none maxclusters=5 out=validate1 maxiter=0;
   var outcome survrate prognos amttreat gsi avoid intrus;
run;

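SEED= specifies an input data set from which initial cluster seeds are to be selected. Here it is the OUTSEED= data set from the training run, restricted to the rows where _ITER_=1.

REPLACE= specifies how seed replacement is performed. NONE suppresses seed replacement.

MAXITER= specifies the maximum number of iterations for recomputing cluster seeds. With MAXITER=0 the seeds are not recomputed, so the validation observations are assigned to the seeds read from the SEED= data set.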
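
The idea behind this second step (fixed seeds from training, no recomputation, missing values filled from the nearest seed) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reproduction of PROC FASTCLUS: the seeds and rows are invented, `impute_from_seeds` is a hypothetical helper, and distance is measured only over the variables that are present in each row.

```python
import math

def impute_from_seeds(rows, seeds):
    """Fill None entries in each row from the nearest fixed seed.

    Assumption for this sketch: distance uses only the variables
    that are present (not None) in the row. Seeds are never updated.
    """
    completed = []
    for row in rows:
        present = [j for j, v in enumerate(row) if v is not None]
        if not present:
            # No observed values to measure distance with; leave the row as is.
            completed.append(list(row))
            continue
        nearest = min(
            seeds,
            key=lambda s: math.sqrt(sum((row[j] - s[j]) ** 2 for j in present)),
        )
        completed.append([nearest[j] if v is None else v for j, v in enumerate(row)])
    return completed

# Invented seeds (as if saved from the training run) and validation rows.
training_seeds = [[1.0, 10.0, 100.0], [5.0, 50.0, 500.0]]
validation_rows = [[1.2, None, 110.0], [None, 48.0, None], [4.0, 40.0, 400.0]]

for filled in impute_from_seeds(validation_rows, training_seeds):
    print(filled)
```

Because the seeds come only from the training data, the validation data never influences the values used to fill its own gaps.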


About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.

