Clustering-based missing value imputation assigns observations to clusters and fills in each missing value with the corresponding cluster mean.

**Example**

Suppose the variable X1 is Cost and X2 is Salary. The non-missing cases have been clustered into three clusters.

Now you have a case with a value for Cost but not for Salary.

- You determine which cluster's mean value is closest to the Cost value.
- Suppose this value of Cost is closest to the mean of the second cluster.
- To impute the missing value of Salary, you replace it with the mean value of Salary for the second cluster.
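The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not what PROC FASTCLUS does internally; the cluster means and the function name `impute_salary` are invented for the example.

```python
# Hypothetical cluster means for three clusters (values invented for illustration).
CLUSTER_MEANS = [
    {"cost": 100.0, "salary": 30000.0},
    {"cost": 250.0, "salary": 52000.0},
    {"cost": 400.0, "salary": 81000.0},
]


def impute_salary(cost, cluster_means=CLUSTER_MEANS):
    """Return (cluster_index, imputed_salary) for a case with Cost but no Salary."""
    # Step 1: find the cluster whose mean Cost is closest to the observed Cost.
    nearest = min(
        range(len(cluster_means)),
        key=lambda i: abs(cluster_means[i]["cost"] - cost),
    )
    # Step 2: use that cluster's mean Salary as the imputed value.
    return nearest, cluster_means[nearest]["salary"]


cluster, salary = impute_salary(cost=230.0)
print(cluster + 1, salary)  # cluster number shown 1-based: prints "2 52000.0"
```

Here a Cost of 230 is closest to the second cluster's mean Cost (250), so the missing Salary is replaced with that cluster's mean Salary.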

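In SAS, PROC FASTCLUS can cluster the training data and impute its missing values in one step:

    /* Cluster the training data and impute missing values from the cluster seeds */
    proc fastclus data=training impute outiter outseed=seed out=training1 maxclusters=5;
       var outcome survrate prognos amttreat gsi avoid intrus;
    run;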

**Options in PROC FASTCLUS**

- **IMPUTE** requests imputation of missing values after the final assignment of observations to clusters.
- **OUTITER** outputs information from the iteration history to the OUTSEED= data set, including the cluster seeds at each iteration.
- **OUTSEED=** is another name for the MEAN= data set, provided because the data set can contain location estimates other than means.
- **MAXCLUSTERS=** specifies the maximum number of clusters permitted. If you omit the MAXCLUSTERS= option, a value of 100 is assumed.

**PROC FASTCLUS** is then used to replace the missing values in the **validation data set** with the cluster means from the **training data set** computed at the first iteration.

    /* Impute the validation data using the training seeds from iteration 1 */
    proc fastclus data=valid impute seed=seed(where=(_iter_=1)) replace=none maxclusters=5 out=validate1 maxiter=0;
       var outcome survrate prognos amttreat gsi avoid intrus;
    run;

- **SEED=** specifies an input data set from which initial cluster seeds are to be selected.
- **REPLACE=** specifies how seed replacement is performed. NONE suppresses the seed replacement.
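To make the scoring step more concrete, here is a rough Python sketch of what SEED=, REPLACE=NONE, and MAXITER=0 are meant to achieve together: the seeds come from the training data and are not updated, each validation row is assigned to its nearest seed, and missing values are filled from that seed. The sketch assumes plain squared Euclidean distance over the non-missing variables, which simplifies how PROC FASTCLUS treats missing values; the seed values and the function name `impute_from_seeds` are hypothetical.

```python
def impute_from_seeds(row, seeds):
    """Fill None entries in `row` from the nearest seed; the seeds are left unchanged."""
    observed = [j for j, value in enumerate(row) if value is not None]
    if not observed:
        raise ValueError("row has no observed values to match against the seeds")

    def distance(seed):
        # Squared Euclidean distance over the observed variables only (a simplification).
        return sum((row[j] - seed[j]) ** 2 for j in observed)

    nearest = min(seeds, key=distance)
    return [nearest[j] if value is None else value for j, value in enumerate(row)]


# Hypothetical seeds taken from the training data (two variables, three clusters).
training_seeds = [[1.0, 10.0], [5.0, 50.0], [9.0, 90.0]]
validation = [[4.2, None], [None, 85.0], [1.5, 12.0]]

imputed = [impute_from_seeds(row, training_seeds) for row in validation]
print(imputed)  # [[4.2, 50.0], [9.0, 85.0], [1.5, 12.0]]
```

Because the validation rows never feed back into the seeds, the imputed values depend only on what was learned from the training data.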