In predictive modeling, it is very important to check whether the model is able to distinguish between events and non-events. A performance statistic called the "Kolmogorov-Smirnov" (KS) statistic measures the discriminatory power of a model. It is a very popular metric in credit risk modeling.


**Kolmogorov-Smirnov (KS) Statistic**

It looks at the maximum difference between the distributions of cumulative events and cumulative non-events. In the image below, KS is 57.8% and it occurs at the third decile.

[Image: Calculating KS Statistic]
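Written as a formula, and assuming the deciles are ordered from the highest to the lowest predicted probability, the decile-level KS is the largest gap between the two cumulative percentages:

$$
\mathrm{KS} = \max_{k = 1,\dots,10} \left| \frac{E_k}{E} - \frac{N_k}{N} \right|
$$

Here $E_k$ and $N_k$ are the cumulative numbers of events and non-events in deciles $1$ through $k$, and $E$ and $N$ are the total numbers of events and non-events. In the image above, this maximum is 57.8%, reached at $k = 3$.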

**There are two ways of calculating the KS statistic:**

- Split the predicted probability into 10 parts (deciles). Then compute the cumulative % of events and non-events in each decile and find the decile where the difference is maximum (as shown in the image above).
- Compute the Kolmogorov-Smirnov two-sample test with **PROC NPAR1WAY**. It reports the D statistic, the maximum difference between the cumulative distributions of events and non-events. See the code below.
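The decile method can be illustrated with a short Python sketch. The article itself works in SAS, so treat this only as an illustration of the calculation, not as the author's code. The function name `decile_ks` is hypothetical. Labels are assumed to be coded 1 for events and 0 for non-events. Groups are formed by position after sorting, so tied scores may straddle a decile boundary.

```python
def decile_ks(scores, labels, n_groups=10):
    """Decile-level KS: max gap between cumulative % of events and non-events.

    Returns (ks, decile), where decile is 1-based and decile 1 holds the
    highest predicted probabilities. Labels must be 1 (event) or 0 (non-event).
    """
    # Sort observations from the highest to the lowest predicted probability.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    total_events = sum(1 for _, y in pairs if y == 1)
    total_non_events = n - total_events
    if total_events == 0 or total_non_events == 0:
        raise ValueError("need at least one event and one non-event")

    cum_events = cum_non_events = 0
    best_ks, best_decile = 0.0, 0
    for g in range(n_groups):
        # Near-equal groups by position (hypothetical simplification: ties may be split).
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        events = sum(1 for _, y in group if y == 1)
        cum_events += events
        cum_non_events += len(group) - events
        gap = abs(cum_events / total_events - cum_non_events / total_non_events)
        if gap > best_ks:
            best_ks, best_decile = gap, g + 1
    return best_ks, best_decile


# Toy data: the 10 highest scores are all events, so separation is perfect.
scores = list(range(1, 21))
labels = [0] * 10 + [1] * 10
print(decile_ks(scores, labels))  # (1.0, 5)
```

With perfectly separated toy data, all events sit in the top five deciles, so the gap reaches 100% at decile 5.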

**SAS Code: Calculating KS Statistic**

The D statistic (highlighted in the image above) is the metric used to report the KS score. DO NOT use the "KS" value shown in the output table 'K-S Two-Sample Test (Asymptotic)'. The D statistic is the maximum difference between the cumulative distributions of events (Y=1) and non-events (Y=0). In this example, D = 0.603.
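The following Python sketch computes D at the observation level and shows why the "KS" column differs from it. It is an illustration under an assumption, not SAS output. The assumption is that the PROC NPAR1WAY documentation defines KS as the maximum over scores of the square root of (1/n) times the sum of n_i(F_i - F)^2, where F is the pooled EDF, and defines KSa = KS × √n. For two samples this reduces to KS = D·√(n1·n0)/n, a rescaled version of D. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def ks_d_statistic(scores, labels):
    """Observation-level two-sample KS statistic D = max |F1(x) - F0(x)|,
    where F1 and F0 are the empirical CDFs of the scores of events (label 1)
    and non-events (label 0)."""
    events = sorted(s for s, y in zip(scores, labels) if y == 1)
    non_events = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    d = 0.0
    # The EDFs only change at observed scores, so checking each distinct score suffices.
    for x in sorted(set(scores)):
        f1 = bisect_right(events, x) / len(events)
        f0 = bisect_right(non_events, x) / len(non_events)
        d = max(d, abs(f1 - f0))
    return d


def npar1way_ks_and_ksa(d, n1, n0):
    """Rescaled statistics, assuming the PROC NPAR1WAY definitions reduce to
    KS = D * sqrt(n1 * n0) / n and KSa = KS * sqrt(n), with n = n1 + n0."""
    n = n1 + n0
    ks = d * math.sqrt(n1 * n0) / n
    return ks, ks * math.sqrt(n)


scores = list(range(1, 21))
labels = [0] * 10 + [1] * 10
d = ks_d_statistic(scores, labels)
print(d)                               # 1.0
print(npar1way_ks_and_ksa(d, 10, 10))  # (0.5, ~2.236): KS = 0.5, KSa = sqrt(5)
```

Under this assumption, "KS" shrinks D by a sample-size factor and KSa grows with √n. This is why D, which always lies between 0 and 1, is the number to report as the model's KS.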

The higher the value of D, the better the model distinguishes between events and non-events.

**Did you notice that PROC NPAR1WAY and the decile method show different KS scores?**

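PROC NPAR1WAY returns a KS of about 0.60, whereas the decile method returns about 0.58 (57.8%). Both calculations are correct. The real difference is that PROC NPAR1WAY computes the statistic at the observation level, whereas the decile method computes it at the decile level. The decile method checks the gap at only 10 cut-offs, which are a subset of the cut-offs examined observation by observation, so its KS is generally equal to or slightly lower than D. There should NOT be a large difference between these two scores.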
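The subset argument can be checked with a small Python sketch. It assumes distinct scores and 0/1 labels, and the function name is hypothetical. The sketch computes the cumulative gap after every observation (observation level), then reads the same curve only at the last observation of each decile (decile level).

```python
def observation_vs_decile_ks(scores, labels, n_groups=10):
    """Return (observation-level KS, decile-level KS) for 0/1 labels.

    Assumes distinct scores, so the cumulative gap after each observation
    (sorted from highest to lowest score) matches the gap at that cut-off.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    n1 = sum(1 for _, y in pairs if y == 1)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one event and one non-event")

    # gaps[i]: |cum % events - cum % non-events| after the first i + 1 observations.
    gaps = []
    cum1 = cum0 = 0
    for _, y in pairs:
        if y == 1:
            cum1 += 1
        else:
            cum0 += 1
        gaps.append(abs(cum1 / n1 - cum0 / n0))

    # The decile method sees the same curve only at each decile's last observation.
    boundaries = [(g + 1) * n // n_groups - 1 for g in range(n_groups)]
    decile_ks = max(gaps[b] for b in boundaries if b >= 0)
    return max(gaps), decile_ks


# The 5 top-scored observations are events; the peak gap falls between two decile cut-offs.
scores = list(range(20, 0, -1))
labels = [1] * 5 + [0] * 15
print(observation_vs_decile_ks(scores, labels))  # observation-level 1.0, decile-level about 0.933
```

Because the decile gaps are picked from the same list as the observation-level gaps, the decile KS in this sketch can never exceed the observation-level value.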



Hi Deepanshu,

In the two-sample test, which statistic should we take to show that a variable has KS: KS or D? What is D, since you highlighted it?

I'd appreciate it if you could help me with these statistics: D, KS, KSa.

Sometimes in the banking industry, we choose variables with KS greater than 2. Should KSa be chosen in that case?

It is the D statistic we use.

But why is there such a big difference in KS value, 75 vs 60?

It is 58 vs 60. 75 was based on a different dataset. The first image was just to show how KS works. I forgot to update it when I added PROC NPAR1WAY to the article. Thanks!
