SAS : Calculating KS Test

In predictive modeling, it is important to check whether the model can distinguish between events and non-events. A performance statistic called the Kolmogorov-Smirnov (KS) statistic measures this discriminatory power. It is a very popular metric in credit risk modeling.

Kolmogorov-Smirnov (KS) Statistics
It is the maximum difference between the cumulative distribution of events and the cumulative distribution of non-events.
In the image below, KS is 57.8% and it occurs at the third decile.
[Image: Calculating KS Statistic]
The KS curve is shown below. It is drawn by plotting the cumulative % of events and the cumulative % of non-events against the cumulative % of population. The higher the KS, the better the model.
[Image: KS Test]

There are two ways of calculating the KS statistic:
  1. Split the predicted probabilities into 10 parts (deciles), compute the cumulative % of events and non-events in each decile, and find the decile where the difference is largest (as shown in the image above).
  2. Run the KS two-sample test with PROC NPAR1WAY. It reports the maximum difference between the two cumulative distributions. See the code below.

SAS Code : Calculating KS Statistics 
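/* The lines that generated x and y2 are missing from this copy. The values below are assumed for illustration and will not reproduce D=0.603. */
data full;
call streaminit(123);
do i=1 to 1000;
x=rand("normal");
y2=rand("bernoulli", 1/(1+exp(-2*x)));
output;
end;
drop i;
run;
proc logistic data=full; /* fit the model and save the predicted probabilities as pred */
model y2(event="1")=x;
output out=out2 p=pred;
run;
proc npar1way data=out2 edf; /* EDF requests the Kolmogorov-Smirnov statistics */
class y2;
var pred;
run;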
[Image: KS Output]

The D statistic (highlighted in the image above) is the metric used to report the KS score. DO NOT USE the "KS" value shown in the output table 'K-S Two-Sample Test (Asymptotic)'. That value measures each group's deviation from the pooled distribution, so it is on a different scale. The D statistic is the maximum difference between the cumulative distributions of events (Y=1) and non-events (Y=0). In this example, D=0.603.
The higher the value of D, the better the model distinguishes between events and non-events.
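
To see what D measures, the maximum gap can also be computed by hand. This is a minimal sketch, assuming the out2 dataset (with y2 and pred) created by the PROC LOGISTIC step above. The names sorted, ecdf, n1 and n0 are illustrative and not part of the original article.

```sas
/* Sort by predicted probability so cumulative counts can be built in order */
proc sort data=out2 out=sorted;
  by pred;
run;

/* Total number of events and non-events (illustrative macro variables) */
proc sql noprint;
  select sum(y2 = 1), sum(y2 = 0)
    into :n1 trimmed, :n0 trimmed
  from out2;
quit;

data ecdf;
  set sorted end=last;
  by pred;
  retain d 0;
  cum1 + (y2 = 1);   /* running count of events */
  cum0 + (y2 = 0);   /* running count of non-events */
  /* Compare the two empirical CDFs only after all tied values are counted */
  if last.pred then d = max(d, abs(cum1 / &n1 - cum0 / &n0));
  if last then output;
  keep d;
run;

proc print data=ecdf;
run;
```

The printed value of d should agree with the D reported by PROC NPAR1WAY, since both take the largest gap between the two empirical cumulative distributions over all observed values of pred.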
Did you notice that PROC NPAR1WAY and the decile method show different KS scores?

PROC NPAR1WAY returns a KS of around 0.6, whereas the decile method returns around 0.58 (57.8%). Both are calculated correctly. The difference is that PROC NPAR1WAY evaluates the gap at every observation, whereas the decile method evaluates it only at the decile boundaries, so it can miss the exact maximum. The two scores should NOT differ by much.
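
The decile method described earlier can be sketched in SAS as follows. This is one possible implementation under stated assumptions, not the code used to build the table in the image: it assumes the out2 dataset (with y2 and pred) from the PROC LOGISTIC step, and the names ranked, decile_tab, ks_tab, tot_events and tot_nonevents are illustrative.

```sas
/* 1. Assign deciles; DESCENDING puts the highest probabilities in decile 0 */
proc rank data=out2 out=ranked groups=10 descending;
  var pred;
  ranks decile;
run;

proc sql;
  /* 2. Count events and non-events in each decile */
  create table decile_tab as
  select decile,
         sum(y2 = 1) as events,
         sum(y2 = 0) as nonevents
  from ranked
  group by decile
  order by decile;

  /* Grand totals used as denominators for the cumulative percentages */
  select sum(events), sum(nonevents)
    into :tot_events trimmed, :tot_nonevents trimmed
  from decile_tab;
quit;

/* 3. Cumulative % of events and non-events, and their gap, per decile */
data ks_tab;
  set decile_tab;
  cum_events    + events;
  cum_nonevents + nonevents;
  cum_pct_events    = cum_events / &tot_events;
  cum_pct_nonevents = cum_nonevents / &tot_nonevents;
  diff = abs(cum_pct_events - cum_pct_nonevents);
run;

/* 4. KS is the largest gap; PROC RANK numbers the deciles 0 to 9 */
proc sql;
  select decile, diff as ks format=percent8.1
  from ks_tab
  having diff = max(diff);
quit;
```

Because the gap is checked only at ten cut points, the KS from this table would normally come out at or slightly below the D statistic from PROC NPAR1WAY on the same data.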


About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


6 Responses to "SAS : Calculating KS Test"

  1. Hi Deepanshu,

    In the two-sample test, which statistic should we report as the KS: KS or D? And what is D, since you highlighted it?

    1. I'd appreciate it if you could help me with these statistics: D, KS, KSa.
      Sometimes in the banking industry we choose variables with KS greater than 2. Should KSa be used in that case?

  2. But why is there such a big difference in the KS value, 75 vs 60?

    1. It is 58 vs 60. 75 was based on a different dataset. The first image was just to show how KS works. I forgot to update it when I added PROC NPAR1WAY to the article. Thanks!

