Check Predictive Model Performance

Live Online Training : SAS Programming with 50+ Case Studies

- Explain Programming Concepts in Simple English
- Live Projects
- Scenario Based Questions
- Job Placement Assistance
- Get 10% off till Sept 25, 2017
- Batch starts from October 8, 2017

There are two main measures for assessing performance of a predictive model: Discrimination and Calibration. These measures are not restricted to logistic regression. They can be used for other classification techniques as well such as decision tree, random forest, gradient boosting, support vector machine (SVM) etc. The explanation of these two measures are shown below -

1. Discrimination

Discrimination refers to the ability of the model to distinguish between events and non-events.

Area under the ROC curve (AUC / C statistics)

It plots true positive rate (aka Sensitivity) and false positive rate (aka 1-Specificity). Mathematically, It is calculated using the formula below -
AUC = Concordant Percent + 0.5 * Tied Percent
ROC Curve 
Concordant : Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event).

Discordant : Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event).

Tied : Percentage of pairs where the observation with the desired outcome (event) has same predicted probability than the observation without the outcome (non-event).


Rules : 

If C>=  0.9, the model is considered to have outstanding discrimination.
Caution : The model may be faced with problem of over-fitting.

If 0.8 <= C < 0.9, the model is considered to have excellent discrimination; 
If 0.7<=C <  0.8, the model is considered to have acceptable discrimination; 
If C = 0.5, the model has no discrimination (random case)
If C < 0.5, the model is worse than random
Gini (Somer's D)

It is a common measure for assessing predictive power of a credit risk model. It measures the degree to which the model has better discrimination power than the model with random scores.
Somer's D  = 2 AUC  - 1
or
Somer's D =  (Concordant Percent  - Discordant Percent) / 100
It should be greater than 0.4.


Kolmogorov-Smirnoff Statistic (KS)

It looks at maximum difference between distribution of cumulative events and cumulative non-events.
  1. KS statistics should be in top 3 deciles.
  2. KS statistics should be between 40 and 70.
KS Statistics
In this case, KS is maximum at second decile and KS score is 75.

It implies the model should predict the highest number of events in the first decile and then goes progressively down. For example, there should not be a case that the decile 2 predicts higher number of events than the first decile.


2. Calibration

It is a measure of how close the predicted probabilities are to the actual rate of events.

I. Hosmer and Lemeshow Test (HL)

It measures the association between actual events and predicted probability.

In HL test, null hypothesis states that sample of observed events and non-events supports the claim about the predicted events and non-events. In other words, the model fits data well.

Calculation
  1. Calculate estimated probability of events
  2. Split data into 10 sections based on descending order of probability
  3. Calculate number of actual events and non-events in each section
  4. Calculate Predicted Probability = 1 by averaging probability in each section
  5. Calculate Predicted Probability = 0 by subtracting Predicted Probability=1 from 1
  6. Calculate expected frequency by multiplying number of cases by Predicted Probability = 1
  7. Calculate chi-square statistics taking frequency of observed (actual) and predicted events and non-events
Hosmer Lemeshow Test
Rule : If p-value > .05. the model fits data well

II. Deviance and Residual Test

The null hypothesis states the model fits the data well. In other words, null hypothesis is that the fitted model is correct.
Deviance and Residual Test
Since p-value is greater than 0.05 for both the tests, we can say the model fits the data well.
In SAS, these tests can be computed by using option scale = none aggregate in PROC LOGISTIC.

III. Brier Score

The Brier score is an important measure of calibration i.e. the mean squared difference between the predicted probability and the actual outcome.
Lower the Brier score is for a set of predictions, the better the predictions are calibrated.
  • If the predicted probability is 1 and it happens, then the Brier Score is 0, the best score achievable.
  • If the predicted probability is 1 and it does not happen, then the Brier Score is 1, the worst score achievable.
  • If the predicted probability is 0.8 and it happens, then the Brier Score is (0.8-1)^2 =0.04.
  • If the predicted probability is 0.2 and it happens, then the Brier Score is (0.2-1)^2 =0.64.
  • If the predicted probability is 0.5, then the Brier Score is (0.5-1)^2 =0.25, irregardless of whether it happens.
By specifying fitstat option in proc logistic, SAS returns Brier score and other fit statistics such as AUC, AIC, BIC etc.
proc logistic data=train;
model y(event="1") = entry;
score data=valid out=valpred fitstat;
run;

A complete assessment of model performance should take into consideration both discrimination and calibration. It is believed that discrimination is more important than calibration.

SAS Macro : Best Model Selection

SAS Tutorials : 100 Free SAS Tutorials

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has close to 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like retail and commercial banking, Telecom, HR and Automotive.


While I love having friends who agree, I only learn from those who don't.

Let's Get Connected: Email | LinkedIn

Get Free Email Updates :
*Please confirm your email address by clicking on the link sent to your Email*

Related Posts:

8 Responses to "Check Predictive Model Performance"

  1. Awesome work man :)....great site keep it up...please add arima also :)

    ReplyDelete
    Replies
    1. Thank you for your appreciation. Check out the series of ARIMA articles -
      http://www.listendata.com/search/label/Time%20Series

      Delete
  2. In the above tabulate of Hosmers lemeshow u were supposed to create 10 deciles but I can see only 8.

    ReplyDelete
  3. Thank you for putting this site together. You're explanations are so clear and straight to the point; very helpful.

    ReplyDelete
  4. Cannot open the macro file..is it password protected?

    ReplyDelete
  5. Cannot open the macro file..is it password protected?

    ReplyDelete
  6. Hi, thanks for the post. The file "SAS Macro : Best Model Selection" requires a password. Whats the password ?

    ReplyDelete
  7. Whats the password for the excel macro? if you are not providing the password then why you upload and make visible?

    ReplyDelete

Next → ← Prev