How to Check Performance of a Predictive Model

Deepanshu Bhalla 12 Comments

There are two main measures for assessing the performance of a predictive model:

  1. Discrimination
  2. Calibration

These measures are not restricted to logistic regression. They can be used with any classification technique such as decision trees, random forests, gradient boosting, support vector machines (SVM), etc. These two measures are explained below.

1. Discrimination

Discrimination refers to the ability of the model to distinguish between events and non-events.

Area under the ROC curve (AUC / C-statistic)

The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity). Mathematically, AUC is calculated using the formula below -

AUC = (Concordant Percent + 0.5 * Tied Percent) / 100
[Figure: ROC Curve]

Concordant : Percentage of pairs where the observation with the desired outcome (event) has a higher predicted probability than the observation without the outcome (non-event).

Discordant : Percentage of pairs where the observation with the desired outcome (event) has a lower predicted probability than the observation without the outcome (non-event).

Tied : Percentage of pairs where the observation with the desired outcome (event) has the same predicted probability as the observation without the outcome (non-event).
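To make the pairing logic concrete, here is a minimal sketch in plain Python (illustrative only; the blog's own examples use SAS, and the probabilities below are hypothetical):

```python
# Hypothetical predicted probabilities for events (y=1) and non-events (y=0)
events = [0.9, 0.8, 0.6]
non_events = [0.7, 0.4, 0.6, 0.3]

concordant = discordant = tied = 0
for p_event in events:
    for p_non_event in non_events:
        if p_event > p_non_event:
            concordant += 1
        elif p_event < p_non_event:
            discordant += 1
        else:
            tied += 1

total_pairs = len(events) * len(non_events)  # every event/non-event pair
pct_concordant = 100.0 * concordant / total_pairs
pct_discordant = 100.0 * discordant / total_pairs
pct_tied = 100.0 * tied / total_pairs

auc = (pct_concordant + 0.5 * pct_tied) / 100
print(round(pct_concordant, 1), round(pct_discordant, 1),
      round(pct_tied, 1), round(auc, 3))   # 83.3 8.3 8.3 0.875
```

With 10 concordant, 1 discordant and 1 tied pair out of 12, the AUC works out to (83.3 + 0.5 * 8.3) / 100 = 0.875.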

Rules : AUC
  1. If AUC >= 0.9, the model is considered to have outstanding discrimination. Caution: the model may be over-fitting.
  2. If 0.8 <= AUC < 0.9, the model is considered to have excellent discrimination.
  3. If 0.7 <= AUC < 0.8, the model is considered to have acceptable discrimination.
  4. If AUC = 0.5, the model has no discrimination (equivalent to random guessing).
  5. If AUC < 0.5, the model is worse than random.
How to Calculate Concordance Manually
Gini (Somers' D)

It is a common measure for assessing the predictive power of a credit risk model. It measures the degree to which the model discriminates better than a model with random scores.

Somers' D = 2 * AUC - 1, or equivalently, Somers' D = (Concordant Percent - Discordant Percent) / 100

As a rule of thumb, it should be greater than 0.4.
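A tiny sketch (hypothetical percentages) showing that the two formulas above agree:

```python
# Hypothetical concordance results; percentages sum to 100
pct_concordant, pct_discordant, pct_tied = 83.0, 9.0, 8.0

auc = (pct_concordant + 0.5 * pct_tied) / 100
somers_d_from_auc = 2 * auc - 1
somers_d_from_pairs = (pct_concordant - pct_discordant) / 100

# Both expressions give 0.74 for these inputs
print(round(somers_d_from_auc, 4), round(somers_d_from_pairs, 4))
```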

Kolmogorov-Smirnov Statistic (KS)

It is the maximum difference between the cumulative distribution of events and the cumulative distribution of non-events.

  1. The maximum KS value should occur within the top 3 deciles.
  2. The KS statistic should be between 40 and 70.
[Figure: KS Statistics]

In the example shown, KS is maximum at the second decile, where the KS score is 75.
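The decile-based KS calculation can be sketched in plain Python (the per-decile counts below are hypothetical, chosen so the maximum falls in the second decile as in the figure):

```python
# Hypothetical counts per decile (decile 1 = highest predicted probabilities)
events_per_decile     = [50, 32, 8, 4, 3, 1, 1, 1, 0, 0]
non_events_per_decile = [5, 10, 50, 50, 60, 70, 80, 80, 60, 35]

total_events = sum(events_per_decile)          # 100
total_non_events = sum(non_events_per_decile)  # 500

cum_events = cum_non_events = 0
ks_by_decile = []
for ev, ne in zip(events_per_decile, non_events_per_decile):
    cum_events += ev
    cum_non_events += ne
    # KS at this decile: gap between cumulative % of events and of non-events
    ks = 100.0 * cum_events / total_events - 100.0 * cum_non_events / total_non_events
    ks_by_decile.append(round(ks, 1))

ks_stat = max(ks_by_decile)
ks_decile = ks_by_decile.index(ks_stat) + 1
print(ks_stat, ks_decile)   # 79.0 at decile 2
```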

Calculating KS Test with SAS
Rank Ordering

Rank ordering means the model should predict the highest number of events in the first decile, with the count declining progressively in later deciles. For example, decile 2 should not contain a higher number of predicted events than decile 1.
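A rank-ordering check reduces to a simple monotonicity test over the decile event counts; a sketch with hypothetical counts:

```python
def is_rank_ordered(events_per_decile):
    """True if event counts never increase from one decile to the next."""
    return all(events_per_decile[i] >= events_per_decile[i + 1]
               for i in range(len(events_per_decile) - 1))

# Hypothetical decile event counts (decile 1 first)
good = [50, 32, 8, 4, 3, 1, 1, 1, 0, 0]
bad  = [50, 32, 8, 4, 9, 1, 1, 1, 0, 0]   # decile 5 breaks the ordering

print(is_rank_ordered(good))  # True
print(is_rank_ordered(bad))   # False
```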

2. Calibration

It is a measure of how close the predicted probabilities are to the actual rate of events.

I. Hosmer and Lemeshow Test (HL)

It measures the association between actual events and predicted probability.

In the HL test, the null hypothesis states that the observed numbers of events and non-events agree with the numbers predicted by the model. In other words, the model fits the data well.

Calculation
  1. Calculate the estimated probability of an event for each observation
  2. Split the data into 10 groups based on descending order of predicted probability
  3. Count the actual events and non-events in each group
  4. Calculate the average predicted probability of an event in each group
  5. Calculate the average predicted probability of a non-event by subtracting the above from 1
  6. Calculate the expected frequency of events by multiplying the number of cases in each group by the average predicted probability
  7. Calculate the chi-square statistic from the observed and expected frequencies of events and non-events
[Figure: Hosmer Lemeshow Test]
Rule : If the p-value > 0.05, the model fits the data well.
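The steps above can be sketched in plain Python. The grouped table is hypothetical (3 groups instead of 10, to keep it short), and only the chi-square statistic is computed; in practice a chi-square distribution (e.g. scipy.stats.chi2) would supply the p-value:

```python
# Hypothetical groups: (number of cases, observed events, average predicted probability)
groups = [
    (100, 60, 0.58),   # highest-probability group
    (100, 30, 0.32),
    (100, 10, 0.12),   # lowest-probability group
]

chi_square = 0.0
for n, observed_events, avg_prob in groups:
    expected_events = n * avg_prob            # step 6
    expected_non_events = n * (1 - avg_prob)
    observed_non_events = n - observed_events
    # step 7: chi-square over both events and non-events
    chi_square += (observed_events - expected_events) ** 2 / expected_events
    chi_square += (observed_non_events - expected_non_events) ** 2 / expected_non_events

print(round(chi_square, 3))   # 0.727 for these inputs
```

A small chi-square (relative to its degrees of freedom, number of groups minus 2) corresponds to a large p-value, i.e. good fit.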
II. Deviance and Residual Test

The null hypothesis states that the model fits the data well. In other words, the null hypothesis is that the fitted model is correct.

[Figure: Deviance and Residual Test]
Since the p-value is greater than 0.05 for both tests, we can say the model fits the data well.

In SAS, these tests can be computed by adding the options scale=none aggregate to the MODEL statement in PROC LOGISTIC.

III. Brier Score

The Brier score is an important measure of calibration: it is the mean squared difference between the predicted probability and the actual outcome.

The lower the Brier score for a set of predictions, the better calibrated the predictions are.
  • If the predicted probability is 1 and it happens, then the Brier Score is 0, the best score achievable.
  • If the predicted probability is 1 and it does not happen, then the Brier Score is 1, the worst score achievable.
  • If the predicted probability is 0.8 and it happens, then the Brier Score is (0.8-1)^2 =0.04.
  • If the predicted probability is 0.2 and it happens, then the Brier Score is (0.2-1)^2 =0.64.
  • If the predicted probability is 0.5, then the Brier Score is (0.5-1)^2 = 0.25, regardless of whether the event happens.
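The cases above can be reproduced with a short sketch (plain Python; the batch of predictions at the end is hypothetical):

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probability and actual outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(predicted_probs, outcomes)) / len(outcomes)

# Single-prediction cases from the bullets above
print(brier_score([1.0], [1]))   # 0.0  - best possible
print(brier_score([1.0], [0]))   # 1.0  - worst possible
print(brier_score([0.8], [1]))   # ~0.04
print(brier_score([0.2], [1]))   # ~0.64

# A small hypothetical batch of predictions
probs = [0.9, 0.7, 0.3, 0.1]
actual = [1, 1, 0, 0]
print(round(brier_score(probs, actual), 3))   # 0.05
```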

By specifying the FITSTAT option on the SCORE statement in PROC LOGISTIC, SAS returns the Brier score along with other fit statistics such as AUC, AIC and BIC.

proc logistic data=train;
   model y(event="1") = entry;              /* fit the model on the training data */
   score data=valid out=valpred fitstat;    /* score validation data and request fit statistics */
run;

A complete assessment of model performance should take both discrimination and calibration into consideration. In practice, discrimination is often considered more important than calibration.

SAS Macro : Best Model Selection
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

12 Responses to "How to Check Performance of a Predictive Model"
  1. Awesome work man :)....great site keep it up...please add arima also :)

    1. Thank you for your appreciation. Check out the series of ARIMA articles -
      http://www.listendata.com/search/label/Time%20Series

  2. In the above table of Hosmer-Lemeshow you were supposed to create 10 deciles but I can see only 8.

    1. I have also noticed 8 deciles instead of 10 deciles

  3. Thank you for putting this site together. Your explanations are so clear and straight to the point; very helpful.

  4. Cannot open the macro file..is it password protected?

  6. Hi, thanks for the post. The file "SAS Macro : Best Model Selection" requires a password. Whats the password ?

  7. Whats the password for the excel macro? if you are not providing the password then why you upload and make visible?

  8. Whats the Password for Macro file

  9. Hi, kindly provide us with the macro passwords fro the excel file
