Calculate Hosmer Lemeshow (HL) Test with Excel

Deepanshu Bhalla 3 Comments
Hosmer and Lemeshow Test (HL)

It measures the association between actual events and predicted probability. In other words, it is a measure of how close the predicted probabilities are to the actual rate of events.

In HL test, null hypothesis states that sample of observed events and non-events supports the claim about the predicted events and non-events. In other words, the model fits data well.

Steps to Calculate Hosmer Lemeshow Test
  1. You need to have two variables - Dependent Variable and Probability. Fit the binary classification model (like logistic regression) to your data and get the estimated probability for each observations.

    Raw Data
  2. Split data into 10 sections based on ascending order of probability.

    In Excel, let's say you have probability values in cells C3:C4692. Create a new variable called "Group" and then enter the following formula in cell D3 and then paste it down to the last observation.

    =ROUND(RANK(C3,$C$3:$C$4692)/COUNT($C$3:$C$4692)*10,0)+1
    Creating Groups
  3. Calculate the number of actual events and non-events in each group.

    In Excel, enter unique values of column D from cell F4.

    events and non-events in each group

    In cell G4, enter the following formula and then paste it down.

    =COUNTIFS($D$3:$D$4692,F4,$B$3:$B$4692,1)

    In cell H4, enter the following formula and then paste it down.

    =COUNTIFS($D$3:$D$4692,F4,$B$3:$B$4692,0)
  4. Calculate Predicted Probability = 1 by averaging probability in each group. In cell I4, enter the following formula and then paste it down.
    =AVERAGEIF($D$3:$D$4692,F4,$C$3:$C$4692)
  5. Calculate Predicted Probability = 0 by subtracting Predicted Probability=1 from 1. In cell J4, enter the following formula and then paste it down.
    =1-I4
    Calculate Predicted Probability
  6. Calculate expected frequency by multiplying number of cases by Predicted Probability = 1.

    In cell K4, enter the following formula and then paste it down.

    =I4*SUM(G4:H4)

    In cell L4, enter the following formula and then paste it down.

    =J4*SUM(G4:H4)
  7. Calculate chi-square statistics by taking frequency of observed (actual) and predicted events and non-events.

    (a) Calculate the HL Statistics : In cell M4, enter the following formula and then paste it down.

    =((G4-K4)^2/K4)+((H4-L4)^2/L4)

    (b) Chi-square is calculated by summing the HL Statistic using the formula =SUM(M4:M11).

    (c) P-value for the Chi-square is calculated using the following formula :

    =CHIDIST(SUM(M4:M11),COUNT(F4:F11)-2)
Calculate Hosmer Lemeshow (HL) Test with Excel
Hosmer Lemeshow Test
Rule : If p-value > .05, the model fits data well.
Related Posts
Spread the Word!
Share
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

Post Comment 3 Responses to "Calculate Hosmer Lemeshow (HL) Test with Excel"
  1. HI Deepanshu I was looking at the attached excel for HL in excel. In the column H, you do an average of all probability (irrespective of dependent variable =0 or 1) and then put that average value in Column I. I was wondering if that is the correct way or we need to filter only those probabilities where dependent variable =1 and then average those probabilities?

    ReplyDelete
  2. Thank you very much for posting such a wonderfull tool, it saves a lot of programming and accelerates the work however, I integrated it on my data set and I'm receiving two errors: I have 11 groups instead of 10 and there are negative values. I'm not sure if I missed something during the adaptation or if the problem arises of the wide probabilities range. Would it be possible for you to check my instalation? Thank you very much.

    ReplyDelete
Next → ← Prev