Understand Gain and Lift Charts

Deepanshu Bhalla
Gain and lift charts are used to evaluate the performance of a classification model. They measure how much better one can expect to do with the predictive model compared with not using a model. They are very popular metrics in marketing analytics, but they are not restricted to marketing: they can be used in other domains as well, such as risk modeling and supply chain analytics. They also help to find the best predictive model among multiple challenger models. In this tutorial, we will see how gain and lift metrics are calculated, along with their interpretation.

Gain / Lift Analysis
  1. Randomly split the data into two samples: 70% = training sample, 30% = validation sample.
  2. Score the validation sample (compute predicted probabilities) using the response model under consideration.
  3. Rank the scored file in descending order by estimated probability.
  4. Split the ranked file into 10 sections (deciles).
  5. Count the number of observations in each decile.
  6. Count the number of actual events in each decile.
  7. Calculate the cumulative number of actual events up to each decile.
  8. Calculate the percentage of cumulative actual events up to each decile. This is called the gain score.
  9. Divide the gain score by the cumulative % of data covered up to that decile. For example, at the second decile (top 20% of the file), divide the gain score by 20. The result is the cumulative lift.
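Steps 3 to 9 can be sketched in plain Python as follows. This is a minimal illustration, not the author's implementation: it assumes steps 1 and 2 (the split and the scoring) are already done, so it starts from a list of actual outcomes (1 = event, 0 = non-event) and a list of predicted probabilities for the validation sample. The function name `gain_lift_table` and the sample data are hypothetical. Ties in the score keep their input order, and when the record count is not a multiple of 10 the bin boundaries are rounded, so bins may differ in size by one record.

```python
from typing import Dict, List, Sequence


def gain_lift_table(y_true: Sequence[int], y_score: Sequence[float],
                    n_bins: int = 10) -> List[Dict[str, float]]:
    """Build a gain/lift table (one row per bin) for a scored validation sample.

    y_true:  actual outcome for each record (1 = event, 0 = non-event)
    y_score: predicted probability of the event for each record
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n, total_events = len(y_true), sum(y_true)
    if n < n_bins or total_events == 0:
        raise ValueError("need at least n_bins records and at least one event")

    # Step 3: rank records by predicted probability, highest first.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    rows: List[Dict[str, float]] = []
    cum_events, start = 0, 0
    for i in range(1, n_bins + 1):
        # Step 4: cut the ranked file into n_bins (roughly) equal sections.
        end = round(i * n / n_bins)
        section = ranked[start:end]
        events = sum(label for _, label in section)      # step 6
        cum_events += events                             # step 7
        gain = 100.0 * cum_events / total_events         # step 8: gain score (%)
        pct_of_file = 100.0 * end / n                    # cumulative % of records
        rows.append({
            "decile": i,
            "observations": len(section),                # step 5
            "events": events,
            "cum_events": cum_events,
            "gain": gain,
            "cum_lift": gain / pct_of_file,              # step 9
        })
        start = end
    return rows


# Hypothetical validation sample: 20 scored records with 6 events, in no particular order.
scores = [0.05, 0.30, 0.95, 0.60, 0.45, 0.85, 0.10, 0.75, 0.50, 0.90,
          0.20, 0.65, 0.40, 0.01, 0.55, 0.25, 0.80, 0.15, 0.35, 0.70]
actuals = [0, 0, 1, 1, 1, 1, 0, 1, 0, 1,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for row in gain_lift_table(actuals, scores):
    print(f"{row['decile']:>2}  n={row['observations']}  events={row['events']}  "
          f"cum={row['cum_events']}  gain={row['gain']:5.1f}%  cum_lift={row['cum_lift']:.2f}")
```

With this toy data, the top two deciles (4 records) hold 3 of the 6 events, so the gain at the second decile is 50% and the cumulative lift is 50 / 20 = 2.5. At the last decile the gain is always 100% and the cumulative lift is always 1.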
Gain and Lift Table

Gain

Gain at a given decile level is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire data set.
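As a small sketch of this definition, the snippet below turns per-decile event counts into gain percentages using a running total. The function name and the event counts are hypothetical, chosen so that the second decile reaches the 80% gain used in the interpretation that follows.

```python
from itertools import accumulate
from typing import List, Sequence


def gains_from_decile_events(events_per_decile: Sequence[int]) -> List[float]:
    """Gain (%) at each decile = cumulative events up to that decile / total events."""
    total = sum(events_per_decile)
    if total == 0:
        raise ValueError("at least one event is required")
    # accumulate() yields the running total of events: 50, 80, 88, ...
    return [100.0 * cum / total for cum in accumulate(events_per_decile)]


# Hypothetical event counts for deciles 1..10 (100 events in total).
events = [50, 30, 8, 5, 3, 2, 1, 1, 0, 0]
print(gains_from_decile_events(events))
```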

Interpretation: 

The gain is the % of targets (events) covered at a given decile level. For example, a gain of 80% at the second decile means that 80% of the targets are covered in the top 20% of the data, ranked by the model. In the case of a propensity-to-buy model, we can say that we can identify and target 80% of the customers who are likely to buy the product by sending an email to just 20% of total customers.
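Read in the other direction, a gain curve answers "how much of the file do I need to contact to reach a given share of the buyers?" The sketch below assumes equal-sized deciles and uses a hypothetical gain curve and a hypothetical base of 10,000 customers; `deciles_needed` is an illustrative helper, not part of any library.

```python
from typing import Optional, Sequence


def deciles_needed(gains_pct: Sequence[float], target_pct: float) -> Optional[int]:
    """Smallest number of top deciles whose gain reaches target_pct.

    Returns None if target_pct exceeds every gain value.
    """
    for decile, gain in enumerate(gains_pct, start=1):
        if gain >= target_pct:
            return decile
    return None


# Hypothetical gain curve (%) for a propensity-to-buy model, deciles 1..10.
gains = [50.0, 80.0, 88.0, 93.0, 96.0, 98.0, 99.0, 100.0, 100.0, 100.0]
customers = 10_000  # hypothetical customer base

k = deciles_needed(gains, 80.0)
emails = customers * k // 10  # each decile holds 10% of the customers
print(f"Email the top {k * 10}% ({emails} customers) to reach 80% of buyers")
```

Here the second decile is the first to reach 80%, so 2,000 emails (20% of customers) are enough under these assumed numbers.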


Lift

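It measures how much better one can expect to do with the predictive model compared with not using a model. It is the ratio of the gain % to the random expectation % at a given decile level. The random expectation at the xth decile is 10x% (for example, 20% at the second decile), because randomly selecting 10x% of the file is expected to capture 10x% of the targets.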
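A minimal sketch of the lift calculation, assuming 10 equal-sized deciles. `cumulative_lift` divides each gain % by the random expectation 10x%. For comparison, `decile_lift` shows the non-cumulative variant that is also commonly plotted: the share of all events falling inside a single decile, divided by the 10% a random decile would be expected to hold. Both function names and the input numbers are hypothetical.

```python
from typing import List, Sequence


def cumulative_lift(gains_pct: Sequence[float]) -> List[float]:
    """Cumulative lift per decile: gain % / random expectation %.

    Assumes 10 equal-sized deciles, so the random expectation at decile x is 10*x %.
    """
    return [gain / (10.0 * x) for x, gain in enumerate(gains_pct, start=1)]


def decile_lift(events_per_decile: Sequence[int]) -> List[float]:
    """Non-cumulative lift: % of all events falling in each decile / 10 %."""
    total = sum(events_per_decile)
    return [(100.0 * e / total) / 10.0 for e in events_per_decile]


# Hypothetical event counts per decile (100 events in total) and the matching gains.
events = [50, 30, 8, 5, 3, 2, 1, 1, 0, 0]
gains = [50.0, 80.0, 88.0, 93.0, 96.0, 98.0, 99.0, 100.0, 100.0, 100.0]
print([round(v, 2) for v in cumulative_lift(gains)])
print([round(v, 2) for v in decile_lift(events)])
```

The cumulative lift always ends at 1 at the tenth decile (100% gain / 100% of the file), while the per-decile lifts average to 1 across the ten deciles.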

Interpretation: 

A cumulative lift of 4.03 for the top two deciles means that when selecting 20% of the records based on the model, one can expect to find 4.03 times as many targets (events) as by randomly selecting 20% of the file without a model.
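To make this concrete, the arithmetic can be sketched as below. The lift (4.03) and the selected share (20%) come from the interpretation above; the total of 1,000 targets is a hypothetical figure, and `lift_to_counts` is an illustrative helper. Random selection of 20% of the file is expected to capture 20% of the targets (200), the model captures 4.03 times that (about 806), and the implied gain is 4.03 × 20% = 80.6%.

```python
def lift_to_counts(cum_lift: float, pct_selected: float, total_targets: int):
    """Translate a cumulative lift into expected target counts.

    Random selection of pct_selected % of the file is expected to capture
    pct_selected % of all targets; the model captures cum_lift times that.
    """
    random_hits = total_targets * pct_selected / 100.0
    model_hits = cum_lift * random_hits
    implied_gain_pct = cum_lift * pct_selected  # gain % = lift x random expectation %
    return model_hits, random_hits, implied_gain_pct


# Lift and selection share from the text; total_targets is hypothetical.
model_hits, random_hits, gain = lift_to_counts(4.03, 20.0, 1_000)
print(f"model: ~{model_hits:.0f} targets, random: ~{random_hits:.0f}, gain: {gain:.1f}%")
```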

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

14 Responses to "Understand Gain and Lift Charts"
  1. Very helpful, thank you. However, I disagree with your interpretation. It looks like the cumulative lift for the first two deciles is actually 4.03x the random data, not 2.35x, correct? Or am I misinterpreting something?

    1. Thanks for pointing it out. I fixed it. The interpretation was correct, but it was not in sync with the lift chart shown above.

  2. Excellent overview. Thank you.

  3. Thank you Deepanshu for the detailed explanation.

  4. Very well summarized. Thank you.

  5. Thanks for the post. Can you please also share how we can plot this in SAS?

  6. Thank you, Deepanshu. Where would I be without you...

  7. What is lift vs. cumulative lift?

  8. Thanks, Deepanshu. Your article helped me understand gain and lift charts clearly.

  9. Is either of them multiplicative, or are both additive?

  10. Thank you so much. It is really helpful.

  11. This is really helpful; however, I have a doubt. Please correct me if I am wrong.

    The random model's positive expectation at the xth decile should be calculated as x*10%, right?
    Ex: at the 2nd decile, the random expectation should be 20% of total positive responses.

    Please pardon me if this doesn't seem correct.
