Gain and lift charts are used to evaluate the performance of a classification model. They measure how much better one can expect to do with the predictive model than without one. They are very popular in marketing analytics, but they are not restricted to it: they can be used in other domains as well, such as risk modeling and supply chain analytics. They also help to find the best predictive model among multiple challenger models. In this tutorial, we will see how the gain and lift metrics are calculated, along with their interpretation.
Gain / Lift Analysis
- Randomly split data into two samples: 70% = training sample, 30% = validation sample.
- Score the validation sample (compute the predicted probability of the event) using the response model under consideration.
- Rank the scored file in descending order of estimated probability.
- Split the ranked file into 10 equal sections (deciles).
- Count the number of observations in each decile.
- Count the number of actual events in each decile.
- Compute the cumulative number of actual events up to and including each decile.
- Express the cumulative number of actual events as a percentage of all events in the validation sample. This is called the gain score.
- Divide the gain score by the cumulative percentage of data covered up to that decile to get the cumulative lift. For example, at the second decile, divide the gain score by 20.
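The steps above can be sketched in Python with pandas. This is only an illustrative sketch, not the tutorial's own code: the function name `gain_lift_table`, the column names, and the synthetic data in the usage example are all assumptions made for the example. Deciles are assigned by row position after sorting, so tied scores may be split across neighbouring deciles.

```python
import numpy as np
import pandas as pd


def gain_lift_table(y_true, y_score, n_bins=10):
    """Build a decile-level gain and cumulative lift table (hypothetical helper)."""
    df = pd.DataFrame({"actual": np.asarray(y_true), "score": np.asarray(y_score)})
    # Rank by predicted probability, highest first.
    df = df.sort_values("score", ascending=False, kind="mergesort").reset_index(drop=True)
    # Assign deciles 1..n_bins by row position so the bins are as equal as possible.
    df["decile"] = (np.arange(len(df)) * n_bins // len(df)) + 1

    # Observations and actual events per decile.
    table = df.groupby("decile").agg(
        observations=("actual", "size"),
        events=("actual", "sum"),
    )
    # Cumulative events, then gain = cumulative events as a % of all events.
    table["cum_events"] = table["events"].cumsum()
    table["gain_pct"] = 100.0 * table["cum_events"] / table["events"].sum()
    # Cumulative % of data covered, then cumulative lift = gain / % of data.
    table["cum_data_pct"] = 100.0 * table["observations"].cumsum() / len(df)
    table["cum_lift"] = table["gain_pct"] / table["cum_data_pct"]
    return table


# Usage on synthetic data (made up purely for illustration).
rng = np.random.default_rng(0)
scores = rng.random(1000)
actual = (rng.random(1000) < scores ** 3).astype(int)
print(gain_lift_table(actual, scores).round(2))
```

In the last decile the gain is always 100% and the cumulative lift is always 1, since all of the data has been selected by then. This makes a convenient sanity check on any table built this way.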
Gain at a given decile level is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire data set.
Interpretation:
Gain is the percentage of targets (events) covered at a given decile level. For example, a gain of 80% at the second decile means that 80% of targets are covered in the top 20% of the data as ranked by the model. For a propensity-to-buy model, this means we can reach 80% of the customers who are likely to buy the product by sending email to just 20% of all customers.
Lift
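It measures how much better one can expect to do with the predictive model than without a model. It is the ratio of gain % to the random expectation % at a given decile level. The random expectation at the xth decile is 10x% (for example, 20% at the second decile), because a random selection of that share of the data is expected to capture the same share of the events.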
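Written as formulas, the two definitions look as follows. The notation is introduced here only for illustration and is not from the original: $E_k$ is the cumulative number of events through decile $k$, $E$ is the total number of events, and the ten deciles are assumed to be of equal size.

```latex
\[
\text{Gain}_k = 100 \times \frac{E_k}{E},
\qquad
\text{Cumulative Lift}_k = \frac{\text{Gain}_k}{10\,k},
\qquad k = 1, \dots, 10 .
\]
```

At $k = 10$ the gain is 100 and the cumulative lift is 1, since selecting the whole file is no better than random selection.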
Interpretation:
A cumulative lift of 4.03 for the top two deciles means that when selecting 20% of the records based on the model, one can expect 4.03 times as many targets (events) as would be found by randomly selecting 20% of the file without a model.
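To make the arithmetic concrete, here is a small worked sketch. The total of 500 events is a hypothetical figure chosen for illustration. Only the lift of 4.03 and the 20% selection come from the text above.

```python
total_events = 500        # hypothetical: events in the whole validation file
fraction_selected = 0.20  # top two deciles
cum_lift = 4.03           # cumulative lift quoted in the text

# A random 20% of the file is expected to contain 20% of the events.
random_events = total_events * fraction_selected
# The model-ranked top 20% is expected to contain cum_lift times as many.
model_events = cum_lift * random_events
# Equivalent gain: share of all events captured by the top 20%.
gain_pct = 100 * model_events / total_events

print(f"random: {random_events:.0f} events, model: {model_events:.0f} events, "
      f"gain: {gain_pct:.1f}%")
```

Under these assumed numbers, random selection finds about 100 events while the model finds about 403, which corresponds to a gain of roughly 80.6% at the second decile.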
Very helpful, thank you. But I disagree with your interpretation, however. It looks like the cumulative lift for the first two deciles is actually 4.03x the random data, and not 2.35x, correct? Or am I misinterpreting something?
Thanks for pointing it out. I fixed it. Interpretation was correct but it was not in sync with the lift chart shown above.
Excellent overview. Thank you.
Thank you Deepanshu for the detailed explanation.
Thanks a lot Deepanshu.
Very well summarized. Thank you.
Thanks for the post. Can you please also share how we can plot this in SAS?
Thank you Deepanshu, where would I be without you ...
Haha. You are welcome Aveek :-)
What is lift vs cumulative lift?
Thanks Deepanshu. Your article helped me to understand gain and lift charts clearly.