Population Stability Index and Characteristic Analysis

This tutorial describes the meaning and use of Population Stability Index and Characteristic Analysis.

In simple terms, the Population Stability Index (PSI) compares the distribution of a scoring variable (predicted probability) in a scoring data set with its distribution in the training data set that was used to develop the model. The idea is to check how the current scores compare with the predicted probabilities from the training data set.

Use of Population Stability Index (PSI)

PSI has several uses. They are listed below:

A model might be influenced by economic changes. Suppose you built a risk model during the economic recession of 2008 and you are still using it to score data sets in 2016. There is a high chance that various attributes of the model have changed drastically over those 8 years. If the features of the model have changed significantly, it no longer makes sense to use this model.
Product offerings may change due to internal policy changes. For example, if one of your products was relaunched recently, its attributes may behave differently from the attributes the model was built on.
PSI can detect data integration or programming issues in running the scoring code.

How is PSI calculated?

PSI = Σ (A - B) * ln(A / B), summed over all groups

where A is the % of records in a group of the scoring sample and B is the % of records in the same group of the training sample (both based on the scoring variable).

Steps to calculate PSI

1. Sort the scoring variable in descending order in the scoring sample
2. Split the data into 10 or 20 groups (deciling)
3. Calculate the % of records in each group based on the scoring sample
4. Calculate the % of records in each group based on the training sample
5. Calculate the difference between Step 3 and Step 4
6. Take the natural log of (Step 3 / Step 4)
7. Multiply Step 5 and Step 6
8. Sum the values from Step 7 across all groups to get the PSI
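
The steps above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the author's implementation: the function names (`group_edges`, `group_shares`, `psi_from_shares`, `psi`) and the small `eps` floor that avoids taking the log of zero for empty groups are illustrative choices. Cut points are taken from the scoring sample, following the first step; taking them from the training sample instead is also common.

```python
import math
from bisect import bisect_right


def group_edges(values, n_groups=10):
    """Interior cut points that split `values` into n_groups equal-sized groups.

    Sorting is ascending here; sorting in descending order gives the same groups.
    """
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n_groups] for i in range(1, n_groups)]


def group_shares(values, edges):
    """Share of records that falls into each group defined by `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi_from_shares(scoring_shares, training_shares, eps=1e-6):
    """Sum of (A - B) * ln(A / B) over groups; eps is an assumed floor for empty groups."""
    total = 0.0
    for a, b in zip(scoring_shares, training_shares):
        a, b = max(a, eps), max(b, eps)
        total += (a - b) * math.log(a / b)
    return total


def psi(scoring, training, n_groups=10):
    """PSI of the scoring sample against the training sample."""
    edges = group_edges(scoring, n_groups)
    return psi_from_shares(group_shares(scoring, edges), group_shares(training, edges))


# Hypothetical predicted probabilities, for illustration only.
training_scores = [i / 100 for i in range(100)]
scoring_scores = [i / 200 for i in range(100)]  # scores concentrated in the lower half
print(round(psi(scoring_scores, training_scores), 4))
```

Keeping `psi_from_shares` separate makes it possible to compute the index directly from a table of group percentages when the raw scores are not available.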

[Image: Population Stability Index]

Rules: Population Stability Index (PSI)

PSI < 0.1 - No change. You can continue using the existing model.
PSI >= 0.1 but less than 0.2 - Slight change is required.
PSI >= 0.2 - Significant change is required. Ideally, you should not use this model any more.
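
These rules can be encoded as a small helper, for example in a monitoring script. The function name `psi_band` and the returned labels are hypothetical; only the 0.1 and 0.2 cut-offs come from the rules above.

```python
def psi_band(psi_value):
    """Map a PSI value to the rule-of-thumb bands listed above."""
    if psi_value < 0.1:
        return "no change"
    if psi_value < 0.2:
        return "slight change"
    return "significant change"


# Hypothetical PSI values, for illustration only.
for value in (0.03, 0.15, 0.40):
    print(value, psi_band(value))
```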

To understand the cause of a change, we need to generate the characteristic analysis report.

Note: We can also use a chi-square test (for binned data) or a KS test to compare the distributions of two data sets.
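
As a rough illustration of these two alternatives, the sketch below computes only the test statistics, using the standard library. The function names are hypothetical, and the chi-square version assumes the training shares are treated as the expected distribution and that every training bin is non-empty. In practice, `scipy.stats.ks_2samp` and `scipy.stats.chisquare` also return p-values.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def chi_square_statistic(scoring_counts, training_counts):
    """Pearson chi-square statistic for binned data.

    Expected counts are the training shares scaled to the scoring sample size
    (an assumption of this sketch); training bins must not be empty.
    """
    n_scoring = sum(scoring_counts)
    n_training = sum(training_counts)
    statistic = 0.0
    for observed, training in zip(scoring_counts, training_counts):
        expected = n_scoring * training / n_training
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical samples and bin counts, for illustration only.
print(ks_statistic([0.1, 0.2, 0.3, 0.4], [0.3, 0.4, 0.5, 0.6]))
print(chi_square_statistic([15, 5], [10, 10]))
```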

Characteristic Analysis

It answers the question of which variable is causing a shift in the population distribution. It compares the distribution of an independent variable in the scoring data set with its distribution in the development data set, and so detects shifts over time in the distributions of input variables that are submitted for scoring.

It helps to determine which changing variable is most influential in causing the model score shift.
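
One way to do this is to compute the same index for every input variable and rank the variables by it; the per-variable index is often called a characteristic stability index (CSI). The sketch below assumes each data set is a dictionary mapping a variable name to a list of numeric values. The function names, the `eps` floor for empty groups, and the sample data are all hypothetical.

```python
import math
from bisect import bisect_right


def stability_index(scoring, development, n_groups=10, eps=1e-6):
    """PSI-style index for one variable; cut points come from the scoring sample."""
    ordered = sorted(scoring)
    edges = [ordered[(i * len(ordered)) // n_groups] for i in range(1, n_groups)]

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps is an assumed floor so that empty groups do not break the log
        return [max(c / len(values), eps) for c in counts]

    return sum((a - b) * math.log(a / b)
               for a, b in zip(shares(scoring), shares(development)))


def rank_shifted_variables(scoring_data, development_data):
    """Return (variable, index) pairs, largest shift first."""
    indices = {name: stability_index(scoring_data[name], development_data[name])
               for name in development_data}
    return sorted(indices.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: "income" has shifted upward, "age" has not.
development = {"income": list(range(100)), "age": list(range(100))}
scoring = {"income": list(range(50, 150)), "age": list(range(100))}
for name, index in rank_shifted_variables(scoring, development):
    print(name, round(index, 4))
```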

Most Important Point:

Check the direction of impact due to model variable shifts.

Check the signs of the shifted attributes and the average values of those attributes compared to those from the previously scored population or development sample. This will indicate whether the model attribute shifts are increasing or decreasing the model scores.
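
A minimal sketch of this check is shown below, assuming a linear (or logistic) score in which each attribute has a single coefficient. Under that assumption, the sign of coefficient times the change in the attribute's mean indicates whether the shift pushes scores up or down. The function name, coefficients, and data are hypothetical.

```python
from statistics import mean


def score_shift_contributions(coefficients, scoring_data, development_data):
    """Approximate effect of each attribute's mean shift on a linear score.

    Positive values push the score up, negative values push it down.
    """
    contributions = {}
    for name, coefficient in coefficients.items():
        mean_shift = mean(scoring_data[name]) - mean(development_data[name])
        contributions[name] = coefficient * mean_shift
    return contributions


# Hypothetical model coefficients and attribute values, for illustration only.
coefficients = {"income": -0.5, "age": 0.2}
development = {"income": [10, 20, 30], "age": [30, 40, 50]}
scoring = {"income": [20, 30, 40], "age": [30, 40, 50]}
print(score_shift_contributions(coefficients, scoring, development))
```

For non-linear models this product is not meaningful on its own; comparing average scores before and after the shift is a more direct check.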

About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.


15 Responses to "Population Stability Index and Characteristic Analysis"

Aditya Modak - April 26, 2017 at 2:40 AM
Thanks for this article!
Unknown - January 4, 2018 at 11:02 AM
Good article as well
Anonymous - February 9, 2018 at 4:21 PM
thumbs up
Anonymous - March 28, 2018 at 5:44 AM
Is there any rule for characteristic analysis like the ones mentioned for PSI?
Unknown - August 24, 2018 at 1:30 AM
Hello sir,
Very nicely explained. I am a big friend of ListenData. I have 2 questions:
1) If PSI > 0.25, what will be the next step? What changes do we make to bring it back to an acceptable range?

2) If VSI > 0.25, are we going to remove that variable?

Unknown - November 11, 2019 at 10:12 PM
Hello,
Why is PSI > 0.2 used here?
What is the value of 0.2 based on?
Anonymous - January 29, 2020 at 10:34 AM
Hi Deepanshu,

I really love your blog. Is it possible for you to share a Python script which calculates the population stability index (PSI) automatically?
Anonymous - June 17, 2020 at 4:34 AM
Is it possible to calculate CSI in Python?
Anonymous - June 25, 2020 at 1:58 PM
Can you share CSI code in Python?
Pavan kumar - July 27, 2020 at 11:22 PM
Is it possible to build PSI without training data?
Unknown - November 7, 2020 at 1:41 AM
So easy to understand and informative!
Suresh - November 20, 2020 at 9:05 AM
I was trying to use PSI on binary variables (0, 1), but it's not working as expected.
Below is the PSI on a binary variable (bin 1: value 1, bin 2: value 0). Bins 1 and 2 changed significantly, but the PSI value is very low.

bin    Expected   Actual    % Expected   % Actual    PSI
1      25,034     7,223     0.0321052    0.0127180   0.0179527
2      754,714    560,714   0.9678948    0.9872820   0.0003845
Total  779,748    567,937   1.0000000    1.0000000   0.0183372
Unknown - July 15, 2021 at 4:30 AM
Could you suggest how to find an optimal size of the data for which we want to measure PSI w.r.t. training set?