Weight of Evidence and Information Value for Continuous Dependent Variable

In this post, we cover how to use Weight of Evidence (WOE) and Information Value (IV) when the dependent variable is continuous.

Information Value (IV) measures the predictive power of independent variables. It is used as a variable selection technique when the dependent variable is binary, that is, when it takes only two values. The technique is very popular in banking (risk) analytics, for example when building credit scoring or customer attrition models.

Weight of Evidence (WOE) is used to transform independent variables so that they have a linear relationship with the log odds (as in logistic regression). WOE also handles categorical variables directly, which avoids the need to create dummy variables.

Can I use WOE and IV when the target variable is continuous?

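Yes, if you modify the original WOE and IV formulas:
Modified WOE = ln(%Y / %Obs)
Modified IV = ∑((%Y - %Obs) * Modified WOE)
Here %Y is a bucket's share of the total target value and %Obs is its share of the observations. The steps are as follows: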
  1. Split the continuous independent variable (x) into 10 or 20 buckets (call this variable 'rank'). A categorical independent variable needs no splitting, as it is already categorized.
  2. Calculate the min and max of x by rank. Compute the sum of the target variable (y) by rank; call it 'SumY'.
  3. Calculate the total count and the % of observations falling in each bucket of the rank variable.
  4. Calculate %Y as SumY / ∑SumY.
  5. WOE = ln(%Y / %Obs), where %Obs is the percentage of observations (calculated in step 3).
  6. IV = ∑((%Y - %Obs) * WOE)
A positive WOE means that the bucket's share of the target variable is higher than its share of the total observations.
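The six steps can be sketched in a few lines of Python. This is only an illustrative sketch, not the workbook's implementation: the function name `modified_woe_iv`, its parameters, and the position-based bucketing (which may split tied x values across buckets) are assumptions made for the example. It also assumes that the sum of y is positive in every bucket, because ln(%Y / %Obs) is undefined otherwise.

```python
import math

def modified_woe_iv(x, y, n_buckets=10):
    """Return (per-bucket rows, total IV) using the modified WOE and IV.

    Assumption: the sum of y is positive in every bucket, since
    ln(%Y / %Obs) is undefined otherwise.
    """
    pairs = sorted(zip(x, y))            # order observations by x
    n = len(pairs)
    total_y = sum(v for _, v in pairs)   # ∑SumY
    rows, iv = [], 0.0
    for rank in range(n_buckets):
        # Step 1: equal-sized buckets by position in the sorted data
        start = rank * n // n_buckets
        end = (rank + 1) * n // n_buckets
        bucket = pairs[start:end]
        if not bucket:
            continue
        # Step 2: min and max of x, and SumY, by rank
        sum_y = sum(v for _, v in bucket)
        # Step 3: share of observations in the bucket
        pct_obs = len(bucket) / n
        # Step 4: share of the target in the bucket
        pct_y = sum_y / total_y
        # Step 5: modified WOE
        woe = math.log(pct_y / pct_obs)
        # Step 6: accumulate the bucket's contribution to IV
        iv += (pct_y - pct_obs) * woe
        rows.append({
            "rank": rank,
            "min_x": bucket[0][0],
            "max_x": bucket[-1][0],
            "count": len(bucket),
            "sum_y": sum_y,
            "pct_obs": pct_obs,
            "pct_y": pct_y,
            "woe": woe,
        })
    return rows, iv

# Toy data: y rises with x, so low buckets get negative WOE, high buckets positive
rows, iv = modified_woe_iv([1, 2, 3, 4, 5, 6, 7, 8],
                           [10, 20, 30, 40, 50, 60, 70, 80],
                           n_buckets=4)
for row in rows:
    print(row["rank"], row["min_x"], row["max_x"], round(row["woe"], 4))
print("IV:", round(iv, 4))
```

If y were identical for every observation, each bucket would have %Y equal to %Obs, every WOE would be 0, and the IV would be 0.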
When are these modified WOE and IV useful?
Suppose you are building a linear regression model to predict house prices. You can use these techniques for variable transformation and selection. In credit risk, they can be applied to LGD (loss given default) and EAD (exposure at default) modeling.
How to select variables based on IV
Information Value | Variable Predictiveness
Less than 0.02 | Not useful for prediction
0.02 to 0.1 | Weak predictive power
0.1 to 0.3 | Medium predictive power
0.3 to 0.5 | Strong predictive power
Greater than 0.5 | Suspicious predictive power
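As a rough sketch, the bands above can be turned into a small lookup. The function name `iv_strength` is hypothetical, and because the table does not say which band the shared boundary values (0.1, 0.3, 0.5) belong to, the choice made here (0.1 and 0.3 go to the higher band, 0.5 stays "Strong") is an assumption.

```python
def iv_strength(iv):
    """Map an Information Value to the predictiveness label from the table.

    Assumption: boundary values 0.1 and 0.3 fall in the higher band,
    and exactly 0.5 is still treated as "Strong".
    """
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"
    return "Suspicious predictive power"

print(iv_strength(0.25))
```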
Important Point
Outliers can distort %Y, because extreme values inflate SumY. Hence outliers should be treated before computing WOE and IV. This is not an issue with the original WOE and IV, where the percentages of bads and goods are aggregations of 0s and 1s.
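The post does not prescribe a particular outlier treatment. One common option, shown here purely as an assumption, is to cap the target at chosen percentiles before computing SumY. The function name `cap_outliers`, the 1st and 99th percentile defaults, and the simple nearest-rank percentile rule are all illustrative choices.

```python
def cap_outliers(values, lower_pct=0.01, upper_pct=0.99):
    """Cap values at the given lower and upper percentiles.

    Assumption: percentiles are taken by nearest rank on the sorted
    data; other percentile definitions would give slightly different caps.
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[round(lower_pct * (n - 1))]
    high = ordered[round(upper_pct * (n - 1))]
    # Pull anything outside [low, high] back to the nearest cap
    return [min(max(v, low), high) for v in values]

# One extreme value would otherwise dominate SumY for its bucket
y = list(range(1, 101)) + [10000]
capped = cap_outliers(y)
print(max(y), max(capped))
```

Capping keeps every observation in its bucket, so %Obs is unchanged and only %Y is affected.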
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in data science. During his tenure, he has worked with global clients in various domains like Banking, Insurance, Telecom and Human Resource.
