Weight of Evidence and Information Value for Continuous Dependent Variable

Deepanshu Bhalla
In this post, we will cover how you can use Weight of Evidence (WOE) and Information Value (IV) when the dependent variable is continuous.

Information Value (IV) measures the predictive power of independent variables. It is used as a variable selection technique when the dependent variable is binary, that is, when it takes only two values. The technique is very popular in banking (risk) analytics for building credit scoring or customer attrition models.

Weight of Evidence (WOE) is a transformation of independent variables that establishes a strictly linear relationship with the log odds (in the case of logistic regression). WOE also handles categorical variables directly, which avoids the need to create dummy variables.
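
For reference, the binary-target definitions that the rest of this post modifies are commonly written as below. The good/bad labels and the orientation of the ratio are a convention assumed here (some texts invert it); the post itself does not spell them out. The index i runs over the buckets of the independent variable.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{\%\text{Good}_i}{\%\text{Bad}_i}\right),
\qquad
\mathrm{IV} = \sum_i \left(\%\text{Good}_i - \%\text{Bad}_i\right)\,\mathrm{WOE}_i
```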

Can I use WOE and IV when the target variable is continuous?

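Yes, if you modify the original WOE and IV formulas:
Modified WOE = ln(%Y / %Obs)
Modified IV = ∑((%Y - %Obs) * Modified WOE)
Here %Y is a bucket's share of the total of the target variable and %Obs is its share of the observations. The steps are: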
  1. Split the continuous independent variable (x) into 10 or 20 buckets (call this variable 'rank'). A categorical independent variable does not need to be split, because it is already categorized.
  2. Calculate the min and max of x by rank. Compute the sum of the target variable (y) by rank. Let’s name it ‘SumY’.
  3. Calculate the count and the % of observations falling in each bucket of the rank variable.
  4. Calculate %Y as SumY / ∑SumY.
  5. WOE = ln(%Y / %Obs), where %Obs is the percentage of observations calculated in step 3. The logarithm requires the ratio to be positive, which holds when y takes positive values (as with prices or loss amounts).
  6. IV = ∑((%Y - %Obs) * WOE)
A positive WOE means that the bucket's share of the target variable total is higher than its share of the total observations.
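
The six steps can be sketched in Python with pandas. This is a minimal illustration, not the author's implementation: the function name `modified_woe_iv`, the column names, the use of `pd.qcut` for equal-frequency bucketing, and the simulated data are all assumptions made for this example. It also assumes y is positive so that the logarithm is defined.

```python
import numpy as np
import pandas as pd


def modified_woe_iv(x, y, n_buckets=10):
    """Modified WOE table and IV for a continuous target (hypothetical helper).

    Assumes y is positive so that every bucket has SumY > 0.
    """
    df = pd.DataFrame({"x": np.asarray(x, dtype=float),
                       "y": np.asarray(y, dtype=float)})
    # Step 1: split x into roughly equal-sized buckets ('rank').
    df["rank"] = pd.qcut(df["x"], q=n_buckets, labels=False, duplicates="drop")
    # Steps 2-3: min/max of x, sum of y (SumY) and count per bucket.
    table = df.groupby("rank").agg(
        min_x=("x", "min"),
        max_x=("x", "max"),
        sum_y=("y", "sum"),
        n_obs=("x", "count"),
    )
    table["pct_obs"] = table["n_obs"] / table["n_obs"].sum()
    # Step 4: %Y = SumY / total SumY.
    table["pct_y"] = table["sum_y"] / table["sum_y"].sum()
    # Step 5: WOE = ln(%Y / %Obs).
    table["woe"] = np.log(table["pct_y"] / table["pct_obs"])
    # Step 6: IV = sum over buckets of (%Y - %Obs) * WOE.
    table["iv_part"] = (table["pct_y"] - table["pct_obs"]) * table["woe"]
    return table, float(table["iv_part"].sum())


# Simulated data (made up for illustration): y rises with x and stays positive.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=1000)
y = 50 + 2 * x + rng.normal(0, 5, size=1000)

table, iv = modified_woe_iv(x, y)
print(table[["min_x", "max_x", "pct_obs", "pct_y", "woe"]])
print("IV:", round(iv, 4))
```

Because y increases with x in the simulated data, the low buckets get a negative WOE and the high buckets a positive one, which matches the interpretation of the sign given above.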
When are these modified WOE and IV useful?
Suppose you are building a linear regression model to predict house prices. You can use these techniques for variable transformation and selection. In credit risk, they can be used in LGD (loss given default) and EAD (exposure at default) modeling.
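
As a rough sketch of the transformation use case, the snippet below replaces a predictor with its bucket-level modified WOE and fits a one-variable least-squares line. The column names (`sqft`, `price`), the simulated figures, and the choice of `np.polyfit` for the fit are assumptions for illustration only; none of them come from the post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical data: house size and price; all values are made up.
df = pd.DataFrame({"sqft": rng.uniform(500, 4000, size=2000)})
df["price"] = 50_000 + 120 * df["sqft"] + rng.normal(0, 20_000, size=2000)

# Bucket the predictor and compute the modified WOE per bucket.
df["rank"] = pd.qcut(df["sqft"], q=10, labels=False)
pct_obs = df.groupby("rank")["sqft"].count() / len(df)
pct_y = df.groupby("rank")["price"].sum() / df["price"].sum()
woe = np.log(pct_y / pct_obs)

# Transformation: replace each raw value by the WOE of its bucket.
df["sqft_woe"] = df["rank"].map(woe)

# Fit price ~ sqft_woe by ordinary least squares (degree-1 polynomial).
slope, intercept = np.polyfit(df["sqft_woe"], df["price"], deg=1)
print("slope:", slope, "intercept:", intercept)
```

The same mapping from bucket to WOE would be applied to new data before scoring, so the bucket boundaries need to be stored along with the model.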
How to select variables based on IV

Information Value    Variable Predictiveness
Less than 0.02       Not useful for prediction
0.02 to 0.1          Weak predictive power
0.1 to 0.3           Medium predictive power
0.3 to 0.5           Strong predictive power
Greater than 0.5     Suspicious predictive power
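
The thresholds can be applied programmatically, for example as below. The function name `iv_strength`, the variable names, and the IV values are hypothetical. The table does not say which band the boundary values 0.1 and 0.3 belong to, so assigning them to the higher band is an assumption of this sketch.

```python
def iv_strength(iv):
    """Map an information value to the predictiveness label in the table."""
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"
    return "Suspicious predictive power"


# Hypothetical IVs for three candidate variables (made-up numbers).
ivs = {"sqft": 0.35, "door_color": 0.01, "listing_price": 0.80}

for name, value in ivs.items():
    print(name, value, iv_strength(value))

# Keep variables that are neither useless nor suspiciously strong.
selected = [name for name, value in ivs.items() if 0.02 <= value <= 0.5]
print(selected)
```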
Important Point
Outliers can distort %Y, because extreme values of y inflate SumY in the bucket where they fall. Hence outliers should be treated before computing WOE and IV. This is not an issue with the original WOE and IV, where the percentages of bads and goods are aggregations of 0s and 1s.
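
The post does not prescribe a particular outlier treatment. One common option, shown here purely as an assumption, is to cap the target at chosen percentiles before computing SumY. The helper name `cap_outliers`, the 1st/99th percentile defaults, and the simulated data are illustrative.

```python
import numpy as np


def cap_outliers(y, lower_pct=1.0, upper_pct=99.0):
    """Cap values of y at the given percentiles (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    lo, hi = np.percentile(y, [lower_pct, upper_pct])
    return np.clip(y, lo, hi)


# Simulated positive target with one extreme value (made-up data).
rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=100.0, size=1000)
y[0] = 1_000_000.0

capped = cap_outliers(y)
# The single extreme value dominates the raw sum but not the capped sum.
print("share of total from largest value, raw:   ", y.max() / y.sum())
print("share of total from largest value, capped:", capped.max() / capped.sum())
```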
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

4 Responses to "Weight of Evidence and Information Value for Continuous Dependent Variable"
  1. What are the treatments for a variable with an information value of more than 0.5?

  2. Could you please post SAS code, similar to what you have posted for binary dependent variable?

  3. Please share SAS code so that this is easier to understand.

  4. I see people posting requests for SAS code. My guess is that there isn't any because PROC HPBIN can't do it natively. The algorithm itself is simple to implement in SAS, but I can't seem to make the automagic PROC HPBIN approach work. Has anyone succeeded?
