Information Value (IV) measures the predictive power of independent variables. It is used as a variable selection technique when the dependent variable is binary, that is, when it takes only two values. This technique is very popular in banking (risk) analytics for building credit scoring or customer attrition models.
Weight of Evidence (WOE) is a transformation of independent variables that builds a strictly linear relationship with the log odds (in the case of logistic regression). WOE also handles categorical variables directly, which avoids the need to create dummy variables for them.
Can I use WOE and IV when the target variable is continuous?
The answer is yes, if you modify the original WOE and IV formulas:
Modified WOE = ln(%Y / %Obs)
Modified IV = ∑((%Y - %Obs) * Modified WOE)
Steps to calculate the modified WOE and IV:
- Split the continuous independent variable (x) into 10 or 20 buckets and store the bucket number in a variable called 'rank'. A categorical independent variable needs no splitting because it is already categorized.
- Calculate the min and max of x by rank. Compute the sum of the target variable (y) by rank and name it 'SumY'.
- Calculate the total count and the % of observations falling in each bucket of the rank variable (%Obs).
- Calculate %Y as SumY / ∑SumY.
- WOE = ln(%Y / %Obs), where %Obs is the percentage of observations calculated in the third step.
- IV = ∑((%Y - %Obs) * WOE)
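The steps above can be sketched in Python with pandas. This is only a minimal illustration, not a reference implementation: the function name `continuous_woe_iv`, the column names, and the use of quantile-based buckets via `pd.qcut` are assumptions made here. It also assumes that the sum of y is positive in every bucket, since ln(%Y / %Obs) is undefined otherwise.

```python
import numpy as np
import pandas as pd


def continuous_woe_iv(x, y, n_buckets=10):
    """Modified WOE and IV for a continuous target (illustrative sketch).

    Assumes the sum of y is positive in every bucket, so the log is defined.
    Returns the per-bucket table and the total IV.
    """
    df = pd.DataFrame({"x": x, "y": y})

    # Split x into roughly equal-sized buckets ('rank').
    # Quantile bucketing is an assumption; any bucketing scheme could be used.
    df["rank"] = pd.qcut(df["x"], q=n_buckets, labels=False, duplicates="drop")

    # Min and max of x, sum of y (SumY), and observation count by rank.
    table = df.groupby("rank").agg(
        min_x=("x", "min"),
        max_x=("x", "max"),
        sum_y=("y", "sum"),
        n_obs=("y", "size"),
    )

    # %Obs: share of observations falling in each bucket.
    table["pct_obs"] = table["n_obs"] / table["n_obs"].sum()

    # %Y = SumY / sum of SumY over all buckets.
    table["pct_y"] = table["sum_y"] / table["sum_y"].sum()

    # WOE = ln(%Y / %Obs)
    table["woe"] = np.log(table["pct_y"] / table["pct_obs"])

    # IV = sum((%Y - %Obs) * WOE)
    table["iv_part"] = (table["pct_y"] - table["pct_obs"]) * table["woe"]
    return table, table["iv_part"].sum()


if __name__ == "__main__":
    # Synthetic data, purely for illustration.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 100, size=1000)
    y = 50 + 2 * x + rng.normal(0, 10, size=1000)
    table, iv = continuous_woe_iv(x, y, n_buckets=10)
    print(table)
    print("IV:", iv)
```

For a categorical independent variable, the `pd.qcut` line could be skipped and the variable itself used as 'rank'.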
Information Value | Variable Predictiveness |
---|---|
Less than 0.02 | Not useful for prediction |
0.02 to 0.1 | Weak predictive power |
0.1 to 0.3 | Medium predictive power |
0.3 to 0.5 | Strong predictive power |
Greater than 0.5 | Suspicious predictive power |
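
As a small illustration, the table can be turned into a lookup helper. The function name `describe_iv` is hypothetical, and the handling of the boundaries 0.1 and 0.3, which the table lists in two rows each, is an assumption here: they are assigned to the higher band.

```python
def describe_iv(iv):
    """Map an information value to the predictiveness label from the table.

    Assumption: the shared boundaries 0.1 and 0.3 fall into the higher band.
    """
    if iv < 0.02:
        return "Not useful for prediction"
    if iv < 0.1:
        return "Weak predictive power"
    if iv < 0.3:
        return "Medium predictive power"
    if iv <= 0.5:
        return "Strong predictive power"
    return "Suspicious predictive power"
```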
What are the treatments for a variable with an information value greater than 0.5?
Could you please post SAS code, similar to what you posted for the binary dependent variable?
Please share SAS code so that this is easier to understand.
I see people posting requests for SAS code. My guess is that there isn't any because PROC HPBIN can't do it natively. The algorithm itself is simple to implement in SAS, but I can't seem to make the automagic PROC HPBIN approach work. Has anyone succeeded?