Weight of Evidence (WOE) and Information Value Explained

Logistic regression is one of the most commonly used statistical techniques for solving binary classification problems, and it is applied in almost every domain. Two concepts, weight of evidence (WOE) and information value (IV), evolved from the same logistic regression technique. Both terms have been in use in the credit scoring world for more than 4-5 decades. They have been used as a benchmark to screen variables in credit risk modeling projects such as probability of default models, and they help to explore data and screen variables. They are also used in marketing analytics projects such as customer attrition models and campaign response models.

Weight of Evidence (WOE)

The weight of evidence measures the predictive power of an independent variable in relation to the dependent variable.

Since it evolved from the credit scoring world, it is generally described as a measure of the separation of good and bad customers. "Bad customers" are customers who defaulted on a loan, and "good customers" are customers who paid back their loan.
In credit scoring terms, WOE is calculated from two distributions:
Distribution of Goods - % of all Good Customers that fall in a particular group
Distribution of Bads - % of all Bad Customers that fall in a particular group
Many people are not familiar with the terms goods/bads because they come from a background other than credit risk, so it helps to understand WOE in terms of events and non-events. WOE is calculated by taking the natural logarithm (log to base e) of the % of non-events divided by the % of events.
WOE = ln(% of non-events / % of events)
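To make the formula concrete, here is a minimal sketch in Python (our choice for illustration; the helper name `woe` is hypothetical). A positive WOE means the bin holds a larger share of all non-events than of all events; a negative WOE means the opposite.

```python
import math

def woe(pct_non_events: float, pct_events: float) -> float:
    """WOE = ln(% of non-events / % of events) for a single bin.

    Both arguments are shares of the respective totals, e.g. 0.25 means
    25% of all non-events fall in this bin.
    """
    return math.log(pct_non_events / pct_events)

# A bin holding 25% of all non-events but only 10% of all events
print(round(woe(0.25, 0.10), 4))  # 0.9163, i.e. ln(2.5)
# The reverse situation gives the same magnitude with a negative sign
print(round(woe(0.10, 0.25), 4))  # -0.9163
```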

Steps of Calculating WOE
1. For a continuous variable, split the data into 10 parts (or fewer, depending on the distribution).
2. Calculate the number of events and non-events in each group (bin).
3. Calculate the % of events and % of non-events in each group.
4. Calculate WOE by taking the natural log of the % of non-events divided by the % of events.
Note : For a categorical variable, you do not need to split the data (Ignore Step 1 and follow the remaining steps)
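The four steps can be sketched in Python as below. This is an illustrative sketch, not the algorithm of any particular package: the function name `woe_table` is hypothetical, and the equal-frequency cut on sorted values is just one simple way to do step 1. Bins with zero events or zero non-events get no WOE here; the article covers that case later under "Handle Zero Event/ Non-Event".

```python
import math
from collections import Counter

def woe_table(x, y, n_bins=10):
    """Bin a continuous variable and compute WOE per bin (steps 1-4).

    x: numeric values; y: 0/1 targets where 1 = event.
    Returns (bin_index, events, non_events, woe) tuples; woe is None when
    a bin has zero events or zero non-events.
    """
    # Step 1: equal-frequency cut points taken from the sorted values.
    # Equal values always share a bin, so heavily tied data can give fewer bins.
    xs = sorted(x)
    cuts = sorted({xs[len(xs) * k // n_bins] for k in range(1, n_bins)})

    def bin_of(v):
        # Number of cut points <= v, i.e. the index of v's bin
        return sum(v >= c for c in cuts)

    # Step 2: count events and non-events in each bin
    events, non_events = Counter(), Counter()
    for v, t in zip(x, y):
        if t == 1:
            events[bin_of(v)] += 1
        else:
            non_events[bin_of(v)] += 1

    total_e = sum(events.values())
    total_ne = sum(non_events.values())
    rows = []
    for b in sorted(set(events) | set(non_events)):
        # Step 3: share of all events / all non-events that fall in this bin
        pct_e = events[b] / total_e
        pct_ne = non_events[b] / total_ne
        # Step 4: WOE = ln(% of non-events / % of events)
        w = math.log(pct_ne / pct_e) if pct_e > 0 and pct_ne > 0 else None
        rows.append((b, events[b], non_events[b], w))
    return rows

# Hypothetical data: 20 observations split into 4 equal-frequency bins
x = list(range(1, 21))
y = [1, 1, 1, 0, 0,  1, 1, 0, 0, 0,  1, 0, 0, 0, 0,  1, 0, 0, 0, 0]
for row in woe_table(x, y, n_bins=4):
    print(row)
```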

Terminologies related to WOE

1. Fine Classing
Create 10/20 bins/groups for a continuous independent variable and then calculate the WOE and IV of the variable.
2. Coarse Classing
Combine adjacent categories with similar WOE scores
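Coarse classing can be sketched as a greedy merge: repeatedly pool the adjacent pair of bins whose WOE values are closest, and stop once every adjacent pair differs by at least a threshold. The function name and the `threshold=0.1` default are hypothetical choices for illustration, not a standard rule.

```python
import math

def coarse_class(bins, threshold=0.1):
    """Merge adjacent bins whose WOE values differ by less than `threshold`.

    bins: ordered list of (events, non_events) counts per fine-class bin,
    each with non-zero events and non-events.
    """
    bins = [list(b) for b in bins]
    # Totals do not change when bins are merged
    total_e = sum(b[0] for b in bins)
    total_ne = sum(b[1] for b in bins)

    def woe(b):
        return math.log((b[1] / total_ne) / (b[0] / total_e))

    while len(bins) > 1:
        # Find the adjacent pair with the most similar WOE
        diffs = [abs(woe(bins[i]) - woe(bins[i + 1])) for i in range(len(bins) - 1)]
        i = min(range(len(diffs)), key=diffs.__getitem__)
        if diffs[i] >= threshold:
            break
        # Merge bin i+1 into bin i by pooling their counts
        bins[i] = [bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1]]
        del bins[i + 1]
    return [tuple(b) for b in bins]

# The first two hypothetical bins have nearly equal WOE, so they are pooled
print(coarse_class([(10, 20), (10, 21), (5, 40)]))  # [(20, 41), (5, 40)]
```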

Usage of WOE

Weight of Evidence (WOE) helps to transform a continuous independent variable into a set of groups or bins based on similarity of dependent variable distribution i.e. number of events and non-events.

For continuous independent variables : First, create bins (categories / groups) for the continuous independent variable, then combine categories with similar WOE values, and finally replace the categories with their WOE values. Use the WOE values rather than the input values in your model.
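data age1;
  set age;
  /* Missing ages are binned separately and get their own WOE */
  if missing(age) then WOE_age = 0.34615;
  /* Check the highest bin first so each age gets exactly one WOE value */
  else if age >= 20 then WOE_age = 0.07689;
  else if age >= 10 then WOE_age = -0.03012;
  /* Ages below 10 are not covered by these example bins and stay missing */
run;

/* Fit the logistic regression on the WOE-transformed variable */
proc logistic data=age1 descending;
  model y = WOE_age;
run;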
For categorical independent variables : Combine categories with similar WOE, then replace each new category of the independent variable with its WOE value. In other words, use WOE values rather than raw categories in your model. The transformed variable is a continuous variable holding WOE values and is treated like any other continuous variable.
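Replacing raw categories with WOE values is a simple lookup. The Python sketch below is illustrative: the Region levels echo an example from the comments, and the WOE numbers are made up rather than computed from real data. Missing values get their own WOE, in line with binning missing values separately, and an unseen category raises an error instead of being silently mapped.

```python
def apply_woe(values, woe_map, missing_woe=None):
    """Replace raw categories with their WOE values.

    woe_map: dict from category to WOE (e.g. built on a training sample).
    None is treated as missing and mapped to `missing_woe`; a category not
    in woe_map raises KeyError.
    """
    return [missing_woe if v is None else woe_map[v] for v in values]

# Hypothetical WOE values; North and East were combined because their WOE was similar
region_woe = {"North": 0.21, "East": 0.21, "South": -0.35, "West": 0.05}
print(apply_woe(["North", "South", None, "West"], region_woe, missing_woe=0.1))
# [0.21, -0.35, 0.1, 0.05]
```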

Why combine categories with similar WOE?

It is because categories with similar WOE have almost the same proportion of events and non-events. In other words, the categories behave the same way with respect to the target.

Rules related to WOE
1. Each category (bin) should have at least 5% of the observations.
2. Each category (bin) should be non-zero for both non-events and events.
3. The WOE should be distinct for each category. Similar groups should be aggregated.
4. The WOE should be monotonic, i.e. either growing or decreasing with the groupings.
5. Missing values are binned separately.
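The rules above can be checked mechanically. This is a minimal Python sketch, assuming the bins arrive in their natural order as (events, non_events) counts with the missing-value bin kept aside; the function name and messages are illustrative only.

```python
import math

def check_woe_rules(bins, min_share=0.05):
    """Check rules 1-4 on ordered (events, non_events) bins.

    The missing-value bin (rule 5) is assumed to be excluded from `bins`.
    Returns a list of rule violations; an empty list means all checks pass.
    """
    problems = []
    n = sum(e + ne for e, ne in bins)
    total_e = sum(e for e, _ in bins)
    total_ne = sum(ne for _, ne in bins)
    woes = []
    for i, (e, ne) in enumerate(bins):
        # Rule 1: at least 5% of observations per bin
        if (e + ne) / n < min_share:
            problems.append(f"bin {i}: fewer than {min_share:.0%} of observations")
        # Rule 2: non-zero events and non-events (WOE undefined otherwise)
        if e == 0 or ne == 0:
            problems.append(f"bin {i}: zero events or non-events")
            continue
        woes.append(math.log((ne / total_ne) / (e / total_e)))
    # Rules 3 and 4: WOE must be distinct and move in one direction
    steps = [b - a for a, b in zip(woes, woes[1:])]
    if not (all(s > 0 for s in steps) or all(s < 0 for s in steps)):
        problems.append("WOE is not strictly monotonic (or has repeated values)")
    return problems

print(check_woe_rules([(30, 70), (20, 80), (10, 90)]))  # []
print(check_woe_rules([(10, 90), (30, 70), (10, 90)]))
```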

Number of Bins (Groups)

In general, 10 or 20 bins are taken. Ideally, each bin should contain at least 5% of cases. The number of bins determines the amount of smoothing: the fewer the bins, the more the smoothing. If someone asks you "why not form 1000 bins?", the answer is that fewer bins capture the important patterns in the data while leaving out noise. Bins with less than 5% of cases might not give a true picture of the data distribution and might lead to model instability.

Handle Zero Event/ Non-Event

If a particular bin contains no events or no non-events, its WOE is undefined (the log of zero, or a division by zero). In that case you can use the adjusted formula below, which adds 0.5 to the number of events and non-events in the group.

AdjustedWOE = ln( ((Number of non-events in a group + 0.5) / Total number of non-events) / ((Number of events in a group + 0.5) / Total number of events) )
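A minimal Python version of the adjusted formula, with a hypothetical function name. As in the formula, 0.5 is added only to the group's counts; the totals in the denominators are left unchanged.

```python
import math

def adjusted_woe(events, non_events, total_events, total_non_events, adj=0.5):
    """WOE with `adj` added to the group's counts so empty cells stay finite."""
    pct_ne = (non_events + adj) / total_non_events
    pct_e = (events + adj) / total_events
    return math.log(pct_ne / pct_e)

# A bin with 12 non-events and no events, out of 100 events and 300 non-events
print(round(adjusted_woe(0, 12, 100, 300), 4))  # 2.1203, i.e. ln(25/3)
```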

How to check correct binning with WOE

1. The WOE should be monotonic i.e. either growing or decreasing with the bins. You can plot WOE values and check linearity on the graph.

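2. Perform the WOE transformation after binning. Next, run a logistic regression with one independent variable holding the WOE values. With WOE defined as ln(% of non-events / % of events) and the model predicting the probability of a non-event, the slope should be 1 and the intercept should be ln(overall number of non-events / overall number of events). If you model the event instead, the signs flip: the slope should be -1 and the intercept ln(events / non-events). If the estimates differ from these values, the binning algorithm is not good. [Source : Article]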
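The slope check can be reproduced without a statistics package. The Python sketch below uses hypothetical bin counts and a hypothetical function name, and fits logit(p) = a + b*WOE to grouped counts by Newton-Raphson, modeling the probability of a non-event to match WOE = ln(% of non-events / % of events). Because every bin has its own WOE value, the one-variable model can reproduce each bin's observed log-odds exactly, so b converges to 1 and a to ln(non-events / events).

```python
import math

def fit_logistic_grouped(x, successes, trials, iters=50):
    """Newton-Raphson fit of logit(p) = a + b*x on grouped binomial data."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = ni * p * (1.0 - p)
            # Gradient of the log-likelihood
            ga += yi - ni * p
            gb += (yi - ni * p) * xi
            # Fisher information matrix entries
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system (information) * delta = gradient
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Hypothetical bin counts: (events, non_events) per bin
bins = [(40, 60), (25, 75), (15, 85), (8, 92)]
E = sum(e for e, _ in bins)
NE = sum(ne for _, ne in bins)
woe = [math.log((ne / NE) / (e / E)) for e, ne in bins]

# Successes are NON-events, matching the WOE sign convention above
a, b = fit_logistic_grouped(woe, [ne for _, ne in bins], [e + ne for e, ne in bins])
print("slope:", round(b, 6), "intercept:", round(a, 6), "ln(NE/E):", round(math.log(NE / E), 6))
```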

Information Value (IV)

Information value is one of the most useful techniques for selecting important variables in a predictive model. It helps to rank variables on the basis of their importance. The IV is calculated using the following formula :
IV = ∑ (% of non-events - % of events) * WOE
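Here is the IV formula as a minimal Python sketch (hypothetical function name). Every term is non-negative because (% of non-events - % of events) and WOE always have the same sign, so IV is zero only when the two distributions match in every bin.

```python
import math

def information_value(bins):
    """IV = sum over bins of (% of non-events - % of events) * WOE.

    bins: list of (events, non_events) counts; every bin needs non-zero
    events and non-events (otherwise use the adjusted WOE).
    """
    total_e = sum(e for e, _ in bins)
    total_ne = sum(ne for _, ne in bins)
    iv = 0.0
    for e, ne in bins:
        pct_e, pct_ne = e / total_e, ne / total_ne
        iv += (pct_ne - pct_e) * math.log(pct_ne / pct_e)
    return iv

# Hypothetical counts
print(round(information_value([(40, 60), (25, 75), (15, 85), (8, 92)]), 4))
```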

Information Value - Variable Predictiveness
Less than 0.02 - Not useful for prediction
0.02 to 0.1 - Weak predictive power
0.1 to 0.3 - Medium predictive power
0.3 to 0.5 - Strong predictive power
Greater than 0.5 - Suspicious predictive power

According to Siddiqi (2006), by convention the values of the IV statistic in credit scoring can be interpreted as follows.

If the IV statistic is:
1. Less than 0.02, then the predictor is not useful for modeling (separating the Goods from the Bads)
2. 0.02 to 0.1, then the predictor has only a weak relationship to the Goods/Bads odds ratio
3. 0.1 to 0.3, then the predictor has a medium strength relationship to the Goods/Bads odds ratio
4. 0.3 to 0.5, then the predictor has a strong relationship to the Goods/Bads odds ratio.
5. Greater than 0.5, then the relationship is suspicious (check it before using the predictor)
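If you want to label variables automatically, the rule of thumb maps to a small lookup. This Python sketch is illustrative: the published ranges share their endpoints (0.02, 0.1 and 0.3 each appear in two ranges), so sending those endpoints to the higher range, while keeping 0.5 itself in "strong", is our own assumption.

```python
def iv_strength(iv):
    """Label an IV value with the Siddiqi-style rule-of-thumb ranges.

    Shared endpoints 0.02, 0.1 and 0.3 go to the higher range; exactly 0.5
    stays "strong" because only values above 0.5 are called suspicious.
    """
    if iv < 0.02:
        return "not useful"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    if iv <= 0.5:
        return "strong"
    return "suspicious"

print([iv_strength(v) for v in (0.01, 0.05, 0.2, 0.4, 1.2)])
```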

Important Points
1. Information value increases as the number of bins / groups for an independent variable increases. Be careful when there are more than 20 bins, as some bins may have very few events and non-events.
2. Information value should not be used as a feature selection method when you are building a classification model other than binary logistic regression (e.g. random forest or SVM), as it is designed for binary logistic regression models only.
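Point 1 can be seen directly: IV is the sum of two Kullback-Leibler divergences between the event and non-event distributions, and merging bins can never increase a divergence, so splitting a bin can never lower IV. The Python sketch below uses made-up counts to compare four fine bins against the same data pooled into two coarse bins.

```python
import math

def information_value(bins):
    """IV = sum of (% of non-events - % of events) * WOE over (events, non_events) bins."""
    total_e = sum(e for e, _ in bins)
    total_ne = sum(ne for _, ne in bins)
    return sum(
        (ne / total_ne - e / total_e) * math.log((ne / total_ne) / (e / total_e))
        for e, ne in bins
    )

# Hypothetical fine classes (events, non_events), then the same data in 2 coarse bins
fine = [(12, 30), (9, 35), (7, 41), (4, 50)]
coarse = [(12 + 9, 30 + 35), (7 + 4, 41 + 50)]
print(round(information_value(fine), 4), round(information_value(coarse), 4))
```

This is why many sparse bins can make a variable look stronger than it really is.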

Information Value (IV) and Weight of Evidence (WOE) in R

Step 1 : Install and Load Package
First, install the 'Information' package and then load it in R.
install.packages("Information")
library(Information)
Step 2 : Import your data

Step 3 : Summarise Data

In this dataset, we have four variables and 400 observations. The variable admit is a binary target or dependent variable.
summary(mydata)
```
     admit            gre           gpa            rank
Min.   :0.000   Min.   :220   Min.   :2.26   Min.   :1.00
1st Qu.:0.000   1st Qu.:520   1st Qu.:3.13   1st Qu.:2.00
Median :0.000   Median :580   Median :3.40   Median :2.00
Mean   :0.318   Mean   :588   Mean   :3.39   Mean   :2.48
3rd Qu.:1.000   3rd Qu.:660   3rd Qu.:3.67   3rd Qu.:3.00
Max.   :1.000   Max.   :800   Max.   :4.00   Max.   :4.00
```

Step 4 : Data Preparation

Make sure your independent categorical variables are stored as factors in R. You can do this as follows -
mydata$rank <- factor(mydata$rank)
Important Note : As per this package, the binary dependent variable has to be numeric before running IV and WOE. Do not convert it to a factor.

Step 5 : Compute Information Value and WOE

The first parameter is your data frame, followed by your target variable. In the bins= parameter, specify the number of groups you want to create for WOE and IV.
IV <- create_infotables(data=mydata, y="admit", bins=10, parallel=FALSE)
It takes all the variables in the dataset except the dependent variable as predictors and computes IV for each of them.

This function supports parallel computing. If you want to run your code in parallel mode, you can run the following code.
IV <- create_infotables(data=mydata, y="admit", bins=10, parallel=TRUE)
You can add the ncore= parameter to specify the number of cores to be used for parallel processing.

Information Value in R

In the IV list, the Summary element contains the IV values of all the independent variables.
IV_Value = data.frame(IV$Summary)
To get the WOE table for the variable gre, access the gre element of the Tables list inside IV.
print(IV$Tables$gre, row.names=FALSE)
```
> print(IV$Tables$gre, row.names=FALSE)
gre  N Percent     WOE    IV
[220,420] 38  0.0950 -1.3748 0.128
[440,480] 40  0.1000 -0.0820 0.129
[500,500] 21  0.0525 -1.4860 0.209
[520,540] 51  0.1275  0.2440 0.217
[560,560] 24  0.0600 -0.3333 0.223
[580,600] 52  0.1300 -0.1376 0.225
[620,640] 51  0.1275  0.0721 0.226
[660,660] 24  0.0600  0.7653 0.264
[680,720] 53  0.1325  0.0150 0.265
[740,800] 46  0.1150  0.7653 0.339
```
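A note on reading this table. The highest gre bins have the largest positive WOE, and higher GRE scores go with more admissions in this dataset, which suggests the package puts events (admit = 1) in the numerator, i.e. ln(% of events / % of non-events), the reverse of the convention used earlier in this article; check the package documentation to confirm. The IV column also grows row by row, which is consistent with it being a running total, so the last row (0.339) would be the variable's overall IV. As the author explains in the comments, flipping the convention only flips the sign of WOE and leaves IV unchanged. A small Python sketch with made-up counts (hypothetical function name) shows this:

```python
import math

def woe_and_iv(bins, events_on_top=False):
    """Per-bin WOE and total IV for (events, non_events) bins.

    events_on_top=False: WOE = ln(% non-events / % events), as in this article
    events_on_top=True:  WOE = ln(% events / % non-events), the flipped convention
    """
    total_e = sum(e for e, _ in bins)
    total_ne = sum(ne for _, ne in bins)
    woes, iv = [], 0.0
    for e, ne in bins:
        pct_e, pct_ne = e / total_e, ne / total_ne
        top, bottom = (pct_e, pct_ne) if events_on_top else (pct_ne, pct_e)
        w = math.log(top / bottom)
        woes.append(w)
        iv += (top - bottom) * w
    return woes, iv

bins = [(10, 40), (20, 30), (30, 20)]  # hypothetical (events, non_events) counts
w1, iv1 = woe_and_iv(bins)
w2, iv2 = woe_and_iv(bins, events_on_top=True)
print([round(v, 4) for v in w1], [round(v, 4) for v in w2])
print(round(iv1, 4), round(iv2, 4))
```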
To save it in a data frame, you can run the command below-
gre = data.frame(IV$Tables$gre)

Plot WOE Scores

To see the WOE trend of a variable, you can plot it using the plot_infotables function.
plot_infotables(IV, "gre")

To generate multiple charts on one page, you can run the following command -
plot_infotables(IV, IV$Summary$Variable[1:3], same_scale=FALSE)

Important Point
Note the number of bins for the 'rank' variable. Since it is a categorical variable, the number of bins equals the number of unique values (levels) of the factor. The parameter bins=10 does not apply to a factor variable.
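The same idea outside R: for a categorical variable, each unique level is its own bin. A minimal Python sketch with a hypothetical function name and made-up data; it assumes every level has at least one event and one non-event (otherwise the division or log fails and the adjusted WOE should be used).

```python
import math
from collections import Counter

def woe_by_level(categories, y):
    """WOE = ln(% of non-events / % of events) for each unique level.

    y: 0/1 targets where 1 = event. Raises ZeroDivisionError if a level has
    no events, and ValueError (log of 0) if a level has no non-events.
    """
    events = Counter(c for c, t in zip(categories, y) if t == 1)
    non_events = Counter(c for c, t in zip(categories, y) if t == 0)
    total_e, total_ne = sum(events.values()), sum(non_events.values())
    return {
        level: math.log((non_events[level] / total_ne) / (events[level] / total_e))
        for level in sorted(set(categories))
    }

rank = [1, 1, 1, 2, 2, 2]
admit = [1, 1, 0, 1, 0, 0]
print(woe_by_level(rank, admit))  # two levels, so two bins
```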


Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like banking, Telecom, HR and Health Insurance.

38 Responses to "Weight of Evidence (WOE) and Information Value Explained"
1. Why should WOE be monotonic? For example, when there is U/inverse U relationship between independent variable and outcome.

1. It is because logistic regression assumes a linear relationship between the logit and the independent variable.

2. Can I ask for your help? I am a first time SPSS user. I need to calculate WOE and IV for more than thousands of variables in a SPSS dataset. Can you tell me how to write a SPSS macro to calculate WOE and IV automatically and output the result?
I have been struggling for a month how to do it already and really need your help.

1. Don't know about SPSS, but in R you can use the *Information* package and the *smbinning* package.

3. May I get the SAS code for doing fine classing to fit a logistic regression model ? I am using base SAS

4. This is a question regarding a practice or method followed by some of my colleagues. While making a logistic regression model, I have seen people replace categorical variables (or continuous variables which are binned) with their respective Weight of Evidence (WoE). This is supposedly done to establish a monotonic relation between the regressor and dependent variable. Now as far as I understand, once the model is made, the variables in the equation are NOT the variables in the dataset. Rather, the variables in the equation are now kind of the importance or weight of the variables in segregating the dependent variable!

My question is : how do we now interpret the model or the model coefficients? For example for the following equation :
log(p/(1−p)) = β0 + β1·x1

we can say that exp(β1) is the multiplicative change in the odds (the odds ratio) for a 1 unit increase in the variable x1.

But if the variable is replaced by its WoE, then the interpretation will change to: the multiplicative change in the odds for a 1 unit increase in the IMPORTANCE / WEIGHT of the variable

1. Your understanding is correct. It's difficult to interpret the coefficient of a variable once it is replaced with WOE values. But we sometimes focus on improving the accuracy of a predictive model and give less priority to interpreting the coefficients.

5. Very well written. However, where you say that a value greater than 0.3 means the predictor has a strong relationship with the odds ratio, I think we don't take values over 0.5 either. I hope my understanding is correct, or am I missing anything?

1. There is no clear cut rule. Sometimes IV values above 0.5 make sense and sometimes not. We should be cautious if IV value is very high.

6. I think the formula that you have given is wrong; it is ln(%events/%nonevents)
http://ucanalytics.com/blogs/information-value-and-weight-of-evidencebanking-case/

Also same can be found in UCLA website and SAS institute site

1. Deepanshu Bhalla has left a new comment on your post "Weight of Evidence (WOE) and Information Value Exp...":

Please don't post anonymously when you suggest a correction to the article. Your understanding is incorrect and incomplete. SAS has implemented the same formula that I have mentioned in the article in the PROC HPBIN procedure. Check out the SAS website link -
http://support.sas.com/documentation/cdl/en/prochp/67530/HTML/default/viewer.htm#prochp_hpbin_details03.htm

If you try to understand the mathematics behind it, inverting the ratio of events to non-events hardly matters. When you take the log of the inverted ratio, only the sign of WOE changes, and the difference between % of non-events and % of events in the IV formula flips sign too, so it balances out in IV: (-) × (-) = +. Cheers!

7. I think the formula is OK. In binary logistic regression you code "0" if the event does not occur and "1" if the event occurs. It can be confusing; maybe including an example would help.
best regards from Chile.

PS: Sorry for my English. Can you help me with a problem?
When the IV is more than 1 (I know that more than 0.5 is suspicious), is that because something is very wrong or because it is a very good predictor?

1. IV can be misleading when it is greater than 1. For example, a categorical variable with a high number of levels (let's say 50) generally has an IV value of more than 1.

8. Hi Deepanshu, I want to thank you for this easy explanation of WOE, IV and everything. I have a doubt: when calculating WOE for a categorical variable, how do I create bins/groups? Ex: categorical var Region (North, East, South, West).

1. Categorical variables have their own groups/bins. For example, North /South are groups of variable region. Then you can calculate number of events and non-events in each group. Hope it helps!

9. This comment has been removed by the author.

1. Use Information package. See the code below -

library(Information)
data(train, package="Information")
train <- subset(train, TREATMENT==1)
IV <- Information::create_infotables(data=train, y="PURCHASE", parallel=FALSE)
print(IV$Tables$N_OPEN_REV_ACTS, row.names=FALSE)
closeAllConnections()

2. Thanks, can you please tell me the code for calculating WOE in R ?

10. Thanks for this. Please help me calculate WOE in R. I have tried downloading several packages, but none of them are working. Like,
install.packages("devtools")
library("devtools")
install_github("tomasgreif/riv")
install_github("InformationValue")

which package should I use to calculate WOE and IV in R ?
what is the code for this calculation ?

1. Did you try Information package?
install.packages("Information")
library(Information)

Read the package manual.

11. Hi Deepanshu, I have tried Information package and it works. Thanks for this.
I have one doubt about logistic regression. Normally we divide our data into Train and Test. If there are a lot of negative responses in the data, what is the procedure to divide the data into train and test? Normally Train > Test; what would the scenario be in this case?
I will be thankful if you have time to answer this.

12. I believe you have a typing error in this chunk of code:
if age >= 10 then WOE_age = -0.03012;
else if age >= 20 then WOE_age = 0.07689;
else if age >= 30 then WOE_age = . ;

If age is (not larger) than 10, it cannot be larger than 20.

Nice article, anyway.

1. Thanks for pointing it out. Corrected. Cheers!

13. Hi Deepanshu,
If the number of events for a particular bin is zero, WOE would be 'not defined' because of the log function. Is it recommended to replace it with some value based on subject expertise, or is it advisable to leave it as zero? Please help. Thanks

1. I have added a method to treat these cases. Hope it helps!

14. Hi Deepanshu,

I assume the events & non-events used in WOE calculation are possible only when the Dependent variable is binary variable. What should we do when it is a multi leveled DV?
Correct me if my assumption is wrong

1. Yes, it is for a binary dependent variable. Use another variable selection method, such as the Wald chi-square, for multinomial logistic regression.

15. Hi. I learned before that WOE moves in the opposite direction to the bad rate.

Can we calculate event% and non-event% after we get the WOE and IV (so we can get the event rate)?
Because Information::create_infotables only gives the percentage of N (total) for each bin, and the formula WOE(Data,Independent,Continuous,Bin,Bad,Good) gives a different calculation/answer from Information::create_infotables.

16. This comment has been removed by the author.

17. As WOE variables are sensitive to the event rate what adjustment do you suggest for handling unbalanced sample to apply oversampling? Do you suggest using WOE variables out of original sample and only adjusting the model coefficients for getting rid of oversampling effect?

18. Thanks Deepanshu for such a nice explanation about WOE and IV.

19. Hi Mr. Deepanshu Bhalla! You state: "Information value should not be used as a feature selection method when you are building a classification model other than binary logistic regression (for eg. random forest or SVM) as it's designed for binary logistic regression model only." What is the reason (or reasons) for establishing that important point as a warning? Thanks.

20. Hi Deepanshu,

Thanks for sharing amazing and very well organised information.

When we replace a continuous variable with its WOE and it comes out as a significant variable in the model, how do we interpret it? For example, how do we interpret WOE_age in a logistic regression model on y (purchase or not purchase)?

Regards
Yogesh

21. Hi Deepanshu,

thanks for amazing article on woe/iv.

Can you please explain what can be done if WOE values are not monotonic across bins, i.e. they both increase and decrease with the bin number? Is the binning incorrect? Should similar values be clubbed into one bin to make WOE monotonic?

22. How do we calculate IV value for numeric(not binary) dependent variable?

23. I am getting Inf values in attributes. How can I avoid getting Inf values?