Comments on ListenData: Weight of Evidence (WOE) and Information Value (IV) Explained
Blog author: Deepanshu Bhalla
Feed updated: 2024-10-14T13:09:21.088-07:00 (comments 1-25 of 79)

2023-04-14T22:21:06.501-07:00
Thanks for the article, Deepanshu. Very insightful.
But what if we get an IV value > 1 for a continuous variable and we created fewer than 10 bins?
Should we use this feature in logistic regression?
Posted by unknown-kira

2022-09-14T08:34:27.642-07:00
Thanks! https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html?sc=1663169654969#c966867219846697236
Posted by JeffThorsen777

2022-06-15T02:42:05.701-07:00
In the credit risk domain, bad customers are "events" because we are interested in the probability of default. Hope it helps!
Posted by Deepanshu Bhalla

2022-06-06T18:44:34.736-07:00
Thank you for the article. I am confused by what you wrote about the WoE and the actual formula. You wrote:
"Positive WOE means Distribution of Goods > Distribution of Bads
Negative WOE means Distribution of Goods < Distribution of Bads
Hint : Log of a number > 1 means positive value. If less than 1, it means negative value."
Your formula, when you first introduced it at the top of the article, is congruent with this:
WOE = ln(Dist of Goods / Dist of Bads)
However, later in the article you wrote the formula as:
WoE = ln(% of non-events / % of events), which is the opposite of your first version of the formula.
Then the Weight of Evidence and Information Value Calculation Table contradicts what you wrote above about a positive or negative WoE. For example, in the range 0-50 the % of Events (5.9) is greater than the % of Non-Events (5.4), yet the WoE is negative. Likewise, in the range 51-100 the % of Events (10.1) is less than the % of Non-Events (12.3), yet the WoE is positive. Which formula is correct, and would you please help me clear up the confusion? Thanks.
Posted by Jenny

2022-05-17T02:52:05.264-07:00
Fixed it. Thanks!
Posted by Deepanshu Bhalla

2021-11-26T10:25:06.647-08:00
Thanks for the article. For the 5% rule, should we consider the missing bin?

That is, we treat missing records as a separate bin. If that missing bin has less than 5% of the records, what should be done?

To check the bin size, I used ((# of records in that bin / Total # of records) < 0.05).any(), which returns True if any bin holds less than 5% of the records.
Posted by Anonymous

2021-06-14T06:57:53.062-07:00
Amazing blog.
Posted by Anonymous

2021-05-13T04:16:28.192-07:00
Hi Deepanshu, I don't think your Python code considers the missing category when calculating WOE and IV.
Posted by Anonymous

2021-05-11T13:28:27.776-07:00
Hello, great explanation of WOE and IV.

I have a remark on the Python code for the following lines:

d['WoE'] = np.log(d['% of Events']/d['% of Non-Events'])
d['IV'] = d['WoE'] * (d['% of Events'] - d['% of Non-Events'])

Shouldn't it be:
d['WoE'] = np.log(d['% of Non-Events'] / d['% of Events'])
d['IV'] = d['WoE'] * (d['% of Non-Events'] - d['% of Events'])

as both formulas use Non-Events / Events and Non-Events - Events respectively, as you described in this article (and as in theory)?

Nevertheless, the result is the same in the end.

Raluca
Posted by Raluca

2021-02-10T07:27:11.052-08:00
Can we use IV for a survival analysis model?
Posted by sy

2021-02-05T21:00:39.016-08:00
Very good article.
Posted by Anonymous

2020-12-15T16:16:08.653-08:00
Ok, well noted.
Thank you very much for the quick response.
Posted by Dr.

2020-12-15T06:18:18.078-08:00
Yes, it works only for a binary dependent variable.
Posted by Deepanshu Bhalla

2020-12-15T05:07:18.041-08:00
Hello, does IV only work for binary dependent variables? I tried it on a multi-class dependent variable, but it's not working. Thanks.
Posted by Dr.

2020-10-22T09:14:21.237-07:00
Great article on WOE and IV. Excellent read!

I was hoping you could shed some light on arriving at the final IV value of a variable. From what I gather, the IV of a variable is the sum of the IVs from all bins of that variable. However, when I use the Information package in R and iv$Summary to look at IV values, it doesn't output the final IV; instead, the IV value of the last bin is captured.
I am trying to extract IVs for each variable, and when I looked into the WOE table, I noticed that the IV reported by the Summary function reflects the IV of one of the bins of the variable and not the sum. Appreciate any help. Thanks
Posted by santhan

2020-07-22T05:12:28.975-07:00
Hi Deepanshu, some of my coefficients are coming out negative after the WOE transformation. What does this mean?
Posted by Abmasha

2020-07-06T06:13:49.727-07:00
Hi Deepanshu, thank you for the article. I have used the above approach in my model-building process and have a question. Can I get an explanation of the effect of a categorical variable's coefficient on the log odds? If the variable has 4 levels after applying the WOE approach and it comes out significant, what does the coefficient say about the variable? If the coefficient for the state variable is, say, 2.5, what does it say about the probability or log odds?
Posted by Anonymous

2020-06-09T07:31:02.894-07:00
Thank you very much for this useful article.
Can you please help me with how to get the WOE values as variables in the original data frame in Python so that we can proceed with the modeling?
Posted by Bhavani

2020-04-03T13:03:35.992-07:00
Clear and useful.
Thank you!
Posted by Anonymous

2019-10-01T07:51:37.088-07:00
In my data there is a strong asymmetry in the possible values. Does this mean that I cannot use this approach? Example below:
age       events
[1-28]    4000
[28-31]   5000
[31-33]   3500
[33-35]   600
etc...
It seems that the grouping doesn't satisfy the 5% rule...
Posted by Anonymous

2019-09-11T03:54:41.875-07:00
Nope, because the calculation of IV considers percentages rather than counts.
Posted by Deepanshu Bhalla

2019-09-09T03:20:40.200-07:00
Does skewness in the dataset, with the event% being very low (~0.5%) compared to the non-event%, impact the value of IV?
Posted by Anirban

2019-08-23T07:12:27.735-07:00
Hi Deepanshu,

My WOE values for a variable are wavy, as shown below:

   Variable  Cutoff              N      Events  % of Events  Non-Events  % of Non-Events  WoE        IV
0  Amount    (-0.001, 1.0]       30492  181     0.367886     30311       0.106611         1.238590   3.236134e-01
1  Amount    (1.0, 3.57]         26473  27      0.054878     26446       0.093017         -0.527664  2.012431e-02
2  Amount    (3.57, 8.91]        28559  36      0.073171     28523       0.100322         -0.315588  8.568554e-03
3  Amount    (8.91, 13.0]        28405  13      0.026423     28392       0.099861         -1.329554  9.764019e-02
4  Amount    (13.0, 22.0]        28714  14      0.028455     28700       0.100944         -1.266236  9.178828e-02
5  Amount    (22.0, 37.0]        28375  17      0.034553     28358       0.099741         -1.060092  6.910594e-02
6  Amount    (37.0, 59.8]        28366  24      0.048780     28342       0.099685         -0.714687  3.638094e-02
7  Amount    (59.8, 100.0]       28915  50      0.101626     28865       0.101525         0.000997   1.010233e-07
8  Amount    (100.0, 203.0]      28050  45      0.091463     28005       0.098500         -0.074117  5.215200e-04
9  Amount    (203.0, 25691.16]   28458  85      0.172764     28373       0.099794         0.548817   4.004719e-02

Even if I club adjacent bins, I don't see any monotonic pattern in the WOE. What can be done in such a case?
Posted by Anonymous

2019-07-17T12:16:55.528-07:00
Hi Deepanshu Bhalla, thank you for the explanation. WOE and IV can be used to build a scorecard model that gives a customer, i.e. a loan applicant, a score to determine whether they are "good" or "bad". But can I use a similar technique to score a company? If I need to analyze 1000+ customer companies, and I have 10+ attributes of each company (similar to the loan/borrower attributes, but at company level), can I build a scorecard using WOE/IV and logistic regression? So far, the biggest question I have is that, unlike loans, companies don't "default/fail" that often.
Historically, I have seen only 2 or 3 customer companies fail. If WOE/IV and logistic regression are not suitable in this case, do you have any suggestions as to how to build a scorecard model?
Posted by Kubo929

2019-07-12T09:35:52.308-07:00
Yes, we can. That's what everyone was using before the proc hpbin procedure was implemented in SAS.
Posted by Deepanshu Bhalla
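
Several questions in this thread come back to the same calculation: which ratio goes inside the logarithm (Jenny, Raluca), how per-bin IVs combine into one value (santhan), how to get WOE values back into the data frame (Bhavani), and how to check the 5% bin-size rule. The sketch below is one way to illustrate them together. It is not the article's code: the function name `woe_iv_table`, the column names, and the toy data are made up for illustration, and it assumes the feature is already binned, the target is 0/1 with 1 marking the event, and no bin has zero events or zero non-events (the log would be undefined there).

```python
import numpy as np
import pandas as pd


def woe_iv_table(df, feature, target, events_in_numerator=True):
    """Per-bin WoE and IV for an already-binned feature (hypothetical helper).

    Assumes `target` is 0/1 with 1 = event. groupby drops NaN keys by
    default, so label missing values as their own bin first, for example
    with fillna("Missing"), if they should be counted.
    """
    counts = df.groupby(feature)[target].agg(N="count", Events="sum")
    counts["Non-Events"] = counts["N"] - counts["Events"]

    # Column-wise distributions: each bin's share of all events / non-events.
    pct_events = counts["Events"] / counts["Events"].sum()
    pct_non_events = counts["Non-Events"] / counts["Non-Events"].sum()

    # The two conventions discussed in the comments differ only in
    # which distribution is the numerator.
    if events_in_numerator:
        num, den = pct_events, pct_non_events
    else:
        num, den = pct_non_events, pct_events

    counts["Share"] = counts["N"] / counts["N"].sum()  # for the 5% rule
    counts["WoE"] = np.log(num / den)
    counts["IV"] = counts["WoE"] * (num - den)
    return counts


# Toy data, invented for this example only.
toy = pd.DataFrame({
    "age_bin": ["low"] * 10 + ["mid"] * 10 + ["high"] * 10,
    "default": [1] * 4 + [0] * 6 + [1] * 2 + [0] * 8 + [1] * 1 + [0] * 9,
})

events_first = woe_iv_table(toy, "age_bin", "default", events_in_numerator=True)
non_events_first = woe_iv_table(toy, "age_bin", "default", events_in_numerator=False)

# The variable's IV is the sum of the per-bin IVs.
total_iv = events_first["IV"].sum()

# Map each row's bin to its WoE to get a model-ready column.
toy["age_bin_woe"] = toy["age_bin"].map(events_first["WoE"])

# True if any bin holds less than 5% of the records.
any_small_bin = (events_first["Share"] < 0.05).any()

print(events_first)
print("Total IV:", total_iv)
```

Swapping numerator and denominator flips the sign of every WoE value, but both factors in WoE * (numerator - denominator) change sign together, so each bin's IV and the total IV are unchanged. That is why the two versions of the formula give the same IV while the WoE signs, and therefore the sign of a logistic regression coefficient on the WoE column, depend on the convention chosen.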