Comments on ListenData: Oversampling for rare event (post by Deepanshu Bhalla; 12 comments, last updated 2023-06-01)

Anonymous, 2020-10-12:
In the section shown below you describe p0 and p1 (ρ0 and ρ1) but do not reference them in the calculations. Are they supposed to be in the formulas? Thanks.

Correcting Confusion Matrix

Suppose π0 is the proportion of non-events before sampling, π1 is the proportion of events before sampling, ρ1 is the proportion of events after sampling, and ρ0 is the proportion of non-events after sampling.
True proportion of true positives = π1 * sensitivity
True proportion of true negatives = π0 * specificity
True proportion of false positives = π0 * (1 - specificity)
True proportion of false negatives = π1 * (1 - sensitivity)
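
A minimal Python sketch of the four formulas quoted above. It also points to why ρ0 and ρ1 do not appear: sensitivity and specificity are computed within each true class, so, assuming events and non-events are each sampled at random, they carry over from the oversampled data unchanged, and only the population shares π0 and π1 are needed to rescale the cells. The helper name and the example numbers (sensitivity 0.80, specificity 0.90, π1 = 0.016) are hypothetical.

```python
def corrected_confusion(sensitivity: float, specificity: float, pi1: float) -> dict:
    """Population-level confusion-matrix proportions (hypothetical helper).

    Assumes sensitivity and specificity measured on the oversampled data
    carry over to the population, and pi1 is the event rate before sampling.
    """
    pi0 = 1.0 - pi1  # proportion of non-events before sampling
    return {
        "TP": pi1 * sensitivity,
        "TN": pi0 * specificity,
        "FP": pi0 * (1.0 - specificity),
        "FN": pi1 * (1.0 - sensitivity),
    }


# Hypothetical example: 1.6% event rate before sampling.
cells = corrected_confusion(sensitivity=0.80, specificity=0.90, pi1=0.016)
precision = cells["TP"] / (cells["TP"] + cells["FP"])

for name, share in cells.items():
    print(f"{name}: {share:.4f}")
print(f"precision at the population event rate: {precision:.3f}")
```

With these numbers the four cells sum to 1, and precision comes out at about 0.12, against about 0.89 for the same sensitivity and specificity on a 50/50 sample.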

Anonymous, 2019-04-29:
Hi Deepanshu,

Does this mean that you oversample AFTER you split your training and validation data?

Anonymous, 2018-02-08:
Can't the weight option of PROC LOGISTIC be used to handle such cases?

Anonymous, 2017-08-04:
Nice work, Deepanshu. I just have a small question: could you please elaborate on why the beta coefficients of the covariates do not change after oversampling?

Preety, 2017-01-13:
Hi Deepanshu, I am using a sample of 150k from the base and my churn rate is 1%, so there are 1,500 churners (events). Do I really need to oversample if I am testing around 30 variables and the final model has fewer than 20 variables? Also, because my predicted probabilities are very low, my confusion matrix is badly skewed at a 0.4 cutoff. How do I explain this?

Analytics With SAS, 2016-12-05:
Cheers, Deepanshu
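
On the 2017-08-04 question about the coefficients: a standard result for logistic regression is that if events and non-events are kept at different rates, but at random within each class, the sampling multiplies every observation's odds by the same constant, (ρ1·π0) / (ρ0·π1). On the logit scale that constant is a shift of the intercept only, so, assuming the model is correctly specified, the slope coefficients are unaffected and only the intercept (and hence every predicted probability) is inflated. The sketch below uses hypothetical coefficients b0 and b1 and hypothetical helper names to check that subtracting the offset from the intercept gives the same probability as rescaling the scored probability directly. We believe the latter is the adjustment the SCORE statement's PRIOREVENT= option applies, but the SAS documentation is the authority on that.

```python
import math


def intercept_offset(rho1: float, pi1: float) -> float:
    """log[(rho1 * pi0) / (rho0 * pi1)]: how much the intercept fitted on the
    sampled data exceeds the population intercept (slopes are unaffected)."""
    rho0, pi0 = 1.0 - rho1, 1.0 - pi1
    return math.log((rho1 * pi0) / (rho0 * pi1))


def adjust_probability(p_sample: float, rho1: float, pi1: float) -> float:
    """Rescale a probability scored by the oversampled model to the
    population event rate pi1 (prior correction)."""
    rho0, pi0 = 1.0 - rho1, 1.0 - pi1
    event_part = p_sample * pi1 / rho1
    non_event_part = (1.0 - p_sample) * pi0 / rho0
    return event_part / (event_part + non_event_part)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model fitted on a 50/50 sample from a population with 1.6% events.
rho1, pi1 = 0.5, 0.016
b0, b1 = -0.4, 1.2  # hypothetical intercept and slope from the sample fit
x = 0.7             # hypothetical covariate value

logit_sample = b0 + b1 * x
p_sample = sigmoid(logit_sample)

# Route 1: rescale the scored probability.
p_rescaled = adjust_probability(p_sample, rho1, pi1)
# Route 2: keep the slope b1, shift only the intercept.
p_shifted = sigmoid((b0 - intercept_offset(rho1, pi1)) + b1 * x)

print(f"sample-scale probability:     {p_sample:.4f}")
print(f"population-scale probability: {p_rescaled:.4f} (rescaled) / {p_shifted:.4f} (shifted)")
```

Because both routes agree for any x, the correction can be applied either to the scored probabilities or once to the intercept.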

Deepanshu Bhalla, 2016-12-05:
Yes, your understanding is correct. A low event rate does not matter if you have enough events relative to the number of variables. This rule applies only to logistic regression; it is not safe to generalize it to all algorithms.
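
To make "enough events relative to the number of variables" concrete, here is a small sketch that computes events per candidate variable (EPV) for the two situations described by commenters in this thread. The threshold of 10 events per candidate variable is a commonly cited rule of thumb for logistic regression, not something stated in the post, and the helper name is hypothetical.

```python
def events_per_variable(n_events: int, n_candidate_vars: int) -> float:
    """Events per candidate predictor (hypothetical helper)."""
    return n_events / n_candidate_vars


MIN_EPV = 10  # commonly cited rule of thumb; an assumption, not from the post

# Figures quoted in this thread: (events, candidate variables tested).
cases = {
    "Analytics With SAS, telecom churn": (10_000, 20),
    "Preety, churn": (1_500, 30),
}

for name, (events, n_vars) in cases.items():
    epv = events_per_variable(events, n_vars)
    verdict = "meets" if epv >= MIN_EPV else "falls short of"
    print(f"{name}: {epv:.0f} events per variable, {verdict} the {MIN_EPV}-EPV heuristic")
```

Both cases clear the heuristic by a wide margin (500 and 50 events per variable), which is consistent with the reply; as the reply notes, this reasoning is specific to logistic regression.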

Analytics With SAS, 2016-12-05:
Thanks, this is useful.

Another question: does the event rate matter if you have enough events in the model? I am working on a churn model for telecom (as in your example). The churn (event) rate is 0.7%, but I have around 10,000 events out of around 1 million observations. I am testing around 20 variables and the final model has around 10 variables. My understanding is that if you have enough events (10K in this case) relative to the number of independent variables, a low event rate should not matter?

Deepanshu Bhalla, 2016-12-05:
Yes, priorevent = 0.016 is correct. The purpose of the validation dataset is to validate the model: you apply the equation fitted on the training dataset to data outside the training sample and check whether the model still works well. If you also oversample the validation data, it would NOT be a valid check of your model, because the real event rate you are trying to predict for the future population is 1.6%. Hope it helps!

Analytics With SAS, 2016-12-05:
Hi, I have come across a similar problem where I have a 1.4% churn (event) rate across around 3 million observations. I have taken a 50-50 sample (all events and some non-events). In this case, is it correct to use priorevent=0.016 in the SCORE statement (because my event rate was 1.6% before oversampling)?

Another question: if I oversample the training data but NOT the validation data, wouldn't the event rate in the validation dataset be too low for SAS to do the validation? Many thanks.

Analytics With SAS, 2016-11-20:
Very well explained. Thanks

Anonymous, 2016-05-18:
Deepanshu, it helped a lot!!
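
The priorevent exchange above implies an order of operations: split first, raise the event share only in the training part, and leave the validation part at the population event rate. Below is a minimal Python sketch of that order using the "all events and some non-events" 50-50 scheme one commenter describes. The record layout (a dict with a 0/1 "churn" field), the 30% validation share, the seed, and the function names are all hypothetical; a stratified split would keep the validation event rate closer to the population rate than this plain random split.

```python
import random


def split_then_sample(records, label="churn", valid_frac=0.3, seed=42):
    """Random train/validation split, then a 50-50 training sample.

    Keeps every event in the training part and draws an equal number of
    non-events at random; the validation part is left untouched so it keeps
    the population event rate (hypothetical helper).
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    valid, train = shuffled[:n_valid], shuffled[n_valid:]

    events = [r for r in train if r[label] == 1]
    non_events = [r for r in train if r[label] == 0]
    sampled_non_events = rng.sample(non_events, k=min(len(events), len(non_events)))

    train_sample = events + sampled_non_events
    rng.shuffle(train_sample)
    return train_sample, valid


def event_rate(rows, label="churn"):
    return sum(r[label] for r in rows) / len(rows)


# Synthetic population: 100,000 customers, 1.6% churners.
population = [{"id": i, "churn": 1 if i < 1_600 else 0} for i in range(100_000)]
train_sample, valid = split_then_sample(population)

print(f"training sample: {len(train_sample)} rows, event rate {event_rate(train_sample):.3f}")
print(f"validation:      {len(valid)} rows, event rate {event_rate(valid):.4f}")
```

The validation part still holds roughly 30% of all events, so metrics computed on it reflect the 1.6% rate the model will face once its probabilities are rescaled with the priorevent adjustment discussed above.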