Most people use logistic regression for modeling response, attrition, risk, etc. In the business world, these are usually rare occurrences.
One widely accepted practice is to oversample or undersample in order to model these rare events. Some time ago, I was working on a campaign response model using logistic regression. Frustrated with the model's performance and accuracy, I used weights to oversample the responders. I clearly remember getting the same, or a very similar, model.
According to Gordon Linoff and Michael Berry's blog:
"Standard statistical techniques are insensitive to the original density of the data. So, a logistic regression run on oversampled data should produce essentially the same model as on the original data. It turns out that the confidence intervals on the coefficients do vary, but the model remains basically the same."
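A quick simulation is one way to check the quoted claim. The sketch below is only an illustration under stated assumptions, not the setup of my campaign model: the data are synthetic with a single predictor, the true coefficients (-3.5 and 1.0), the sample size, and the responder weight of 10 are all made up, and fit_logistic is a hypothetical helper that fits a weighted logistic regression by Newton-Raphson in plain Python. If the model is correctly specified, the slope should come out nearly the same with and without weights, while the intercept should shift by roughly log(10).

```python
import math
import random

def fit_logistic(xs, ys, ws, iters=25):
    """Weighted logistic regression with one predictor, fitted by Newton-Raphson.

    Hypothetical helper for illustration; returns (intercept, slope).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, ws):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - p)           # weighted score contribution
            v = w * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic rare-event data (all values below are assumptions for the demo).
random.seed(0)
TRUE_B0, TRUE_B1, N, WEIGHT = -3.5, 1.0, 20000, 10.0
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * x))) else 0
      for x in xs]

# Fit 1: original data, every row has weight 1.
b0_orig, b1_orig = fit_logistic(xs, ys, [1.0] * N)

# Fit 2: responders up-weighted, which mimics oversampling them.
ws = [WEIGHT if y == 1 else 1.0 for y in ys]
b0_w, b1_w = fit_logistic(xs, ys, ws)

print("event rate:      ", sum(ys) / N)
print("slope original:  ", b1_orig)
print("slope weighted:  ", b1_w)
print("intercept shift: ", b0_w - b0_orig, "vs log(weight) =", math.log(WEIGHT))
```

In this setup the weights mostly move the intercept, so the ranking of customers by predicted score barely changes. That would be consistent with getting "the same or a very similar model" after oversampling.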
But everyone seems to extol or recommend oversampling/undersampling for modeling rare events using logistic regression. What are your experiences and opinions on this?
Regards,
Tags: logistic, oversampling, regression, undersampling
Reply by Jozo Kovac on June 23, 2010 at 4:32pm
Reply by Jeff on June 23, 2010 at 6:50pm
Reply by Joseph Foutz on July 6, 2010 at 2:05pm
Reply by Joseph Hilbe on July 19, 2010 at 11:36am
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC