Logistic regression is widely used to model response, attrition, risk, and similar outcomes, and in the business world these are usually rare occurrences.
A widely accepted practice for modeling these rare events is oversampling or undersampling. Some time back, I was working on a campaign response model using logistic regression. Frustrated with the model's performance/accuracy, I used weights to oversample the responders. I remember clearly that I got the same, or a very similar, model.
According to Gordon Linoff and Michael Berry's blog:
"Standard statistical techniques are insensitive to the original density of the data. So, a logistic regression run on oversampled data should produce essentially the same model as on the original data. It turns out that the confidence intervals on the coefficients do vary, but the model remains basically the same."
Yet oversampling/undersampling seems to be widely recommended for modeling rare events with logistic regression. What are your experiences and opinions on this?