Data Intelligence, Business Analytics
Below is a quote regarding logistic regression. It seems to say that OLS regression requires the independent variables to be normally distributed. In my experience, most independent variables in real datasets are not normally distributed. Could anyone comment on this?
Source: http://www.sagepub.com/upm-data/5081_Spicer_Chapter_5.pdf, page 13
"The assumptions required for statistical tests in logistic regression are far less restrictive than those for OLS regression. There is no formal requirement for multivariate normality, homoscedasticity, or linearity of the independent variables within each category of the dependent variable."
Permalink Reply by Vincent Granville on July 4, 2012 at 6:13pm They don't need to be; indeed, you can have variables that are binary, or dummy variables. What can cause problems, though, is highly skewed variables, e.g. a variable taking the value zero 98% of the time (as can happen in fraud detection). Other sources of problems are residual errors that are not independent, and strong cross-correlations. All these issues can be addressed by
Permalink Reply by Sean Flanigan on July 19, 2012 at 11:38pm Given that dummy codes are not normally distributed, does this affect the usual business phrasing, "on average, a unit increase in x produces an increase in y", when neither variable is normally distributed? That phrasing seems to rest on the assumption that the mean is the best predictor for both distributions, which is not the case for price-elasticity distributions. If so, what would be a better phrase, one that leaves out the "on average" part? Or is the "on average" consideration not really relevant at all?
The reason I am asking is that the binary representation of the category can be replaced by the within-category means, and the coefficient will be close to 1 in a univariate model.
Permalink Reply by Sean Flanigan on July 20, 2012 at 12:55am Sorry, I meant the within category means of the DV, not the IV.
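The point about within-category means can be checked with a small simulation. This is only a sketch on made-up data: the variable names, the sample size, and the "true" effect of 2.0 are assumptions for illustration and do not come from the thread. It replaces a binary dummy with the mean of the DV within each category and fits a univariate least-squares line.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: a binary dummy IV and a DV that depends on it.
group = rng.integers(0, 2, size=n)
y = 3.0 + 2.0 * group + rng.normal(0.0, 1.0, size=n)

# Replace each dummy value by the mean of the DV within its category.
category_means = np.array([y[group == 0].mean(), y[group == 1].mean()])
x_means = category_means[group]

# Univariate least-squares fit of y on the within-category means.
slope, intercept = np.polyfit(x_means, y, 1)
print(f"slope={slope:.6f}, intercept={intercept:.6f}")
```

In this setup the slope comes out at 1 and the intercept at 0, up to floating-point error, because the fitted values of a dummy-only regression are the category means themselves.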
Permalink Reply by Sean Flanigan on July 19, 2012 at 6:23am All you have to do is transform the shape of the distribution.
There can be hidden distributions within ranges of the IV, so there are techniques to transform the whole distribution, or to restrict the range of the IV and transform the values within that range. The usual validation rules still apply.
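As a rough illustration of transforming the shape of a distribution, the sketch below applies a log transform to a simulated right-skewed variable. The lognormal data and the `skewness` helper are assumptions made for the example; the thread does not say which transformation was meant, and a log is only one common choice.

```python
import numpy as np

def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(7)

# Hypothetical right-skewed IV (lognormal, so strictly positive).
x_raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Log transform; valid here only because every value is positive.
x_log = np.log(x_raw)

skew_raw = skewness(x_raw)
skew_log = skewness(x_log)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

A variable that is zero most of the time, such as the fraud example earlier in the thread, would not be fixed by a log transform alone, since the mass at zero stays where it is.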
Permalink Reply by Ralph Winters on July 19, 2012 at 8:39pm
You are correct in saying most independent variables are not normally distributed, in that (in classical statistics) predictors come from designed experiments and are not random in that sense. But more importantly, the usual assumption of normality applies to the distribution of the prediction errors (observed - predicted), not to the independent variables.
-Ralph Winters
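To make the distinction between predictors and errors concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration: the coefficients, the exponential and binary predictors, and the `skewness` helper are not from the thread. It fits OLS with two clearly non-normal predictors and normal errors, then compares the skewness of a predictor with the skewness of the residuals.

```python
import numpy as np

def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(42)
n = 5000

# Two non-normal predictors: one right-skewed, one binary dummy.
x_skewed = rng.exponential(scale=2.0, size=n)
x_dummy = rng.integers(0, 2, size=n)

# The errors, not the predictors, are drawn from a normal distribution.
errors = rng.normal(0.0, 1.0, size=n)
y = 1.0 + 0.5 * x_skewed + 2.0 * x_dummy + errors

# OLS fit via least squares on the design matrix [1, x_skewed, x_dummy].
X = np.column_stack([np.ones(n), x_skewed, x_dummy])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("estimated coefficients:", np.round(beta, 3))
print("skewness of skewed predictor:", round(skewness(x_skewed), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
```

In this simulated case the coefficients are recovered closely and the residuals look roughly symmetric even though neither predictor is anywhere near normal.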
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC