Hi,
I'm currently cleaning data for a regression model. The data includes variables such as the average interest rate and stock market indices. I did a simple calculation of the variance and standard deviation of each variable and noticed that both were quite high for these indicators. Would it make sense to transform the data?
Thanking you in advance.
Regards,
Nimish
Reply by Sergio Del Vecchio on August 19, 2012 at 8:00pm
A rule of thumb I was recently made aware of: if the data you are attempting to model spans more than a 10-fold difference between its minimum and maximum values, it may be worth taking the natural log of the data to make its distribution closer to normal.
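A minimal Python sketch of that rule of thumb, using only the standard library. The function names `spans_tenfold` and `log_transform` and the sample index levels are hypothetical, made up for illustration. It assumes every value is strictly positive, because both the max/min ratio and the natural log break down at zero or below.

```python
import math

def spans_tenfold(values):
    """Rule of thumb: does max/min exceed 10?
    Assumes strictly positive values (the ratio is meaningless otherwise)."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("rule of thumb and natural log need positive values")
    return hi / lo > 10

def log_transform(values):
    """Natural-log transform; only defined for strictly positive values."""
    return [math.log(v) for v in values]

# Hypothetical stock index levels, invented for this example only.
index_levels = [120.0, 340.0, 980.0, 2150.0, 4800.0]
if spans_tenfold(index_levels):
    logged = log_transform(index_levels)
    print([round(v, 3) for v in logged])
```

Variables that can be zero or negative (an interest rate, for instance) would need a different transformation or no log at all, so the check raises an error rather than silently producing invalid output.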
Reply by Kevin Pedde on August 21, 2012 at 12:02pm
Hi Nimish,
If you are performing simple linear regression, I would examine the residuals of the model. From there you can determine whether you need to add higher-order terms or transform the independent variable. If you plot the residuals against the predicted values (or against the predictor values), they should form a random scatter with no visible pattern; when they do, you know you do not have to transform the variable or add a higher-order term.
For multiple regression, I believe you can plot the residuals against each predictor individually and follow the same process. Feel free to correct me if I am wrong.
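To make the residual check concrete, here is a minimal standard-library Python sketch for simple linear regression. The helper names `fit_simple_ols` and `residuals` and the data are hypothetical; the data is deliberately curved (y = x²) so the residuals show a pattern instead of random scatter. It prints the (fitted, residual) pairs you would otherwise plot with a charting library.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def residuals(x, y, b0, b1):
    """Observed minus fitted values."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data that is actually quadratic (y = x**2).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 9, 16, 25, 36, 49, 64]
b0, b1 = fit_simple_ols(x, y)
res = residuals(x, y, b0, b1)
fitted = [b0 + b1 * xi for xi in x]
for f, r in zip(fitted, res):
    print(f"fitted={f:6.1f}  residual={r:5.1f}")
```

With this data the residuals come out as 7, 1, -3, -5, -5, -3, 1, 7: positive at both ends and negative in the middle. That U shape is the kind of non-random pattern that suggests the straight-line model is missing curvature.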
Reply by Nimish on August 22, 2012 at 5:56am
Hi Kevin,
Thank you for your reply!!
One quick question: what do you mean by a higher-order term?
Cheers,
Nimish
Reply by Kevin Pedde on August 22, 2012 at 7:14am
Like a quadratic (x²) or cubic (x³) term of that variable. When one is needed, you will see a parabolic or cubic shape in the residuals.
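A minimal Python sketch of adding a quadratic term, using only the standard library. It fits y = b0 + b1·x + b2·x² by solving the least-squares normal equations with Gaussian elimination; the helper names `fit_polynomial` and `residuals` and the exactly quadratic sample data are hypothetical. In practice a library routine (for example NumPy's polynomial fitting) would be the usual choice; this version only shows the idea.

```python
def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] for
    y = b0 + b1*x + ... + b_degree*x**degree, via the normal equations."""
    m = degree + 1
    # Build X^T X (m x m) and X^T y (length m) from powers of x.
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    xty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(m)]
    # Augmented matrix, then Gaussian elimination with partial pivoting.
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (a[i][m] - s) / a[i][i]
    return coef

def residuals(x, y, coef):
    """Observed minus fitted values for polynomial coefficients coef."""
    return [yi - sum(c * xi ** k for k, c in enumerate(coef))
            for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [xi ** 2 for xi in x]  # made-up, exactly quadratic data
linear = residuals(x, y, fit_polynomial(x, y, 1))
quadratic = residuals(x, y, fit_polynomial(x, y, 2))
print("straight line:", [round(r, 6) for r in linear])
print("with x^2 term, largest residual:", max(abs(r) for r in quadratic))
```

The straight-line residuals trace a parabola, while the model with the x² term leaves residuals that are essentially zero (only floating-point noise) for this noise-free data. On real data you would instead expect the pattern to disappear into random scatter.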
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC