Data Intelligence, Business Analytics
I'm currently cleaning data for a regression model. The data has variables such as the average interest rate, stock market indices, etc. Could you please let me know whether it would make sense to transform the data? I did a simple calculation of the variance and the standard deviation of the variables and noticed that they were pretty high for these indicators.
Hence I was wondering: would it make sense to transform the data?
Thanking you in advance.
A rule of thumb I was recently made aware of: if the data you are attempting to model has more than a 10-fold difference between the min and max values, then it may be worth taking the natural log of the data to get a distribution closer to Normal. Keep in mind that the log only works for strictly positive values.
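In case it helps, here is a minimal sketch of that rule of thumb in Python with NumPy. The function name `spans_more_than_tenfold`, the threshold parameter, and the index levels are all made up for illustration; treat it as one way to apply the check, not a standard routine.

```python
import numpy as np

def spans_more_than_tenfold(values, ratio_threshold=10.0):
    """Return True if max/min exceeds the threshold (the rule of thumb above)."""
    values = np.asarray(values, dtype=float)
    if values.min() <= 0:
        # The natural log is undefined for zero or negative values,
        # so the rule of thumb does not apply as stated.
        return False
    return bool(values.max() / values.min() > ratio_threshold)

# Made-up stock index levels, for illustration only.
index_levels = np.array([120.0, 450.0, 980.0, 2300.0, 5100.0])

if spans_more_than_tenfold(index_levels):
    transformed = np.log(index_levels)  # natural log
else:
    transformed = index_levels

print(transformed)
```

A series such as an interest rate that moves between 2 and 4 would not trigger the rule, so it would be left as it is.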
If you are performing simple linear regression, I would examine the residuals of the fitted model. From there you can determine whether you need to add higher order terms or transform the independent variable. Plot the residuals against the predicted values or the predictor variable values: if the residuals show a random scatter with no pattern, you know you do not have to transform the variable or add a higher order term.
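A rough sketch of that residual check, using only NumPy and simulated data (the data, the seed, and the "thirds" summary are my own assumptions for illustration). In practice you would scatter-plot `residuals` against `x` and look at it; the bin means below are just a numeric stand-in for eyeballing the plot.

```python
import numpy as np

# Simulated data where the true relationship is curved (assumption for the demo).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Average residual in the lower, middle, and upper third of x.
# Random scatter would give means near zero in every third.
low, mid, high = (part.mean() for part in np.array_split(residuals, 3))
print(f"low: {low:.2f}, middle: {mid:.2f}, high: {high:.2f}")
```

Here the outer thirds come out positive and the middle third negative, which is the U shape that hints at a missing higher order term.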
For multiple regression, I believe you can go through the same process for each individual variable, plotting the residuals against each one in turn. Feel free to correct me if I am wrong.
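Something like the following is what I have in mind for the multiple regression case, again on simulated data. The `curvature_score` helper is a hypothetical, crude measure I made up for this sketch (correlation between the residuals and the squared, centered predictor); it is not a standard diagnostic and does not replace looking at the plots.

```python
import numpy as np

# Simulated data: y is linear in x1 but curved in x2 (assumption for the demo).
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x1 + 0.3 * x2**2 + rng.normal(0.0, 1.0, n)

# Ordinary least squares with an intercept and both predictors entered linearly.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

def curvature_score(predictor, resid):
    """Hypothetical helper: |correlation| of residuals with the squared, centered predictor."""
    centered_sq = (predictor - predictor.mean()) ** 2
    return abs(np.corrcoef(centered_sq, resid)[0, 1])

scores = {"x1": curvature_score(x1, residuals),
          "x2": curvature_score(x2, residuals)}
print(scores)
```

The score for `x2` should stand out while the one for `x1` stays small, matching what you would see if you plotted the residuals against each variable separately.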
Thank you for your reply!!
One quick question: what do you mean by a higher order term?
Like a quadratic or cubic term of that variable (x^2 or x^3). If the model is missing one, you will see a parabolic or cubic shape in the residuals.
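To make that concrete, here is a small sketch comparing a straight-line fit with a fit that includes a quadratic term. The data are simulated with a known x^2 effect, so the numbers are illustrative assumptions rather than anything from your dataset.

```python
import numpy as np

# Simulated data with a genuine quadratic effect (assumption for the demo).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# Straight-line fit versus a fit with an added x^2 term.
linear_coefs = np.polyfit(x, y, deg=1)
quadratic_coefs = np.polyfit(x, y, deg=2)  # [x^2 coef, x coef, intercept]

linear_resid = y - np.polyval(linear_coefs, x)
quadratic_resid = y - np.polyval(quadratic_coefs, x)

# Residual sum of squares for each model.
rss_linear = float(np.sum(linear_resid**2))
rss_quadratic = float(np.sum(quadratic_resid**2))
print(f"RSS linear: {rss_linear:.1f}, RSS quadratic: {rss_quadratic:.1f}")
```

Once the quadratic term is in the model, the parabolic shape in the residuals should disappear and the residual sum of squares drops sharply.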