Data Intelligence, Business Analytics
Permalink Reply by Sandeep Raut on April 27, 2011 at 7:51am
As you may be aware, there are two common methods for reducing the number of variables: principal component analysis and factor analysis.
For the technical details, you can refer to the StatSoft textbook at http://www.statsoft.com/textbook/principal-components-factor-analysis/
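To make the first of those two methods concrete, here is a minimal sketch of principal component analysis done by hand with NumPy. The data are synthetic, and the 90% variance cutoff and every variable name are assumptions chosen for illustration; nothing here comes from the StatSoft text.

```python
import numpy as np

def pca_on_correlation(X):
    """Eigen-decompose the correlation matrix of X (rows = cases, columns = variables).

    Returns eigenvalues in descending order and the matching eigenvectors as columns.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic example: six observed variables driven by two hidden sources.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 2))
X = np.hstack([
    latent[:, [0]] + 0.3 * rng.normal(size=(n, 3)),  # three noisy copies of source 0
    latent[:, [1]] + 0.3 * rng.normal(size=(n, 3)),  # three noisy copies of source 1
])

eigvals, eigvecs = pca_on_correlation(X)
explained = eigvals / eigvals.sum()

# Keep the smallest number of components reaching 90% of total variance
# (the 90% cutoff is an arbitrary illustrative choice).
n_keep = int(np.searchsorted(np.cumsum(explained), 0.90) + 1)

# Project the standardized data onto the retained components.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = Z @ eigvecs[:, :n_keep]
print(n_keep, scores.shape)
```

On data built this way, the six columns should collapse to two components, one per hidden source. Real data rarely separate this cleanly.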
Permalink Reply by Thomas Ball on April 28, 2011 at 6:19am
Permalink Reply by Ralph Winters on April 28, 2011 at 9:08am
Tom, I don't understand your objection to factor analysis (or regression). Surely the time taken to understand the relationships among the factors is comparable to the time needed to understand the hundreds of decision trees that a random forest can produce. And the random forest still wouldn't solve your correlation problem. Any modeling technique that uses correlated variables, whether linear or not, will show the importance of those variables' effects as diminished.
-Ralph Winters
Permalink Reply by Thomas Ball on April 28, 2011 at 9:33am
Ralph,
Thanks for your post. You are correct that the random forest approach doesn't solve the predictor correlation problem. It merely produces a greatly shortened scorecard, or laundry list, of predictors to be used in a further stage of model refinement. But I would suggest that FA, too, merely produces a similar laundry list requiring refinement. If one has run an orthogonal factor solution, the factors are guaranteed to be uncorrelated with one another. There is no such guarantee, however, that the predictors are similarly independent, since each item has a loading on each factor, and that loading may or may not be nonzero. In addition, since the factor solution is developed without reference to the predictors' relationship with the dependent variable (DV), further model refinement is required. I remain unconvinced of the value of the information from FA in the absence of a DV.
Thanks,
Tom
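Both halves of that argument can be illustrated with a small sketch. It uses principal components as a stand-in for an orthogonal factor solution (an assumption; a true factor analysis would estimate loadings differently), and the synthetic data are contrived so that the DV depends on the direction with the least variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Two predictors that share most of their variance.
shared = rng.normal(size=n)
x1 = shared + 0.2 * rng.normal(size=n)
x2 = shared + 0.2 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Hypothetical DV that depends on the *difference* between the predictors.
y = (x1 - x2) + 0.1 * rng.normal(size=n)

# Orthogonal components, extracted without looking at y.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Z @ eigvecs

# The component scores are uncorrelated; the predictors are not.
predictor_corr = np.corrcoef(x1, x2)[0, 1]
score_corr = np.corrcoef(scores, rowvar=False)[0, 1]

# Absolute correlation of each component with the DV.
corr_with_y = [abs(np.corrcoef(scores[:, k], y)[0, 1]) for k in range(2)]

print("predictor correlation:", round(predictor_corr, 3))
print("component correlation:", round(score_corr, 3))
print("share of variance per component:", np.round(eigvals / eigvals.sum(), 3))
print("|corr with DV| per component:", np.round(corr_with_y, 3))
```

In this contrived case the first component carries nearly all the variance yet is almost unrelated to the DV, while the small second component carries the signal. Ranking components by variance alone would discard it, which is the risk of reducing variables without reference to the DV.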
Permalink Reply by Puneet Agarwal on April 28, 2011 at 10:23am
Hi Ralph,
My objection to using factor analysis to reduce the number of predictors is that you lose the explanatory power of the model. A model built on factors becomes very difficult to explain to business partners, and it is nearly impossible to use such a model to help formulate strategy. If the aim is just to build a model, then it's fine, but if it is for a business requirement, the model may be rendered useless.
Thanks,
Puneet
Permalink Reply by Ralph Winters on April 28, 2011 at 10:48am
Tom and Puneet,
Another point I would like to add is that you can use factor analysis to simplify things by highlighting variables that have NO significant loading on any factor. Those variables can simply be dropped from the original model. I have found this to be the case with variables from enterprise databases, where there is much duplication and redundancy. No one forces you to perform factor analysis and actually use the factors!
-Ralph Winters
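A rough sketch of that screening idea follows. It approximates factor loadings with principal component loadings (eigenvectors scaled by the square root of their eigenvalues). The two helper functions, the 0.4 cutoff, the choice of two components, and the variable names are all assumptions made for illustration.

```python
import numpy as np

def component_loadings(X, n_components):
    """Loadings of each variable on the leading principal components.

    A loading here is the correlation between a variable and a component.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def weakly_loading(names, loadings, threshold=0.4):
    """Names of variables whose largest absolute loading is below the threshold."""
    max_abs = np.abs(loadings).max(axis=1)
    return [name for name, m in zip(names, max_abs) if m < threshold]

# Synthetic data: six variables tied to two hidden sources, plus one unrelated column.
rng = np.random.default_rng(2)
n = 1000
latent = rng.normal(size=(n, 2))
X = np.hstack([
    latent[:, [0]] + 0.3 * rng.normal(size=(n, 3)),
    latent[:, [1]] + 0.3 * rng.normal(size=(n, 3)),
    rng.normal(size=(n, 1)),                      # loads on neither source
])
names = ["a1", "a2", "a3", "b1", "b2", "b3", "noise"]

loadings = component_loadings(X, n_components=2)
to_drop = weakly_loading(names, loadings)
print("candidates to drop:", to_drop)
```

Only the unrelated column should be flagged here. The retained variables stay in their original, explainable form, so the factors themselves never have to appear in the final model.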
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC