Data Intelligence, Business Analytics
I recently received an assignment from an FMCG company for market segmentation (clustering) using more than 200 variables. As a first step, I plan to carry out data reduction using Factor Analysis. So far I have been testing the linearity of variables in SPSS using scatter plots. However, with this many variables the number of pairwise combinations (200 variables already give 19,900 pairs) is far too large to check with scatter plots. Can anyone suggest how I can test linearity using SPSS?
Reply by Jay Dasari on February 24, 2012 at 3:45pm How about reducing the dimensionality using principal components? I suspect a majority of the variance in the data can be explained by a fraction of the factors.
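As a rough illustration of the principal components suggestion above, here is a minimal Python sketch (not SPSS) that estimates how much of the total variance the first principal component explains. The toy data, the function names, and the use of power iteration on the covariance matrix are all assumptions made for the example. In practice one would usually standardize the variables first and use a statistics package.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric matrix via power iteration.

    Assumes the starting vector is not orthogonal to the leading
    eigenvector, which holds for the toy data below.
    """
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(v[i] * mv[i] for i in range(p))  # Rayleigh quotient

# Hypothetical toy data: three variables, the first two nearly collinear.
rows = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 5.9, 4.0],
    [4.0, 8.2, 1.0],
    [5.0, 9.9, 2.0],
]
cov = covariance_matrix(rows)
total_variance = sum(cov[i][i] for i in range(len(cov)))  # trace
share = top_eigenvalue(cov) / total_variance
print(f"Share of variance on the first component: {share:.2f}")
```

With strongly correlated variables like these, most of the variance lands on the first component, which is the situation the reply anticipates.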
Reply by Ralph Winters on February 28, 2012 at 10:30am Have you tried computing Pearson correlation coefficients for all of your pairs? It should be part of the bivariate analysis.
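To make the all-pairs idea concrete, here is a small Python sketch (not SPSS) that computes Pearson's r for every pair of variables and ranks the pairs by strength, so that only a shortlist needs a scatter plot. The variable names and values are made-up toy data, and the helper names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pairwise_pearson(columns):
    """Return {(name_a, name_b): r} for every unordered pair of columns."""
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(sorted(columns), 2)}

# Hypothetical toy data standing in for the survey variables.
data = {
    "spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "visits": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * spend
    "age": [5.0, 3.0, 4.0, 1.0, 2.0],
}
results = pairwise_pearson(data)

# Weakest linear associations first: these are the pairs worth plotting.
ranked = sorted(results.items(), key=lambda kv: abs(kv[1]))
for (a, b), r in ranked:
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A low |r| only says the linear association is weak. It does not by itself distinguish "unrelated" from "related in a non-linear way", so the shortlisted pairs still deserve a look.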
Reply by Ravi Sangal on March 8, 2012 at 6:59am Thanks, Ralph, but I am not too clear about your reply. If there is correlation among the variables, can that be taken as an indicator of linearity? I personally do not think so!
Reply by Ralph Winters on March 8, 2012 at 11:49am Ravi - Correlation does not necessarily mean linear correlation. But yes, if you are using a Pearson coefficient, it is the linear part that it is measuring. Did you have anything specific in mind?
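A small Python sketch may help show what "the linear part" means here. It compares Pearson's r with Spearman's rank correlation on two made-up relationships: a U-shaped one and a curved but monotonic one. The data and helper names are assumptions for illustration, and the simple ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values (assumes no ties, true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
u_shape = [v ** 2 for v in x]       # y depends entirely on x, but not linearly
growth = [math.exp(v) for v in x]   # monotonic, but strongly curved

r_u = pearson(x, u_shape)           # the linear part is absent
r_growth = pearson(x, growth)       # clearly positive, but below 1
rho_growth = spearman(x, growth)    # the ordering is preserved perfectly

print(f"U-shape: Pearson r = {r_u:.2f}")
print(f"Growth:  Pearson r = {r_growth:.2f}, Spearman rho = {rho_growth:.2f}")
```

In the U-shaped case y is completely determined by x, yet Pearson's r is zero because there is no linear trend to pick up. In the growth case a rank-based measure sees a perfect monotonic relationship while Pearson's r understates it.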
Reply by Ravi Sangal on April 27, 2012 at 5:16am Ralph, sorry I am coming back to you so late. Could you please tell me more about correlation that is not linear? Ravi
In a *pinch*, one quick trick is to use OLS. The first listed assumption of the Gauss-Markov theorem is generally that Y and the Xs are linearly related. Using this to your advantage, choose the continuous variables of interest, select one to be Y (the rest are the Xs), and fit the model using OLS. If a variable is not significant, you can *assume* that it is not linearly related to Y. Then randomly select *at least* 15-25 of the remaining significant variables and run probability plots to check linearity.
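Here is a hedged Python sketch of the screening idea above, simplified to one X at a time rather than a full multiple regression. It fits y = a + b*x by least squares and reports the t-statistic of the slope. The second diagnostic, correlating the residuals with the squared, centered x to look for curvature, is not part of the suggestion above. It is an extra check added here as an assumption, to show why the follow-up plots matter. All data and names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ols(xs, ys):
    """Fit y = a + b*x; return (a, b, t-statistic of b, residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t_stat = b / se_b if se_b > 0 else math.inf
    return a, b, t_stat, residuals

def curvature(xs, residuals):
    """Correlation of residuals with squared centered x (added diagnostic)."""
    mx = sum(xs) / len(xs)
    return pearson([(x - mx) ** 2 for x in xs], residuals)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
y_linear = [2.0 * xi + 1.0 + e for xi, e in zip(x, noise)]
y_curved = [xi ** 2 for xi in x]

_, _, t_lin, res_lin = simple_ols(x, y_linear)
_, _, t_cur, res_cur = simple_ols(x, y_curved)
curv_lin = curvature(x, res_lin)
curv_cur = curvature(x, res_cur)

print(f"linear: t = {t_lin:.1f}, residual curvature = {curv_lin:.2f}")
print(f"curved: t = {t_cur:.1f}, residual curvature = {curv_cur:.2f}")
```

Both slopes come out clearly significant, but only the curved series leaves a strong pattern in its residuals. A significant coefficient alone therefore does not confirm linearity, which is presumably why the plots on the surviving variables are still recommended.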
Reply by Ravi Sangal on April 27, 2012 at 5:42am For a few years I have been helping clients with predictive modeling using SPSS, and I have often used Factor Analysis and Clustering. I am keen to know whether some of these tools can be used on data from an e-commerce site. I am talking to a company that is successfully selling watches, jewellery, goggles, stationery, etc. through an e-commerce site. What can I propose besides the web analytics they are already doing? Suggestions would help. Ravi
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC