Data Intelligence, Business Analytics
I have a question: if we are using proc varclus to eliminate redundancy among the independent variables (IVs), how do we go about selecting the cluster representatives? I know that the lower a variable's (1-R^2) ratio, the better it is as a representative. However, if we also weigh other factors, such as business sense or a variable's univariate chi-square, should we select representatives that have a higher univariate chi-square or make more business sense even if they have a higher (1-R^2) ratio? Please advise!
Permalink Reply by RockyRambo on October 11, 2012 at 1:35pm Or should we instead select the top 5 or top 10 variables per cluster and look at the other statistics later on?
Permalink Reply by Ralph Winters on October 20, 2012 at 5:17pm Varun, I haven't used varclus for a while, but I would say that you could swap one variable for another if it made better business sense. The 1-R^2 ratio is only a guide. Also, look at the relationship between the 2 candidate variables. They should be correlated.
-Ralph Winters
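For readers who want to see what the 1-R^2 ratio measures, here is a minimal Python sketch. It assumes the usual definition, (1 - R^2 with own cluster) / (1 - R^2 with next closest cluster), and it assumes each cluster is summarized by the first principal component of its standardized members. The function name, the synthetic data, and the fixed cluster labels are all made up for illustration; this does not reproduce the splitting algorithm that proc varclus runs, only the ratio for a clustering you already have.

```python
import numpy as np

def one_minus_r2_ratios(X, labels):
    """Return (1 - R2_own) / (1 - R2_nearest) for every column of X.

    X      : (n_samples, n_vars) array of numeric variables
    labels : cluster label for each column (at least two distinct clusters)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Standardize so every variable contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Cluster component = first principal component of the cluster's members.
    components = {}
    for c in np.unique(labels):
        u, s, _ = np.linalg.svd(Z[:, labels == c], full_matrices=False)
        components[c] = u[:, 0] * s[0]

    ratios = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Squared correlation of variable j with every cluster component.
        r2 = {c: np.corrcoef(Z[:, j], comp)[0, 1] ** 2
              for c, comp in components.items()}
        own = r2[labels[j]]
        nearest = max(v for c, v in r2.items() if c != labels[j])
        ratios[j] = (1.0 - own) / (1.0 - nearest)
    return ratios

# Synthetic example: two unrelated latent factors, three variables on each.
rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.normal(size=(2, n))
noise = rng.normal(size=(n, 6))
X = np.column_stack([
    f1 + 0.1 * noise[:, 0],   # tight member of cluster 0
    f1 + 0.1 * noise[:, 1],   # tight member of cluster 0
    f1 + 1.0 * noise[:, 2],   # noisy member of cluster 0
    f2 + 0.1 * noise[:, 3],   # tight member of cluster 1
    f2 + 0.1 * noise[:, 4],   # tight member of cluster 1
    f2 + 1.0 * noise[:, 5],   # noisy member of cluster 1
])
labels = [0, 0, 0, 1, 1, 1]
ratios = one_minus_r2_ratios(X, labels)
print(np.round(ratios, 3))
```

A small ratio means the variable tracks its own cluster closely and is nearly unrelated to the others, which is why it is used as a guide for choosing a representative. In the synthetic data the noisy members end up with clearly larger ratios than the tight ones.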
Permalink Reply by Edmund Freeman on October 24, 2012 at 12:01pm I would go with the business sense here. One of the things I like about proc varclus is that it turns a really hard problem for humans (picking out some variables from hundreds) into a bunch of very reasonable ones (picking one or two variables out of 10 or so).
Permalink Reply by RockyRambo on October 24, 2012 at 12:21pm Thanks, Ralph and Edmund. In fact, I used 100 such clusters and then looked at each variable in each cluster, starting from the one with the lowest (1-R^2) ratio, and left out the variables that were 'redundant'.
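The workflow discussed in this thread (shortlist the lowest-ratio variables in each cluster, then let business sense override the default pick) could be sketched in Python roughly as follows. The variable names, ratio values, cluster numbers, and both helper functions are hypothetical and exist only to illustrate the idea; in practice the ratios would come from the proc varclus output.

```python
def shortlist(ratios, clusters, names, k=2):
    """Return the k lowest-ratio variable names for each cluster."""
    result = {}
    for c in sorted(set(clusters)):
        members = [i for i, lab in enumerate(clusters) if lab == c]
        members.sort(key=lambda i: ratios[i])
        result[c] = [names[i] for i in members[:k]]
    return result

def pick_representatives(ratios, clusters, names, overrides=None):
    """Pick one variable per cluster: the lowest ratio, unless overridden.

    overrides maps a cluster label to a variable name chosen by the analyst,
    for example because it makes better business sense.
    """
    overrides = overrides or {}
    chosen = {}
    for c in sorted(set(clusters)):
        member_names = [n for n, lab in zip(names, clusters) if lab == c]
        if c in overrides:
            # A swap only makes sense within the same cluster.
            if overrides[c] not in member_names:
                raise ValueError(f"{overrides[c]!r} is not in cluster {c}")
            chosen[c] = overrides[c]
        else:
            chosen[c] = shortlist(ratios, clusters, names, k=1)[c][0]
    return chosen

# Made-up values for illustration only.
names = ["income", "salary", "bonus", "age", "tenure"]
clusters = [1, 1, 1, 2, 2]
ratios = [0.12, 0.20, 0.45, 0.10, 0.30]

top2 = shortlist(ratios, clusters, names, k=2)
default_pick = pick_representatives(ratios, clusters, names)
business_pick = pick_representatives(ratios, clusters, names,
                                     overrides={1: "salary"})
print(top2)
print(default_pick)
print(business_pick)
```

Restricting an override to members of the same cluster keeps the swap consistent with the advice above: the replacement should be correlated with the variable it displaces.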
© 2013 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC