
Hi, has anyone worked on a clustering project using non-numeric variables? For example, clustering customer behavior based on brand preference, type of product purchased, etc.? I only have SAS EG available and haven't been able to think of a way to do it yet.

Any help would be great!



Replies to This Discussion

Any link to SOM sources?
It would not be my tool of choice; however, here is a link to a SOM in Excel:

http://www.geocities.com/adotsaha/NN/SOMinExcel.html

-Ralph Winters
Multiple correspondence analysis (MCA) may be used as a preliminary step before cluster analysis. SAS PROC CORRESP followed by PROC FASTCLUS or PROC CLUSTER is an interesting combination.
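
For readers without SAS at hand, here is a minimal Python sketch of the first half of that idea: the row coordinates on the first MCA axis, computed as a correspondence analysis of the 0/1 indicator matrix. The function names and the toy records are made up for illustration, and only the leading axis is extracted (by power iteration). The resulting coordinates are what would then be handed to a k-means style procedure such as PROC FASTCLUS; that clustering step is not shown.

```python
import math
import random

def indicator_matrix(records):
    """One-hot encode categorical records into a 0/1 indicator matrix Z."""
    n_vars = len(records[0])
    columns = [(j, level)
               for j in range(n_vars)
               for level in sorted({rec[j] for rec in records})]
    return [[1.0 if rec[j] == level else 0.0 for j, level in columns]
            for rec in records]

def mca_first_axis(records, n_iter=200, seed=0):
    """Row coordinates on the first MCA axis (correspondence analysis of Z)."""
    Z = indicator_matrix(records)
    n, m = len(Z), len(Z[0])
    total = sum(sum(row) for row in Z)
    r = [sum(row) / total for row in Z]                             # row masses
    c = [sum(Z[i][j] for i in range(n)) / total for j in range(m)]  # column masses
    # Standardized residuals: S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), P = Z / total
    S = [[(Z[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(m)] for i in range(n)]
    # Power iteration on S^T S yields the leading right singular vector v of S.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(m)]
    for _ in range(n_iter):
        u = [sum(S[i][j] * v[j] for j in range(m)) for i in range(n)]  # u = S v
        w = [sum(S[i][j] * u[i] for i in range(n)) for j in range(m)]  # w = S^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Row principal coordinates on the first axis: F = D_r^(-1/2) S v
    return [sum(S[i][j] * v[j] for j in range(m)) / math.sqrt(r[i])
            for i in range(n)]

# Toy data (invented): brand, product type, purchase channel per guest.
records = [
    ("A", "snack", "store"), ("A", "snack", "store"),
    ("A", "snack", "online"), ("A", "snack", "store"),
    ("B", "drink", "online"), ("B", "drink", "online"),
    ("B", "drink", "store"), ("B", "drink", "online"),
]
coords = mca_first_axis(records)
for rec, x in zip(records, coords):
    print(rec, round(x, 3))
```

The sign of the axis is arbitrary, so only the relative positions of the rows matter. In this toy data the first four guests land on one side of zero and the last four on the other.
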
First of all, thanks to all of you for the valuable suggestions. I have been working on this on and off for the last couple of months, hence the delay.

I tried out something very simple, since our clients wanted to see "something" very quickly. I created dummy (1 or 0) variables from the categorical variables. For example, xi=1 if brand i is purchased and xi=0 otherwise. With this I ended up with ~30 variables. I also had some numeric variables (distance to closest competitor, guest scores, etc.), which I set aside for the time being because the clients were more interested in the dummy variables. I derived 3 principal components from this dummy-variable space. Once I was satisfied with these principal components, I used them to cluster guests, ending up with 6 clusters. As a sanity check, I ran an ANOVA on these 6 groups for each of the numeric variables to check that the variable differed significantly across the groups. All the ANOVA results showed that at least one cluster was different from the rest. The results were well received, but I know we can do a lot better. Do let me know your thoughts.
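
A rough Python sketch of the two bookend steps described above (the dummy coding and the ANOVA sanity check), assuming Python rather than SAS. The brand values, the distances, and the cluster labels are invented for illustration; the principal components and clustering steps in between are not shown, and the helper names are hypothetical.

```python
def dummy_code(values):
    """Turn one categorical column into 0/1 dummy columns, one per level."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

def anova_f(values, labels):
    """One-way ANOVA F statistic for a numeric variable across cluster labels."""
    groups = {}
    for x, g in zip(values, labels):
        groups.setdefault(g, []).append(x)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-cluster sum of squares: cluster size times squared mean offset.
    ss_between = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
                     for xs in groups.values())
    # Within-cluster sum of squares: squared deviations from each cluster mean.
    ss_within = sum((x - sum(xs) / len(xs)) ** 2
                    for xs in groups.values() for x in xs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented example: brand purchased by six guests.
brands = ["A", "B", "A", "C", "B", "C"]
dummies = dummy_code(brands)
print(dummies["A"])  # [1, 0, 1, 0, 0, 0]

# Hypothetical cluster labels and one numeric variable left out of the clustering.
clusters = [0, 0, 0, 1, 1, 1]
distance_to_competitor = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
f_stat = anova_f(distance_to_competitor, clusters)
print(f_stat)  # 54.0
```

The F statistic still has to be compared against an F distribution with (k - 1, n - k) degrees of freedom to get a p-value, and a significant result only says that at least one cluster mean differs from the others.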

I'd definitely like to try some of the suggestions you've made, e.g. creating the dissimilarity matrix and using the cohesion measure. I am studying these techniques, so any help would be welcome!
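
As a starting point for the dissimilarity matrix, here is a small Python sketch that uses the simple matching dissimilarity (the share of attributes on which two records disagree). This is just one possible choice of measure, the guest records are invented, and the function names are hypothetical. The cohesion measure is not covered here.

```python
def simple_matching_dissimilarity(a, b):
    """Fraction of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def dissimilarity_matrix(records):
    """Symmetric matrix of pairwise dissimilarities with zeros on the diagonal."""
    return [[simple_matching_dissimilarity(a, b) for b in records]
            for a in records]

# Invented guests: (brand, product type, channel).
guests = [
    ("A", "snack", "store"),
    ("A", "snack", "online"),
    ("B", "drink", "online"),
]
D = dissimilarity_matrix(guests)
for row in D:
    print([round(d, 2) for d in row])
```

A matrix like this can be fed to any clustering method that accepts precomputed distances, for example hierarchical clustering.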

Lastly, one of my teammates has access to SAS EM, and he let me know that SOM was also giving great results. It made the clustering output more visually appealing. But it remains to be seen how it compares with other techniques. I guess running tests would be the only way to know :-)

Thanks again.
Anindo,

Here is a partial list of free, open-source predictive analytics tools that can help you cluster categorical values using decision trees or other methods:

http://www.brunocm.com/blog/data-exploration-tools/

Hope this helps.
Hi Anindo,

I am working on a similar kind of project and would like to know how you performed factor analysis on binary data. I have Base SAS and tried PROC FACTOR and PROC PRINCOMP, but they don't seem to work for binary data.
I am now trying correspondence analysis (PROC CORRESP), but I'm not able to interpret the output.
Any help is appreciated.

Thanks,
Hari

Anindo, can you please share what you did after the princomp step and how you calculated the cluster distances, etc.? Code snippets or web examples would be deeply appreciated.

Did you use a normal princomp or some special case? What would be the best method for variable clustering on binary data?

Hi sir,
I'm doing my PhD in the clustering area and need some research titles in clustering. I have just registered, so please send the details to my mail ID s1k1_sk@yahoo.com.
Thank you.
Please have a look at www.co2alarm.com. It is a clustering application for text mining results. The website is green-centric, but the algorithm is domain independent. It is a small Ruby on Rails application. What kind of data do you have?
I want to be like you, a numeric clustering expert.
In the past I've used matching coefficients, multiple correspondence analysis followed by k-means, and "canonical cluster analysis", which uses optimal scaling as the first step. Nowadays I lean towards latent class analysis.
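
To illustrate the matching-coefficient option on 0/1 dummy variables, here is a short Python sketch comparing two common members of that family, the simple matching coefficient and the Jaccard coefficient. The purchase vectors are invented and the function names are hypothetical. Which coefficient suits a given data set is a judgment call.

```python
def simple_matching(a, b):
    """Share of positions where two 0/1 vectors agree (joint absences count)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    """Share of positions with a 1 in both, among positions with a 1 in either."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    # Convention chosen for this sketch: two all-zero vectors get similarity 0.
    return both / either if either else 0.0

# Invented purchase indicators for six brands.
guest_a = [1, 1, 0, 0, 0, 0]
guest_b = [1, 0, 0, 0, 0, 0]
smc = simple_matching(guest_a, guest_b)
jac = jaccard(guest_a, guest_b)
print(round(smc, 3), round(jac, 3))
```

With many dummy variables and few purchases per guest, simple matching is dominated by brands that neither guest bought, while Jaccard ignores those joint absences.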


© 2014   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
