Is it possible to carry out a cluster analysis using categorical variables?

Tom, I am not sure about "Latent Class" analysis.
I wouldn't recommend recoding categorical variables into numerics. I would stick with decision trees, correspondence analysis, or latent class analysis. You cannot do latent class analysis in SAS using EG, but there is a PROC LCA which will do the trick.

-Ralph Winters
Could you please elaborate on the reason for not converting categorical variables to numeric?
Does PROC LCA work in SAS EG?
Also, if you have any write-ups on Latent Class analysis, that would be really great!
The main reason is that nominal categorical variables do not have an order. For the others, any numeric values you assign are arbitrary. The dummy variable technique is fine for regression, where the effects are additive, but I am not sure how I would interpret dummies in a cluster analysis with multiple levels. Maybe adding a single binary variable would be OK.
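
To make the point about arbitrary codes concrete, here is a minimal Python sketch. The variable "color" and both codings are hypothetical, made up only for illustration. It shows that the distance between two levels of a nominal variable depends entirely on which integers happen to be assigned, while a simple match/mismatch comparison does not.

```python
# Hypothetical nominal variable "color" with three levels.
# Two equally arbitrary integer codings of the same levels.
coding_a = {"red": 1, "green": 2, "blue": 3}
coding_b = {"red": 1, "green": 3, "blue": 2}

def coded_distance(x, y, coding):
    """Absolute difference between two levels after integer recoding."""
    return float(abs(coding[x] - coding[y]))

def matching_distance(x, y):
    """Simple matching: 0 if the levels are equal, 1 otherwise."""
    return 0.0 if x == y else 1.0

# The same pair of levels gets a different distance under each coding.
red_blue_a = coded_distance("red", "blue", coding_a)
red_blue_b = coded_distance("red", "blue", coding_b)

# Matching distance treats every pair of distinct levels the same way.
red_blue_match = matching_distance("red", "blue")
red_green_match = matching_distance("red", "green")

print(red_blue_a, red_blue_b, red_blue_match, red_green_match)
```

Under the first coding red and blue are twice as far apart as under the second, even though nothing about the data changed. A clustering run on the recoded values would inherit that artifact.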

Haven't tried Proc LCA in SAS EG, but it might work in the code node.

-Ralph Winters
Hi Ralph. What are the benefits of Latent Class analysis?
Tom. The two main benefits are the ability to mix differently scaled variables within the model framework, and no distribution assumptions. LCA does not assign an observation to only one group; it computes the posterior probabilities of the observation belonging to each of the groups. Like cluster or factor analysis, the theory seems to be that the variation is explained via "hidden" groups (latent classes) rather than through the variables themselves.
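
As a rough illustration of the posterior probabilities mentioned above, here is a minimal Python sketch. It assumes a two-class model with three yes/no items and the usual local independence assumption. The class sizes and item probabilities are invented for the example; in a real LCA they would be estimated from data by whatever package you use, and that estimation step is not shown here.

```python
# Hypothetical two-class model with three yes/no items.
# All numbers are made up for illustration, not estimated from data.
class_priors = [0.6, 0.4]

# item_probs[c][j] = P(item j answered "yes" | class c)
item_probs = [
    [0.9, 0.8, 0.7],  # class 0
    [0.2, 0.3, 0.1],  # class 1
]

def posterior(response, priors, probs):
    """P(class | response) by Bayes' rule, assuming items are independent within a class."""
    joint = []
    for prior, p in zip(priors, probs):
        likelihood = 1.0
        for answer, p_yes in zip(response, p):
            likelihood *= p_yes if answer == 1 else (1.0 - p_yes)
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# One observation that answered yes, yes, no.
post = posterior([1, 1, 0], class_priors, item_probs)
print(post)
```

The observation is not forced into one class. It gets a probability for each class, and the probabilities sum to 1.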

I can't really recommend a good book on LCA, since it is a relatively new field, and I'm still looking for one myself. Suggest you follow the documentation for whatever package you use.

-Ralph Winters
Ah, so, in other words, save for the two assumptions that you mentioned above, LCA is essentially a form of Factor Analysis. Except, Factor Analysis would be a data reduction technique used on a bunch of same-scale (e.g. 1-10) attitudinal variables; Latent Class Analysis (LCA) would be a data reduction technique that might include a $ spent variable, an attitudinal variable (1-10 scale), and an index variable all in one 'Latent Variable'. But the probability table that is output is pretty much the same as the Factor Analysis one, is it not?

In essence, a 'Factor' in Factor Analysis is akin to a 'Latent Variable' in Latent Class Analysis?
Yes, you are correct. A factor is similar to a Latent Class variable. Good synopsis.

-Ralph Winters
This problem is very simple now. There is a procedure called "Two Step Cluster Analysis", which can use both categorical and continuous variables. All the leading software packages include this procedure. Please look into its assumptions. This technique is useful for a wide class of problems.
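
To give a feel for how categorical and continuous variables can sit in one comparison, here is a minimal Python sketch of a Gower-style dissimilarity. This is not the Two Step procedure itself, only a simpler stand-in for illustration. The customer records and column roles are hypothetical.

```python
# Hypothetical customers: (spend in dollars, satisfaction 1-10, segment label).
customers = [
    (120.0, 7, "retail"),
    (80.0, 9, "retail"),
    (400.0, 2, "wholesale"),
]
NUMERIC = (0, 1)    # column indexes treated as numeric
CATEGORICAL = (2,)  # column indexes treated as nominal

def column_ranges(rows, numeric_cols):
    """Range (max - min) of each numeric column, used to scale differences."""
    return {j: max(r[j] for r in rows) - min(r[j] for r in rows)
            for j in numeric_cols}

def gower(a, b, ranges):
    """Average per-variable dissimilarity, each term scaled to lie in [0, 1]."""
    parts = []
    for j, rng in ranges.items():
        # Numeric term: absolute difference divided by the column range.
        parts.append(abs(a[j] - b[j]) / rng if rng else 0.0)
    for j in CATEGORICAL:
        # Nominal term: 0 on a match, 1 on a mismatch.
        parts.append(0.0 if a[j] == b[j] else 1.0)
    return sum(parts) / len(parts)

ranges = column_ranges(customers, NUMERIC)
d01 = gower(customers[0], customers[1], ranges)
d02 = gower(customers[0], customers[2], ranges)
print(d01, d02)
```

The first two customers come out much closer to each other than the first and third, with the dollar amount, the rating, and the label each contributing on the same 0-to-1 scale.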
Yes. The most important part of cluster analysis is the measure of "statistical distance" between two data points, which takes numerous forms for both numerical and categorical variables. Try googling keywords like "distance measure of categorical variables"; I am sure you will find something useful.
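
One of the simplest such measures is the simple matching dissimilarity: the fraction of variables on which two records disagree. A minimal Python sketch follows. The records and variable names are hypothetical.

```python
# Hypothetical records: (region, plan, channel), all nominal.
records = [
    ("north", "basic", "web"),
    ("north", "premium", "web"),
    ("south", "premium", "store"),
]

def simple_matching(a, b):
    """Fraction of variables on which two records disagree (0 = identical)."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

# Pairwise dissimilarity matrix, symmetric with zeros on the diagonal.
distance_matrix = [[simple_matching(a, b) for b in records] for a in records]

for row in distance_matrix:
    print(row)
```

A matrix like this can be passed to any clustering method that accepts precomputed distances, for example hierarchical clustering.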

I've never heard the name 'Latent Class Analysis', but from this discussion, it seems to me that it is a Structural Equation Model. Am I right?

Anyway, there are a lot of distance measures for categorical data, just like Simon said. You can surely use them; just make sure to take a look at several before choosing, as they vary depending on your final objectives and might need (most likely WILL need) some data recoding.

Hi,

To treat categorical variables in segmentation, first run a canonical discriminant analysis. After getting the "Can" results through ncan in SAS, run the final cluster analysis for the segmentation.

© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
