I am working on a project involving time series data. I need to segment customers who have similar time series trends. Can someone guide me on how to approach such time series clustering problems?
This is Biswajit, representing StatSoft India (www.StatSoft.com). We would appreciate it if you could share your details so that we can discuss this at length. We expect that we can add value to your current project.
I assume when you say "trend" you mean long-run stochastic trend. The only way I know of extracting long-run stochastic trend information is through cointegration techniques. Moving average or ARIMA models will not do it. However, you need a lot of observations to make identification of cointegrated time series meaningful. The data can be analyzed in an unrestricted VAR model and then tested. The CATS for RATS program will do this, as will several other econometric packages. Good luck. -jr
Rubin, thank you for all your suggestions and your time. I will try those models and contact you if I need further advice.
Biswajit, the data consist of a consumer transaction variable collected every month over a 12-month period. We have thousands of consumers. We need to see whether there are common patterns in consumer spending, using clustering techniques.
This is Biswajit, representing StatSoft India. StatSoft has been the creator and publisher of STATISTICA since 1984. We have approximately 6,00,000 users globally and 1,000 in India who trust and refer to our work. We have a product called STATISTICA AUTOMATED NEURAL NETWORK (SANN) for predictive analytics, which addresses regression, time series, clustering, and classification. For a full technical overview you may reach us at biswajit@statsoftindia.com or + 91 98913 93138 (M).
Why don't you try correspondence analysis? Suppose there are 1000 customers and 12 months of transaction data. First, set up the data in a 1000 x 12 matrix of customers (rows) by months (columns). Second, scale each customer's data so that its 12 months of transaction data sum to 1. The rescaling removes size differences among customers. Then perform a correspondence analysis on the scaled matrix.
Suppose that customers have one of 3 different linear trends and there is no other variation in the data. If this is the case the first 3 right eigenvectors will identify these trends. Of course, there is always other variation in the data so the entire set of right eigenvectors must be examined.
The correspondence analysis identifies all sorts of patterns. For example, if one of the customer segments has a quadratic trend, an eigenvector with that pattern will show up. The customers in that segment will have a large coordinate on that component while customers in the other segments will not. Also, if there is a seasonal pattern, correspondence analysis will identify it.
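The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a substitute for a statistics package: the function name `correspondence_analysis`, the synthetic data, and the three made-up customer segments (rising, falling, December peak) are all assumptions for the example. It rescales each customer to sum to 1 and then does the correspondence analysis through an SVD of the standardized residuals.

```python
import numpy as np

def correspondence_analysis(X):
    """Correspondence analysis of a customers-by-months matrix (hypothetical helper).

    Assumes all entries are non-negative and every row and column has a positive sum.
    """
    X = np.asarray(X, dtype=float)
    # Scale each customer's 12 months to sum to 1 (removes size differences).
    profiles = X / X.sum(axis=1, keepdims=True)
    P = profiles / profiles.sum()          # correspondence matrix, sums to 1
    r = P.sum(axis=1)                      # row (customer) masses
    c = P.sum(axis=0)                      # column (month) masses
    # Standardized residuals from the independence model r * c'.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(X.shape) - 1                   # 11 non-trivial dimensions for 12 months
    row_coords = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]   # customers, principal coordinates
    col_coords = Vt[:k].T / np.sqrt(c)[:, None]              # months, standard coordinates
    return row_coords, col_coords, sv[:k] ** 2

# Synthetic data: 300 customers, each following one of 3 made-up monthly shapes.
rng = np.random.default_rng(0)
months = np.arange(12)
shapes = np.array([
    1.0 + 0.2 * months,                    # rising trend
    3.2 - 0.2 * months,                    # falling trend
    np.where(months == 11, 6.0, 1.0),      # flat with a December peak
])
segment = rng.integers(0, 3, size=300)
size = rng.uniform(50, 500, size=(300, 1))        # customer size, removed by the rescaling
noise = rng.uniform(0.9, 1.1, size=(300, 12))     # other variation in the data
X = shapes[segment] * size * noise

row_coords, col_coords, inertia = correspondence_analysis(X)
print(np.round(inertia / inertia.sum(), 3))       # share of inertia per dimension
```

The columns of `col_coords` are the month patterns to inspect, and the rows of `row_coords` say how strongly each customer loads on each pattern.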
Thanks for your suggestion, Goldrick. I used correspondence analysis as you mentioned, but I'm not sure how well it works for mining a large database. Could you please give me some examples or applications on time series data?
Okay. In your application, the correspondence analysis generates 11 eigenvectors in order of their explanatory power, from highest to lowest. The next thing to do is perform a "biplot" of the first 2 dimensions. The biplot will show both customers and months on the same 2-dimensional graph. If customers actually do cluster meaningfully, you will see it in this plot. You will also see which customer clusters are "close" to which months. This will tell you something about which customers increase their purchases around Christmas.
Now, it may be the case that customers cluster on one or more of the other 9 dimensions that have less explanatory power than the first 2 dimensions. You can check for this by performing a biplot on these other dimensions, for example, 3 and 4, 5 and 7, etc.
You will have to look at the patterns of the column eigenvectors to interpret them. For example, if there are 2 column eigenvectors that are monotonically increasing or decreasing from January to December, this probably indicates that a large number of customers follow 1 of 2 different trends. If you perform a biplot of these 2 dimensions you'll be able to see which customers cluster near which trend.
If there are too many customers for the biplot to be visually informative, you can perform hierarchical agglomerative clustering on the row "eigenvectors" for the subset of dimensions of interest (trend, Christmas, etc.).
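For the clustering step, here is a rough sketch using SciPy's hierarchical clustering. The `row_coords` array is a stand-in that I generate at random around three made-up centers; in practice it would be the customers' coordinates on the dimensions of interest from the correspondence analysis. Ward linkage and the choice of 3 clusters are assumptions for the example, not a recommendation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-in for customer coordinates on 2 chosen dimensions (hypothetical data):
# 300 customers scattered around 3 invented centers.
rng = np.random.default_rng(1)
centers = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
true_segment = np.repeat(np.arange(3), 100)
row_coords = centers[true_segment] + rng.normal(scale=0.1, size=(300, 2))

# Agglomerative clustering; Ward linkage merges the pair of clusters that
# gives the smallest increase in within-cluster variance.
Z = linkage(row_coords, method="ward")

# Cut the tree into at most 3 clusters; labels run from 1 to 3.
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])   # number of customers in each cluster
```

With real data the number of clusters is not known in advance, so it is worth cutting the tree at several levels and comparing the results.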
Sorry Goldrick, I was working on another project and wasn't able to respond. I tried hierarchical clustering on the eigenvectors, since we have a lot of consumer data. Our results show that there are peak transactions in almost every month. We should validate these results by looking at other parameters.
Here is my suggestion for this problem: first cluster your customers, then make a 3D histogram with cluster number on one axis and month on the other (most statistical software will do this for you).
This may give you a picture of the spending patterns of each group of customers in different months.
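The table behind such a chart is just the average spending per cluster per month. A small sketch, assuming a customers-by-months array and one cluster label per customer; the helper name `cluster_month_means` and the toy numbers are invented for the example:

```python
import numpy as np

def cluster_month_means(X, labels):
    """Mean spending per cluster (rows) and month (columns); hypothetical helper."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    means = np.vstack([X[labels == k].mean(axis=0) for k in clusters])
    return clusters, means

# Toy data: 4 customers, 12 months. The first two peak in December.
X = np.array([
    [1] * 11 + [5],
    [3] * 11 + [7],
    [4] * 12,
    [2] * 12,
])
labels = np.array([1, 1, 2, 2])

clusters, means = cluster_month_means(X, labels)
print(clusters)
print(means)
```

Each row of `means` is one cluster's monthly profile, which can then be drawn as the heights of a 3D bar chart or, more simply, as one line per cluster.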
Did you mention what platform you are on? Microsoft's Sequence Clustering algorithm in SQL Server Analysis Services is designed to address that kind of a problem--sequence patterns. Also, I would think that web analytics software might have something to offer as sequences of clicks on websites are critical, but I have very limited experience with that.