Data Intelligence, Business Analytics
I have to create a model to simulate a sales pipeline and forecast sales for 6 months. I have several questions -
1) How can I use time series along with Monte Carlo simulation?
2) Currently I have assumed monthly seasonality and am using Jan 2011 to predict Jan 2012. However, I understand it is not right to base a forecast on a single data point. How can I use multiple data points from the past to make the monthly forecasts?
3) Which is the best software to do this?
It is certainly possible to analyze a time series with Monte Carlo, given certain conditions. The important thing is to have random events that occur at points in time.
Some random events are periodic. For example, the Dow Jones Industrial Average can be measured at the end of every day or week and the value will go up or down by some amount. However, today's closing price will be highly correlated to yesterday's closing price. In this way, commodity prices measured at periodic intervals can be treated as a Random Walk, well-suited to Monte Carlo methods.
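As a minimal sketch of the random-walk idea in Python (the drift and volatility numbers here are purely illustrative, not calibrated to any real price series):

```python
import random

def random_walk(start, n_steps, drift=0.0, vol=1.0, seed=42):
    """Simulate a simple additive random walk: each value is the
    previous value plus a normally distributed increment, so today's
    'price' is highly correlated with yesterday's."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(drift, vol))
    return path

# One simulated path of 250 "closing prices".
path = random_walk(start=100.0, n_steps=250, drift=0.05, vol=1.5)
```

Running many such paths and looking at the spread of endpoints is the basic Monte Carlo move.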
Other random events occur at random points in time. For example, nearly everyone comes down with a cold sooner or later. We can treat the timing of the next cold as a random event. Alternatively, the interval between two successive colds can be treated as a random variable and analyzed with Monte Carlo methods.
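A common way to sketch this second kind of event is to assume exponentially distributed gaps between occurrences (an assumption on my part, chosen for simplicity) and accumulate them into arrival times:

```python
import random

def simulate_arrival_times(rate_per_year, n_events, seed=1):
    """Treat the interval between successive events (e.g. colds) as
    an exponential random variable with the given rate, and sum the
    gaps to get the time of each event."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_events):
        t += rng.expovariate(rate_per_year)  # mean gap = 1/rate
        times.append(t)
    return times

# Times (in years) of the next five events, at an average of two per year.
times = simulate_arrival_times(rate_per_year=2.0, n_events=5)
```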
In Markov Chain Monte Carlo (MCMC) analysis, a Markov event is needed that randomizes behaviors or outcomes in some manner. For a Markov Chain, a succession of events over time is required. Thus, MCMC is very well suited to time series analysis. In conventional MCMC applications, a single object executes a random walk over time, with Markov events occurring intermittently to randomize the subsequent path.
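To make the "random walk with intermittent Markov events" picture concrete, here is a minimal Metropolis-style sampler; the target density (a standard normal) is my own illustrative choice, not anything from the discussion above:

```python
import math
import random

def metropolis(n_samples, step=1.0, seed=0):
    """Minimal Metropolis sampler: the chain random-walks through
    proposed values, accepting or rejecting each move so that, over
    time, the visited points are distributed like the target density
    (here a standard normal)."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    log_p = lambda v: -0.5 * v * v  # log of target density, up to a constant
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_p(proposal) - log_p(x):
            x = proposal  # accept the move
        samples.append(x)   # otherwise stay put (the repeat still counts)
    return samples

samples = metropolis(5000)
```

Each sample depends only on the previous one, which is exactly the Markov property the answer describes.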
Dr. Jon Bjorkman, a statistical astrophysicist at the University of Toledo, has investigated the movement of photons through the atmosphere of a star using Markov Chain Monte Carlo simulations. Dr. Bjorkman has found that, in the case of multiple objects in motion, the entire path can be treated as the random event. In this case, a large number of objects start at the same place at the same time and end up in different locations. While this may seem very abstract, I am presently engaged in applying Dr. Bjorkman's work, along with my own research in emerging time series methodologies, to develop a model to predict future purchase activity on behalf of a major automotive supplier - can't get more practical than that!
As David points out, there are many examples of Monte Carlo methods applied to natural events. I used them myself in my PhD on star formation rates in the early universe, essentially modelling an entire volume of the universe as a time sequence driven principally by Monte Carlo methods. Here is the basic recipe:
1. Identify all the events that lead to your final event, in this case a sale. Your sales pipeline is a chain of events: each event connects to the next with a certain probability, and together they end in a sale.
2. Model each event to the best of your ability. You need to model each event as a probability distribution. Many real-world events are approximately Normal, so if you have only a limited data sample to study your event distribution, you will need some basic statistical tools to fit your Normal curve (here is a good start: http://exceluser.com/explore/statsnormal.htm).
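Fitting a Normal to a small sample just means estimating its mean and standard deviation. A quick sketch (the sample numbers are made up for illustration):

```python
import math

def fit_normal(data):
    """Maximum-likelihood estimates of the mean and standard
    deviation of a Normal distribution from a sample."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(var)

# E.g. days each past deal spent in one pipeline stage (invented numbers).
mean, sd = fit_normal([12, 15, 9, 14, 11, 13, 16, 10])
```

These two numbers then parameterize the event's distribution in the simulation.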
3. Once each event is modelled as a Normal, you use a random number generator. It is very important that the generator be as close to truly random as possible: many approximate RNG algorithms introduce correlations when you use them repeatedly. This was a big problem in my thesis work, so do some background checking by running correlation tests on your generator. (For discussion see http://www.random.org/; for good open C++ libraries see http://www.boost.org/doc/libs/1_36_0/libs/random/index.html.)
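One simple correlation check of the kind mentioned above is the lag-1 autocorrelation of a long run of draws; for a good generator it should sit close to zero (this sketch uses Python's built-in Mersenne Twister as the generator under test):

```python
import random

def lag1_autocorrelation(xs):
    """Correlation between each draw and the next one: a quick
    sanity check that successive outputs are not obviously related."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

rng = random.Random(123)
draws = [rng.random() for _ in range(10000)]
corr = lag1_autocorrelation(draws)  # should be close to zero
```

More thorough test batteries exist, but this catches the grossest failures.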
4. Run event-accumulation runs: start with the probability distribution of your first event model, draw a random number, allocate an event point, determine from your model whether this event point leads to the next event in your sales pipeline, draw another number for that event's distribution, get another point, check whether it meets the criteria to move to the next event, and so on.
5. Run enough iterations of your model to get hits at the end of the pipeline (i.e. sales) so that every time slot accumulates sufficient hits to be statistically meaningful. You will then get a distribution of sales as a function of time (e.g. hours, days, or weeks, depending on how good your model is and how much initial data you have). Over time you will be able to fine-tune your predictive model against actual data. Obviously, the further into the future your prediction reaches, the weaker it becomes, so you may need to rerun your code every day or every hour to take the latest input conditions into account.
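The recipe above can be sketched end to end. Here each pipeline stage is reduced to a single pass probability rather than a full Normal model, and the stage names and conversion rates are invented for illustration:

```python
import random

# Hypothetical pipeline: stage name -> probability a lead clears it.
STAGES = [("contact", 0.40), ("demo", 0.50), ("proposal", 0.60), ("close", 0.70)]

def simulate_sales(n_leads, n_runs=2000, seed=7):
    """Monte Carlo over the pipeline: push each lead through every
    stage, keeping it only if a random draw clears that stage's
    conversion probability; repeating many runs yields a
    distribution of total sales rather than a single point forecast."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        sales = sum(
            1 for _ in range(n_leads)
            if all(rng.random() < p for _, p in STAGES)
        )
        results.append(sales)
    return results

results = simulate_sales(n_leads=100)
expected = 100 * 0.40 * 0.50 * 0.60 * 0.70  # 8.4 sales per run on average
```

Replacing the flat probabilities with fitted per-stage distributions, and adding time-to-convert draws per stage, turns this into the full time-resolved forecast described in step 5.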
I hope the above gives you some idea of how to go about it. However, without more concrete information about the sales process and the other conditioning factors you are trying to model, it is hard for me to give more specific details.
best of luck