
I want to forecast tourist arrivals using time series analysis. I expected to use data from 2000-2013, but because of the civil war the behavior of tourist arrivals is quite different after 2008. Hence I have to use data from 2009 to 2013 only. Since that is not sufficient for time series analysis, I thought I might simulate data, but I do not know whether simulation is a viable solution to my problem. I'd be grateful if anyone can help me with this. Thank you.



Key Performance Indicators: The 75+ Measures Every Manager Needs to Know, by Bernard Marr.

Performance indicators are essential tools that tell you whether your business is on target or veering off course, and using the right indicators will help you deliver the right results. Key Performance Indicators cuts straight to the 75+ KPIs that matter. It explains what key performance indicators are, gives a short overview of each metric, and describes how to use each measure effectively. Worked examples throughout equip you with the skills to understand, assess and interpret the most important aspects of any business. From net profit margin to customer satisfaction, brand equity, Six Sigma level and employee engagement, the book gives you the manageable and essential key indicators. It is a reference guide to these indispensable business evaluation tools: you can dip in and learn about each KPI as and when you need it, or read the book as a whole to help complete your performance management framework, balanced scorecard and business intelligence strategy. Covering the essential 75 financial and non-financial KPIs that every manager needs to know, the book includes a practical example of each indicator, plus tips on data collection, target setting and benchmarking, measurement frequency and risks.

Key Performance Indicators covers:
• Financial perspectives
• Customer perspectives
• Sales and marketing perspectives
• Operational and supply chain perspectives
• Employee perspectives
• Corporate social responsibility

Sample KPIs from the book:
38. Six Sigma Level
39. Capacity Utilisation Rate (CUR)
40. Process Waste Level
41. Order Fulfilment Cycle Time
42. Delivery In Full, On Time (DIFOT) Rate
43. Inventory Shrinkage Rate (ISR)
44. Project Schedule Variance (PSV)
45. Project Cost Variance (PCV)
46. Earned Value (EV) Metric
47. Innovation Pipeline Strength (IPS)
48. Return on Innovation Investment (ROI2)
49. Time to Market
50. First Pass Yield (FPY)
51. Rework Level
52. Quality Index
53. Overall Equipment Effectiveness (OEE)
54. Process or Machine Downtime Level
55. First Contact Resolution (FCR)
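Several of the operational KPIs sampled above are simple ratios. As a minimal illustration (not the book's own worked examples, and using the commonly cited definitions of these measures, which the book may refine), two of them can be computed like this:

```python
# Illustrative sketch of two KPIs from the list above, using their
# commonly cited definitions; all figures are invented for illustration.

def difot_rate(deliveries):
    """Delivery In Full, On Time (DIFOT): the share of deliveries that
    were both complete and on schedule."""
    hits = sum(1 for d in deliveries if d["in_full"] and d["on_time"])
    return hits / len(deliveries)

def inventory_shrinkage_rate(recorded_value, counted_value):
    """Inventory Shrinkage Rate (ISR): inventory lost between the books
    and the physical count, as a share of recorded inventory."""
    return (recorded_value - counted_value) / recorded_value

deliveries = [
    {"in_full": True,  "on_time": True},
    {"in_full": True,  "on_time": False},
    {"in_full": False, "on_time": True},
    {"in_full": True,  "on_time": True},
]
print(difot_rate(deliveries))                    # 2 of 4 deliveries -> 0.5
print(inventory_shrinkage_rate(200000, 196000))  # 4000/200000 -> 0.02
```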

What exactly is big data to an asset manager? It means more than sucking up social media data or spying on satellite data. Those are fine data collection methods, but realistically, when you have a limited budget and want to start from ground zero, what should you do first? And if you do start, what different elements do you have to consider? I have done some thinking about this, and I hope we can share more ideas here.

Consider the supply chain: If you google "Big Data" and "hedge fund", you get lots of cutting-edge ideas. What you do not see is how to manage the data after you have acquired it. One key consideration is how to link the insights you crawl from different industries into one universe. Not all of the stocks in the S&P 500 are consumer related; at least half are not. That means all the talk about "social media" and crawling the news is too distant for a normal institutional investor. Say we are covering a steel producer. The first thing we need to find is the supply chain data of this steel company. If company-level information is not available, you can try to find industry-level information from an input-output table, which describes, on average, for every dollar the company pays out, how it is split across suppliers and employee payroll; and likewise for every dollar earned, where it comes from: whether it is exported, sold to another industry such as cars, or consumed (as raw material for art objects, in the most extreme case).

Existing insight: As a buy-side firm, do you consider managing the existing insight from your fundamental analysts? They do a lot of research on the ground; can those company-level parameters be generalized? Their assumptions about industry growth and industry dynamics can also be fed into a bigger economy model that includes models of the different industries and how they are linked. Indeed, fundamental analysts are producers of big data if you collect their output properly.

Crawling your customers' customers' customers' customers: As I mentioned in a previous blog, the key to crawling or generating insight for your industry economically is to find the drivers. You can become very good at monitoring the steel industry itself, by monitoring shipments or the work hours of steel industry participants, but that may not give you a head start by the time you have built the crawlers and identified a proper data provider. Not all industries are created equal, and not all industries have the same crawlable data available to you. Pragmatically, you can only look for the best available data source with good, reliable quality. An early indicator can be projected car volumes. If you want a head start, you can look at income levels to gauge how many cars will be consumed: wealth has to be accumulated in order to produce enough capital for a car. If you are good, you can also quickly find a proxy for car sales in the key countries and markets. Aggregating this information gives you an idea of how much steel is required. Then you can look at the next industry that uses steel, such as construction, and so on. Gradually, you will build a network of industries with interlinked supplier-customer relationships. If you can monitor your customers' customers' demand, such as sales or existing inventory, you will know how much is required. Coupled with the length of the sales cycle, e.g. three months, you know how long it takes for your customers' customers to pay your customers. With an idea of your customers' inventory and demand, you also know more about your target companies.

Building the capability to crawl, not buying crawlers: There are many different types of data displayed in many different ways. Generally speaking, data on web pages come as text or as pictures, and you need to build the fundamental ability to recognize both; otherwise you will have to keep paying different prices for different crawlers as web technology evolves.

These are the practical considerations an investor should weigh when building the mythical "Big Data" team.
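The input-output idea above can be sketched numerically. The following toy Python example (all coefficients and demand figures invented purely for illustration) shows how a small coefficient matrix links final demand in downstream industries, such as cars and construction, back to the steel required upstream:

```python
import numpy as np

# Toy input-output sketch. A[i, j] is the dollars of industry i's output
# needed per dollar of industry j's output. All numbers are invented.
industries = ["steel", "cars", "construction"]
A = np.array([
    [0.0, 0.3, 0.2],   # steel content per dollar of each industry's output
    [0.0, 0.0, 0.0],   # cars feed nothing back upstream in this toy
    [0.0, 0.0, 0.0],   # construction likewise
])

# Final demand forecast ($m), e.g. proxied from car sales and housing data
final_demand = np.array([0.0, 100.0, 50.0])

# Steel implied directly by the customers' demand: 0.3*100 + 0.2*50
direct = A @ final_demand

# Propagating through customers' customers as well (Leontief inverse);
# identical to `direct` for steel here only because steel uses no steel.
total = np.linalg.solve(np.eye(3) - A, final_demand)

print(direct[0])  # direct steel requirement, $m -> 40.0
print(total[0])   # total steel output needed, $m -> 40.0
```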

Cross-row and group computation often involves computing the link relative ratio and the year-on-year comparison. The link relative ratio compares the current period's data with the previous period's, generally taking a month as the time interval: comparing April's sales amount with March's, for example, gives a growth rate that is April's link relative ratio. An hour, a day, a week or a quarter can also be used as the interval. The year-on-year comparison compares the current data with data from the corresponding period of the previous year: comparing the sales amount of April 2014 with that of April 2013 gives a growth rate that is April's year-on-year comparison. In practical business, data from multiple periods are usually compared to find the variation trend.

Now let's look at how to compute the link relative ratio and the year-on-year comparison in the R language through an example.

Case description: compute the link relative ratio and year-on-year comparison of each month's sales amount during a specified period of time. The data come from the orders table sales, in which column Amount contains the order amount and column OrderDate contains the order date.

Code:

sales<-read.table("E:\\salesGroup.txt",sep="\t",header=TRUE)
filtered<-subset(sales,as.POSIXlt(OrderDate)>=as.POSIXlt('2011-01-01 00:00:00') & as.POSIXlt(OrderDate)<=as.POSIXlt('2014-08-29 00:00:00'))
filtered$y<-format(as.POSIXlt(filtered$OrderDate),'%Y')
filtered$m<-format(as.POSIXlt(filtered$OrderDate),'%m')
agged<-aggregate(filtered$Amount,filtered[,c("m","y")],sum)
agged$lrr<-c(0,(agged$x[-1]-agged$x[-length(agged$x)])/agged$x[-length(agged$x)])
result<-agged[order(agged$m),]
result$yoy<-NA
for(i in 1:nrow(result)){
  if(i>1 && result[i,]$m==result[i-1,]$m){
    result[i,]$yoy<-(result[i,]$x-result[i-1,]$x)/result[i-1,]$x
  }
}

Code interpretation:

1. The first four lines of code are easy to understand.
read.table reads the data from the file, subset filters it, and the two format calls generate the year and the month respectively. Note that in real use the beginning and ending times would be read dynamically from the console with the scan function; here they are simplified to fixed constants.

2. agged<-aggregate(filtered$Amount,filtered[,c("m","y")],sum) sums the order amount for each month of each year. Note that in this code the month must be written before the year, even though the business logic groups by year first and then by month. Otherwise R would group first by month and then by year, producing a result that is inconsistent with the business logic and inconvenient to view.

3. agged$lrr<-c(0,(agged$x[-1]-agged$x[-length(agged$x)])/agged$x[-length(agged$x)]) computes the link relative ratio and stores the result in the new column lrr. The business logic is (order amount of the current month - order amount of the previous month) / order amount of the previous month.

Note: [-N] in R removes the Nth element, so agged$x[-1] is the vector without its first element and agged$x[-length(agged$x)] is the vector without its last element. Operating on these two shifted vectors yields the link relative ratio indirectly. The result cannot include a link relative ratio for the first month (January 2011), so a zero is prepended. The code logic and the business logic share some similarities but are quite different, which makes the code difficult to understand.

4. result<-agged[order(agged$m),] sorts the data by the month; since the rows within each month are already in year order, this effectively sorts by month and then year.
result$yoy<-NA initializes a new column that will store the year-on-year comparison of the sales amount.

5. The loop at the end of the code computes the year-on-year comparison. Business logic: (order amount of the current month - order amount of the same month of the previous year) / order amount of the same month of the previous year. Code logic: starting from the second row, if the month in the current row equals the month in the previous row, compute result[i,]$yoy<-(result[i,]$x-result[i-1,]$x)/result[i-1,]$x. Code written this way is easy to understand, and its logic closely mirrors the business logic. Its only weakness is that it cannot use R's vectorized operations, which makes it a little lengthy; but compared with the hard-to-follow link relative ratio expression, longer but simpler code may be better.

Summary: R can compute both the link relative ratio and the year-on-year comparison, but the link relative ratio expression is difficult to understand and the year-on-year code is a little lengthy. Neither is easy to learn.

Third-party solutions such as Python, esProc and Perl, all of which can perform structured data computation, can also handle this case. Below we briefly look at the esProc and Python solutions.

esProc: esProc is good at expressing business logic freely with an agile syntax, and its code for this case is concise. A groups function groups and summarizes the data by the year and the month, and derive functions (in cells A4 and A6 of the esProc script) generate the link relative ratio and the year-on-year comparison respectively. esProc also uses [-N], but unlike in R it does not mean removing the Nth row; it means the Nth row counted back from the current row.
For example, [-1] is the previous row, so the link relative ratio can be expressed simply as (x-x[-1])/x[-1]. R has no such expression for relative positions, which makes its code harder to understand. For the year-on-year comparison, esProc can use the judgment function if inside a loop function, avoiding a lengthy loop statement; R has only a judgment statement, not a judgment function, which is why its code is lengthy.

Python (Pandas): Pandas is a third-party Python package. Its basic data type was created by imitating R's data frame but has been improved considerably; at the time of writing, its latest version is 0.14. Its code for this case is as follows:

sales=pandas.read_csv('E:\\salesGroup.txt',sep='\t')
sales['OrderDate']=pandas.to_datetime(sales.OrderDate,format='%Y-%m-%d %H:%M:%S')
filtered=sales[(sales.OrderDate>='2011-01-01 00:00:00') & (sales.OrderDate<='2014-08-29 00:00:00')]
filtered['y']=filtered.OrderDate.apply(lambda x: x.year)
filtered['m']=filtered.OrderDate.apply(lambda x: x.month)
grouped=filtered.groupby(['y','m'],as_index=False)
agged=grouped.agg({'Amount':[sum]})
agged['lrr']=agged['Amount'].pct_change()
result=agged.sort_index(by=['m','y'])
result.reset_index(drop=True,inplace=True)
result['yoy']=result.apply(lambda _:numpy.nan,axis=1)
for row_index,row in result.iterrows():
    if(row_index>0 and result.ix[row_index,'m']==result.ix[row_index-1,'m']):
        result.ix[row_index,'yoy']=(result.ix[row_index,'Amount']-result.ix[row_index-1,'Amount'])/result.ix[row_index-1,'Amount']

Here the pct_change() function computes the link relative ratio directly, which is more convenient than the methods used in R and esProc. But this kind of function is not universal and only handles this particular case.
When a link relative ratio or year-on-year comparison over a different interval is required, Pandas can only complete the task by combining the div and shift functions, which makes its code harder to understand than R's. In computing the year-on-year comparison, Pandas' code is as lengthy as R's, because Pandas likewise cannot use an if function inside a loop function; a combination of the apply function and lambda syntax would be needed to write simpler code.

Please pay attention to the following easy-to-get-wrong details:

1. The sorting code must be sort_index(by=['m','y']) when sorting by the month and then the year; the simpler form sort(m), as used in R and esProc, is not allowed.
2. Pandas has the assignment syntax result.loc[row_index,'yoy']=value, but when assigning a value to a single element of a data frame here, the code is written as result.ix[row_index,'yoy']=value.
3. When iterrows() is used for the loop, its loop variable row_index is the index rather than the row number. To make the row number match the index, reset_index() should be used to reset the indexes.
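As a rough sketch of the shift-based approach mentioned above (with invented data, and assuming the monthly rows are complete and consecutive so that a 12-row lag lines up with the same month of the previous year):

```python
import pandas as pd

# Invented monthly sales: two full years, ordered by year then month.
monthly = pd.DataFrame({
    "y": [2013] * 12 + [2014] * 12,
    "m": list(range(1, 13)) * 2,
    "Amount": [100 + i for i in range(24)],
})
monthly = monthly.sort_values(["y", "m"]).reset_index(drop=True)

# Link relative ratio: current month vs. the previous row (lag of 1).
monthly["lrr"] = monthly["Amount"].div(monthly["Amount"].shift(1)) - 1

# Year-on-year: current month vs. the same month a year earlier (lag of 12).
monthly["yoy"] = monthly["Amount"].div(monthly["Amount"].shift(12)) - 1

# Row 12 is January 2014: Amount 112 vs. 111 (Dec 2013) and 100 (Jan 2013),
# so lrr is about 0.009 and yoy is about 0.12.
print(monthly.loc[12, ["lrr", "yoy"]])
```

The same lagging could also be done per group with groupby plus shift, which avoids the row-by-row loop used in the original code.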

We studied university curricula (from computer science, statistics and business schools). The top contenders there are linear programming, regression, clustering, neural networks and SVM. Then we looked at peer groups: the well-known "top 10 algorithms" list is published roughly once every four years, I believe, and the current list includes C5.0, kNN, SVM, EM, k-means, PageRank, CART, Naive Bayes and a few more. We also looked at competition sites like Kaggle to find the winning algorithms: singular value decomposition, restricted Boltzmann machines, random forests and spectral methods seem to be the leaders there. Lastly, we asked industry practitioners. As expected, their focus was on data engineering, feature engineering, cleaning and visualization, with much less emphasis on modeling! They suggested they would also add genetic algorithms to the list as a very important technique, which they almost always use for optimization. While developing the curriculum of the INSOFE programs, we spent a lot of time pondering all of this.

"The text does a great job of showing how to do each step using the data mining tool Rattle and related R concepts as appropriate. This makes it a great tool for someone who does not know much about R and wants to learn more about the powerful options available in R for data mining." (Roger M. Sauter, Technometrics, Vol. 54 (3), August, 2012)

• Encourages the concept of programming with data: more than just pushing data through tools, but learning to live and breathe the data
• Accessible to many readers, not just those with strong backgrounds in computer science or statistics
• Details some of the more popular algorithms for data mining, as well as covering model evaluation and model deployment

Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever-increasing stores of electronic data that abound today. In performing data mining, many decisions need to be made regarding the choice of methodology, data, tools and algorithms. Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on, end-to-end process of data mining, Williams guides the reader through the capabilities of the easy-to-use, free and open source Rattle Data Mining Software, built on the sophisticated R statistical software. The focus on doing data mining rather than just reading about data mining is refreshing. The book covers data understanding, data preparation, data refinement, model building, model evaluation and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.

Content level: Research. Keywords: data mining, R applications, Rattle software, analytics, data exploration, graphical user interfaces, machine learning, model building.

Predictive Analytics World, November 4-5, 2014, in Berlin (http://predictiveanalyticsworld.de/) is the business event for predictive analytics professionals, managers and commercial practitioners, covering today's commercial deployment of predictive analytics across industries and across software vendors. The conference delivers case studies, expertise and resources to achieve three objectives:

Bigger wins: Strengthen the business impact delivered by predictive analytics
Broader capabilities: Establish new opportunities with predictive analytics
Big data: Leverage bigger data for prediction and drive bigger value

PAW's program is packed with the top predictive analytics experts, practitioners, authors and business thought leaders, including keynote addresses from industry heavyweights Prof. Dr. Wil van der Aalst, Eindhoven University of Technology, and Dean Abbott, President, Abbott Analytics, Inc.

CASE STUDIES: How the leading enterprises do it. Predictive Analytics World focuses on concrete examples of deployed predictive analytics. You can hear from the horse's mouth precisely how Fortune 500 analytics competitors and other top practitioners deploy predictive modeling, and what kind of business impact it delivers. PAW Berlin's 2014 program will feature 14 sessions, including case studies from companies such as Abbott Analytics, Activision, GfK, Gartner, Miles&More, Nokia Siemens Networks and many more.

WORKSHOPS: PAW Germany also features a pre-conference workshop with Dean Abbott, President, Abbott Analytics, Inc., that complements the core conference program. Join PAW and access the premier keynotes, sessions, workshops, exposition, expert panel, networking coffee breaks, the networking party at a prime location in Berlin, the DDB Networking Lounge, brand-name enterprise leaders, and industry heavyweights in the business.
CROSS-INDUSTRY APPLICATIONS: Predictive Analytics World is the only conference series of its kind, delivering vendor-neutral sessions across verticals such as banking, financial services, e-commerce, education, government, healthcare, high technology, insurance, non-profits, publishing, social gaming, retail and telecommunications. And PAW covers the gamut of commercial applications of predictive analytics, including response modeling, customer retention with churn modeling, product recommendations, fraud detection, online marketing optimization, human resource decision-making, law enforcement, sales forecasting, and credit scoring. Why bring together such a wide range of endeavors? No matter how you use predictive analytics, the story is the same: predictively scoring customers optimizes business performance. Predictive analytics initiatives across industries leverage the same core predictive modeling technology, share similar project overhead and data requirements, and face common process challenges and analytical hurdles.

Rave reviews from other PAW events:

"I came to PAW because it provides case studies relevant to my industry. It has lived up to the expectation and I think it's the best analytics conference I've ever attended!"
Shaohua Zhang, Senior Data Mining Analyst, Rogers Telecommunications

"Hands down, best applied analytics conference I have ever attended. Great exposure to cutting-edge predictive techniques, and I was able to turn around and apply some of those learnings to my work immediately. I've never been able to say that after any conference I've attended before!"
Jon Francis, Senior Statistician, T-Mobile

Read more: Articles and blog entries about PAW can be found at pawcon.com/pressroom.php

VENDORS. Meet the vendors and learn about their solutions, software and services. Discover the best predictive analytics vendors available to serve your needs; learn what they do and see how they compare.

GET STARTED.
If you're new to predictive analytics, kicking off a new initiative, or exploring new ways to position it at your organization, there's no better place to get your bearings than Predictive Analytics World. See what other companies are doing, witness vendor demos, participate in discussions with the experts, network with your colleagues and weigh your options!

Register today: http://predictiveanalyticsworld.de/anmelden/
Take €100 off the Early Bird or the Advance Two Day Pass registration fee with this posting's promotional discount code: ABR100. Save an additional €100 for each additional attendee from the same company registered at the same time.

What is predictive analytics? See the Predictive Analytics Guide: www.predictiveanalyticsworld.com/guide
If you'd like our informative event updates, sign up at: www.predictiveanalyticsworld.com/signup-us.php
To sign up for the PAW group on LinkedIn, see: www.linkedin.com/e/gis/1005097
Follow PAW International on Twitter: http://twitter.com/pawcon/
Follow PAW DE on Facebook: http://www.facebook.com/pages/Predictive-Analytics-World-Deutschland/307488819294764
For inquiries e-mail sales@risingmedia.com or call Tel.: 089 76704459, Mobil: 0172 8267364

ALL PREDICTIVE ANALYTICS WORLD EVENTS:
Predictive Analytics World Boston – Oct 5-9, 2014 - http://www.pawcon.com/boston/2014/
Predictive Analytics World for Healthcare Boston – Oct 6-7, 2014 - http://www.pawcon.com/health/2014/
Predictive Analytics World London – Oct 29-30, 2014 - http://www.pawcon.com/london/2014/
Predictive Analytics World Berlin – Nov 4-5, 2014 - http://predictiveanalyticsworld.de/
PAW Videos: Available on-demand – www.pawcon.com/video