Subscribe to Vincent Granville's Weekly Digest:

The next revolution in analytics: it's not about software, it's about data

It is about integrating external data sources into your data warehouse, and leveraging this data to answer questions such as "why did we lose so many users last month?", "why do we have so few new users recently?", or "what new product or feature should I build?". The answer (and the cure) might not come from within your internal data, but from the outside:

- what are my competitors up to?
- what do my clients / employees write on Facebook, Twitter or elsewhere?

The external data in question can be gathered through surveys, or gathered and analyzed with web crawling and text mining techniques that automatically find out and summarize what is being said about your company (and about your competitors). Combined with internal data, it could answer critical business questions.
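Here is a minimal sketch of "combined with internal data". It assumes the external posts have already been crawled into (month, text) pairs, and it uses a crude keyword list in place of real text mining or sentiment analysis. The brand names, keywords, posts and churn figures are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical crawled posts: (month, text). Real ones would come from a crawler.
posts = [
    ("2011-01", "AcmeCo support is terrible, thinking of switching"),
    ("2011-01", "Love the new RivalCorp feature"),
    ("2011-02", "AcmeCo billing problems again, terrible"),
    ("2011-02", "AcmeCo app crashed, switching to RivalCorp"),
]

# Crude keyword lexicon standing in for a real sentiment model (assumption).
NEGATIVE_WORDS = {"terrible", "problems", "crashed", "switching"}

def negative_mentions(posts, brand):
    """Count posts per month that mention `brand` and contain a negative keyword."""
    counts = Counter()
    for month, text in posts:
        words = set(text.lower().replace(",", " ").split())
        if brand.lower() in words and words & NEGATIVE_WORDS:
            counts[month] += 1
    return counts

# Hypothetical internal data: users lost per month, from the data warehouse.
internal_churn = {"2011-01": 120, "2011-02": 310}

def join_external_internal(posts, brand, churn):
    """Line up internal users lost with external negative mentions, by month."""
    ext = negative_mentions(posts, brand)
    return {month: (lost, ext.get(month, 0)) for month, lost in sorted(churn.items())}

for month, (lost, neg) in join_external_internal(posts, "AcmeCo", internal_churn).items():
    print(f"{month}: users lost={lost}, negative external mentions={neg}")
```

Lining up the two series by month is only a starting point. A rise in negative external mentions alongside internal churn suggests an outside cause, but it does not prove one.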

In my opinion, the potential of properly exploiting external data far exceeds the results you could get from improved software (cloud, analytics as a service, hidden decision trees, etc.). I believe the future of analytics is more about finding relevant data (and identifying the right metrics, which is absolutely critical and which I will discuss later) than about software improvements.



Comment by John Gins on September 16, 2011 at 9:59am

When I worked at Hyperparallel, before Yahoo bought it, one of my fellow researchers worked on a project studying real-time traffic patterns through a store. He was able to use GIS-related software to analyze time of day against travel patterns and against trip mission (based on what was bought).

In the Northeast, Stop and Shop was using Smart Baskets and buyers' loyalty card information to inform shoppers of deals targeted toward them. A lot of propensity modeling was going on in the background to make that possible.
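The comment does not say which models were used. As an illustration only, here is how a logistic propensity score computed from loyalty-card features might decide who gets a targeted deal. The feature names, coefficients and threshold are hypothetical. In practice the coefficients would be fitted, for example by logistic regression on past offer responses.

```python
import math

# Hypothetical coefficients of an already-fitted logistic propensity model.
INTERCEPT = -3.0
COEFFS = {
    "visits_last_30d": 0.15,        # store visits in the last 30 days
    "bought_category_before": 1.2,  # 1 if the shopper bought in the offer's category
    "basket_value_avg": 0.01,       # average basket value in dollars
}

def propensity(features):
    """Probability that a shopper responds to an offer (logistic link)."""
    z = INTERCEPT + sum(COEFFS[name] * features.get(name, 0.0) for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

def targeted_offers(shoppers, threshold=0.5):
    """Return card IDs whose propensity meets the (hypothetical) threshold."""
    return [sid for sid, feats in shoppers.items() if propensity(feats) >= threshold]

shoppers = {
    "card_001": {"visits_last_30d": 8, "bought_category_before": 1, "basket_value_avg": 90},
    "card_002": {"visits_last_30d": 1, "bought_category_before": 0, "basket_value_avg": 20},
}
print(targeted_offers(shoppers))
```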

Comment by Deepak Babu P R on September 15, 2011 at 3:06am

I see a lot of comments noting that the use of GIS information is increasingly gaining acceptance, and I have a point in support of that. With Foursquare (location-based services), we know when customers are entering the store, so there are opportunities to integrate this with internal customer behavior data to identify customer value and do marketing in REAL TIME, with offers relevant to each customer's needs. I presented this idea as a deck at an analytics expo, where my team won first place. I felt it was a cool idea.

Regards

Deepak

http://bit.ly/Next-Gen-BI
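Here is a minimal sketch of the real-time idea. It assumes that check-in events arrive as (customer_id, store_id) pairs from a location service, and that internal customer value and preferred category are already in a lookup table. The event format, profile fields, cut-off and offer rule are hypothetical; this is not Foursquare's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CustomerProfile:
    lifetime_value: float    # from internal customer behavior data
    favorite_category: str   # most-purchased category (hypothetical field)

# Hypothetical internal profiles keyed by customer ID.
PROFILES = {
    "c1": CustomerProfile(lifetime_value=2500.0, favorite_category="electronics"),
    "c2": CustomerProfile(lifetime_value=150.0, favorite_category="groceries"),
}

HIGH_VALUE = 1000.0  # assumed cut-off for a premium offer

def on_check_in(customer_id: str, store_id: str) -> Optional[str]:
    """Handle one check-in event and return an offer message, or None."""
    profile = PROFILES.get(customer_id)
    if profile is None:
        return None  # unknown customer: no internal data to personalize with
    discount = 20 if profile.lifetime_value >= HIGH_VALUE else 5
    return f"{store_id}: {discount}% off {profile.favorite_category} today"

print(on_check_in("c1", "store_42"))
print(on_check_in("c9", "store_42"))
```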

Comment by Themos Kalafatis on August 25, 2011 at 12:11am

Competitive intelligence works surprisingly well. For example, by collecting tweets discussing all telcos, we are able to understand:

- which telco is associated with signal problems
- which telco products have the highest positive / negative sentiment
- which phrases suggest that someone will churn, the reasons, and which telco this refers to

And all this in near real time.
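Here is a rough sketch of these three analyses using lexicons. It assumes the tweets are already collected as plain strings. The telco names and keyword lists are hypothetical, and a production system would likely use trained sentiment and churn classifiers rather than fixed phrases.

```python
from collections import defaultdict

TELCOS = ["telcoa", "telcob"]  # hypothetical operator names (lowercase)
SIGNAL_TERMS = ["no signal", "dropped call", "no coverage"]
POSITIVE = ["love", "great", "fast"]
NEGATIVE = ["hate", "awful", "slow"]
CHURN_PHRASES = ["switching from", "cancel my contract", "leaving"]

def analyze(tweets):
    """Tally signal complaints, net sentiment, and churn phrases per telco."""
    stats = defaultdict(lambda: {"signal": 0, "sentiment": 0, "churn": []})
    for tweet in tweets:
        text = tweet.lower()
        for telco in TELCOS:
            if telco not in text:
                continue
            s = stats[telco]
            s["signal"] += any(term in text for term in SIGNAL_TERMS)
            s["sentiment"] += sum(w in text for w in POSITIVE) - sum(w in text for w in NEGATIVE)
            s["churn"].extend(p for p in CHURN_PHRASES if p in text)
    return dict(stats)

tweets = [
    "TelcoA dropped call again, awful service",
    "Switching from TelcoA to TelcoB, love the coverage",
    "TelcoB data is fast",
]
print(analyze(tweets))
```

A tweet that names both operators is credited to both. In the sample above, that wrongly attributes the "switching from" churn signal to TelcoB. Resolving which telco a churn phrase actually refers to is the harder part of the problem.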

Comment by Rana Singh on February 14, 2011 at 5:17pm

Vincent, I couldn't agree more. There is so much valuable data sitting outside of companies that needs to be analyzed together with internal data to get even better insights.

Comment by John Gins on November 24, 2010 at 12:41pm

There are several interesting problems with GIS data. Data for a point source, a line source, and an area/volume source require different treatments before one can be associated with another. Ken Reed and Joe Berry used an overlay system to grid a map (i.e., into 100 m by 100 m cells), then treated the grid key as a join key so that point data, line data, and area data could be joined (with the line and area data scaled to each cell).

An example might be a new natural gas pipeline being laid in a neighborhood. A point source might be a residence, the line source the pipeline, and an area the census block. Joining the data lets you estimate the propensity of residents to allow a gas hookup to their residence.
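Here is a simplified sketch of the grid-overlay join, under several assumptions. Coordinates are already projected to meters (not raw latitude/longitude), cells are 100 m, the pipeline is a polyline, and the census block is approximated as an axis-aligned rectangle. Line length is apportioned to cells by sampling, which is an approximation. All coordinates and attribute values are made up.

```python
import math
from collections import defaultdict

CELL = 100.0  # cell size in meters

def cell_of(x, y):
    """Grid key (column, row) for a projected point in meters."""
    return (math.floor(x / CELL), math.floor(y / CELL))

def line_length_per_cell(vertices, step=1.0):
    """Approximate polyline length in each cell by sampling sub-segments of ~`step` m."""
    lengths = defaultdict(float)
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(seg / step))
        for i in range(n):
            t = (i + 0.5) / n  # midpoint of each sub-segment
            lengths[cell_of(x0 + t * (x1 - x0), y0 + t * (y1 - y0))] += seg / n
    return dict(lengths)

def area_fraction_per_cell(xmin, ymin, xmax, ymax):
    """Fraction of a rectangle's area in each cell, used to scale area totals."""
    total = (xmax - xmin) * (ymax - ymin)
    fractions = {}
    for cx in range(math.floor(xmin / CELL), math.ceil(xmax / CELL)):
        for cy in range(math.floor(ymin / CELL), math.ceil(ymax / CELL)):
            w = min(xmax, (cx + 1) * CELL) - max(xmin, cx * CELL)
            h = min(ymax, (cy + 1) * CELL) - max(ymin, cy * CELL)
            if w > 0 and h > 0:
                fractions[(cx, cy)] = w * h / total
    return fractions

# Hypothetical inputs: residences (points), pipeline (line), census block (area).
residences = {"house_1": (50.0, 50.0), "house_2": (250.0, 50.0)}
pipeline = [(0.0, 40.0), (200.0, 40.0)]
block = (0.0, 0.0, 400.0, 100.0)
block_households = 40  # hypothetical census-block attribute

pipe_len = line_length_per_cell(pipeline)
block_share = area_fraction_per_cell(*block)
for house, (x, y) in residences.items():
    key = cell_of(x, y)  # the grid key acts as the join key
    print(house, key, round(pipe_len.get(key, 0.0), 1),
          round(block_households * block_share.get(key, 0.0), 1))
```

Each residence ends up with per-cell features (pipeline meters in its cell, households scaled from the block). A propensity model for gas hookups could then use those features.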

Comment by Tom Wolfer on November 24, 2010 at 12:01pm

Hi John. As long as data (any type of data) can be assigned a latitude/longitude coordinate, there should be no difficulty combining it with raw GPS data, which itself uses a latitude/longitude coordinate to reference each data point. Customer, satisfaction, and purchase detail data can all be assigned a latitude/longitude coordinate if a ZIP/postal code is on file (for either the customer or a store location). Web log data may also be merged using the location of the IP provider (although this is not perfect in any application) or e-mail account holder geolocation data. It can also be located when a web form requires the user to enter a postal/ZIP code to complete an action such as a product order (for delivery). It should be possible to combine these various sources of data into one dataset within a GIS software system; it just requires some thought at the data collection stage to ensure that the data is captured properly. I hope this helps.
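Here is a minimal sketch of that approach. It assumes a lookup table of ZIP/postal-code centroids; the codes and coordinates below are hypothetical placeholders, not real centroids. It then uses the haversine great-circle distance to relate a geocoded customer record to raw GPS points.

```python
import math

# Hypothetical postal-code centroids: code -> (latitude, longitude) in degrees.
ZIP_CENTROIDS = {
    "Z0001": (40.0, -75.0),
    "Z0002": (40.5, -74.5),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geocode(record):
    """Attach lat/long to a record via its postal code; None if the code is unknown."""
    coords = ZIP_CENTROIDS.get(record.get("zip"))
    return None if coords is None else {**record, "lat": coords[0], "lon": coords[1]}

def nearest_gps_point(record, gps_points):
    """Return (point_id, distance_km) of the GPS point closest to the geocoded record."""
    geo = geocode(record)
    if geo is None:
        return None
    return min(
        ((pid, haversine_km(geo["lat"], geo["lon"], lat, lon))
         for pid, (lat, lon) in gps_points.items()),
        key=lambda item: item[1],
    )

gps = {"ping_a": (40.0, -75.0), "ping_b": (40.5, -74.5)}
print(nearest_gps_point({"id": "cust_7", "zip": "Z0001"}, gps))
```

A postal-code centroid is only as precise as the code's area. That is the same caveat the comment raises for IP-based web log locations.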

Comment by John Gins on November 24, 2010 at 8:33am

I realized about 20 years ago that geographically based data (GIS) and the data types it brings (point, line/curve, area, volume) could actually dwarf the amount of data we use today. My current company barely uses GIS-related information (distance, census). The retail industry does look at product placement on a shelf in geometric terms, and has started paying attention to placement within the store. (How often these days will you find the magazine rack near the pharmacy department in a large drug store?)

With GPS information becoming more commonly used, what tools should statisticians and data scientists be using to integrate this information with GIS data and more typical data sources?

Comment by Richard Boire on November 24, 2010 at 5:10am

Vincent, you are absolutely right. We have always stressed the importance of data with the organizations we work with. But it's not just external data that we focus on. We also focus on the company's own internal data, and develop unique approaches for manipulating this information into new variables. Software is secondary; data is primary.
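One common way to turn internal transaction data into new variables is to compute recency/frequency/monetary (RFM) features. The comment does not say which variables Richard's team builds, so this is only an illustrative sketch with hypothetical field names and data.

```python
from datetime import date

# Hypothetical internal transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2010, 10, 1), 40.0),
    ("c1", date(2010, 11, 15), 60.0),
    ("c2", date(2010, 6, 30), 25.0),
]

def rfm_features(transactions, as_of):
    """Derive recency (days since last purchase), frequency, and monetary total per customer."""
    features = {}
    for cust, day, amount in transactions:
        f = features.setdefault(cust, {"last": day, "frequency": 0, "monetary": 0.0})
        f["last"] = max(f["last"], day)
        f["frequency"] += 1
        f["monetary"] += amount
    return {
        cust: {
            "recency_days": (as_of - f["last"]).days,
            "frequency": f["frequency"],
            "monetary": f["monetary"],
        }
        for cust, f in features.items()
    }

print(rfm_features(transactions, as_of=date(2010, 11, 24)))
```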

Comment by Tom Wolfer on November 22, 2010 at 2:03pm

It just occurred to me that no one has mentioned video data analytics in this discussion. Where does this fit in? As mobile keeps growing, video data will continue to grow exponentially, regardless of sector or data collection source. Video poses the greatest challenge to all aspects of analytics: collecting it, tagging it, storing it, and merging it with other forms of collected data for inclusion in analytical techniques such as an inductive decision tree. These issues don't even include the upstream challenges related to bandwidth for playing and downloading video, space for storing it, and the steps needed to ensure that all videos are assigned proper meta tags. Mining video data properly is a much more challenging task than mining text, embedded links, or, for that matter, images. From my perspective, anyway, the next revolution in data analytics will be the innovations (software or processes) that come from mining video data (the way we mine numbers by themselves) in combination with text and numbers.

Comment by Ralph Winters on November 22, 2010 at 11:30am

Vincent: Yes, I think you can get good results from this, provided you have enough agreement between the identifiers and have assigned them optimal prior weights. If you are dealing with identifiers like SSN, name/address, gender, and phone number, you are on solid ground. If you have a lot of missing values, or you are dealing with a lot of unstructured data, that will further complicate matters.

-Ralph Winters
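Ralph does not name a method. One standard way to weight agreement between identifiers is Fellegi-Sunter-style scoring. Each field gets an m-probability (the chance of agreement when the records truly match) and a u-probability (the chance of agreement when they do not). The m/u values and thresholds below are hypothetical; in practice they are estimated, for example with EM. A missing value contributes zero evidence, which is one common convention.

```python
import math

# Hypothetical (m, u) probabilities per identifier field.
FIELD_PROBS = {
    "ssn":    (0.98, 0.0001),
    "name":   (0.95, 0.01),
    "gender": (0.99, 0.5),
    "phone":  (0.90, 0.001),
}

def match_weight(rec_a, rec_b):
    """Sum of log2 likelihood ratios over fields; missing values add nothing."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

def classify(weight, upper=10.0, lower=0.0):
    """Link, non-link, or clerical review, using hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"

b = {"ssn": "A123", "name": "j smith", "gender": "M", "phone": "555"}
a = {"ssn": "A123", "name": "j smith", "gender": "M", "phone": None}
c = {"ssn": "Z999", "name": "a jones", "gender": "F", "phone": "111"}
print(classify(match_weight(a, b)), classify(match_weight(c, b)),
      classify(match_weight({"gender": "M"}, b)))
```

The last comparison shows Ralph's point about missing values. With only a weak identifier (gender) available, the weight falls between the thresholds and the pair needs manual review.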


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
