
Future of Web Analytics: Interview with Dr. Vincent Granville

Dr. Granville is the founder of Analyticbridge, the leading social network for analytic professionals, with more than 30,000 members. He holds several patents related to web traffic quality scoring, and he is an invited speaker at leading international data mining conferences. Vincent has consulted with Visa, eBay, Wells Fargo, Microsoft, CNET, LowerMyBills, InfoSpace and a number of startups on projects such as fraud detection, user experience, core KPIs, metric selection, change point detection, multivariate testing, competitive intelligence, keyword bidding optimization, taxonomy creation, scoring technology and web crawling.


Q: What is web analytics vs. advanced web analytics?

Web analytics is about extracting data, creating database schemas, defining core metrics that have the potential to increase profits or reduce losses, and reporting and visualizing summary statistics that can lead to action, for instance detecting terrorists by analyzing Twitter posts. This task is typically handled by business analysts and very senior software engineers, in collaboration with executive management.

Advanced web analytics is about designing pattern recognition algorithms and machine learning strategies that will actually catch terrorists and spammers, target customers more precisely, create better advertising campaigns, make web sites easier to navigate, reduce churn, perform root cause analysis, and so on. This task is handled by statisticians and scientists. The metrics used to measure success or improvement are called lift measures.
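As a concrete illustration of a lift measure (my own hypothetical example, not taken from the interview), lift compares the response rate of a targeted segment against the overall baseline rate:

```python
# Illustrative lift calculation (hypothetical numbers, not from the interview).
# Lift = response rate in the targeted segment / baseline response rate.

def lift(segment_responses, segment_size, total_responses, total_size):
    """Return the lift of a targeted segment over the overall baseline."""
    segment_rate = segment_responses / segment_size
    baseline_rate = total_responses / total_size
    return segment_rate / baseline_rate

# Example: a campaign targeted at 10,000 users yields 300 conversions,
# while the overall population of 1,000,000 users yields 5,000 conversions.
print(lift(300, 10_000, 5_000, 1_000_000))  # 0.03 / 0.005 = 6.0
```

A lift of 6.0 means the targeted segment converts six times better than the average user, which is the kind of improvement these measures are meant to capture.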

Q: How do you see the future of web analytics?

Integration of external data (harvested from the web, social networks and other sources) with internal corporate data, via fuzzy merging. Increased attention to scoring users, page views, keywords and referrals: not all page views are created equal. Text mining and taxonomy improvement. On-demand, web-based AaaS (Analytics as a Service) provided by programmable APIs that use scoring algorithms and can process more than one million rows in real time. Also, blending technologies from fields as varied as biometrics, military intelligence, statistics, operations research, quantitative finance, econometrics, psychometrics, computer science, Six Sigma, etc.
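The fuzzy merging mentioned above can be sketched in a few lines. The snippet below is a simplified illustration using standard-library string similarity, not the production approach alluded to in the interview; all record names and the threshold are made up for the example.

```python
# Minimal sketch of fuzzy merging: joining external records to internal
# records on approximate string matches (simplified; real pipelines would
# normalize fields, block candidate pairs, and tune thresholds carefully).
from difflib import SequenceMatcher

internal = ["Acme Corporation", "Globex Inc", "Initech LLC"]
external = ["ACME Corp.", "Globex Incorporated", "Umbrella Co"]

def best_match(name, candidates, threshold=0.6):
    """Return the closest candidate above the similarity threshold, else None."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

for ext in external:
    print(ext, "->", best_match(ext, internal))
```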

Q: What is your plan regarding Analyticbridge, with respect to the web analytics community?

We are growing fast and we want to reach as many web analytics professionals as possible and provide them with valuable resources: jobs, courses, articles, news, a think tank, software reviews, success stories, etc. We will continue to post more and more seminal articles and offer state-of-the-art technology to the community, such as HDT (hidden decision trees, designed in our research laboratory), as open source or co-branded with our partners.

Q: Which books, conferences, certifications and software do you recommend?

They are too numerous to mention. Visit our website (www.analyticbridge.com) to check new books and journals, webinars, certifications, awards, etc. In terms of conferences, I recommend eMetrics, AdTech, SES, Predictive Analytics World, Text Analytics News and the SAS data mining conferences. Note that we offer a free web analytics certification based on your experience (the minimum requirement is a master's degree from a respected university). The Web Analytics Association also offers a certification.

Q: How did you start your career in web analytics?

I've been interested in mathematics for as long as I can remember. I started a doctorate in computational statistics in Belgium in 1993, earned a postdoctoral degree at the Statistical Laboratory at Cambridge University (England), then moved to the United States and got my first job with CNET, then NBCI. I worked initially in market research, then in fraud detection, user behavior, traffic scoring and keyword intelligence. In 2007, I created Analyticbridge, one of the few profitable social networks.

Q: How do you participate in creating standards for our community?

I've patented a few scoring technologies and I continue to work on HDT and AaaS. I plan to deliver these technologies as open source. I've also designed countless metrics that can be used to assess lift in keyword campaigns: coverage (or yield), keyword commercial value, etc. Most importantly, I publish and present at conferences and discuss the correct methodology to use when dealing with sampling, Monte Carlo simulation and model fitting. In particular, I've discussed at length how to do cross-validation correctly, how to compute meaningful confidence intervals for scores and why you need to provide them in the first place, and the importance of assessing the quality of your data sources through proper QA, including what to do when data is poor, wrong or missing.
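As a minimal illustration of the kind of validation being discussed, the sketch below runs k-fold cross-validation and attaches a rough normal-approximation confidence interval to the mean score. The library, model and synthetic data are my own illustrative choices, not the author's actual setup.

```python
# Hedged sketch: k-fold cross-validation with an approximate confidence
# interval on the mean score (normal approximation over the fold scores).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")
mean = scores.mean()
sem = scores.std(ddof=1) / np.sqrt(len(scores))  # standard error of the mean
print(f"AUC = {mean:.3f} +/- {1.96 * sem:.3f} (approx. 95% CI)")
```

Reporting the interval, not just the point estimate, is exactly the "provide confidence intervals for scores" point made above.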

Q: Any success story you want to mention?

Detection of multiple botnets generating more than $10 million per year in fraud, which resulted in sophisticated new rules involving association / collusion detection. Creation of a list of about 100,000 keywords representing 85% of the keyword pay-per-click commercial universe, in terms of Google advertising revenue. I am currently working on a Google keyword price and volume forecaster, and on scoring algorithms that are 200 times faster than algorithms available in the marketplace (without using the cloud).

Q: What mistakes should web analytics consultants avoid?

  • not listening carefully when discussing client requests
  • trying to impress the client with obscure terminology rather than with past success stories expressed in plain English
  • not understanding the big picture
  • being limited to just one or two analytical techniques
  • not using external data that could help detect flaws in the client's internal data
  • not understanding where bias might come from, or not understanding the metrics well enough
  • forgetting that your model, no matter how good, can't be better than your data
  • skipping cross-validation, or doing it improperly
  • failing to deal correctly with significant cross-correlations
  • having no maintenance plan, or not updating data and models at the right frequency
  • believing that R-squared is the perfect criterion for model validation (see the sketch after this list)
  • ignoring, or not properly detecting, outliers
  • using standard black-box techniques when a robust, ad hoc methodology should be preferred, or the other way around
  • lacking good judgment and gut feeling, or putting too much faith in the data or the model, or the other way around
  • ignoring the 80/20 rule
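To illustrate the R-squared point from the list above, the hypothetical sketch below fits an over-flexible model to noisy data: the training R-squared looks flattering while the held-out R-squared tells a different story. Data, model and library choices are illustrative only, not drawn from the interview.

```python
# Illustrative sketch (synthetic data): training R-squared rewards
# overfitting, so it cannot serve as the sole validation criterion.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.5, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

print("train R^2:", r2_score(y_train, model.predict(X_train)))  # inflated by overfitting
print("test  R^2:", r2_score(y_test, model.predict(X_test)))    # typically much lower
```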

Q: What do you suggest to new graduates?

Check certifications and training: visit our website, The Data Mining Blog, KDnuggets, Statistics.com, The Predictive Modeling Agency, and association websites such as INFORMS, AMSTAT, ACM, WAA and SEMPO. Also get familiar with the Google Analytics and Bing Intelligence blogs. Get an internship with a company that is good at web analytics. Download free data from the web (write your own web robot, as sketched below) and analyze it. Create your own web site or social network (check ning.com) and run campaigns to get a feel for metrics and concepts such as user engagement, collaborative filtering, the semantic web, page view value, churn, etc. Indeed, one of the largest low-frequency click fraud botnets ever detected was found by analyzing traffic from the small website I created in 1999. Download and try open source data mining software, e.g. RapidMiner.
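As a starting point for the "write your own web robot" suggestion, here is a minimal, hypothetical crawler sketch using only Python's standard library. The seed URL is a placeholder, and a real robot should respect robots.txt, rate limits and site terms of service.

```python
# Minimal, hypothetical web robot sketch using only the standard library.
# A real crawler should honor robots.txt, throttle requests, and handle errors.
import re
import urllib.request

def fetch_links(url):
    """Download a page and return the absolute http(s) links found in its HTML."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="ignore")
    return re.findall(r'href="(https?://[^"]+)"', html)

seed = "https://www.example.com"  # placeholder seed URL
for link in fetch_links(seed)[:10]:
    print(link)
```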
