Interesting discussion on DataSpora:
Last night, I moderated our Bay Area R Users Group kick-off event with a panel discussion entitled “The R and Science of Predictive Analytics”, co-located with the Predictive Analytics World conference here in SF.
The panel comprised four recognized R users from industry:
* Bo Cowgill, Google
* Itamar Rosenn, Facebook
* David Smith, Revolution Computing
* Jim Porzak, The Generations Network (and Co-Chair of our R Users Group)
The panelists were asked to explain how they use R for predictive analytics within their firms, to describe its strengths and weaknesses as a tool, and to provide a case study. What follows is my summary with comments.
I began by describing R as a programming language with strengths in three areas: (i) data manipulation, (ii) statistics, and (iii) data visualization.
What sets it apart from other data analysis tools? It was developed by statisticians, it’s free software, and it is extensible via user-developed packages — there are nearly 2000 of them as of today at the Comprehensive R Archive Network or CRAN.
Many of these packages can be used for predictive analytics. Jim highlighted Max Kuhn’s caret package, which provides a wrapper for accessing dozens of classification and regression models, from neural networks to naive Bayes.
Bo Cowgill, Google
R is the most popular statistical package at Google, according to Bo Cowgill, and indeed Google is a donor to the R Foundation. He remarked that “The best thing about R is that it was developed by statisticians. The worst thing about R is that… it was developed by statisticians.” Nonetheless, he is encouraged that as the R developer community has expanded, R’s documentation and performance have both improved.
One theme that Bo raised first, and that others echoed, was that while Google uses R for data exploration and model prototyping, it does not typically use R in production: in Bo’s group, R is usually run in a desktop environment.
The typical workflow that Bo described for using R was: (i) pulling data with some external tool, (ii) loading it into R, (iii) performing analysis and modeling within R, and (iv) implementing the resulting model in Python or C++ for a production environment.
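To make step (iv) concrete, here is a minimal sketch of what re-implementing a model in Python might look like, assuming the model prototyped in R was a logistic regression. The coefficient values, the feature names, and the predict_probability function are all hypothetical and chosen for illustration; nothing here comes from the panel or from Google's actual code.

```python
import math

# Hypothetical coefficients, as they might be copied by hand from the
# output of coef() on a logistic regression fitted in R.
# The names and numbers are illustrative assumptions only.
COEFFICIENTS = {
    "(Intercept)": -1.5,
    "visits": 0.8,
    "days_since_signup": -0.02,
}


def predict_probability(features, coefficients=COEFFICIENTS):
    """Score one record with a logistic regression fitted elsewhere.

    `features` maps each predictor name to its numeric value.
    """
    # Start from the intercept, then add weight * value per predictor.
    linear = coefficients["(Intercept)"]
    for name, weight in coefficients.items():
        if name == "(Intercept)":
            continue
        linear += weight * features[name]
    # The inverse logit turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    record = {"visits": 2, "days_since_signup": 5}
    print(predict_probability(record))
```

The appeal of this split is that the production code needs no R runtime: only the fitted numbers cross the boundary. The cost is that the two implementations have to be kept in sync whenever the model is refit.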
Full article at dataspora.com/blog/predictive-analytics-using-r