Here Dr. Vincent Granville (Analyticbridge's founder) answers selected questions, or comments on discussions posted by various contributors on Quora, LinkedIn, newspapers, blogs, Google+ and other networks. Let's start with six of Vincent's answers.
Q1. Is it true that most Data Scientists have at least a Masters or PhD degree? (Quora) - http://www.quora.com/Is-it-true-that-most-Data-Scientists-have-at-l...
A. While I do have a Ph.D. in computational statistics, what makes me a data scientist is not my education. I would perform the same tasks with the same level of efficiency and expertise even if I had never attended college. I don't even need a degree to get a job, not even a high school diploma, since I run my own business. If you run your own business, the money invested in education must be carefully weighed: a $40,000 education combined with 4 years of "no work" (while at school) will produce an ROI far below what you would get if you had invested that money efficiently in your business and worked during those 4 years.
In my case, you could argue that I should earn an MBA as well. In fact, I have acquired all the MBA competencies (including a great network) without wasting time and money at school.
Q2. Everyone Should Learn Statistics [about statistical flaw in DUI issue] (Chronicle.com) - http://chronicle.com/blogs/brainstorm/everyone-should-learn-statist...
A. The 0.08 threshold is arbitrary to start with. Many people drive better at a BAC of 0.16 than 95% of all drivers (though they would drive even better if their BAC were 0.00). And many drive worse at 0.00 than 95% of all drivers. In short, the more fundamental flaw here is in the choice of the metric used in the law.
Q3. What are the basic talents that a novel PhD student in data mining or machine learning should have? (Quora) - http://www.quora.com/What-are-the-basic-talents-that-a-novel-PhD-st...
A. You must have or acquire business acumen. In my opinion, a good data scientist has a PhD in data mining / statistics and an MBA. She must be a Ph.D. with a clear idea of the big picture, someone who has integrated the "six-sigma" way of doing business and, most importantly, has good critical judgment and vision. The diplomas are not necessary though: you can acquire the knowledge and expertise online for free if you are a self-learner.
Q4. What is machine learning? (Google+) - https://plus.google.com/111831540699221834313/posts
A. For many, machine learning consists mostly of techniques that, in one way or another, perform supervised clustering using cross-validation, training sets and predictive modeling. Techniques can be statistics, SVM, neural networks, AI, pattern recognition, association rules, etc. Output can be a keyword taxonomy, a stock trading system, transaction scores, automated medical diagnosis, etc.
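To make "cross-validation" and "training sets" concrete, here is a minimal sketch in plain Python. It is only an illustration, not a method taken from the answer above: the function names (`k_fold_splits`, `cross_validate`, `fit_threshold`, `predict_threshold`) and the toy one-dimensional classifier are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(xs: Sequence, ys: Sequence, fit: Callable, predict: Callable, k: int = 5) -> float:
    """Fit on each training set, score on the held-out fold, return mean accuracy."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy model for 1-D data with labels 0 and 1.
# Assumes every training set contains both labels.
def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> float:
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    # Decision threshold halfway between the two class means.
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold: float, x: float) -> int:
    return 1 if x > threshold else 0
```

Calling `cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)` returns the average accuracy over the five held-out folds. Because the folds are contiguous, data sorted by label should be shuffled first.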
Q5. The Trouble with Big Data [discuss risks of sampling] (whatsthebigdata.com) - http://whatsthebigdata.com/2012/05/05/the-trouble-with-big-data/
A. If you want to extract a few pounds of gold out of a mountain, you can dig the whole mountain and process millions of tons of rock, and get all the gold. Or you can use smart strategies to detect where gold is likely to be located (e.g. metal sensors, rock sampling, spatial statistics to estimate lode locations) and get 50% of the gold. The first technique produces a negative ROI, the second one produces a positive ROI. The first technique is equivalent to processing raw big data, the second is equivalent to processing carefully selected small data.
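The "rock sampling" idea can be sketched in a few lines of Python: probe each block of data with a small random sample, then fully process only the blocks that look most promising. This is a rough illustration under assumptions of my own: the data is already split into blocks, each record carries a numeric value, and the name `pick_promising_blocks` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def pick_promising_blocks(
    blocks: Sequence[Sequence[float]],
    probe_size: int,
    keep: int,
    seed: int = 0,
) -> List[int]:
    """Return the indices of the `keep` blocks with the highest sampled mean value."""
    rng = random.Random(seed)  # fixed seed so the probe is reproducible
    estimates = []
    for idx, block in enumerate(blocks):
        if not block:
            continue  # nothing to probe in an empty block
        # Cheap probe: look at only a few records instead of the whole block.
        probe = rng.sample(list(block), min(probe_size, len(block)))
        estimates.append((sum(probe) / len(probe), idx))
    # Highest estimated value first, then keep the top `keep` blocks.
    estimates.sort(reverse=True)
    return sorted(idx for _, idx in estimates[:keep])
```

Only the selected blocks then go through the expensive processing step. The trade-off is the one described above: some value is left behind in the skipped blocks, in exchange for a much smaller processing bill.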
Q6. The New Way to Segment For a 6x Greater Return (SmartDataCollective) - http://smartdatacollective.com/estebankolsky/50469/new-way-segment-...
A. Open-to-click ratio is low. Unsubscribe rate is unknown. This very high open rate is easy to achieve: remove from your mailing list all contacts who did not open any of their previous 10 messages. That should shrink the list from 200,000 to 50,000. Then perform traditional segmentation on the 50,000 to narrow it down to the 18,000 most likely to convert and least likely to unsubscribe. I will post details on Analyticbridge, including on real-time A/B testing during deployment.
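The two steps above (prune inactive contacts, then keep the best prospects) can be sketched as follows. This is not the actual segmentation used in the article. The `Contact` fields, the two scores, and the simple "convert score minus unsubscribe score" ranking are assumptions made for illustration; in practice the scores would come from whatever segmentation model you use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contact:
    email: str
    recent_opens: List[bool]   # one flag per message sent, newest last
    convert_score: float       # hypothetical model output in [0, 1]
    unsubscribe_score: float   # hypothetical model output in [0, 1]


def prune_inactive(contacts: List[Contact], window: int = 10) -> List[Contact]:
    """Drop contacts who opened none of their last `window` messages.

    Contacts with no recorded opens at all are dropped too (a design choice).
    """
    return [c for c in contacts if any(c.recent_opens[-window:])]


def select_targets(contacts: List[Contact], n: int) -> List[Contact]:
    """Keep the n contacts most likely to convert and least likely to unsubscribe."""
    ranked = sorted(
        contacts,
        key=lambda c: c.convert_score - c.unsubscribe_score,  # assumed ranking rule
        reverse=True,
    )
    return ranked[:n]
```

With the figures from the answer, `prune_inactive` would take the list from roughly 200,000 to 50,000 contacts, and `select_targets(active, 18000)` would produce the final mailing.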