
Statistical Thinking is not science fiction, but a data science necessity

Over 100 years ago, the great science fiction writer H. G. Wells was credited with saying, "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read or write." This statement is probably more true today than ever, as Big Data and Analytics are paraded before every aspect of life, business, government, and social media experience. Statistical thinking is the bedrock of data science, since statistics is a core methodology for many disciplines, including experimental science, operations research, decision sciences, and marketing research. Yet many appear to have forgotten this (or maybe have let it "slip their mind"); see the recent article by the American Statistical Association (ASA) President, Dr. Marie Davidian: "Aren't We Data Science?" As we read this, we need to remember also that Data Science includes several core methodologies (disciplines): machine learning (data mining), visualization, data management (including data structures, indexing, modeling, taxonomies), applied mathematics, semantics (ontologies), and application-specific discipline science, as well as the original core "data science" of statistics!

Consequently, it is wise for us to avoid the pitfalls that await us if we ignore the tenets and truths of statistics. Some of these "truisms" include:

  1. Correlation does not imply Causation
  2. Sample variance does not go to zero, even with Big Data
  3. Sample bias does not necessarily go to zero, even with Big Data
  4. Absence of Evidence is not the same as Evidence of Absence
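
Truisms 2 and 3 can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population (normal, mean 50, standard deviation 10) and the collection rule (values below 45 are recorded only 20% of the time) are hypothetical and do not come from the full article. It suggests that, as the sample grows, the sample variance settles near the population variance of 100 rather than near zero, and that the mean of the selectively collected sample stays offset from the true mean of 50.

```python
import random
import statistics


def simulate(n, seed=0):
    """Draw n values, then apply a hypothetical biased collection rule."""
    rng = random.Random(seed)
    # Assumed population: Normal(mean=50, sd=10), so the true variance is 100.
    draws = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # Assumed bias: values below 45 are recorded only 20% of the time.
    biased = [x for x in draws if x >= 45.0 or rng.random() < 0.2]
    return {
        "n": n,
        "sample_variance": statistics.variance(draws),
        "unbiased_mean": statistics.fmean(draws),
        "biased_mean": statistics.fmean(biased),
    }


# Compare a small sample with a sample 100 times larger.
results = [simulate(n) for n in (1_000, 100_000)]
for r in results:
    print(r)
```

More rows make the estimates more precise, but they do not shrink the spread of the data itself, and they do not repair a sample that was collected selectively.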

Read more about these specific examples in the full article "Statistical Truisms in the Age of Big Data" at


Comment by Marco Ortega on August 1, 2013 at 9:58am

This is true. It's not hard to foresee that statistics will lead us one day to live those stories told by I. Asimov. :)

Comment by Kirk Borne on July 31, 2013 at 9:32pm

@Vincent, thanks for the links to your two articles. You have really clarified the issues here. As a small counter-example, I should say that there are two well known statisticians in my department at GMU, who have called themselves Data Scientists for decades, and yet they are very respectful of the "new data science" and have graciously welcomed the invasion by this astronomer into their territory. It is an excellent positive working relationship, which I appreciate every day, in which statisticians and "modern" data scientists can work side-by-side so effortlessly and productively. Your articles clearly suggest that not all circumstances are as productive or as enlightened as mine, and consequently we still have a ways to go in this big data revolution.

Comment by Vincent Granville on July 31, 2013 at 9:21pm

I think the problem is two-fold:

1) Statisticians have not been involved in the big data revolution. Some have written books such as applied data science, but it's just a repackaging of very old stuff, and has nothing to do with data science. Read my article on fake data science, at 

2) Methodologies that work for big data sets - as big data was defined back in 2005 (20 million rows would qualify back then) - miserably fail on post-2010 big data (terabytes). Read my article on the curse of big data, at 

As a result, people think that data science is just statistics, with a new name. They are totally wrong on two points: they confuse data science and fake data science, and they confuse big data 2005 and big data 2013.
