Big Data is a big phenomenon: organizations use "big data" analytics to identify areas of improvement across the organization and to increase the degree of fact-based decision making. Big data analytics does this by helping organizations uncover patterns and trends in both structured and unstructured data assets and translate them into meaningful insights for competitive advantage.
Interestingly, heated discussions have recently sprung up about whether emerging Big Data tools will replace data scientists in discovering the value of Big Data.
Big Data tools or data scientists: which is more crucial to mastering Big Data?
1. Data Scientist & Data Artist: are you still the Big Data master?
Big Data deployment is both art and science. As some data experts put it well: data science is a collaboration between human and machine. The human knows the business problem, but the machine can do the grunt work of generating hundreds of thousands of potentially useful signals from the data. The machine then searches those signals and surfaces the ones that may be useful. The human is in the loop, and the machine makes the human work better and faster.
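This human-machine division of labor can be sketched in a few lines of plain Python. The data, column names, and signal-generation rules below are purely hypothetical; the point is only to show the machine mechanically generating many candidate signals, searching them, and surfacing the strongest ones for a human to interpret:

```python
import itertools
import random

random.seed(42)

# Toy dataset: base measurements the "machine" will combine into candidate
# signals. Column names are hypothetical, for illustration only.
n = 200
data = {
    "visits": [random.uniform(1, 50) for _ in range(n)],
    "spend": [random.uniform(10, 500) for _ in range(n)],
    "tenure": [random.uniform(1, 120) for _ in range(n)],
}
# The business outcome the human cares about; here it secretly depends on
# spend per visit, which the search below should rediscover.
target = [s / v + random.gauss(0, 1) for s, v in zip(data["spend"], data["visits"])]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Machine grunt work: mechanically generate candidate signals (ratios and
# products of base columns; a real system would generate far more).
candidates = {}
for a, b in itertools.permutations(data, 2):
    candidates[f"{a}/{b}"] = [x / y for x, y in zip(data[a], data[b])]
for a, b in itertools.combinations(data, 2):
    candidates[f"{a}*{b}"] = [x * y for x, y in zip(data[a], data[b])]

# Machine search: rank signals by strength of association with the target
# and surface the top few for human review.
ranked = sorted(candidates, key=lambda c: abs(pearson(candidates[c], target)), reverse=True)
for name in ranked[:3]:
    print(f"{name}: r = {pearson(candidates[name], target):+.2f}")
```

The machine cannot say *why* spend per visit matters to the business, or what to do about it; that judgment stays with the human in the loop.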
That said, the patterns captured from data help transform it into information and then into knowledge. The value of this transformation to the ultimate beneficiary lies in the quality of insight. There will always be a desire to automate in the interest of speed and convenience, but 'unstructured' work, by definition, is about leveraging the cognitive strength of human talent. So far, humans may still be the Big Data masters.
In addition, companies must move from the current model—the implicit strategy of maximizing the benefit of a set amount of data usage—to an explicit model where all employees are expected to maximize data usage to drive business benefits.
There are many talented "Big Whos" who can master Big Data:
- Data Scientist: A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It's almost a Renaissance individual who really wants to learn and bring change to an organization. Put simply, data scientists will have portions of their job automated, but their work will be much less automated than one might hope. Although we might hope to replace knowledge workers with algorithms, this will not happen as soon as some claim.
- Data Artist: The title "Data Scientist" is no longer enough; hence the emergence of the title "Data Artist": a data scientist in a marketer's body, with the ability to tell compelling stories that management can actually understand and act on. A data artist thinks like both a marketer and a scientist, and reads good textbooks on marketing, advertising, and related fields.
- Data Solutionary: If the C-suite can't understand what's being put in front of them because it's too technical or full of jargon, it will be ignored. Explore the fields of epistemology, philosophy of science, and critical thinking to learn what knowledge is and how it is created. It takes a human to move beyond the obvious statements that data makes and provide the subtle analysis that can only be uncovered using intuition and insight to solve business issues. That means beginning with the end in mind, and the end is the business value that Big Data can deliver.
Future tools need to be sophisticated enough to help less-skilled users reach satisfactory results. Sophisticated but easy-to-use tools will cultivate more Big Data masters.
2. Big Data Tools are emerging
Automation is a good thing, as it will replace repeatable effort. Horizontal-scaling technologies now allow massive data to be stored and processed in ways that do not raise costs exponentially.
- Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable, fault-tolerant infrastructure. It is made up of a distributed filesystem called the Hadoop Distributed Filesystem (HDFS) and a computation layer that implements a processing paradigm called MapReduce. Apache Hadoop is open source.
- Why use Hadoop? Hadoop solves the hard scaling problems caused by large amounts of complex data. Unlike older platforms, Hadoop is able to store any kind of data in its native format and to perform a wide variety of analyses and transformations on that data. Hadoop stores terabytes, and even petabytes, of data inexpensively. It is designed to survive complexity, not only tolerating hardware and software failures but treating them as a first-class condition that happens regularly.
- Hadoop is an ecosystem. In addition to products from Apache, the extended Hadoop ecosystem includes a growing list of vendor products that integrate with or expand Hadoop technologies. HDFS and MapReduce together constitute core Hadoop, which is the foundation for all Hadoop-based applications. For applications in BI, DW, and big data analytics, core Hadoop is usually augmented with Hive and HBase, among others. As the amount of data in a cluster grows, new servers can be added incrementally and inexpensively to store and analyze it. Because MapReduce takes advantage of the processing power of the servers in the cluster, a 100-node Hadoop instance can answer questions on 100 terabytes of data just as quickly as a ten-node instance can answer questions on ten terabytes.
- 360-degree customer analytics: Tools such as Hadoop help discover new facts and relationships by tapping big data that was previously inaccessible to BI. Discovery also comes from mixing data of various types from various sources; HDFS and MapReduce enable the exploration of this eclectic mix of big data. While most customer views today include hundreds of customer attributes, Hadoop can bump that up to thousands, which in turn provides greater detail and precision for customer-base segmentation, more granular analytics, and a 360-degree customer view.
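The MapReduce paradigm behind these capabilities can be illustrated with a minimal, single-machine word-count sketch. Plain Python stands in for the distributed framework here; on a real cluster the input splits live on different HDFS nodes, and the map and reduce phases run in parallel across them:

```python
from collections import defaultdict
from itertools import chain

# Input "splits": on a real cluster each split would live on a different HDFS node.
splits = [
    "big data is both art and science",
    "the science part can be automated",
    "the art part needs human judgment",
]

def map_phase(split):
    """Mapper: emit a (key, value) pair for every word in one split."""
    return [(word, 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)

mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"], counts["art"], counts["science"])
```

Because each mapper sees only its own split and each reducer sees only one key's values, both phases can be spread across arbitrarily many machines, which is why adding nodes scales Hadoop's throughput so directly.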
In summary, Big Data is both art and science. The science part can be understood and automated via machines and tools to improve efficiency. The art piece, however, needs human intuition and wisdom to understand the holistic business context and behavioral relevance, to determine what to do (and, more importantly, what to do with the insights), to make judgments, and to ensure effectiveness: doing the right things instead of just doing things efficiently. While we can, and will, develop better tools for data analysis in the coming years, they will not do nearly as much as we hope to obviate the need for the sound judgment, domain expertise, and hard work that human talent can offer.