New addition into Analytics DBMS – Hadoop as the ERP of web data and its Integration with SAS

Big data and Apache Hadoop are often mentioned in the same breath in conversations and in industry practice. We have been talking about big data and unstructured data quite a lot recently. Then, in a March 2012 article, the SAS blog announced:

‘The SAS/ACCESS Interface to Hadoop enables Hadoop users to tap into the power of SAS by extending support for the complete analytics life cycle to Hadoop, including discovery, data preparation, modeling and deployment.

Technical Details

  • LIBNAME statement makes Hive tables look like SAS data sets.
  • PROC SQL provides the ability to execute explicit HiveQL commands in Hadoop.
  • SAS procedures (including PROC FREQ, PROC RANK, PROC REPORT, PROC SORT, PROC SUMMARY, PROC MEANS and PROC TABULATE) are supported.’

(http://blogs.sas.com/content/datamanagement/2012/03/06/sas-hadoop-a...)

 

What is Apache Hadoop?

The Apache Software Foundation is a non-profit organisation whose community of developers works on free and open source software. Hadoop is an open source software framework written in Java that supports data-intensive distributed applications. Its design grew out of techniques that Google developed and published while indexing the web and analysing user behaviour to improve its algorithms and extract other useful, actionable data.

Thus, Hadoop helps you store, and solve problems on, large volumes of unstructured and complex data that may not fit into structured tables. It also helps you run analytics such as clustering and targeting on this data.

Hadoop is designed to run on multiple machines that do not share any hardware. A master server keeps track of where the different pieces of the data are stored, and multiple copies are made of each piece. Thus, it is a decentralised data store rather than a single central database.
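The bookkeeping described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names, the node names and the round-robin placement rule are all hypothetical and chosen for illustration. Hadoop's own placement policy is more involved than this.

```python
def place_blocks(data, block_size, nodes, replicas=3):
    """Split `data` into blocks and assign each block to `replicas` distinct nodes.

    Returns a dict mapping block index -> list of node names, which plays the
    role of the tracking server's lookup table in this toy model.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = (len(data) + block_size - 1) // block_size  # ceiling division
    placement = {}
    for block in range(n_blocks):
        # Round-robin placement: an illustrative assumption, not Hadoop's real policy.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replicas)]
    return placement


def readable_after_failure(placement, failed_node):
    """Return True if every block still has a copy on a surviving node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical machine names
placement = place_blocks(b"x" * 10, block_size=4, nodes=nodes, replicas=3)
print(placement)
print(readable_after_failure(placement, "node-b"))
```

Because every piece lives on more than one machine, losing a single machine does not lose any data. That is the point of making multiple copies.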

Complex computational queries are split up and worked on by multiple processors in parallel, and the partial outputs are then combined to give a unified answer or result.
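This split-then-combine pattern is the idea behind MapReduce, the programming model Hadoop uses. A minimal sketch of it, using a word count as the example, might look like the following. The function names are hypothetical, and everything here runs in a single Python process. On a real cluster each chunk would be handled by a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the values from all map outputs by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the grouped values into one total per key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each map_phase call is independent, so each could run on a separate machine.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


chunks = ["big data and Hadoop", "Hadoop stores big data"]
print(word_count(chunks))
```

The map step needs no knowledge of the other chunks, which is what allows the work to be spread across many processors before the results are brought together.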

So which applications on Hadoop are free? Hadoop itself is free and open source, but many companies, such as IBM and SAS, sell paid solutions that work on or with Hadoop.

And which are the largest users of Hadoop? Yahoo and Facebook are the largest users. Other notable names include Amazon.com, American Airlines, AOL, Apple, eBay, the Federal Reserve Board of Governors, Hewlett-Packard, IBM, ISI, Twitter, SAS Institute, LinkedIn and Microsoft.

Interestingly, Hadoop was the name of a toy elephant owned by the son of Doug Cutting, the creator of Hadoop!

About the Author: Subhashini is currently active in the Analytics Training (http://jigsawacademy.com/), Blogging and Consulting arena, and has a decade of experience across Analytics roles in Retail Finance and Banking. These roles have spanned Risk Management, Collections Strategy, Fraud Control and Marketing. Her area of interest is the integration of the results and outputs of Analytics with Business Decisions – Tactics and Strategy.

(Link to profile - http://in.linkedin.com/pub/subhashini-s-tripathi/3/405/77b )

© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC
