How Hadoop is revolutionizing…
February 6, 2012
Business Intelligence and Data Analytics.
The presentation "How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics" by Dr. Amr Awadallah (CTO at Cloudera) delves into how a business can fit Hadoop into its Business Intelligence (BI) and data analytics efforts.
The initial thesis was this:
Pre-Hadoop (and Hadoop-like infrastructure) – BI applications access whatever data is available in a data store, such as a database or a data warehouse, and produce actionable items from it. As time moves on, that data gets archived and essentially disappears, or gets aggregated and reduced for offline storage.
The new antithesis is this:
Post-Hadoop – The approach here is to keep live data available at all times, in raw and/or processed form. The Hadoop approach is to take the application to the data: the data is distributed, and so are the applications that act on and explore it.
The antithesis takes this form largely because storage, pipes and computation have not grown at the same rate. Data storage has become commoditized (and rather large), data pipes have widened and computation has become rather fast, but neither computation nor pipes have expanded (and perhaps need not expand) as much as storage has. Moving the application is therefore cheaper than moving the data. At the same time, it has become a human imperative to put out as much junk as possible, er... be more creative, and big data apps and their providers (Facebook, Google, etc.) have followed suit.
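To make "take the application to the data" a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop's API and not anything from the presentation: the `partitions` list is a made-up stand-in for blocks of log lines that would sit on separate nodes, and `count_levels` and `merge` are hypothetical names. The point is only that the function goes to each block, and just the small summaries come back to be combined.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for data blocks that would live on separate nodes.
partitions = [
    ["error timeout", "ok", "error disk"],
    ["ok", "ok", "error timeout"],
    ["warn slow", "ok"],
]

def count_levels(block):
    """The 'application': runs next to one block and returns a small summary."""
    return Counter(line.split()[0] for line in block)

def merge(a, b):
    """Combine two partial summaries; in a real cluster only these would travel."""
    return a + b

# Map step: one call per block, each touching only its local data.
partials = [count_levels(block) for block in partitions]

# Reduce step: fold the small partial results into one answer.
totals = reduce(merge, partials, Counter())
print(dict(totals))
```

In this toy version everything runs in one process, so nothing is actually shipped anywhere. The shape is what matters: the raw lines never leave their block, and only the per-block counts are merged.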
The synthesis – Yet to be.
But here’s a guess: Right-Compute and Right-Data. The premise of Big Compute and Big Data is that somewhere in the pile of horse manure there must be a pony: a white stallion, to be sure. As many past Masters (who is a Master? Think Sun Tzu, Newton) will tell you, the objective of dealing with Big Data (and is there any bigger Data out there than the human, natural and metaphysical world?) is to elicit the laws that underlie it.
In the past, we curious ones depended on intuiting and hypothesizing about the Big Data out there. Today, it seems that we’re done with the hypothesizing and are jumping straight into letting the Data speak for itself.
Right Compute and Right Data is about getting back to the hypothesize-and-test scheme that has proven remarkably successful in our developmental journey.