3 Things Every Hadoop Startup Should Know: Mike Olson, Cloudera CEO

Cloudera’s CEO has written up the three things that every Hadoop startup should know. They are:

One: There’s great money to be made building apps that run on Hadoop, so please do that! We need that innovation to drive business adoption of the platform.

Two: Hiring is brutal. Your best bet is to hire great people and teach them Hadoop. Deep Hadoop internals skills are awfully thin on the ground.

Three: Cloudera loves you, and our Cloudera Connect program is how we show it, so check it out!

In other words: a well of opportunity, a talent crunch, and Cloudera loves you…

Facebook To Speed Up Biz Analytics Tool Insights To Report In Real-Time

I believe that this sub-domain of Big Data is where a significant portion of the future’s wrangling and intense competition is going to be. TechCrunch has an article on how Facebook (which, by the way, is one of the big Hadoop users) is using analytics tools more or less in real time in order to “glean” information about the activity on its site.

The article is “Facebook to Speed up Biz Analytics Tool Insights to Report in Real-Time”. From it:

Facebook’s analytics tool Insights will soon begin showing Page performance data in real-time or near real-time rather than on average 48 hour delay, the company Facebook plans to announce at Wednesday’s Facebook Marketing Conference in New York City according to our sources.

and

Making real-time Insights data available through the API “will give Page owners an opportunity to see how their Page actually lives and breathes,” says Facebook analytics tool provider EdgeRank Checker‘s founder Chad Wittman

And that’s the brave new world…

Big Recognition for IBM Big Data

IBM’s smarter computing blog talks about Big Recognition for IBM Big Data. From the blog post,

IBM was among the select companies that Forrester invited to participate in The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012, (February 2, 2012). Technologies evaluated were IBM InfoSphere BigInsights (IBM’s Hadoop-based offering), and IBM Netezza Analytics. In this evaluation, IBM was placed in the Leaders category of the Wave and achieved the highest possible score in both the Strategy and Market Presence segments. In the third segment, Current Offering, IBM received the second highest score.

The Forrester report on the current players in the Big Data space can be downloaded from IBM’s site here.

How Hadoop is revolutionizing…

Business Intelligence and Data Analytics.

This presentation, How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics by Dr. Amr Awadallah (CTO at Cloudera), delves into how a business can fit Hadoop into its Business Intelligence (BI) and Data Analytics efforts.

The initial thesis was:

Pre-Hadoop (and Hadoop-like infrastructure) – BI applications access the data that is available in a data store, such as a database or a data warehouse, and produce actionable items from this data. As time moves on, the data in that store gets archived and essentially disappears, dies, or gets aggregated/reduced for offline storage.

The new antithesis is:

Post-Hadoop – The approach here is to keep live data available at all times, in raw and/or processed form. The Hadoop approach is to take the application to the data: distributed data, with distributed applications acting on and exploring it.

The antithesis takes this form largely because, although storage has become commoditized (and rather large), pipes wider and computation rather fast, neither computation nor pipes have expanded (and perhaps need not expand) as much as storage has. When the data outgrows the pipes, it is cheaper to move the application to the data than the data to the application. At the same time, it has become a human imperative to put out as much junk as possible, er… be more creative, and big data apps and their providers (Facebook, Google, etc.) have followed suit.
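
To make “take the application to the data” a bit more concrete, here is a toy sketch in plain Python. None of it is Hadoop’s API: the `Node` class, the `count_errors` function, the sample log lines and the 8-bytes-per-result figure are all made up for illustration. Each pretend node holds its own slice of the data, and the sketch compares hauling every record to one place against sending a small function to each node and pulling back only a count.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """A pretend cluster node holding its own slice of the data (hypothetical)."""
    name: str
    records: List[str]

    def run_local(self, func: Callable[[List[str]], int]) -> int:
        # The "application" runs where the data lives; only the result leaves.
        return func(self.records)


def count_errors(records: List[str]) -> int:
    """The small piece of application logic we ship to every node."""
    return sum(1 for line in records if "ERROR" in line)


def data_to_app(nodes: List[Node]) -> tuple:
    """Pre-Hadoop style: pull every record to one place, then compute."""
    pulled = [line for node in nodes for line in node.records]
    bytes_moved = sum(len(line.encode("utf-8")) for line in pulled)
    return count_errors(pulled), bytes_moved


def app_to_data(nodes: List[Node]) -> tuple:
    """Post-Hadoop style: compute on each node, move only the partial counts."""
    partials = [node.run_local(count_errors) for node in nodes]
    # Assumption: each partial count travels as a single 8-byte integer.
    bytes_moved = 8 * len(partials)
    return sum(partials), bytes_moved


nodes = [
    Node("node-1", ["INFO start", "ERROR disk full", "INFO done"]),
    Node("node-2", ["ERROR timeout", "ERROR retry failed"]),
    Node("node-3", ["INFO idle"]),
]

central_result, central_bytes = data_to_app(nodes)
local_result, local_bytes = app_to_data(nodes)
print("data to app:", central_result, "errors,", central_bytes, "bytes moved")
print("app to data:", local_result, "errors,", local_bytes, "bytes moved")
```

Both routes give the same answer; what changes is how much has to cross the pipes. With six log lines the gap is small, but the bytes moved in the first route grow with the data, while in the second they grow only with the number of nodes.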

The synthesis – Yet to be.

But here’s a guess: Right-Compute and Right-Data. The premise of Big Compute and Big Data is that in the pile of horse manure, there must be a pony in there somewhere: a white stallion, to be sure. As many past Masters (Who is a Master? Think Sun Tzu, Newton) will tell you, the objective of dealing with Big Data (and is there any bigger Data out there than the human, natural and metaphysical world?) is to elicit the laws that underlie it.

Now in the past, we as the curious ones have depended on intuiting and hypothesizing about the Big Data out there. Today, it seems that we’re done with the hypothesizing and are jumping straight into letting the Data speak for itself.

Right-Compute and Right-Data is about getting back to the hypothesize-and-test scheme that has proven remarkably successful in our developmental journey.

Stay tuned…

Introducing Hadoop

Hadoop, if you’ve got your ear to the enterprise groundswell, is a big bet that many of the large enterprises are making for the future – both near term and longer term. Sooner or later, you’re going to be hearing that word dropped just as often and just as ubiquitously as the “cloud” is uttered hither and thither.

So what is Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

That may sound like gobbledygook, and your puzzled faces might well register the question: don’t these computer things do that even now? Well, yes and no. Yes, because to a certain extent we’ve always wanted that ability and we’ve gotten it one way or another. No, because we’ve never had it in the way that Hadoop delivers it. Over the next few posts, I’ll delve deeper into Hadoop as I set up a homegrown Hadoop cluster, with a step-by-step tutorial on how I went about it.
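
Until then, a small taste of the “simple programming model” mentioned in the description above. In Hadoop’s case that model is MapReduce, and the classic first example is counting words. The sketch below is plain, single-process Python, not Hadoop’s actual API: the function names `map_phase`, `shuffle` and `reduce_phase` and the sample sentences are made-up placeholders, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input line into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share a key (done by the framework on a cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases together in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


sample = ["big data big compute", "right data right compute"]
counts = word_count(sample)
print(counts)
```

The appeal of the model is that `map_phase` and `reduce_phase` know nothing about machines, networks or failures. That bookkeeping is exactly what a framework like Hadoop is meant to take off your hands.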