Introducing Hadoop
January 30, 2012
Hadoop, if you’ve got your ear to the enterprise groundswell, is a big bet that many large enterprises are making for the future, both near term and longer term. Sooner or later, you’re going to hear that word dropped just as often, and just as ubiquitously, as “cloud” is uttered hither and thither.
So what is Hadoop?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
That may sound like gobbledygook, and your puzzled faces might register the question: don’t these computer things do that already? Well, yes and no. Yes, because to a certain extent we’ve always wanted that ability, and we’ve gotten it one way or another. No, because we’ve never had it in the way that Hadoop delivers it. Over the next few posts, I’ll delve deeper into Hadoop as I set up a homegrown Hadoop cluster, with a step-by-step tutorial on how I went about it.
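
In the meantime, here is a taste of the “simple programming model” the project description mentions, which in Hadoop's case is MapReduce. What follows is only a rough sketch of the idea in plain Python running on a single machine. It is not Hadoop's actual API, and the names map_phase, shuffle, reduce_phase and word_count are made up for illustration. On a real cluster, the framework would spread the map and reduce steps across many machines and do the grouping in between for you.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for what the framework does
    between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values seen for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "big cluster"]))
```

The point of the split is that each map call looks at one line and each reduce call looks at one key, so neither needs to know about the rest of the data. That independence is what lets the work be farmed out to many machines.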


