web analytics

Introducing Hadoop

Hadoop, if you’ve got your ear to the enterprise groundswell, is a big bet that many large enterprises are making for the future, both near term and longer term. Sooner or later, you’re going to hear that word dropped just as often and just as ubiquitously as “cloud” is uttered hither and thither.

So what is Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

That may sound like gobbledygook, and your puzzled faces might register the question: don’t these computer things do that even now? Well, yes and no. Yes, because to a certain extent we’ve always wanted that ability, and we’ve gotten it one way or another. No, because we’ve never had it in the way that Hadoop delivers it. Over the next few posts, I’ll delve deeper into Hadoop as I set up a homegrown Hadoop cluster, with a step-by-step tutorial on how I went about it.
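To make the “simple programming model” in the description above a little less abstract, here is a rough sketch of the map-and-reduce style of processing that Hadoop is best known for (MapReduce). To be clear, this is a toy that runs on a single machine in plain Python. It does not use any Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are ones I made up purely for illustration. On a real cluster, the framework would spread the map and reduce steps across many machines and do the grouping in between for you.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key (done by the framework on a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all the values collected for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine (illustrative only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big bet", "data cluster"]))
```

The point of the sketch is that each map call only needs one line of input and each reduce call only needs the values for one key, which is what makes it plausible to hand those pieces out to many machines at once.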

 

The Zachman Framework

A few days ago, I came across the Zachman Framework for Enterprise Architecture. As I perused the site (and enjoyed his views at the same time), I found Zachman asking a critical question: when you build an airplane or a building, there is a systematic way of doing things, with drawings, blueprints, simulations and so on. How about for an enterprise? What is the systematic way of doing things?

Systematic?

Hah – you’d be laughed out of the room…

You can get a copy of the Framework if you register (free registration) as a member. From an article on the Zachman site titled “Architecture is Architecture is Architecture”:

There is a universal set of descriptive representations for describing any or all industrial products. It is not mysterious what one dimension of the set of descriptions is as it is derived from the classic six primitive interrogatives that have existed since the origins of language. Answers to the six primitive interrogatives constitute a complete description of anything. Therefore, one set of descriptions includes:

Bills of Material – What the object is made of.

Functional Specs – How the object works.

Drawings – Where the components exist relative to one another.

Operating Instructions – Who is responsible for operation.

Timing Diagrams – When do things occur.

Design Objectives – Why does it work the way it does.

In many ways, that is the purpose of this blog: to ask how enterprises should be built, even as we toy with the need for Big Data and its associated technologies.

The problem with Enterprise Software

The central problem of Enterprise Software is the Enterprise Software Makers. Think about it for a second: anything new in the Enterprise Software space is destined either to fail competitively or, if successful, to be acquired by one of the big boys. The only progress possible, then, is through the mishmash of new and dying software mashups that live within the firms that acquire them.

There must be another way – this blog is primarily about another way!!!