web analytics

How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did

Or how Big Data is the wild wild west – where the saloon owners knew a lot more about who walked into their saloon than they themselves knew.

This article, How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did, explores a real incident that shows how Target, a corporation built on marketing and information, targets its customers (and, dare I say, keeps even its own store managers on the front line in the dark).

So Target started sending coupons for baby items to customers according to their pregnancy scores. Duhigg shares an anecdote — so good that it sounds made up — that conveys how eerily accurate the targeting is. An angry man went into a Target outside of Minneapolis, demanding to talk to a manager:

“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again.

and

On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”

What Target discovered fairly quickly is that it creeped people out that the company knew about their pregnancies in advance.

“If we send someone a catalog and say, ‘Congratulations on your first child!’ and they’ve never told us they’re pregnant, that’s going to make some people uncomfortable,” Pole told me. “We are very conservative about compliance with all privacy laws. But even if you’re following the law, you can do things where people get queasy.”

So we’ve gone from POS (Point Of Sale) data to gleaning information about clients from the patterns they exhibit. Information has been so free in the past that it sounds like it is going to become a whole lot more expensive.

Though what I loved best about this article is the way that the Forbes writer included a picture of Andrew Pole of Target Corp (linked from LinkedIn) in the article itself. A bit of reverse information gleaning…

The Age of Big Data

The Age of Big Data is an article in the NY Times Sunday Review by Steve Lohr. Some of the key takeaways:

They help businesses make sense of an explosion of data — Web traffic and social network comments, as well as software and sensors that monitor shipments, suppliers and customers — to guide decisions, trim costs and lift sales.

And of course, something like this means jobs as well…

A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired.

As for the impact

The story is similar in fields as varied as science and sports, advertising and public health — a drift toward data-driven discovery and decision-making. “It’s a revolution,” says Gary King, director of Harvard’s Institute for Quantitative Social Science. “We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”

and

Research by Professor Brynjolfsson and two other colleagues, published last year, suggests that data-guided management is spreading across corporate America and starting to pay off. They studied 179 large companies and found that those adopting “data-driven decision making” achieved productivity gains that were 5 percent to 6 percent higher than other factors could explain.

Finally, if you thought that our lives are going to get easier, be aware

Big Data also supplies more raw material for statistical shenanigans and biased fact-finding excursions. It offers a high-tech twist on an old trick: I know the facts, now let’s find ’em. That is, says Rebecca Goldin, a mathematician at George Mason University, “one of the most pernicious uses of data.”

It’s like the wild wild west all over again.

Big Recognition for IBM Big Data

IBM’s smarter computing blog talks about Big Recognition for IBM Big Data. From the blog post,

IBM was among the select companies that Forrester invited to participate in The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012, (February 2, 2012). Technologies evaluated were IBM InfoSphere BigInsights (IBM’s Hadoop-based offering), and IBM Netezza Analytics. In this evaluation, IBM was placed in the Leaders category of the Wave and achieved the highest possible score in both the Strategy and Market Presence segments. In the third segment, Current Offering, IBM received the second highest score.

The Forrester report on the current players in the Big Data space can be downloaded from IBM’s site here.

How Hadoop is revolutionizing…

Business Intelligence and Data Analytics.

This presentation, How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics by Dr. Amr Awadallah (CTO at Cloudera), delves into how a business can fit Hadoop into its Business Intelligence (BI) and Data Analytics efforts.

Below is the initial thesis:

Pre-Hadoop (and Hadoop-like infrastructure) – BI applications access the data that is available in a data store such as a database or a data warehouse and produce actionable items from this data. As time moves on, the data in the data store gets archived and essentially disappears, dies, or gets aggregated/reduced for offline storage.

Below is the new anti-thesis:

Post-Hadoop – The approach here is to have live data available at all times in raw and/or processed form. The Hadoop approach is to take the application to the data: both the data and the applications that act on and explore it are distributed.
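To make "take the application to the data" a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the node names, the sample records and the helper functions are all hypothetical, and threads merely stand in for machines in a cluster. The point it tries to show is that the function travels to each partition and only a small summary travels back.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each list stands in for raw records stored on one node.
PARTITIONS = {
    "node-a": [12, 7, 30],
    "node-b": [5, 18],
    "node-c": [42],
}


def local_summary(records):
    # Runs "next to" the data; only a small (count, total) tuple leaves the node.
    return len(records), sum(records)


def run_on_partitions(partitions, task):
    # Ship the same task to every partition. Threads stand in for cluster nodes.
    with ThreadPoolExecutor() as pool:
        return dict(zip(partitions, pool.map(task, partitions.values())))


def global_mean(partitions):
    # Combine the per-node summaries into one answer without moving the raw records.
    summaries = run_on_partitions(partitions, local_summary)
    count = sum(c for c, _ in summaries.values())
    total = sum(t for _, t in summaries.values())
    return total / count


if __name__ == "__main__":
    print(global_mean(PARTITIONS))
```

In the pre-Hadoop picture, the raw records would first be copied to wherever the BI application runs; here only three small tuples cross the "network".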

The anti-thesis takes this form largely because data storage has become commoditized (and rather large), while data pipes and computation, though bigger and faster than before, have not (and perhaps need not have) expanded as much as storage has. At the same time, it has become a human imperative to put out as much junk as possible… er, to be more creative, and big data apps and their providers (Facebook, Google, etc.) have followed suit.

The synthesis – Yet to be.

But here’s a guess. Right-Compute and Right-Data. The premise of Big Compute and Big Data is that in the pile of horse manure, there must be a pony somewhere: a white stallion, to be sure. As many past Masters (Who is a Master? Think Sun Tzu, Newton) will tell you, the objective of dealing with Big Data (and is there any bigger Data out there than the human, natural and metaphysical world?) is to elicit the laws that underlie it.

Now in the past, we as the curious ones have depended on intuiting and hypothesizing about the Big Data out there. Today, it seems that we’re done with the hypothesizing and are jumping straight into letting the Data speak for itself.

Right-Compute and Right-Data is about getting back to the hypothesize-and-test scheme that has proven remarkably successful in our developmental journey.

Stay tuned…

The Coming Tech-led Boom

The Coming Tech-led Boom is a recent article published in the Wall Street Journal.

In January 2012, we sit again on the cusp of three grand technological transformations with the potential to rival that of the past century. All find their epicenters in America: big data, smart manufacturing and the wireless revolution.

Now, that’s what I call timing, because I’ve been staking out the ground on two of those technological transformations – Smarter Manufacturing (on my @ Supply Chain Management blog) and Big Data here on this blog. My views on Smarter Manufacturing are here.

As for Big Data, this is what the authors have to say,

Information technology has entered a big-data era. Processing power and data storage are virtually free. A hand-held device, the iPhone, has computing power that shames the 1970s-era IBM mainframe. The Internet is evolving into the "cloud"—a network of thousands of data centers any one of which makes a 1990 supercomputer look antediluvian. From social media to medical revolutions anchored in metadata analyses, wherein astronomical feats of data crunching enable heretofore unimaginable services and businesses, we are on the cusp of unimaginable new markets.

While much of this is true, does it sound like a prediction? To me, it sounds like the inevitable inference. Except that I foresee a different actualization of the potential of Big Data. While we’re at the point of Big Data storage, retrieval, analysis, manipulation etc., that is not the point of Big Data <anything>.

I see Big Data as a Resource like water or oil for example – a vast landscape to be discovered, molded and valued. That is the new economy – powered by a new resource altogether…

Introducing Hadoop

Hadoop, if you’ve got your ear to the enterprise groundswell, is a big bet that many of the large enterprises are making for the future – both near term and longer term. Sooner or later, you’re going to be hearing that word dropped just as often and just as ubiquitously as the “cloud” is uttered hither and thither.

So what is Hadoop?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

That may sound like gobbledygook, and your puzzled faces might register the question: don’t these computer things do that even now? Well, yes and no. Yes, because to a certain extent, we’ve always wanted that ability and we’ve gotten it one way or another. No, because we’ve never had it in the way that Hadoop delivers it. Over the next few posts, I’ll delve deeper into Hadoop as I set up a homegrown Hadoop cluster – a step-by-step tutorial on how I went about setting up my cluster.
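The "simple programming model" in that description is usually taken to mean MapReduce. As a rough illustration, here is the classic word count written as a single-process Python sketch. It does not use Hadoop's actual API: the function names and the sample lines are made up for this post, and the shuffle step that Hadoop performs across machines is imitated with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group values by key, as the framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse all the values seen for one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big bet", "data everywhere"]))
```

On a real cluster, the map calls would run on the machines that hold each chunk of input and the reduce calls would be spread across machines as well. The programmer writes only the two small functions, and the framework is left to handle distribution and failures.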

 

The Zachman Framework

A few days ago, I came across the Zachman Framework for Enterprise Architecture. As I perused the site (and enjoyed his views at the same time), I found that Zachman asks a critical question: when you build an airplane or a building, there is a systematic way of doing things – drawings, blueprints, simulation etc. How about for an enterprise? What is the systematic way of doing things?

Systematic?

Hah – you’d be laughed out of the room…

You can get a copy of the Framework if you register (free registration) as a member. From an article on the Zachman site titled Architecture is Architecture is Architecture:

There is a universal set of descriptive representations for describing any or all industrial products. It is not mysterious what one dimension of the set of descriptions is as it is derived from the classic six primitive interrogatives that have existed since the origins of language. Answers to the six primitive interrogatives constitute a complete description of anything. Therefore, one set of descriptions includes:

Bills of Material – What the object is made of.

Functional Specs – How the object works.

Drawings – Where the components exist relative to one another.

Operating Instructions – Who is responsible for operation.

Timing Diagrams – When do things occur.

Design Objectives – Why does it work the way it does.

In many ways, that is the purpose of this blog – how should enterprises be structured even as we toy with the need for Big Data and its associated technologies?

The problem with Enterprise Software

The central problem of Enterprise Software is the Enterprise Software Makers. Think about it for a second – anything new in the Enterprise Software space is destined either to fail competitively or, if successful, to be acquired by one of the big boys. The only progress possible then is through the mishmash of new and dying software mashups that live within the firms that acquire them.

There must be another way – this blog is primarily about another way!!!