I’m a technologist. I believe technology, well utilized, can advance business goals. A business can derive a significant advantage from making the right technology moves, exploiting information in just the right way.
But I am a bit skeptical of the excitement in the industry around Big Data, MapReduce, and Hadoop. While Google obviously has derived great benefit from MapReduce over the years, Google is special. Most businesses do not look like Google, and do not have information management requirements that are similar to Google’s. Google custom-constructs their PCs. At Google, the unit of computer hardware deployment is “the warehouse.”
If you underwrite insurance, or process medical records, or scan transactions for fraud, or optimize logistics, or run statistical process control, or perform any of a variety of other typical business information tasks, your company is very much not like Google. If you don’t have hundreds of millions of users generating billions of transactions, then you’re not like Google, and you should not try to emulate its technology strategy. Bigtable is not for you, and MapReduce is not something that will give you a strategic advantage.
Big Data seems to be the industry’s next touchstone. Everyone feels they need to “check the box.” There’s lots of interest from buyers, so vendors believe they need to talk about it. The tech press, with its persistently positive view of Google, encourages this. Breathless analyst reports fan the flames. CS programs at universities teach MapReduce in first-year courses. Devs put MapReduce on their résumés. All this combines to produce a self-reinforcing cycle.
But for most CIOs, MapReduce is a distraction. On this point, I am persuaded by DeWitt, Stonebraker, et al. CIOs should be focusing on how to better utilize the databases they already have. Figure out cloud, and figure out how to improve management and governance of IT projects. Are you agile enough? Are you doing Scrum? Figure out what major pieces you can buy from your key technology partners.
I have read user stories of people using MapReduce to scan through log files: tens of gigabytes of log files. Seriously? Tens of gigabytes fit on a laptop hard drive. Unless you are talking about multiple terabytes of information, MapReduce is probably the wrong tool.
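To make the point concrete, here is a minimal sketch of a single-machine log scan in Python. The log layout (a timestamp, then a level such as INFO or ERROR, then the message) and the function names `count_levels` and `count_levels_in_file` are assumptions made up for illustration, not taken from any real system. The sketch reads one line at a time, so memory use should stay roughly flat no matter how large the file is.

```python
from collections import Counter
from typing import Iterable


def count_levels(lines: Iterable[str]) -> Counter:
    """Tally log lines by level, assuming 'timestamp LEVEL message' lines."""
    counts = Counter()
    for line in lines:
        # Assumed layout: field 0 is the timestamp, field 1 is the level.
        fields = line.split(maxsplit=2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


def count_levels_in_file(path: str) -> Counter:
    """Stream a log file from disk without loading it all into memory."""
    # Iterating over the file handle yields one line at a time.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_levels(handle)
```

Calling `count_levels_in_file` on a log path returns a `Counter` keyed by level. A scan like this is limited mostly by how fast the disk can be read, which is the reason tens of gigabytes rarely call for a cluster.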
If you are analyzing the human genome, or doing weather modelling, or if you work for the NSA or Baidu, then yes, you need MapReduce. Otherwise, Big Data is not yet mainstream.