Hadoop Adoption

Interesting. Michael Stonebraker, who has previously expressed skepticism regarding the industry excitement around Hadoop, has done it again.

“Even at lower scale, it is extremely eco-unfriendly to waste power using an inefficient system like Hadoop.”

Inefficient, he says! Pretty strong words. Stonebraker credits Hadoop with democratizing large-scale parallel processing, but he predicts that Hadoop will either evolve radically to become a “true parallel” DBMS or be replaced. He’s correct in noting that Google has moved away from MapReduce, in part. Stonebraker describes some basic architectural elements of MapReduce that, he says, represent significant obstacles for a large proportion of real-world problems. He says that existing parallel DBMSs have a performance advantage of one to two orders of magnitude over MapReduce. Wow.

It seems to me that, with Hadoop, companies are now exploring and exploiting the opportunity to keep and analyze massive quantities of data they had previously just discarded. If Stonebraker is right, they will try Hadoop, and then move to something else when they “hit the wall”.

I’m not so sure. The compounded results of steady development over time can bring massive improvements to any system. There is so much energy being invested in Hadoop that it would be foolhardy to discount its progress.

Companies used to “hit the wall” with simple, so-called “two-tier” RDBMS deployments. But steady development of hardware and software has moved that proverbial wall further and further out. JIT compilation and garbage collection used to be impractical for high-performance systems. That is no longer the case, and the same pattern holds for any technology that receives sustained development.

As I’ve said before on this blog, I don’t think Hadoop and MapReduce are ready today for broad, mainstream use. That is as much a statement about the technology as it is about the people who are potential adopters. On the other hand, I do think these technologies hold great promise, and they can be exploited today by leading teams.

The big data genie is out of the bottle.

