Hadoop Adoption

Interesting. Michael Stonebraker, who has previously expressed skepticism about the industry's excitement around Hadoop, has done it again.

“Even at lower scale, it is extremely eco-unfriendly to waste power using an inefficient system like Hadoop.”

Inefficient, he says! Pretty strong words. Stonebraker credits Hadoop with democratizing large-scale parallel processing, but he predicts that Hadoop will either evolve radically to become a “true parallel” DBMS, or be replaced. He’s correct in noting that Google has, in part, moved away from MapReduce. Stonebraker describes some basic architectural elements of MapReduce that, he says, represent significant obstacles for a large proportion of real-world problems. He says that existing parallel DBMS systems hold a performance advantage of one to two orders of magnitude over MapReduce. Wow.
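
For readers who haven’t written one, here’s roughly what the programming model under discussion looks like: the canonical word-count example, written against the standard Hadoop Java MapReduce API. Nothing here comes from Stonebraker; it’s just a minimal sketch of the model being debated. Every job scans its input, shuffles intermediate (key, value) pairs between the map and reduce phases, and materializes its output – brute-force structure of the sort his critique contrasts with a parallel DBMS’s indexes and pipelined operators.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework has already shuffled and sorted by key,
  // so each call sees one word together with all of its counts.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```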

It seems to me that, with Hadoop, companies are now exploring and exploiting the opportunity to keep and analyze massive quantities of data they had previously just discarded. If Stonebraker is right, they will try Hadoop, and then move to something else when they “hit the wall”.

I’m not so sure. The compounded results of steady development over time can bring massive improvements to any system. There is so much energy being invested in Hadoop that it would be foolhardy to discount its progress.

Companies used to “hit the wall” with simple, so-called “two-tier” RDBMS deployments. But steady development over time, in both hardware and software, has pushed that proverbial wall further and further out. JIT compilation and garbage collection used to be impractical for high-performance systems; that is no longer true. The same goes for any sufficiently developed technology.

As I’ve said before on this blog, I don’t think Hadoop and MapReduce are ready today for broad, mainstream use. That is as much a statement about the technology as it is about the people who are its potential adopters. On the other hand, I do think these technologies hold great promise, and they can be exploited today by leading teams.

The big data genie is out of the bottle.

Apigee’s Best Practices for REST API Design

I just read Apigee’s paper on pragmatic RESTful API design.

Very sensible, practical guidance. Good stuff for organizations confronting the REST phenomenon. There are obviously many REST-based interfaces out there. Facebook, Google, Digg, Reddit, and LinkedIn are just a few of the more visible services, nearly all of them social networks, that support REST. But of course there is real value for enterprises in exposing resources in the same way. Wouldn’t it be nice if your municipal government exposed public records via REST? How many times have you wanted the data from a hosted app – what we used to call an “application service provider” – in a machine-readable format, instead of in an HTML page?
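
To make that concrete, here’s a minimal sketch of what consuming such a resource might look like, using Java’s built-in HttpClient (Java 11+). The endpoint is hypothetical – there is no api.example.gov – but the shape of the interaction is the point: a stable URL identifying a resource, an Accept header asking for JSON, and a plain HTTP GET.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PermitLookup {
  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint: a municipal "open records" API exposing
    // building permits as addressable resources rather than HTML pages.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.example.gov/permits/2011-0042"))
        .header("Accept", "application/json")  // ask for machine-readable data
        .GET()
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());

    // A well-behaved REST service uses status codes meaningfully:
    // 200 with a JSON body on success, 404 if the permit doesn't exist.
    System.out.println(response.statusCode());
    System.out.println(response.body());
  }
}
```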

It’s worth examining the results the pioneers have achieved, to benefit from their experience.

As pioneers rushing to market, the designers of these early social network APIs may have sacrificed some design quality for speed of delivery. Understandable. Apigee’s paper critiques some of those designs and describes some of the rough edges. It’s like sitting in on a design review – and it’s an excellent way to learn.

Once you “get” REST, it all falls into place, and the design principles and guidance offered by Apigee will seem like second nature. But for those grappling with the problem for the first time, it’s good to have a firm foundation from which to start.
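
To show the flavor of that foundation, here’s a minimal sketch of a noun-oriented resource written with the standard JAX-RS annotations. The “dogs” collection is hypothetical and the stubbed JSON bodies are placeholders; what matters is the URL structure – two base URLs per resource, one for the collection and one for an element within it – with the HTTP verbs doing the work, so there’s no need for URLs like /getDog or /deleteDog.

```java
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// A resource class rooted at /dogs: the collection URL and the
// element URL (/dogs/{id}) cover all the CRUD operations via verbs.
@Path("/dogs")
@Produces(MediaType.APPLICATION_JSON)
public class DogsResource {

  @GET
  public String list() {                 // GET /dogs -> the whole collection
    return "[{\"id\":\"1234\",\"name\":\"Rex\"}]";
  }

  @POST
  public String create(String json) {    // POST /dogs -> add a new dog
    return json;                         // echo the created resource
  }

  @GET
  @Path("/{id}")
  public String get(@PathParam("id") String id) {   // GET /dogs/1234 -> one dog
    return "{\"id\":\"" + id + "\",\"name\":\"Rex\"}";
  }

  @DELETE
  @Path("/{id}")
  public void delete(@PathParam("id") String id) {  // DELETE /dogs/1234
    // remove the dog; a void return maps to 204 No Content
  }
}
```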