Evernote’s argument for delivering a REST-less API leaves me unimpressed.

The Evernote API is notable because it is not based on REST. The defense of that decision leaves me unimpressed.

When the world is moving toward REST and fully open, usable APIs, why would Evernote go the other way? They ought to have a good reason. Evernote’s VP of Platform Strategy, Seth Hitchings, has something to say about it. According to the article on ProgrammableWeb:

Hitchings concedes that compared to the RESTful APIs, developers have to endure a bit of a learning curve to make use of the SDKs’ core functionality; to create, read, update, search, and delete Evernote content. But then again, according to Hitchings, Evernote is a special needs case

OK, so it’s more work for the consuming developers. It’s also more work for the company, because they have to support all the various “SDKs”, as they call them. [Evernote delivers libraries for various platforms including iOS, Android, C#, PHP, JavaScript, and more. They call these things “SDKs”, but they’re really not SDKs. An SDK is a Kit: it includes libraries, documentation, example code, tools, and other stuff. When Evernote uses the word “SDK” they mean “library.”] So… why? Why do it if everyone has to do more work?

Seeking the least compromise to data-transfer performance, Evernote needed a solution that could shuffle large quantities of data with minimal overhead. Despite its superior efficiency over XML, REST still wasn’t good enough.

Whoa. REST has “superior efficiency over XML”? That’s just nonsense. REST is not a data format. REST is an architectural approach. REST does not mean “not XML”. If you want to transfer XML data using the REST approach, go ahead. That’s why Roy Fielding, Tim Berners-Lee, and Henrik F. Nielsen invented the Content-Type header. That’s what MIME types are for. You can transfer XML, or binary, or any sort of data with REST.

The implicit and incorrect assumption is that REST implies JSON, or that REST implies not binary. That’s false. There is no need to avoid REST in order to attain reasonable data transfer performance.
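
To make this concrete, here is a minimal sketch in Python, using the requests library, of a client asking a single hypothetical RESTful resource for different representations via the Accept header. The URL and media types are illustrative, not Evernote’s.

# Sketch only: api.example.com is a hypothetical RESTful service.
import requests

base = "https://api.example.com/notes/1234"

# Ask for XML; the server announces the representation in the Content-Type response header.
r = requests.get(base, headers={"Accept": "application/xml"})
print(r.headers["Content-Type"])    # e.g. application/xml

# Ask for a compact binary representation of the very same resource.
r = requests.get(base, headers={"Accept": "application/octet-stream"})
print(len(r.content), "bytes of binary payload")

Same resource, same REST semantics; only the representation changes.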

According to the article, that faulty reasoning is why Evernote selected Apache Thrift. And, presented as a benefit, Thrift has tools to generate libraries for many platforms:

Thrift’s code-generating ability to write-once-and-deploy-to-many is also the reason Evernote is able to offer SDKs for so many platforms.

Yippee! But guess what! If you used REST, you wouldn’t need to generate all those libraries. And you’d have even broader platform support.

Just for fun, let’s have a look at the API that Thrift generates, as documented in the Evernote API Reference.

OMG, the horror. Look at all that stuff. The reason people like REST is that they can figure out the data model just by perusing the URLs. It’s obviously not possible to do so in this case.

Evernote’s is not a modern API. It is a mass of complexity.

Not impressed.


The way Azure should have done it – A better Synonyms Service

This is a followup from my previous post, in which I critiqued the simple Synonyms Service available on the Azure Datamarket.

To repeat, the existing URI structure for the service is like this:

GET https://api.datamarket.azure.com/Bing/Synonyms/GetSynonyms?Query=%27idiotic%27

How would I do things differently?

The hostname is just fine – there’s nothing wrong with that. So let’s focus on the URI path and the other parts.

GET /Bing/Synonyms/GetSynonyms?Query=%27idiotic%27

Here’s what I would do differently.

  1. Simplify. The URI structure should be simpler. Eliminate Bing and GetSynonyms from the URI path, as they are completely extraneous. Simplify the query parameter. Eliminate the url-encoded quotes when they are not necessary. Result: GET /Synonyms?w=improved
  2. Add some allowance for versioning. GET /v1/Synonyms?w=positive
  3. Allow the caller to specify the API Key in the URI. (Eliminate the distorted use of HTTP Basic Auth to pass this information). GET /v1/Synonyms?w=easy&key=0011EEBB4477

What this gets you, as an API provider:

  1. This approach allows users to try the API from a browser or console without registering. The service could allow 3 requests per minute, or up to 30 requests per day, for keyless access. Allowing low-cost or no-cost exploration is critical for adoption.
  2. The query is as simple as necessary and no simpler. There is no extraneous Bing or GetSynonyms or anything else. It’s very clear from the URI structure what is being requested. It’s “real” REST.

What about multi-word queries? Easy: just URL-encode the space.
GET /v1/Synonyms?w=Jennifer%20Lopez&key=0011EEBB4477

There’s no need to add URL-encoded quotes to every query in order to satisfy the 20% case where the query involves more than one word. In fact I don’t think multi-word queries would even be 20% of requests. Maybe more like 5%.
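
To illustrate, here is a minimal sketch in Python, with the requests library, against the proposed (not yet existing) endpoint; requests takes care of the percent-encoding, so multi-word queries need no special quoting:

# Sketch against the proposed /v1/Synonyms endpoint; the key is the placeholder used above.
import requests

def get_synonyms(word, key="0011EEBB4477"):
    # requests percent-encodes the parameters; a space becomes "+" (or %20),
    # so no quoting gymnastics are needed for multi-word queries.
    r = requests.get("https://api.datamarket.azure.com/v1/Synonyms",
                     params={"w": word, "key": key})
    r.raise_for_status()
    return r.json()

print(get_synonyms("improved"))
print(get_synonyms("Jennifer Lopez"))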

For extra credit, do basic content negotiation: look at the incoming Accept header and modify the format of the result based on that header. As an alternative, you could include a suffix in the URI path to indicate the desired output data format, as Twitter and the other big guys do:

GET /v1/Synonyms.xml?w=adaptive&key=0011EEBB4477

GET /v1/Synonyms.json?w=adaptive&key=0011EEBB4477
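
Here is a minimal sketch, in Python with Flask, of how a provider might implement both options. Everything here is illustrative, not the actual Azure service; the lookup itself is stubbed out.

# Sketch of Accept-header negotiation plus an optional .json/.xml suffix, using Flask.
from flask import Flask, Response, jsonify, request

app = Flask(__name__)

def lookup_synonyms(word):
    # Stub; a real service would consult its thesaurus data here.
    return ["stub", "placeholder"]

def to_xml(word, synonyms):
    items = "".join("<synonym>%s</synonym>" % s for s in synonyms)
    return '<synonyms word="%s">%s</synonyms>' % (word, items)

@app.route("/v1/Synonyms")
@app.route("/v1/Synonyms.<fmt>")
def synonyms(fmt=None):
    word = request.args.get("w", "")
    syns = lookup_synonyms(word)
    if fmt is None:
        # No suffix given: fall back to the Accept header, defaulting to JSON.
        best = request.accept_mimetypes.best_match(
            ["application/json", "application/xml"]) or "application/json"
        fmt = "xml" if best == "application/xml" else "json"
    if fmt == "xml":
        return Response(to_xml(word, syns), mimetype="application/xml")
    return jsonify(word=word, synonyms=syns)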

As an API provider, conforming to a “pragmatic REST” approach means you will deliver an API that is immediately familiar to developers regardless of the platform they use to submit requests. That means you have a better chance to establish a relationship with those developers, and a better chance to deepen that relationship.

That’s why it’s so important to get the basic things right.

Azure Synonyms Service – How NOT to do REST.

Recently, I looked on the Azure data market place (or whatever it’s called) to see what sort of data services are available there. I didn’t find anything super compelling. There were a few premium, for-fee services that sounded potentially interesting but nothing that I felt like spending money on before I could try things out.

As I was perusing, I found a synonyms service. Nice, but this is nothing earth-shaking. There are already a number of viable, programmable synonyms services out there. Surely Thesaurus.com has one. I think Wolfram Alpha has one. Wordnik has one. BigHugeLabs has one that I integrated with emacs. But let’s look a little closer.

Let me show you the URL structure for the “Synonyms” service available (as “Community Technical Preview”!) on Azure.


https://api.datamarket.azure.com/Bing/Synonyms/GetSynonyms?Query=%27idiotic%27

Oh, Azure Synonyms API, how do I NOT love thee? Let me count the ways…

  1. There’s no version number. What if the API gets revised? Rookie mistake.
  2. GetSynonyms? Why put a verb in the URI path, when the HTTP verb “GET” is already implied by the request? Useless redundancy. If I call GET on a URI path with the word “Synonyms” in it, then surely I am trying to get synonyms, no?
  3. Why is the word Bing in there at all?
  4. Notice that the word to get synonyms of must be passed in the query param named “Query”. Why use Query? Why not “word” or “term” or something that vaguely corresponds to the actual thing we’re trying to do here? Why pass it as a query param at all? Why not simply as part of the URL path?
  5. Also notice that the word must be enclosed in quotes, which themselves must be URL-encoded. That seems like an awkward design.
  6. What you cannot see in that URL is the authentication required. Azure says the authentication is “HTTP Basic Auth”, which means you pass a username and password pair, joined by a colon and then base64 encoded, as an HTTP header. But… there is no username and password. Bing/Azure/Microsoft gives you an API Key, not a user name. And there’s no password. So you use the API key as both the username and the password, base64 encode *that*, and pretend it’s HTTP Basic Auth. (See the sketch after this list.)
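
Here is a short sketch, in Python, of what that contortion looks like, based on the description above; the key is the placeholder value used elsewhere in this post. (The requests shortcut auth=(key, key) builds the same header.)

# Sketch of the faux Basic Auth demanded by the Azure DataMarket service.
import base64
import requests

key = "0011EEBB4477"   # placeholder API key
token = base64.b64encode(("%s:%s" % (key, key)).encode("ascii")).decode("ascii")

r = requests.get(
    "https://api.datamarket.azure.com/Bing/Synonyms/GetSynonyms",
    params={"Query": "'idiotic'"},
    headers={"Authorization": "Basic " + token},
)
print(r.status_code)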

If you aren’t persuaded that the above are evidence of poor API design, consider heading over to the API Craft discussion group on Google Groups to talk it over.

Alternatively, or in addition, spend some time reading “the REST Manifesto,” Roy Fielding’s PhD dissertation, specifically chapter 5 of that document. It’s about 18 printed pages, so not too big a commitment.

The problem with releasing a poorly-designed API is that it can do long-term damage.
As soon as a real developer takes a look at your service, he will not walk, he’ll RUN away to an alternative. If your API is a pain to use, or is poorly designed, you are guaranteed to drive developers somewhere else. And they won’t come back! They might come just to poke around, but if they see a bad service, like this Synonyms service, they will flee, never to return. They will quickly conclude that you just don’t get it, and who could blame them?

So learn from Azure’s mistakes, and learn from the success of others. Take the time to get it right.

And now a word from my sponsor: Apigee offers a Rapid API Workshop service where we can send in experts to collaborate with your team on API design principles and practice. Contact us at sales@Apigee.com for more information.

Yes, it’s trite, but we really are at an Inflection Point

It may sound like a platitude, but… the industry is now in the midst of an inflection point.

Behind us is the technology of client-server, with web goo glommed-on:

  • UI rendered to fixed computers, implemented using HTML(5) and Javascript.
  • Application logic built in Servlet/JSP, PHP, or ASP.NET.
  • Relational databases as a store. Data is accessed via datastore-specific protocols.

Ahead are pure web technologies:

  • UI rendered to mobile computers, and optimized for device capability. Android, iPhone, iPad, and Windows 8 are the key options, but more will emerge. The Kindle, Xbox, and PS3 are the up-and-comers. The HTML-based web-browser UI will remain a least-common denominator for some time, but there’s a steady trend away from it.
  • Application logic built in dynamic languages. Ruby-on-Rails, PHP, Python. Javascript was the first web app server language (Netscape Live server in 1995 and ASP Classic in 1996) and it is now back, with Node.js.
  • Data stores using NoSQL databases with massive scaleout. Data is accessed over HTTP, via REST.

Remember when “Scale” meant a really large box with lots of CPUs in it? We’ve moved to farms of managed computers that accomplish the same thing. Rather than depending on the hardware design to support the scale-out, we’ve now done it in software. Rather than relying on the CPU front-side bus to move data around, we’re depending on 40Gbps or even 100Gbps Ethernet and software-based, data-dependent prioritization and routing.

The force behind the economy of scale of standard high-volume components has not abated. If you wanted to build a superfast computer for one moment in time you might resort to some custom hardware. But the pace of evolution and improvement in CPU, memory, storage, and networking is such that the value of any dedicated hardware declines rapidly, even during design. It makes no economic sense to pursue the scale-up course. Designs need to accommodate evolution in the ecosystem. Just as the “Integrated” vendor-specific computers of the late 80’s gave way to “open systems”, the integrated single computer model is giving way to the “farm of resources” model.

This is all obvious, and has been for some time. Companies like Google were ahead of the curve, and dragged the rest of the industry with them, but now architectures based on the idea that “the datacenter is the computer” are available for low cost to just about everyone. These architectures have elastic compute, network, and storage, along with the software architecture to exploit it. The upshot is that you can just add resources and get additional, usable performance. Unlike the old “scale up” machines, this approach is not limited to 16 CPUs or 64 or 1024. Just keep going. People call it “cloud technology”, but the main point is elasticity.

The inflection point I spoke about is not defined by a particular month, like November 2012, or even a year. But over the past 6 years, this transition has been slowly, inexorably proceeding.

The one missing piece of the puzzle has been management skills and tools. The gear was there, and the software has continued to improve to exploit the gear, but people were initially not comfortable with managing it. That discomfort is dissipating over time, as people embrace the cloud. We’re realizing that we no longer need to perform {daily,weekly} backups because the data is already stored redundantly in Cassandra.

Even as cloud technology gets democratized, drops in price, and becomes more manageable, the capability of a single high-volume server computer continues to ramp upward on a log scale. This means that the old “automation” tasks, tracking orders, ERP systems (whether custom or not)… will be fulfilled by single machines, with optional redundancy.

Cloud technology therefore presents a couple opportunities:

  • For technology conservatives, where IT is a cost center, the maturation of cloud tech drops the cost of deploying new systems, and of handling peak load. A company can purchase short-term options for compute to handle the proverbial “Black Friday” or “Victoria’s Secret Fashion Show” load. This opportunity is somewhat conflated with the ongoing drop in the cost of technology. Basically, the cost is dropping, and it is dropping even more if you let someone else host your servers.
  • For companies that view technology as a business enabler, cloud tech allows them to pursue innovative new approaches for relatively low cost and risk. New partner-enabling initiatives; new channels; new markets or new approaches to the markets they already play in.

Without a doubt, the big payoffs come from the latter, expansive approach. You can’t grow by cutting costs. You *can* grow by increasing speed or efficiency – let’s say, dropping your turn-time on commercial loan approvals from 52 days to 22 days – but the big growth is in entirely new approaches.

But you don’t know what you don’t know. To uncover and develop the opportunities, companies need to dive in. They need to be pushing beyond their normal competencies, learning new things.

MongoDB booster would prefer Cassandra, if only he could store JSON in it. Have I got a data store for you!

Interesting article at GigaOM interviewing MongoLab Founder and CEO Will Shulman. GigaOM reports:

MongoLab operates under a thesis that MongoDB is pulling away as the world’s most-popular NoSQL database not because it scales the best — it does scale, Shulman said, but he’d actually choose Cassandra if he just needed a multi-petabyte data store without much concern over queries or data structure — but because web developers are moving away from the relational format to an object-oriented format.

Interesting comment. My spin alarm went off with the fuzz-heavy phrasing “…operates under a thesis…” I’ll buy that developers are moving away from relational and towards simpler data storage formats that are easier to use from dynamic scripting languages. But there is no evidence presented in support of the conclusion that “MongoDB is pulling away.” GigaOM just says that this is MongoLab’s “thesis”.

In any case, Shulman’s opinion that Cassandra scales much better than MongoDB leads to this question: if the key to developer adoption is providing the right data structures, then why not just build the easy-to-adopt object store on the existing, proven-to-scale backend? Why build another backend if that problem has been solved by Cassandra?

Choosing to avoid this question, the creators of MongoDB have only caused people to ask it more insistently.

The combination of developer-friendly data structure and highly-scalable backend store has been done. You can get the scale of Cassandra and the ease of use of a JSON-native object store. The technology is called App Services, and it’s available from Apigee.

In fact, App Services even offers a network interface that is wire-compatible with existing MongoDB clients (somebody tell Shulman); you can keep your existing client code and just point it to App Services.
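
A minimal sketch of what that might look like with pymongo; the hostname and database name are placeholders, not actual App Services values:

# Sketch: an unmodified MongoDB client pointed at a wire-compatible endpoint.
from pymongo import MongoClient

# Placeholder hostname; substitute the address of your App Services
# (or other Mongo-wire-compatible) endpoint.
client = MongoClient("mongodb://example-wire-compatible-host:27017/")
db = client["mydb"]

db.notes.insert_one({"title": "hello", "tags": ["demo"]})
for doc in db.notes.find({"tags": "demo"}):
    print(doc)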

With that you can get the nice data structure and the vast scalability.

Thank you, Ed Anuff.

“No technology can ever be too arcane…”

From an ironic fictional interview with Linus Torvalds on TypicalProgrammer, via @ckindel.

Q: You released the Git distributed version control system less than ten years ago. Git caught on quickly and seems to be the dominant source code control system, or at least the one people argue about most on Reddit and Hacker News.

A: Git has taken over where Linux left off separating the geeks into know-nothings and know-it-alls. I didn’t really expect anyone to use it because it’s so hard to use, but that turns out to be its big appeal. No technology can ever be too arcane or complicated for the black t-shirt crowd.

Q: I thought Subversion was hard to understand. I haven’t wrapped my head around Git yet.

A: You’ll spend a lot of time trying to get your head around it, and being ridiculed by the experts on github and elsewhere. I’ve learned that no toolchain can be too complicated because the drive for prestige and job security is too strong.

We’ve all seen that phenomenon. On the other hand, some situations demand more complex solutions, unpleasant as that fact may seem. One cannot build a robot without a sophisticated control system. One cannot build an internet-scale social app without some sort of fault-tolerant distributed data storage infrastructure.

The trick is determining to what degree the complexity is necessary, and to what degree the complexity is self-sustaining due to the prestige and job security factors.

Hardware is Dead! Tablets will Explode!

Jay Goldberg, writing for VentureBeat, reports that he purchased a 7″ no-name touchscreen tablet, with 4 GB of RAM, WiFi, and Android Ice Cream Sandwich, for $45 without haggling in Shenzhen, China. A revelation, he says.

  • Hardware margins are under siege. Making money on hardware is not a long-term defensible position. Companies that hope to make money need to market an “experience”. iOS is one such “experience”.
  • The number of different types of tablets will explode, and the number of actual tablets will explode. Derivative special-purpose devices based on tablet hardware will also explode. Touch-screens on your fridge, that sort of thing.

By the way: Hardware is Dead! Tablets will explode! It all sounds so apocalyptic. Why do we use such terms when discussing the technology business? It’s almost like we’re trying to scare ourselves.

This latter prediction – that the number of computing devices in the tablet form-factor will explode – isn’t really new. Business Insider made the same prediction a month ago.

Similarly, Fortune magazine ran a headline in February of this year about the coming explosion in tablets. In April, Forrester Research predicted the explosion, too, though by estimating 760M tablets in use by 2016, Forrester appears to have actually underestimated the trend.

Mr Goldberg seems willing to make the easy predictions, echoing all the people who came before him. He also doesn’t offer any deeper insight. The rapid growth in the popularity of tablet-based hardware may be the interesting headline, but to me, the implications are much broader.

  • A huge rise in the demand for apps. I am not one who imagines that the touchscreen in the door of your mom’s fridge needs access to an App Store. There is no need for a general-purpose computing experience embedded in refrigerator doors. On the other hand, the computer in a refrigerator needs to run an app, a very specific app. So the number of apps will explode.
  • Specialized apps must be created by specialized developers. Extrapolate from the refrigerator into all the other specialized embedded systems, for all the other specialized user experiences. The demand for talented application developers will also explode.
  • The complement to apps and developers of course is cloud APIs, compute, and storage. Expect huge demand in all of these pieces, in direct correlation to the number of tablets sold.

But, I would say that, wouldn’t I? I work for the leading API Management company. True enough – I am biased. But I had this view before beginning my job here. I knew the need for apps and storage and cloud compute was exploding. I am an investor, though not one with a particularly large store of liquid assets. What I invest is my time, and I chose to work for Apigee because I believe it’s a good investment of my valuable time.

AWS “High I/O” EC2 instances

A while back I commented on Amazon’s DynamoDB and disagreed with the viewpoint from HighScalability.com that using SSD for storage was a “radical step.” In my comments, I predicted that

We will see SSD replace mag disk as a mainstream storage technology, sooner than most of us think.

And,

Amazon will just fold [SSD] into its generic store.

Now, Amazon has announced the availability of “high I/O” instances of EC2. They offer 2 TB of SSD-backed storage, local to the instance and visible to the OS as a pair of 1 TB volumes.

Was that sooner than you thought?

Next question:  which compute tasks are not well-suited to deployment on “high I/O” instances of EC2?

The only reason Amazon describes these instances as “high I/O” is that they have a ton of existing magnetic disk already deployed. We should all begin to think of SSD-backed storage as “standard”, and magnetic platters as “low I/O”. People will rapidly refuse to pay the magnetic disk tax. It’s silly to pay for CPU cycles spent waiting for heads to reach the appropriate location on a magnetic platter.

Going forward, the “High I/O” moniker will disappear, as it will be cheaper for Amazon to deploy and operate SSD. There may be a price premium today for “High I/O” but that is driven by temporary scarcity, not by actual operational costs.

What Amazon will do with all its magnetic drives is an open question, but be assured it will turn them off. The savings in A/C costs alone, associated with dissipating the heat generated by mechanical drives, will compel Amazon to transition rapidly to full SSD.


Google’s Compute Engine: do you believe it?

Google has become the latest company to offer VM hosting, joining Microsoft (Azure) and Amazon (AWS), along with all the other “traditional” hosters.

Bloomberg is expressing skepticism that Google will stick with this plan.  Who can blame them? If I were a startup, or another company considering a VM hoster decision, I’d wonder: Does Google really want to make money in this space, or is it just trying to take mindshare away from Amazon and Microsoft?

Google still makes 97% of its revenue and a similar proportion of its profit from advertising. Does cloud computing even matter to them? You might say that Amazon is similar: the company gets most of its revenue from retail operations. On the other hand, Jeff Bezos has repeatedly said he is investing in cloud compute infrastructure for the long haul, and his actions speak much louder than those words. Clearly Amazon is driving the disruption. Microsoft for its part is serious about cloud because competing IaaS threatens its existing core business. Microsoft needs to do well in cloud.

As for Google – Do they even care whether they do well with their IaaS offering?

Bloomberg’s analysis resonates with me. Google has sprinkled its magic pixie dust on many vanity projects: phone OS, tablets, blogging, Picasa, web browsers, social networking, Go (the programming language). How about Sketchup? But it really doesn’t matter if any of those projects succeed. All of them added up together are still irrelevant in the shadow of Google’s Ad revenue. The executive management at Google know this, and act accordingly.

Would you bet on a horse no-one cares about?


Sauce Labs explains its move from NoSQL CouchDB to old-skool MySQL

Sauce Labs has rendered a valuable service to the community by documenting the factors that went into a decision to change infrastructure – to replace CouchDB with MySQL.

Originally the company had committed  to CouchDB, which is a novel NoSQL store originating from a team out of MIT.  CouchDB is termed a “document store” and if you are a fan of REST and JSON, this is the NoSQL store for you.

Every item in CouchDB is a map of key-value pairs, with arbitrarily deep nesting. Apps retrieve objects via a clean REST API, and the data is JSON. Very nice, easy to adopt, and, with the ubiquity of JSON parsers, CouchDB is easy to access from any language or platform environment. Speaking of old-school, I built a demo connecting from Classic ASP / Javascript to CouchDB – it was very easy to produce. I also did small clients in PHP, C#, and Python – all of them are first-class clients in the world of CouchDB.
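
For instance, here is a minimal sketch of that interaction in Python with the requests library, assuming a local, unsecured CouchDB instance:

# Sketch: talking to a local CouchDB over its REST/JSON interface.
# Assumes an unsecured ("admin party") CouchDB listening on localhost:5984.
import requests

couch = "http://localhost:5984"

requests.put(couch + "/demo")                       # create a database
requests.put(couch + "/demo/note-1",                # create a document
             json={"title": "hello", "tags": ["couch", "json"]})

doc = requests.get(couch + "/demo/note-1").json()   # read it back as JSON
print(doc["title"], doc["_rev"])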

It really is a very enjoyable model for a developer or systems architect.

For Sauce Labs, though, the bottom line was – drumroll, please – that CouchDB was immature.  The performance was not good. Life with incremental indexes was … full of surprises.  The reliability of the underlying data manager was substandard.

Is any of this surprising?

And, MySQL is not the recognized leader in database reliability and enterprise readiness, which makes the move by Sauce Labs even more telling.

Building blocks of infrastructure earn maturity and enterprise-readiness through repeated trials.  Traditional relational data stores, even open source options, have been tested in real-world,  high-load, I-don’t-want-to-be-woken-up-at-4am scenarios. Apparently CouchDB has not.

I suspect something similar is true for other NoSQL technologies, including MongoDB, Hadoop, and Cassandra. I don’t imagine they would suffer from the same class of reliability problems reported by Sauce Labs. Instead, these pieces lack maturity and fit-and-finish in other ways. How difficult is it to partition your data? What are the performance implications of structuring a column family a certain way? What kind of network load should I expect for a given deployment architecture? These are questions that are not well understood in the NoSQL world.  Not yet.

Yes, some big companies run their businesses on Hadoop and other NoSQL products. But chances are, those companies are not like yours. They employ high-dollar experts dedicated to making those things hum. They’ve pioneered much of the expertise of using these pieces in high-stress environments, and they paid well for the privilege.

Is NoSQL ready for the enterprise?

Ready to be tested, yes. Ready to run the business?  Not yet.

In any case, it’s very valuable for the industry to get access to such public feedback. Thanks, Sauce Labs.