The way Azure should have done it – A better Synonyms Service

This is a follow-up to my previous post, in which I critiqued the simple Synonyms Service available on the Azure Datamarket.

To repeat, the existing URI structure for the service is like this:

GET https://api.datamarket.azure.com/Bing/Synonyms/GetSynonyms?Query=%27idiotic%27

How would I do things differently?

The hostname is just fine – there’s nothing wrong with that. So let’s focus on the URI path and the other parts.

GET /Bing/Synonyms/GetSynonyms?Query=%27idiotic%27

Here’s what I would do differently.

  1. Simplify. Eliminate Bing and GetSynonyms from the URI path; they are completely extraneous. Simplify the query parameter, and drop the URL-encoded quotes when they are not necessary. Result: GET /Synonyms?w=improved
  2. Add some allowance for versioning. GET /v1/Synonyms?w=positive
  3. Allow the caller to specify the API key in the URI, eliminating the distorted use of HTTP Basic Auth to pass this information. GET /v1/Synonyms?w=easy&key=0011EEBB4477
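
To make these suggestions concrete, here is a minimal sketch of a service built along these lines, using Flask. It is purely illustrative – the framework choice, the route name, and the tiny in-memory table are my own inventions, not anything Azure ships:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Hypothetical in-memory table, standing in for a real synonyms backend.
    SYNONYMS = {
        "improved": ["better", "enhanced", "refined"],
        "positive": ["favorable", "upbeat", "optimistic"],
    }

    @app.route("/v1/Synonyms")              # versioned path; no verb, no "Bing"
    def get_synonyms():
        word = request.args.get("w", "")    # the plain word; no quotes required
        key = request.args.get("key")       # optional API key, right in the URI
        # A real service would validate `key`, and throttle keyless callers
        # rather than rejecting them outright.
        return jsonify({"word": word, "synonyms": SYNONYMS.get(word, [])})

    if __name__ == "__main__":
        app.run()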

What this gets you, as an API provider:

  1. This approach allows users to try the API from a browser or console, without registering. The service could allow, say, 3 requests per minute, or up to 30 requests per day, for keyless access (see the sketch after this list). Allowing low-cost or no-cost exploration is critical for adoption.
  2. The query is as simple as necessary and no simpler. There is no extraneous Bing or GetSynonyms or anything else. It’s very clear from the URI structure what is being requested. It’s “real” REST.
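
Here is one way that keyless throttle might work – an in-memory, per-IP sliding window. This is a sketch only; a production service would track the daily cap the same way, and would keep the counters in a shared store rather than in process memory:

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60
    MAX_KEYLESS = 3                 # e.g. 3 keyless requests per minute
    _recent = defaultdict(deque)    # ip -> timestamps of recent keyless requests

    def allow_keyless(ip):
        now = time.time()
        q = _recent[ip]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()             # discard requests outside the window
        if len(q) >= MAX_KEYLESS:
            return False            # over the limit; ask the caller to register
        q.append(now)
        return True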

What about multi-word queries? Easy: just URL-encode the space.
GET /v1/Synonyms?w=Jennifer%20Lopez&key=0011EEBB4477
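
If you are constructing that request in code, the standard library does the encoding for you. A quick sketch (the hostname is a placeholder):

    from urllib.parse import quote

    word = "Jennifer Lopez"
    # quote() percent-encodes the space (and anything else that needs it).
    url = f"https://api.example.com/v1/Synonyms?w={quote(word)}&key=0011EEBB4477"
    print(url)  # .../v1/Synonyms?w=Jennifer%20Lopez&key=0011EEBB4477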

There's no need to require URL-encoded quotes on every query just to satisfy the 20% case where the query involves more than one word. In fact I don't think multi-word queries would even be 20% of traffic. Maybe more like 5%.

For extra credit, do basic content negotiation: look at the incoming Accept header and set the format of the result accordingly. Alternatively, include a suffix in the URI path to indicate the desired output format, as Twitter and the other big guys do:

GET /v1/Synonyms.xml?w=adaptive&key=0011EEBB4477

GET /v1/Synonyms.json?w=adaptive&key=0011EEBB4477
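
Extending the earlier sketch, here is roughly how a service might support both mechanisms, with the suffix taking precedence over the Accept header. Again, this is my own illustration; the XML serialization in particular is deliberately naive:

    from flask import Flask, Response, jsonify, request

    app = Flask(__name__)

    def to_xml(word, synonyms):
        # Naive XML serialization, for illustration only.
        items = "".join(f"<synonym>{s}</synonym>" for s in synonyms)
        return f'<result word="{word}">{items}</result>'

    @app.route("/v1/Synonyms")          # format chosen via the Accept header
    @app.route("/v1/Synonyms.<fmt>")    # or forced with a .xml / .json suffix
    def get_synonyms(fmt=None):
        word = request.args.get("w", "")
        synonyms = ["flexible", "responsive"]   # placeholder lookup result
        accept = request.headers.get("Accept", "")
        if fmt == "xml" or (fmt is None and "application/xml" in accept):
            return Response(to_xml(word, synonyms), mimetype="application/xml")
        return jsonify({"word": word, "synonyms": synonyms})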

As an API provider, conforming to a “pragmatic REST” approach means you will deliver an API that is immediately familiar to developers regardless of the platform they use to submit requests. That means you have a better chance to establish a relationship with those developers, and a better chance to deepen that relationship.

That’s why it’s so important to get the basic things right.

Azure Synonyms Service – How NOT to do REST.

Recently, I looked on the Azure data market place (or whatever it’s called) to see what sort of data services are available there. I didn’t find anything super compelling. There were a few premium, for-fee services that sounded potentially interesting but nothing that I felt like spending money on before I could try things out.

As I was perusing, I found a synonyms service. Nice, but this is nothing earth-shaking. There are already a number of viable, programmable synonyms services out there. Surely Thesaurus.com has one. I think Wolfram Alpha has one. Wordnik has one. BigHugeLabs has one that I integrated with emacs. But let’s look a little closer.

Let me show you the URL structure for the “Synonyms” service available (as “Community Technical Preview”!) on Azure.


https://api.datamarket.azure.com/Bing/Synonyms/GetSynonyms?Query=%27idiotic%27

Oh, Azure Synonyms API, how do I NOT love thee? Let me count the ways…

  1. There’s no version number. What if the API gets revised? Rookie mistake.
  2. GetSynonyms? Why put a verb in the URI path, when the HTTP verb “GET” is already implied by the request? Useless redundancy. If I call GET on a URI path with the word “Synonyms” in it, then surely I am trying to get synonyms, no?
  3. Why is the word Bing in there at all?
  4. Notice that the word to get synonyms of must be passed in a query param named "Query". Why "Query"? Why not "word" or "term", or something that vaguely corresponds to the actual thing we're trying to do here? And why pass it as a query param at all? Why not simply as part of the URL path?
  5. Also notice that the word must be enclosed in quotes, which themselves must be URL-encoded. That seems like an awkward design.
  6. What you cannot see in that URL is the authentication required. Azure says the authentication is "HTTP Basic Auth," which means you pass a username-and-password pair, joined by a colon and then base64-encoded, as an HTTP header. But… there is no username or password. Bing/Azure/Microsoft gives you an API key, not a username, and there's no password. So you need to double the API key, base64-encode *that*, and pretend it's HTTP Basic Auth. (A sketch of this contortion follows this list.)
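
Here is roughly what that looks like from the caller's side, with a made-up key. This follows the scheme described above – the key doubled into the username and password slots, then base64-encoded:

    import base64
    import urllib.request

    api_key = "0011EEBB4477"   # made-up key, for illustration

    # No real username or password exists, so the key plays both roles:
    # "key:key", base64-encoded, masquerading as HTTP Basic Auth.
    token = base64.b64encode(f"{api_key}:{api_key}".encode("ascii")).decode("ascii")

    url = ("https://api.datamarket.azure.com/Bing/Synonyms/GetSynonyms"
           "?Query=%27idiotic%27")
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))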

If you aren't persuaded that the above are evidence of poor API design, consider heading over to the API Craft discussion group on Google Groups to talk it over.

Alternatively, or in addition, spend some time reading "the REST Manifesto" – Roy Fielding's doctoral dissertation – specifically chapter 5. It's about 18 printed pages, so not too big a commitment.

The problem with releasing a poorly-designed API is that it can do long-term damage.
As soon as a real developer takes a look at your service, he will not walk, he'll RUN away to an alternative service. If your API is a pain to use, or is poorly designed, you are guaranteed to drive developers elsewhere, and they won't come back. They might show up to poke around, but when they see a bad service, like this Synonyms service, they will flee, never to return. They will quickly conclude that you just don't get it, and who could blame them?

So learn from Azure’s mistakes, and learn from the success of others. Take the time to get it right.

And now a word from my sponsor: Apigee offers a Rapid API Workshop service where we can send in experts to collaborate with your team on API design principles and practice. Contact us at sales@Apigee.com for more information.

Google’s Compute Engine: do you believe it?

Google has become the latest company to offer VM hosting, joining Microsoft (Azure) and Amazon (AWS), along with all the other “traditional” hosters.

Bloomberg is expressing skepticism that Google will stick with this plan. Who can blame them? If I were a startup, or another company deciding on a VM hoster, I'd wonder: does Google really want to make money in this space, or is it just trying to take mindshare away from Amazon and Microsoft?

Google still makes 97% of its revenue and a similar proportion of its profit from advertising. Does cloud computing even matter to them? You might say that Amazon is similar: the company gets most of its revenue from retail operations. On the other hand, Jeff Bezos has repeatedly said he is investing in cloud compute infrastructure for the long haul, and his actions speak much louder than those words. Clearly Amazon is driving the disruption. Microsoft for its part is serious about cloud because competing IaaS threatens its existing core business. Microsoft needs to do well in cloud.

As for Google – Do they even care whether they do well with their IaaS offering?

Bloomberg's analysis resonates with me. Google has sprinkled its magic pixie dust on many vanity projects: phone OS, tablets, blogging, Picasa, web browsers, social networking, Go (the programming language). How about SketchUp? But it really doesn't matter whether any of those projects succeed. All of them added together are still irrelevant in the shadow of Google's ad revenue. The executive management at Google knows this, and acts accordingly.

Would you bet on a horse that no one cares about?


Enderle on Microsoft’s New Tack

Rob Enderle demonstrates his fondness for dramatic headlines with his piece, The Death and Rebirth of Microsoft.  A more conservative editor might headline the same piece, “Microsoft Steadily Shifts its Strategy.”

Last week, Microsoft (Nasdaq: MSFT) effectively ended the model that created it. This shouldn’t have been a surprise, as the model hasn’t been working well for years and, as a result, Microsoft has been getting its butt kicked all over the market by Apple (Nasdaq: AAPL).

Well Microsoft apparently has had enough, and it decided to make a fundamental change and go into hardware.

Aside from the hyperbole, Mr Enderle's core insight is correct: Microsoft is breaking free of the constraints of its original, tried-and-true model, the basis of the company for years. Under that plan, Microsoft provided the software and someone else provided the hardware. Surface is different: it's Microsoft hardware, and it signifies a major step toward the company's ability to deliver a more integrated Microsoft experience on thin and mobile devices. This aspect of the Surface announcement was widely analyzed.

This is what you may not have noticed: Azure is the analogous step on servers. With Azure, Microsoft can deliver IT infrastructure to mid-market and enterprise companies without depending on OEM partners, or on the ecosystem that surrounds OEM hardware installation – the networking and cabling companies, the storage vendors, the management software vendors, and so on.

Just as Surface means Microsoft is no longer relying upon HP or Acer to manufacture and market cool personal hardware, and the rumored Microsoft handset would mean that Microsoft won’t be beholden to Nokia and HTC, Azure means Microsoft will not need to rely on Dell or HP or IBM to produce and install server hardware.

That is a big change for a company that was built on a strategy of partnering with hardware vendors. But times are different now. Microsoft is no longer purely a software company. In fact it is outgrowing its name, just as "International Business Machines" has lost its meaning as a name for a company that brings in 57% of its revenue through services. But while this is a big step, it's not a black-and-white thing. Microsoft maintains relationships with OEMs for PCs, laptops, mobile devices and servers, and that will continue. Surface and Azure are just one step away from the purity of that model.

Microsoft's Azure, and Amazon's AWS too, present the opportunity for companies to avoid huge chunks of the capital cost associated with IT projects; companies can pay a reasonable monthly fee for service, rather than making a big upfront investment and contracting with 4 or 5 different vendors for installation. That's a big change.

Very enticing for a startup, or a small SaaS company.

Mark Russinovich #TechEd on Windows Azure VM hosting and Virtual Networking

A video of his one-hour presentation with slides + demo.

Originally Windows Azure was a Platform-as-a-Service offering; a cloud-hosted platform. This was a new platform, something like Windows Server, but not Windows Server. There was a new application model, a new set of constraints. With the recent announcement, Microsoft has committed to running arbitrary VMs. This is a big shift towards what people in the industry call Infrastructure-as-a-Service.

Russinovich said this with a straight face:

One of the things that we quickly realized as people started to use the [Azure] platform is that they had lots of existing apps and components that they wanted to bring onto the platform…

It sure seems to me that Russinovich has put some spin into that statement. It's not the case that Microsoft "realized" customers would want VM hosting. Microsoft knew very well that customers, enterprises in particular, would have confidence in a Microsoft OS hosting service, and would want to evaluate such an offering as a re-deployment target for existing systems.

This would obviously be disruptive, both to Microsoft partners (specifically hosting companies) and to Microsoft's existing software licensing business. It's not that Microsoft "realized" people would want to host arbitrary VMs; it knew that all along, but delayed the offering to allow time for its partners and its own businesses to catch up.

Aside from that rotational verbiage, Russinovich gives a good overview of some of the new VM features, how they work and how to exploit them.

Azure gets a well-deserved REST

In case you had any doubts about ProgrammableWeb's data showing REST dominating other web API protocols: Microsoft, one of the original authors of SOAP, is fully embracing REST as the strategic web protocol for administering and managing Windows Azure services.

From Gigaom:

The new REST API that controls the entire system is completely rewritten, sources said. "Prior to this release, the Azure APIs were inconsistent. There was no standard way for developers to integrate their stuff in. That all changes now," said one source who has been working with the API for some time and is impressed.

If you had 2 hours to spend learning stuff about web API protocols, spend 3 minutes understanding SOAP, and the balance on REST.


Microsoft’s Meet Windows Azure event

Thursday last week, Microsoft launched some new pieces to its Azure cloud-based platform.

The highlights in order (my opinion):

  1. Virtual Machine hosting. Since 2010, Microsoft has tried to differentiate its cloud offerings from Amazon's EC2 by providing "platform services" instead of infrastructure services (OS hosting). But, presumably in response to customer demand, it will now offer the ability to host arbitrary Virtual Machines, including Windows Server of course, but also Linux VMs of various flavors (read the Fact Sheet for details). This means you will now be able to use Microsoft as a hoster, in lieu of Rackspace or Amazon, for arbitrary workloads. MS will still offer the higher-level platform services, but you won't need to adopt those services in order to get value out of Azure.
  2. VPN – you can connect those hosted machines to your corporate network via a VPN. It will be as if the machines are right down the hall.
  3. Websites – Microsoft will deliver better support for the most commonly deployed workload. Previously, websites were supported through a convoluted path, in order to comply with the Azure application model (described in some detail in this 2009 paper from David Chappell). With the announced changes it will be much simpler. Of course there's support for ASP.NET, but also Python, PHP, Java and node.js.

As with its entry into any new venture, Microsoft has been somewhat constrained by its existing partners. Steve Ballmer couldn’t jump with both feet into cloud-based platforms because many existing MS partners were a.) dependent upon the traditional, on-premises model of software delivery, or b.) in the cloud/hosting business themselves.

In either case, they'd perceive a shift by MS toward cloud as a threat, or at the very least as disruptive. Microsoft itself, moreover, was highly oriented toward on-premises software licensing. So it makes sense that MS was initially conservative in its cloud push. With these moves you can see MS steadily increasing the pressure on its own businesses and its partners to move with it into the cloud model. And this is inevitable for Microsoft, as Amazon continues to gain enterprise credibility with EC2 and its related AWS offerings.


The upshot for consumers of IT is that price pressure on cloud platforms will continue downward. Also look for broader support of cloud systems by tools vendors, which means cloud-based platforms will become mainstream more quickly, even for conservative IT shops.


Does Metcalfe’s Law apply to Cloud Platforms and Big Data?

Cloud Platforms and Big Data – Have we reached a tipping point?  To think about this, I want to take a look back in history.

Metcalfe's Law was named for Robert Metcalfe, one of the true internet pioneers, by George Gilder, in an article that appeared in a 1993 issue of Forbes magazine. It states that the value of a network increases with the square of the number of nodes. It was named in the spirit of "Moore's Law" – the popular aphorism, attributed to Gordon Moore, stating that the density of transistors on a chip roughly doubles every 18 months. Moore's Law succinctly captured why computers grew more powerful by the day.
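
In symbols, Metcalfe's claim is that the value V of a network of n nodes grows with the number of possible pairwise connections (the proportionality constant doesn't matter for the argument):

    V(n) \propto \binom{n}{2} = \frac{n(n-1)}{2} = O(n^2)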

With the success of "Moore's Law", people looked for other "laws" to guide their thinking about a technology industry that seemed to grow exponentially and evolve chaotically, and "Metcalfe's Law" was one of them. That these "laws" were not laws at all, but merely arguments, predictions, and opinions, was easily forgotten. People grabbed hold of them.

Generalizing a Specific Argument

Gilder's full name for the "law" was "Metcalfe's Law of the Telecosm", and in naming it, he was thinking specifically of the competition between telecommunications network standards: ATM (Asynchronous Transfer Mode) and Ethernet. Many people were convinced that ATM would eventually "win", because of its superior switching performance for applications like voice, video, and data. Gilder did not agree. He thought Ethernet would win, because of the massive momentum behind it.

Gilder was right about that, and for the right reasons. And so Metcalfe's Law was right! Since then, though, people have argued that Metcalfe's Law applies equally well to any network: a network of business partners, a network of retail stores, a network of television broadcast affiliates, a "network" of tools and partners surrounding a platform. But generalizing Gilder's specific argument this way is sloppy.

A 2006 article in IEEE Spectrum on Metcalfe's Law says flatly that the law is "wrong," and explains why: not all "connections" in a network contribute equally to the value of the network. Think of Twitter – most subscribers publish very little information, and to very limited circles of friends and family. Twitter is valuable, and it grows in value as more people sign up, but Metcalfe's Law does not provide the metaphor for valuing it. Or think of a telephone network: most people spend most of their phone time with around 10 people. Adding more people to that network does not increase its value for those users, and does not cause revenue to rise according to the O(n²) metric implicit in Metcalfe's Law.

Clearly the direction of the "law" is correct – as a network grows, its value grows faster than its size. We all feel that to be true, and so we latch on to Gilder's aphorism as a quick way to describe it. But just as clearly, the law is wrong in general.

Alternative “Laws” also Fail

The IEEE article tries to offer other valuation formulae, suggesting that the true value is not O(n²) but instead O(n·log n), and specifically suggests this as a basis for valuation of markets, companies, and startups.
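
To get a feel for how far apart those two formulas drift as a network grows, here is a quick back-of-the-envelope comparison. The absolute values are meaningless; only the growth ratio matters:

    import math

    # Compare the two proposed valuation metrics at a few network sizes.
    for n in (1_000, 1_000_000, 100_000_000):
        metcalfe = n * (n - 1) / 2   # O(n^2): count of pairwise connections
        ieee_alt = n * math.log(n)   # O(n log n): the IEEE article's alternative
        print(f"n={n:>11,}   n^2-style: {metcalfe:.1e}   "
              f"n*log(n): {ieee_alt:.1e}   ratio: {metcalfe / ieee_alt:.1e}")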

That suggestion is arbitrary. I find the mathematical argument presented in the IEEE article hand-wavy and unpersuasive. The bottom line is that networks are different, and there is no one law – not Metcalfe's, nor Reed's, nor Zipf's as suggested by the authors of that IEEE article – that applies generally to all of them. Metcalfe's Law applied specifically but loosely to the economics of Ethernet, just as Moore's Law applied specifically to transistor density. Moore's Law was never general to any manufacturing process, and Metcalfe's Law is not general to any network.

Sorry, there is no "law"; one needs to understand the economic costs and potential benefits of a network, and the actual conditions in the market, in order to put a value on that network.

Prescription: Practical Analysis

Ethernet enjoyed economic advantages in production cost and in generalized R&D. Ethernet reached an economic tipping point, and beyond that point, factors like the superior switching performance of alternatives were simply not enough to overcome the existing investment in tools, technology, and understanding of Ethernet.

We all need to apply that sort of practical thinking to computing platform technology. For some time now, people have been saying that Cloud Platform technology is the Next Big Thing. There have been some skeptics, notably Larry Ellison, but even he has come around and is investing.

Cloud platforms will "win" over existing on-premises platform options when it makes economic sense for them to do so. In practice, this means: when the tools for building, deploying, and managing systems in the cloud become widely available, and just as good as those already in wide use for on-premises platforms.

Likewise, "Big Data" will win when it is simply better than traditional data analysis for mainstream analysis workloads. Sure, Facebook and Yahoo use MapReduce for analysis, but, news flash: unless you have 100m users, your company is not like Facebook or Yahoo. You do not have the same analysis needs. You might want to analyze lots of data, even terabytes of it. But the big boys are doing petabytes. Chances are, you're not like them.

This is why Microsoft's Azure is so critical to the evolution of cloud offerings. Microsoft brought computing to the masses, and the company understands the network effects of partners, tools providers, and developers. It's true that Amazon has a lead in cloud-hosted platforms, and it's true that even today, startups prefer cloud to on-premises. But EC2 and S3 are still not commonly considered as general options by conservative businesses. Most banks, when revising their loan-processing systems, are not putting EC2 on their short list. Microsoft's work in bringing cloud platforms to the masses will make a huge difference in the marketplace.

I don’t mean to predict that Microsoft will “win” over Amazon in cloud platforms; I mean only to say that Microsoft’s expanded presence in the space will legitimize Cloud and make it much more accessible. Mainstream. It remains to be seen how soon or how strongly Microsoft will push on Big Data, and whether we should expect to see the same effect there.

The Bottom Line

Robert Metcalfe, the internet pioneer, himself apparently went so far as to predict that ATM would "prevail" in the battle of the network standards by 2013. Gilder did not subscribe to that view. He felt that Ethernet would win, and Metcalfe's Law was why. He was right.

But applying Gilder's reasoning blindly makes no sense. Cloud and Big Data will ultimately "win" when they mature as platforms and deliver better economic value than the existing alternatives.


Windows Azure goes SSD

In a previous post I described DynamoDB, the SSD-backed storage service from Amazon, as a sort of half-step toward better scalability.

With the launch of new Azure services from Microsoft, it appears that Microsoft will offer SSD, too. Based on the language used in that report – the new "storage hardware" is thought to include solid state drives (SSDs) – this isn't confirmed, but it sure looks likely.

I haven’t looked at the developer model for Azure to find out if the storage provisioning is done automatically and transparently, as I suggested it should be in my prior post.  I’ll be interested to compare Microsoft’s offering with DynamoDB in that regard.

In any case, notice is now given to magnetic disk drives: do not ask for whom the bell tolls.

What does it mean? Microsoft upping efforts on Infrastructure-as-a-service

Wired is reporting a rumor that Microsoft will launch a new Infrastructure-as-a-Service offering in June, to compete with Amazon EC2.

What Does it Mean?

I have no idea whether the "rumor" is true, or even what it really means. I speculate that the bottom line is that we'll be able to upload arbitrary VHDs to Azure. Right now Microsoft allows people to upload VHDs that run Windows Server 2008; with this change it may support "anything". Because it's a virtual hard drive, and the creator of that drive has full control over what goes into it, an Azure customer would be able to provision VMs in the Microsoft cloud that run any OS, including Linux. This would also be a departure from the stateless model that Windows Azure currently supports for the VM role: VHDs running in the Windows Azure cloud would be able to save local state across stop/restart.

Should we be Surprised?

Is this revolutionary?  Windows Azure already offers compute nodes; it’s beta today but it’s there, and billable.  So there is some degree of Infrastructure-as-a-service capability today.

For my purposes “infrastructure as a service”  implies raw compute and storage, which is something like Amazon’s EC2 and S3. A “platform as a service” walks up the stack a little, and offers some additional facilities for use in applications. This might include application management and monitoring, enhancements to the storage model or service, messaging, access control, and so on. All of those are general-purpose things, usable in a large variety of applications, and we’d say they are “higher level” than storage and compute. In fact those services are built upon the compute+storage baseline.

For generations in the software business, Microsoft has been a major provider of platforms. With its launch in 1990, Windows itself was arguably the first broadly adopted "application platform". Since the early 90's, specialization and evolution have resulted in a proliferation of platforms in the industry – we have client platforms, server platforms (expanding to include the hypervisor), web platforms (IIS+ASP.NET, Apache+PHP), data platforms, mobile platforms, and so on. Beyond app platforms, since Dynamics, Microsoft has also been in the business of offering applications, and it's here we see the fractal nature of the space: the applications can act as platforms for a particular set of extensions. In any case, it's clear that Microsoft has offerings in all those spaces, and more.

Beneath the applications like Dynamics, and beneath the traditional application platforms like Windows + SQL Server + IIS + .NET, Microsoft has continued to deliver the foundational infrastructure, specifically to enable other alternative platforms. Oracle RDBMS and Tomcat running on Windows is a great example of what I mean here. Sure, Microsoft would like to entice customers to adopt the entirety of their higher-level platforms, but the company is willing to make money by supplying lower-level infrastructure for alternative platforms.

Considering that history, the rumor that Microsoft is “upping efforts on infrastructure as a service” should not be surprising.  Microsoft has long provided offerings at many levels of “the stack”.  I expect that customers have clearly told Microsoft they want to run VHDs, just like on EC2, and Microsoft is responding to that.  Not everyone will want this; most people who want this will also want higher-level services.  I still believe strongly in the value of higher-level cloud-based platforms.

Platform differentiation in the Age of Clouds

It used to be that differentiation in server platforms was dominated by the hardware. There were real, though fluctuating and short-lived, performance differences between Sun's SPARC, HP's PA-RISC, IBM's RIOS, and Intel's x86. But for the moment, the industry has settled the hardware question: servers use x64.

With standardized, high-volume servers, the next dominant factor for differentiation was the application programming model. We had a parade of players: CORBA, COM, Java, EJB, J2EE, .NET. More recently we have PHP, node.js, Ruby, and Python. The competition in that space has not settled on a single, decisive winner, and in my judgment, it is not likely to. Multiple viable options will remain, and the options that enjoy relatively more success share some common attributes: ease of programming (e.g., building an ASP.NET service or a PHP page) is favored over raw performance (building an ISAPI extension or an Apache module in C/C++), and flexibility of the model (JSP/Tomcat/RESTlets) is favored over more heavily prescriptive metaphors (J2EE). I expect the many options in the server platform space to persist; the low cost of developing and extending these platform alternatives means there is no natural economic pressure toward convergence, as there was in server hardware, where R&D costs are relatively high.

Every option in the space will offer its own unique combination of strengths, and enterprises will choose among them. One company might prefer strong support for running REST services, while another might prefer the application metaphor  of Ruby on Rails.  Competition will continue.

But programmer-oriented features will not be the key differentiator in the world of cloud-hosted platforms. Instead, I expect operational and deployment issues to dominate:

  • How reliable is the service?
  • How difficult is it to provision a new batch of servers?
  • How flexible is the hosting model? Sometimes I want raw VMs; sometimes I want higher-level abstractions. I might want to manage a "farm" of servers at a time, or better yet, manage my application without regard for how many VMs back it.
  • How extensive are the complementary services – access control, messaging, data, analysis, and so on?
  • What kind of operational data do I get out of that farm of servers? I want to see usage statistics and patterns of user activity.

It won’t be ease of development that wins the day.

Amazon has been very disruptive with AWS, and Microsoft is warming to the competition. This is all good news for the industry. It means more choices, better options, and lower costs, all of which promote innovation.