APIs within the Enterprise – a Webinar

Recently I did a web chat with colleague Greg Brail discussing the use of APIs in the Enterprise.

Quick summary: SOA has been used with success within enterprises to interconnect systems. APIs address a different set of problems, and there is real value to be gained by using APIs to interconnect systems within the enterprise, as well as to provide external or partner access into enterprise systems.

Preflight CORS check in PHP

I was reading up on CORS today; apparently my previous understanding of it was flawed.

Found a worthwhile article by Remy. Also found a problem in the PHP code he offered in that article: server-side code shown to illustrate how to handle a CORS preflight request.

The “preflight” is an HTTP OPTIONS request that the user-agent makes in some cases (for non-simple requests), to check that the server is prepared to serve a request from XMLHttpRequest. The preflight request carries the special Origin HTTP header, along with Access-Control-Request-Method.
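To make that concrete, a preflight exchange might look roughly like this (the hostnames and paths here are just for illustration):

OPTIONS /api/data HTTP/1.1
Host: api.example.com
Origin: http://app.example.org
Access-Control-Request-Method: GET
Access-Control-Request-Headers: X-Requested-With

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://app.example.org
Access-Control-Allow-Headers: X-Requested-With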

His suggested code to handle the preflight was:

// respond to preflights
if ($_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
  // return only the headers and not the content
  // only allow CORS if we're doing a GET - i.e. no saving for now.
  if (isset($_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD']) &&
      $_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD'] == 'GET') {
    header('Access-Control-Allow-Origin: *');
    header('Access-Control-Allow-Headers: X-Requested-With');
  }
  exit;
}

But according to my reading of the CORS spec, the Access-Control-Xxx-Xxx headers should not be included in a response if the request does not include the Origin header.

See section 6.2 of the CORS doc.

The corrected code is something like this:

// respond to preflights
if ($_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
  // return only the headers and not the content
  // only allow CORS if we're doing a GET - i.e. no saving for now.
  if (isset($_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD']) &&
      $_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD'] == 'GET' &&
      isset($_SERVER['HTTP_ORIGIN']) &&
      is_approved($_SERVER['HTTP_ORIGIN'])) {
    header('Access-Control-Allow-Origin: *');
    header('Access-Control-Allow-Headers: X-Requested-With');
  }
  exit;
}

Implementing the is_approved() method is left as an exercise for the reader!
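For instance, a minimal sketch of is_approved() (with a purely hypothetical whitelist) might just check the origin against a list of known sites:

// hypothetical whitelist of origins permitted to make CORS requests
function is_approved($origin) {
  $whitelist = array(
    'http://app.example.org',
    'https://partner.example.com'
  );
  return in_array($origin, $whitelist, true);
}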

A more general approach is to do as this article on HTML5 security suggests: perform a lookup in a table on the value passed in the Origin header. The lookup can be generalized so that it responds with different Access-Control-Xxx-Xxx headers when the preflight comes from different origins, and for different resources. That might look like this:

// respond to preflights
if ($_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
  // return only the headers and not the content
  // only allow CORS if we're doing a GET - i.e. no saving for now.
  if (isset($_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD']) &&
      $_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD'] == 'GET' &&
      isset($_SERVER['HTTP_ORIGIN']) &&
      is_approved($_SERVER['HTTP_ORIGIN'])) {
    $allowedOrigin = $_SERVER['HTTP_ORIGIN'];
    $allowedHeaders = get_allowed_headers($allowedOrigin);
    header('Access-Control-Allow-Methods: GET, POST, OPTIONS'); //...
    header('Access-Control-Allow-Origin: ' . $allowedOrigin);
    header('Access-Control-Allow-Headers: ' . $allowedHeaders);
    header('Access-Control-Max-Age: 3600');
  }
  exit;
}
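A sketch of get_allowed_headers(), again with hypothetical origins and header lists, might consult a per-origin table (is_approved() could check the same table):

// hypothetical per-origin table; each entry lists the request headers
// that origin is allowed to send
function get_allowed_headers($origin) {
  $table = array(
    'http://app.example.org' => 'X-Requested-With',
    'https://partner.example.com' => 'X-Requested-With, Authorization'
  );
  return isset($table[$origin]) ? $table[$origin] : '';
}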


Yes, it’s trite, but we really are at an Inflection Point

It may sound like a platitude, but…the industry is now in the midst of an inflection point.

Behind us is the technology of client-server, with web goo glommed-on:

  • UI rendered to fixed computers, implemented using HTML(5) and Javascript.
  • Application logic built in Servlet/JSP, PHP, or ASP.NET.
  • Relational databases as a store. Data is accessed via datastore-specific protocols.

Ahead are pure web technologies:

  • UI rendered to mobile computers, and optimized for device capability. Android, iPhone, iPad, and Windows 8 are the key options, but more will emerge. The Kindle, Xbox, and PS3 are the up-and-comers. The HTML-based web-browser UI will remain as a least-common denominator for some time, but there’s a steady trend away from it.
  • Application logic built in dynamic languages: Ruby-on-Rails, PHP, Python. Javascript was the first web app server language (Netscape LiveWire in 1995 and ASP Classic in 1996), and it is now back, with Node.js.
  • Data stores using NoSQL databases with massive scaleout. Data is accessed over HTTP, via REST.

Remember when “Scale” meant a really large box with lots of CPUs in it? We’ve moved to farms of managed computers that accomplish the same thing. Rather than depending on the hardware design to support the scale-out, we’ve now done it in software. Rather than relying on the CPU front-side bus to move data around, we’re depending on 40Gbps or even 100Gbps Ethernet, and software-based, data-dependent prioritization and routing.

The force behind the economy of scale of standard high-volume components has not abated. If you wanted to build a superfast computer for one moment in time you might resort to some custom hardware. But the pace of evolution and improvement in CPU, memory, storage, and networking is such that the value of any dedicated hardware declines rapidly, even during design. It makes no economic sense to pursue the scale-up course. Designs need to accommodate evolution in the ecosystem. Just as the “Integrated” vendor-specific computers of the late 80’s gave way to “open systems”, the integrated single computer model is giving way to the “farm of resources” model.

This is all obvious, and has been for some time. Companies like Google were ahead of the curve, and dragged the rest of the industry with them, but architectures based on the idea that “the datacenter is the computer” are now available at low cost to just about everyone. These architectures have elastic compute, network, and storage, along with the software architecture to exploit it. The upshot is you can just add resources and you get additional, usable performance. Unlike the old “scale up” machines, this approach is not limited to 16 CPUs or 64 or 1024. Just keep going. People call it “cloud technology”, but the main point is elasticity.

The inflection point I spoke about is not defined by a particular month, like November 2012, or even a particular year. But over the past 6 years, this transition has been slowly, inexorably proceeding.

The one missing piece of the puzzle has been management skills and tools. The gear was there, and the software has continued to improve to exploit the gear, but people were initially not comfortable with managing it. That discomfort is dissipating over time, as people embrace the cloud. We’re realizing that we no longer need to perform {daily,weekly} backups because the data is already stored redundantly in Cassandra.

Even as cloud technology gets democratized, drops in price, and becomes more manageable, the capability of a single high-volume server computer continues to ramp upward on a log scale. This means that the old “automation” tasks, tracking orders, ERP systems (whether custom or not)… will be fulfilled by single machines, with optional redundancy.

Cloud technology therefore presents a couple of opportunities:

  • For technology conservatives, where IT is a cost center, the maturation of cloud tech drops the cost of deploying new systems, and of handling peak load. A company can purchase short-term options for compute to handle the proverbial “black friday” or “Victoria’s Secret Fashion show” load. This opportunity is somewhat conflated with the ongoing drop in the cost of technology. Basically, the cost is dropping, and it is dropping even more if you let someone else host your servers.
  • For companies that view technology as a business enabler, cloud tech allows them to pursue innovative new approaches for relatively low cost and risk. New partner-enabling initiatives; new channels; new markets or new approaches to the markets they already play in.

Without a doubt, the big payoffs come from the latter, expansive approach. You can’t grow by cutting costs. You *can* grow by increasing speed or efficiency – let’s say, dropping your turn-time on commercial loan approvals from 52 days to 22 days – but the big growth is in entirely new approaches.

But you don’t know what you don’t know. To uncover and develop the opportunities, companies need to dive in. They need to be pushing beyond their normal competencies, learning new things.

MongoDB booster would prefer Cassandra, if only he could store JSON in it. Have I got a data store for you!

Interesting article at GigaOM interviewing MongoLab Founder and CEO Will Shulman. GigaOM reports:

MongoLab operates under a thesis that MongoDB is pulling away as the world’s most-popular NoSQL database not because it scales the best — it does scale, Shulman said, but he’d actually choose Cassandra if he just needed a multi-petabyte data store without much concern over queries or data structure — but because web developers are moving away from the relational format to an object-oriented format.

Interesting comment. My spin alarm went off with the fuzz-heavy phrasing “…operates under a thesis…” I’ll buy that developers are moving away from relational and towards simpler data storage formats that are easier to use from dynamic scripting languages. But there is no evidence presented in support of the conclusion that “MongoDB is pulling away.” GigaOM just says that this is MongoLab’s “thesis”.

In any case, Shulman’s opinion that Cassandra scales much better than MongoDB leads to this question: If the key to developer adoption is providing the right data structures, then why not just build the easy-to-adopt object store on the existing, proven-to-scale backend? Why build another backend if that problem has been solved by Cassandra?

Choosing to avoid this question, the creators of MongoDB have only caused people to ask it more insistently.

The combination of developer-friendly data structure and highly-scalable backend store has been done. You can get the scale of Cassandra and the ease of use of a JSON-native object store. The technology is called App Services, and it’s available from Apigee.

In fact, App Services even offers a network interface that is wire-compatible with existing MongoDB clients (somebody tell Shulman); you can keep your existing client code and just point it to App Services.
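To illustrate the idea with the legacy PHP mongo driver (the hostname, database, and collection names below are purely hypothetical), the only thing that changes in existing client code is the connection string:

// existing MongoDB client code, using the legacy PHP MongoClient class;
// only the connection string changes (the hostname here is hypothetical)
$m = new MongoClient('mongodb://app-services.example.com:27017');
$books = $m->selectDB('myapp')->selectCollection('books');
$books->insert(array('title' => 'The Old Man and the Sea'));
foreach ($books->find(array('title' => 'The Old Man and the Sea')) as $doc) {
  print_r($doc);
}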

With that you can get the nice data structure and the vast scalability.

Thank you, Ed Anuff.

Why is GIMP still so crappy?

Yes, it’s the question on everybody’s mind: Why does Gimp suck so bad?

In the old days I used Windows almost exclusively. I had a nice Windows machine set up; it worked for me. I used cmd.exe and Powershell and WSH for scripting. I used Outlook and Word and Powerpoint for office documents.

I used freeware and open source stuff for some things: I used emacs for editing files. DotNetZip for manipulating ZIP files. ReloadIt for reloading web pages automatically as I saved files. Cropper for capturing screenshots and posting them to cloud photo share services. Lots of other tools. One notable tool: Paint.NET for manipulating images.

It worked. It all worked!

I have since moved to a Mac, not because I didn’t like Windows, but because everyone around me in my new job uses Mac. Being different just means being left out and being unable to share stuff with people. So I converted to Mac.

I am thankful to still have emacs. Obviously I can no longer use WSH and Javascript for scripting basic stuff, but I do have Node.js, which is just fine. (I don’t miss Powershell. Truth be told I never did fully realize the benefits of the object pipeline. It sounded good in theory but it was too darned hard to figure everything out. I use bash for shell scripting now, and it feels simpler to me.)

And now, when I want to manipulate image files, I often slip up and try to use Gimp. I try. Yoda might say, after trying Gimp, “With Gimp there is no do. There is only try, and do not.” Generally I give up before accomplishing my goal, which is usually really really simple, something like “remove part of this image and replace it with white fill.” Gimp sucks. I have never opened Gimp and tried to use it without swearing. That I keep opening it is a testament to my ongoing descent into lunacy.

The UI is so infuriating; at the very start it opens windows on the Mac which obscure other applications. When I activate the other applications the GIMP windows stay on top. Why? Because that’s the most infuriating thing it could possibly do, that’s why.

When I highlight part of an image and press Ctrl-C to copy that portion, I get – and I’m not making this up – everything EXCEPT the part I highlighted. You would think that would be a simple thing to correct, right? There’s even a “Select” menu with an “Invert” menu item in GIMP. You’d think if a COPY action was copying everything EXCEPT the thing I wanted, then inverting the selection and retrying the COPY action would do what I wanted. But no. Why? Because it’s the most infuriating thing possible.

GIMP always does the most infuriating, frustrating, and ridiculous thing possible.

Handily, when you open an image file that is kinda small, GIMP opens the window beneath one of its own “floats above all other windows” windows. So you can’t see the thing, and you need to move windows around to try to find it. Why? You know why.

There are people who say, “I’ve been using Gimp for 2 years. Sure, at first it’s a bit hard to learn, but after that it’s awesome.” That’s stupid. Absolutely idiotic. Software shouldn’t be this hard, sorry. Just because you invested your valuable time in compensating for a software designer’s madness does not mean the software is good. It means you don’t value your own time as much as you should.

For an example of an image manipulation app that works, is easy for novices to pick up but also supports advanced features, look at Paint.NET. Only available on Windows! For an example of how to create endless user frustration, try Gimp.

Are DDoS attacks a novel threat to API servers? Nope.

Mark O’Neill, CTO at Vordel, published a post on ProgrammableWeb regarding DDoS attacks and the implications for APIs.

For those who learned programming before Friends became a hot TV show, the term of art “application programming interface” referred to the function names and signatures that you’d link your program to. These days, the term API refers most often to a Web API, in other words a network interface, often a REST-based one. One program sends another program an HTTP request, and gets a reply of a given form in response.

I think O’Neill made things sound waaaay more dramatic than they actually are. The term that he used – “Soft underbelly” – was intended to imply that APIs represent a special vulnerability on the Web. That’s simply not accurate. API interfaces are just a “regular underbelly”, to coin a phrase; JSON access is just like HTML access. DDoS is a risk, and it can affect JSON servers and HTML servers alike. O’Neill doesn’t provide any specific advice on why API servers are different, or what special steps need to be taken to protect API resources.

He does make some reasonable points: (a) that API access was given short shrift in the original reports; (b) that APIs are likely to rise in importance as the usage of mobile apps grows; and (c) that hosting APIs separately from www traffic (on api.mybank.com vs www.mybank.com) might have mitigated problems.

But API management platforms, such as the one sold by O’Neill’s company, are not likely to be effective against any non-naive DDoS. In fact, the existing DDoS mitigation techniques, using network devices, are all we need to protect APIs. “Nothing to see here, move along.”

I understand that hype will attract attention to the post and to O’Neill’s company. On balance though, I think he’s doing more of a disservice to APIs by exaggerating or even mischaracterizing the risks.


Reference: Intro to Distributed Denial of Service attacks

Disclaimer: I work for Apigee, which is a purveyor of API Management solutions. These opinions are my own.