Naïve Data analysis leads to incorrect conclusions – WWII Bomber Plane edition

Raid by the 8th Air Force

Here’s a good story showing how focusing only on the data in front of us can lead to incorrect conclusions.

The summary is this: In World War II, Allied bomber command undertook an analysis effort to determine how best to reinforce bomber planes to protect them from German anti-aircraft guns. They studied the bullet-hole patterns in planes after they returned from missions, and the first impression was to reinforce the planes where the bullet holes appeared most often.

But a smart statistician named Abraham Wald pointed out that if the planes were returning with those bullet holes, then the hits that caused those holes were not critical. The planes that didn’t return had likely sustained the hits that demanded mitigation.

Are we talking about Bomber Planes or Software here?

Focusing on the data we have is a common approach that supports iterative improvement efforts. The lesson from the bomber plane story – now commonly called survivorship bias – is that there may be “invisible data” that needs consideration.

Clayton Christensen World Economic Forum 2013

Mr Christensen looking stern and wise.

This is the central problem explored by Clayton Christensen in his classic book, The Innovator’s Dilemma. What Christensen found was that companies tend to focus on feedback from existing customers while ignoring the needs of prospective customers. And this is dangerous.

We have a project going on right now in which we’re looking at the UI usage data for a particular tool in a web app. The tool has been stable for some time now. It was never very complete, and it hasn’t been actively extended; you could say it was stagnant. We know the reasons for that – we had other priorities, and this web tool just needed to stay in the backlog for a while. Also, there were, and still are, other ways to accomplish what the tool exposes: there are public APIs that expose the underlying function, and people have the option to build tactical apps on those APIs. Now we want to enhance the web tool, and it has recently been instrumented to at least collect data on how people are currently using it. How much should we rely on this usage data?

With the bomber planes, the data they had were the patterns of bullet holes in the returned planes. The data they didn’t have were the patterns of bullet holes in bomber planes that didn’t return.

With the web tool, the data we have will be the UI usage patterns of current users. The data we don’t have will be the usage patterns of people who abandoned the stagnant tool some time ago, and are relying on the underlying APIs and independent tools to accomplish their goals. The former set of users will have a relatively higher proportion of novice users, while the latter will have a higher proportion of experts.

So what’s the right path forward? Should we invest in improvements in the tool, solely based on the data we have? Should we guess at the data we don’t have? In the bomber plane example, the data we don’t have was clear by deduction – those planes were most likely shot down. In the web tool example, the data we don’t have are truly unknown. We cannot deduce what, exactly, people are doing outside the tool.

I think the only way to resolve this is to combine three factors: analysis of the API usage, analysis of the UI usage, and finally, perspective on the behavior of the expert users who have abandoned the tool, perhaps formed by direct observation or by intelligent inference. API analytics, UI analytics, and some intelligence.

Then we’ll need to apply some sort of priority to the different scenarios: do we need to help the experts more with their expert problems, or do we need to invest more in making the simple things even easier for the more novice users? Which is the more important audience, the novices or the experts? Which one is more important to serve at this time in the product’s lifecycle?

Interesting questions! There’s no one right answer for every situation. I know which option I favor in our current situation.

Web vs Native apps, part 12763

From Hacker News recently: another posting, this one by Peter-Paul Koch, on the web-vs-native debate, dramatically entitled “Let’s concede defeat”!

This is the latest installment in a long line of commentary on the topic. In short, the topic is: is it better to build and offer a native, platform-specific app, or to invest in building a web-based user experience? We’ve been having this debate for years, and the discussion just seems to spin round and round on the same points. readwrite.com is also chiming in on the “morass”. Everyone has a viewpoint. Web is better. Native is better. Hybrid is the future.

Lots of drama. A range of viewpoints. Many of the strongest opinions come from people with something to sell: they sell the vision of web as write-once-run-anywhere, or they sell a product that helps produce hybrid apps.

What’s the real story?

In my opinion, it’s not that complicated. Here you go: web apps are super convenient to deploy, update, and access. There’s no app store, and no worrying about whether and when users will update. But they’re not optimal for a crisp, device-optimized experience. There are many web frameworks, and choosing the right one is challenging. Attempting to “emulate” native app behavior can get you into trouble when you try to support N different device platforms.

The native app, on the other hand, is crisp and device-optimized, and users like it. It’s easier to build offline-capable apps, and it works with system notifications. But going native means you’re basically writing many apps, one version for each platform. Also, the way users access the app is less convenient, and there’s no guarantee users will update their apps (although there are mitigations for that).

But most apps will be hybrids, combining some web-app capability with some native capability.

Dzone has a nice table listing decision criteria for the three options. Kinlan has a nice day-in-the-life journal that gives color to the tradeoffs, although looking for a web-based alarm clock to supplant the device-supplied one seems like tilting at windmills. No one needs a web-based alarm clock when every phone has several native apps to choose from. He made that point very effectively, though I don’t think he intended to do so.

We, as an industry, haven’t settled this issue yet. There’s still more talking we need to do, apparently.

Though there’s been much talk, there are still seemingly common perspectives that are just wrong.

  1. from ignorethecode, Native apps mainly benefit Apple and Google, not their users. It’s not in anyone’s interest to be locked into a specific platform, except for the platform’s owner.
    Wrong. This just completely disregards the platform-specific stuff developers can do with accelerometers, GPS, cameras, and screens. Baloney. Yes, you can get access to some of those things, some of the time, in web apps, but not in a reliably cross-platform way.
  2. from dzone, Native apps often cost more to develop and distribute because of the distinct language and tooling ecosystems.
    Wrong. This assumes the main cost is in the licensed tools, like Xcode and a deployment license from Apple. Sheesh. That is the tiniest component of app development and deployment cost. C’mon now. Developers (people) and their time are expensive. Licenses are cheap.

I like what Mr Koch had to say in the quirksmode blog. Basically: different solutions will better suit different sets of requirements – for commerce, news sites, and so on. Pragmatic. Realistic. Helpful.

Regardless whether people choose web apps, native apps, or hybrid apps, the backend is always APIs. Everybody needs APIs! And they need to be simple, easy to consume, well documented, secure, and managed and measured properly.

I don’t see the point in Revoking or Blacklisting JWT

I heard someone asking today for support for Revocation of JWT, and I thought
about it a little, and decided I don’t see the point.

Specifically, I don’t see the point of the process described in this post regarding “Blacklisting JWT in express-jwt”. I believe that it’s possible to blacklist JWT. I just don’t see the point.

Let’s take a step back and look at OAuth

For those unaware, JWT refers to JSON Web Token, a type of token that can be used in APIs. The JWT format is self-describing.

Here’s the key problem tokens address: how does a server decide whether to honor or reject a request? It’s a matter of authorization. OAuthV2 has been proposed and is now being used by the industry as the model or framework for enabling authorization in API-oriented apps. Basically it says, “give apps tokens, then grant access based on the token.”

Often the way things work under the OAuth framework is:

  1. an app running on a mobile phone connects to a token dispensary (a server) to request a token
  2. the server requires the client (== the app) to provide some credentials before generating and dispensing a token. Sometimes the server also requires user authentication before delivering a token. (This is done in the authorization code grant or the password grant.)
  3. the client app then sends this token to a different server to ask for services.
  4. the API server evaluates the token before granting service. Often this requires contacting the original token dispensary to see if the token is good, and to see if the token should be honored for the particular service being requested.

You can see there are three parties in the game: the app, the token dispensary, and the API server.

One handy optimization is to put the API endpoint behind an OAuth-aware proxy server, like Apigee Edge. (Disclaimer: I work for Apigee). The app then contacts Edge for a token (via POST /token). If the credentials are good, Edge generates and stores an opaque token, which looks like n06ztxcf2bRpN42cDwVUNvroGOO6tMdt, and delivers it back to the app. The app then requests service (via GET /service, or whatever), passing the previously obtained token. Edge sees this request, extracts the token within it, evaluates whether the token is good, and either passes the request through to the API endpoint or rejects it based on the token status.
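The mechanics of that exchange are simple enough to sketch. Here's a minimal illustration in Python of what the app assembles at each step; the parameter names follow the standard OAuth2 client-credentials convention, and the helper names are my own, not any particular product's API:

```python
import urllib.parse

def build_token_request_body(client_id, client_secret):
    """Steps 1-2: the form body the app POSTs to the token dispensary (/token)."""
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })

def build_service_headers(token):
    """Steps 3-4: the app presents the opaque token when requesting service."""
    return {"Authorization": "Bearer " + token}
```

The app only carries the token; the proxy (or token dispensary) is the party that decides what it means.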

The key thing: these tokens are opaque. The app doesn’t know what that token is, beyond a string of characters. The app cannot tell what the token is good for, unless it asks the token dispensary, which is the final arbiter. Sometimes when dispensing the token, the token dispensary also delivers metadata about the token, like: expiry, scopes, and other attributes. But that is not required, and not always done. So, bearer tokens are often opaque, and they are opaque by default in Apigee Edge.

And by “Bearer”, we mean… an app that possesses a token is presumed to “own” the token, and should be granted service based on that token alone. In other words, the token is a secret. It’s like cash money – if you lose it, someone else can spend it. But not exactly like cash. An opaque token is more like a promissory note or an IOU; to determine if it’s worth anything you need to go back to the issuing party, to ask “are you willing to pay up on this note?”

How is JWT different?

JWT is a different kind of OAuth token. OAuth is just a framework, and does not stipulate exactly the kind of token that needs to be generated and delivered. One type of token is the opaque bearer kind. JWT is an alternative format. Rather than being an opaque string, JWT is a self-describing format for bearer tokens. Generally, a JWT includes an encoded payload that can be decoded and read by anyone, and that payload contains a bunch of claims. The standard set of claims includes: when the token was generated (“issued at”), who generated it (the “issuer”), the intended audience, the expiry, and other things. JWT can include custom claims, such as “the user is a good person”. But more often the custom claim is: “this user is authorized to invoke /serviceA at endpoint http://example.com”, although this kind of claim is shortened quite a bit and is encoded in JSON, rather than in English.

Optionally accompanying that payload with its claims is a signature, which can be verified by any party possessing the public key used to sign it (or, for an HMAC-signed token, the shared secret). This is what is meant by “self describing”. The self-describing nature of JWT is the opposite of opaque. [JWT can be unsigned, signed, or encrypted. The encryption part is an optional part of the spec.]
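To make “self-describing” concrete: anyone can read the claims in a JWT without any key at all. A minimal sketch in Python (this decodes only; it verifies nothing):

```python
import base64
import json

def decode_jwt_payload(jwt):
    """Read the claims from a JWT's payload segment.
    NOTE: this decodes without verifying the signature."""
    payload_b64 = jwt.split(".")[1]
    # base64url encoding drops the trailing padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

This is exactly why the payload is readable by any party, not just the issuer.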

(Commercial message: I said above that Apigee Edge generates opaque bearer tokens by default. You can also configure Apigee Edge to generate signed JWT.)

Why Self-describing Tokens?

The main benefit of a model that uses self-describing tokens is that the API endpoint need not contact the token dispensary in order to determine if the token is good, not-expired, and if a request bearing such a token ought to be honored. In other words, JWT supports federation. One party issues the token, another party can verify it, without contacting the issuer. Remember, JWT is a bearer model, which means the possessor of the token is presumed to be authorized to get service based on what’s in the token. This is truly like cash money this time, because … when honoring a JWT, the API endpoint need not contact the issuer, just as when accepting a $20 bill, you don’t have to contact the US Treasury to see if the bill is worth $20.
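And to make the federation point concrete: any party holding the key can verify a signed JWT locally. Here's a sketch of HS256 (HMAC-SHA256) signature checking using only the Python standard library; a real system should use a vetted JWT library, but notice that no call to the issuer appears anywhere:

```python
import base64
import hashlib
import hmac

def b64url(data):
    """base64url without padding, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def verify_hs256(jwt, secret):
    """Check an HS256 JWT signature locally -- no contact with the issuer."""
    header_b64, payload_b64, sig_b64 = jwt.split(".")
    signing_input = (header_b64 + "." + payload_b64).encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(expected, sig_b64)
```

That local check is the $20-bill verification: you confirm the bill is genuine without phoning the Treasury.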

So how ’bout Revocation of JWT?

This is a long story and I’m finally getting to the point: If you want JWT with powers to revoke the token, then you abandon the federation benefit.

Making the JWT self-describing means no honoring party needs to contact the issuer. Just verify the signature (verify the $20 bill is real), and then grant service. If you add revocation as a requirement, then the honoring party needs to contact the issuer: “I have a $20 bill with serial number T128-DCQ-2872JKDJ; should I honor it?”

It means a synchronous call across the two parties. Which means federation is effectively broken. You abandon the federation benefit.

The corollary to the above is that you also still incur all the overhead of the JWT handling – the signing and verification. So you get all the costs of JWT and none of the benefits.

If revocation of bearer tokens is important to you, you could do the same thing with an opaque bearer token and eliminate all the fussy signature and validation stuff.

When you’re using an API Proxy server like Apigee Edge for both issuing and verifying+validating tokens, then there is no expensive additional remote call to check the revocation status. But you still lack the federation benefit, and you still incur this signing and verification nonsense.

I think when people ask for the ability to handle JWT with revocation, they don’t really understand what they’re asking.

Using the Drupal Services module for REST access to entities, part 3

What’s Going on Here?

In part 1 and part 2 of this series, I talked about Drupal REST services, and authenticating, and querying data. Be sure to review those before continuing with this post.

This article talks about how to create or update data on Drupal using REST APIs. It will use the same authentication foundation as described in Part 2.

Update All the Things!

What kinds of things can you create or update or delete with the Drupal REST API?

  • users
  • forum topics
  • articles
  • taxonomy categories
  • taxonomy terms
  • comments
  • and so on…

Pretty cool. Also, when creating entities like users, all the normal Drupal hooks will run. So if you programmatically create a new user, and you have a new-user hook that sends out an email, then that hook will run and the newly created user will get an email from Drupal. The API provides a nice way to provision a set of users into Drupal all at one go, rather than asking each individual user to visit the site and self-register.
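As a sketch of that kind of bulk provisioning, here's how the create-user request bodies might be generated from a roster. The helper is hypothetical, and the `field_first_name`/`field_last_name` fields are specific to my site's user schema; yours may differ:

```python
import json

def user_payload(name, mail, password, first, last):
    """Build the JSON body for POST /rest/user.
    The field_* names are site-specific custom fields."""
    return {
        "name": name,
        "mail": mail,
        "pass": password,
        "field_first_name": {"und": [{"value": first}]},
        "field_last_name": {"und": [{"value": last}]},
    }

# One POST per roster row; each create fires the normal new-user hooks.
roster = [("TestUser1", "user1@example.com", "secret123", "Test", "User")]
bodies = [json.dumps(user_payload(*row)) for row in roster]
```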

There are also special REST endpoints for doing things like resetting passwords or resending the welcome email.

So let’s look at some request payloads!

Modify an Existing Article

Request:

curl -i -X PUT \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H X-CSRF-Token:w98sdb9udjiskdjs \
  -H Accept:application/json \
  -H content-type:application/json \
  http://example.com/rest/node/4 \
  -d '{
  "title": "about multiple themes....",
  "body": {
    "und": [{
      "value": "how to demonstrate multiple themes?. ...",
      "summary": "multiple themes?",
      "format": "filtered_html",
      "safe_value": "themes",
      "safe_summary": "themes..."
    }]
  }
}'

Create a Forum Topic

To create a new Forum post (Drupal calls it a Forum topic):

Request:

curl -i -X POST \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H X-CSRF-Token:w98sdb9udjiskdjs \
  -H Accept:application/json \
  -H content-type:application/json \
  http://example.com/rest/node \
  -d '{
    "type": "forum", 
    "title": "test post?", 
    "language": "und",
    "taxonomy_forums": { "und": "1" },
    "body": {
      "und": [{
        "value" : "This is the full text of the forum post",
        "summary": "this is a test1",
        "format": "full_html"
      }]
    }
  }'

This part…

        "taxonomy_forums": { "und": "1" },

…tells which forum to post to. Actually the “parent forum” is a taxonomy term, not a forum container. Nodes carry a taxonomy term on them, to identify which forum they belong to.

If you specify an invalid forum id, you will get this JSON error response:

406 Not Acceptable
...
{
  "form_errors": {
    "taxonomy_forums][und": "An illegal choice has been detected. Please contact the site administrator.",
    "taxonomy_forums": "Select a forum."
  }
}

Here’s another “create forum topic” request, to a different forum:

curl -i -X POST \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H X-CSRF-Token:w98sdb9udjiskdjs \
  -H Accept:application/json \
  -H content-type:application/json \
  http://example.com/rest/node \
  -d '{
    "type": "forum", 
    "title": "test post #2", 
    "language": "und",
    "taxonomy_forums": { "und": "5" },
    "body": {
      "und": [{
        "value" : "This is a test post. please ignore.",
        "summary": "this is a test1",
        "format": "full_html"
      }]
    }
  }'

Notice the alternate forum id in that request, as compared to the prior one:

 "taxonomy_forums": { "und": "5" } 

Determine the available forums and ID numbers

Step 1: query the vocabulary that corresponds to “forums”:

curl -i -X GET  \
 -H accept:application/json \
 -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
 'http://example.com/rest/taxonomy_vocabulary?parameters\[machine_name\]=forums' 

Example Response:

[{
  "vid": "1",
  "name": "Forums",
  "machine_name": "forums",
  "description": "Forum navigation vocabulary",
  "hierarchy": "0",
  "module": "forum",
  "weight": "-10",
  "uri": "http://myserver/rest/taxonomy_vocabulary/1"
}]

The important part is the “vid” – which is the vocabulary ID.

Step 2: Query the terms for that vocabulary. This gives all forum names and IDs.

curl -i -X GET \
 -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
 -H Accept:application/json \
 -H content-type:application/json \
 'http://example.com/rest/taxonomy_term?parameters\[vid\]=1' 

Example response:

Response:

[{
  "tid": "8",
  "vid": "1",
  "name": "Getting Started",
  "description": "",
  "format": null,
  "weight": "0",
  "uuid": "7ff7ce10-0082-46f6-9edd-882410b7c304",
  "depth": 0,
  "parents": ["0"]
}, {
  "tid": "1",
  "vid": "1",
  "name": "General discussion",
  "description": "",
  "format": null,
  "weight": "1",
  "uuid": "dbf914e7-42c2-45f6-b77a-e66a0da72310",
  "depth": 0,
  "parents": ["0"]
}, {
  "tid": "4",
  "vid": "1",
  "name": "Security and Privacy Issues",
  "description": "",
  "format": null,
  "weight": "2",
  "uuid": "7496bfd7-2cb8-4f87-a1e4-f45b1956a01e",
  "depth": 0,
  "parents": ["0"]
}]

The tid in each array element is what you must use in "taxonomy_forums": { "und": "4" } when POSTing a new forum node.

Delete a node

Deleting a node means removing an article, a forum topic (post), a comment, etc.

The request:

curl -i -X DELETE \
 -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
 -H X-CSRF-Token:w98sdb9udjiskdjs \
 -H Accept:application/json \
 http://example.com/rest/node/8

Example response:

  [true]

Weird response, but ok.

By the way, if the cookie and token have timed out, any of these create, update, or delete calls may return this response:

["Access denied for user anonymous"]

There is no explicit notice that the cookie has timed out. The remedy is
to re-authenticate and submit the request again.

Delete a taxonomy term

Deleting a taxonomy term in the taxonomy vocabulary for forums would imply deleting a forum.

curl -i -X DELETE \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H X-CSRF-Token:w98sdb9udjiskdjs \
  -H Accept:application/json \
  http://dev-wagov1.devportal.apigee.com/rest/taxonomy_term/7

Create a taxonomy term

Creating a taxonomy term in the taxonomy vocabulary for forums would imply creating a forum.

curl -i -X POST \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H X-CSRF-Token:w98sdb9udjiskdjs \
  -H Accept:application/json \
  -H content-type:application/json \
  http://dev-wagov1.devportal.apigee.com/rest/taxonomy_term \
  -d '{
    "vid": "1",
    "name": "Another Forum on the site",
    "description": "",
    "format": null,
    "weight": "10"
  }'

The UUID and TID for the forum will be generated for you. Unfortunately, the tid is not returned for you to reference. You need to query to find it, using the name of the forum you just created:

Request:

curl -i -X GET \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H Accept:application/json \
  'http://example.com/rest/taxonomy_term?parameters\[name\]=Another+Forum+on+the+site'

Example Response:

[{
  "tid": "36",
  "vid": "1",
  "name": "Another Forum on the site",
  "description": "",
  "format": null,
  "weight": "10",
  "uuid": "dcbe0118-c160-4556-b0b6-1813241bb851",
  "uri": "http://example.com/rest/taxonomy_term/36"
}]

Make sure you use unique names for these taxonomy terms.

Create a new user

curl -i -X POST \
    -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
    -H X-CSRF-Token:w98sdb9udjiskdjs \
    -H accept:application/json \
    -H content-type:application/json \
    http://example.com/rest/user -d '{
      "name" : "TestUser1",
      "mail" : "Dchiesa+Testuser1@apigee.com",
      "pass": "secret123",
      "timezone": "America/Los_Angeles", 
      "field_first_name": {
          "und": [{ "value": "Dino"}]
      },
      "field_last_name": {
          "und": [{ "value": "Chiesa"}]
      }
   }'

Response:

{"uid":"7","uri":"http://example.com/rest/user/7"}

Resend the welcome email

curl -i -X POST \
    -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
    -H X-CSRF-Token:w98sdb9udjiskdjs \
    -H accept:application/json \
    -H content-type:application/json \
    http://example.com/rest/user/7/resend_welcome_email -d '{}'

Reset a user password

curl -i -X POST \
    -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
    -H X-CSRF-Token:w98sdb9udjiskdjs \
    -H accept:application/json \
    -H content-type:application/json \
    http://example.com/rest/user/7/password_reset -d '{}'

Update a user

This shows how to set the user status to 0, in order to de-activate the user.

curl -i -X PUT \
    -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
    -H X-CSRF-Token:w98sdb9udjiskdjs \
    -H accept:application/json \
    -H content-type:application/json \
    http://example.com/rest/user/6 -d '{
      "status" : "0"
   }'

You could of course update any of the other user attributes as well.


That ought to get you started with creating and updating things in Drupal via the REST Server.

Remember, the basic rules are:

  • pass the cookie for each REST query call
  • Pass the cookie and X-CSRF-Token when doing create, update or
    delete
  • have fun out there!

Good luck. Contact me here if these examples are unclear.

Using the Drupal Services module for REST access to entities, part 2

Be sure to start with Part 1 of this series.

What’s Going on Here?

To recap: I’ve enabled the Services module in Drupal v7, in order to enable REST calls into Drupal, to do things like:

  • list nodes
  • create entities, like nodes, users, taxonomy vocabularies, or taxonomy terms
  • delete or modify same

Clear? The prior post talks about the preparation. This post talks about some of the actual REST calls. Let’s start with Authentication.

Authentication

These are the steps required to make authenticated calls to Drupal via the Services module:

  1. Obtain a CSRF token
  2. Invoke the login API, passing the CSRF token.
  3. Get a Cookie and new token in response – the cookie is of the form {{Session-Name}}={{Session-id}}. Both the session name and id are returned in the json payload as well, along with a new CSRF token.
  4. Pass the cookie and the new token to all future requests
  5. Logout when finished, via POST /user/logout

The Actual Messages

OK, Let’s look at some example messages.

Get a CSRF Token

Request:

curl -i -X POST -H content-type:application/json \
  -H Accept:application/json \
  http://example.com/rest/user/token

The content-type header is required, even though there is no payload sent with the POST.

Response:

HTTP/1.1 200 OK
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
Content-Type: application/json
Etag: "1428629440"
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Fri, 10 Apr 2015 01:30:40 GMT
Vary: Accept
Content-Length: 55
Accept-Ranges: bytes
Date: Fri, 10 Apr 2015 01:30:51 GMT
Connection: keep-alive

{"token":"woalC7A1sRzpnzDhp8_rtWB1YlXBRalWMSODDX1yfUI"}

That’s a token, surely. I haven’t figured out what I need that token for. It’s worth pointing out that you get a new CSRF token when you login; see below. So I don’t do anything with this token, and I never use the call to /rest/user/token.

Login

To do anything interesting, your app needs to login; aka authenticate. After login, your app can invoke regular transactions, using the information returned in that response. Let’s look at the messages.

Request:

curl -i -X POST -H content-type:application/json \
    -H Accept:application/json \
    http://example.com/rest/user/login \
    -d '{ 
     "username" : "YOURUSERNAME",
     "password" : "YOURPASSWORD"
    }'

Response:

HTTP/1.1 200 OK
Content-Type: application/json
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Fri, 10 Apr 2015 01:33:35 GMT
Set-Cookie: SESS02caabc123=ShBy6ue5TTabcdefg; expires=Sun, 03-May-2015 05:06:55 GMT; path=/; domain=.example.com; HttpOnly
...
{
  "sessid": "ShBy6ue5TTabcdefg",
  "session_name": "SESS02caabc123",
  "token": "w98sdb9udjiskdjs",
  "user": {
    "uid": "4",
    "name": "YOURUSERNAME",
    "mail": "YOUREMAIL@example.com",
    "theme": "",
    "signature": "",
    "signature_format": null,
    "created": "1402005877",
    "access": "1426280563",
    "login": 1426280601,
    "status": "1",
    "timezone": null,
    "language": "",
    "picture": "0",
    "data": false,
    "uuid": "3e1e948e-940e-4a05-bd7a-267c6671c11b",
    "roles": {
      "2": "authenticated user",
      "3": "administrator"
    },
    "field_first_name": {
      "und": [{
        "value": "Dino",
        "format": null,
        "safe_value": "Dino"
      }]
    },
    "field_last_name": {
      "und": [{
        "value": "Chiesa",
        "format": null,
        "safe_value": "Chiesa"
      }]
    },
    "metatags": [],
    "rdf_mapping": {
      "rdftype": ["sioc:UserAccount"],
      "name": {
        "predicates": ["foaf:name"]
      },
      "homepage": {
        "predicates": ["foaf:page"],
        "type": "rel"
      }
    }
  }
}

There are a few data items that are of particular interest.

Briefly, in subsequent calls, your app needs to pass back the cookie specified in the Set-Cookie header. BUT, if you’re coding in Javascript or PHP or C# or Java or whatever, you don’t need to deal with managing cookies, because the cookie value is also contained in the JSON payload. The cookie has the form {SESSIONNAME}={SESSIONID}, and those values are provided right in the JSON. With the response shown above, subsequent GET calls need to specify a header like this:

Cookie: SESS02caabc123=ShBy6ue5TTabcdefg

Subsequent PUT, POST, and DELETE calls need to specify the Cookie as well as the CSRF header, like this:

Cookie: SESS02caabc123=ShBy6ue5TTabcdefg
X-CSRF-Token: w98sdb9udjiskdjs

In case it was not obvious: the value of X-CSRF-Token is the value of the “token” property in the JSON response. Also: your values for the session name, session id, and token will be different from the ones shown here. Just sayin.
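Whatever language you're coding in, the header construction boils down to a few lines. A sketch in Python, using the field names from the login response above (`auth_headers` is a hypothetical helper, not part of the Services module):

```python
def auth_headers(login_json, for_write=False):
    """Build request headers from the /user/login JSON response."""
    headers = {
        # Cookie has the form {session_name}={sessid}
        "Cookie": login_json["session_name"] + "=" + login_json["sessid"],
        "Accept": "application/json",
    }
    if for_write:
        # POST, PUT, and DELETE also require the CSRF token
        headers["X-CSRF-Token"] = login_json["token"]
    return headers
```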

Get All Nodes

OK, the first thing to do once authenticated: get all the nodes. Here’s the request to do that:

Request:

curl -i -X GET \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H Accept:application/json \
  http://example.com/rest/node

The response gives up to “pagesize” elements, which defaults to 20 on my system. You can also append a query parameter, for example ?pagesize=30, to increase this. To repeat: you do not need to pass the X-CSRF-Token header for this query. The CSRF token is required for update operations (POST, PUT, DELETE), not for GET.

Here’s the response:

[{
  "nid": "32",
  "vid": "33",
  "type": "wquota3",
  "language": "und",
  "title": "get weather for given WOEID (token)",
  "uid": "4",
  "status": "1",
  "created": "1425419882",
  "changed": "1425419904",
  "comment": "1",
  "promote": "0",
  "sticky": "0",
  "tnid": "0",
  "translate": "0",
  "uuid": "9b0b503d-cdd2-410f-9ba6-421804d25d4e",
  "uri": "http://example.com/rest/node/32"
}, {
  "nid": "33",
  "vid": "34",
  "type": "wquota3",
  "language": "und",
  "title": "get weather for given WOEID (key)",
  "uid": "4",
  "status": "1",
  "created": "1425419882",
  "changed": "1425419904",
  "comment": "1",
  "promote": "0",
  "sticky": "0",
  "tnid": "0",
  "translate": "0",
  "uuid": "56d233fe-91d4-49e5-aace-59f1c19fbb73",
  "uri": "http://example.com/rest/node/33"
}, {
  "nid": "31",
  "vid": "32",
  "type": "cbc",
  "language": "und",
  "title": "Shorten URL",
  "uid": "4",
  "status": "0",
  "created": "1425419757",
  "changed": "1425419757",
  "comment": "1",
  "promote": "0",
  "sticky": "0",
  "tnid": "0",
  "translate": "0",
  "uuid": "8f21a9bc-30e6-4232-adf9-fe705bad6049",
  "uri": "http://example.com/rest/node/31"
}
...
]

This is an array, which some people say should never be returned by a REST resource. (Because what if you wanted to add a property to the response? Where would you put it?) But anyway, it works. You don’t get ALL the nodes, only a page’s worth. Also, you don’t get all the details for each node. But you do get the URL for each node, which is your way to get its full details.

What if you want the next page? According to my reading of the scattered Drupal documentation, these are the query parameters accepted for queries on all entity types:

  • (string) fields – A comma separated list of fields to get.
  • (int) page – The zero-based index of the page to get, defaults to 0.
  • (int) pagesize – Number of records to get per page.
  • (string) sort – Field to sort by.
  • (string) direction – Direction of the sort. ASC or DESC.
  • (array) parameters – Filter parameters array such as parameters[title]=”test”

So, to get the next page, just send the same request with the query parameter page=1 (the index is zero-based, so the first response you got was page 0).
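Those parameters can be tedious to escape by hand (note the backslash-escaped brackets in the curl examples above). A small Python sketch that builds such query URLs; `entity_query_url` is a hypothetical helper of mine, not part of the Services module:

```python
from urllib.parse import urlencode

def entity_query_url(base, **params):
    """Build a Services index URL, flattening nested filters
    into the parameters[field]=value form that Drupal expects."""
    flat = {}
    for key, value in params.items():
        if isinstance(value, dict):
            for field, fv in value.items():
                flat[key + "[" + field + "]"] = fv
        else:
            flat[key] = value
    return base + "?" + urlencode(flat)
```

For example, `entity_query_url("http://example.com/rest/node", page=1, pagesize=30)` requests the second page, and `entity_query_url("http://example.com/rest/taxonomy_term", parameters={"vid": "1"})` reproduces the filtered taxonomy query shown earlier, with the brackets properly percent-encoded.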

Get One Node

This is easy.

Request:

curl -i -X GET \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H Accept:application/json \
  http://example.com/rest/node/75

Response:

HTTP/1.1 200 OK
Content-Type: application/json

...
{
  "vid": "76",
  "uid": "4",
  "title": "Embedding keys securely into the app",
  "log": "",
  "status": "1",
  "comment": "2",
  "promote": "0",
  "sticky": "0",
  "vuuid": "57f3aade-d923-4bb5-8861-1d2c160a9fd5",
  "nid": "75",
  "type": "forum",
  "language": "und",
  "created": "1427332570",
  "changed": "1427332570",
  "tnid": "0",
  "translate": "0",
  "uuid": "026c029d-5a45-4e10-8aec-ac5e9824a5c5",
  "revision_timestamp": "1427332570",
  "revision_uid": "4",
  "taxonomy_forums": {
    "und": [{
      "tid": "89"
    }]
  },
  "body": {
    "und": [{
      "value": "Suppose I have received my key from Healthsparq.  Now I would like to embed that key into the app that I'm producing for the mobile device. How can I do this securely, so that undesirables will not be able to find the keys or sniff the key as I use it?",
      "summary": "",
      "format": "full_html",
      "safe_value": "

Suppose I have received my key from Healthsparq. Now I would like to embed that key into the app that I'm producing for the mobile device. How can I do this securely, so that undesirables will not be able to find the keys or sniff the key as I use it?

\n", "safe_summary": "" }] }, "metatags": [], "rdf_mapping": { "rdftype": ["sioc:Post", "sioct:BoardPost"], "taxonomy_forums": { "predicates": ["sioc:has_container"], "type": "rel" }, "title": { "predicates": ["dc:title"] }, "created": { "predicates": ["dc:date", "dc:created"], "datatype": "xsd:dateTime", "callback": "date_iso8601" }, "changed": { "predicates": ["dc:modified"], "datatype": "xsd:dateTime", "callback": "date_iso8601" }, "body": { "predicates": ["content:encoded"] }, "uid": { "predicates": ["sioc:has_creator"], "type": "rel" }, "name": { "predicates": ["foaf:name"] }, "comment_count": { "predicates": ["sioc:num_replies"], "datatype": "xsd:integer" }, "last_activity": { "predicates": ["sioc:last_activity_date"], "datatype": "xsd:dateTime", "callback": "date_iso8601" } }, "cid": "0", "last_comment_timestamp": "1427332570", "last_comment_name": null, "last_comment_uid": "4", "comment_count": "0", "name": "DChiesa", "picture": "0", "data": null, "forum_tid": "89", "path": "http://example.com/content/embedding-keys-securely-app" }

As you know, in Drupal a node can represent many things. In this case, this node is a forum post. You can see that from the "type": "forum" in the response.

Querying for a specific type of node

Request:

curl -i -X GET \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H Accept:application/json \
  'http://example.com/rest/node?parameters\[type\]=forum'

Request:

curl -i -X GET \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H Accept:application/json \
  'http://example.com/rest/node?parameters\[type\]=faq'

Request:

curl -i -X GET \
  -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
  -H Accept:application/json \
  'http://example.com/rest/node?parameters\[type\]=article'

The shape of the response from each of these is the same as for the non-parameterized query (for all nodes); the list is simply filtered to the requested type. The escaping of the square brackets is necessary only when using curl within bash. If you’re sending this request from an app, you don’t need to backslash-escape the square brackets.
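
If you build the query string in code rather than in bash, a standard URL-encoding routine handles the brackets for you; they travel percent-encoded on the wire. A Python sketch:

```python
from urllib.parse import urlencode

# The square brackets are percent-encoded by the client library and
# decoded by the server, so no backslash escaping is needed here.
query = urlencode({"parameters[type]": "forum"})
print(query)
# parameters%5Btype%5D=forum
```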

Logout

Request:

curl -i -X POST \
    -H content-type:application/json \
    -H Accept:application/json \
    -H Cookie:SESS02caabc123=ShBy6ue5TTabcdefg \
    -H X-csrf-token:xxxx \
    http://example.com/rest/user/logout -d '{}'

Notes: The values of the Cookie header and the X-csrf-token header are obtained from the response to the login call! Also, obviously, don’t call logout until you’re finished making API calls. After the logout call, the Cookie and X-csrf-token become invalid; discard them.

Response:

HTTP/1.1 200 OK
...
[true]

Pretty interesting as a response.

More examples, covering creating things and deleting things, in the next post in this series.

Using the Drupal Services module for REST access to entities, part 1

This is Part 1. See also Part 2 and Part 3.

I’m not an expert on Drupal, but I do have extensive experience designing and using APIs. (I work for Apigee.)

Recently I’ve been working with Drupal v7, and in particular, getting Drupal to expose a REST interface that would allow me to program against it. I want to write apps that read forum posts, write forum posts, read or post pages, create users, and so on.

Drupal is a server that manages entities, right? This function is exposed primarily via a web UI, but that UI is just a detail. Drupal should be able to expose an API that is similarly capable. Should be!

The bazaar is alive and well with Drupal. It seems that regardless of what you want to do with Drupal, there are 13 different ways to do it. Exposing Drupal entities as resources in a RESTful interface is no different. There are numerous modules designed to help with this task, some complementary to each other, some overlapping, and most poorly documented. Every module has multiple versions, and every module works with multiple versions of Drupal. So figuring out the best way to proceed, for a Drupal novice like me, is not easy.

Disclaimer: What follows is what I’ve found. If a Drupal expert reads this and thinks, “Dude, you’re way off!” I am very willing to accept constructive suggestions. I’d like to know the best way to proceed. This is what I tried.

The Services Module

I used the Services module. There are other options; restws is one of them. I didn’t have a firm set of criteria for choosing one over the other, except that I fell into the pit of success more easily with the Services module. It seems to be more popular, and I found more examples for it via Google search.

Services 3.0 is now available. … Note that currently there is no upgrade path for Services 3, and it is not backwards compatible with older implementations of the API. Therefore some existing modules like JSON Server and AMFPHP will not work with it. …

Not that there aren’t problems with it. The lack of backwards compatibility in a programmable interface is a really bad sign (see the quote above). It reflects poor planning on the part of the module’s designers. And then there is the lack of clear documentation for how to do most things.

Setup

The first thing: you need to obtain and activate the Services module. There’s a straightforward guide for doing this. I installed the module, then went to the Admin panel to ensure the REST server was enabled. A screenshot is below.

screenshot-20150317-092348

More Setup

Next, you need to create a REST endpoint. To do so, still logged in as Admin, select Structure > Services. Click Add. Then specify rest, REST, and rest. Another screenshot.

screenshot-20150410-135352

That’s it. Your Drupal server is now exposing REST interfaces. You then need to click on “resources” to enable specific access to things like users, nodes, taxonomy, taxonomy terms, and so on. And you’re all set.

Retrieving Nodes is Easy

Once you have the REST server enabled, getting an index of the nodes in a Drupal system is probably the most basic thing any programmer will want to do. Beyond that: creating a new node (posting a page or article), creating a user, and so on. For the Services module, there is a nice page that gives examples for this sort of basic thing. I’m not really a fan of the layout of that page of documentation; it’s all over the place, covering basic REST principles, describing REST testing tools, and then finally giving samples of messages. Those things seem like they belong on separate, hyperlinked pages. But again, it’s the bazaar, and someone contributed that doc all by himself. If I think it could be better, I’m welcome to edit that page, I guess.

Here’s one example request from that page:

POST http://services.example.com/rest/user/register
    Content-Type: application/json
    {
        "name":"services_user_1",
        "pass":"password",
        "mail":"services_user_1@example.com"
    }

This is something I can understand. Many of the other doc pages give jQuery example code. Ummmm… I don’t write in jQuery. Why not just show the messages that need to be sent to the Drupal server, and let the jQuery people figure out their ajax methods from that?

The basic examples given there are good but you’ll notice there is nothing there about authentication. Nothing that shows how a developer needs to authenticate to Drupal via the Services module. That ought to be another hyperlinked page, no?

Authentication

There are multiple steps involved in authenticating:

  1. Obtain a CSRF token
  2. Invoke the login API, passing the CSRF token.
  3. Get a Cookie and new token in response – the cookie is of the form {{Session-Name}}={{Session-id}}. Both the session name and id are returned in the json payload as well, along with a new CSRF token.
  4. Pass the cookie and the new token to all future requests
  5. Logout when finished, via POST /user/logout
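
The steps above can be sketched in code. This sketch assumes the login response carries the fields session_name, sessid, and token (the names used by Services 3.x); the helper just assembles the two headers that every subsequent request needs:

```python
import json

def auth_headers(login_body):
    """Given the JSON body returned by POST /rest/user/login, build the
    Cookie ({{Session-Name}}={{Session-id}}) and X-CSRF-Token headers."""
    data = json.loads(login_body)
    return {
        "Cookie": "%s=%s" % (data["session_name"], data["sessid"]),
        "X-CSRF-Token": data["token"],
    }

# A made-up login response, for illustration:
body = '{"session_name": "SESS02caabc123", "sessid": "ShBy6ue5TTabcdefg", "token": "xxxx"}'
headers = auth_headers(body)
```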

More detail on all of this in the next post.

Adopting Microservices means speed

“it’s crucial to adopt a four-tier application architecture in which applications are developed and deployed as sets of microservices.”

I love this article from the nginx.com website, courtesy of Hacker News. It’s a very good overview of the “microservices” meme currently rippling through the industry. This is stuff we’ve all known – heck, we’ve been doing SOA for 10+ years – but the new term is helping to organize thinking about why services as a metaphor are important, why services need to be lightweight, why service contracts (APIs!) need to be loose and forward-compatible, and why the development of cooperating services must be done independently.

“It’s becoming increasingly clear that if you keep using development processes and application architectures that worked just fine ten years ago, you simply can’t move fast enough to capture and hold the interest of mobile users who can choose from an ever-growing number of apps.”

Oh yeah! Preach it!

The article discusses Netflix and their adoption of the microservices architecture.

I really respect Netflix as a company that moves quickly and constantly adjusts, seeking optimized architectures to address business problems. All the talking and proselytizing they’re doing around microservices is just the latest reason to really like them. I also really hate Netflix as my children seem to be unable to resist the service for even 15 minutes. Me and Netflix – It’s complicated.

Independent of Netflix and the microservices topic, nginx is also really cool. I found it super easy to configure to accomplish some nifty things in some of my work.

Pretty psyched about Swagger Editor for APIs

I’m pretty excited about the Swagger editor. But to understand why, you first need to know what Swagger is all about.

Let’s take a step back. As of August 2014, total activity on smartphones and tablets accounted for ~60% of digital media time spent in the U.S. This unabated growth in mobile is driving the growth in enabling technologies: tools for developing apps, managing app communications, measuring app and data usage, and analyzing usage and predicting behavior based on it. APIs are a key connective technology: innovative mobile apps use APIs to access data and services from companies like Uber or Twitter, or from government bodies like the State of Washington. APIs provide the linkage.

APIs are not solely about mobile apps. They can be used to connect “any app” to “any service”; indeed, this website uses code running on the server to invoke the Twitter API and display tweets on the right-hand side of this blog. But mobile is the driver. The web is not driving the growth in APIs or in the other enabling technologies, nor is the Internet of Things. In the 2000s it was the web. Tomorrow it will be IoT. Today, it is mobile.

OK, so what is Swagger? Swagger is a way to define and describe APIs: a language for stating exactly what an API offers. The description language is analogous to Interface Definition Languages going back to Sun’s RPC IDL, CORBA IDL, DCE IDL, or SOAP’s WSDL. Many of you reading this won’t recognize any of those names; it doesn’t matter. We don’t use most of those technologies any longer, and more importantly, we don’t use the metaphors those technologies imply: function shipping, remote procedure call, or distributed objects. While we have moved away from the tight coupling of binary protocols and toward technologies like REST, JSON, and XML that enable more loosely coupled interactions, we still recognize that it’s helpful to be able to formally describe programmable interfaces.

OK, so Swagger is, at its heart, a way to describe a RESTful API. Many of you are Java developers and may be familiar with Swagger annotations, which let you mark up JAX-RS server application code and then generate a Swagger definition from the implementation. Something like doxygen. This is cool, but it’s sort of a backwards approach. Getting the description of the API from the implementation is analogous to getting the blueprint for a building by taking pictures of the finished framing. Ideally you’d go in the other direction: first build the design (or blueprint, if you will) of the API, and then generate the implementation. My friend and colleague Marsh Gardiner discussed the design-first approach last year.

This is what Swagger can do. How does one produce a Swagger document? Well, if you’re an old codger like me, you might resort to a text editor like emacs and its yaml-mode to hand-code the YAML. But a much easier approach is to use the Swagger Editor.

The API Description is basically “a model” of the API. And with that model, one can do all sorts of interesting things: generate a client-side library in one of various languages. Generate a server-side implementation stub. Generate a test harness. Generate documentation. In fact the Swagger project has had a doc-gen capability, named swagger-ui, since the early days of the project.
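
To make “a model of the API” concrete, here is a minimal, hypothetical Swagger 2.0 definition. The API, path, and names are invented for illustration:

```yaml
swagger: "2.0"
info:
  title: Weather API        # invented example
  version: "1.0.0"
basePath: /v1
paths:
  /weather/{woeid}:
    get:
      summary: Get current weather for a WOEID
      parameters:
        - name: woeid
          in: path
          required: true
          type: string
      responses:
        "200":
          description: the current weather report
```

From a document like this, the tooling can generate clients, server stubs, tests, and documentation.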

So what’s the upshot? Better tooling around APIs, including the Swagger Editor and Swagger UI, along with an API management layer as provided by Apigee Edge (disclaimer: I work for Apigee!), means that it is easier for companies to expose capabilities as easy-to-consume APIs, and easier for developers to code against those APIs to build compelling experiences that run on mobile devices. So I’m pretty excited about the new tooling, and I am even more excited about the integration we will soon see between these new modeling tools and the existing API management tools already available.

Good stuff!

Loving the simple API Design Guidelines from GoCardless

See here.

I like this for several reasons:

  • I like the simplicity and clarity of the guidelines.
  • I agree with all of their guidelines; nothing feels controversial there. Such as: Use JSON, and pretty print it. Be explicit with error messages. Use plural nouns for containers. Etc.
  • I like the fact that it is open sourced for the world to see, share and fork.

ps: My employer, Apigee, is still looking to hire SEs, and other API geeks.

Evernote’s argument for delivering a REST-less API leaves me unimpressed.

The Evernote API is notable because it is not based on REST. The defense of that decision leaves me unimpressed.

When the world is moving to REST and fully open, usable APIs, why would Evernote go the other way? They ought to have a good reason. Evernote’s VP of Platform Strategy, Seth Hitchings, has something to say about it. According to the article on ProgrammableWeb…

Hitchings concedes that, compared to RESTful APIs, developers have to endure a bit of a learning curve to make use of the SDKs’ core functionality: to create, read, update, search, and delete Evernote content. But then again, according to Hitchings, Evernote is a special needs case.

OK, so it’s more work for the consuming developers. It’s also more work for the company, because they have to support all the various “SDKs,” as they call them. [Evernote delivers libraries for various platforms, including iOS, Android, C#, PHP, JavaScript, and more. They call these things “SDKs,” but they’re really not SDKs. An SDK is a kit: it includes libraries, documentation, example code, tools, and other stuff. When Evernote uses the word “SDK,” they mean “library.”] So… why? Why do it if everyone has to do more work?

Seeking the least compromise to data-transfer performance, Evernote needed a solution that could shuffle large quantities of data with minimal overhead. Despite its superior efficiency over XML, REST still wasn’t good enough.

Whoa. REST has “superior efficiency over XML”? That’s just nonsense. REST is not a data format; REST is an architectural approach. REST does not mean “not XML.” If you want to transfer XML data using the REST approach, go ahead. That’s what the Content-Type header and MIME types, which HTTP inherited from email, are for. You can transfer XML, binary, or any other sort of data with REST.

The implicit and incorrect assumption is that REST implies JSON, or that REST implies not binary. That’s false. There is no need to avoid REST in order to attain reasonable data transfer performance.
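
To illustrate the point, here is a hypothetical exchange that is perfectly RESTful yet carries XML, simply by declaring the media type (host, path, and payload are invented):

```
GET /notes/42 HTTP/1.1
Host: api.example.com
Accept: application/xml

HTTP/1.1 200 OK
Content-Type: application/xml

<?xml version="1.0"?>
<note id="42">
  <title>Groceries</title>
</note>
```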

According to the article, that faulty reasoning is why Evernote selected Apache Thrift. Furthermore, presented as a benefit, Thrift has tools to generate libraries for many platforms:

Thrift’s code-generating ability to write-once-and-deploy-to-many is also the reason Evernote is able to offer SDKs for so many platforms.

Yippee! But guess what! If you used REST, you wouldn’t need to generate all those libraries. And you’d have even broader platform support.

Just for fun, have a look at the API that is generated via Thrift: the Evernote API Reference.

OMG, the horror. Look at all that stuff. The reason people like REST is that they can figure out the data model just by perusing the URLs. It’s obviously not possible to do so in this case.

Evernote’s is not a modern API. It is a mass of complexity.

Not impressed.
