Naïve data analysis leads to incorrect conclusions – WWII bomber plane edition

Raid by the 8th Air Force

Here’s a good story showing us how focusing on the data we have in front of us may lead us to incorrect conclusions.

The summary is this: In World War II, Allied bomber command undertook an analysis effort to determine how best to reinforce bomber planes to protect them from German anti-aircraft guns. They studied the bullet hole patterns in planes that had returned from missions, and the first instinct was to reinforce the planes where the bullet holes appeared most often.

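But a smart statistician named Abraham Wald pointed out that if the planes were returning with those bullet holes, then those were hits a plane could survive, and not the ones most critical to protect against. The planes that didn’t return had likely been hit in the places that demanded reinforcement: the areas where the returning planes showed few holes. This is the effect usually called survivorship bias.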
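Wald’s argument can be checked with a toy simulation. The sketch below is only an illustration: the section names and the per-hit loss probabilities in LOSS_PROBABILITY are invented for the example, not historical figures, and simulate and share are hypothetical helper names. Hits land evenly across all sections, yet the holes counted on returning planes are skewed toward the sections where a hit is rarely fatal.

```python
import random
from collections import Counter

# Assumed chance that a single hit in each section brings the plane down.
# These values are made up for illustration; they are not historical data.
LOSS_PROBABILITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.1}


def simulate(num_planes=10_000, seed=0):
    """Return (hits on all planes, hits on returning planes) as Counters."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    all_hits = Counter()
    observed_hits = Counter()
    for _ in range(num_planes):
        # Each plane takes 1 to 4 hits, spread uniformly across sections.
        hits = [rng.choice(sections) for _ in range(rng.randint(1, 4))]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        all_hits.update(hits)
        if survived:
            observed_hits.update(hits)
    return all_hits, observed_hits


def share(counter):
    """Fraction of the counted hits that fall in each section."""
    total = sum(counter.values())
    return {s: counter[s] / total for s in LOSS_PROBABILITY}


if __name__ == "__main__":
    all_hits, observed_hits = simulate()
    true_share, seen_share = share(all_hits), share(observed_hits)
    for section in LOSS_PROBABILITY:
        print(f"{section:9s} all planes: {true_share[section]:.2f}   "
              f"returned planes: {seen_share[section]:.2f}")
```

Under these assumptions, the analyst who sees only the returned planes finds few engine and cockpit holes, even though those sections were hit just as often as the others.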

Are we talking about Bomber Planes or Software here?

Focusing on the data we have is a common approach, and it supports iterative improvement efforts. The lesson from the bomber plane story is that there may be “invisible data” that needs consideration.

Clayton Christensen World Economic Forum 2013

Mr Christensen looking stern and wise.

This is the central problem explored by Clayton Christensen in his classic book, The Innovator’s Dilemma. What Christensen found was that companies tend to focus on the feedback they get from existing customers, while ignoring the needs of prospective customers. And this is dangerous.

We have a project going on right now in which we’re looking at the UI usage data for a particular tool in a web app. The web app tool has for some time now been stable. It was never very complete, and it hasn’t been extended actively. You could say it was stagnant. We know the reasons for that – we had other priorities and this web tool just needed to stay in the backlog for a while. Also, there were, and there still are, other ways to accomplish what was exposed by this web app tool; there are public APIs that expose the underlying function and people have the option to build tactical apps on those APIs. Now we want to enhance the web tool, and the web tool has recently been enhanced to at least collect data on how people are currently using it. How much should we rely on this usage data?

With the bomber planes, the data they had were the patterns of bullet holes in the returned planes. The data they didn’t have were the patterns of bullet holes in bomber planes that didn’t return.

With the web tool, the data we have will be the UI usage patterns of current users. The data we don’t have will be the usage patterns of people who abandoned the stagnant tool some time ago, and are relying on the underlying APIs and independent tools to accomplish their goals. The former set of users will probably have a relatively higher proportion of novices, while the latter will probably have a higher proportion of experts.

So what’s the right path forward? Should we invest in improvements in the tool, solely based on the data we have? Should we guess at the data we don’t have? In the bomber plane example, the missing data could be worked out by deduction: the planes that didn’t return were most likely shot down, so they were probably hit where the returning planes were not. In the web tool example, the missing data are truly unknown. We cannot deduce what, exactly, people are doing outside the tool.

I think the only way to resolve this is to combine three factors: analysis of the API usage, analysis of the UI usage, and finally, perspective on the behavior of the expert users who have abandoned the tool, perhaps formed by direct observation or by intelligent inference. API analytics, UI analytics, and some intelligence.
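As a rough sketch of the first two factors, the two sets of analytics can be compared to see who shows up where. This assumes both the UI usage log and the API usage log carry a shared user identifier, which may not be true of our data; the "user" field and the segment_users name are hypothetical.

```python
def segment_users(ui_events, api_events):
    """Split users by where they appear: UI only, API only, or both.

    Each event is assumed to be a dict with a "user" key (a hypothetical
    record shape, chosen only for this illustration).
    """
    ui_users = {event["user"] for event in ui_events}
    api_users = {event["user"] for event in api_events}
    return {
        "ui_only": ui_users - api_users,    # visible in the UI usage data
        "api_only": api_users - ui_users,   # the "planes that didn't return"
        "both": ui_users & api_users,
    }


if __name__ == "__main__":
    ui_log = [{"user": "ana"}, {"user": "bo"}, {"user": "ana"}]
    api_log = [{"user": "bo"}, {"user": "cy"}]
    for name, users in segment_users(ui_log, api_log).items():
        print(name, sorted(users))
```

The "api_only" group is the one the UI usage data can never show us. Knowing who is in it still does not tell us what those people are trying to accomplish, which is where the third factor comes in.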

Then we’ll need to apply some sort of priority to the different scenarios. Do we need to help the experts more with their expert problems? Or do we need to invest more in making the simple things even easier for the more novice users? Which audience is more important to serve at this time in the product’s lifecycle, the novices or the experts?

Interesting questions! There’s no one right answer for every situation. I know which option I favor in our current situation.