Naïve data analysis leads to incorrect conclusions – WWII bomber plane edition

Raid by the 8th Air Force

Here’s a good story showing how focusing only on the data in front of us can lead to incorrect conclusions.

The summary is this: in World War II, Allied bomber command undertook an analysis to determine how best to reinforce bomber planes to protect them from German anti-aircraft guns. They studied the bullet-hole patterns on planes returning from missions, and the first impulse was to reinforce the planes where the bullet holes appeared most often.

But a smart guy named Abraham Wald pointed out that if the planes were returning with those bullet holes, then the hits that caused them were survivable, and they were not the ones critical to defend against. The planes that didn’t return had likely been hit in other places, and those were the areas that demanded reinforcement.
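To make Wald’s point concrete, here is a minimal Python sketch of survivorship bias. It rests on made-up assumptions: every section of a plane is equally likely to be hit, and the per-hit loss probabilities in the hypothetical `LOSS_PROBABILITY` table are invented for illustration, not taken from the historical analysis.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in each section brings the plane down.
# Illustrative assumptions only, not historical data.
LOSS_PROBABILITY = {
    "engine": 0.6,
    "cockpit": 0.5,
    "fuselage": 0.05,
    "wings": 0.1,
}


def simulate(n_planes=10_000, hits_per_plane=3, seed=42):
    """Return (holes seen on returning planes, holes taken by all planes)."""
    rng = random.Random(seed)
    sections = list(LOSS_PROBABILITY)
    observed = Counter()  # what analysts can count: holes on planes that came back
    actual = Counter()    # the full picture, including planes that were shot down
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        actual.update(hits)
        # The plane returns only if it survives every individual hit.
        survived = all(rng.random() >= LOSS_PROBABILITY[s] for s in hits)
        if survived:
            observed.update(hits)
    return observed, actual


observed, actual = simulate()
for section in LOSS_PROBABILITY:
    share = observed[section] / actual[section]
    print(f"{section:9s} hits taken: {actual[section]:6d}  "
          f"seen on returning planes: {observed[section]:6d} ({share:.0%})")
```

Although every section takes roughly the same number of hits, engine and cockpit holes are badly under-counted on the planes that come back. Counting holes on the survivors alone would point the reinforcement at exactly the wrong places.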

Are we talking about Bomber Planes or Software here?

Focusing on the data we have is a common approach that supports iterative improvement efforts. The lesson from the bomber plane story is that there may be “invisible data” that needs consideration.

Clayton Christensen World Economic Forum 2013

Mr Christensen looking stern and wise.

This is the central problem explored by Clayton Christensen in his classic book, The Innovator’s Dilemma. What Christensen found was that companies tend to focus on feedback they get from existing customers while ignoring the needs of prospective customers. And this is dangerous.

We have a project going on right now in which we’re looking at the UI usage data for a particular tool in a web app. The tool has been stable for some time now. It was never very complete, and it hasn’t been actively extended; you could say it was stagnant. We know the reasons for that: we had other priorities, and this tool just needed to stay in the backlog for a while. Also, there were, and still are, other ways to accomplish what the tool does. Public APIs expose the underlying functionality, and people have the option to build tactical apps on those APIs. Now we want to enhance the tool, and as a first step it has recently been instrumented to collect data on how people are currently using it. How much should we rely on this usage data?

With the bomber planes, the data they had were the patterns of bullet holes in the returned planes. The data they didn’t have were the patterns of bullet holes in bomber planes that didn’t return.

With the web tool, the data we have will be the UI usage patterns of current users. The data we don’t have will be the usage patterns of people who abandoned the stagnant tool some time ago and now rely on the underlying APIs and independent tools to accomplish their goals. The first group will likely include a relatively higher proportion of novices, while the second will include a higher proportion of experts.

So what’s the right path forward? Should we invest in improving the tool based solely on the data we have? Should we guess at the data we don’t have? In the bomber plane example, the missing data could be deduced: those planes were most likely shot down. In the web tool example, the missing data are truly unknown. We cannot deduce what, exactly, people are doing outside the tool.

I think the only way to resolve this is to combine three factors: analysis of the API usage, analysis of the UI usage, and finally, perspective on the behavior of the expert users who have abandoned the tool, perhaps formed by direct observation or by intelligent inference. API analytics, UI analytics, and some intelligence.

Then we’ll need to prioritize the different scenarios. Do we need to help the experts more with their expert problems? Or do we need to invest more in making the simple things even easier for novice users? Which audience is more important to serve at this point in the product’s lifecycle, the novices or the experts?

Interesting questions! There’s no one right answer for every situation. I know which option I favor in our current situation.

The Quiet Revolution in Software Development

There’s a natural human resistance to change. Everyone has it; everyone is subject to it. Some of us are more aware than others of our own tendency to resist change unconsciously. But by and large, all of us like to minimize surprises and to feel that we are in control. We have enough going on, right? Especially in a work environment, where compensation is tied to achievement and performance is judged and weighed, we don’t like to push the envelope lest we fail. We might lose that pay raise; we might even lose our jobs.

So when a new approach to project management comes along, it’s not surprising to find resistance. It’s the conservative response, and there’s a lot to be said for being consciously conservative in business.

On the other hand, software project management is just screaming for a new approach. The domain is novel enough that the analogies we’ve tried to apply (software as system design, software development as building architecture, distributed systems development as city planning) have always been less than satisfactory. Yes, software development is a little bit like those things, but it is also a lot unlike them. If we blindly force models from those domains onto software development, we’ll fail.

Not only is software unique, it is also evolving rapidly. This is a cliché, but its implications are sometimes overlooked. Developing a software project today is much, much different from developing one 15 years ago, even in the same industry. In 1997, the web was hot, and everyone wanted to figure out how to web-enable their business systems. These days, the web is the platform. Where we were once delighted to be free of green screens, we now demand integration with consumer mobile devices. Building inspectors want to bring their iPads to job sites to fill out forms, take pictures, and submit their reports over the cell network. Only a few years ago, these use cases were firmly in the realm of the miraculous. Now they are de rigueur.

The ever-expanding list of demands, for more connections, more integration, front ends, back ends, reporting systems, and feedback systems, is an explosion of possibility with real implications for how we execute software projects. Not only is the list expanding, it is also constantly shifting. This is why the building analogy fails: buildings last for decades, while we design software expecting to redesign or extend it within four months. We expect it! There is a demand for constant change, for more or less continuous evolution of business systems.

The waterfall model (the comfortable, conservative, well-known approach with clear handoffs, lots of documents describing exactly what happens when, lots of reports, formal requirements documents, and many review meetings) simply cannot work any longer, not with the changes in software we’ve seen. It made sense in projects where testing was expensive, slow, and driven by humans. With those economics, it made sense to make sure the plan was rock solid and airtight before we took the first step.

But that model no longer serves us. There’s been a slow but undeniable revolution in software development processes, driven not by hype or vendor-manufactured demand but by a real improvement in results. I’m talking about Scrum and Agile methods: iterative approaches that favor learning as you go, with lots of automated testing that drives many small corrections, rather than a rigorous, lengthy planning process up front. Software projects that use these methods are more likely to succeed today than projects using old-school waterfall methods, if we define success as delivering on time, on budget, and to requirements.

Software companies like Google and Microsoft, game studios, and other organizations that make their money mostly or wholly from software know this. They’ve been steadily and quietly increasing their commitment to test-driven development, sprints, and Scrummy project management. This isn’t about new products; it’s about new practices.

But larger companies that aren’t in the software business, the ones that think of themselves as manufacturing companies, financial services companies, healthcare providers, or telecoms, have in some cases been slower to adopt these practices. Conservative business people run these companies, and they have good reason to tread carefully.

But I’ve got news for you: Scrum is now the conservative choice. It just works better. It’s not hard to do, though it does require some new thinking. You don’t need a squad of A players to pull this off. You don’t need to raid Microsoft’s dev teams. You can do this with competent developers and competent project managers, the B and C people that most companies in the world are stocked with. In light of this, any software project manager or CIO who prefers to lean toward waterfall methods for new development efforts is taking on unnecessary risk.

Yes, there’s a hesitancy to embrace new things when large sums of money are at stake. Rightly so. But Agile and Scrum are no longer new. They are no longer unproven. You’ve been standing by the side of the pool long enough. It’s time to jump in the water.