Accepting Uncertainty: The Problem of Predictions in Software Engineering
Why our predictions continually fail and how to improve our results with learning-based approaches
- The software industry has a dismal track record when it comes to predicting and planning in the face of uncertainty.
- There are significant biases preventing us from learning, including cognitive biases and compensation structures.
- Statistical approaches to predictions can be successful if we expend the effort to create learning-based models such as Monte Carlo simulations.
- Highly uncertain environments are best exploited using the iterative learning models inherent to Agile methods.
- Extremely uncertain, non-deterministic environments are best exploited by the incremental learning model of hypothesis testing (Hypothesis-Driven Development) and learning to accept the discomfort associated with uncertainty.
The Best Laid Plans…
“Prediction is very difficult, especially about the future.”
This quote is often attributed to physicist Niels Bohr. It’s also variously attributed to one or all of the following: Mark Twain, Yogi Berra, Samuel Goldwyn, politician Karl Kristian Steincke, or simply an old Danish proverb. That’s a healthy warning that things are rarely as simple as they seem, and they usually get more complicated the deeper we go into them. If we can’t even determine who cautioned us about the difficulty of predictions, perhaps it’s an indicator that predictions themselves are even trickier.
The software industry operates on predictions at nearly every turn. Yet our track record of success with them is dismal. Depending on which metrics we choose to quote, between 60% and 70% of software projects are delivered over budget or behind schedule, or are canceled outright, often after large amounts of money have been spent.
What causes this, and what can be done to increase the odds of success? Before we get into that, let’s review the reasons for making software predictions.
Businesses need to budget and plan. Capital, the lifeblood of a business, must be allocated sensibly toward efforts providing the best return. We need to answer questions such as: How much should we spend? Which projects get approval to proceed? Over what time horizon are we allocating? Is it the two weeks of the next sprint or the twelve months of the next fiscal year?
Traditionally, allocation questions were answered by first creating an estimate of long-range project scope and cost, formulating a plan around them, and deciding whether the plan was worthy of capital. For many reasons, some of which we discuss below, a plan often started failing even before its ink was dry.
With the advent of Agile methods such as Scrum, planning cycles are as short as two weeks. But even shortened release cycles still result in disappointment at least 60% of the time. So what’s wrong? Why aren’t we improving things? Why can’t we learn to make things better?
Why We Don’t Learn
Let’s examine what mechanisms drive us to continue with strategies that often fail and leave us stuck in a non-learning cycle. If we can understand the motivations for our actions, it might make it easier to change them and learn something along the way.
We Get Paid Not To
In much of the corporate world, job security and incentives are tied to making accurate predictions, formulating a plan to achieve them, allocating scarce capital, and delivering on the plan. Additionally, there are often financial incentives to deliver at less than the predicted cost and completion date. As long as pay incentives are tied to predicting and planning the future, predictions and plans will be the bread and butter of business activity, whether they produce the desired outcome or not. Worse, the incentives often motivate us to see validity in this activity, regardless of whether it exists.
Unfortunately, the solution to failed predictions is often the alluringly plausible, “We’ll do better next time.” But it’s natural to wonder: how many attempts does it require to achieve success? At some point, we should realize a strategy isn’t working, and we should try something else. What prevents us from realizing this? It’s simple: we’re paid to make predictions, not to understand the problem with them. Upton Sinclair captured the essence of this when he wrote (in gender-specific terms):
“It is difficult to get a man to understand something, when his salary depends on his not understanding it.”
If we want to improve our outcomes, we need to change our compensation structures, so they reward us for learning and move away from structures rewarding us for not understanding things.
An anecdote: Once, when I was describing to an executive how uncertain completion dates are in non-deterministic systems, he turned to me in exasperation and, holding his thumb and forefinger a fraction of an inch apart, said, “You’re telling me you don’t know when you will finish until you are this close to being done? That’s nonsense.” It’s hard to say who was more disappointed in the conversation: the executive, because to him I seemed to lack a basic understanding of how business works, or me, because the executive seemed to lack a basic understanding of the mathematics of his business. In reality, we were both right, at least from our respective viewpoints. The real problem lay in the architecture of our compensation system that drove us to incompatible beliefs.
The Allure Of Simplicity
No matter how we practice it, software engineering is a difficult business undertaking. It was always thus. Fred Brooks, writing decades ago, concluded there was “No Silver Bullet” that would eliminate the inherent complexity and difficulty of developing software. And yet here we are, so many years later, still seeking a solution to the complexity, something that will make it simple. This desire for simplicity drives us to create over-simplified plans discounting the likelihood of the unknowns that will derail our project when they suddenly appear, often late in the plan after considerable sums have been spent.
When it comes to predictions, it’s alluring to believe there’s a universal methodology that always succeeds when we adhere rigidly to its practices. From there, it’s a short leap to believing that failure is due to insufficient rigor in applying the methodology, and not the methodology itself.
Yet history shows us a never-ending parade of methodologies that come and go, falling into and out of fashion with regularity. What this suggests is that the software industry deals with complex problems, and there’s no simple, “Silver Bullet” solution that will solve them. From that wondrous wit, H.L. Mencken, we have this admonition to warn us about the allure of simplicity:
“… there is always a well-known solution to every human problem — neat, plausible, and wrong.”
The Sunk Cost Fallacy
Once we have invested time and money to create a prediction, the sunk cost fallacy rears its head. The sunk cost fallacy boils down to this: money already spent cannot be recovered, but we’re often unable to see that and spend additional money seeking a return on our initial outlay. We’re prone to this because our job security usually requires us to prove we’re making wise financial decisions that turn out to be profitable. Worse, the more we spend, the more we feel the need to justify our investment, putting us on a spiraling path of ever greater cost. All of this means we will too often spend money to defend a failed prediction long after it would be better abandoned and the money reallocated to a more sensible idea.
There’s an instructive example in the natural world, which has no sunk costs. If something doesn’t work, if it’s maladaptive to its environment, it’s dealt a swift and pitiless blow that ends its reign, and a new idea quickly replaces it. It’s an example worth remembering next time we find ourselves believing our prediction will come true if we just invest a bit more into it.
The Dogma Trap
Any successful business has, deliberately or accidentally, discovered a set of practices allowing it to exploit a niche in its environment. These practices are codified into a system of rules and organizational hierarchies intended to perpetuate the success of the business. The longer the success, the easier it is for these practices to become dogma. If any of these successful practices involve predictions, then belief in the efficacy of predictions may become dogma as well.
Of course, the cardinal problem with dogma is that, by definition, it’s unchanging, thereby blinding us to a changing environment, which has no such definition. And when the environment changes but the dogma doesn’t, then it’s a short step to extinction. Avoiding this requires us to reject dogma.
But rejecting dogma is often an unwelcome practice in an organization. Anyone who questions it is often labeled as someone who “isn’t a team player” or needs to “get with the program.” Sidelining or removing such people is the typical response. After all, dogma exists in a business because it codifies a strategy that led to success, and protecting that success is a principal mission of the organization. When it comes to predictions, however, a reasoned approach suggests thoughtfulness, not dogma, should guide decision making.
The Cruelty Of Randomness
Prediction typically has two troubling beliefs inherent to it. One, the future will proceed much like the past and, two, all past events were expected. In reality, at the moment they were occurring, events that reshaped the future were often entirely unexpected. By forecasting the future, we’re often assuming there will be no unexpected future events. The cognitive trap is that new endeavors seem to be similar to those in the past, making us believe we have advance knowledge of events that would otherwise surprise us. But each new endeavor unfolds according to its own internal, and usually unknowable, set of unique characteristics whose complexities are revealed to us only after we’re deep in the work. Complex problems, such as those found in software, never perfectly repeat themselves, no matter how similar they may seem to prior problems, and no matter how much our wishful thinking tries to make them so.
If we know what we don’t know, then we can simply apply an appropriate fudge factor to account for it and proceed on our way, satisfied our plan accounts for unknowns. Unfortunately, we’re too often unaware of our own ignorance, much less how to plan for it. Additionally, we’re incentivized to find reasons it “failed last time because of X, but we have accounted for that this time.” While we may have accounted for X in the latest prediction, it’s never X that surprises us the next time. It’s Y or Z or any other unknown. While there are a finite number of alphabetic characters such that we can eventually account for all of them, there’s no such upper limit in the possible range of real-world unknowns.
But what if we get lucky and are rewarded with a random success for one of our predictions? If we don’t realize it’s random, it will inevitably reduce our inclination to try a new strategy because of our natural belief that success was due to our skill instead of luck. That makes it more likely that random successes will be elevated to the status of perfect execution and repeated failures will be rationalized as poor execution. Unfortunately, the randomness of the reward we get from a lucky prediction causes us to try ever harder to reproduce it. Studies show the humble pigeon will quickly learn the pattern required to peck a lever to release food. And if no food ever arrives, it will quickly give up. But if the reward is random, if there’s no discernible pattern to when pecking the lever releases food, then the pigeon is soon driven into a superstitious frenzy of pecking as it tries to determine the pattern. This behavior doesn’t seem terribly dissimilar from repeated attempts to make our predictions come true.
Add in another human bias: we like confident and charismatic people. Confident, certain, and assertive people are promoted quickly and rise to positions where they influence companies. From there, they orchestrate activities to show they can predict the future, formulate a plan, and execute on it. When faced with an unknown, they have a certain answer and a plan of action at the ready, even if the plan represents a mismatch between their confidence and their competence. So we marshal resources under their leadership and move ahead full of certitude. Contrast that with those who are uncertain and who, when asked about an unknown, shrug their shoulders and reply, “I don’t know. Let’s do an experiment and see if we can figure it out.” Given the choice, we turn to the charismatic individuals instead of the cautious ones.
Overconfidence bias also comes into play. Charismatic and confident people are likely to be imbued with a sense of superior predictive ability over their compatriots. Rationally, we might look at the 70% failure rate of predictions and decide we’re better off avoiding them because we stand only a 30% chance of success. Highly confident people are instead likely to take a different view, discount years of statistics from many projects, and believe their efforts will somehow succeed where others failed.
An anecdote: Many years ago, at the tail end of the dot-com bubble, I worked in a startup as a software developer. We were led by a young, energetic, and charismatic CEO who exuded confidence and certainty. At the time, we leased office space in a building that had been shedding software tenants one after the other as each one fell like so many dominoes. There were only a few of us left, and the nearly-empty building and parking lot had the eerie feel of a post-apocalyptic setting. It was in this environment that our CEO called an all-hands meeting to reassure the anxious staff that our future was promising. I recall him making an eloquent and impassioned case, filling the room with the belief we would make it.
In the end, of course, we were no different from any of the innumerable dot-coms that failed in the wake of the bubble’s bursting. Indeed, our denouement arrived shortly after our CEO’s rousing speech when we failed to receive another round of financing and joined the exodus of the building’s tenants.
Blinkered by confidence and faith in a charismatic leader, many in the company were unable to see what was obvious: we couldn’t survive if we weren’t profitable. This was clear in hindsight, but beforehand it seemed reasonable to believe we were different and would succeed where so many others recently had failed. It was an instructive lesson in maintaining a reserve of skepticism around charisma.
Being Mistaken, Usually
“Well, we won’t make that mistake again. We even fired some people to make sure it never recurs.” That’s probably true. We won’t make the same mistake because we’re prepared for it on the next attempt. The problem is the first mistake was unknowable before it occurred, and the same thing will happen again, but this time with a different set of mistakes. The set of new mistakes, to which we will fall victim, is an inexhaustible supply because they’re always unknowable in advance. Winston Churchill perfectly described this scenario while addressing Parliament at the dawn of World War II. Members were concerned about repeating the mistakes of World War I and wanted assurance they would be avoided. Churchill replied:
“I am sure that the mistakes of that time will not be repeated; we should probably make another set of mistakes.”
We’re often mistaken and simply don’t yet know it. And being wrong and not knowing it feels just like being right. Actually, being right and being wrong are indistinguishable until the moment we’re proven wrong. That should sound a note of caution about the inevitability of mistakes.
There’s an expression often heard in management meetings and boardrooms: “failure is not an option.” While this is usually intended to discourage half-hearted efforts, it excludes learning and discovery because failure is a necessary ingredient in learning. It also suggests that admitting a mistake means admitting incompetence and possibly losing one’s job. Once this belief system is in place and cemented by financial incentives, it can lead to the idea that failure indicates we simply need to redouble our efforts, and we’ll succeed, even if the real lesson is we need to change course. Under these conditions, admitting an error and changing course is a difficult thing to do because we’re irreversibly invested in our belief system. History is filled with examples of businesses that failed to learn and continued to feed ever greater amounts of precious capital into failed strategies, even as those strategies drove them right off a cliff. A moment’s reflection will disabuse us of the notion that we’re somehow immune to such folly.
Strategies That Use Learning
That’s a rundown of some of the reasons why we’re often unable to learn and continue with strategies that fail us. But what if we can avoid these pitfalls? Are there strategies that focus on learning? Yes, and we’ll cover those now.
A Deterministic Approach
Historically, software projects used a Waterfall model of development. Requirements were gathered, estimates were made from the requirements, and schedules were created from the estimates. This approach is based on a deterministic view of software projects: the belief that, with enough upfront data and analysis, we can make accurate predictions about cost and delivery dates. These projects often began failing early, usually due to inadequate requirements and inaccurate estimates. In the latter case, estimates were often faulty because they weren’t based on statistically rigorous methods but instead on methods that were little more than guessing.
It turns out, though, a deterministic view can succeed by using statistical models calibrated with data from a company’s historical software projects. One common statistical method is a Monte Carlo analysis. The underlying mathematics are rather complicated, but it boils down to this: we gather a set of historical data, typically including parameters like effort and duration. We then run scenarios thousands of times, randomly varying input parameters to produce a probability distribution that a given amount of work will be completed in a given amount of time. For example, we might derive a distribution indicating a certain amount of staff effort has a 25% probability of being completed within a month, a 50% probability within two months, and a 90% probability within five months. The key point is we use historical data, unique to our organization, to calibrate our model and produce probability ranges for outcomes instead of single-point values. Notice how rigorous this approach is compared to someone’s unsubstantiated claim, “I can do that in a week.”
With this approach, we’re also choosing to learn. We gather data over time and use it iteratively to teach us about our organization’s capabilities and the cost and time required to perform work. Of course, our model is only as good as the data we use to calibrate it. Vague requirements specifications, poor record-keeping for completed work, and other such shortcomings will yield disappointing results.
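As a minimal sketch of the Monte Carlo procedure described above, the Python below resamples a team’s historical weekly throughput to build a distribution of completion times. The function names, the sample history, and the 40-item backlog are hypothetical, chosen only for illustration; a calibrated model would draw on an organization’s own records and likely on more parameters than throughput alone.

```python
import math
import random


def simulate_completion_weeks(weekly_throughput, work_items, trials=10_000, seed=None):
    """Return simulated durations (in weeks, sorted ascending) to finish `work_items`.

    Each trial replays the backlog week by week, drawing every week's
    output at random from the historical record.
    """
    if not any(done > 0 for done in weekly_throughput):
        raise ValueError("history needs at least one week with completed work")
    rng = random.Random(seed)
    durations = []
    for _ in range(trials):
        remaining, weeks = work_items, 0
        while remaining > 0:
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        durations.append(weeks)
    durations.sort()
    return durations


def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of outcomes at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


# Hypothetical history: work items finished in each of the last eight weeks.
history = [3, 5, 2, 4, 6, 1, 4, 3]
durations = simulate_completion_weeks(history, work_items=40, seed=1)
for pct in (25, 50, 90):
    print(f"{pct}% of simulated runs finished within {percentile(durations, pct)} weeks")
```

Reading off several percentiles, rather than a single number, is what turns the simulation into a range of outcomes with probabilities attached.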
A Pseudo-Deterministic Approach
A fully-deterministic approach as described above works well if requirements can be specified in advance and are not subject to frequent revision. But this type of project is rarely seen. What if we’re working on more typical projects with unclear goals, uncertain specifications, and unknown market needs? Deterministic predictions under these conditions are unlikely to succeed.
Enter Agile methods.
Agile methods take a pseudo-deterministic approach to software delivery. Born out of the frustration with repeated failures in traditional Waterfall projects, Agile methods abandon the belief in long-term predictions and planning. They instead focus on short-term delivery of working software and adapting to change as it occurs. By using Agile methods, we adopt the philosophy that requirements cannot be determined far in advance but must instead emerge over time.
One of the more popular Agile methods is Scrum. Its two-week sprint minimizes error accumulation by shortening release cycles. We reprioritize with every sprint, and in so doing effectively reset our project clock, giving us the flexibility to adapt to change.
We can still use Monte Carlo-type methods to predict the volume of stories we can produce, but we surrender our belief in one aspect of determinism: that we can generate long-term plans determining project schedules. Instead, we once again focus on learning by iteratively discovering what we need to deliver.
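For the story-volume forecast mentioned above, one possible sketch resamples past sprint throughput to estimate how many stories a team might finish over the next few sprints. The sprint history and function names are invented for illustration, and the approach assumes future sprints resemble past ones, which is exactly the assumption that weakens as uncertainty grows.

```python
import random


def simulate_story_totals(sprint_history, sprints_ahead, trials=10_000, seed=None):
    """Simulate total stories finished over `sprints_ahead` sprints (sorted ascending)."""
    if not sprint_history:
        raise ValueError("sprint_history must not be empty")
    rng = random.Random(seed)
    return sorted(
        sum(rng.choice(sprint_history) for _ in range(sprints_ahead))
        for _ in range(trials)
    )


def stories_at_confidence(sorted_totals, confidence_pct):
    """Story count that at least `confidence_pct` percent of simulated runs met or exceeded."""
    index = (100 - confidence_pct) * len(sorted_totals) // 100
    return sorted_totals[min(index, len(sorted_totals) - 1)]


# Hypothetical history: stories completed in each of the last six sprints.
history = [6, 9, 7, 10, 8, 5]
totals = simulate_story_totals(history, sprints_ahead=3, seed=7)
for pct in (50, 85, 95):
    count = stories_at_confidence(totals, pct)
    print(f"{pct}% of simulated runs finished at least {count} stories")
```

Higher confidence levels yield lower story counts, which is a useful reminder that a commitment and a median guess are not the same number.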
But have we actually solved the problem of predictions and plans? Or have we just lessened the impact of being wrong about them? It seems we might still carry with us the same problem but on a smaller scale.
An Evolutionary Approach
We have progressed from the long-term release cycles of traditional methods to the much shorter cycles of Agile methods. We also abandoned the belief in long-term, fixed requirements and chose instead to focus on smaller stories. Both of these changes help us iteratively discover requirements and produce better results. This leads to an obvious question: if a little discovery is a good thing, is more discovery an even better thing?
Enter hypothesis testing.
Hypothesis testing (also called Hypothesis-Driven Development) takes its cues from the greatest experimental laboratory ever devised: evolution. Evolution makes no pretense at being able to predict what the future holds. It simply responds to change by constant experimentation. An experiment producing a better outcome is rewarded with longevity. A worse outcome is quickly subjected to an ignominious end. If we’re willing to surrender our predictive beliefs, then evolution has a lot to teach us.
With hypothesis testing, we take a slightly more deliberate approach than the pure randomness of evolution. We proceed as scientists do when faced with the unknown: formulate a hypothesis and subject it to measurement and failure in the real world. If it’s falsifiable and can’t be proven false, at least not yet, then it has merit.
There are many ways to implement hypothesis testing, but here’s a simple example. We formulate a hypothesis such as, “We believe our customers want a left-handed widget feature on our data portal. We declare our hypothesis to be true if traffic to our portal increases by 5% in one week.” If our hypothesis is correct, then we should see at least a 5% bump in traffic within a week. If not, we were wrong and reject our hypothesis and possibly remove the feature. We then reformulate our hypothesis or move on to another one. It’s beyond the scope of this article to provide a detailed how-to of hypothesis testing, but the references provide links to articles with instructive examples and best practices.
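To make the widget example concrete, here is a small sketch that records the hypothesis together with its success threshold and then checks a week of traffic against it. The class, field names, and visit counts are hypothetical. Note that a raw before-and-after comparison ignores ordinary week-to-week variation, so a real experiment would likely want a control group or a statistical test as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    belief: str
    min_increase_pct: int  # required traffic increase, in whole percent


def is_supported(hypothesis, baseline_visits, observed_visits):
    """True if observed traffic rose by at least the required percentage over baseline."""
    if baseline_visits <= 0:
        raise ValueError("baseline_visits must be positive")
    # Integer arithmetic avoids floating-point surprises exactly at the threshold.
    return observed_visits * 100 >= baseline_visits * (100 + hypothesis.min_increase_pct)


widget = Hypothesis(
    belief="Customers want a left-handed widget feature on our data portal",
    min_increase_pct=5,
)

# Hypothetical weekly visit counts before and after the feature shipped.
print(is_supported(widget, baseline_visits=20_000, observed_visits=21_400))  # 7% increase
print(is_supported(widget, baseline_visits=20_000, observed_visits=20_300))  # 1.5% increase
```

The first call supports the hypothesis and the second rejects it, at which point we would reformulate or move on. Writing the threshold down before the experiment runs keeps us from moving the goalposts afterward.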
With hypothesis testing, we surrender our predictive beliefs envisioning how the future will unfold. Instead, we build incrementally, testing each small piece as we go, minimizing the risk to capital, and cutting losses early. In effect, we make ourselves intellectually humble and admit we have little knowledge of the future. We accept we don’t know what we don’t know and are unlikely to ever really know much in advance. We can only discover it through experimentation.
Most importantly, hypothesis testing minimizes the biases described above that slow our learning. With it, we get paid to learn and use objective data to validate or falsify our ideas. We minimize sunk costs, making us less likely to cling to a failed idea. We use randomness to help us learn instead of letting it fool us into seeking a reward where none is to be found. Charismatic personalities have less sway when objective data is the measuring tool. And finally, being wrong is accepted as the normal state and part of the experiment. In short, we’re using an evidence-based decision system over one based on omnipotence and superstition.
We can further inoculate ourselves against bias by placing strict, consistent limits on the amount of capital allocated to hypotheses and requiring short time-frames for proving them true. Otherwise, we’re right back to endeavors needing “just a little more time” or “just a little more money” to be proven right. Evolution allows no such exemptions. Ultimately, we need to decide if we want to be “right” or make money. We sometimes seek the former while claiming to seek the latter.
There’s one further bias to address. Humans are first, and foremost, confirmation-bias machines. If we’re compensated for producing successful ideas, then we inevitably will seek only confirmatory evidence for our hypotheses. We reduce this effect when, in the manner of scientists, we deliberately seek evidence to refute our hypotheses. In short, it’s easy to find confirming evidence for a given hypothesis, but it’s much harder to argue for its validity when we have evidence contradicting it.
Admittedly, hypothesis testing doesn’t yield a particularly motivating rally cry like the predictive approach’s “Full speed ahead!” By contrast, “Let’s run an experiment” is hardly as energizing. But it has the potential to be more profitable.
A Common And Misguided Strategy
“The fault, dear Brutus, is not in our stars,
But in ourselves…”
Julius Caesar (Act 1, Scene 2)
Perhaps we have a biased sample set in our industry and hear only the stories of predictive planning nightmares and not the successes, making us believe the nightmare scenario is the common one. But given so many stories, from so many people, over so many years, it seems the scenario is probably representative of many work environments. It contains the worst possible choices and almost always leads to failed outcomes.
Here’s how it occurs: We have a generic, somewhat vague goal like “increase revenue from our website by ten percent next year.” Or maybe it’s more specific like “add a left-handed widget to our data portal because customers will buy it.” Whatever it is, it typically isn’t well-specified, and the assumptions underlying the premise are just that: assumptions. And hidden details will surely appear as we begin work. We’ve done similar features in the past but, crucially, we’ve never done exactly the same thing before. But that should be “good enough” for the basis of our prediction. We then have a developer provide a prediction that’s little more than an off-the-cuff guess. And then we’re off to the races. It often goes like this in predictive environments:
Manager: “How long will it take to write the Widget feature?”
Programmer: “I don’t know, maybe a month.”
Manager: “What? That’s ridiculous! There’s no way it will take that long!”
Programmer: “Well, OK, I can probably do it in a week.”
Manager: “That’s more like it. I’ll put it in the schedule. Do it next week.”
In an Agile environment, it might look like this:
Manager: “How many story points are you estimating for the Widget story?”
Programmer: “I don’t know, maybe it’s a thirteen.”
Manager: “What? That’s ridiculous! There’s no way it’s that big!”
Programmer: “Well, OK, it’s probably a three.”
Manager: “That’s more like it. I’ll update the backlog. Do it in the next sprint.”
This is little more than random guessing under extreme duress and creates the worst possible conditions: vague specifications, no rigorous collection of historical data upon which to draw for a careful, statistical analysis, off-the-cuff guesses from one programmer, and turning the guess into a commitment to deliver according to a schedule. To this mix, add incentives for managers to “hold developers accountable” for failing to deliver what they never realized was a promise instead of a guess, and the understandable fear of punishment for being wrong about their guess once it becomes a commitment. Is it any wonder failure is an inevitable outcome? The only way it’s delivered is by cutting features, heroic overtime, and sacrificing quality. And yet, the lesson is rarely “this isn’t working so we need to try something else.” Instead, it’s often, “we need to get better at predictions.”
We get what we pay for. If we’re required to use predictions to derive plans, then we must invest the time and money to do it right. If we use Agile methods, the delivery of working software must take precedence over predictions. To do otherwise is wishing to get something for nothing. As the Second Law of Thermodynamics makes clear, “There’s no free lunch.”
Know Thine Environment
It’s imperative to know the environment in which our businesses are operating. If we work on large, contract-driven projects where timelines are extended and the specifications are well-defined in advance, then quantitative prediction is usually a required skill to survive. On the other hand, if we operate in a more common environment where specifications are vague or non-existent, the market needs are unclear or unknowable, timelines are short and urgent, and competition for market share is fierce, then we should consider a hypothesis-driven approach.
A key problem is we often misunderstand the mathematical underpinnings of our environment. We often believe we operate in a deterministic world where more effort will reward us with a successful result. In fact, often we’re operating in a non-deterministic, highly empirical world with an unstable foundation changing with time. Statisticians call this a problem with “a non-stationary base” where the mathematical foundation is not stable, and there’s no base upon which to anchor our assumptions. Under these conditions, fixed, deterministic methods will not succeed outside of sheer, random luck. For all of the biases listed above, it’s nearly irresistible to believe we can predict and plan even when we can’t.
Unfortunately, if we’re not operating under stable conditions, then greater effort put into a prediction has a higher chance of increasing our confidence in its accuracy than it does in improving the accuracy itself. We then become certain of our wisdom, making us prone to commit ever more capital to prove we’re right, instead of cautiously guarding our resources and applying them when the data tell us we’re on the right path.
Knowing the environment in which we operate means pay incentives are aligned with methods producing successful outcomes for that environment. We’re incentivized to learn, in whatever form it may take for our environment.
One of the key difficulties with predictions lies in our natural human reluctance to accept uncertainty. Being in a state of uncertainty and doubt is extremely uncomfortable. So we’re much more inclined to embrace those who are full of confidence than we are those who shrug and prefer to run an experiment to verify a hypothesis.
The external reality is this: the business environment is often governed by uncertainty, unknowable unknowns, and darkness we must navigate with only the faintest of lights. Our challenge is to accept the disquieting nature of that environment instead of clinging to the comfort of a belief system that provides us with a reassuring but misleading picture.
The road to knowledge is paved with the acceptance of uncertainty. If we can learn to live with its discomfort, then we open the path to learning. To paraphrase a famous saying: The price of learning is eternal unease.
 Standish Group 2015 Chaos Report.
 “No Silver Bullet — Essence and Accident in Software Engineering”, by Fred Brooks.
Proceedings of the IFIP Tenth World Computing Conference: 1069–1076.
“‘Superstition’ in the Pigeon” by B.F. Skinner, Journal of Experimental Psychology, 38, 168–172.
“On Being Wrong” by Kathryn Schulz.
 “A Gentle Introduction to Monte Carlo Simulation for Project Managers” by Kailash Awati.
 “Web-Based Monte Carlo Simulation for Agile Estimation” by Thomas Betts.
 “The Scrum Primer” by Pete Deemer and Gabrielle Benefield.
“How to Implement Hypothesis-Driven Development” by Barry O’Reilly.
“Why hypothesis-driven development is key to DevOps” by Brent Aaron Reed and Willy-Peter Schaub.
“Hypothesis-driven development” by Adrian Cho.
About the Author
J. Meadows is a technologist with decades of experience in software development, management, and numerical analysis.
A preliminary version of this article was published at https://www.infoq.com on 2019.11.29