Utilization Considered Harmful
Why It’s Costly Keeping Everyone Busy in Software Organizations
Introduction
In 1968, Edsger Dijkstra authored a letter, Go To Statement Considered Harmful, that tectonically shifted the ground under software engineering [1]. For years, programmers had made liberal use of the go to statement, forcing anyone reading the code to leap between distant sections to follow the logic flow. These leaps made it profoundly more difficult to decipher the code’s intent. At the time, few thought this needed addressing. It wasn’t until Dijkstra’s letter that most programmers recognized the burden go to statements imposed and began structuring their code in the form we take for granted today.
This paper proposes that we have a similar problem now, one that we rarely stop to consider and that inflicts needless and avoidable costs on us all. What’s this problem? Utilization. Specifically, the belief that we must keep our software engineering staff fully busy at all times or we are wasting money. This belief causes many adverse side effects, which we rarely consider because the utilization mantra is ubiquitous and entrenched. Indeed, utilization is so pervasive that we often fail to notice its existence. As the first fish said to the second, “Isn’t water great?” To which the second replied, “What’s water?” For us, utilization is the water in which we all swim.
Utilization in Depth
First, we must define utilization in this paper’s context. Accordingly, utilization is defined as the belief that software engineering staff must always remain 100% fully tasked because anything less wastes expensive people. This belief often extends further to being more than 100% tasked through uncompensated overtime.
How common is this belief? Remarkably common, as a simple example shows. Consider the case of the typical programmer, working independently on some task, when she’s suddenly blocked because she needs help from someone who’s too busy at the moment to assist. What’s her manager’s inevitable response? “Find something else to do.” Rather than fixing the problem of needing help, utilization requires that our programmer stay busy while she waits.
What’s wrong with utilizing the people we have? Nothing, of course. The whole point of hiring someone is to use them. The problems arise when we believe that we only get our money’s worth when everyone is 100% busy and fail to see the cost burden this imposes.
Utilization is rarely questioned, and the moment it is, there is often the knee-jerk Appeal to Extremes [2] response: “Why should we pay people to sit around and do nothing?”
From where do these beliefs come?
The Ghost of Frederick Taylor
In 1911, Frederick Winslow Taylor published his famous work, The Principles of Scientific Management, which described how to improve the efficiency of everyday workers [3]. Today, we know his methods as Taylorism. Fundamentally, Taylorism decrees that there are inefficiencies and idle time in daily work, but managerial oversight can make workers efficient through better utilization. More specifically, companies measure the ratio of task time to total time paid while targeting a value of 1.0 [4]. From there, it’s a short step to believing that keeping everyone 100% busy is the best use of our resources.
Although Taylorism has positive aspects helpful to workers and it’s often unfairly maligned, some of its underlying principles are in opposition to today’s self-organizing teams. For example:
- There is only one “right way” to do a job, as determined through scientific principles that management alone understands. Workers shouldn’t be trusted to apply their own rules of thumb for doing their work.
- Managers should decide who is best for given jobs and not allow workers to decide for themselves.
- Managers should do the planning and training, and workers should simply perform their assigned tasks.
Taylor’s methods were often based on his rather dim view of workers. Describing a worker who handled unprocessed iron, Taylor wrote: “He is so stupid that the word ‘percentage’ has no meaning to him, and he must consequently be trained by a man more intelligent than himself…” [3]
Ouch.
With such a dim view of workers, it’s understandable that Taylor’s methods placed their faith in enlightened management and disregarded workers’ opinions about how best to do their jobs. Accordingly, it was up to management, not workers, to decide how to keep those workers utilized to their optimum level, something we see echoes of today.
The question we must ask ourselves is whether we should allow the ghost of Frederick Taylor to haunt our modern work environments. As we’ll see, we aren’t well-served by methods grounded in a low opinion of workers or believing managers’ duties include keeping workers 100% utilized.
Cost versus Throughput Accounting
For over a century, organizations have used a form of accounting called Cost Accounting. Only more recently has a newer form of accounting emerged, called Throughput Accounting. A full accounting (sorry) of the details of these methods is far beyond the scope of this article, given that entire dissertations are written on them. However, a brief synopsis will help us understand their role in utilization. See the references for a more in-depth treatment of the subject [4] [5] [6] [7].
Vastly oversimplifying things, Cost Accounting focuses primarily on producing as many work items as possible while reducing the costs required to create them. Throughput Accounting also considers costs but additionally focuses on how long it takes for a work item to begin producing revenue, attempting to shorten this time as much as possible.
In Cost Accounting, staff hours are considered variable and are added to the cost of producing a work item. A primary emphasis is on reducing the labor cost component of making items. In this variable-cost approach, a momentarily idle worker is a cost that must be prevented by keeping the worker fully utilized on any work item, whether or not that item is related to the current work in process.
By contrast, Throughput Accounting considers staff hours as fixed because staff must be paid whether they are working on something or not. If staff costs are fixed, it matters little if a worker is momentarily idle while waiting for a work item’s arrival. What matters instead is throughput, that is, keeping the work item moving toward the customer so it can sooner produce revenue [7].
Oversimplifying even more, Cost Accounting focuses on making money by reducing costs, whereas Throughput Accounting focuses on making money by speeding the delivery of revenue-producing items.
What does all this mean? Because Cost Accounting considers staff costs but not revenue lost to waiting, it leads to a focus on keeping staff fully utilized. However, this utilization comes at the expense of revenue-producing work items sitting idle while waiting for someone to become available. Given the throughput blindness that Cost Accounting causes, it’s easy to see why most organizations would optimize utilization.
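The contrast between the two lenses can be illustrated with a toy comparison. This is only a sketch under invented assumptions: the payroll, revenue, and horizon figures, the two scenarios, and the function names are all hypothetical, and real accounting is far more involved. It shows how the same two situations rank differently depending on which number we watch.

```python
from dataclasses import dataclass

WEEKLY_PAYROLL = 50_000            # hypothetical; paid whether or not staff are busy
WEEKLY_REVENUE_WHEN_LIVE = 30_000  # hypothetical revenue once the item is released
HORIZON_WEEKS = 12                 # hypothetical comparison window


@dataclass
class Scenario:
    name: str
    utilization: float  # fraction of paid time booked to tasks
    wait_weeks: int     # weeks the finished item sits in queues before release


def idle_labor_cost(s: Scenario) -> float:
    """Cost-accounting lens: paid time not booked to any work item."""
    return WEEKLY_PAYROLL * (1 - s.utilization) * HORIZON_WEEKS


def revenue_earned(s: Scenario) -> float:
    """Throughput lens: revenue collected once the item stops waiting."""
    return WEEKLY_REVENUE_WHEN_LIVE * max(0, HORIZON_WEEKS - s.wait_weeks)


busy = Scenario("keep everyone busy", utilization=1.0, wait_weeks=4)
flow = Scenario("protect the flow", utilization=0.8, wait_weeks=1)

for s in (busy, flow):
    print(f"{s.name}: idle labor cost {idle_labor_cost(s):,.0f}, "
          f"revenue earned {revenue_earned(s):,.0f}")
```

In this toy, payroll is identical in both scenarios, so the "idle labor cost" never changes what is actually spent. The only figure that moves the bottom line is the revenue earned, and it favors the scenario with the shorter wait.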
Tradition
“Tradition!” as Tevye sang in Fiddler on the Roof, allows us to follow a simple if unthinking path for our work. But it’s also how we continue with practices that should long ago have been replaced with something better.
Tradition doesn’t mean that something is an unequivocal truth. It simply means, “That’s how we’ve always done it.” And how we’ve always done it isn’t necessarily the best way. Tradition lulls us into doing the same thing without questioning it, trapping us in the tar pit of dogma. We should instead continuously experiment with finding better ways, resisting the cognitive miserliness that dogma invites.
Worse, those who choose to use their intellectual agency to counteract dogma are often regarded as apostates, non-believers who must be punished or exiled to distant outposts. Dogma thus becomes self-sustaining, meaning that we never rethink our utilization beliefs.
The Problems Utilization Causes
Let’s now turn our attention to the problems that utilization causes, most of which are rarely considered by typical organizations. We’ll begin with the problem of queues.
Invisible and Unmanaged Queues
A queue is a line of items, each waiting its turn to exit and continue moving. Queues consist of arrivals, the waiting line, a service process, and an exit. Typical examples include a checkout line at a store, a traffic light, and a ticket counter [8].
In software organizations, queues occur when something is worked on and then waits to be used by someone else. Examples include stories waiting for developers, code waiting for testing, and tested code awaiting deployment to customers.
Queues are shockingly pervasive, invisible, and unmanaged in most workplaces. They are also economically damaging. How much? In his book, The Principles of Product Development Flow [9], Donald Reinertsen states:
“Invisible and unmanaged queues are the root cause of poor economic performance in Product Development.”
Donald Reinertsen
Queues result in lengthened delivery times, increased risk, lower quality, and poor morale. They also delay crucial feedback on our work until long after we lose its intellectual context, requiring us to spend time and money restoring it when we’re called to revisit the work. Additionally, this late feedback means we waste time and money pursuing fruitless paths before discovering a better course [8].
Queues are the inevitable byproduct of utilization. When workers are always busy, there’s no slack in the system to absorb surges, so work items must wait until someone has time to address them.
The chart below shows the dramatic effect of running our software teams at increasing utilization levels. The solid line shows that from 80 to 90% utilization, wait time roughly doubles. From 90 to 95%, it roughly doubles again. As utilization approaches 100%, the wait time grows without bound. And this is the best case. In highly variable environments, such as software, the dotted line shows that the curve shifts up and left, meaning wait times increase at much lower levels of utilization [8]. Put simply, the busier everyone is, the longer it takes to get things done. This simple conclusion warns us to rethink utilization.
When creating software, we inescapably have questions or difficulties that we can’t answer, but someone else can. As stated above, when that someone is too busy to help at the moment, managers under the spell of utilization will direct their workers to “find something else to do,” which is almost always something of lesser value. After all, if the “something else” were important, we would be working on it instead of the item being set aside. Unfortunately, this tactic creates two new problems instead of solving the original one. First, the blocked work item goes into a queue. Second, it adds to another invisible problem, namely, inventory.
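The shape of that curve can be reproduced with a basic queueing formula. The sketch below assumes an M/M/1 queue (a single server with random arrivals and random service times), a model the text does not name; the function name is hypothetical. Under that assumption, the average time an item waits in the queue, expressed as a multiple of the average service time, is utilization / (1 - utilization).

```python
def relative_wait(utilization: float) -> float:
    """Average queue wait in multiples of average service time (M/M/1 assumption)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1.0 the wait is unbounded")
    return utilization / (1 - utilization)


for u in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{u:.0%} utilized -> wait is {relative_wait(u):.1f}x the service time")
```

Under this assumption, the wait grows from about 4 times the service time at 80% utilization to about 9 times at 90% and about 19 times at 95%, slightly more than doubling at each step. The model has no finite answer at 100%, which is why the function rejects that input.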
Inventory
In software engineering outfits, inventory can be defined as any item where thinking has been done and the item is stored while waiting for further work. Examples include user stories sitting in a backlog and undeployed code sitting in a repository.
We rarely consider inventory in software engineering circles. A simple experiment quickly proves this. Merely ask anyone how much money their organization has invested in inventory at any given moment. You’ll likely get an answer like, “We don’t have inventory. We’re a software firm.” Such a response isn’t realistic. Just because software inventory isn’t visible by piling up on the shop floor, as in manufacturing, that doesn’t mean it’s not inventory. It is, and its lack of physical presence means it’s too often “out of sight, out of mind.”
So what’s wrong with inventory? After all, in manufacturing businesses, inventory is considered an asset on the balance sheet. However, it can definitely be argued that in software organizations, inventory is a liability that incurs management costs until it becomes deployed software producing revenue, at which point it’s converted into an asset. Indeed, in Lean Manufacturing principles, inventory is classified as waste [10]. With this viewpoint, it’s easy to see that inventory is something expensive that we should minimize.
Additionally, inventory’s usefulness decreases with time because markets may shift and no longer reward our work by the time it’s finally deployed. But utilization doesn’t consider these problems, instead leading us to optimize for busyness rather than rapidly deploying work items. The result is the accumulation of inventory.
Lastly, if our focus is on utilization, then we’re incentivized to keep churning out inventory, regardless of whether it’s needed, all the while ignoring the cost of managing it. At no point does utilization incentivize us to push our inventory to paying customers, where it can earn money, a loss we’ll discuss next.
Foregone Earnings
There’s another hidden cost to utilization that few organizations consider. As discussed above, software becomes queued inventory while it waits to be deployed. During this waiting period, it isn’t earning money, and it’s useless to paying customers despite our investment in it. How much money is lost while we wait? That is, how much did we fail to earn while we waited to deploy?
Calculating this lost revenue requires estimating the Cost of Delay. This metric quantifies how much it costs when valuable work items are delayed on their way to customers. The chart below graphically depicts this cost.
Surprisingly, few companies have any idea of their Cost of Delay. Reinertsen [9] states that 85% of companies don’t bother calculating theirs.
Calculating the Cost of Delay is fraught with uncertainty, given that we must estimate not only when our software will be completed but also how much it will earn when deployed. Arguably, these are impossible to estimate with confidence, especially completion dates. Still, even an order-of-magnitude guess typically shows significant economic losses incurred by delaying the release of valuable software. After all, if the software weren’t valuable, we wouldn’t have built it (that isn’t always true, but it’s a subject worthy of an entire book). So why should we wait to deploy it?
The essential difficulty here is that there’s no line item on financial statements for “foregone earnings while waiting to deploy software because we’re too busy being utilized.” As a result, it becomes another “out of sight, out of mind” problem. But make no mistake, these are actual costs. Additionally, not only do we forego earlier earnings, but we also miss out on the compound interest on those earnings. So, we penalize ourselves twice.
Unfortunately, this double-penalty situation may be worse than described. It’s predicated on the belief that the delayed software reaches the same earnings peak as software deployed earlier. It’s just shifted to a later time, as the chart above shows. But that may understate financial impacts because the delay might cause a lowered revenue peak. In this case, we forego even more earnings. The chart below shows how much worse the situation could be.
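A rough, order-of-magnitude estimate of these foregone earnings can be sketched in a few lines. Everything here is an assumption made for illustration: the revenue shape (a linear ramp to a peak that then holds), the fixed date at which the market window closes, the default values, and the function names. It ignores the compounding effect mentioned earlier.

```python
def cumulative_revenue(launch_month, peak_per_month,
                       ramp_months=6, market_end_month=36):
    """Total revenue from launch until the (assumed) market window closes.

    Revenue ramps linearly to its peak over ramp_months, then holds there.
    """
    total = 0.0
    for month in range(launch_month, market_end_month):
        months_live = month - launch_month + 1
        total += peak_per_month * min(1.0, months_live / ramp_months)
    return total


def cost_of_delay(delay_months, peak_per_month, delayed_peak_per_month=None):
    """Earnings foregone by launching late, optionally with a lowered peak."""
    if delayed_peak_per_month is None:
        delayed_peak_per_month = peak_per_month  # same peak, only shifted later
    on_time = cumulative_revenue(0, peak_per_month)
    late = cumulative_revenue(delay_months, delayed_peak_per_month)
    return on_time - late


# Hypothetical product: 100,000 per month at peak, delayed by three months.
print(f"same peak:    {cost_of_delay(3, 100_000):,.0f}")
print(f"lowered peak: {cost_of_delay(3, 100_000, 80_000):,.0f}")
```

With these made-up numbers, a three-month delay forgoes about 300,000 when the peak is unchanged and about 910,000 when the delay also lowers the peak to 80,000 per month. The exact figures matter less than the fact that even a crude model makes the loss visible.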
The Capability Trap
There’s a cruel, self-reinforcing loop to utilization. By its nature, this loop is nearly impossible to see when caught in it, making it unlikely that we’ll pull ourselves out. It works like this: When we’re 100% utilized, we spend our time just trying to keep the system running. We don’t have time to step back, find the source of frequent problems, and make improvements to alleviate them. In effect, we’re so busy firefighting we don’t have time to clear the undergrowth causing all the firefighting. In software, this typically means we’re too busy for time-saving activities like refactoring messy code, writing unit tests that make it safer to change code, or automating our build and deployment process.
This phenomenon is so common across human systems that there’s a name for it. It’s called the Capability Trap [12]. In workplaces where staff are under the utilization mantra, there’s rarely time to improve our capability. The result is that it becomes ever harder to do our jobs, and we are forced to work overtime to keep up with demand. This overtime only makes the problem worse because exhausted workers are prone to error, and fixing the errors means there’s even less time to make things better. It’s a frustratingly self-reinforcing pattern.
Making things worse, the first rule of the Capability Trap is that when we’re caught in it, we seldom know that we’re caught in it [12]. This diabolical blindness to our predicament means we spend our time working harder and harder with no apparent gain or realization that we need to extricate ourselves.
No competent, modern manufacturer runs their factory at 100% utilization. They know that’s a recipe for extended downtimes and excessive maintenance, leading to the Capability Trap. It’s a lesson we could learn in software environments.
The Myth of Divide and Conquer
There’s a common, if not universal, belief in software engineering that the best way to tackle our work is to divide our teams, assign each member a task, have everyone work in parallel, and then integrate everyone’s work. The assumption is that we’re getting multiple things done simultaneously and being more productive. It’s the strategy of divide and conquer.
From where is this approach derived? Possibly from Taylorism, where Taylor wrote that “…the work can be done better and more economically by a subdivision of the labor…” [3]. Once again, the ghost of Frederick Taylor haunts our workplaces.
Utilization is the underlying rationale for this approach. If the goal is to keep everyone busy, it makes sense to separate everybody and merge their work to produce a finished product. But there are profound problems with this methodology. Here’s a partial list of conditions that must be true for divide and conquer to succeed as envisioned.
- No one ever needs to coordinate with anyone else when doing their work.
- No one’s work depends on anyone else’s.
- Integrating all the individual pieces requires little, if any, effort.
This situation is seldom the case. The reality is that we typically aren’t getting multiple things done simultaneously. We’re getting multiple things worked on simultaneously whose complex interdependencies result in queued inventory waiting for other work items to finish. We then create even more inventory when blocked workers are told to “find something else to do.”
Also, there are costly and time-consuming reviews, testing, and rework during the integration phase, usually multiple rounds. Such late testing means problems are discovered late in the cycle when they’re more expensive to fix.
Additionally, separating everyone requires frequent meetings and emails to keep everyone aligned and report on their status, which means profitable code isn’t written.
Finally, because everybody works independently and lacks visibility into everyone else’s solutions, the same problems are solved in slightly different ways, resulting in duplicated code. This duplication is rarely discovered. Also, bad code sits in the codebase because it isn’t reviewed in real-time. It’s refactored only when it’s later reviewed (assuming there’s time, and often there isn’t), which is wasteful and requires follow-up reviews, which add to the waste.
The upshot is that it’s never as simple as doing multiple things simultaneously that seamlessly work together when everyone is finished. Arguably, the divide and conquer strategy creates more problems than it solves, but utilization makes that hard to see.
The Alternative to Utilization
If utilization is so problematic, what should we do instead? In a single sentence, this:
Focus on the flow of the work instead of keeping everyone fully utilized.
That is, everything should be subordinated to keeping the work moving toward the customer at all times and maximizing throughput. After all, that’s how we’ll make money off our work, when customers use it. We also get fast feedback, allowing us to rework items as needed to serve market demand better. It seems like a simple enough concept, but it leads to some exceptionally disquieting practices. For example, check your blood pressure when you read the following sentence. People should wait on work instead of work waiting on people.
Most of us will instinctively respond with thoughts like, “That’s ridiculous! People who aren’t always busy? Talk about wasting money! Why should we pay people to sit around and do nothing?” This idea is so deeply antithetical to our workplace sensibilities (“Tradition!”) that Eli Goldratt [4] wrote:
“It offends our work ethics to see a worker standing idle, so workers are either activated or laid off.”
Eli Goldratt
Goldratt isn’t the only one who noticed the visceral response to under-utilized workers. Donald Reinertsen [9] saw it as well but from a different angle.
“In product development, our greatest waste is not unproductive engineers, but work products sitting idle in queues.”
Donald Reinertsen
Let’s address the objection, “Why should we pay people to sit around and do nothing?” The obvious reply is that they aren’t sitting around doing nothing. They’re engaged with their teammates in delivering profitable software as rapidly as possible, and they’re always ready to pounce on the next work item to keep it moving toward the customer. Most importantly, they’re not adding queued inventory by “finding something else to do.” They’re instead adding throughput.
A less obvious reply, but one leading to new ways of thinking, is to answer the objection with the inverse question:
“Why should we pay to have expensive work items sitting around doing nothing when they could be earning money?”
Seen from this perspective, we might realize that idle work items are a significant and expensive problem that should be addressed. We may also wonder why we’ve never before concerned ourselves with it. The unmistakable answer is utilization, which hides the problem from us.
Combine and Conquer
There’s a readily available alternative to the problematic divide-and-conquer strategy. Rather than dividing our forces, we concentrate them and bring their collective power to bear on our work. We group the people into teams who physically work together, all addressing the same task, at the same time, on the same computer. This strategy is the Software Teaming (Mob Programming) approach [8].
When we combine our forces, not everyone will be 100% needed at every moment, but when needed, there’s no delay for someone engaging with the team. The focus is on the flow of the work, keeping it moving toward the customer and profit instead of the maximum utilization of individuals. While it may be unnerving to realize that some team members will have moments of idle time, the benefits we obtain can help us overcome our unease. These benefits include:
- The number and length of queues are reduced.
- Less inventory is created.
- With less waiting, code is more quickly delivered to customers, increasing the throughput of revenue-producing products.
- Quicker delivery provides an earlier return on investment. In the finance world, sooner is better than later.
- Quicker delivery allows rapid market feedback, which provides powerful economic leverage, paving the path to greater profit. Few organizations take advantage of this feedback, but those who do gain a competitive advantage [9].
- Code is constantly reviewed and refactored in real-time, reducing duplicate code, improving designs, and minimizing expensive defects.
- Technical knowledge is disseminated across the team instead of held within individuals. There’s less risk of a single point of failure when someone crucial quits. And in utilization-based companies, there’s usually someone crucial whom managers fear will quit.
- Fewer meetings, emails, documentation, and other such items are necessary to coordinate everyone’s work, thus allowing more time to construct profitable software.
Leaving Room for Improvement
When changing our focus from utilization to throughput, we purposefully build slack into our systems. This slack allows us to absorb work surges and gives workers time to learn, improve their skills, discover better ways of working, and invent new features and products. This slack also helps avoid falling into the Capability Trap. Leaving room for improvement allows us to steadily improve our systems, saving us from the ills that utilization produces.
We can also consider our team structures from the customers’ perspective. Do customers want maximum utilization or maximum team engagement? It’s a safe bet that they prefer the latter because they get better quality products delivered more quickly. In short, they receive value, something not provided by utilization. If it’s true that “the customer is always right,” why not give them what they want? Everyone benefits in such a scenario.
Closing Thoughts
It’s easy to see why we focus on utilization in our industry. It has a long history, and it’s an obvious, easily grasped metric. That makes it appealing to incentive structures that reward a simplistic view of our work instead of a more complex systems view. Once we’re in the grip of utilization, it’s tough to see deeper into the problems it causes, making it likelier to endure.
The goal of a business is to make money. Accordingly, we should measure things that indicate if we’re achieving our goal or straying from it [4]. Utilization, alas, doesn’t do either of these. It simply measures how busy we are and ignores whether that helps us make money or is detrimental to our goal.
What if we allow ourselves to be cognitively curious and step outside the mental ruts tightly constraining us? What if we choose to question how we’ve always done it? We might discover that we become more effective at our jobs.
Utilization’s prevalence won’t be easy to change. It’s been with us for over a century, and it’s naive to think we’ll banish it anytime soon. But, in a larger sense, we don’t necessarily need to decide about changing our utilization beliefs. Instead, we only need to decide what it means if our competitors change theirs. Is that to our advantage or theirs? Reflective and intellectually curious organizations will almost certainly be uneasy to discover that competitors have changed their beliefs. In contrast, innovative organizations may make the change first, disrupting the market in their favor and leaving their competitors scrambling to catch up.
We must each decide when our belief in utilization serves us well and when it doesn’t. Only then will we reduce the harmfulness of utilization.
References
[1] Dijkstra, Edsger, 1968, “Go To Statement Considered Harmful,” in Communications of the ACM
https://dl.acm.org/doi/10.1145/362929.362947
[2] Logically Fallacious, Appeal to Extremes
https://www.logicallyfallacious.com/logicalfallacies/Appeal-to-Extremes
[3] Taylor, Frederick Winslow, 1911, “The Principles of Scientific Management”
https://archive.org/details/principlesofscie00taylrich/page/n5/mode/2up
[4] Goldratt, Eliyahu, Cost Accounting: The Number One Enemy of Productivity, in APICS — The Educational Society for Resource Management, 1983
[5] Martin, James, Comparing Traditional Costing, ABC, JIT, and TOC, Management and Accounting Web
https://maaw.info/TradABCJITTOC.htm
[6] Anderson, David, Agile Management for Software Engineering: Applying the Theory of Constraints for Business Results
https://www.informit.com/articles/article.aspx?p=169495&seqNum=12#
[7] Jassem, Suaad, Alternative to Distortions Created by Traditional Cost Accounting Throughput Accounting, in Academy of Strategic Management Journal, Research Article: 2021 Vol: 20 Issue: 2
[8] Zuill, W. and Meadows, K., 2022, “Software Teaming, A Mob Programming, Whole-Team Approach. Second Edition”
[9] Reinertsen, Donald G., “The Principles of Product Development Flow: Second Generation Lean Product Development.” Celeritas Publishing, 2009.
[10] Lean Manufacturing Tools, Waste of Inventory; causes, symptoms, examples, solutions
https://leanmanufacturingtools.org/106/waste-of-inventory-causes-symptoms-examples-solutions/
[11] von Scheel, Henrik & Clausager, Mads. Maersk Line — Accelerators of innovation and transformation, 2012
[12] Landry, E. and Sterman, J. “The Capability Trap: Prevalence in Human Systems.” In 35th International Conference of the System Dynamics Society (pp. 963–1010), July 2017.