Idempotence and the Discipline of DevOps
If you’ve spent a bit of time coding, you have probably come across the interesting and unusual idea of idempotence. Because the word lends itself to neither easy spelling nor easy pronunciation, many people, like me, become immediately curious.
A concept found deep at the heart of mathematics and computer science, idempotence is a highly valued attribute of operations that relates to efficiency and effectiveness.
An idempotent operation is a message sent to a receiver that can be performed again and again with the same result as performing it once, without negatively impacting the receiving system. Put more simply, an operation is idempotent if you can repeat it over and over without causing any unwanted side effects or harm to a system.
A non-idempotent operation, however, puts strain on a system with each repeated occurrence, requiring additional cycles and energy to cope with the work it creates. To take an absurd example, repeatedly asking a waiter in a restaurant for the status of your order forces them to stop what they are doing and tell you that status. Do this continuously and your order will be delayed indefinitely. Here, the system suffers for the sake of the operation.
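In code, the distinction is usually shown in terms of state: repeating an idempotent operation leaves the system exactly as a single call would, while repeating a non-idempotent one keeps changing it. The sketch below is illustrative only. It assumes a hypothetical order book kept in a plain dictionary, and the names `set_status`, `add_item`, and `apply_n_times` are invented for this example.

```python
import copy

def set_status(orders: dict, order_id: str, status: str) -> dict:
    """Idempotent: assigning the same status twice leaves the same state as assigning it once."""
    orders[order_id]["status"] = status
    return orders

def add_item(orders: dict, order_id: str, item: str) -> dict:
    """Not idempotent: every repeated call appends another copy of the item."""
    orders[order_id]["items"].append(item)
    return orders

def apply_n_times(fn, state: dict, n: int, *args) -> dict:
    """Apply fn to a deep copy of state n times, so the original state is never mutated."""
    state = copy.deepcopy(state)
    for _ in range(n):
        state = fn(state, *args)
    return state

# Hypothetical order book: one table with one item, still cooking.
initial = {"table-7": {"status": "cooking", "items": ["soup"]}}

# Setting a status once or three times produces the same state.
once = apply_n_times(set_status, initial, 1, "table-7", "served")
thrice = apply_n_times(set_status, initial, 3, "table-7", "served")
print(once == thrice)  # True

# Appending an item once or three times does not.
once = apply_n_times(add_item, initial, 1, "table-7", "bread")
thrice = apply_n_times(add_item, initial, 3, "table-7", "bread")
print(once == thrice)  # False
print(thrice["table-7"]["items"])  # ['soup', 'bread', 'bread', 'bread']
```

A retry of `set_status` is harmless, but a retry of `add_item` quietly creates extra work for the kitchen. That is the same kind of unwanted side effect the waiter example describes.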
How could such a uniquely important property of operations have escaped me for so long? And how does this rather obscure concept relate to my life and my work?
The Game
The answers to these questions came to me during a recent workshop. Thirty technology professionals from organizations across my city spent an afternoon participating in a business simulation.
We started the day by organizing into business units: Strategic Management, Business Management, Service Desk, Service Desk Management, IT Dept., and so on. Problems occurred in a simulation projected on a screen overhead, and each business unit had an action to perform to resolve them.
The business managers needed to determine what went wrong. The Service Desk then used that diagnosis to figure out which systems were affected. The Service Desk managers decided how to fix those systems, and finally the IT Dept. needed to fix the problem, which was simulated by solving a puzzle from a Mensa book.
The IT Dept. then presented the fix to the game proctor, who either resolved the problem or penalized us if it was incorrect. The longer the problem was outstanding, the more money the business lost.
There were many opportunities for things to go wrong. At first, timing was terrible, the wrong systems were getting fixed, the wrong problems were getting worked on, and the proctor was getting incorrect answers to the wrong puzzles: it was pure chaos. So after the initial 30-minute simulation, the group sat down to discuss what went wrong, what changes we could make, and how we could implement those changes effectively.
Over the course of three iterations, tightening things up each time, we implemented management systems and processes to control the chaos. Quickly we found ourselves at the helm of a simulated business that was actually earning money. Meanwhile, back in reality, we were learning tangible lessons by being forced to improvise on our ideas for operational improvements.
Improving operations
The main issues we experienced involved the pain of not knowing what other teams were doing without interrupting them, and the overhead involved in handing off work to other teams. The work itself was easy but moving the work between teams was very difficult.
We didn’t have a clear understanding of how to get the most value with the least amount of interruption. Incomplete handoffs would cut off work completely, but every handoff was causing work to slow down painfully.
In short, the flow of information through the system was broken, and the result was that the wrong work kept getting completed way past the allotted time.
With each successive iteration, however, we applied basic principles of Lean and systems thinking to balance the flow of work across the system. This meant optimizing the relationships between teams — implementing operations that would provide visibility and allow work to be handed off without creating strain on the system.
By the third iteration, we had invented systems that allowed work and information to flow across the system without interruption. We created signals for other teams that didn’t prevent them from focusing on their work in progress. We created methods to make sure teams were working on the right things. We established roles and responsibilities so that not everyone on a team had to be involved in every single step.
In other words, the system became optimized to allow cross-team operations to be performed idempotently; that is, without causing interruptions, rework, distraction, or confusion.
It is no wonder that idempotence is such a coveted goal of any system — mathematics, computer science, business, or other. The ability to work effectively across boundaries requires designing operations that can be performed reliably without causing unwanted side effects.
The connections across the internal boundaries of a system present the greatest opportunity for the success of an organization, as well as the greatest risk of failure. Get these right and you have a system that produces an order of magnitude more than the sum of its parts. Get them wrong, and the system will be unable to deliver even the value intended from each of its components; waste abounds, and eventually systemic failure occurs.
And so, this must always be our goal: to bridge the gaps between functions in our systems, to apply systems thinking, and to focus our efforts on improving the relationships. It is the fidelity of these relationships that sets the limits of possibility for a system, and focusing on optimizing them will give us the greatest opportunity for success.