My hope is that you walk away with something useful for your situation: a better understanding of Technical Debt for yourself; a specific tool you can use to better your situation; a key insight you can share with your team or your management to create a sense of urgency to address your technical debt. How can we become masters of it? First step: awareness of the situation.
How are we going to make this time worth your while today? I thought we’d address three key questions. First, we need to be clear about what “Technical Debt” is. There are a few parts: the debt itself, the interest payments, and the underlying cause. Second, if it is such a bad thing, we need to find out how we allow it to happen. Let’s see it happening as it is happening, rather than discover it after the fact. And third, knowing all this, what can we do?
This is J. Wellington Wimpy (http://en.wikipedia.org/wiki/J._Wellington_Wimpy) from the Popeye cartoons. His best-known line is, “I’ll gladly pay you Tuesday for a hamburger today!” He’s illustrating the essential trade-off: take out a loan on our production capacity to be apparently more productive. When we do this prudently, there’s a significant opportunity on the upside. But unfortunately, more often than not, it’s just hamburgers. It’s not worth the trade-off. It’s a juicy Big Mac, but oh the calories, oh the fat. And if we keep grubbin’ down on fast food, we might end up lookin’ like this guy. Wimpy’s imprudence is his ignorance and impatience: rushing in without considering the consequences. To put it in the simplest terms: haste makes waste.
There are two parts. Technical Debt is the set of defects that sit there, latent, like a minefield. Interest payments are the “pain” we experience: the time wasted when we step on one. There’s a cause-effect relationship here: the presence of these defects is a primary cause of the time wasted. Interest payments are symptoms; the debt they arise from is an underlying cause. So, if we can prevent those defects from occurring in the first place, then we gain back the time that would have been wasted.
Technical Debt is not a root problem. Technical Debt is itself a symptom of a deeper cause. If we have any hope of being the master of this situation, we need to understand the true root cause. To get a clue, we’ll start with frogs…
Frogs are amphibians. They are cold-blooded which means that they draw their heat from the environment they are in. In short, they adjust to the conditions around them. More generally, this phenomenon is known as the mere exposure principle. The idea is this: a net negative experience, if tolerable in the beginning, will be accepted as the norm merely by being exposed to the condition long enough. In other words, this situation keeps happening to the point where it starts to feel normal. Turns out this behavior is not limited to amphibians. Humans also adjust to their environment. On software projects, the “heat” we experience is the level of urgency. This is a real boon: both the frog and the software team gain from this mere exposure. The frog gets the warmth it needs, the team gets the motivation they need. However, there’s a threshold: keep turning up the heat on the frog or keep ratcheting up the pace on the software team … we’re no longer being warmed…
… we’re slowly being cooked. If we don’t focus, we will miss a key opportunity (in some cases, it’s survival). Have you ever experienced the focusing effect of a deadline? That’s lighting a fire under us… But if we don’t keep the sense of urgency in balance with the team’s ability to absorb that rate of change, it overwhelms us. And we start to compromise our discipline under the banner of “Get it Done.” At that moment, we’ve tipped from “high performing” into “reckless”. What was a constructive force for delivering…
… becomes a destructive force, eating away at our capacity to deliver tomorrow. Is this really where we want to end up? (Think of all the resources being wasted here.)
That overwhelming pressure is really the spark of a vicious loop. There are a lot of elements here, but the key decision was to forgo (or not even take on) quality control. Either we’re not aware of how important quality control is, or we forget in the heat of the moment.
There’s actually a lot in play here, so we’ll focus just on the predominant elements. We’re going to build up two equations to help us get a sense for the “fundamental physics”. Capacity = “our potential for progress”. And in a perfect world, we achieve all of what we are able, so progress = capacity. Imagine the ideal team: one that does nothing but write perfect production code. Scrum team; 2-week sprints; 140 pt / sprint. It doesn’t take long, however, to notice that we’re not just writing code… every so often, we’re running and playing around with it… That reduces our overall progress because in those moments, we’re not writing production code… And if we’re checking to make sure that it’s working… every so often, it isn’t. So we have to stop and spend time… Let’s get a little more precise about what we mean by “progress”… When you consider how this plays out in the real world, you’ll notice we have direct control over … … but we can only influence whether we have a defect or not… So, to make this model meaningful, we have to answer the question: how likely is it? To determine the probability that we’ll encounter a defect, we have to consider the primary actors. The actual source of defects is the humans. Every time we make a change, we risk doing it wrong… And the chance that this change will result in a defect is proportional to how complex the existing system is… But to the extent that we install checks… feedback loops that tell us when this little assumption over here is now invalid… that goes a long way to mitigating our injecting a defect. I don’t know about you, but I’m a visual person, so here’s a graph to illustrate these two equations.
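The slides that carry the two equations are not reproduced in these notes, so what follows is only a minimal sketch of how they might be written down. The functional forms, the names `checking`, `rework` and `quality_control`, and every number other than the 140 points per sprint are assumptions made for illustration, not the formulas from the talk.

```python
def defect_probability(complexity: float, quality_control: float) -> float:
    """Second equation (assumed form): chance that a change injects a defect.

    complexity      -- how complex the existing system is, normalised to 0..1
    quality_control -- assumed fraction (0..1) of bad changes that our checks
                       and feedback loops catch before they become defects
    """
    bounded = min(1.0, max(0.0, complexity))
    return bounded * (1.0 - quality_control)


def progress(capacity: float, checking: float, p_defect: float, rework: float) -> float:
    """First equation (assumed form): progress actually made in one sprint.

    capacity -- our potential for progress, in points per sprint
    checking -- fraction of the sprint spent running and checking the code
    p_defect -- probability of a defect, from defect_probability()
    rework   -- assumed fraction of the sprint lost if every change were defective
    """
    return capacity * max(0.0, 1.0 - checking - p_defect * rework)


# The ideal team: no checking, no defects, so progress equals capacity.
ideal = progress(capacity=140.0, checking=0.0, p_defect=0.0, rework=0.8)

# A more realistic sprint on a moderately complex system with no quality control.
p = defect_probability(complexity=0.5, quality_control=0.0)
realistic = progress(capacity=140.0, checking=0.1, p_defect=p, rework=0.8)

print(f"ideal: {ideal:.1f} pt, realistic: {realistic:.1f} pt")
```

The point of the sketch is the shape, not the numbers: we set `checking` and `quality_control` directly, but `p_defect` is something we can only influence.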
What you’re looking at are two graphs superimposed. The area graph is progress being made: that was our first equation. The line graph is the probability of defects emerging: that is our second equation. First, we consider the case where few quality controls are in place: we’re under the gun and the emphasis is on gettin’ ’er done, not “wasting” time writing unit tests or reworking the shape of the code; these things don’t produce features. What is this telling us? Notice that, at the beginning of the project, there’s no software yet. How complex is that? So the chance of running into a defect is very low, and focusing all our efforts on just crankin’ out code is paying off: lots of productivity in the beginning. Smart move, right? But as time marches on, we start to see the effect of a lack of quality control. Do software systems get more or less complex as we build them up? As complexity increases, with each change, what’s happening to the chance of injecting a defect? There’s an inflection point that happens, somewhat quietly, in the middle of the project, where the interest payments on the technical debt start to shoot up. And what happens in real projects is not that we suffer the full set of defects, but that we simply don’t implement certain key features, and we spend an inordinate amount of time tediously putting in the simplest changes.
Now let’s see what the equations show us when we do put in a concerted effort to maintain quality. In this case, our team collaborates on the work: testers are identifying vague requirements, developers are writing unit tests (first, where appropriate), and we’ve completely automated our build and deploy process. One key difference here is that the defect probability curve (the red line in the previous graph; blue here) is dampened: we are able to keep it in check. This is what I meant when I mentioned keeping the sense of urgency in balance with the team’s ability to absorb that rate of change. Teams absorb change by keeping the discipline of quality control, which continually reaffirms their confidence that the software is true. We see the effect of that in a dampened probability of defects.
When we put the two situations together, we can now see how the two philosophies play out. In the beginning, the team emphasizing progress over quality control (Grey and Red) gets a jump start and quickly pulls out ahead. And if this were the whole story, then that would be the prudent move. At this point, we’re not fighting significant defects, so the Blue and Green team’s emphasis on quality management really appears to be just waste. But as the story continues, depending on the circumstances, somewhere between 6 and 18 months in, we start to see the multiplying force of unchecked complexity. Our Grey and Red team has started to experience some really crazy-making problems: sessions hang attempting to access the shared cache, or every once in a while a user is spontaneously logged out, killing conversion. The Blue and Green team is far less likely to run into such problems. They’ve caught many of the little one-off kinds of errors that might snowball into sessions spontaneously timing out, or missteps in logic that would have created a race condition. And because they are not spending time tracking these issues down, they are starting to overtake their competitors in terms of new innovations to their product. Three years into the product development, our Grey and Red team is limping along, weighed down by the mass of technical debt they are carrying, spending loads of time just keeping the lights on. Now this company is faced with having to rewrite the whole platform… Meanwhile, the Blue and Green team is well poised to tackle that 4.0 release. And it isn’t going to be a huge affair: it’s business as usual, chugging along. If we were to sum the area under the curves, we’d see that in the end, the Blue and Green team is more likely to be productive and has far more capacity to continue to be so.
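To get a feel for how the two curves can cross, here is a rough simulation sketch. It is not the model behind the graphs in the talk: the update rule (complexity growing with every point delivered), the parameter values for each team, and the helper names `simulate` and `first_overtake` are all assumptions chosen for illustration. Only the 2-week sprint, the 140 points of capacity and the three-year horizon come from the story above.

```python
from typing import List, Optional


def simulate(quality_control: float, checking: float, sprints: int = 78,
             capacity: float = 140.0, rework: float = 0.9,
             complexity_per_point: float = 0.0002) -> List[float]:
    """Return cumulative progress after each 2-week sprint (78 sprints = 3 years).

    Assumed model: every delivered point adds a little complexity, the defect
    probability is proportional to complexity (capped at 1) and is reduced by
    quality_control, and time spent checking is taken off the top each sprint.
    """
    complexity = 0.0
    total = 0.0
    history = []
    for _ in range(sprints):
        p_defect = min(1.0, complexity) * (1.0 - quality_control)
        done = capacity * max(0.0, 1.0 - checking - p_defect * rework)
        total += done
        complexity += done * complexity_per_point
        history.append(total)
    return history


def first_overtake(leader: List[float], chaser: List[float]) -> Optional[int]:
    """Return the first sprint (1-based) in which chaser is ahead of leader."""
    for sprint, (ahead, behind) in enumerate(zip(leader, chaser), start=1):
        if behind > ahead:
            return sprint
    return None


# Grey and Red: almost all effort on features, very little quality control.
grey = simulate(quality_control=0.1, checking=0.05)
# Blue and Green: a quarter of each sprint invested in quality control.
blue = simulate(quality_control=0.8, checking=0.25)

crossover = first_overtake(grey, blue)
print(f"Blue and Green pull ahead in sprint {crossover}")
print(f"After 3 years: grey {grey[-1]:.0f} pt, blue {blue[-1]:.0f} pt")
```

With these made-up parameters the Grey and Red team leads early, and the Blue and Green team overtakes it a little over a year in. Changing the assumed values moves the crossover, which is consistent with the "depending on the circumstances" caveat above.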
So, how do we realize the reality of the Blue and Green team? What do we do with this sharper awareness of how Technical Debt happens? To be the master means to shed the role of the victim (like our Grey and Red team). The first order of mastering technical debt is realizing that maintaining our discipline of quality control and keeping the lid on complexity are essential. That’s all well and good when we’re starting out. But how do we keep our eye on it? Or even more relevant, if we’re clearly steeped in Grey and Red, what can we do? The short answer is: find and mitigate the true root cause. It’s a three-step process:
It really depends on where you are in the organization. If it’s something that’s in your control, it’s about going to that key decision and making the other choice. For example, instead of favoring progress regardless, invest in build automation and in learning how to write better unit, functional, and end-to-end tests. If it’s something that you can influence but don’t have total control over (meaning that you need the collaboration of at least one other person), find a way in which this root choice affects both of you… and in that way, you connect the effort to fix the root cause to a common goal. If it’s something that you have no control over at all, then your path is to make the impact of the symptoms visible, in terms that the decision maker values, and to show how mitigating the root cause will make a difference. … and of course, no self-respecting fortune-making process would be complete without a step #3.
… So, let’s take an example situation.
Let’s say we’re developing a timekeeping system called TPS-9000. One of the essential features in TPS-9000 is attributing time to job codes. So, we’ve just completed the third Sprint and we’re showing off this shiny new feature to our users. But as we show them the time-entry form, there’s a hitch: they need to be able to add multiple job codes to a single line item, and the way we wrote it, each line item has exactly one job code. Our users are trying to be understanding; they look for ways to do proper time entry with just one job code per line, but they can’t. In fact, the meeting gets really touchy when Sal, the business analyst (who’s known for calling a spade a spade), blurts out: “How could you even think that this was the way to do it? This is incredibly obvious!” It turns out this is a significant change, and the team goes back to redo the line-item management logic. What happened here? During the internal retrospective, the team talks it out. … So, the interest payment is the time wasted reworking the feature. The technical debt, in this case, is the vague requirement (here, not stated at all). And what caused that debt was the fact that the team has been working in silos.
… and of course, we should never forget we need to …
All kidding aside, “Collapsing the Org” is just one of a collection of really powerful moves you can make to mitigate Technical Debt….
What is the source of technical debt? … People. A colleague, Jonathon Golden, wrote an article for the Cutter IT Journal about the source of technical debt. As we saw, collapsing the organization is about breaking down the social and process barriers to face-to-face collaboration. We’ve also seen a hint of putting “Quality First”. This is really about, as a team, adopting the mindset that quality is an essential part of our product. There’s obviously a balance that needs to be struck here, lest “perfection become the enemy of good.” And that’s part of what Active Product Ownership is about. Jonathon calls out the fact that having a strong central vision and ongoing hands-on guidance throughout the process helps keep the whole team aligned. And a by-product of alignment is the early discovery and correction of misunderstandings. Jonathon didn’t stop there. He also points out that if we put all this effort into reorganizing ourselves into cross-functional teams (those where we work on stories together), wouldn’t we want to make sure that the people we bring into the fold have the same mindset? It’s surprising how often we don’t think to make this adjustment, especially in larger organizations. “Encourage Communication” is Jonathon’s quip about shaping the environment: for example, if the team is all in offices, create a more open space for them to work together… to allow for more accidental conversations: the really valuable incidental chats. And while this list isn’t exhaustive, another key element comes in the form of command-and-control impositions. The classic example is the technical mandate from the Architect: thou shalt use message queues for all inter-system communication. In some cases, it just may not be appropriate. But if the Architect isn’t a member of the team (think “Collapse the Org”), then she’s not there to contextualize her otherwise sound advice.
As the craftsmen on the scene, we developers really need to take responsibility for knowing how to dial up the quality control to an appropriate level. There’s nothing wrong with NOT writing unit tests for a chunk of code if it doesn’t warrant it. But the difference is in knowing that’s the trade-off and consciously making the choice (as opposed to not knowing in the first place). To this end, we should expect to have some level of mastery of the craft of software development. I have two suggestions to get to the next level.
But this is just one example of a number of skills we should be honing… There are four texts that, if you haven’t studied them, I recommend you rush out and pick up. First, I’ve really enjoyed Chris Sterling’s “Managing Software Debt.” If today’s presentation resonated with you, you’ll enjoy his expert elaboration on these ideas in his superb treatment of not just what software debt is, but detailed practices that the team can use to avoid it. As we’ve seen, an essential element is keeping the lid on complexity. The nuts-and-bolts way to do that is Refactoring: the art and science of tweaking the shape of your code in increments toward a better design. Martin Fowler, long ago, wrote the seminal treatment of this topic and it’s absolutely relevant today. When you get the book, jump to Chapter 3 and see how many code smells you recognize in your own code base. If Refactoring is making small changes in the design of your code, then having a good grasp of the patterns that you will refactor towards is the next natural step. While the “Gang of Four” book is a terrific academic treatment of the topic, I highly recommend the more accessible “Head First Design Patterns”. If you don’t know your destination, how do you know you’re going in the right direction? Knowing design patterns gives your refactoring a destination. Finally, for me, “The Pragmatic Programmer” was a game changer. It raised the bar for me on what specific skills I should develop and what mindsets amount to true professionalism as a programmer. If you buy no other book, this is the one.
1. Draw a model of your architecture or major application components on a whiteboard.
2. Write down areas of software debt within and across the model on post-its and stick them close to their respective area.
3. Add the category of software debt that each represents to the post-it (technical, quality, configuration management, design, or platform experience). If any item seems to cross types of software debt, try breaking it down into smaller parts.
4. Identify the potential value to the software’s users, the business, or the team’s capability to deliver faster or more confidently once it has been addressed sufficiently. You may also find opportunities here to break items down into smaller parts, with a smaller win having greater impact.
5. Vote on which of the areas of software debt are most menacing and most impact near-term development.
6. Prioritize the areas of software debt in stack-rank order so that they can be discussed with the customer or Scrum Product Owner (if they are not already in the room, which I do recommend).
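If you want to carry the outcome of this exercise off the whiteboard, it can be captured in a few lines of Python. This is a hypothetical sketch: the `DebtItem` fields, the `stack_rank` helper and the example post-its are invented for illustration, and only the five category names come from the exercise itself.

```python
from dataclasses import dataclass
from typing import List

# The five categories of software debt named in the exercise.
CATEGORIES = {
    "technical",
    "quality",
    "configuration management",
    "design",
    "platform experience",
}


@dataclass
class DebtItem:
    """One post-it: an area of software debt placed on the architecture model."""
    area: str          # component or boundary the post-it is stuck next to
    description: str   # what the debt is
    category: str      # one of CATEGORIES
    value: str         # potential value once it has been addressed
    votes: int = 0     # how many team members voted it most menacing

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def stack_rank(items: List[DebtItem]) -> List[DebtItem]:
    """Order items by votes, highest first; ties keep their original order."""
    return sorted(items, key=lambda item: item.votes, reverse=True)


# Hypothetical post-its from one session.
items = [
    DebtItem("time entry", "line item holds only one job code", "design",
             "users can enter time correctly", votes=2),
    DebtItem("build", "manual deploy steps", "configuration management",
             "faster, more confident releases", votes=5),
    DebtItem("session cache", "no automated regression tests", "quality",
             "fewer spontaneous logouts", votes=5),
]

ranked = stack_rank(items)
for position, item in enumerate(ranked, start=1):
    print(position, item.area, item.category, item.votes)
```

The ranked list is what goes in front of the customer or Product Owner; the `value` field is there so the conversation can stay in terms they care about.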
What is Technical Debt? “Technical Debt” is the set of structural defects in our system; defects in requirements, software, and manual processes are examples. “Interest Payments” are the symptoms we experience, caused by Technical Debt (which is in turn caused by choices that we make). How does it happen? Through a vicious cycle of degrading quality, fueled by applying more pressure to the team than they can absorb. If we are building a software product that is intended to last longer than a year and we don’t employ rigorous quality control, we run the serious risk of “getting cooked”. What can we do? Investing in Quality Control and managing complexity are the keys to ensuring teams can maintain productivity over time. If we’re suffering from Technical Debt, the “trick” is to dig to the root cause of it and mitigate that; the downstream problems will fall away. Product Owners: understand and apply the Six Golden Transformation Patterns. Developers: invest in your professional portfolio and get better at your craft. Scrum Masters: make Technical Debt manageable by guiding the team to harvest “Quality Improvement Stories”.