Defects On Sale!
Aug 28th, 2007 by klimek
Today after our planning game I did a short poll on how the guys perceive test driven development and pair programming. We’ve been trying to do both for some time now, and since I take the blame for introducing both practices, I feel I’m somewhat – um – biased on that matter. A few days ago, I was caught totally off guard when Richard told me that, well, he doesn’t believe programming in pairs is more productive. Bummer. And I had believed my show to be a grand circus.
Coming down to earth from my alien space shuttle of imagined knowledge in the face of uncertainty, I realized that I didn’t really have a clue what my teammates thought about our recent process improvement tactics. I figured the easiest way to find out would be to ask them. So I did a short poll. There were six people. Including me. I asked four questions:
| Question | Yes! | No! |
| --- | --- | --- |
| Do you think TDD makes you more productive? | 3 | 3 |
| Do you think TDD leads to better quality? | 6 | 0 |
| Do you think pair programming makes you more productive? | 3 | 3 |
| Do you think pair programming leads to better quality? | 6 | 0 |
Now this is an interesting bite from the apple of knowledge: while we all seem to agree that pair programming and TDD increase code quality, half of the guys think that this rise in quality comes at a cost in overall productivity. Unfortunately, shooting them with my nerf gun didn’t help to teach them reason, so I concluded that the half I am in may be wrong. Perhaps.
But since I usually don’t give in that fast, I pondered this anomaly of perception during our two-year wedding anniversary dinner. While I munched down a deliciously flavorsome tenderloin, Anna proposed that maybe if you believe that TDD and pair programming don’t increase productivity, you don’t expect to make any errors. While the implication would be true, the poll’s data seems to suggest that all of the guys think that the practices improve quality – which implies that they expect to make errors.
So when we arrive at a point where we are self-conscious enough about our code to expect ourselves to err frequently, a simple question remains:
What Is The Relation Between Quality And Effort?
This is where a little math may help… Let’s define the overall effort of a feature as the effort it takes to produce a certain function in lines of code (how crude!) plus the effort to fix the expected errors. The oversimplified measure of programming tasks in lines of code is, of course, questionable to the degree of calling it excrement of horned mammals. On the other hand, it allows me to do a quick-and-dirty worst-case back-of-the-envelope calculation.
effort(feature) ->
codingEffort(linesOfCode(feature)) +
expectedFixingEffort(linesOfCode(feature))
Let’s further simplify (yuk) that the coding effort is defined as directly proportional to the lines of code of the feature:
codingEffort(numberOfLines) ->
codingEffortPerLine * numberOfLines
Excessive googling (and IEEEing) informs us that the defect rate is normally defined as defects per thousand lines of code. So without test driving my functions I’d expect the expected fixing effort to be something along the lines of:
expectedFixingEffort(numberOfLines) ->
fixingEffortPerDefect * (defectRate / 1000) * numberOfLines
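The three formulas above translate almost directly into code. The following is a minimal Python sketch: the function names mirror the pseudo-formulas, while the sample numbers (one unit of effort per line, 50 units per defect, 20 defects per 1000 lines) are assumptions picked purely for illustration, not measurements.

```python
def coding_effort(number_of_lines, coding_effort_per_line):
    """Coding effort, assumed directly proportional to the lines of code."""
    return coding_effort_per_line * number_of_lines


def expected_fixing_effort(number_of_lines, fixing_effort_per_defect, defect_rate):
    """Expected fixing effort; defect_rate is in defects per 1000 lines."""
    return fixing_effort_per_defect * (defect_rate / 1000) * number_of_lines


def effort(number_of_lines, coding_effort_per_line,
           fixing_effort_per_defect, defect_rate):
    """Overall effort of a feature: coding plus expected fixing."""
    return (coding_effort(number_of_lines, coding_effort_per_line)
            + expected_fixing_effort(number_of_lines,
                                     fixing_effort_per_defect, defect_rate))


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from any measurement).
    total = effort(number_of_lines=1000, coding_effort_per_line=1,
                   fixing_effort_per_defect=50, defect_rate=20)
    print(total)
```

With these made-up values, fixing the expected defects costs as much as writing the code in the first place, which is the kind of situation the rest of the argument is about.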
But where does this lead? Good question. My answer is even more assumptions: Perhaps we can agree that if we make errors (and we do, don’t we), introducing practices that increase quality allows us to trade additional coding effort (up-front effort) for reduced fixing effort. If you read carefully, perhaps you ask whether I may use effort and cost interchangeably… well, technically, no, but since I’m a software developer the Flying Spaghetti Monster may smile forgivingly upon my unworthy soul.
For example, when I do pair programming and my partner finds an error that I didn’t see, the effort of this lapse is about:
- “hey, shouldn’t that read ‘>=’ instead of ‘>’?”
- “oh, yeah, ‘course”
- *clickety-click*
– 3 seconds –
When such a defect is not found until the product is in the field, the effort of fixing the error is:
- Cost of the error for the customer (lost money, lost customers, being angry, beating up the pup)
- Reporting the error to the provider
- Checking the error logs and dealing with the customer
- Reporting the error to our hotline
- Checking the error at our site and finding out what the error really is
- Reporting the error to our development
- Prioritizing the error
- Trying to reproduce the error and find out what the customer really did
- Finding the error
- Fixing the error
- Building a new patch-release
- Testing the patch-release
- Getting the patch-release approved by the customer
- Updating the live units with a certain probability of update-death
- (More indirect cost due to loss of trust, etc)
– um, more than 3 seconds, definitely –
I think it is not presumptuous to claim that increasing quality may also increase overall productivity if the expected effort to fix an error is high enough relative to the expected decrease of errors due to better quality. The refined question is:
What does a worst case error effort scenario look like in the break-even point of quality against productivity?
Let’s assume we know a practice that multiplies our coding effort by a factor (additionalEffort > 1) and multiplies our defect rate by a different factor (defectRateImprovement in [0, 1), so 0.9 means that 10% of the defects are avoided). For the practice to be effort efficient, the overall effort without the practice must be greater than the overall effort when using the practice. Using the already defined formulas this yields:
(codingEffortPerLine * numberOfLines) +
(fixingEffortPerDefect * (defectRate / 1000) *
numberOfLines)
>
(additionalEffort * codingEffortPerLine * numberOfLines) +
(fixingEffortPerDefect *
(defectRate * defectRateImprovement / 1000) *
numberOfLines)
Tackling this inequality with a load of 7th-grade mathematics gives:
fixingEffortPerDefect * (defectRate / 1000) *
(1 - defectRateImprovement)
>
codingEffortPerLine * (additionalEffort - 1)
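For readers who want the intermediate steps spelled out, here is the same derivation in LaTeX. The single-letter symbols are shorthand introduced only for this sketch: n = numberOfLines, c = codingEffortPerLine, f = fixingEffortPerDefect, d = defectRate, a = additionalEffort, r = defectRateImprovement.

```latex
\begin{align*}
c\,n + f\,\frac{d}{1000}\,n &> a\,c\,n + f\,\frac{d\,r}{1000}\,n
  && \text{divide both sides by } n > 0 \\
c + f\,\frac{d}{1000} &> a\,c + f\,\frac{d\,r}{1000}
  && \text{move the } f \text{ terms left and the } c \text{ terms right} \\
f\,\frac{d}{1000}\,(1 - r) &> c\,(a - 1)
\end{align*}
```

Note that the number of lines drops out entirely, so under these assumptions the break-even point does not depend on the size of the feature.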
Should this innocent looking inequality be close enough to reality to make any sense, we could conclude that
- After you cut the defect rate in half, the same practice only pays off if a defect costs twice as much, because the break-even fixingEffortPerDefect is inversely proportional to defectRate. Which means that halving your defect rate gets more and more expensive relative to the opportunity cost of letting the defect go wild.
- If you know your current defect rate and your current price per defect, you can guess whether the defect reducing effort spent for a certain practice will be cost efficient. Of course a practice may and probably will have other impacts. But that’s a different bed-time story. Featuring a hungry gorilla and a beautiful princess.
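Both conclusions can be checked numerically. The sketch below solves the inequality for the break-even ratio fixingEffortPerDefect / codingEffortPerLine; the function name and the sample practice (50% more coding effort, half the defects remaining) are assumptions made up for this illustration.

```python
def break_even_ratio(defect_rate, additional_effort, defect_rate_improvement):
    """Smallest fixingEffortPerDefect / codingEffortPerLine at which a practice pays off.

    defect_rate: defects per 1000 lines without the practice.
    additional_effort: factor (> 1) by which the practice multiplies coding effort.
    defect_rate_improvement: factor in [0, 1) by which it multiplies the defect rate.
    """
    avoided_defects_per_line = (defect_rate / 1000) * (1 - defect_rate_improvement)
    return (additional_effort - 1) / avoided_defects_per_line


if __name__ == "__main__":
    # Hypothetical practice: 50% more coding effort, half of the defects remain.
    for rate in (40, 20, 10):
        ratio = break_even_ratio(rate, additional_effort=1.5,
                                 defect_rate_improvement=0.5)
        print(f"{rate} defects/KLOC -> break-even ratio {ratio:.0f}")
```

Each time the starting defect rate is halved, the break-even ratio doubles, which is the first bullet in numbers. The second bullet corresponds to comparing your own price per defect against the returned ratio.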
Now that we’ve got a nice inequality we can torment it with some values, fed to our greedy mouths by the power of the Flying Spaghetti Monster. Let’s assume that we have a defect rate of 20 defects per 1000 lines of code (which a Google search reveals to be considered somewhat “normal”). Let’s now assume that our practice increases coding effort by a factor of 2 (additionalEffort = 2, which is the worst case for pair programming, obviously). Let’s further assume that this will find one tenth of all errors directly when they’re implemented (defectRateImprovement = 0.9; fixing the errors in this phase is covered easily by the effort factor of 2). Watch and behold 3rd grade maths:
fixingEffortPerDefect * (20 / 1000) * (1 - 0.9) > codingEffortPerLine * (2 - 1)
… or …
fixingEffortPerDefect > codingEffortPerLine * 500
This means that for a defect rate of 20 errors per 1000 lines of code using a practice that doubles your coding effort and finds a tenth of the errors during coding will save you some bucks if the expected effort of fixing an error is more than 500 times the effort of writing a single line of code.
If you want even more numbers, let’s further assume that in C++ you need 60 lines of code per function point (now we get really braggy) and that you can somehow earn $200 per function point. A line of code is then worth about $3.33, which means that our practice lowers overall cost if the expected price per defect is greater than about $1,667 (500 × $200 / 60).
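The dollar figure can be reproduced with a few lines of Python. This is only a sketch of the post's own arithmetic: it takes the 60 lines and $200 per function point at face value and, like the text, treats the money earned per line as the coding cost per line. The function and constant names are made up for the example.

```python
LINES_PER_FUNCTION_POINT = 60      # the post's assumption for C++
DOLLARS_PER_FUNCTION_POINT = 200   # the post's assumed earnings per function point


def break_even_price_per_defect(defect_rate, additional_effort,
                                defect_rate_improvement):
    """Dollar cost per defect above which the practice lowers overall cost."""
    price_per_line = DOLLARS_PER_FUNCTION_POINT / LINES_PER_FUNCTION_POINT
    ratio = (additional_effort - 1) / (
        (defect_rate / 1000) * (1 - defect_rate_improvement))
    return ratio * price_per_line


if __name__ == "__main__":
    price = break_even_price_per_defect(defect_rate=20, additional_effort=2,
                                        defect_rate_improvement=0.9)
    print(f"break-even price per defect: ${price:,.0f}")
```

The result lands at roughly $1,667 per defect, comfortably below the rounded 2000 bucks used in the conclusion.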
It all boils down to this: If you work in an environment where the average price per defect found outside the holy halls of your development team is greater than 2000 bucks, introducing a technique that doubles the coding effort to prevent a tenth of the errors will reduce development cost and thus increase productivity. Well, if I really did a worst case analysis and didn’t mess up the seventh grade maths up there, that is.
Do you think a total expected cost of $2000 per defect is a lot? Does this apply to your work environment? Do you actually have any clue how much your favorite defect is today?

People new to agile methods often assume that practices like TDD and pairing reduce productivity. It may be helpful to ask what they mean by “productivity.”
Often, developers understand “productivity” to be a measure of the quantity of source code they can produce in a given time. If we take “productivity” to mean the amount of fully-tested, working code (that meets the acceptance criteria) they can produce in a given time, then the effects of TDD and pairing become more evident.
In a traditional approach to programming, a developer types as much code as possible as fast as possible, and then spends a certain amount of time testing (usually manually and informally) and debugging the code. It’s normal for the testing and debugging time to be much greater than the original coding time.
Using TDD and pairing, the initial testing (as well as the detailed technical design) occurs concurrently with coding. In accordance with the well-known rule of thumb that it is cheaper and easier to fix errors when they are found early in the process than when they are found late, the original code delivered in this way tends to have a lower defect density than code delivered in the traditional way.
There have been a few controlled studies of the effects of TDD and pairing. George Dinwiddie has kindly set up a wiki where people can post links to studies that can help us explain the value of agile methods. The wiki is here: http://biblio.gdinwiddie.com/biblio. You might find information there that will help your team understand the costs and benefits of TDD and pairing in an objective way.
One of the best-known studies of pairing was conducted at the University of Utah in 2000. Researchers found that the initial development time increased by about 15%, and the number of defects was reduced by 85%. A 1975 study by the US Army of so-called “two-person team” programming found a three-fold increase in “productivity” at no cost in development time.
Taking the 15% figure as a worst-case baseline, let’s compare what happens in traditional and agile development, and see which approach takes less time to deliver fully-tested, working code that meets acceptance criteria ( = “is more productive”).
For purposes of illustration, I will use the adjusted defect density rate found in this study http://www.idi.ntnu.no/grupper/su/publ/pdf/ericsson-qa-icse04-final.pdf for non-reused code, 0.631 per KLOC. I will also accept the result from the Utah study of pairing, that this practice adds 15% to the original coding time.
Traditional: Four developers work separately (no pairing) and write code first (no TDD). Each individual produces 10,000 lines of code in the amount of time scheduled for development. After that, each developer, still working separately, performs manual, informal testing of their own code. Next, the four developers integrate the modules they have written. This integration step adds perhaps 25% to the time spent in development.
Agile: Four developers use the agile techniques of promiscuous pairing (they switch with each other during development) and TDD, including disciplined test-first development with incremental refactoring. The four collectively produce 40,000 lines of code in the same fixed amount of time as the traditional team. Thus, the two teams produce the same raw quantity of original source code.
Notice that there is no need for after-the-fact integration of the code developed in the agile way. In effect, then, the traditional approach cost 25% more than the original coding time, and pairing cost only 15% more than solo coding.
Now the two teams are ready for formal QA testing. They submit their software to the company’s QA testing group for formal testing.
The QA group finds about 25 defects in the traditional team’s code. The development team is unaware of these bugs, and must fix each one as the QA group discovers and reports it. This may take 3x the original coding time.
Ignoring TDD for the moment and taking the 85% reduction in defect rate that results from pairing, the QA group finds 4 defects in the agile team’s code. The agile team took 15% longer to produce the original code, and required a small amount of time to fix 4 defects. In total this may have been 2x the original coding time of the traditional team. In a nutshell, counting after-the-fact integration time, the traditional team has taken more than 2x as long as the agile team to deliver the same solution.
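The two defect counts follow from the figures quoted above. As a quick sanity check, here is a small Python sketch that recomputes them; the constant and function names are made up for the illustration, and the inputs are simply the numbers cited in this comment.

```python
DEFECT_DENSITY_PER_KLOC = 0.631   # non-reused code, from the study linked above
LINES_OF_CODE = 40_000            # 4 developers x 10,000 lines each
PAIRING_DEFECT_REDUCTION = 0.85   # reduction attributed above to the Utah study


def expected_defects(lines_of_code, density_per_kloc, reduction=0.0):
    """Expected defect count, optionally reduced by a fraction in [0, 1]."""
    return lines_of_code / 1000 * density_per_kloc * (1 - reduction)


if __name__ == "__main__":
    traditional = expected_defects(LINES_OF_CODE, DEFECT_DENSITY_PER_KLOC)
    agile = expected_defects(LINES_OF_CODE, DEFECT_DENSITY_PER_KLOC,
                             PAIRING_DEFECT_REDUCTION)
    print(f"traditional: {traditional:.2f}, agile: {agile:.2f}")
```

Rounded to whole defects, this gives the 25 and 4 used in the comparison.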
Note that a combination of agile techniques used together may have eliminated all defects before the software was submitted to the QA group. A clear definition of “done” and frequent feedback from customers complement practices like TDD and pairing to improve quality still further. It is not unusual for agile teams to deliver code to QA that has zero defects.
All of this ignores the effect of technical debt on the cost of a project and on the cost of supporting the resulting product. Because of incremental refactoring, TDD prevents the accumulation of code debt. That means modifications and support are much easier and cheaper for a product that is developed in this way than for a product developed in the traditional way.
The question becomes, Which is more costly: 15% additional time for the original coding, or 2.25 X the time-to-market, plus increased support costs over the production lifetime of the product?
Which team has really been more “productive?”
Dave,
thank you for your interesting links. I believe that all you say is true.
I was trying to make a different point, though. I tried to extract the effect of one practice with regard to the defect rate and the expected cost of a defect.
I include the worst-case up-front additional cost for introducing a new practice in the coding effort factor. I use a worst-case defect-rate improvement to extract a rule-of-thumb number for the worst case cost of a defect for which the practice would be cost efficient /just by looking at its impact on the defect rate/.
If it is already cost efficient when I look at the defect rate, then I don’t need to look further. The effects I ignore can only /improve/ the practice’s pay-off; this is why I ignore them – because I believe that in many environments a small impact on the defect rate alone improves productivity.
The interesting point you make is the idea of time-to-market. One could argue that this cannot possibly be included in the “worst-case” part of my crude model. On the other hand, I don’t yet believe that you really “lose” time to market with a lower defect rate, since most probably the customer will find some of the errors and demand that you fix them before going live.
Cheers,
Manuel