Design Article
Lean coding
Jack Ganssle
12/1/2008 12:00 AM EST
What's the cheapest way to get rid of bugs? Don't put them in in the first place.
That seemingly trite statement is the idea behind the entire quality revolution that reinvented manufacturing during the 1970s. Design quality in rather than try to fix a lot of problems during production.
Most people under 40 have no memory of the quality problems U.S. automotive vendors inflicted on their customers for many years. I remember my folks buying cars in the 1960s. With five kids on a single-income engineer's salary, my dad's primary decision parameters (mom was never consulted on such a purchase) were size (a big station wagon) and price. Choices were mostly limited to the Big Three. With the exception of the even-then ubiquitous VW Beetle, foreign manufacturers had made few inroads into this market. But Detroit's offerings were always plagued with problems, from small nuisance issues to major drivetrain troubles. Consumers had no recourse since all of the vendors offered the same poor quality. Perhaps foreshadowing today's low expectations about commercial software, car buyers 40 years ago accepted that vehicles were full of problems, and that many trips to the dealer to get them cleared up were simply part of acquiring a new car.
About the same time, Japanese products had a well-deserved bad reputation. "Made in Japan" and "junk" were synonymous. But Japanese managers became students of quality guru W. Edwards Deming, who showed that a single-minded focus on cost at the expense of quality was suicidal. They eventually shifted production to a low-waste system with an unyielding focus on designing quality in. The result: better autos at a lower cost. Detroit couldn't compete. (Of course, many other factors contributed to the U.S. firms' woes in the 1970s. But cash-strapped American buyers found the lure of lower-cost, high-quality foreign cars irresistible.)
U.S. vendors scrambled to compete using, at first, marketing rather than substance. "Quality is Job One" became Ford's tagline in 1975. Buyers continued to flock to less self-aggrandizing manufacturers who spoke softly but carried few defects. By the very early 1980s, Ford was spewing red ink at an unprecedented rate. A division quality manager hired Deming to bring the Japanese miracle to Detroit. Eventually the quality movement percolated throughout the automotive industry, and today it might be hard to find much difference in fit and finish between manufacturers. Tellingly, Ford abandoned "Quality is Job One" as a mantra in 1998. The products demonstrated their success, and marketing sleight of hand was no longer needed to dodge an inconvenient truth.
Lean manufacturing perhaps got its name from a 1989 book (Lean Thinking by James Womack and Daniel Jones), but its roots trace back at least to Ben Franklin and later to Henry Ford.[1] Waste means there's a problem in the process, whether the waste comes from rework (errors) or from practices that lead to full garbage pails. Wastage is a sure indicator that something is wrong with any process. And it's an equally vital red flag that a software development group is doing something wrong.
For some reason, the lean revolution by and large hasn't made it into software engineering. Bugs plague our efforts, and are as expected as any other work product. Most projects get bogged down in a desperate debugging phase that can consume half the schedule. I guess that means we can call the design and coding line items the "bugging" phase.
When in 1796 Edward Jenner rubbed cowpox on eight-year-old James Phipps' arms, he wasn't fixing a fever; the boy was perfectly healthy. Rather, Jenner knew some 60% of the population was likely to get smallpox, and so he was looking for a way to prevent the disease before it occurred. That idea was revolutionary at a time when illness was poorly understood and often attributed to vapors or to the imaginary workings of devils or magic.
Software engineering today bears a striking resemblance to pre-Jenner medicine. The infection is there; with enough heroics it might be possible to save the patient, but the ordeal leaves both victim and doctor weakened. Jenner taught us to anticipate and eliminate sickness. Lean manufacturing and the quality movement showed that defects indicate a problem with the process rather than the product. Clearly, if we can minimize waste, the system will be delivered faster and with higher quality.
In other words, cut bugging to shorten debugging. The best tool we have to reduce bugging is the code inspection.
Over the last 10 years, I've mentioned code inspections in passing in this column some 33 times, yet haven't written anything substantive about them since August 1998.[2] Many of our readers were still in high school back then!




jeremybennett
12/1/2008 3:57 AM EST
Good article, but code inspections are just one tool in the armoury. The problem is that good code inspection is phenomenally resource intensive and very tedious. The latter is the killer - software engineers are human beings, and there's only so much tedium you can force them through. My experience is that they can only be used sparingly - which means inevitably AFTER a crisis - thereby negating the point of designing quality in.
A technique that does work is buddy programming, effectively code inspection on the fly. That too is very demanding (I tend to believe the statistics that 5 hours/day is the most anyone can take), but it does produce very good code. The trouble is that only a relatively small proportion of programmers are temperamentally suited to buddy programming.
So in practice we fall back on the one technique that seems to work, which is to write the tests before the code. That's a task where the reviewing process is tractable. Combined with tracking tools like Bugzilla, a rigorous regression testing environment (tools like DejaGNU can help), and a strict requirement that the tester and coder are different people, this can deliver good code efficiently.
It has the added merit that there are at least two engineers who understand every part of the code. Essential protection against one being run over by a bus (or just getting a new job).
This is not new or rocket science. The basic idea is in "The Mythical Man-Month" (Fred Brooks Jr, 1975), based on experience of software engineering in the 1960s. It's taught in every university program. Yet I am always amazed at how many organizations get this basic software engineering wrong.
Jeremy
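As a rough illustration of the test-first practice Jeremy describes, here is a minimal Python sketch. The function `scale_reading` and its spec (a 12-bit ADC count mapped linearly onto 0 to 3300 mV, with out-of-range counts rejected) are hypothetical, invented for this example. The test class is written first, from the spec, by the tester; the implementation comes afterwards from a different engineer, who is done when the tests pass.

```python
import unittest


# Step 1: the tester writes these from the (hypothetical) spec before any code exists.
# Assumed spec: a 12-bit ADC count (0..4095) maps linearly onto 0..3300 mV,
# truncating to whole millivolts; any other count is a caller error.
class ScaleReadingSpec(unittest.TestCase):
    def test_endpoints(self):
        self.assertEqual(scale_reading(0), 0)
        self.assertEqual(scale_reading(4095), 3300)

    def test_midscale_truncates(self):
        # 2048 * 3300 / 4095 = 1650.40..., truncated to 1650 mV
        self.assertEqual(scale_reading(2048), 1650)

    def test_out_of_range_rejected(self):
        for bad in (-1, 4096):
            with self.assertRaises(ValueError):
                scale_reading(bad)


# Step 2: a different engineer writes the code until the tests above pass.
def scale_reading(raw: int) -> int:
    """Convert a 12-bit ADC count to millivolts using integer math."""
    if not 0 <= raw <= 4095:
        raise ValueError(f"ADC count out of range: {raw}")
    return raw * 3300 // 4095
```

Saved as, say, `test_scale.py`, it runs with `python -m unittest test_scale`, and the same test class then becomes part of the regression suite that is rerun every time the code changes.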
Tippers
12/4/2008 4:55 AM EST
I guess I'm pre-empting the next article here, but the acceptability of reviews can depend very much on their form. The very intensive kind (Fagan reviews) involve several people and several hours, and they do indeed find a lot of bugs (assuming they are there to find), but it takes a brave manager to try and educate the non-softie management to use these. In a previous job, we did most of our reviews very simply: one engineer writes the code to the specification, another one reviews it, alone, against its specification. We found that the act of writing the code against the spec often picked up errors in the spec - these were always fixed before the code was released for review. Then the single engineer review found most of the simpler bugs in the code. We tried to make it so that the more experienced engineers reviewed the work of the less experienced engineers, as this worked well as additional training. And reviewing the other way round (less experienced engineers reviewing the more experienced engineer's work) also works as training, although quite often that required a bit of dialogue to explain why what was written was correct, and would work.
Any changes requested during the review were implemented by the original author, and re-reviewed. Nothing went out without a "pass" or a "will do" on the review. The three possible outcomes for a review were "pass" (everything is fine), "will do" (there are some problems with comments or layout or headings, but the code is ok), or "fail" (there's a problem with the code). The middle category was introduced to expedite code that worked, but might have maintenance issues. If a module failed, all the "will do"s were also fixed.
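The three-outcome rule above is precise enough to write down. The Python sketch below is one reading of it, not the commenter's actual tooling; the `Finding` record and its `affects_code` flag are assumptions made for the example. A single defect in the code fails the module, cosmetic findings alone give "will do", and only "pass" or "will do" lets the module go out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    PASS = "pass"        # everything is fine
    WILL_DO = "will do"  # comments/layout/headings need work, but the code is OK
    FAIL = "fail"        # there is a problem with the code itself


@dataclass(frozen=True)
class Finding:
    note: str
    affects_code: bool  # hypothetical flag: True for a real defect, False for cosmetic issues


def review_outcome(findings: Sequence[Finding]) -> Outcome:
    """Grade a review from its list of findings."""
    if any(f.affects_code for f in findings):
        return Outcome.FAIL
    return Outcome.WILL_DO if findings else Outcome.PASS


def may_release(outcome: Outcome) -> bool:
    """Nothing goes out without a "pass" or a "will do"."""
    return outcome in (Outcome.PASS, Outcome.WILL_DO)


def blocking_fixes(findings: Sequence[Finding]) -> list:
    """Findings that must be fixed before the module can ship.

    A failed module has every finding fixed, cosmetic ones included, before
    re-review. On a "will do" module the cosmetic items are still fixed and
    re-reviewed, but they do not hold up release.
    """
    if review_outcome(findings) is Outcome.FAIL:
        return list(findings)
    return []
```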
Once reviewed, the code was also tested on an ICE (in-circuit emulator), using an auto-generated harness and a set of tests created by auto-combining input data values to give 100% MC/DC coverage - but our modules were typically only 100 lines of assembler, with only a few up to 500 lines.
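The core requirement of MC/DC (modified condition/decision coverage) is that every condition in a decision be shown to independently change the decision's outcome. The sketch below shows the idea behind auto-combining inputs: enumerate every combination of condition values, find pairs that differ in exactly one condition and flip the result, then keep a small set of vectors that covers every condition. It is written in Python for brevity rather than the assembler the comment describes, it uses the unique-cause form of MC/DC, and the decision `a and (b or c)` is just an example.

```python
from itertools import product


def mcdc_pairs(decision, n):
    """For each condition index, list the (false, true) input vectors that differ
    only in that condition and give different outcomes (unique-cause MC/DC)."""
    vectors = list(product((False, True), repeat=n))
    pairs = {}
    for i in range(n):
        pairs[i] = [
            (v, w)
            for v in vectors
            for w in vectors
            if not v[i] and w[i]
            and all(v[j] == w[j] for j in range(n) if j != i)
            and decision(*v) != decision(*w)
        ]
    return pairs


def covers_mcdc(tests, pairs):
    """True if, for every condition, some independence pair lies wholly in `tests`."""
    chosen = set(tests)
    return all(any(v in chosen and w in chosen for v, w in ps) for ps in pairs.values())


def pick_mcdc_tests(pairs):
    """Greedily add input vectors until every condition has an independence pair."""
    chosen = set()
    for i, ps in pairs.items():
        if not ps:
            raise ValueError(f"condition {i} can never independently affect the outcome")
        if not any(v in chosen and w in chosen for v, w in ps):
            chosen.update(ps[0])
    return chosen


def example(a, b, c):
    return a and (b or c)


tests = pick_mcdc_tests(mcdc_pairs(example, 3))
# len(tests) == 4, i.e. n + 1 vectors instead of the 8 in the full truth table
```

Because the decision is evaluated for all 2^n combinations, this brute-force approach only suits decisions with a handful of conditions, which matches the small modules described above. The greedy pick is not guaranteed minimal in general; it just stops once every condition is covered.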
The testing we did may not apply so well to today's systems, but the reviewing certainly would - I heartily recommend it!
xorbit
12/9/2008 11:50 AM EST
This article makes a very strong case for open source software. Major open source projects have thousands of eyeballs looking at the code, all reviewing it, whether intentionally or not. The result is that bugs are found quickly and fixed quickly.
For example, you mention the famous Apache web server function with a complexity score of 725. While it might be a good idea to rewrite it to be simpler, it obviously hasn't broken in any major way. Why? Many people have looked at it, including yourself.
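The comment does not say which metric produced the score of 725. Assuming it is McCabe's cyclomatic complexity, the number is roughly one plus the count of decision points in the function, so a score that high means hundreds of branches. The rough Python counter below, built on the standard `ast` module, shows how the number accrues. Which constructs count varies between tools; this sketch counts `if`/`elif`, loops, `except` clauses, conditional expressions, and each extra operand of `and`/`or`.

```python
import ast

# Node types treated as one decision point each (an approximation; tools differ).
_DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of a snippet holding one function: 1 + decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b" adds one extra path, "a or b or c" adds two
            decisions += len(node.values) - 1
    return 1 + decisions


SAMPLE = '''
def classify(x, limit):
    if x < 0 or x > limit:
        return "out of range"
    elif x == 0:
        return "zero"
    for _ in range(3):
        pass
    return "ok"
'''
# cyclomatic_complexity(SAMPLE) == 5: 1 + if + elif + "or" + for
```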
As part of a small development team, being able to use open source code in my projects is priceless. The code is better reviewed and tested than anything I could ever do myself. I also believe it is of better quality than what any commercial company can sell me. Developers at those companies have the exact same problems and pressures as we all do, and their managers are exactly as willing to sweep problems under the rug in order to have a product to ship.
Open source code has bugs too of course, but at least they can be found and fixed, so the next guy won't have the same problem. It is hard to find code that is more reviewed, and more tested in more different circumstances and applications than that.
Larry Martin
12/15/2008 9:41 AM EST
There is no overall correct answer to this problem. There is probably a good solution for each group out there, but it's different for each group, task and budget.
In a former life, I taught Fagan inspections at NASA Langley. They work extremely well in that environment. The key is that the project or facility be overstaffed relative to a profit-driven entity doing a project of similar complexity.
Years later, I tried to introduce some of the Fagan inspection's formality to my software team at a startup. It didn't last because we had a small, very senior, team that made fairly few mistakes. The reviews just weren't necessary.
On that same team, we had one guy who was sold on code-for-test. I "let" him run with it, and found that he spent more time than I think he should have spent, writing trivial tests for things he would probably have gotten right anyway. At integration time, his code ended up having one significant deviation from spec, in a place where he had not coded a test because it was "too complex." His tests in other parts of the code reinforced his deviation. We ended up changing the spec, and the code of other devices interfacing with his, to match his implementation, because changing his code-and-test complex would have been really expensive. So code-for-test is not necessarily a good deal either.
The best thing I've seen so far for embedded systems is something my company did for a Rabbit machine control client. We used a Gumstix to simulate the machine their Rabbit was supposed to control, then developed automated functional tests around that combination. The bug count went down by a factor of 10, and shifted from core implementation issues to spec issues and "corner cases." This approach is obviously not good enough for safety critical systems, but there are now about 100 machines out there that were developed this way, running ok.
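The shape of that setup can be sketched in a few lines. In the rig described above the firmware ran on the Rabbit and the machine model on the Gumstix, connected through real I/O; in the hypothetical Python sketch below both halves are plain objects in one process, and the tank model, thresholds, and names are all invented for illustration. A functional test runs the controller against the simulated machine for many ticks and checks an observable requirement (here, that the level stays within a band), while a corner case (a pump too weak to keep up) shows the kind of spec issue such tests tend to surface.

```python
class SimulatedTank:
    """Hypothetical machine model: a tank with a fixed drain and a pump that refills it."""

    def __init__(self, level=50.0, inflow=2.0, drain=1.0):
        self.level = level
        self.inflow = inflow  # level units added per tick while the pump runs
        self.drain = drain    # level units lost every tick

    def step(self, pump_on: bool) -> float:
        """Advance one tick and return what the level sensor would read."""
        self.level = max(0.0, self.level + (self.inflow if pump_on else 0.0) - self.drain)
        return self.level


class LevelController:
    """Stand-in for the firmware under test: on/off control with hysteresis."""

    def __init__(self, low=40.0, high=60.0):
        self.low, self.high = low, high
        self.pump_on = False

    def update(self, level: float) -> bool:
        if level <= self.low:
            self.pump_on = True
        elif level >= self.high:
            self.pump_on = False
        return self.pump_on


def run_closed_loop(ticks=1000, **tank_args):
    """Run controller and simulated machine together; return the level history."""
    tank, ctrl = SimulatedTank(**tank_args), LevelController()
    level, history = tank.level, []
    for _ in range(ticks):
        level = tank.step(ctrl.update(level))
        history.append(level)
    return history


def test_level_stays_in_band():
    history = run_closed_loop()
    assert 40.0 <= min(history) and max(history) <= 60.0


def test_weak_pump_corner_case():
    # With inflow below the drain rate the tank empties: a spec question, not a code bug.
    assert min(run_closed_loop(inflow=0.5)) == 0.0
```

Saved as a `test_*.py` file, pytest collects the two `test_` functions automatically; in the real rig the same tests would drive the simulator rather than a Python object.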