Design Article

Taming software complexity

Jack Ganssle

3/3/2008 5:15 PM EST

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

Folk wisdom, as reflected in Brian Kernighan's clever riff on clever code, is that insight we all know but generally ignore. But how can we measure "cleverness?" Supreme Court Justice Potter Stewart, trying to characterize pornography, memorably lamented that it somewhat defies definition, but "I know it when I see it." Likewise, most of us can sense excessively clever code but may disagree when examining specific code chunks.

We do know complicated functions are tough to understand, an agony to debug, and usually require excessive amounts of maintenance.

One of the tragedies of software engineering is that, in industry at least, we've successfully avoided the metrics-based quality revolution that transformed manufacturing. Few companies know their pretest bug rates in, say, defects per line of code. Or productivity in lines of code per hour. Or which are the worst functions in a project. In an experiment I conducted surreptitiously last year, only a third of correspondents could answer the seemingly simple question "how big is your code" in any metric they wanted to use (MB, lines of code, lines they developed, value in dollars, or, heck, the weight of the listings).

A management mantra states "if you can't measure it, you can't manage it." Though surely true for building widgets, it's a lot harder to apply to engineering. But some objective measures provide valuable insight into the art of developing firmware. Since we know complex functions are problematic, it seems logical that we measure and control complexity in functions.

And it turns out that's pretty easy to do.

But first, there are two kinds of complexity: essential, which is a quality about the problem being solved, and incidental, which is the complexity introduced by our engineering approaches. Here we're talking about incidental complexity, as that's the only kind we as developers can manage.

Thirty-one years ago, Thomas McCabe proposed a metric called "cyclomatic complexity" to attach a quantitative measure to a function's complexity. It's one of many such measures but is by far the one most used. It measures the number of "edges" and "nodes" in a function, where a node is a computational statement and an edge represents the transfer of control between nodes. Cyclomatic complexity is then:

v(G)=edgesnodes+2

Note that many different formulations of this equation are in the literature; this one is the simplified calculation that's usually used and was given in McCabe's original paper ("A Complexity Measure," Thomas J. McCabe, IEEE Transactions on Software Engineering, Volume SE-2, No. 4, December 1976, p 308-320).


Next:




like_hawk

3/5/2008 4:46 AM EST

good chapter.

Sign in to Reply



fabby

3/5/2008 10:05 PM EST

I am fascinated by the modern programming languages and their complexity. Having mastered Basic and flow charts nearly 30 years ago and saved my labor to cassette, I am astounded at the sheer size of operating systems. The OS on a Mac was about 1.3Mb at that was one of the original 'windows' type system.

My point is - has programming complexity been driven to an extent by the complexity of the language in which it is written and not just the demands of the end users for more fully featured software?

Guy

Sign in to Reply



BSweet

3/6/2008 8:10 AM EST

Jack, Thanks for yet another great article!

The "edges" and "nodes" definition was always too complex for me ... ;o)

I think that a MUCH easier formula for Cyclomatic Complexity is:
v(G) = Number of Predicate Nodes + 1 ; where a "Predicate Node" is a decision or branch point in the code (If, Else If, Case, For, Do While, Do Until)

Note that the default "Else" and "Default" case are NOT counted as predicate nodes since there is no decision made for "default" action.

Example:
if (condition 1) {...}
else if (condition 2) {...}
else {...}

This example has two predicate (decision) nodes, and so has a Cyclomatic Complexity, v(G) = 2 + 1 = 3.

This means that the MINIMUM number of test cases to exercise all possible paths through the routine is 3.


~Ben

Sign in to Reply



CGates

3/6/2008 12:47 PM EST

Jack,

You're articles are always the most thought provoking; no wonder I always read them first before looking at anything else at Embedded.

I know right now I have at least 3 tools installed on my development PC that will compute v(G), so why is it that I never use it?

Well I think your article says it all, but rather indirectly...

You state that the index is not a measure of the code's "goodness", but I believe that McCabe's index is not a usable figure to establish ANYTHING about the code in question. Assuming complexity is equal to "bad code" is a little too simplistic.

It ranks Apache as more complex then intentionally bad code. I would guess that Apache is probably serving up the web page that this article is on, and most probably does so billions of times a day all over the world, with an error rate that results in up times measured out to several places. (i.e. 99.999999%, etc).

It tells me that I should waste hundreds, if not thousands, of "white box" tests on routines that are not causing a problem in production code that is being exercised and tested constantly all over the world, in real world scenarios, far more accurately than any Verification test suite could ever possibly do.

I doubt that even a pure transactional application would benefit from this tool, but in the world of embedded software, where there are timing constraints, program size constraints, data size constraints, synchronous and asynchronous tasks, communications, direct hardware control and assembly language (see Jack’s previous column) this index would be of no value.

IMHO that is why this index has been in existence for 31 years and it is rare to actually find Engineers who use it (even though most Embedded Software Engineers are obsessed with the holy grail of "code quality" tools). If it worked, we would be using it!

And lastly the old adage that "you can’t test in quality" applies here, even if v(G) actually told me something about the quality of the code it is way too late, as the code has already been written and this should have been addressed in the coding standards and design phase.

Sign in to Reply



Swampie

3/6/2008 12:59 PM EST

Jack;
I have enjoyed your column for years!

You captured the essence of the problem. Human nature is however against you. Think of the IRS tax rules. When they started out, they were simple vs. the 50,000+ pages now. Anytime anything stays around for long that has a human interaction associates with it, then complexity is guaranteed to rush in and stay,-take over, cross pollinate, and eventually get government funded. It does not matter if you have a
routine 10,000 lines long, or 10,000 routines one line long. The level of complexity comes with the SIZE. If your project grows to doing a billion things, it's going to be complicated no matter how you program it. The other problem is as humans, we have 5 or more thinking styles, never to converge. When a person with thinking style #1 programs for 5 years and is succeeded by a person with style #2, person #2 is going to make changes for their own ease of working that could be completely out of scope of what #1 intended.

Measuring complexity is just the statement of the problem. It is not the answer. Personal preferences ( read : territory marked out with much urine) will always be a battle because nobody wants to work under the constraints of somebody elses thinking style


thanks,

The swampalator

Sign in to Reply



rfc

3/6/2008 1:54 PM EST

I have to admit to not using metrics very often...
But if Borland or Microsoft would build them into their IDE -- running all the time in the background as I code -- not that would be helpful.

Sign in to Reply



Ray Keefe

1/7/2009 5:46 PM EST

Hi Jack,

always a hot topic in my experience. How do we really know the code always works, and what does it take to find out.

RSM now allow 20 files in their free version. We use both RSM and PC-Lint as both are inexpensive and give useful insight into problem areas. I would be confident in predicting most people will get their money back on time saved in test and debug on the first project they apply these tools to.

I also find that exception handling is a big topic. It is one thing for individual functions or even module to detect exceptions and let the caller know somethign went wrong, but this is a long way from handling the exception so the system's overall response deals with the problem adequately. We do a lot of smaller projects and often are doing just a module or 2 in a larger system. So we can write our module so it returns a meaningful status to each of the calls, but that doesn't guarantee anything is done with this information.

I also agree with The Swampalator. To fully document somethign so it is maintainable is very hard. I come back to what I considered well documented code of my own from 5 years ago and it takes a while to get back into the thinking framework I was in when I wrote it. And that's me thinking about my own code! And of course, hopefully we are all growing as developers and so our new thinking should be an improvement on our old thinking. Documenting something so it makes sense to someone whose personality, cultural background and thinking styles are very different to my own is always going to fall short of the mark with the tools and methods used today.

Vernor Vinge touched on this point in his book "A deepness in the sky". One of the main characters is reviewing and sometimes debugging code written over a period of 5000 years. Vernor obviously understands the issues and although it is a good read as a story, the technical insights in the book are fascinating from a programmers perspective.

Thanks for another great topic Jack,

Ray Keefe
Melbourne

Sign in to Reply



Please sign in to post comment

Navigate to related information

Datasheets.com Parts Search

185 million searchable parts
(please enter a part number or hit search to begin)