Design Article

Software for dependable systems

Jack Ganssle

11/1/2009 12:00 AM EDT

The average developer reads one trade journal a month and but a single technical book a year. This implies that most of us feel the industry isn't changing much or that we don't care to stay abreast of the latest developments. I shudder to think the latter and know for a fact the former just isn't true.

Another excuse is that too many tomes are so dry that even a gallon of Starbucks' most potent brew won't keep the eyes propped open. And sometimes wading through hundreds of pages yields very little new insight.

But recently I came across one of the most thought-provoking software books I've read in a long time. At 131 pages, it's not too long, and a PDF is available on the web (www.nap.edu/catalog.php?record_id=11923). Titled Software for Dependable Systems: Sufficient Evidence?, the book was edited by Daniel Jackson, Martyn Thomas, and Lynette Millett and is the product of the Committee on Certifiably Dependable Software Systems. Not only is this volume utterly fascinating, it's also incredibly well written. I had trouble putting it down.

I often rant about poor-quality code haunting us and our customers. Yet software is one of the most perfect things humans have created. Firmware, once shipped, averages a few bugs per thousand lines of code. But software is also one of the most complex and fragile of all human creations. In school, and in most aspects of life, a 90% is an A. In software, a 99.9% can be an utter disaster.
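To put those two numbers side by side, here is a rough calculation under illustrative assumptions that are not taken from the book. It assumes a defect density of one to three bugs per thousand lines and counts each bug against a single line:

```latex
\text{line-level correctness} = 1 - \frac{\text{bugs}}{\text{lines}}:
\qquad 1 - \frac{1}{1000} = 99.9\%, \qquad 1 - \frac{3}{1000} = 99.7\%
```

So "a few bugs per thousand lines" corresponds to roughly a 99.7% to 99.9% score. In a hypothetical 100,000-line program, even the 99.9% end still leaves about 100 latent defects.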

Perfection is not a human trait. Software is made by people. So can software ever be perfect?

Maybe not. But there's no question it has to meet standards never before achieved by Homo sapiens. Software size has followed a trajectory parallel to Moore's Law, and complexity grows faster than size. The projects we'll be cranking out in even a decade will dwarf anything made today and in many cases will be in charge of much more dangerous systems than now. Think steer-by-wire in hundreds of millions of cars, or even autonomous driving down Route 95.

Software for Dependable Systems tackles the question: how can we know whether a system, and in particular its software, is dependable? When we let code loose on an unsuspecting public, how much assurance can we offer that the stuff will run correctly, all of the time?

Currently, engineers building safety-critical applications typically use standards such as DO-178B or IEC 61508 to guide the development process. These are prescriptive approaches that mandate certain aspects of how the software gets built. For instance, at the highest level of DO-178B (Level A), MC/DC (Modified Condition/Decision Coverage) testing is required. Among other things, each condition in a decision must be shown to independently affect that decision's outcome. MC/DC aims to ensure that the code is totally tested. It seems like a great idea, but there's little evidence of how effective it really is.

So why has software for avionics been so successful? Some believe that the safety culture engendered by companies employing very detailed and difficult processes leads to a company-wide intense focus on making things right.

The agile community promotes people over process. Most certification standards take the opposite tack. Software for Dependable Systems stresses the importance of both process and people. But the book goes further and expresses the conviction that software will always be of unknown quality--which is scary for a safety-critical application--unless there's positive proof it is indeed correct.

The book makes a number of suggestions, all of which are valuable. But its most important message is a three-pronged strategy for evaluating the system. Note the use of the word "system": the book continually stresses that software does not exist in isolation; it's part of a larger collection of components, both hardware and human. A program that functions perfectly is utterly flawed if it demands superhuman performance from the user, or even human-level performance in a high-stress situation. I was reminded of David Mindell's Digital Apollo, which describes how the spacecraft, ground controllers, and astronauts were designed as a single integrated system, and how one of the biggest problems the engineers faced was balancing the role of each of those components in the larger Apollo structure.






krwada

11/2/2009 12:53 PM EST

99.9%???? That allows way too many failures! According to the Boeing Corporation, there were approximately 18 million flights in the year 2000. Of these, only 20 resulted in fatal accidents, and I am almost certain that only a very few of those 20 were due to controller failure.

Therefore, I would think something on the order of parts per million or parts per billion is a better metric. It is far, far better for folks never to know what we do than the alternative: in general, any publicity about embedded controllers in the general media is because of a failure.

You can see the statistics provided by Boeing here: http://www.boeing.com/commercial/safety/pf/pf_howsafe.html

Of course, the Boeing Company will have a particular bias regarding the safety of flying.
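A quick calculation puts the figures quoted above on the same scale as the article's 99.9%. It uses only the numbers in this comment, and it measures fatal accidents per flight, not defects per line of code:

```latex
\frac{20\ \text{fatal accidents}}{18 \times 10^{6}\ \text{flights}} \approx 1.1 \times 10^{-6}
\qquad\text{versus}\qquad 1 - 0.999 = 10^{-3}
```

That works out to about one fatal accident per million flights. This is 900 times lower than the one-in-a-thousand failure rate that 99.9% implies, which is the basis for the suggestion of parts-per-million or parts-per-billion metrics.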




K1200LT Rider

11/5/2009 2:08 PM EST

krwada,

You're talking about the number of times a particular product (an aircraft) has been used (flights). The 99.9% in the article refers to the code in a given item (a particular model of aircraft, etc.). These numbers may be worlds apart (apples and oranges). :-)




Scottish Martin

11/5/2009 6:10 PM EST

To set the record straight on DO-178B:

Aviation incidents caused by software systems are thankfully rare, even given the exponential growth of software density in aircraft systems. Perhaps this is a consequence of prescriptive standards (guidelines) such as DO-178B and their enforcement by certification authorities (e.g. FAA Designated Engineering Representatives). A major criticism here of the book ‘Software for Dependable Systems’ is that it is not prescriptive enough!

DO-178B is far from perfect, but it has known strengths, such as its bias toward Requirements and their satisfaction. The guideline contains more than 60 objectives. Many people concentrate on a select few Code-based objectives (e.g. MC/DC coverage), but this is a gross misinterpretation and misrepresentation of DO-178B.

DO-178B is limited in scope, and it fails to address the holistic System aspects of safety. Still, this article falls into a similar trap, with too much emphasis latterly on coding languages (e.g. Ada and SPARK). Note: systems containing perfectly functioning Code have contributed to fatal accidents (e.g. Cali, Colombia).

In terms of safety, gazing at the Code is analogous to organizing the deckchairs on the Titanic as you steam toward the ‘Requirements Iceberg’ – at least it keeps you occupied.

The most pertinent statement in the article is, “Finally, expertise is demanded.” Competence and professionalism should not be implicit or assumed for high-dependability systems. Interestingly, the IEC 61508 standard provides guidance on competence assessment.

For interested readers, there is a DO-178B group on LinkedIn.




Lundin

11/10/2009 11:18 AM EST

Halfway through this book, I don't find it particularly thought-provoking or revolutionary. If you are like me, namely a software engineer with a bit of experience in designing safety-critical systems, you will find a lot of preaching to the choir. At least the book satisfies my ego...

The things preached, such as modular design to avoid "tight coupling," simplicity, etc., are taught in any decent computer engineering degree program, even if you don't specialize in safety-critical systems. That most accidents happen because of poor specifications is also well known. Or so I thought...

I like that they emphasize regarding the user of the software as a system component; there are plenty of standards bureaucrats preaching the opposite, i.e. if you leave supervision of the system to a human, it ain't your problem any longer.

As a comment on this article, note that the book explicitly advises against C as a whole, but it also explicitly recommends the MISRA-C subset. So if you have a C application, there is no need to rush off and learn SPARK. Instead, introduce MISRA-C if you aren't already following it. It is entirely possible to check MISRA-C compliance with a static analyzer; in fact, MISRA-C requires the use of one.

---

While I have never worked with DO-178B and can't comment on it, I am very sceptical of vague "functional safety" standards such as IEC 61508. I have seen several cases where a manufacturer claimed SIL 3 for their system, with notified-body approval and everything. And then we tried to short-circuit one of the emergency stop relays, and the whole emergency stop function was disabled. Apparently this utterly fundamental error went unnoticed by the "SIL 3" system itself, by the pile of papers produced for 61508, and by the notified body.

The spectacular part is that I've seen the same thing on two different SIL 3 systems by different manufacturers, certified by two different notified bodies. One notified body was a particularly famous three-letter German test house, so I don't think less-serious test houses are the cause. Apparently there was no real evidence of the kind the book preaches, only false evidence in the form of the pile of irrelevant papers required by 61508.

But at least 61508 keeps a lot of standards writers, bureaucrats and "non-technical engineers" busy while steaming towards the very same iceberg mentioned above.



