Break Points
A Million Lines of Code
Jack Ganssle
1/14/2008 12:41 PM EST
A million lines of code printed out would be 18,000 pages. That's a stack six feet tall (on typical 20 pound paper). Ironically, the listing weighs in at 180 pounds while the actual operating code is mass-free; it'll live in a fraction of a gram of silicon. Like DNA, code's human-readable description requires tremendously more mass than its actual instantiation.
A million lines of code is probably on the order of 20 million instructions, or 600 million bits. That's not far off the 3 billion base pairs in human DNA. Unlike DNA, which has redundancies and so-called "junk" sequences, every single bit in the code must be perfect. A single error causes greater or lesser failure.
Since a typical atom is around 0.3 nm in diameter, if one had as many atoms lined up as the number of instructions needed for a million lines of code, they would stretch about 6 mm. That many Ebola viruses would stretch 15 meters.
A million lines of code is as long as 14 copies of War and Peace, 25 of Ulysses, 63 copies of The Catcher in the Rye, or 66 copies of K&R's C Programming Language.
A million lines of code is not ten times more than 100,000. It's well known that schedules grow faster than the code. Barry Boehm estimates the exponent is around 1.35 for embedded software. So the schedule for developing a million lines of code is about 22 times bigger than for 100,000 LOC.
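The factor of 22 follows directly from the exponent. A minimal sketch in Python, assuming schedule scales as size raised to 1.35 (the function name is just for illustration):

```python
# Exponent the article attributes to Barry Boehm for embedded software.
EXPONENT = 1.35

def relative_schedule(size_ratio: float, exponent: float = EXPONENT) -> float:
    """Factor by which the schedule grows when code size grows by size_ratio."""
    return size_ratio ** exponent

# Going from 100,000 lines to 1,000,000 lines is a tenfold size increase.
factor = relative_schedule(1_000_000 / 100_000)
print(f"{factor:.1f}x")  # 22.4x
```

With an exponent of exactly 1.0 the result would be 10x; everything above that is the penalty for scale.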
In the March 1996 issue of Computer, Watts Humphrey published crude rules of thumb for estimating software projects. Though hardly scientific, they do give a sense of scale. Using his estimates:
A million lines of code require 40,000 pages of external documentation.
A million lines of code will typically have 100,000 bugs pre-test. Best-in-class organizations will ship with around 1k bugs still lurking. The rest of us will do worse by an order of magnitude.
A million lines of code will occupy 67 people (including testers, tech writers, developers, etc.) for 40 months, or 223 person-years. Darwin needed just 1.5 person-years to write The Origin of the Species. Scale that to the 26 copies equal in length to a million lines of code, and it appears writing code is some 6 times more time-consuming than writing a revolutionary scientific tome.
A million lines of code costs $20m to $40m. That's one or two 1960s-era F-4 fighter jets (in today's dollars), a tenth of an F-22, a thousand cars or more (in America), nearly 20,000 Tata Nano cars, ten million gallons of gas, seven times the inflation-adjusted cost of the ENIAC, and a million times the cost of the flash chips it lives in.
Think about that last analogy: A million times the cost of the flash chips. Yet accounting screams over each added penny in recurring costs, while chanting the dual mantras "software is free," and "hey, it's only a software change."
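The staffing comparison with Darwin in the estimates above is simple arithmetic. As a rough check, using only the figures quoted in the column (the variable names are illustrative):

```python
# Figures quoted in the column.
PEOPLE = 67
MONTHS = 40
DARWIN_PERSON_YEARS = 1.5   # effort for one copy of Darwin's book
COPIES_PER_MLOC = 26        # copies equal in length to a million lines of code

code_person_years = PEOPLE * MONTHS / 12
book_person_years = DARWIN_PERSON_YEARS * COPIES_PER_MLOC
ratio = code_person_years / book_person_years

print(f"code: {code_person_years:.0f} person-years")   # code: 223 person-years
print(f"book: {book_person_years:.0f} person-years")   # book: 39 person-years
print(f"ratio: {ratio:.1f}")                           # ratio: 5.7
```

The ratio comes out a little under 6, which matches the "some 6 times" in the text.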
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com. His website is www.ganssle.com.





pteryx
1/17/2008 5:14 AM EST
MS has written 50 M lines of Vista code, but the functionality is not much better than WXP ... Most companies stick to WXP because of backward compatibility with software and drivers. In fact, the main reason for writing Vista is to prevent anyone from being able to create a competing Windows-compatible OS and put an end to the Microsoft monopoly. The more chaotic the system they create, the better (for them).
So, the question is whether the millions of lines are really needed. The same functionality is implemented and debugged again and again and again. This is reinventing the wheel. What is the positive effect for mankind?
Tony Garland
1/17/2008 1:33 PM EST
"A million lines of code is probably on the order of 20 million instructions, or 600 million bits. That's not far off of the 3 billions base pairs in human DNA. Unlike DNA, which has redundancies and so-called "junk" sequences, every single bit in the code must be perfect. A single error causes greater or lesser failure."
The implication that the code in the average cell phone is closing in on the complexity of DNA, or that it has to be more precise than DNA (implying even greater complexity), I find absurd. I would suggest that the real complexity of DNA is far beyond what we know today and that we'll find out that what we thought was "junk" is incomprehensibly sophisticated, which is why we remain clueless regarding its function.
We engineers are no doubt clever, but let's keep our feet on the ground :-)
j.cline
1/17/2008 5:38 PM EST
We are lucky it is a million lines of C. What happens when it's a million lines of Java and the environment lines up to run garbage collection on the hundreds of thousands of mallocs all at once? The sound you hear is the screeching of silicon.
BobDJr
1/18/2008 4:19 PM EST
Jack:
You wrote that "... every single bit in the code must be perfect. A single error causes greater or lesser failure."
My thinking is that there are some kinds of bit "errors" that don't have much effect on the overall operation of the system. For example, changes in bits in unused memory are tolerable under a single-failure mindset, which is to say that you assume that the processor never goes there if it shouldn't. I have seen flash memory bit failures that were only picked up by a continuous memory checksum/CRC self-test routine, and, other than that test reporting a failure, they did not affect the system's operation.
I also think that single bit errors in some kinds of data, like strings, might not do anything more than annoy you.
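To illustrate the kind of checksum self-test described above, here is a minimal sketch in Python. It is not the routine from that system: the memory image, its layout, and the function name are all assumptions made up for the example, and the standard library's CRC-32 stands in for whatever checksum a real product would use.

```python
import zlib

# Hypothetical flash image: 512 bytes of "code" followed by 512 unused bytes.
CODE = bytes(range(256)) * 2      # stand-in for program code
UNUSED = b"\xff" * 512            # erased flash typically reads back as 0xFF
image = bytearray(CODE + UNUSED)

# Reference CRC, computed once while the image is known to be good.
reference_crc = zlib.crc32(image)

def self_test(memory, expected_crc):
    """Return True if the CRC-32 of the whole image still matches."""
    return zlib.crc32(memory) == expected_crc

print(self_test(image, reference_crc))      # True

# Flip a single bit in the unused region (offset 700 is past the code).
image[700] ^= 0x04

print(self_test(image, reference_crc))      # False
print(bytes(image[:len(CODE)]) == CODE)     # True: the code itself is untouched
```

The self-test flags the failure even though nothing the processor executes has changed, which is the situation described: the only visible symptom is the test report.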
theclapp
1/20/2008 4:38 PM EST
Just by the way, Darwin's book is titled _The Origin of Species_, not _The origin of *the* Species_. (Please feel free to update your blog and delete this comment, which adds nothing of real value. :)
Tiger Joe
1/22/2008 6:45 PM EST
"A million lines of code is probably on the order of 20 million instructions, or 600 million bits. That's not far off of the 3 billions base pairs in human DNA. Unlike DNA, which has redundancies and so-called "junk" sequences, every single bit in the code must be perfect. A single error causes greater or lesser failure."
Obviously our creator was not a computer programmer otherwise every one of us would have birth defects. Who knows, maybe it was tried before arriving at the DNA solution? Or, if God created us in his image, then it is up to us to figure out a much better way to get programming done.
Nkscorpion
1/24/2008 11:05 AM EST
I wonder, what if there's a curious bug? And how do we make sure there's none? ...
Code Monkey
11/7/2011 1:22 PM EST
We've created a monster.