Break Points

A Trillion Lines of Code?

Jack Ganssle

9/21/2008 4:05 PM EDT

An article on Dr. Dobb's site claims that in 1997 the Gartner Group estimated there were about 240 billion lines of Cobol running active business applications worldwide.

That's a lot of code.

If those quarter-trillion lines were written at a constant rate over the language's 40-year history (as of the study's 1997 date), which is hard to believe, then developers cranked out 6 billion lines of Cobol a year. Add to that all of the other Cobol apps that no longer exist, and maybe the world has been producing 10 billion Cobol lines a year.

I have no idea how many Cobol programmers exist; certainly they're an increasingly rare breed today. But to pick a number that seems wildly high, suppose that for those 40 years a million Cobol developers were employed every year. That's 10k LOC/person/year, or about 800/month, an unusually high productivity figure.
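The Cobol arithmetic in the two paragraphs above is easy to reproduce. The sketch below is only an illustration: the constant and function names are hypothetical, and every input is one of the column's assumptions (the Gartner total, a 40-year history, a 10-billion-line yearly rate, a million developers), not measured data. It also shows that 10k lines a year works out to roughly 833 lines a month, which the text rounds to 800.

```python
# Back-of-envelope Cobol estimates from the column, reproduced as arithmetic.
# Every input below is an assumption taken from the text, not measured data.

GARTNER_COBOL_LOC_1997 = 240e9   # lines of active Cobol (Gartner, 1997)
COBOL_HISTORY_YEARS = 40         # approximate age of Cobol in 1997
ADJUSTED_LOC_PER_YEAR = 10e9     # guess that includes Cobol apps since retired
COBOL_DEVELOPERS = 1_000_000     # deliberately high head count


def loc_per_year(total_loc, years):
    """Average lines written per year, assuming a constant rate."""
    return total_loc / years


def loc_per_developer(annual_loc, developers):
    """Return (lines per person per year, lines per person per month)."""
    yearly = annual_loc / developers
    return yearly, yearly / 12


if __name__ == "__main__":
    rate = loc_per_year(GARTNER_COBOL_LOC_1997, COBOL_HISTORY_YEARS)
    print(f"Surviving Cobol written per year: {rate:,.0f}")
    yearly, monthly = loc_per_developer(ADJUSTED_LOC_PER_YEAR, COBOL_DEVELOPERS)
    print(f"Per developer: {yearly:,.0f} LOC/year, {monthly:,.0f} LOC/month")
```

Changing any one assumption, such as the head count, shows how sensitive the productivity figure is to the guesses behind it.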

Cobol is wordy, but it has roughly the same density as C; that is, programs with the same functionality will be about the same length in both languages. (See "Backfiring: Converting Lines of Code to Function Points" by Capers Jones, IEEE Computer, November 1995.)

So I started to wonder how much software is extant in all languages. Surely the base of Cobol applications has grown in the decade since the Gartner study. And though Cobol might be the most popular business language, it's merely one of some 700 computer languages, some of which, like C and C++, have huge constituencies.

How much PC code exists? Or non-Cobol business, government and military code? I have no idea.

What about embedded? About 118k embedded projects start each year in the US. Maybe double that for the world-wide figure. Multiply that by nearly 40 years of embedded history, cut it in half to account for a lower figure in the early years, and it appears some 5 million embedded projects have been built.

It's unreasonable to talk about "average" sizes of embedded programs, as they span the gamut from a few hundred LOC for many tiny apps to the increasingly common multi-million-line products. But for the sake of playing with the math, if we assume there are a trillion LOC of firmware, then each of those 5 million embedded apps has 200,000 lines. That's certainly high, but it's probably within an order of magnitude or better.
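The embedded estimate in the two paragraphs above can be sketched the same way. As before, this is a hypothetical illustration: the names are made up, and the inputs (118k US projects a year, doubling for the world, nearly 40 years, halving for the early years, a trillion lines of firmware) are the column's assumptions rather than data.

```python
# Rough embedded-firmware estimate from the column; every input is an assumption.

US_PROJECTS_PER_YEAR = 118_000   # embedded projects started annually in the US
WORLD_MULTIPLIER = 2             # "maybe double that" for the worldwide figure
HISTORY_YEARS = 40               # "nearly 40 years" of embedded history
EARLY_YEARS_DISCOUNT = 0.5       # halve the total: fewer projects in early years
ASSUMED_FIRMWARE_LOC = 1e12      # a trillion LOC, "for the sake of playing with the math"


def total_projects():
    """Estimated number of embedded projects ever built."""
    per_year = US_PROJECTS_PER_YEAR * WORLD_MULTIPLIER
    return per_year * HISTORY_YEARS * EARLY_YEARS_DISCOUNT


def loc_per_project(total_loc, projects):
    """Average program size if total_loc is spread evenly over projects."""
    return total_loc / projects


if __name__ == "__main__":
    projects = total_projects()
    print(f"Embedded projects built: {projects:,.0f}")
    # The column rounds the project count to 5 million before dividing.
    print(f"LOC per project (rounded count): {loc_per_project(ASSUMED_FIRMWARE_LOC, 5e6):,.0f}")
    print(f"LOC per project (unrounded count): {loc_per_project(ASSUMED_FIRMWARE_LOC, projects):,.0f}")
```

The unrounded count is about 4.72 million projects, which gives roughly 212,000 lines each instead of 200,000, so rounding to 5 million does not change the order-of-magnitude conclusion.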

Combine Cobol, PC code, web Java, firmware, and all the rest, and it's not unreasonable to assume there are a trillion lines of code mediating the electronic hum that powers the world. That's truly a staggering number. A single one-million-line app is baffling in its complexity, but a trillion is a million millions, something beyond any of us to comprehend.

The news this week is full of the cost of nationalizing the financial system... ah, I mean, stabilizing the economy. People far smarter than me at economics put it at around a trillion dollars, more or less.

A trillion here, a trillion there, pretty soon you're talking about real money, and a really, really big software base.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com. His website is www.ganssle.com.





vocaro

9/22/2008 10:42 AM EDT

Typo:

"Has anyone counted how much code [[we've]] produced?"


krwada

9/22/2008 12:43 PM EDT

billions and billions???
And to think ... most of those lines are redundant too!


BobDJr

9/24/2008 3:16 PM EDT

Yeah, and the most redundant lines are probably something silly like "i++" or "if(i>0)"! They must account for at least 20% of all the code out there. Sometimes don't you wish we could just get _rid_ of them all? Wouldn't it be great if there was just one line of code that could do everything?!? ;^)


speldrong

9/25/2008 12:34 AM EDT

A trillion lines of code?

That's what I wrote last week.

At least it feels like it...

;o)


Arcs_n_Sparks

9/25/2008 12:39 AM EDT

This is like saying the M.E.s have assembled 10^27 atoms of iron in a particular construction project. Not a very good measure of what is useful to society.


Tippers

9/25/2008 5:46 AM EDT

@Arcs_n_Sparks:
The difference is, the M.E.s can get an awful lot of those atoms wrong (wrong place, wrong element even) and no-one will notice. But just one wrong line of code and the whole thing collapses - sometimes.

@Jack:
About 118k embedded projects start each year in the US. Maybe double that for the world-wide figure.
Really? Do you have anything that backs up the assertion that half the embedded projects in the world are started in the US? Despite the output of places like Japan, Germany, France, even the UK? I find it hard to believe, but if you show me the numbers, I'll accept it.
And possibly a more important question (also very relevant to the numbers) is how many of those projects complete...

Paul.


g01d4

9/25/2008 11:47 AM EDT

"which is hard to believe"

It is and I don't, at least w/o some clue as to their methodology. For that matter where did the 118K (+/- 500 is pretty accurate) figure come from?


DKC

6/10/2010 9:27 PM EDT

Since a lot of code is copy-&-pasted I'd be interested to know how much of the code is actually the same and/or has the same functionality.

I have a theory that the code required to run systems actually grows logarithmically rather than exponentially (and system data grows exponentially), and part of the reasoning is that a lot of code (and good code) is reusable.
