Design Article
Back to the Basics - Practical Embedded Coding Tips: Part 1
Jakob Engblom
4/19/2009 12:16 PM EDT
By carefully controlling how data is shared, we create reentrant functions, those that allow multiple concurrent invocations that do not interfere with each other. The word "pure" is sometimes used interchangeably with "reentrant."
Reentrancy was originally invented for mainframes, in the days when memory was a valuable commodity. System operators noticed that a dozen or hundreds of identical copies of a few big programs would be in the computer's memory array at any time. At the University of Maryland, my old hacking grounds, the monster Univac 1108 had one of the early reentrant FORTRAN compilers.
It burned up a breathtaking (for those days) 32 kW of system memory, but being reentrant, it required only 32 k even if 50 users were running it. Everyone executed the same code, from the same set of addresses. Each person had his or her own data area, yet everyone running the compiler quite literally executed identical code. As the operating system changed contexts from user to user it swapped data areas so one person's work didn't affect any other. Share the code, but not the data.
In the embedded world a routine must satisfy the following conditions to be reentrant:
Rule # 1. It uses all shared
variables in an atomic way, unless each is allocated to a specific
instance of the function.
Rule # 2. It does not call
nonreentrant functions.
Rule 3. It does not use the
hardware in a nonatomic way.
Atomic Variables
Both the first and last rules use the word "atomic," which comes from
the Greek word meaning "indivisible." In the computer world "atomic"
means an operation that cannot be interrupted. Consider the assembly
language instruction:
mov ax,bx
Since nothing short of a reset can stop or interrupt this instruction it's atomic. It will start and complete without any interference from other tasks or interrupts. The first part of Rule #1 requires the atomic use of shared variables. Suppose two functions each share the global variable "foobar." Function A contains:
temp=foobar;
temp+=1;
foobar=temp;
This code is not reentrant, because foobar is used nonatomically. That is, it takes three statements to change its value, not one. The foobar handling is not indivisible; an interrupt can come between these statements, switch context to the other function, which then may also try and change foobar.
Clearly there's a conflict, foobar will wind up with an incorrect value, the autopilot will crash, and hundreds of screaming people will wonder, "Why didn't they teach those developers about reentrancy?" Suppose, instead, function A looks like:
foobar+=1;
Now the operation is atomic, an interrupt will not suspend processing with foobar in a partially changed state, so the routine is reentrant.
Except . . . do you really know what your C compiler generates? On an x86 processor the code might look like:
movax,[foobar]
incax
mov[foobar],ax
which is clearly not atomic, and so not reentrant. The atomic version is:
inc[foobar]
The moral is to be wary of the compiler; assume it generates atomic code and you may find 60 Minutes knocking at your door.
The second part of the first reentrancy rule reads " . . . unless each is allocated to a specific instance of the function." This is an exception to the atomic rule that skirts the issue of shared variables.
An instance is a path through the code. There's no reason a single function can't be called from many other places. In a multitasking environment it's quite possible that several copies of the function may indeed be executing concurrently. (Suppose the routine is a driver that retrieves data from a queue; many different parts of the code may want queued data more or less simultaneously.) Each execution path is an "instance" of the code. Consider:
int
foo;
void some_function(void){
foo++;
}
foo is a global variable whose scope exists beyond that of the function. Even if no other routine uses foo, some_function can trash the variable if more than one instance of it runs at any time. C and C++ can save us from this peril. Use automatic variables. That is, declare foo inside of the function. Then, each instance of the routine will use a new version of foo created from the stack, as follows:
void
some_function(void){
int foo;
foo++;
}
Another option is to dynamically assign memory (using malloc), again so each incarnation uses a unique data area. The fundamental reentrancy problem is thus avoided, as it's impossible for multiple instances to stamp on a common version of the variable.




Rod Spade
4/20/2009 1:39 PM EDT
Beware that even a single assembly language instruction does not necessarily do an "atomic" read-modify-write cycle in a multiprocessor environment.
Sign in to Reply
LArt
4/21/2009 3:11 AM EDT
It can be even worse.
There are single core microcontrollers where an assembly instruction are non atomic when they operate on SFRs.
Example:
1. HW timer overflows and sets interrupt flag in SFR
2. ISR is fired and starts clearing interrupt flag
3. During 2 timer reports compare interrupt and sets another flag (in the same) SFR
4. Step 2 is finished
In step 2 SFR content is fetched, step 3 modifies SFR, step 4 also modifies SFR but it is ignoring what was changed in step 3.
As a result the interrupt flag "introduced" in step 3 is lost (then interrupt is also lost).
In such architectures there are usually special sequences for clearing (setting) flags in SFRs.
Regards,
Sign in to Reply
ffch1996
4/22/2009 3:11 AM EDT
Hi Jack,
This is a very good article. Could you explain or perhaps give an example about the following statement you made, mainly the meaning of "dangerous manner".
"The machine's context is changed, probably in a very dangerous manner."
Felix
P.S.
The most common approach is to disable interrupts during nonreentrant code. With interrupts off the system suddenly becomes a single-process environment. There will be no context switches. Disable interrupts, do the nonreentrant work, and then turn interrupts back on.
Most times this sort of code looks like:
long i;
void do_something(void){
disable_interrupts();
i+=0x1234;
enable_interrupts();
}
This solution does not work. If do_something() is a generic routine, perhaps called from many places, and is invoked with interrupts disabled, it returns after turning them back on. The machine's context is changed, probably in a very dangerous manner.
Sign in to Reply
Chris_Morton
1/19/2010 1:25 PM EST
For thread processing, even if you don't have a test and set instruction, if you have a variable that must be protected, then a test/set function can still be used if the rule is that the test/set function must be called before accessing the variable and a similar clear function must be called to clear it. Of course, it cannot be called from the interrupt level, since the interrupt level cannot wait for the main level process to clear it.
Sign in to Reply
SamurAchzar
1/23/2010 8:49 PM EST
@QL
He's bringing up the example of using the same program bit to serve multiple threads. The most natural implementation would be with dynamic memory allocation, creating a new, independent "context" for every such client.
I have no idea why did you bring up the pledge not to use the heap, on modern 32-bit MCUs (ARMs etc) there is no reason - and no justification - to avoid using the heap. The only reason not to use the heap is to avoid memory fragmentation, but good heap implementation and careful memory allocation planning will overcome that.
Preaching not using the heap at all is even worse than completely banning 'goto' statements from your program...
Sign in to Reply
SamurAchzar
1/24/2010 12:05 PM EST
@QL
While I do agree problems exists (as I specifically mentioned in my post), allow me to respectfully disagree with your reasoning:
"Dynamically allocating and freeing memory can fragment the heap over time to the point that the program crashes because of an inability to allocate more RAM. The total remaining heap storage might be more than adequate, but no single piece satisfies a specific malloc() request."
As I already said, with careful planning and a good heap implementation, that should not be a problem. If you are using 60KB out of the 64KB of your processor, it will be very sensitive for heap problems, but if you have reasonable headroom, that is not really a concern. Many embedded systems today use ARM9 class MCUs with external, cheap DRAM so that is less of a concern.
Even on internal memory targets, using small SRAMs, If you manage your allocation well, you're less likely to run into such problems.
Furthermore, coalescing heap implementations will "heal" the heap to some extent, by combining adjacent released blocks into a continuous space.
"Heap-based memory management is wasteful. All heap management algorithms must maintain some form of header information for each block allocated. At the very least, this information includes the size of the block. For example, if the header causes a four-byte overhead, then a four-byte allocation requires at least eight bytes, so only 50 percent of the allocated memory is usable to the application. Because of these overheads and the aforementioned fragmentation, determining the minimum size of the heap is difficult. Even if you were to know the worst-case mix of objects simultaneously allocated on the heap (which you typically dont), the required heap storage is much more than a simple sum of the object sizes. As a result, the only practical way to make the heap more reliable is to massively oversize it. "
I find the usual heap overhead (4-8 bytes per block, plus some global management structures) to be pretty negligible. I wouldn't advise to use the heap for allocating receive buffers on runtime, for example, as this will create excess overhead and fragmentation. Use a buffer pool for that. Yet I no longer see the point in making 8 byte considerations in the day and age of 32-bit, $1 MCUs.
Now, when you consider the alternatives, of leaving around static memory which you just can't use because it's pre-allocated for something that might never even run under your usage scenario, the inefficiency inherent to the heap seems pretty attractive.
"Both malloc() and free() can be (and often are) nondeterministic, meaning that they potentially can take a long (hard to quantify) time to execute, which conflicts squarely with real-time constraints. Although many RTOSs have heap management algorithms with bounded, or even deterministic performance, they dont necessarily handle multiple small allocations efficiently."
More often than not, deterministic performance is unnecessary, and is a legacy of old time embedded guys who think everything they touch is "timing critical". If you need a completely deterministic, single-purpose operation, like a flight stabilization computer, use static allocations exclusively. If you have a consumer device with some degree of flexibility (user interface, etc), you want to use the heap.
"Unfortunately, the list of heap problems doesnt stop there. A new class of problems appears when you use heap in a multithreaded environment. The heap becomes a shared resource and consequently causes all the headaches associated with resource sharing, so the list goes on:
Both malloc() and free() can be (and often are) non-reentrant; that is, they cannot be safely called simultaneously from multiple threads of execution.
The reentrancy problem can be remedied by protecting malloc(), free(), realloc(), and so on internally with a mutex, which lets only one thread at a time access the shared heap. However, this scheme could cause excessive blocking of threads (especially if memory management is nondeterministic) and can significantly reduce parallelism. Mutexes can also be subject to priority inversion. Naturally, the heap management functions protected by a mutex are not available to interrupt service routines (ISRs) because ISRs cannot block."
While what you're saying is perfectly true, I fail to see your point, as every kind of statically allocated, runtime manageable buffer structure (buffer pools, etc) would require the same measures to synchronize.
"If you destroy all pointers to an object and fail to free it or you simply leave objects lying about well past their useful lifetimes, you create a memory leak. If you leak enough memory, your storage allocation eventually fails.
Conversely, if you free a heap object but the rest of the program still believes that pointers to the object remain valid, you have created dangling pointers. If you dereference such a dangling pointer to access the recycled object (which by that time might be already allocated to somebody else), your application can crash."
You're talking programmer incompetence here. I can say, on the other hand, that the heap allows you cheap and widespread sanity checks for detecting memory overruns (such as a block magic, checked on each allocation and free).
Like everything else in our limited, resource constrained and often backwards world, which is fed from the technological leftovers of other fields, heap use should be made with care and common sense. There's no one to clean up the mess for you. But in the *specific* case the original article author mentioned, I see no other reasonable solution than using the heap.
Sign in to Reply
Lundin
1/25/2010 3:02 AM EST
Regarding multi-threaded apps (which the article was about), it is not wise to use any kind of C library function, as such functions of any must be regarded as non-reentrant. If you don't regard them like that, your code is non-portable [bad]. From a multi-thread point-of-view, you therefore shouldn't use malloc/free simply because you must put a mutex outside of the function, and thereby occupy a lot of (non-deterministic) execution time.
With your own implementation using static buffers, you can put the mutex on a better spot, and possibly also make the mutex private and hide it away from the main application.
Your main resource concern when developing modern embedded systems should be execution time, or possibly program size. Modern computers have lots and lots of RAM memory. Sacrificing hundreds of CPU cycles to save some bytes of RAM... that must be a rare case nowadays. The situation is rather the opposite: as an example, an experienced programmer will want to overdimension their stack, sacrificing lots of RAM just to make sure you never hit the stack bottom. And if you find yourself low on RAM, it usually means you have picked the wrong MCU/MPU for your project.
Sign in to Reply
willc2010
1/27/2010 11:47 PM EST
"The moral is to be wary of the compiler"
You should be able to declare that a variable (or a type) is atomic, and know that accesses and updates will conform to certain rules. If this cannot be implemented for that type on a given architecture, then the program should not compile. This reduces portability somewhat, but at least you don't have to "be wary of the compiler". There should be alternative more general mechanisms built into the language as well.
"...talk to the compiler vendor to ensure that the entire runtime package is pure."
It is absurd that you would have to do this. Packages should be declared to be pure, or not. If they are declared to be pure (either predefined packages or routines, or your own) then the compiler should ensure that they really are, including the requirement that everything that they call is also pure.
Dynamic memory allocation tends to be particularly heavily used in many C programs because the semantics of the language do not allow for such basic operations as constructing an array (e.g. a string) and returning it from a function. Instead you have to allocate a buffer to hold the new array and return a pointer to it. Similarly if you want a working array within a function, and the size of that array depends on the arguments, then you have to allocate it and free it yourself. This leads to extra work as well as particularly bad fragmentation.
I don't agree with the attitude that a truly great programmer can use anything, and anybody who forgets to do a 'free' or uses a dangling pointer is simply incompetent. It is a fact of life that even generally competent people do make these mistakes in complex systems, and the C language both requires a lot of unnecessary hard, detailed work, and at the same time makes it exceptionally difficult to protect against error. The real moral is not to insist on persisting with a language that is so clearly inappropriate for the task of writing reliable software.
Sign in to Reply
Lundin
1/28/2010 10:06 AM EST
"I don't agree with the attitude that a truly great programmer can use anything, and anybody who forgets to do a 'free' or uses a dangling pointer is simply incompetent."
While this is true, a programmer who isn't insisting on using a static analyzer to detect these faults caused by human nature, could perhaps be regarded as incompetent. The moral of the story is rather: use a static analyzer for the task of writing reliable software.
Sign in to Reply
willc2010
1/28/2010 7:47 PM EST
Lundin,
I agree. The more static analysis the better. It is much better to catch bugs during design and build than at run time. Put in those terms everyone agrees with this. But it is undeniable that some languages are much more amenable to static analysis than others. MISRA-C is better than nothing, but no amount of MISRAing is going to make up for a fundamentally defective type system, or an assembly-level semantic model, or a crude substitute for a package mechanism. C is a big step backwards even from the state of the art in the 1970s. Or put it this way: even straight Ada, just by the language definition, can give you significantly better static analysis than you can achieve with MISRA-C.
Sign in to Reply