Programming Pointers

A sign of confusion

Dan Saks

2/8/2008 2:50 PM EST

In C and C++, the unusual nature of char leaves many programmers puzzled about when to use plain char in preference to an explicitly signed or unsigned char.

All of the integer types in C and C++ come in signed and unsigned variants. In all cases but one, the signed variant is the default. For instance, the type specifier int is short for signed int, and long int is short for signed long int. The exception is the char types.

The plain char type has the same representation and behavior as either signed char or unsigned char, but plain char is nonetheless a distinct type. For example, even with a compiler that implements plain char the same as signed char, the following pointer assignment is an error:

char *pc;
signed char *psc;
...
pc = psc;           // invalid conversion

Many compilers tolerate this conversion, but the language standards consider it to be an error.
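
One way to see that the three character types really are distinct is a C11 _Generic selection, which may not list two compatible types among its associations. The following is a minimal sketch that assumes a C11 compiler; the names char_kind, CHAR_KIND, and plain_char_is_signed are hypothetical and were introduced only for this illustration.

```c
#include <limits.h>

/* Hypothetical labels for the type that an expression turns out to have. */
enum char_kind { KIND_PLAIN, KIND_SIGNED, KIND_UNSIGNED, KIND_OTHER };

/* Hypothetical macro: reports which character type an expression has.
   A _Generic selection rejects two associations that name compatible
   types, so this compiles only because char, signed char, and
   unsigned char are three distinct types. */
#define CHAR_KIND(x) _Generic((x),     \
    char:          KIND_PLAIN,         \
    signed char:   KIND_SIGNED,        \
    unsigned char: KIND_UNSIGNED,      \
    default:       KIND_OTHER)

/* Hypothetical helper: nonzero if this compiler implements plain char
   with the same range as signed char. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With this in place, CHAR_KIND applied to a char object yields KIND_PLAIN on every compiler, whichever way plain_char_is_signed() answers.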

The unusual nature of char--that it's distinct from its signed and unsigned cousins, but not completely so--leaves many programmers puzzled about when to use plain char in preference to an explicitly signed or unsigned char. Too often, programmers guess wrong, and find themselves compounding the error by using casts. The following letter from a reader typifies the problem:

Recently I faced a problem where I was using an object declared as:

signed char *ptr;

I tried to do something such as:

if (ptr[0] == 0xFF)

Using the debugger, I could see that ptr[0] always had the value 0xFF but the condition in the if-statement was always false. When I looked at the disassembled code, the register containing ptr[0]'s value showed 0xffffffff.

I solved the problem by casting ptr[0] to unsigned char. Though I got the expression to evaluate to true, I'm not quite sure how it works.

As I've explained in past columns [1, 2], using a cast is often an indication that you're doing something wrong. That's the case here.

Here's what's happening with that conditional expression. The left operand, ptr[0], is a signed char. On a typical machine with 8-bit bytes and twos-complement arithmetic, a signed char has values in the range -128 to +127. If ptr[0] contains 0xFF, the decimal arithmetic value of ptr[0] is -1, not 255.

The right operand in the conditional expression, the literal 0xFF, is an int, or more precisely, a signed int [3]. It's not a signed char. As a signed int, 0xFF has the value 255 (decimal).

According to the standard, when an expression compares a signed char with a signed int, the program promotes the signed char to signed int prior to doing the compare. The resulting signed int has the same value as the signed char, which in this case is -1. On a 32-bit twos-complement machine, -1 (decimal) is represented as 0xFFFFFFFF.

In short, ptr[0] is a signed char whose value is -1, and 0xFF is a signed integer whose value is 255, and their values are not equal.
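
The promotion can be reproduced in a few lines. This is a minimal sketch that assumes the machine described above (8-bit bytes, twos-complement arithmetic, and an int wider than a char); all three function names are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: builds a signed char whose byte is 0xFF, as in
   the reader's buffer. memcpy copies the raw byte, so no out-of-range
   conversion is involved. */
static signed char byte_ff_as_signed_char(void)
{
    unsigned char raw = 0xFF;
    signed char sc;
    memcpy(&sc, &raw, 1);
    return sc;
}

/* The reader's test, with the promotion written out: the signed char
   becomes an int with the same value (-1 for a 0xFF byte), and that
   int is compared with the int 255. */
static int equals_0xFF(signed char sc)
{
    int promoted = sc;
    return promoted == 0xFF;
}

/* The reader's workaround: converting to unsigned char first maps -1
   to 255, so the promoted value now matches 0xFF. */
static int equals_0xFF_with_cast(signed char sc)
{
    return (unsigned char)sc == 0xFF;
}
```

Under those assumptions, equals_0xFF returns 0 for the 0xFF byte while equals_0xFF_with_cast returns 1, which is the behavior the reader observed.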

The way to avoid such surprising behavior is to use objects and literals whose types can be combined safely without explicit conversions. For example, when you test the value of a plain char, you should compare it with another plain char or character constant, not with an int. In the reader's case, I'd replace:

signed char *ptr;
...
if (ptr[0] == 0xFF)

with:

char *ptr;
...
if (ptr[0] == '\xFF') 

The latter works correctly in C or C++ whether plain char is implemented as signed or unsigned.

In truth, character literals such as '\xFF' have type int in C. In C, the conditional expression in:

if (ptr[0] == '\xFF') 

actually compares a plain char to an int. The compiler promotes the left operand to int to match the right operand. Nonetheless, the comparison works correctly in C without casting because the compiler uses the same rule to promote a plain char to an int that it uses to obtain the integer value of a character literal.
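
The cast-free comparison can be wrapped in a small function to try it out. This is only a sketch; the name starts_with_ff is hypothetical, and the test byte is supplied through a string literal for convenience.

```c
/* Hypothetical helper: tests whether the first char of a buffer holds
   the byte 0xFF. Both operands reach int by the same rule, so the
   result does not depend on whether plain char is signed or unsigned. */
static int starts_with_ff(const char *ptr)
{
    return ptr[0] == '\xFF';
}
```

Calling starts_with_ff("\xFF") yields a nonzero result, and starts_with_ff("A") yields zero, under either implementation of plain char.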

Endnotes:
1. Saks, Dan, "Cast with caution," Embedded Systems Design, July 2006, p. 15.
2. Saks, Dan, "A case study in portability," Embedded.com, November 2007.
3. Saks, Dan, "Numeric Literals," Embedded Systems Programming, September 2000, p. 113.





Theckyam

2/8/2008 8:17 PM EST

I strongly believe that typecasting is the greatest evil in software programming!
For the given example, does the compiler not generate a warning (in best case with -Wall and worst case with -ansi or -pedantic )? Aren't we supposed to clarify all the warnings before plunging into debugging?


markjmeyer

2/13/2008 8:51 PM EST

I read the article and learned a couple of things.

1. That plain char is different than unsigned and signed char
2. The assignment style '\xFF'

(learn something new every other day)

The issue I have is the use of casting being bad. One reason I use casting is strictly for documentation purposes. You know what is going on. Some level of conversion is being performed. You document it. You also increase the portability of code, lessening compiler and micro differences.

When it comes to the char type, I've always treated it as either containing an ASCII character or a number (signed or unsigned). A plain char would work for ASCII. Signed/unsigned char would work for numeric variables. By looking at the type declaration, I would know some level of use.


cpns

2/21/2008 5:40 AM EST

> Of course, dealing with legacy
> (i.e., pre-C99) code is another
> matter.

Not really, uint8_t etc. are not new language features, merely a new standard header. They are just typedefs provided by that header; providing your own header to define these types for non-C99 compilers is trivial.


kolio

2/22/2008 3:15 AM EST

Regarding the C99 comments, one would think that the MISRA C guidelines are very annoying, but Rules 13-17 are a good complement to the topic.


JakobE

2/25/2008 12:05 PM EST

Any code that I see that uses plain "int" or "long" or "char" I consider bad. Since we are in embedded, data tends to have a known size. And since I am routinely seeing code move between 32-bit and 64-bit machines, int and long do tend to change size in unexpected ways. Even long before C99, having your own private uint8_t, int32_t, etc. types in a private header file was recommended practice.

A programmer should keep tight track of data sizes and signedness -- unless working in an environment where you can work in a single arbitrary-precision integer type. But that requires your code never to touch hardware or even sequences of bytes such as network packets. So it pretty much never happens.


markjmeyer

2/25/2008 9:00 PM EST

I would agree that plain "int", "long" and "char" do change when moving between micro environments, though not in unexpected ways. By referencing the documentation of the compiler, you know exactly the characteristics, such as size, of data types. Unexpected behavior occurs only when the programmer was not keeping tight track of data types and usage. And, being embedded software, one needs to know the hardware environment it will be running in.

I also would agree with you that using a private header file does improve portability and, if practiced, does provide the tight control of data types and their usage. That, I think would be where static code analysis tools would come into play.


one_armed_bandit

4/2/2008 10:55 PM EDT

I *always* define and use UINT8, INT8, UINT16, etc. The only times I use 'int i' is when I *know* I am looping thru some short buffer and I will always have enough room in the 'int'. However, I usually use a 'UINT32 i'.

I almost never use a signed value - there is little need in most of what I do. If the math requires it, of course I use signed, but this is pretty rare in the systems I work on.

The only times I use 'char' is when I am using a character array/string - and I usually make it UINT8 instead, then have to cast to (char *) for the C string library functions (sigh). This is one of the few times I cast.

(I also do not usually need to work with non-ASCII strings.)

I needed a cast for a macro that could take an argument that was either a UINT16 or UINT32. It was the only way I could get the compiler to shut up. I was doing a swap(a):

#define swap_field(ff) {                                  \
    if( sizeof(ff) == sizeof(UINT16) ) {                  \
        UINT16 t16 = SEX16(ff);                           \
        ff = t16;                                         \
    } else if(sizeof(ff) == sizeof(UINT32)) {             \
        UINT32 t32 = SEX32(((UINT32)ff));                 \
        ff = t32;                                         \
    } else {                                              \
        assert( #ff " not UINT16 or UINT32" == NULL );    \
    }                                                     \
}

Then I just needed to say

swap_field( ptr->field );

to deal with byte-sex in a struct and I did not need to care about the size of the field. The swap was done once, in place, and I would define SEX16() and SEX32() as needed.

Even though the field was UINT32, and SEX32() takes a UINT32, the compiler complained.

However, in general, I am careful when I cast && look to see if there *really* is a reason to. I agree with the article.

I also have to wonder why a pointer is being checked for == 0xff instead of NULL.....

Also - why is the 0xff not #defined as some symbol like PTR_DELIM or whatever...


one_armed_bandit

4/2/2008 10:58 PM EDT

oops - the formatting on the code fragment turned out badly.

And - upon posting then reading, the pointer was not == 0xff, the ptr[0] == 0xff.

However, ptr[0] should == 0 for a null-string.


dont know nick's name

6/10/2011 5:49 AM EDT

I believe that a good C programmer should avoid magic numbers, in all situations. Without using 'uint8' or such types, one could also replace this using C constants:
if( ptr[0] == MINUS_1 )
...
with:
#define MINUS_1 ((char)-1)

Casting the value as 'char' in the define is needed as by default all integer constants are assumed as 'int'. Moreover, it ensures that it can be used in all situations, even comparing with 'int' values, as it will automatically be promoted to 'int'. The opposite is not true, as you demonstrate it.

