Pointer peril

Use of pointers in the C programming language is subject to a number of constraints, violation of which results in the dreaded undefined behaviour. If a situation with undefined behaviour occurs, anything is permitted to happen. The program may produce unexpected results, crash, or demons may fly out of the user’s nose.

Some of these rules concern pointer arithmetic, that is, addition and subtraction in which one or both operands are pointers. The C99 specification spells it out in section 6.5.6:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. […] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. […]

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.

In simpler, if less accurate, terms, operands and results of pointer arithmetic must be within the same array object. If not, anything can happen.
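As a minimal sketch of what the rules do permit, the following two helpers stay within a single array object. The names sum_ints and index_of are made up for illustration and do not come from the standard or any library. The first forms a one-past-the-end pointer but never dereferences it; the second subtracts two pointers that the caller promises point into the same array (or one past its end).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum n ints by walking a pointer from the first
 * element up to, but not including, one past the last element. */
int sum_ints(const int *first, size_t n)
{
    const int *end = first + n;  /* one past the end: valid to form, not to dereference */
    int total = 0;

    for (const int *p = first; p != end; ++p)
        total += *p;
    return total;
}

/* Hypothetical helper: subscript of elem relative to base. Defined only
 * when both point into the same array object or one past its end; the
 * assert checks for null pointers but cannot check that precondition. */
ptrdiff_t index_of(const int *base, const int *elem)
{
    assert(base != NULL && elem != NULL);
    return elem - base;  /* difference of the two subscripts */
}
```

Note that the result of a pointer subtraction has type ptrdiff_t, which is why index_of returns that type rather than int.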

To see some of this undefined behaviour in action, consider the following example.

#include <stdio.h>

int foo(void)
{
    int a, b;
    int d = &b - &a; /* undefined */
    int *p = &a;
    b = 0;
    p[d] = 1;        /* undefined */
    return b;
}

int main(void)
{
    printf("%d\n", foo());
    return 0;
}

This program breaks the above rules twice. Firstly, the &b - &a calculation is undefined because the pointers being subtracted do not point to elements of the same array. Most compilers will nonetheless evaluate this to the distance between the two variables on the stack, measured in units of sizeof(int). Secondly, accessing p[d] is undefined because p and p + d do not point to elements of the same array (unless the result of the first undefined expression happened to be zero).

It might be tempting to assume that on a modern system with a single, flat address space, these operations would result in the intuitively obvious outcomes, ultimately setting b to the value 1 and returning this same value. However, undefined is undefined, and the compiler is free to do whatever it wants:

$ gcc -O undef.c
$ ./a.out
0

Even on a perfectly normal system, the program, when compiled with optimisation enabled, behaves as though the write to p[d] were ignored. In fact, this is exactly what happened, as this test shows:

$ gcc -O -fno-tree-pta undef.c
$ ./a.out
1

Disabling the tree-pta optimisation in gcc gives us back the intuitive behaviour. PTA stands for points-to analysis, which means the compiler analyses which objects any pointers can validly access. In the example, the pointer p, having been set to &a, cannot be used in a valid access to the variable b, since a and b are not part of the same array. Between the assignment b = 0 and the return statement, no valid access to b takes place, whence the return value is derived to be zero. The entire function is, in fact, reduced to the assembly equivalent of a simple return 0 statement, all because we decided to violate a couple of language rules.
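For contrast, here is a sketch of a conforming variant, under the assumption that the goal is simply to reach one int from a pointer to its neighbour. The function name foo_defined and the array v are invented for this illustration. Placing both values in one array makes the subtraction and the indexed write well defined, so the compiler has to account for the store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical conforming counterpart of foo(): v[0] plays the role
 * of a and v[1] the role of b, so both live in the same array object. */
int foo_defined(void)
{
    int v[2];
    ptrdiff_t d = &v[1] - &v[0];  /* defined: difference of subscripts, i.e. 1 */
    int *p = &v[0];

    assert(d == 1);               /* guaranteed for adjacent array elements */
    v[1] = 0;
    p[d] = 1;                     /* defined: p + d points to v[1] */
    return v[1];
}
```

Since every operation here is within the rules, the function should return 1 whether or not optimisation is enabled.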

While the foo() example is obviously contrived for clarity, bugs rooted in these rules occur in real programs from time to time. My most recent encounter with one was in PARI/GP, where a somewhat more complicated incarnation of it can be found. Unfortunately, the maintainers of this program are not responsive to reports of such bad practices in their code:

Undefined according to what rule? The code is only requiring the adress space to be flat which is true on all supported platforms.

The rule in question is, of course, the one quoted above. Since the standard makes no exception for flat address spaces, no such exception exists. Although the behaviour could be logically defined in this case, it is not, and all programs must still follow the rules. Filing bug reports against the compiler will not make them go away. As of this writing, the issue remains unresolved.
