Use of pointers in the C programming language is subject to a number of constraints, violation of which results in the dreaded undefined behaviour. If a situation with undefined behaviour occurs, anything is permitted to happen. The program may produce unexpected results, crash, or demons may fly out of the user’s nose.
Some of these rules concern pointer arithmetic, addition and subtraction in which one or both operands are pointers. The C99 specification spells it out in section 6.5.6:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. […] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. […]
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
In simpler, if less accurate, terms, operands and results of pointer arithmetic must be within the same array object. If not, anything can happen.
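As a rough illustration of the rule (the array and variable names here are only for the example), pointer subtraction within a single array, including the one-past-the-end position, is well defined, while subtraction across distinct objects is not:

#include <stddef.h>

void example(void)
{
    int arr[4];
    int x, y;

    ptrdiff_t ok1 = &arr[3] - &arr[0];   /* defined: same array, result is 3 */
    ptrdiff_t ok2 = (arr + 4) - arr;     /* defined: one past the end is allowed */
    ptrdiff_t bad = &y - &x;             /* undefined: x and y are separate objects */

    (void)ok1; (void)ok2; (void)bad;
}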
To see some of this undefined behaviour in action, consider the following example.
#include <stdio.h>

int foo(void)
{
    int a, b;
    int d = &b - &a;  /* undefined */
    int *p = &a;

    b = 0;
    p[d] = 1;         /* undefined */

    return b;
}

int main(void)
{
    printf("%d\n", foo());
    return 0;
}
This program breaks the above rules twice. Firstly, the &b - &a calculation is undefined because the pointers being subtracted do not point to elements of the same array. Most compilers will nonetheless evaluate this to the distance between the two variables on the stack. Secondly, accessing p[d] is undefined because p and p + d do not point to elements of the same array (unless the result of the first undefined expression happened to be zero).
It might be tempting to assume that on a modern system with a single, flat address space, these operations would result in the intuitively obvious outcomes, ultimately setting b to the value 1 and returning this same value. However, undefined is undefined, and the compiler is free to do whatever it wants:
$ gcc -O undef.c
$ ./a.out
0
Even on a perfectly normal system, when compiled with optimisation enabled, the program behaves as though the write to p[d] were ignored. In fact, this is exactly what happened, as this test shows:
$ gcc -O -fno-tree-pta undef.c
$ ./a.out
1
Disabling the tree-pta optimisation in gcc gives us back the intuitive behaviour. PTA stands for points-to analysis, whereby the compiler analyses which objects each pointer can validly access. In the example, the pointer p, having been set to &a, cannot be used in a valid access to the variable b, since a and b are not part of the same array. Between the assignment b = 0 and the return statement, no valid access to b takes place, whence the return value is derived to be zero. The entire function is, in fact, reduced to the assembly equivalent of a simple return 0 statement, all because we decided to violate a couple of language rules.
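Conceptually, once points-to analysis has concluded that the store through p cannot touch b, the function might as well have been written as follows (this is only a sketch of the optimiser's reasoning, not its actual output):

int foo(void)
{
    /* the store through p is assumed not to alias b,
       so b still holds the 0 assigned to it */
    return 0;
}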
While this example is obviously contrived for clarity, bugs rooted in these rules occur in real programs from time to time. My most recent encounter with one was in PARI/GP, where a somewhat more complicated incarnation of the example above can be found. Unfortunately, the maintainers of this program are not responsive to reports of such bad practices in their code:
Undefined according to what rule? The code is only requiring the adress space to be flat which is true on all supported platforms.
The rule in question is, of course, the one quoted above. Since the standard makes no exception for flat address spaces, no such exception exists. Although the behaviour could be logically defined in this case, it is not, and all programs must still follow the rules. Filing bug reports against the compiler will not make them go away. As of this writing, the issue remains unresolved.
I wonder how many bugs have been reported against GCC for producing “wrong” code when the pointer aliasing rules are violated. Or other things that ignore undefined behaviour.
The whole world runs on x86 CPUs, after all…
As far as I know, nothing in the C standard promises that a and b are on a sizeof(int) address boundary. So even without optimizations, you could still have rounding errors ;-) depending on the ABI obviously.
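For what it's worth, the raw byte distance can be inspected in an implementation-specific way by converting the pointers through uintptr_t first; this is only a sketch, the value is meaningful only on a particular implementation, and on most ABIs int objects are in fact aligned to sizeof(int):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int a, b;
    /* Converting a pointer to uintptr_t is implementation-defined rather
       than undefined; the resulting distance is in bytes, not ints. */
    uintptr_t pa = (uintptr_t)(void *)&a;
    uintptr_t pb = (uintptr_t)(void *)&b;
    uintptr_t d  = pb > pa ? pb - pa : pa - pb;

    printf("byte distance: %llu, sizeof(int): %zu\n",
           (unsigned long long)d, sizeof(int));
    return 0;
}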
On my machine, the following function yields the same result with and without -O0 (gcc 4.5.2). According to my understanding of the specs, it relies on the same undefined behaviour. Or do I misunderstand “array objects”?
Your program prints the difference between two addresses, computation of which is, as you say, undefined. In this particular case, with your compiler, nothing unexpected happens. Undefined behaviour is not required to do anything strange.
Hi,
For a program where that kind of code exists:
int d = &b - &a; /* undefined */
do you have any idea how to solve this problem?
I understand the problem; is the only fix to declare the variable ‘volatile’ (i.e. to tell the compiler not to optimise)?
Thank you
You’re trying to solve the wrong problem. There should never, ever be a need to do such pointer arithmetic.
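As the reply above says, the right fix is to restructure the code rather than to suppress optimisation. For illustration only, here is one way the original example could be rewritten so that every operation is defined (a sketch, with the two variables folded into one array):

#include <stdio.h>

int foo(void)
{
    int v[2];                /* a and b become v[0] and v[1] */
    int d = &v[1] - &v[0];   /* defined: same array, d == 1 */
    int *p = &v[0];

    v[1] = 0;
    p[d] = 1;                /* defined: p + d points to v[1] */
    return v[1];             /* reliably 1 */
}

int main(void)
{
    printf("%d\n", foo());
    return 0;
}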