A few days ago, CodeSourcery released their latest version of GCC for ARM, dubbed 2008q3. An announcement email boasts “Improved support for NEON and, in particular, auto-vectorization using NEON.” It is time to put that claim to the test.
FFmpeg has a history of triggering compiler bugs, making it a good test case. Some extra speed would do it good as well.
The new compiler builds FFmpeg without complaint, so everything is looking good so far. To check for any speedup from the improved compiler, I use an Indiana Jones trailer encoded with H.264. Disappointingly, I am unable to get any speed figures. The decoding stops after 160 frames, the immediate cause being an unaligned NEON load in a simple loop for copying a few bytes.
Is FFmpeg broken? The same code built with an older compiler release works perfectly, and the parameters passed to the failing function are similar-looking. The answer must lie in the copy loop itself. To verify this hypothesis, I set out to reproduce the error with a minimal test case.
The failure proves remarkably simple to trigger. The test case I arrive at consists of two C source files. The first file is our copy loop:
    void copy(char *dst, char *src, int len)
    {
        int i;
        for (i = 0; i < len; i++)
            dst[i] = src[i];
    }
The second file is our main() function, invoking the copy with suitably unaligned arguments:
    extern void copy(char *dst, char *src, int len);

    char src[20], dst[16];

    int main(void)
    {
        char *p = src + !((unsigned)src & 1);
        copy(dst, p, 16);
        return 0;
    }
Compiling this with the flags -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -O3 results in a broken executable. Adding -fno-tree-vectorize makes the error go away.
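The test case above probes a single odd source address. A broader check could sweep every combination of small source and destination offsets and compare the result with memcmp(). What follows is only a minimal sketch of that idea, not code taken from FFmpeg or from the run described here; the names check_all_offsets, LEN and MAX_OFF are invented for illustration. On an affected toolchain the failure would presumably show up as an alignment fault rather than as a mismatch count.

```c
#include <string.h>

/* Same loop as in the test case. It is repeated here only to keep the
 * sketch self-contained; in a real test it should live in its own file
 * so the compiler cannot see how the buffers are aligned. */
void copy(char *dst, char *src, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src[i];
}

#define LEN     16  /* bytes copied per call (hypothetical choice) */
#define MAX_OFF 8   /* offsets 0..7 are tried (hypothetical choice) */

static char src_buf[LEN + MAX_OFF];
static char dst_buf[LEN + MAX_OFF];

/* Returns the number of (source offset, destination offset) pairs
 * for which the copied bytes differ from the source bytes. */
int check_all_offsets(void)
{
    int s, d, i, failures = 0;

    /* Fill the source with non-zero, position-dependent values. */
    for (i = 0; i < (int)sizeof(src_buf); i++)
        src_buf[i] = (char)(i + 1);

    for (s = 0; s < MAX_OFF; s++) {
        for (d = 0; d < MAX_OFF; d++) {
            memset(dst_buf, 0, sizeof(dst_buf));
            copy(dst_buf + d, src_buf + s, LEN);
            if (memcmp(dst_buf + d, src_buf + s, LEN) != 0)
                failures++;
        }
    }
    return failures;
}
```

Calling check_all_offsets() from main() and returning a non-zero exit status when the count is non-zero would turn this into a quick regression check, built once with the flags above and once with -fno-tree-vectorize added.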
So much for the improved auto-vectorisation.
Not testing every compiler on FFmpeg is understandable. Not testing even the most trivial of constructs is unforgivable.
Have you checked that the FSF GCC release 4.3.2 also produces a broken binary? If so, can you file a GCC bug report please?