Compilers - Hardwarebug

The bug I discovered in CodeSourcery’s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.

Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.

All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.

The test case looks strikingly familiar:

extern unsigned char dst[512] __attribute__((aligned(8)));
extern unsigned char src[512] __attribute__((aligned(8)));

void array_shift(void)
{
    int i;
    for (i = 0; i < 512; i++)
        dst[i] = src[i] >> 7;
}

Continue reading →

GCC inline asm annoyance

Saturday, 25th October, 2008 - 1:36 pm | Compilers, PowerPC

7 Comments

Doing some PowerPC work recently, I wanted to use the lwbrx instruction, which loads a little endian word from memory. A simple asm statement wrapped in an inline function seemed like the simplest way to do this.

The lwbrx instruction comes with a minor limitation. It is only available in X-form, that is, the effective address is formed by adding the values of two register operands. Normal load instructions also have a D-form, which computes the effective address by adding an immediate offset to a register operand.

This means that my asm statement cannot use a normal “m” constraint for the memory operand, as this would allow GCC to use D-form addressing, which this instruction does not allow. I thus go in search of a special constraint to request X-form. GCC inline assembler supports a number of machine-specific constraints to cover situations like this one. To my dismay, the manual makes no mention of a suitable contraint to use.

Not giving up hope, I head for Google. Google always has answers. Almost always. None of the queries I can think of return a useful result. My quest finally comes to an end with the GCC machine description for PowerPC. This cryptic file suggests an (undocumented) “Z” constraint might work.

My first attempt at using the newly discovered “Z” constraint fails. The compiler still generates D-form address operands. Another examination of the machine description provides the answer. When referring to the operand, I must use %y0 in place of the usual %0. Needless to say, documentation explaining this syntax is nowhere to be found.

After spending the better part of an hour on a task I expected to take no more than five minutes, I finally arrive at a working solution:

static inline uint32_t load_le32(const uint32_t *p)
{
    uint32_t v;
    asm ("lwbrx %0, %y1" : "=r"(v) : "Z"(*p));
    return v;
}

CodeSourcery’s defence

Tuesday, 14th October, 2008 - 3:22 am | ARM, Bugs, Compilers

CodeSourcery GCC 2008q3: FAIL

Saturday, 11th October, 2008 - 1:48 pm | ARM, Bugs, Compilers

1 Comment

A few days ago, CodeSourcery released their latest version of GCC for ARM, dubbed 2008q3. An announcement email boasts “Improved support for NEON and, in particular, auto-vectorization using NEON.” It is time to put that claim to the test.

FFmpeg has a history of triggering compiler bugs, making it a good test case. Some extra speed would do it good as well.

The new compiler builds FFmpeg without complaint, so everything is looking good so far. To check for any speedup from the improved compiler, I use an Indiana Jones trailer encoded with H.264. Disappointingly, I am unable to get any speed figures. The decoding stops after 160 frames, the immediate cause being an unaligned NEON load in simple loop for copying a few bytes.

Is FFmpeg broken? The same code built with an older compiler release works perfectly, and the parameters passed to the failing function are similar-looking. The answer must lie in the copy loop itself. To verify this hypothesis, I set out to reproduce the error with a minimal test case.

The failure proves remarkably simple to trigger. The test case I arrive at consists of two C source files. The first file is our copy loop:

void copy(char *dst, char *src, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src[i];
}

The second file is our main() function, invoking the copy with suitably unaligned arguments:

extern void copy(char *dst, char *src, int len);
char src[20], dst[16];

int main(void)
{
    char *p = src + !((unsigned)src & 1);
    copy(dst, p, 16);
    return 0;
}

Compiling this with -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -O3 flags results in a broken executable. Adding -fno-tree-vectorize makes the error go away.

So much for the improved auto-vectorisation.

Not testing every compiler on FFmpeg is understandable. Not testing even the most trivial of constructs is unforgivable.

Category Archives: Compilers

Shared library woes and the price of PIC

CodeSourcery fails again

GCC inline asm annoyance

CodeSourcery’s defence

CodeSourcery GCC 2008q3: FAIL

Recent Posts

Recent Comments

Categories

Archives

Meta