CodeSourcery fails again

The bug I discovered in CodeSourcery’s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.

Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.

All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.

The test case looks strikingly familiar:

extern unsigned char dst[512] __attribute__((aligned(8)));
extern unsigned char src[512] __attribute__((aligned(8)));

void array_shift(void)
{
    int i;
    for (i = 0; i < 512; i++)
        dst[i] = src[i] >> 7;
}

The aligned(8) attribute is not required to trigger the bug; it merely removes some clutter from the generated assembler. Slightly edited for readability, the assembler output from the compiler looks like this:

array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        vmov.i32    d17, #249  @ v8qi
        mov         r1, #0
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3]
        cmp         r1, #512
        vshl.u8     d16, d16, d17
        vstr        d16, [r2]
        bne         .L2
        bx          lr

The vectoriser has done its job and decided to use NEON vector operations to process 8 elements in parallel. The mysterious-looking constant 249 is simply the 8-bit representation of -7. The error is in using the vmov.i32 instruction, which writes an immediate value into all 32-bit elements of the destination register. Using the resulting vector as the shift amount with the vshl.u8, which operates on vectors of 8-bit data, clearly will not work as intended. Only one in four elements of the array will be shifted, the rest being copied unchanged. The v8qi annotation next to the incorrect instruction is of particular interest. It indicates that the compiler in fact intended to create an 8-element vector of 8-bit values. The translation of this operation into an assembler instruction seems to have gone horribly wrong. A vmov.i8 instruction would have been correct.

As an experiment, I changed arrays to unsigned short, i.e. 16-bit, elements. This is what the compiler produced:

array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        mov         r1, #0
        vldr        d17, .L6
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3, #0]
        cmp         r1, #1024
        vshl.u16    d16, d16, d17
        vstr        d16, [r2, #0]
        bne         .L2
        bx          lr
.L7:
        .align      3
.L6:
        .short      -7
        .short      0
        .short      -7
        .short      0

The immediate operand of the vmov instruction is limited to 8 bits, so the compiler has decided to load the constant vector from a literal pool following the function. The constant it has placed there is perfectly analogous to the flawed value from the first test: the 16-bit representation of -7 zero-extended into a vector of 32-bit elements.

Finally, I replaced the right shift with a left shift. To my astonishment, the compiler generated the correct vmov.i8 instruction (with a constant of +7). It even repeated this feat with 16-bit arrays.

CodeSourcery insist they subject every compiler release to an extensive test suite. Evidently it does not extend to cover the right shift operator.

Bookmark the permalink.

4 Responses to CodeSourcery fails again

  1. Pingback: mv daten /dev/null

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.