CodeSourcery fails again

The bug I discovered in CodeSourcery’s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.

Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.

All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.

The test case looks strikingly familiar:

extern unsigned char dst[512] __attribute__((aligned(8)));
extern unsigned char src[512] __attribute__((aligned(8)));

void array_shift(void)
{
    int i;
    for (i = 0; i < 512; i++)
        dst[i] = src[i] >> 7;
}

The aligned(8) attribute is not required to trigger the bug; it merely removes some clutter from the generated assembler. Slightly edited for readability, the assembler output from the compiler looks like this:

array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        vmov.i32    d17, #249  @ v8qi
        mov         r1, #0
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3]
        cmp         r1, #512
        vshl.u8     d16, d16, d17
        vstr        d16, [r2]
        bne         .L2
        bx          lr

The vectoriser has done its job and decided to use NEON vector operations to process 8 elements in parallel. The mysterious-looking constant 249 is simply the 8-bit representation of -7. The error is in using the vmov.i32 instruction, which writes an immediate value into all 32-bit elements of the destination register. Using the resulting vector as the shift amount with the vshl.u8, which operates on vectors of 8-bit data, clearly will not work as intended. Only one in four elements of the array will be shifted, the rest being copied unchanged. The v8qi annotation next to the incorrect instruction is of particular interest. It indicates that the compiler in fact intended to create an 8-element vector of 8-bit values. The translation of this operation into an assembler instruction seems to have gone horribly wrong. A vmov.i8 instruction would have been correct.

As an experiment, I changed arrays to unsigned short, i.e. 16-bit, elements. This is what the compiler produced:

array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        mov         r1, #0
        vldr        d17, .L6
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3, #0]
        cmp         r1, #1024
        vshl.u16    d16, d16, d17
        vstr        d16, [r2, #0]
        bne         .L2
        bx          lr
.L7:
        .align      3
.L6:
        .short      -7
        .short      0
        .short      -7
        .short      0

The immediate operand of the vmov instruction is limited to 8 bits, so the compiler has decided to load the constant vector from a literal pool following the function. The constant it has placed there is perfectly analogous to the flawed value from the first test: the 16-bit representation of -7 zero-extended into a vector of 32-bit elements.

Finally, I replaced the right shift with a left shift. To my astonishment, the compiler generated the correct vmov.i8 instruction (with a constant of +7). It even repeated this feat with 16-bit arrays.

CodeSourcery insist they subject every compiler release to an extensive test suite. Evidently it does not extend to cover the right shift operator.

Bookmark the permalink.

4 Responses to CodeSourcery fails again

  1. Laurent says:

    That’s disappointing, but expected since NEON support is not that old.

    I wonder how the various buggy tests you find behave in qemu which has so many bugs in its NEON implementation.

  2. Mans —

    Thank you for continuing to exercise Sourcery G++ with FFMPEG.

    Of course, we’re disappointed that you found a defect in the latest
    release.

    (In the future, it would be helpful if you could post any defects to
    our public mailing list here:

    http://www.codesourcery.com/archives/arm-gnu/maillist.html

    so that we are sure to see them. It’s very helpful to have both
    preprocessed source code and command-line options with defect
    reports.)

    We have been able to reproduce this bug, though not with the source
    code you posted. It looks like the case that triggers the defect uses
    “unsigned char” rather than “unsigned short” as in your posting. With
    that change, and the options “-mcpu=cortex-a8 -mfpu=neon -O3
    -mfloat-abi=softfp”, we do see the bug.

    You can work around this bug by using “-fno-tree-vectorize” or “-O2”,
    both of which will disable automatic vectorization. You can use those
    options just on the affected kernels in FFMPEG, while still getting
    the benefits of vectorization elsewhere.

    As you know, we made available to you a preview version of this latest
    release specifically so that you could test FFMPEG with it. You
    reported that:

    > I’ve done some simple testing, and I’m not finding any bugs this
    > time. The test case from before works, and FFmpeg builds and runs
    > correctly.

    We’ve verified that the same bug that you reported here was indeed
    present in that preview release. It sounds like you’ve broadened your
    test coverage since then. Obviously, we’re sympathetic to the problem
    of testing large, complex software packages; it’s very difficult to
    cover all possible cases. We’re continually expanding our test
    coverage as well.

    We run hundreds of thousands (or millions, depending how you count) of
    test cases on every release, both on real hardware (including
    Cortex-A8) and in simulation. Those include the DejaGNU regression
    testsuites that are part of the FSF sources for the tools, as well as
    (depending on the exact release), testsuites and benchmarks like
    Plum-Hall, Dhrystone, EEMBC, and SPEC.

    We maintain a database of failures in these testsuites, and analyze
    new failures to determine their origin. We also do manual testing of
    each release in order to provide a “sanity check”. The pass rates in
    our testing are generally 99.99% or higher, with the remainder of the
    failures explained by vagaries of the test environment.

    The vectorization testsuite in DejaGNU does test the compiler’s
    ability to vectorize a variety of loops, and, further, that the
    vectorized loops run correctly. We know that on NEON our releases
    vectorize the same set of loops from these test cases that are
    vectorized on other platforms with similar vector capabilities. We
    also know that the vectorized loops run correctly on Cortex-A8.
    Unfortunately, the testsuites that we have do not cover the test case
    which you posted here. Therefore, when we fix this defect, we will
    add a new test case to the DejaGNU testsuite, ensuring that the defect
    does not reoccur in future releases.

    Unfortunately, the GNU toolchain does have defects. The tools
    community (in which CodeSourcery is an active participant) attempts to
    correct as many defects as possible — both in the context of
    volunteer and commercial development — but, inevitably, every release
    does have defects. A quick search of the public GCC bug database
    indicates 236 open wrong-code bugs; these are all cases where GCC is
    known to generate incorrect code on a valid input program, as in your
    test case here.

    We will continue to do our best to improve the quality of GCC in
    general, and our releases in particular. We have entered the defect
    you report into our internal defect tracker and expect that it will be
    corrected in a future release.

    Best regards,


    Mark Mitchell
    CodeSourcery

  3. Mans says:

    I accidentally pasted the code with “unsigned short” arrays; this has been corrected. That said, both fail in pretty much the same way, as I explain further on.

    Regarding workarounds, I am well aware that -fno-tree-vectorize will disable the broken optimisation. However, having vectorisation with such severe bugs enabled by default at -O3 is not acceptable in my opinion.

    The “preview release” of which you speak was nothing other than an evaluation download of the commercial package, and it is correct that I did not immediately find any errors in that release. As I said in the email you quoted above, my test case for the other bug passed, and FFmpeg worked correctly in a few simple tests. I have since added the ability to run the full FFmpeg regression test suite on the target board, which enabled me to find the latest bug.

    As for testing, the number of tests, even if inflated into the millions (a number I simply do not believe), is irrelevant. What matters is coverage. In particular, every feature should be exercised at least once during testing. Specifically, if the right-shifting every element in an array is recognised as vectorisable, this should be tested. It is evident that this did not happen.

    To the remainder of your game with test suite names and numbers, I can think of no response more appropriate than the words of the brilliant Dijkstra: Testing shows the presence, not the absence of bugs.

  4. Pingback: mv daten /dev/null

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.