The bug I discovered in CodeSourcery’s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.
Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.
All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.
The test case looks strikingly familiar:
extern unsigned char dst[512] __attribute__((aligned(8))); extern unsigned char src[512] __attribute__((aligned(8))); void array_shift(void) { int i; for (i = 0; i < 512; i++) dst[i] = src[i] >> 7; }
The aligned(8) attribute is not required to trigger the bug; it merely removes some clutter from the generated assembler. Slightly edited for readability, the assembler output from the compiler looks like this:
array_shift: movw ip, #:lower16:dst movw r0, #:lower16:src movt ip, #:upper16:dst movt r0, #:upper16:src vmov.i32 d17, #249 @ v8qi mov r1, #0 .L2: add r2, ip, r1 add r3, r0, r1 add r1, r1, #8 vldr d16, [r3] cmp r1, #512 vshl.u8 d16, d16, d17 vstr d16, [r2] bne .L2 bx lr
The vectoriser has done its job and decided to use NEON vector operations to process 8 elements in parallel. The mysterious-looking constant 249 is simply the 8-bit representation of -7. The error is in using the vmov.i32 instruction, which writes an immediate value into all 32-bit elements of the destination register. Using the resulting vector as the shift amount with the vshl.u8, which operates on vectors of 8-bit data, clearly will not work as intended. Only one in four elements of the array will be shifted, the rest being copied unchanged. The v8qi annotation next to the incorrect instruction is of particular interest. It indicates that the compiler in fact intended to create an 8-element vector of 8-bit values. The translation of this operation into an assembler instruction seems to have gone horribly wrong. A vmov.i8 instruction would have been correct.
As an experiment, I changed arrays to unsigned short, i.e. 16-bit, elements. This is what the compiler produced:
array_shift: movw ip, #:lower16:dst movw r0, #:lower16:src movt ip, #:upper16:dst movt r0, #:upper16:src mov r1, #0 vldr d17, .L6 .L2: add r2, ip, r1 add r3, r0, r1 add r1, r1, #8 vldr d16, [r3, #0] cmp r1, #1024 vshl.u16 d16, d16, d17 vstr d16, [r2, #0] bne .L2 bx lr .L7: .align 3 .L6: .short -7 .short 0 .short -7 .short 0
The immediate operand of the vmov instruction is limited to 8 bits, so the compiler has decided to load the constant vector from a literal pool following the function. The constant it has placed there is perfectly analogous to the flawed value from the first test: the 16-bit representation of -7 zero-extended into a vector of 32-bit elements.
Finally, I replaced the right shift with a left shift. To my astonishment, the compiler generated the correct vmov.i8 instruction (with a constant of +7). It even repeated this feat with 16-bit arrays.
CodeSourcery insist they subject every compiler release to an extensive test suite. Evidently it does not extend to cover the right shift operator.
Pingback: mv daten /dev/null