<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hardwarebug &#187; Bugs</title>
	<atom:link href="http://hardwarebug.org/category/bugs/feed/" rel="self" type="application/rss+xml" />
	<link>http://hardwarebug.org</link>
	<description>Everything is broken</description>
	<lastBuildDate>Tue, 17 Aug 2010 14:47:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Shared library woes and the price of PIC</title>
		<link>http://hardwarebug.org/2009/01/02/shared-library-woes-and-the-price-of-pic/</link>
		<comments>http://hardwarebug.org/2009/01/02/shared-library-woes-and-the-price-of-pic/#comments</comments>
		<pubDate>Fri, 02 Jan 2009 18:28:53 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Optimisation]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=100</guid>
		<description><![CDATA[It recently came to my attention that the GNU linker on ARM lacks support for several relocation types in shared libraries. Specifically, code using MOVW/MOVT instruction pairs to load the address of data symbols will not work in a shared library. The linker silently drops the necessary relocations, resulting in a runtime crash. When I [...]]]></description>
			<content:encoded><![CDATA[<p>It recently came to my attention that the GNU linker on ARM lacks support for several relocation types in shared libraries. Specifically, code using <code>MOVW/MOVT</code> instruction pairs to load the address of data symbols will not work in a shared library. The linker silently drops the necessary relocations, resulting in a runtime crash.</p>
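<p>To make the relocation problem concrete, here is a small C model of what a <code>MOVW/MOVT</code> pair does. It is only a sketch: the function names <code>model_movw</code>, <code>model_movt</code> and <code>model_load_address</code> are made up for illustration, and the model ignores instruction encoding entirely. The point is that both halves of the address are immediates baked into the instructions themselves, so if the library ends up at a different address, something has to patch them. That is the job of the relocations the linker drops.</p>
<blockquote>
<pre>#include &lt;stdint.h&gt;

/* MOVW: write a 16-bit immediate to the low half, clearing the high half. */
static uint32_t model_movw(uint16_t imm16)
{
    return imm16;
}

/* MOVT: write a 16-bit immediate to the high half, keeping the low half. */
static uint32_t model_movt(uint32_t reg, uint16_t imm16)
{
    return (reg &amp; 0xffffu) | ((uint32_t)imm16 &lt;&lt; 16);
}

/* Build a full 32-bit address the way the instruction pair does. */
static uint32_t model_load_address(uint32_t addr)
{
    uint32_t reg = model_movw((uint16_t)(addr &amp; 0xffffu)); /* #:lower16: */
    return model_movt(reg, (uint16_t)(addr &gt;&gt; 16));        /* #:upper16: */
}</pre>
</blockquote>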
<p>When I pointed out this shortcoming to Paul Brook of CodeSourcery, his response was that such relocations in shared libraries are not supported by the GNU tools, will never be, and that shared libraries should be built with position-independent code (PIC). This is an unfortunate attitude, and doubly so considering that the latest CodeSourcery GCC version will generate these instructions with default settings. In other words, the 2008q3 release of CodeSourcery GCC will, with default flags, build crashing shared libraries without so much as a warning.</p>
<p>The refusal to support non-PIC shared libraries is also unfortunate from a performance point of view. Position-independent code is inherently slower than normal code.</p>
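<p>The cost comes from how global data is reached. As a rough illustration (the names <code>counter</code> and <code>bump</code> are hypothetical), take a function that touches a global variable. In a normal build the compiler can embed the address of <code>counter</code> directly in the code. In a PIC build it typically has to find the address at run time, usually through an extra load from the global offset table. Compiling the file with <code>-S</code>, once with and once without <code>-fPIC</code>, and comparing the output shows the difference. The exact instructions depend on the compiler version and flags.</p>
<blockquote>
<pre>/* A global with external linkage: in a shared library its final
   address is not known until the library is loaded. */
int counter;

/* Each call must first obtain the address of counter.
   Non-PIC: the address is a constant in the instruction stream.
   PIC: the address is typically fetched via the GOT at run time. */
int bump(void)
{
    return ++counter;
}</pre>
</blockquote>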
<p>In order to find out just how much slower PIC is on ARM, I made two builds of FFmpeg, one normal and one with PIC. The PIC build is about 1.7% slower in several tests, among them H.264 video decoding.</p>
<p>On typically resource-constrained ARM systems it would be nice to have the option of space-saving shared libraries without paying the PIC penalty in performance. Until now this option has been a reality. With CodeSourcery lazily refusing to support the relocations required by the latest version of their own compiler, this option may soon be a thing of the past, at least if the bugs that have haunted recent compiler releases are fixed in upcoming versions.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2009/01/02/shared-library-woes-and-the-price-of-pic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CodeSourcery fails again</title>
		<link>http://hardwarebug.org/2008/11/28/codesourcery-fails-again/</link>
		<comments>http://hardwarebug.org/2008/11/28/codesourcery-fails-again/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 00:19:49 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=83</guid>
		<description><![CDATA[The bug I discovered in CodeSourcery&#8217;s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive. Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/">bug</a> I discovered in CodeSourcery&#8217;s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.</p>
<p>Since last time, I have updated the <a href="http://ffmpeg.org/">FFmpeg</a> regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.</p>
<p>All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.</p>
<p>The test case looks strikingly familiar:</p>
<blockquote>
<pre>extern unsigned char dst[512] __attribute__((aligned(8)));
extern unsigned char src[512] __attribute__((aligned(8)));

void array_shift(void)
{
    int i;
    for (i = 0; i &lt; 512; i++)
        dst[i] = src[i] &gt;&gt; 7;
}</pre>
</blockquote>
<p><span id="more-83"></span>The <code>aligned(8)</code> attribute is not required to trigger the bug; it merely removes some clutter from the generated assembler. Slightly edited for readability, the assembler output from the compiler looks like this:</p>
<blockquote>
<pre>array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        vmov.i32    d17, #249  @ v8qi
        mov         r1, #0
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3]
        cmp         r1, #512
        vshl.u8     d16, d16, d17
        vstr        d16, [r2]
        bne         .L2
        bx          lr</pre>
</blockquote>
<p>The vectoriser has done its job and decided to use NEON vector operations to process 8 elements in parallel. The mysterious-looking constant 249 is simply the 8-bit representation of -7 (<code>vshl</code> treats a negative shift count as a right shift). The error is in using the <code>vmov.i32</code> instruction, which writes an immediate value into all <strong>32-bit</strong> elements of the destination register. Using the resulting vector as the shift amount with the <code>vshl.u8</code> instruction, which operates on vectors of <strong>8-bit</strong> data, clearly will not work as intended. Since the three upper bytes of each 32-bit element are zero, only one in four elements of the array will be shifted, the rest being copied unchanged. The <code>v8qi</code> annotation next to the incorrect instruction is of particular interest. It indicates that the compiler in fact intended to create an 8-element vector of 8-bit values. The translation of this operation into an assembler instruction seems to have gone horribly wrong. A <code>vmov.i8</code> instruction would have been correct.</p>
<p>As an experiment, I changed the arrays to <code>unsigned short</code>, i.e. 16-bit, elements. This is what the compiler produced:</p>
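<p>The effect is easy to reproduce in plain C. The following is a simplified model, not the real instructions: the helper names are invented, and it assumes the little-endian byte layout of the register and the <code>vshl</code> behaviour of taking each lane's shift count as a signed byte, with negative counts shifting right.</p>
<blockquote>
<pre>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

/* vmov.i32 dN, #imm: two 32-bit lanes, each holding imm (little-endian). */
static void model_vmov_i32(uint8_t d[8], uint8_t imm)
{
    memset(d, 0, 8);
    d[0] = imm;
    d[4] = imm;
}

/* vmov.i8 dN, #imm: eight 8-bit lanes, each holding imm. */
static void model_vmov_i8(uint8_t d[8], uint8_t imm)
{
    memset(d, imm, 8);
}

/* vshl.u8: shift each byte by the signed count in the matching byte
   of the shift register; a negative count is a right shift. */
static void model_vshl_u8(uint8_t dst[8], const uint8_t src[8],
                          const uint8_t shift[8])
{
    int i;
    for (i = 0; i &lt; 8; i++) {
        /* Reinterpret the byte as a signed value in -128..127. */
        int count = shift[i] &lt; 128 ? shift[i] : shift[i] - 256;
        if (count &lt;= -8 || count &gt;= 8)
            dst[i] = 0;
        else if (count &lt; 0)
            dst[i] = (uint8_t)(src[i] &gt;&gt; -count);
        else
            dst[i] = (uint8_t)(src[i] &lt;&lt; count);
    }
}</pre>
</blockquote>
<p>Filling the shift register with <code>model_vmov_i32(shift, 249)</code> leaves only bytes 0 and 4 non-zero, so <code>model_vshl_u8</code> shifts elements 0 and 4 and copies the other six. With <code>model_vmov_i8(shift, 249)</code> all eight elements are shifted right by 7.</p>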
<blockquote>
<pre>array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        mov         r1, #0
        vldr        d17, .L6
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3, #0]
        cmp         r1, #1024
        vshl.u16    d16, d16, d17
        vstr        d16, [r2, #0]
        bne         .L2
        bx          lr
.L7:
        .align      3
.L6:
        .short      -7
        .short      0
        .short      -7
        .short      0</pre>
</blockquote>
<p>The immediate operand of the <code>vmov</code> instruction is limited to 8 bits, so the compiler has decided to load the constant vector from a literal pool following the function. The constant it has placed there is perfectly analogous to the flawed value from the first test: the 16-bit representation of -7 zero-extended into a vector of 32-bit elements.</p>
<p>Finally, I replaced the right shift with a left shift. To my astonishment, the compiler generated the correct <code>vmov.i8</code> instruction (with a constant of +7). It even repeated this feat with 16-bit arrays.</p>
<p>CodeSourcery insist they subject every compiler release to an extensive test suite. Evidently it does not extend to cover the right shift operator.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/11/28/codesourcery-fails-again/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CodeSourcery&#8217;s defence</title>
		<link>http://hardwarebug.org/2008/10/14/codesourcerys-defence/</link>
		<comments>http://hardwarebug.org/2008/10/14/codesourcerys-defence/#comments</comments>
		<pubDate>Tue, 14 Oct 2008 03:22:50 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=21</guid>
		<description><![CDATA[Having covered the spectacular failure of CodeSourcery&#8217;s latest ARM compiler a few days ago, I was engaged in a curious debate on IRC with one of their employees. Fiercely denying the problem at first, he eventually offered an explanation: they do not test the compiler output on real hardware; they use QEMU. QEMU is a [...]]]></description>
			<content:encoded><![CDATA[<p>Having covered the <a href="http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/">spectacular failure</a> of CodeSourcery&#8217;s latest ARM compiler a few days ago, I was engaged in a curious <a href="http://www.beagleboard.org/irclogs/index.php?date=2008-10-11#T11:31:19">debate on IRC</a> with one of their employees. Fiercely denying the problem at first, he eventually offered an explanation: they do not test the compiler output on real hardware; they use <a href="http://bellard.org/qemu/">QEMU</a>.</p>
<p>QEMU is a CPU emulator supporting a variety of targets. While great for casual development, and for running foreign applications, it is certainly no substitute for real hardware when testing a compiler. Like any piece of software, an emulator is bound to have a few errors, and as it happens, QEMU has known bugs in its handling of the NEON instruction set. Our friend at CodeSourcery should be well aware of these, also being a QEMU developer.</p>
<p>The use of emulators was explained as a necessity due to real hardware not being available. To be fair, CodeSourcery does develop against new hardware before it exists, so some reliance on emulators is unavoidable. This is, however, not the case this time. The <a href="http://elinux.org/BeagleBoard">Beagleboard</a> was made available to selected developers quite some time ago (I have had one since May, others still longer), and is now being sold by the thousands. CodeSourcery developers, so I am told, were also given an offer of a free board, an offer they chose to refuse.</p>
<p>What does all this mean? Did Murphy decide to inflict maximum bad luck on the hard-working developers, or is there perhaps a larger conspiracy at work? I shall not attempt to speculate in this matter. I will merely repeat this excellent piece of advice given by Robert J. Hanlon: <em>Never attribute to malice that which can be adequately explained by incompetence.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/10/14/codesourcerys-defence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CodeSourcery GCC 2008q3: FAIL</title>
		<link>http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/</link>
		<comments>http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/#comments</comments>
		<pubDate>Sat, 11 Oct 2008 13:48:23 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=11</guid>
		<description><![CDATA[A few days ago, CodeSourcery released their latest version of GCC for ARM, dubbed 2008q3. An announcement email boasts &#8220;Improved support for NEON and, in particular, auto-vectorization using NEON.&#8221; It is time to put that claim to the test. FFmpeg has a history of triggering compiler bugs, making it a good test case. Some extra [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago, CodeSourcery released their latest version of GCC for ARM, dubbed 2008q3. An <a href="http://www.codesourcery.com/archives/arm-gnu-announce/msg00024.html">announcement email</a> boasts &#8220;Improved support for NEON and, in particular, auto-vectorization using NEON.&#8221; It is time to put that claim to the test.</p>
<p><a href="http://ffmpeg.org/">FFmpeg</a> has a history of triggering compiler bugs, making it a good test case. Some extra speed would do it good as well.</p>
<p>The new compiler builds FFmpeg without complaint, so everything is looking good so far. To check for any speedup from the improved compiler, I use an <a href="http://movies.apple.com/movies/paramount/indiana_jones_4/indiana_jones_4-tlr3_h640w.mov">Indiana Jones trailer</a> encoded with H.264. Disappointingly, I am unable to get any speed figures. The decoding stops after 160 frames, the immediate cause being an unaligned NEON load in a simple loop for copying a few bytes.</p>
<p>Is FFmpeg broken? The same code built with an older compiler release works perfectly, and the parameters passed to the failing function are similar-looking. The answer must lie in the copy loop itself. To verify this hypothesis, I set out to reproduce the error with a minimal test case.</p>
<p>The failure proves remarkably simple to trigger. The test case I arrive at consists of two C source files. The first file is our copy loop:</p>
<blockquote><pre>void copy(char *dst, char *src, int len)
{
    int i;
    for (i = 0; i &lt; len; i++)
        dst[i] = src[i];
}</pre>
</blockquote>
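<p>The second file is our <code>main()</code> function, invoking the copy with a suitably unaligned source pointer. The expression below adds one to <code>src</code> whenever its address is even, so <code>p</code> is always odd:</p>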
<blockquote><pre>extern void copy(char *dst, char *src, int len);
char src[20], dst[16];

int main(void)
{
    char *p = src + !((unsigned)src &amp; 1);
    copy(dst, p, 16);
    return 0;
}</pre>
</blockquote>
<p>Compiling this with <code>-mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -O3</code> flags results in a broken executable. Adding <code>-fno-tree-vectorize</code> makes the error go away.</p>
<p>So much for the improved auto-vectorisation.</p>
<p>Not testing every compiler on FFmpeg is understandable. Not testing even the most trivial of constructs is unforgivable.</p>
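<p>For comparison, a vectorised copy has to either use loads that tolerate any address or peel off leading bytes until the source is suitably aligned. The following is a minimal sketch of the peeling approach in portable C. It is not what the compiler generates: the name <code>copy_peeled</code> is invented, the 8-byte alignment is an assumption, and a fixed-size <code>memcpy</code> stands in for a 64-bit vector load and store.</p>
<blockquote>
<pre>#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

void copy_peeled(char *dst, const char *src, int len)
{
    int i = 0;

    /* Peel: copy single bytes until src + i is 8-byte aligned. */
    while (i &lt; len &amp;&amp; ((uintptr_t)(src + i) &amp; 7) != 0) {
        dst[i] = src[i];
        i++;
    }

    /* Main loop: the source is now aligned, move 8 bytes at a time. */
    for (; i + 8 &lt;= len; i += 8) {
        uint64_t chunk;
        memcpy(&amp;chunk, src + i, 8);
        memcpy(dst + i, &amp;chunk, 8);
    }

    /* Tail: copy whatever is left byte by byte. */
    for (; i &lt; len; i++)
        dst[i] = src[i];
}</pre>
</blockquote>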
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://movies.apple.com/movies/paramount/indiana_jones_4/indiana_jones_4-tlr3_h640w.mov" length="16215526" type="video/quicktime" />
		</item>
		<item>
		<title>F00F C7C8</title>
		<link>http://hardwarebug.org/2008/10/07/f00f-c7c8/</link>
		<comments>http://hardwarebug.org/2008/10/07/f00f-c7c8/#comments</comments>
		<pubDate>Tue, 07 Oct 2008 21:27:50 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[Bugs]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=4</guid>
		<description><![CDATA[Possibly the most well-known CPU bug in modern times, the Intel Pentium F00F bug will serve well as the topic of this first post. Read all about it at http://x86.ddj.com/errata/dec97/f00fbug.htm.]]></description>
			<content:encoded><![CDATA[<p>Possibly the most well-known CPU bug in modern times, the Intel Pentium F00F bug will serve well as the topic of this first post.</p>
<p>Read all about it at <a href="http://x86.ddj.com/errata/dec97/f00fbug.htm">http://x86.ddj.com/errata/dec97/f00fbug.htm</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/10/07/f00f-c7c8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
