<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hardwarebug &#187; Bugs</title>
	<atom:link href="http://hardwarebug.org/category/bugs/feed/" rel="self" type="application/rss+xml" />
	<link>http://hardwarebug.org</link>
	<description>Everything is broken</description>
	<lastBuildDate>Tue, 18 Oct 2011 00:41:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Pointer peril</title>
		<link>http://hardwarebug.org/2011/10/18/pointer-peril/</link>
		<comments>http://hardwarebug.org/2011/10/18/pointer-peril/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 00:26:00 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Optimisation]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=587</guid>
		<description><![CDATA[Use of pointers in the C programming language is subject to a number of constraints, violation of which results in the dreaded undefined behaviour. If a situation with undefined behaviour occurs, anything is permitted to happen. The program may produce unexpected results, crash, or demons may fly out of the user&#8217;s nose. Some of these [...]]]></description>
			<content:encoded><![CDATA[<p>Use of pointers in the C programming language is subject to a number of constraints, violation of which results in the dreaded <em>undefined behaviour</em>. If a situation with undefined behaviour occurs, anything is permitted to happen. The program may produce unexpected results, crash, or demons may fly out of the user&#8217;s nose.</p>
<p>Some of these rules concern pointer arithmetic, addition and subtraction in which one or both operands are pointers. The C99 specification spells it out in section 6.5.6:</p>
<blockquote><div class="frame-outer small">
<div style="text-align: left;">
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. [&hellip;] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. [&hellip;]<br />
<br/>When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
</div>
</div>
</blockquote>
<p>In simpler, if less accurate, terms, operands and results of pointer arithmetic must be within the same array object. If not, anything can happen.<br />
<span id="more-587"></span><br />
To see some of this undefined behaviour in action, consider the following example.</p>
<blockquote><div class="frame-outer small">
<pre style="text-align: left; margin: 0;">
#include &lt;stdio.h&gt;

int foo(void)
{
    int a, b;
    int d = &amp;b - &amp;a; /* undefined */
    int *p = &amp;a;
    b = 0;
    p[d] = 1;        /* undefined */
    return b;
}

int main(void)
{
    printf("%d\n", foo());
    return 0;
}
</pre>
</div>
</blockquote>
<p>This program breaks the above rules twice. Firstly, the <code>&amp;b - &amp;a</code> calculation is undefined because the pointers being subtracted do not point to elements of the same array.  Most compilers will nonetheless evaluate this to the distance between the two variables on the stack.  Secondly, accessing <code>p[d]</code> is undefined because <code>p</code> and <code>p + d</code> do not point to elements of the same array (unless the result of the first undefined expression happened to be zero).</p>
<p>It might be tempting to assume that on a modern system with a single, flat address space, these operations would result in the intuitively obvious outcomes, ultimately setting <code>b</code> to the value 1 and returning this same value.  However, undefined is undefined, and the compiler is free to do whatever it wants:</p>
<blockquote><div class="frame-outer small">
<pre style="text-align: left; margin: 0;">
$ gcc -O undef.c
$ ./a.out
0
</pre>
</div>
</blockquote>
<p>Even on a perfectly normal system, the program, when compiled with optimisation enabled, behaves as though the write to <code>p[d]</code> were ignored.  In fact, this is exactly what happened, as this test shows:</p>
<blockquote><div class="frame-outer small">
<pre style="text-align: left; margin: 0;">
$ gcc -O -fno-tree-pta undef.c
$ ./a.out
1
</pre>
</div>
</blockquote>
<p>Disabling the <a href="http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html#index-ftree_002dpta-802">tree-pta optimisation</a> in gcc gives us back the intuitive behaviour.  PTA stands for points-to analysis, which means the compiler analyses which objects any pointers can validly access.  In the example, the pointer <code>p</code>, having been set to <code>&amp;a</code>, cannot be used in a valid access to the variable <code>b</code>, since <code>a</code> and <code>b</code> are not part of the same array.  Between the assignment <code>b = 0</code> and the return statement, no valid access to <code>b</code> takes place, whence the return value is derived to be zero.  The entire function is, in fact, reduced to the assembly equivalent of a simple <code>return 0</code> statement, all because we decided to violate a couple of language rules.</p>
<p>While this example is obviously contrived for clarity, bugs rooted in these rules occur in real programs from time to time.  My most recent encounter with one was in <a href="http://pari.math.u-bordeaux.fr/cgi-bin/bugreport.cgi?bug=1237">PARI/GP</a>, where a somewhat more complicated <a href="http://pari.math.u-bordeaux.fr/cgi-bin/gitweb.cgi?p=pari.git;a=blob;f=src/headers/pariinl.h;h=4b0680a27b7615df56f84b54b16a15986db9b82e;hb=HEAD#l590">incarnation</a> of the example above can be found.  Unfortunately, the maintainers of this program are not responsive to reports of such bad practices in their code:</p>
<blockquote><div class="frame-outer small">
<div style="text-align: left;">
Undefined according to what rule? The code is only requiring the adress space to be flat which is true on all supported platforms.
</div>
</div>
</blockquote>
<p>The rule in question is, of course, the one quoted above.  Since the standard makes no exception for flat address spaces, no such exception exists.  Although the behaviour could be logically defined in this case, it is not, and all programs must still follow the rules.  Filing <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49140">bug reports</a> against the compiler will not make them go away.  As of this writing, the issue remains unresolved.</p>
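<p>For contrast, here is a minimal sketch of the earlier example rewritten to stay within the rules. The function name <code>foo_defined</code> and the array <code>v</code> are made up for illustration and are not taken from PARI/GP or any other real code. The only change is that the two values live in one array, so both the subtraction and the indexed store are defined.</p>

```c
#include <stddef.h>

/* Hypothetical rewrite of foo(): v[0] plays the role of a, v[1] of b. */
int foo_defined(void)
{
    int v[2];
    ptrdiff_t d = &v[1] - &v[0]; /* defined: both operands point into v */
    int *p = &v[0];
    v[1] = 0;
    p[d] = 1;                    /* defined: p + d still points into v */
    return v[1];
}
```

<p>Because <code>p + d</code> is a valid pointer to <code>v[1]</code>, the compiler has to assume the store may change it, and the function returns 1 at any optimisation level.</p>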
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2011/10/18/pointer-peril/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Shared library woes and the price of PIC</title>
		<link>http://hardwarebug.org/2009/01/02/shared-library-woes-and-the-price-of-pic/</link>
		<comments>http://hardwarebug.org/2009/01/02/shared-library-woes-and-the-price-of-pic/#comments</comments>
		<pubDate>Fri, 02 Jan 2009 18:28:53 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>
		<category><![CDATA[Optimisation]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=100</guid>
		<description><![CDATA[It recently came to my attention that the GNU linker on ARM lacks support for several relocation types in shared libraries. Specifically, code using MOVW/MOVT instruction pairs to load the address of data symbols will not work in a shared library. The linker silently drops the necessary relocations, resulting in a runtime crash. When I [...]]]></description>
			<content:encoded><![CDATA[<p>It recently came to my attention that the GNU linker on ARM lacks support for several relocation types in shared libraries. Specifically, code using <code>MOVW/MOVT</code> instruction pairs to load the address of data symbols will not work in a shared library. The linker silently drops the necessary relocations, resulting in a runtime crash.</p>
<p>When I pointed out this shortcoming to Paul Brook of CodeSourcery, his response was that such relocations in shared libraries are not supported by the GNU tools, will never be, and that shared libraries should be built with position-independent code (PIC). This is an unfortunate attitude, and doubly so considering that the latest CodeSourcery GCC version will generate these instructions with default settings. In other words, the 2008q3 release of CodeSourcery GCC will, with default flags, build crashing shared libraries without so much as a warning.</p>
<p>The refusal to support non-PIC shared libraries is unfortunate also from a performance point of view. Position independent code is inherently slower than normal code.</p>
<p>In order to find out just how much slower PIC is on ARM, I made two builds of FFmpeg, one normal and one with PIC. The PIC build is about 1.7% slower in several tests, among them H.264 video decoding.</p>
<p>On typically resource-constrained ARM systems it would be nice to have the option of space-saving shared libraries without paying the PIC penalty in performance. Until now this option has been a reality. With CodeSourcery lazily refusing to support the relocations required by the latest version of their own compiler, this option may soon be a thing of the past, at least if the bugs that have haunted recent compiler releases are fixed in upcoming versions.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2009/01/02/shared-library-woes-and-the-price-of-pic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CodeSourcery fails again</title>
		<link>http://hardwarebug.org/2008/11/28/codesourcery-fails-again/</link>
		<comments>http://hardwarebug.org/2008/11/28/codesourcery-fails-again/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 00:19:49 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=83</guid>
		<description><![CDATA[The bug I discovered in CodeSourcery&#8217;s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive. Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/">bug</a> I discovered in CodeSourcery&#8217;s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.</p>
<p>Since last time, I have updated the <a href="http://ffmpeg.org/">FFmpeg</a> regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.</p>
<p>All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.</p>
<p>The test case looks strikingly familiar:</p>
<blockquote>
<pre>extern unsigned char dst[512] __attribute__((aligned(8)));
extern unsigned char src[512] __attribute__((aligned(8)));

void array_shift(void)
{
    int i;
    for (i = 0; i &lt; 512; i++)
        dst[i] = src[i] &gt;&gt; 7;
}</pre>
</blockquote>
<p><span id="more-83"></span>The <code>aligned(8)</code> attribute is not required to trigger the bug; it merely removes some clutter from the generated assembler. Slightly edited for readability, the assembler output from the compiler looks like this:</p>
<blockquote>
<pre>array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        vmov.i32    d17, #249  @ v8qi
        mov         r1, #0
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3]
        cmp         r1, #512
        vshl.u8     d16, d16, d17
        vstr        d16, [r2]
        bne         .L2
        bx          lr</pre>
</blockquote>
<p>The vectoriser has done its job and decided to use NEON vector operations to process 8 elements in parallel. The mysterious-looking constant 249 is simply the 8-bit representation of -7. The error is in using the <code>vmov.i32</code> instruction, which writes an immediate value into all <strong>32-bit</strong> elements of the destination register. Using the resulting vector as the shift amount with the <code>vshl.u8</code> instruction, which operates on vectors of <strong>8-bit</strong> data, clearly will not work as intended. Only one in four elements of the array will be shifted, the rest being copied unchanged. The <code>v8qi</code> annotation next to the incorrect instruction is of particular interest. It indicates that the compiler in fact intended to create an 8-element vector of 8-bit values. The translation of this operation into an assembler instruction seems to have gone horribly wrong. A <code>vmov.i8</code> instruction would have been correct.</p>
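<p>The effect can be modelled in plain C. The following is only a sketch of the lane arithmetic, not of the real instructions: the helper names <code>splat_i32</code>, <code>splat_i8</code> and <code>shift_lanes</code> are invented for illustration, and the model assumes that each 8-bit lane is shifted by the signed value of the corresponding byte of the count vector, with negative counts shifting right.</p>

```c
#include <stdint.h>
#include <string.h>

/* Model of one 8-bit lane: the count byte is read as a signed value,
   and a negative count shifts right instead of left. */
static uint8_t shl_u8(uint8_t x, uint8_t count)
{
    int n = count >= 128 ? (int)count - 256 : (int)count;
    if (n <= -8 || n >= 8)
        return 0;
    return n < 0 ? (uint8_t)(x >> -n) : (uint8_t)(x << n);
}

/* Model of vmov.i32: the immediate fills each 32-bit element,
   so only the lowest byte of every four holds the value. */
static void splat_i32(uint8_t d[8], uint32_t imm)
{
    for (int i = 0; i < 8; i++)
        d[i] = (uint8_t)(imm >> (8 * (i % 4)));
}

/* Model of vmov.i8: the immediate fills every byte. */
static void splat_i8(uint8_t d[8], uint8_t imm)
{
    memset(d, imm, 8);
}

/* Model of vshl.u8 on one 64-bit register: eight independent lanes. */
static void shift_lanes(uint8_t out[8], const uint8_t in[8],
                        const uint8_t count[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = shl_u8(in[i], count[i]);
}
```

<p>With every input byte set to 0x80, a count vector built by <code>splat_i32(count, 249)</code> turns only lanes 0 and 4 into 1 and leaves the other six at 0x80, whereas <code>splat_i8(count, 249)</code> shifts all eight lanes to 1.</p>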
<p>As an experiment, I changed arrays to <code>unsigned short</code>, i.e. 16-bit, elements. This is what the compiler produced:</p>
<blockquote>
<pre>array_shift:
        movw        ip, #:lower16:dst
        movw        r0, #:lower16:src
        movt        ip, #:upper16:dst
        movt        r0, #:upper16:src
        mov         r1, #0
        vldr        d17, .L6
.L2:
        add         r2, ip, r1
        add         r3, r0, r1
        add         r1, r1, #8
        vldr        d16, [r3, #0]
        cmp         r1, #1024
        vshl.u16    d16, d16, d17
        vstr        d16, [r2, #0]
        bne         .L2
        bx          lr
.L7:
        .align      3
.L6:
        .short      -7
        .short      0
        .short      -7
        .short      0</pre>
</blockquote>
<p>The immediate operand of the <code>vmov</code> instruction is limited to 8 bits, so the compiler has decided to load the constant vector from a literal pool following the function. The constant it has placed there is perfectly analogous to the flawed value from the first test: the 16-bit representation of -7 zero-extended into a vector of 32-bit elements.</p>
<p>Finally, I replaced the right shift with a left shift. To my astonishment, the compiler generated the correct <code>vmov.i8</code> instruction (with a constant of +7). It even repeated this feat with 16-bit arrays.</p>
<p>CodeSourcery insist they subject every compiler release to an extensive test suite. Evidently it does not extend to cover the right shift operator.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/11/28/codesourcery-fails-again/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CodeSourcery&#8217;s defence</title>
		<link>http://hardwarebug.org/2008/10/14/codesourcerys-defence/</link>
		<comments>http://hardwarebug.org/2008/10/14/codesourcerys-defence/#comments</comments>
		<pubDate>Tue, 14 Oct 2008 03:22:50 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=21</guid>
		<description><![CDATA[Having covered the spectacular failure of CodeSourcery&#8217;s latest ARM compiler a few days ago, I was engaged in a curious debate on IRC with one of their employees. Fiercely denying the problem at first, he eventually offered an explanation: they do not test the compiler output on real hardware; they use QEMU. QEMU is a [...]]]></description>
			<content:encoded><![CDATA[<p>Having covered the <a href="http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/">spectacular failure</a> of CodeSourcery&#8217;s latest ARM compiler a few days ago, I was engaged in a curious <a href="http://www.beagleboard.org/irclogs/index.php?date=2008-10-11#T11:31:19">debate on IRC</a> with one of their employees. Fiercely denying the problem at first, he eventually offered an explanation: they do not test the compiler output on real hardware; they use <a href="http://bellard.org/qemu/">QEMU</a>.</p>
<p>QEMU is a CPU emulator supporting a variety of targets. While great for casual development, and for running foreign applications, it is certainly no substitute for real hardware when testing a compiler. Like any piece of software, an emulator is bound to have a few errors, and as it happens, QEMU has known bugs in its handling of the NEON instruction set. Our friend at CodeSourcery should be well aware of these, also being a QEMU developer.</p>
<p>The use of emulators was explained as a necessity due to real hardware not being available. To be fair, CodeSourcery does develop against new hardware before it exists, so some reliance on emulators is unavoidable. This is, however, not the case this time. The <a href="http://elinux.org/BeagleBoard">Beagleboard</a> was made available to selected developers quite some time ago (I have had one since May, others still longer), and is now being sold by the thousands. CodeSourcery developers, so I am told, were also given an offer of a free board, an offer they chose to refuse.</p>
<p>What does all this mean? Did Murphy decide to inflict maximum bad luck on the hard-working developers, or is there perhaps a larger conspiracy at work? I shall not attempt to speculate in this matter. I will merely repeat this excellent piece of advice given by Robert J. Hanlon: <em>Never attribute to malice that which can be adequately explained by incompetence.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/10/14/codesourcerys-defence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CodeSourcery GCC 2008q3: FAIL</title>
		<link>http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/</link>
		<comments>http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/#comments</comments>
		<pubDate>Sat, 11 Oct 2008 13:48:23 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Bugs]]></category>
		<category><![CDATA[Compilers]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=11</guid>
		<description><![CDATA[A few days ago, CodeSourcery released their latest version of GCC for ARM, dubbed 2008q3. An announcement email boasts &#8220;Improved support for NEON and, in particular, auto-vectorization using NEON.&#8221; It is time to put that claim to the test. FFmpeg has a history of triggering compiler bugs, making it a good test case. Some extra [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago, CodeSourcery released their latest version of GCC for ARM, dubbed 2008q3. An <a href="http://www.codesourcery.com/archives/arm-gnu-announce/msg00024.html">announcement email</a> boasts &#8220;Improved support for NEON and, in particular, auto-vectorization using NEON.&#8221; It is time to put that claim to the test.</p>
<p><a href="http://ffmpeg.org/">FFmpeg</a> has a history of triggering compiler bugs, making it a good test case. Some extra speed would do it good as well.</p>
<p>The new compiler builds FFmpeg without complaint, so everything is looking good so far. To check for any speedup from the improved compiler, I use an <a href="http://movies.apple.com/movies/paramount/indiana_jones_4/indiana_jones_4-tlr3_h640w.mov">Indiana Jones trailer</a> encoded with H.264. Disappointingly, I am unable to get any speed figures. The decoding stops after 160 frames, the immediate cause being an unaligned NEON load in a simple loop for copying a few bytes.</p>
<p>Is FFmpeg broken? The same code built with an older compiler release works perfectly, and the parameters passed to the failing function are similar-looking. The answer must lie in the copy loop itself. To verify this hypothesis, I set out to reproduce the error with a minimal test case.</p>
<p>The failure proves remarkably simple to trigger. The test case I arrive at consists of two C source files. The first file is our copy loop:</p>
<blockquote><pre>void copy(char *dst, char *src, int len)
{
    int i;
    for (i = 0; i &lt; len; i++)
        dst[i] = src[i];
}</pre>
</blockquote>
<p>The second file is our <code>main()</code> function, invoking the copy with suitably unaligned arguments:</p>
<blockquote><pre>extern void copy(char *dst, char *src, int len);
char src[20], dst[16];

int main(void)
{
    char *p = src + !((unsigned)src &amp; 1);
    copy(dst, p, 16);
    return 0;
}</pre>
</blockquote>
<p>Compiling this with <code>-mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -O3</code> flags results in a broken executable. Adding <code>-fno-tree-vectorize</code> makes the error go away.</p>
<p>So much for the improved auto-vectorisation.</p>
<p>Not testing every compiler on FFmpeg is understandable. Not testing even the most trivial of constructs is unforgivable.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/10/11/codesourcery-gcc-2008q3-fail/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://movies.apple.com/movies/paramount/indiana_jones_4/indiana_jones_4-tlr3_h640w.mov" length="16215526" type="video/quicktime" />
		</item>
		<item>
		<title>F00F C7C8</title>
		<link>http://hardwarebug.org/2008/10/07/f00f-c7c8/</link>
		<comments>http://hardwarebug.org/2008/10/07/f00f-c7c8/#comments</comments>
		<pubDate>Tue, 07 Oct 2008 21:27:50 +0000</pubDate>
		<dc:creator>Mans</dc:creator>
				<category><![CDATA[Bugs]]></category>

		<guid isPermaLink="false">http://hardwarebug.org/?p=4</guid>
		<description><![CDATA[Possibly the most well-known CPU bug in modern times, the Intel Pentium F00F bug will serve well as the topic of this first post. Read all about it at http://x86.ddj.com/errata/dec97/f00fbug.htm.]]></description>
			<content:encoded><![CDATA[<p>Possibly the most well-known CPU bug in modern times, the Intel Pentium F00F bug will serve well as the topic of this first post.</p>
<p>Read all about it at <a href="http://x86.ddj.com/errata/dec97/f00fbug.htm">http://x86.ddj.com/errata/dec97/f00fbug.htm</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://hardwarebug.org/2008/10/07/f00f-c7c8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

