<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: GCC makes a mess</title>
	<atom:link href="http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/feed/" rel="self" type="application/rss+xml" />
	<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/</link>
	<description>Everything is broken</description>
	<lastBuildDate>Mon, 30 Aug 2010 09:33:38 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Michael Kostylev</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-483</link>
		<dc:creator>Michael Kostylev</dc:creator>
		<pubDate>Tue, 01 Dec 2009 17:22:47 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-483</guid>
		<description>ColdFire (-mcpu=5475) is somewhat supported; r20684 passes 308/310.</description>
		<content:encoded><![CDATA[<p>ColdFire (-mcpu=5475) is somewhat supported; r20684 passes 308/310.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ami_stuff</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-71</link>
		<dc:creator>ami_stuff</dc:creator>
		<pubDate>Sun, 07 Jun 2009 12:42:20 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-71</guid>
		<description>Mans, could you try writing code for the 68060? That way we would know whether the slowdown comes from slow GCC-generated asm, or whether it is normal without a hardware 32x32-&gt;64 multiply and nothing can be done short of redesigning the mpegaudio decoder. Thanks</description>
		<content:encoded><![CDATA[<p>Mans, could you try writing code for the 68060? That way we would know whether the slowdown comes from slow GCC-generated asm, or whether it is normal without a hardware 32&#215;32-&gt;64 multiply and nothing can be done short of redesigning the mpegaudio decoder. Thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bernd_afa</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-63</link>
		<dc:creator>Bernd_afa</dc:creator>
		<pubDate>Thu, 21 May 2009 10:53:06 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-63</guid>
		<description>I tested this with
-O3 -m68020
(-m68040 gives the same result)

int64_t MULH(int a, int b)
{
	 return ((int64_t)(a) * (int64_t)(b))&gt;&gt;32;
}

main(int argc, char *argv[])
{

printf (&quot;%ld\n&quot;,MULH(argc,(long)argv));
}

The code for the 68060 is very inefficient because there is no asm macro for it in longlong.h.
The instruction MULS.L  D4,D2:D6 is also not supported by the UAE JIT and is executed slowly by the interpreter.

I think gcc&#039;s longlong.h is missing code for the 68060.
I see the ColdFire code, but it is very complex. Isn&#039;t there an easier way to get a 32 bit * 32 bit -&gt; 64 bit result?

gcc 3.4.0
MOVE.L  8(A5),D7      
MOVE.L  $C(A5),D4     
BSR.L   ___main      ;
MOVE.L  D7,D6         
MULS.L  D4,D2:D6      
MOVE.L  D2,-(A7)      
SMI     D0            
EXTB.L  D0            
MOVE.L  D0,-(A7)      
PEA     _MAC64+$2E(PC)
LEA     _printf,A3   ;


gcc 4.3.2

MOVE.L  $C(A5),D1          
MULS.L  8(A5),D0:D1        
MOVE.L  D0,-(A7)           
SMI     D2                 
EXTB.L  D2                 
MOVE.L  D2,-(A7)           
PEA     _time_delay+$F8(PC)
LEA     _printf,A3   ;10F94

gcc 4.4.0

MOVE.L  8(A5),D2            
MOVE.L  $C(A5),D3           
JSR     ___main      ;10FD50
MOVE.L  D3,D1               
MULS.L  D2,D0:D1            
MOVE.L  D0,-(A7)            
SMI     D2                  
EXTB.L  D2                  
MOVE.L  D2,-(A7)            
PEA     _time_delay+$FA(PC) 
LEA     _printf,A3   ;10FD53

now with -m68060

MOVE.L  8(A5),-(A7)         
SMI     D0                  
EXTB.L  D0                  
MOVE.L  D0,-(A7)            
MOVE.L  $C(A5),-(A7)        
SMI     D2                  
EXTB.L  D2                  
MOVE.L  D2,-(A7)            
JSR     ___muldi3    ;110504
LEA     $10(A7),A7          
MOVE.L  D0,-(A7)            
SMI     D2                  
EXTB.L  D2                  
MOVE.L  D2,-(A7)            
PEA     _time_delay+$FA(PC) 
LEA     _printf,A3   ;110508
JSR     (A3)                


......

___muldi3                                     
110504A8: MOVE.L  A5,-(A7)                    
110504AA: MOVEA.L A7,A5                       
110504AC: MOVEM.L D2-D7/A2,-(A7)              
110504B0: MOVE.L  $C(A5),D5                   
110504B4: MOVE.L  $14(A5),D6                  
110504B8: MOVEA.L 8(A5),A2                    
110504BC: MOVE.L  $10(A5),D7                  
110504C0: MOVE.L  D5,D0                       
110504C2: MOVE.L  D6,D1                       
110504C4: MOVE.L  D0,D2                       
110504C6: SWAP    D0                          
110504C8: MOVE.L  D1,D3                       
110504CA: SWAP    D1                          
110504CC: MOVE    D2,D4                       
110504CE: MULU    D3,D4                       
110504D0: MULU    D1,D2                       
110504D2: MULU    D0,D3                       
110504D4: MULU    D0,D1                       
110504D6: MOVE.L  D4,D0                       
110504D8: EOR     D0,D0                       
110504DA: SWAP    D0                          
110504DC: ADD.L   D0,D2                       
110504DE: ADD.L   D3,D2                       
110504E0: BCC.S   ___muldi3+$40 ;110504E8     
110504E2: ADDI.L  #$10000,D1                  
110504E8: SWAP    D2                          
110504EA: MOVEQ   #0,D0                       
110504EC: MOVE    D2,D0                       
110504EE: MOVE    D4,D2                       
110504F0: MOVEA.L D2,A1                       
110504F2: ADD.L   D1,D0                       
110504F4: MOVEA.L D0,A0                       
110504F6: MOVE.L  A1,D1                       
110504F8: MULS.L  D7,D5                       
110504FC: MOVE.L  A2,D2                       
110504FE: MULS.L  D2,D6                       
11050502: ADD.L   D6,D5                       
11050504: ADD.L   A0,D5                       
11050506: MOVE.L  D5,D0                       
11050508: MOVEM.L (A7)+,D2-D7/A2              
1105050C: UNLK    A5                          
1105050E: RTS</description>
		<content:encoded><![CDATA[<p>I tested this with<br />
-O3 -m68020<br />
(-m68040 gives the same result)</p>
<p>int64_t MULH(int a, int b)<br />
{<br />
	 return ((int64_t)(a) * (int64_t)(b))&gt;&gt;32;<br />
}</p>
<p>main(int argc, char *argv[])<br />
{</p>
<p>printf (&#8220;%ld\n&#8221;,MULH(argc,(long)argv));<br />
}</p>
<p>The code for the 68060 is very inefficient because there is no asm macro for it in longlong.h.<br />
The instruction MULS.L  D4,D2:D6 is also not supported by the UAE JIT and is executed slowly by the interpreter.</p>
<p>I think gcc&#8217;s longlong.h is missing code for the 68060.<br />
I see the ColdFire code, but it is very complex. Isn&#8217;t there an easier way to get a 32 bit * 32 bit -&gt; 64 bit result?</p>
<p>gcc 3.4.0<br />
MOVE.L  8(A5),D7<br />
MOVE.L  $C(A5),D4<br />
BSR.L   ___main      ;<br />
MOVE.L  D7,D6<br />
MULS.L  D4,D2:D6<br />
MOVE.L  D2,-(A7)<br />
SMI     D0<br />
EXTB.L  D0<br />
MOVE.L  D0,-(A7)<br />
PEA     _MAC64+$2E(PC)<br />
LEA     _printf,A3   ;</p>
<p>gcc 4.3.2</p>
<p>MOVE.L  $C(A5),D1<br />
MULS.L  8(A5),D0:D1<br />
MOVE.L  D0,-(A7)<br />
SMI     D2<br />
EXTB.L  D2<br />
MOVE.L  D2,-(A7)<br />
PEA     _time_delay+$F8(PC)<br />
LEA     _printf,A3   ;10F94</p>
<p>gcc 4.4.0</p>
<p>MOVE.L  8(A5),D2<br />
MOVE.L  $C(A5),D3<br />
JSR     ___main      ;10FD50<br />
MOVE.L  D3,D1<br />
MULS.L  D2,D0:D1<br />
MOVE.L  D0,-(A7)<br />
SMI     D2<br />
EXTB.L  D2<br />
MOVE.L  D2,-(A7)<br />
PEA     _time_delay+$FA(PC)<br />
LEA     _printf,A3   ;10FD53</p>
<p>now with -m68060</p>
<p>MOVE.L  8(A5),-(A7)<br />
SMI     D0<br />
EXTB.L  D0<br />
MOVE.L  D0,-(A7)<br />
MOVE.L  $C(A5),-(A7)<br />
SMI     D2<br />
EXTB.L  D2<br />
MOVE.L  D2,-(A7)<br />
JSR     ___muldi3    ;110504<br />
LEA     $10(A7),A7<br />
MOVE.L  D0,-(A7)<br />
SMI     D2<br />
EXTB.L  D2<br />
MOVE.L  D2,-(A7)<br />
PEA     _time_delay+$FA(PC)<br />
LEA     _printf,A3   ;110508<br />
JSR     (A3)                </p>
<p>&#8230;&#8230;</p>
<p>___muldi3<br />
110504A8: MOVE.L  A5,-(A7)<br />
110504AA: MOVEA.L A7,A5<br />
110504AC: MOVEM.L D2-D7/A2,-(A7)<br />
110504B0: MOVE.L  $C(A5),D5<br />
110504B4: MOVE.L  $14(A5),D6<br />
110504B8: MOVEA.L 8(A5),A2<br />
110504BC: MOVE.L  $10(A5),D7<br />
110504C0: MOVE.L  D5,D0<br />
110504C2: MOVE.L  D6,D1<br />
110504C4: MOVE.L  D0,D2<br />
110504C6: SWAP    D0<br />
110504C8: MOVE.L  D1,D3<br />
110504CA: SWAP    D1<br />
110504CC: MOVE    D2,D4<br />
110504CE: MULU    D3,D4<br />
110504D0: MULU    D1,D2<br />
110504D2: MULU    D0,D3<br />
110504D4: MULU    D0,D1<br />
110504D6: MOVE.L  D4,D0<br />
110504D8: EOR     D0,D0<br />
110504DA: SWAP    D0<br />
110504DC: ADD.L   D0,D2<br />
110504DE: ADD.L   D3,D2<br />
110504E0: BCC.S   ___muldi3+$40 ;110504E8<br />
110504E2: ADDI.L  #$10000,D1<br />
110504E8: SWAP    D2<br />
110504EA: MOVEQ   #0,D0<br />
110504EC: MOVE    D2,D0<br />
110504EE: MOVE    D4,D2<br />
110504F0: MOVEA.L D2,A1<br />
110504F2: ADD.L   D1,D0<br />
110504F4: MOVEA.L D0,A0<br />
110504F6: MOVE.L  A1,D1<br />
110504F8: MULS.L  D7,D5<br />
110504FC: MOVE.L  A2,D2<br />
110504FE: MULS.L  D2,D6<br />
11050502: ADD.L   D6,D5<br />
11050504: ADD.L   A0,D5<br />
11050506: MOVE.L  D5,D0<br />
11050508: MOVEM.L (A7)+,D2-D7/A2<br />
1105050C: UNLK    A5<br />
1105050E: RTS</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ami_stuff</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-62</link>
		<dc:creator>ami_stuff</dc:creator>
		<pubDate>Thu, 21 May 2009 01:30:06 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-62</guid>
		<description>So for a 68060 CPU, libmad is the best choice.</description>
		<content:encoded><![CDATA[<p>So for a 68060 CPU, libmad is the best choice.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ami_stuff</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-61</link>
		<dc:creator>ami_stuff</dc:creator>
		<pubDate>Thu, 21 May 2009 01:27:14 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-61</guid>
		<description>Bad news. The whole speedup was because I compared the 68060 build of FFmpeg with the 68040 build. The 68060 build needs to emulate the &quot;muls&quot; instruction, and that is where the speedup came from. Also, the 68040 build generates a different wav file than the 68060 build.

When I compile with your MAC64 &amp; MLS64 functions I get only a 1 sec. speedup.

MULH won&#039;t compile: statement &#039;mul.l (a6),d2:d1&#039; ignored etc.</description>
		<content:encoded><![CDATA[<p>Bad news. The whole speedup was because I compared the 68060 build of FFmpeg with the 68040 build. The 68060 build needs to emulate the &#8220;muls&#8221; instruction, and that is where the speedup came from. Also, the 68040 build generates a different wav file than the 68060 build.</p>
<p>When I compile with your MAC64 &amp; MLS64 functions I get only a 1 sec. speedup.</p>
<p>MULH won&#8217;t compile: statement &#8216;mul.l (a6),d2:d1&#8217; ignored etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mans</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-60</link>
		<dc:creator>Mans</dc:creator>
		<pubDate>Wed, 20 May 2009 23:48:20 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-60</guid>
		<description>Could you try disabling the asm functions one at a time and see which causes the discrepancy?</description>
		<content:encoded><![CDATA[<p>Could you try disabling the asm functions one at a time and see which causes the discrepancy?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ami_stuff</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-59</link>
		<dc:creator>ami_stuff</dc:creator>
		<pubDate>Wed, 20 May 2009 23:10:16 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-59</guid>
		<description>I&#039;m running it on the WinUAE emulator. I hear no difference, but a &quot;find dups&quot; program doesn&#039;t recognize the files decoded with standard FFmpeg and with the asm-optimized build as identical. I can decode a short file and mail it to you if you want, so you can analyze it.</description>
		<content:encoded><![CDATA[<p>I&#8217;m running it on the WinUAE emulator. I hear no difference, but a &#8220;find dups&#8221; program doesn&#8217;t recognize the files decoded with standard FFmpeg and with the asm-optimized build as identical. I can decode a short file and mail it to you if you want, so you can analyze it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mans</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-58</link>
		<dc:creator>Mans</dc:creator>
		<pubDate>Wed, 20 May 2009 23:05:44 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-58</guid>
		<description>What hardware are you running this on? Can you confirm that the decoded output is correct, please?

68060 doesn&#039;t have the 32x32-&gt;64 multiply instruction, but I&#039;m sure it&#039;s not too hard to beat gcc even without it.

The floating point rounding functions are not used in speed-critical places so there is no need to optimise them.</description>
		<content:encoded><![CDATA[<p>What hardware are you running this on? Can you confirm that the decoded output is correct, please?</p>
<p>68060 doesn&#8217;t have the 32&#215;32->64 multiply instruction, but I&#8217;m sure it&#8217;s not too hard to beat gcc even without it.</p>
<p>The floating point rounding functions are not used in speed-critical places so there is no need to optimise them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ami_stuff</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-57</link>
		<dc:creator>ami_stuff</dc:creator>
		<pubDate>Wed, 20 May 2009 22:55:31 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-57</guid>
		<description>It beats libmad by 1 sec. With more optimizations like asm llrint() it will be even faster.

Is there a way to get asm-optimized code for the 68060 build too?</description>
		<content:encoded><![CDATA[<p>It beats libmad by 1 sec. With more optimizations like asm llrint() it will be even faster.</p>
<p>Is there a way to get asm-optimized code for the 68060 build too?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ami_stuff</title>
		<link>http://hardwarebug.org/2009/05/13/gcc-makes-a-mess/comment-page-1/#comment-56</link>
		<dc:creator>ami_stuff</dc:creator>
		<pubDate>Wed, 20 May 2009 22:50:56 +0000</pubDate>
		<guid isPermaLink="false">http://hardwarebug.org/?p=131#comment-56</guid>
		<description>Holy shit! Only 29 sec. now!</description>
		<content:encoded><![CDATA[<p>Holy shit! Only 29 sec. now!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
