ARM compiler shoot-out, round 2

In my recent test of ARM compilers, I had to leave out Texas Instruments’ compiler since it failed to build FFmpeg. Since then, the TI compiler team has been busy fixing bugs, and a snapshot I was given for testing was able to build enough of a somewhat patched FFmpeg that I can now present round two of this shoot-out.

The contenders this time were the fastest GCC variant from round one, ARM RVCT, and newcomer TI TMS470. The rules were the same as last time; the exact versions and optimisation options were as follows:

CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros
TI TMS470 4.7.0-a9229, --float_support=vfpv3 -mv=7a8 -O3 -mf=5

To keep things fair, I also left the vectoriser disabled for the TI compiler. The table below lists the decoding times for the sample files, this time normalised against the participating GCC compiler. Remember, smaller numbers are better. Also keep in mind that this test was done with a development snapshot of TMS470, not an approved release.

Sample name	Codec	Code type	GCC	RVCT	TI
cathedral	H.264 CABAC	integer	1.00	0.95	1.02
NeroAVC	H.264 CABAC	integer	1.00	0.96	1.05
indiana_jones_4	H.264 CAVLC	integer	1.00	0.92	1.02
NeroRecodeSample	MPEG-4 ASP	integer	1.00	1.01	1.08
Silent_Light	MP3	64-bit integer	1.00	0.48	0.72
When_I_Grow_Up	FLAC	integer	1.00	0.87	0.93
Lumme-Badloop	Vorbis	float	1.00	0.94	1.05
Canyon	AC-3	float	1.00	0.88	1.01
lotr	DTS	float	1.00	1.00	1.08

Overall, the TI TMS470 compiler comes off slightly worse than GCC: it was slower on seven of the nine samples. In the other two, MP3 and FLAC, it was significantly faster than GCC, though not as fast as RVCT. Incidentally, those were also the two samples where RVCT scored its biggest wins over GCC.
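The summary above can be checked against the table. This small C sketch copies the normalised ratios from the table, counts how often each compiler was slower than GCC (ratio above 1.00), and finds each compiler's two biggest wins (the two smallest ratios). The array and function names are hypothetical and used for illustration only.

```c
#include <stddef.h>

/* Normalised decoding times from the table (GCC = 1.00). Sample order:
 * cathedral, NeroAVC, indiana_jones_4, NeroRecodeSample,
 * Silent_Light, When_I_Grow_Up, Lumme-Badloop, Canyon, lotr. */
static const double rvct[] = {0.95, 0.96, 0.92, 1.01, 0.48, 0.87, 0.94, 0.88, 1.00};
static const double ti[]   = {1.02, 1.05, 1.02, 1.08, 0.72, 0.93, 1.05, 1.01, 1.08};
enum { NSAMPLES = sizeof ti / sizeof ti[0] };

/* Number of samples where the compiler was slower than GCC (ratio > 1.00). */
static size_t count_slower_than_gcc(const double *r, size_t n)
{
    size_t slower = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i] > 1.0)
            slower++;
    return slower;
}

/* Indices of the two smallest ratios, i.e. the two biggest wins over GCC.
 * Requires n >= 2. */
static void two_best(const double *r, size_t n, size_t *best, size_t *second)
{
    *best = 0;
    for (size_t i = 1; i < n; i++)
        if (r[i] < r[*best])
            *best = i;
    *second = (*best == 0) ? 1 : 0;
    for (size_t i = 0; i < n; i++)
        if (i != *best && r[i] < r[*second])
            *second = i;
}
```

With these numbers, TI is slower than GCC on 7 of the 9 samples. For both TI and RVCT, the two smallest ratios are at indices 4 and 5: Silent_Light (MP3) and When_I_Grow_Up (FLAC).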

My conclusions from this test are twofold:

ARM’s own compiler is very hard to beat. They do seem to know how their chips work.
GCC is incredibly bad at 64-bit arithmetic on 32-bit machines.
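The MP3 sample, where GCC fell furthest behind, is listed as 64-bit integer code. Fixed-point audio decoding of that kind multiplies 32-bit values and accumulates the products in 64 bits. The following minimal sketch shows the pattern in plain C; it is not FFmpeg's actual decoder code, and the function name is hypothetical. Each loop iteration needs only one 32x32->64 signed multiply-accumulate, which ARMv7 provides as the single SMLAL instruction. How close a compiler gets to that is where its 64-bit code generation shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products of 32-bit samples and 32-bit coefficients, accumulated
 * in 64 bits. Casting one operand to int64_t makes C widen the other too,
 * so each term is a 32x32->64 signed product; on ARMv7 the multiply and
 * the 64-bit add can be done together by one SMLAL instruction. */
int64_t dot32_wide(const int32_t *x, const int32_t *c, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)x[i] * c[i];
    return acc;
}
```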

The logical next step is to test these compilers with vectorisation enabled. FFmpeg should offer plenty of opportunities for this feature to shine. Unfortunately, that test will have to wait until the RVCT vectoriser is fixed. The current release does not compile FFmpeg with vectorisation enabled.


12 Responses to ARM compiler shoot-out, round 2

veryzhang says:

Friday, 21st August, 2009 at 3:22 am

Does that mean the gcc standard library for ARM is not fully optimized for 64-bit integer arithmetic?
- Mans says:
  
  Friday, 21st August, 2009 at 7:21 am
  
  It is not library code that is slow; it is the ARM code generated by gcc from the C code that is bad. One thing gcc often does when doing 64-bit computations on a 32-bit machine is to set a register to zero (the upper half of a 32-bit number converted to 64-bit), then multiply something by it. It shouldn’t take much to realise that multiplying by zero produces zero, and that part of the calculation can be dropped.
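The wasted work described here becomes visible when a 64-bit multiply is written out in 32-bit pieces. In this minimal sketch, a general 64x64->64 product needs one widening multiply plus two cross products. When both operands are unsigned 32-bit values converted to 64-bit, their upper halves are zero, so the cross products are always zero and only the widening multiply remains. The function names are hypothetical. The instruction names in the comments are a rough mapping to ARM, not actual compiler output.

```c
#include <stdint.h>

/* General 64x64->64 multiply from 32-bit halves, as a 32-bit CPU must do it.
 * With a = ah*2^32 + al and b = bh*2^32 + bl, modulo 2^64:
 *   a*b = al*bl + ((ah*bl + al*bh) << 32)
 * The ah*bh term is shifted out of the result entirely. */
uint64_t mul64_generic(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);
    uint64_t low   = (uint64_t)al * bl;  /* 32x32->64, roughly UMULL */
    uint32_t cross = ah * bl + al * bh;  /* 32x32->32 products, roughly MUL + MLA */
    return low + ((uint64_t)cross << 32);
}

/* Both operands zero-extended from 32 bits: ah == bh == 0, the cross
 * products are always zero, and one widening multiply is all that is needed. */
uint64_t mul32x32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```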
Reimar says:

Tuesday, 25th August, 2009 at 1:01 pm

s/incredibly bad/useless/
Really, you just can’t use gcc to do multiplications larger than the native size if speed or code size are in any way relevant.
Anton Korobeynikov says:

Tuesday, 10th November, 2009 at 12:57 pm

Why are you using the softfp ABI for the benchmarks? The hardware FP ABI is much better suited to such applications and might yield a noticeable speedup.

PS: Could you please also include LLVM in the tests?
- Mans says:
  
  Tuesday, 10th November, 2009 at 1:08 pm
  
  Most of the compilers support only softfp ABI. I compared soft and hard with gcc-csl 2009q1, and there was very little difference. FFmpeg passes floats as arguments or return values in very few places.
  
  I’m planning a new round soon, and I’d be happy to include LLVM, if only I could figure out how to configure it as a cross-compiler.
  - Anton Korobeynikov says:
    
    Thursday, 12th November, 2009 at 1:18 pm
    
    Build llvm-gcc as you would a normal gcc. There is even a script to use CodeSourcery-provided binutils as a “bootstrap” toolchain.
    - Mans says:
      
      Thursday, 12th November, 2009 at 1:36 pm
      
      I see nothing usual about building llvm-gcc. There isn’t even a configure script.
      - Anton Korobeynikov says:
        
        Thursday, 12th November, 2009 at 9:19 pm
        
        Huh, how so? There is definitely one:
        http://llvm.org/viewvc/llvm-project/llvm-gcc-4.2/trunk/configure
        
        Make sure you checked everything out properly. README.LLVM is also a good thing to read before building.
        
        There is a script for almost automatic build of cross-compilers. See http://llvm.org/viewvc/llvm-project/llvm/trunk/utils/crosstool/ARM/README
        
        Mans says:
        
        Thursday, 12th November, 2009 at 9:27 pm
        
        That is most definitely not what was in the tarball I downloaded. The only instructions I found involved a complicated procedure combining parts from an llvm base tarball with some llvm-gcc bits, and none of it made much sense, so I gave up.
        
        Anton Korobeynikov says:
        
        Friday, 13th November, 2009 at 1:12 pm
        
        Sorry, I really have no idea what you’ve downloaded. Release tarballs contain all the code from the SVN repository. For ARM you might really want to check out the code from SVN, since a bunch of things were fixed or improved after the 2.6 release.
Marcus Harnisch says:

Wednesday, 18th November, 2009 at 9:19 am

Just as a reminder: Unlike gcc, armcc optimizes for space by default. Please also specify -Otime to make sure it directs its effort to execution time. Please check this out, too. No guarantees, though.
- Mans says:
  
  Wednesday, 18th November, 2009 at 12:25 pm
  
  I already said I’m using --translate_gcc, which maps -O3 to -O3 -Otime.
