ARM compiler shoot-out

A proper comparison of different compilers targeting ARM is long overdue, so I decided to do my part. I compiled FFmpeg using a selection of compilers, and measured the speed of the result when decoding various media samples. Since we are testing compilers, I disabled all hand-written assembler. The tests were run on a Beagle board clocked at 600 MHz.

These are the compilers I deemed worthy to participate in the test and the optimisation flags I used with each:

GCC 4.3.3, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
GCC 4.4.1, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
CodeSourcery GCC 2007q3 (based on 4.2.1), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-tree-vectorize
CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
ARM RVCT 4.0 Build 591 (GCC compatibility enabled), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros

I would have also included the ARM compiler from Texas Instruments, had it been able to compile FFmpeg.

With sample files chosen to exercise various types of code, the result of the test is, sadly, no surprise. The following table lists the CPU time each build needed to decode each sample, relative to the CodeSourcery 2007q3 build. Lower numbers are better.

Sample name	Codec	Code type	CS 2009q1	GCC 4.3.3	GCC 4.4.1	RVCT 4.0
cathedral	H.264 CABAC	integer	0.97	1.02	1.09	0.93
NeroAVC	H.264 CABAC	integer	0.98	1.02	1.12	0.95
indiana_jones_4	H.264 CAVLC	integer	0.97	1.02	1.09	0.89
NeroRecodeSample	MPEG-4 ASP	integer	0.96	1.03	1.27	0.96
Silent_Light	MP3	64-bit integer	0.89	0.88	0.97	0.44
When_I_Grow_Up	FLAC	integer	0.98	0.98	0.93	0.86
Lumme-Badloop	Vorbis	float	1.03	1.03	1.02	0.97
Canyon	AC-3	float	1.02	1.02	0.99	0.90
lotr	DTS	float	1.02	1.02	1.00	1.03
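
Each figure is a ratio: a build's decoding time divided by the 2007q3 build's time for the same sample. A minimal sketch of that arithmetic, plus the conversion to a speed-up factor (the function names are hypothetical, chosen for illustration):

```c
#include <assert.h>

/* Runtime of one build relative to the baseline build for the same
 * sample. Below 1.0 means the build decoded faster than the baseline. */
static double relative_runtime(double build_seconds, double baseline_seconds)
{
    assert(baseline_seconds > 0.0);
    return build_seconds / baseline_seconds;
}

/* Speed-up factor implied by a relative runtime: 0.5 means twice as fast. */
static double speedup(double relative)
{
    assert(relative > 0.0);
    return 1.0 / relative;
}
```

For example, the RVCT entry of 0.44 for the MP3 sample corresponds to a speed-up of 1/0.44, roughly 2.27, over 2007q3, while 4.4.1's 1.27 on the MPEG-4 ASP sample means 27% more CPU time than the baseline.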

Looking at the table, I make these observations:

CodeSourcery 2009q1 produces faster integer code, but slower floating-point code, than 2007q3.
GCC 4.4.1 produces much slower code than 4.3.3 in several cases, and is never significantly better.
CodeSourcery GCC generally beats FSF GCC.
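ARM RVCT readily beats every GCC version, with two exceptions: it ties with CodeSourcery 2009q1 on the MPEG-4 ASP sample and is marginally slower than all of them on DTS. The MP3 figure is not a typo.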

My recommendation for a free compiler is CodeSourcery 2009q1 unless your code makes heavy use of floating-point, in which case 2007q3 may give better results. If you prefer, for whatever reason, official GNU releases, 4.3.3 should be the version of choice. Avoid GCC 4.4.1; it is far too unpredictable.

Bootnotes

See also Mike’s test of x86 compilers.
Thanks to ARM for providing the RVCT compiler.
Thanks to TI for providing the Beagle board.

16 Responses to ARM compiler shoot-out

Erik Rainey says:

Wednesday, 5th August, 2009 at 2:30 am

Firstly, I work for TI, so my opinion is biased. In my experience the TI compiler is very good when it supports the ARM revision that you are using. I found that it can dramatically decrease code size and can really leverage the unique features of the ARM instruction set. I find it surprising that it didn’t compile the code. Are there specific errors that you can post? (Caveat: I don’t work on the compiler myself, so I can’t directly fix these issues, but I might be able to find the people who can.)
- Mans says:
  
  Wednesday, 5th August, 2009 at 8:25 am
  
  I have already reported a number of errors to TI, and the compiler team is working on fixing them.
Vitor says:

Wednesday, 5th August, 2009 at 3:44 am

Wow, does this performance difference for MP3 remain when compiling with asm enabled, or does it all come down to badly compiled MULx() macros?
- Mans says:
  
  Wednesday, 5th August, 2009 at 8:32 am
  
  With inline asm enabled, the difference for MP3 is more in line with the other tests. I don’t remember the exact numbers.
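
For context, the MULx() macros Vitor refers to wrap signed 32x32 to 64-bit multiplies, the core operation of a fixed-point MP3 decoder and presumably why the table lists MP3 as "64-bit integer". A sketch of generic C versions of that pattern follows; the names mul_high, mul_shift and mac64 are hypothetical and are not FFmpeg's actual definitions. On ARM each can map to a single SMULL or SMLAL instruction, but only if the compiler recognises the widening multiply, and the reply above suggests that is where much of the MP3 gap comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers illustrating the 32x32 -> 64-bit multiply pattern.
 * Right-shifting a negative signed value is implementation-defined in C;
 * GCC documents it as an arithmetic (sign-extending) shift. */

/* High 32 bits of the signed 64-bit product; ideally one SMULL on ARM. */
static inline int32_t mul_high(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* 64-bit product shifted right by s bits, for fixed-point rescaling. */
static inline int64_t mul_shift(int32_t a, int32_t b, int s)
{
    assert(s >= 0 && s < 64);
    return ((int64_t)a * b) >> s;
}

/* Multiply-accumulate into a 64-bit sum; ideally one SMLAL on ARM. */
static inline int64_t mac64(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * b;
}
```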
Mark Mitchell says:

Wednesday, 5th August, 2009 at 3:56 am

What compilation options did you use with each of these compilers?
- Mans says:
  
  Wednesday, 5th August, 2009 at 8:42 am
  
  I’ve updated the post with the flags used.
kert says:

Wednesday, 5th August, 2009 at 3:22 pm

Does anyone know whether optimisation flags beyond -Os now work reliably for ARM code?
A year or so back, -O2 produced a very weird assembler-level mess here and there in the code (a large C++ library), and the advice I got from newsgroups was basically: don't use anything beyond -Os.
mh says:

Tuesday, 11th August, 2009 at 2:12 am

I doubt that the RVCT options you quoted were the same ones you actually used. It looks more like a copy-and-paste issue to me.

Did you use --vectorize with RVCT? Otherwise you might not be able to take full advantage of NEON. The ability of the compiler to vectorise code depends quite a bit on the structure of the C code, though.
- Mans says:
  
  Tuesday, 11th August, 2009 at 2:26 am
  
  I enabled GCC compatibility in RVCT, so those options are exactly what I used. I’m not sure which RVCT-native options they map to.
  
  I did not use --vectorize for the simple reason that RVCT fails to compile FFmpeg with vectorisation enabled.
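
mh's point about code structure can be illustrated with two loops (hypothetical functions, not FFmpeg code). The first has independent iterations, unit-stride access and restrict-qualified pointers, the shape auto-vectorisers such as GCC's -ftree-vectorize or RVCT's --vectorize are generally built to handle; the second carries a dependence from one iteration to the next, which blocks straightforward vectorisation. Whether a particular compiler version actually emits NEON code for the first loop is not guaranteed. Note that none of the benchmark builds vectorised anything: the GCC builds used -fno-tree-vectorize and RVCT was run without --vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Independent iterations, unit stride, and restrict-qualified pointers
 * (the caller promises the arrays do not overlap): the loop shape that
 * auto-vectorisers are generally built to recognise. */
static void add_arrays(int32_t *restrict dst, const int32_t *restrict a,
                       const int32_t *restrict b, size_t n)
{
    assert(dst != a && dst != b);   /* partial check of the no-overlap promise */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Each result depends on the previous one (a loop-carried dependence),
 * so the iterations cannot simply be spread across vector lanes. */
static void prefix_sum(int32_t *dst, const int32_t *a, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
        dst[i] = acc;
    }
}
```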
asn says:

Wednesday, 12th August, 2009 at 1:38 pm

Hi,
How did you measure the performance? Have you put a Linux distro on the board?
- Mans says:
  
  Wednesday, 12th August, 2009 at 2:02 pm
  
  Yes, I’m running Linux on the board.
Fredrik says:

Wednesday, 19th August, 2009 at 11:06 am

Hi,
How did you measure the performance?
- Mans says:
  
  Wednesday, 19th August, 2009 at 12:37 pm
  
  I measured the CPU time needed to decode the samples.
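
Measuring CPU time rather than wall-clock time leaves out time the process spends waiting, for example on I/O or while other processes run. A minimal sketch of the idea using the standard C clock() function, assuming the decode can be invoked as a function; cpu_seconds and fake_decode are hypothetical names, and this is not necessarily how the figures above were obtained.

```c
#include <assert.h>
#include <time.h>

/* Processor time, as reported by the standard clock() function, spent
 * running fn(arg). Here fn stands in for decoding one complete sample. */
static double cpu_seconds(void (*fn)(void *), void *arg)
{
    clock_t start = clock();
    assert(start != (clock_t)-1);   /* clock() returns -1 if unavailable */
    fn(arg);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Hypothetical placeholder for a decoder run: burns CPU for a given
 * number of iterations. volatile stops the loop being optimised away. */
static void fake_decode(void *arg)
{
    unsigned long iterations = *(const unsigned long *)arg;
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < iterations; i++)
        acc += i;
}
```

In practice, running the decoder under the shell's time command and reading the user and sys figures gives the same kind of measurement without writing any code.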
salixman says:

Thursday, 18th November, 2010 at 11:13 am

How do I know which codecs use integer and which use floating-point arithmetic? Is this documented somewhere?
- Mans says:
  
  Friday, 19th November, 2010 at 8:39 am
  
  Video codecs never use floating-point. Audio codecs can be either fixed-point or floating-point.
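
To illustrate the distinction: a fixed-point codec stores fractional values as scaled integers and does its arithmetic with integer instructions, while a floating-point codec uses float or double. A minimal sketch in Q15 format (16-bit integers scaled by 2^15); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 fixed point: a value v in [-1, 1) is stored as the integer v * 32768. */
typedef int16_t q15_t;

/* Q15 x Q15 gives a Q30 product in 32 bits; shifting right by 15 returns
 * to Q15. Only integer instructions are involved. (The shift of a negative
 * product relies on an arithmetic right shift, which GCC guarantees.) */
static q15_t q15_mul(q15_t a, q15_t b)
{
    assert(!(a == INT16_MIN && b == INT16_MIN));   /* +1.0 is not representable */
    return (q15_t)(((int32_t)a * b) >> 15);
}

static float q15_to_float(q15_t q)
{
    return q / 32768.0f;
}

/* The same operation as a floating-point codec would perform it. */
static float float_mul(float a, float b)
{
    return a * b;
}
```

Both paths compute 0.5 x 0.25 = 0.125: q15_mul(16384, 8192) yields 4096, which is 0.125 in Q15. The fixed-point version gets there with an integer multiply and a shift, which is why fixed-point decoders such as the MP3 one count as integer code in the table.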
Pingback: A few things iOS developers ought to know about the ARM architecture « AVATAR.Dev - iOS Developer Tips, Tricks and Tutorials.
