ARM compiler shoot-out, round 2

In my recent test of ARM compilers, I had to leave out Texas Instruments’ compiler since it failed to build FFmpeg. Since then, the TI compiler team has been busy fixing bugs, and a snapshot I was given to test was able to build enough of a somewhat patched FFmpeg that I can now present round two in this shoot-out.

The contenders this time were the fastest GCC variant from round one, ARM RVCT, and newcomer TI TMS470. With the same rules as last time, the exact versions and optimisation options were as follows:

  • CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros
  • TI TMS470 4.7.0-a9229, -float_support=vfpv3 -mv=7a8 -O3 -mf=5

Continue reading

DRM the Big Blue way

A few months ago, I downloaded an evaluation copy of IBM’s XLC compiler to try it out on FFmpeg. The trial licence has now expired, so what better way to spend a few minutes than by cracking it?

The installation script, as expected, copied a number of files into a directory under /opt. More unusually, it also created a small shared library, libxlc101e.so.1, and placed it in /usr/lib. No other files from the installation package were modified, so this must be where the licence is hiding. Without further ado, we proceed to take it apart.
Continue reading

ARM compiler shoot-out

A proper comparison of different compilers targeting ARM is long overdue, so I decided to do my part. I compiled FFmpeg using a selection of compilers, and measured the speed of the result when decoding various media samples. Since we are testing compilers, I disabled all hand-written assembler. The tests were run on a Beagle board clocked at 600 MHz.

These are the compilers I deemed worthy to participate in the test and the optimisation flags I used with each:

  • GCC 4.3.3, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • GCC 4.4.1, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • CodeSourcery GCC 2007q3 (based on 4.2.1), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-tree-vectorize
  • CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros

I would have also included the ARM compiler from Texas Instruments, had it been able to compile FFmpeg.
Continue reading

IJG is back

When FFmpeg released version 0.5 earlier this year, nearly five years had passed since the previous release, during which time the project had attracted frequent criticism for the lack of regular releases. There exists, however, a project whose release interval dwarfs that of FFmpeg. I speak of the Independent JPEG Group’s libjpeg, version 7 of which was recently released after 11 years of silence.

So what have they been doing during the last 11 years? Not a lot, it seems. The only change log entry I find noteworthy is the addition of arithmetic entropy coding, previously omitted due to patent concerns. Contrast this with the TO DO note from the previous release:

The major thrust for v7 will probably be improvement of visual quality. The current method for scaling the quantization tables is known not to be very good at low Q values. We also intend to investigate block boundary smoothing, “poor man’s variable quantization”, and other means of improving quality-vs-file-size performance without sacrificing compatibility.

In future versions, we are considering supporting some of the upcoming JPEG Part 3 extensions — principally, variable quantization and the SPIFF file format.

As always, speeding things up is of great interest.

Eleven years is of course plenty of time for the developers to change their minds, or perhaps even lose them. The TO DO note in version 7 reads thus:
Continue reading

GCC makes a mess

Following up on a report about FFmpeg being slower at MPEG audio decoding than MAD, I compared the speed of the two decoders on a few machines. FFmpeg came out somewhat ahead of MAD on most of my test systems, with the exception of 32-bit PowerPC. On PPC, MAD was nearly twice as fast as FFmpeg, suggesting something was going badly wrong in the compilation.

A session with oprofile exposes multiplication as the root of the problem. The MPEG audio decoder in FFmpeg includes many operations of the form a += b * c where b and c are 32 bits in size and a is 64-bit. 64-bit maths on a 32-bit CPU is not handled well by GCC, even when good hardware support is available. A couple of examples compiled with GCC 4.3.3 illustrate this.
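For readers unfamiliar with the pattern, here is a minimal sketch in C of the kind of operation described above. It is not taken from FFmpeg; the function names mac32 and dot32 are hypothetical and exist only for illustration. The key detail is that one operand is widened to 64 bits before the multiplication, so the full 32x32->64 product is computed rather than a truncated 32-bit one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a += b * c with a 64-bit accumulator and
 * 32-bit operands. The cast widens b first, so the product is the
 * full 64-bit result of the 32x32 multiplication. */
static int64_t mac32(int64_t a, int32_t b, int32_t c)
{
    return a + (int64_t)b * c;
}

/* Hypothetical example of the pattern in a loop: a dot product of
 * two 32-bit arrays accumulated in 64 bits. */
static int64_t dot32(const int32_t *x, const int32_t *y, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = mac32(sum, x[i], y[i]);
    return sum;
}
```

On a 32-bit CPU with a widening multiply instruction, one would hope each step of such a loop compiles to a multiply pair followed by an add with carry, or to a single multiply-accumulate where the hardware offers one. How close a given compiler comes to that is exactly what is at issue here.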
Continue reading