Jan 14 2010

Beware the builtins

GCC includes a large number of builtin functions allegedly providing optimised code for common operations not easily expressed directly in C. Rather than taking such claims at face value (this is GCC after all), I decided to conduct a small investigation to see how well a few of these functions are actually implemented for various targets.

For my test, I selected the following functions:

  • __builtin_bswap32: Byte-swap a 32-bit word.
  • __builtin_bswap64: Byte-swap a 64-bit word.
  • __builtin_clz: Count leading zeros in a word.
  • __builtin_ctz: Count trailing zeros in a word.
  • __builtin_prefetch: Prefetch data into cache.

To test the quality of these builtins, I wrapped each in a normal function, then compiled the code for these targets:

  • ARMv7
  • AVR32
  • MIPS
  • MIPS64
  • PowerPC
  • PowerPC64
  • x86
  • x86_64

In all cases the compiler flags were -O3 -fomit-frame-pointer plus any flags required to select a modern CPU model.
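
As an illustration of the setup described above, the wrappers might look something like the following. This is only a sketch: the function names are hypothetical, and the actual test source is not shown in this excerpt. Only the builtins themselves come from the list above.

```c
#include <stdint.h>

/* Hypothetical wrapper names, chosen for illustration only. Each builtin
 * is wrapped in an ordinary external function so that the compiler must
 * emit a standalone body whose assembly can be inspected. */

/* Byte-swap a 32-bit word. */
uint32_t swap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Byte-swap a 64-bit word. */
uint64_t swap64(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* Count leading zeros; the result is undefined for x == 0. */
int clz(unsigned x)
{
    return __builtin_clz(x);
}

/* Count trailing zeros; the result is undefined for x == 0. */
int ctz(unsigned x)
{
    return __builtin_ctz(x);
}

/* Hint that the data at p will be read soon. */
void prefetch(const void *p)
{
    __builtin_prefetch(p);
}
```

Compiling a file like this with -S in addition to the flags above (for example, gcc -O3 -fomit-frame-pointer -S wrappers.c, where wrappers.c is an assumed file name) writes the generated assembly to a file for inspection instead of producing an object file.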


Aug 20 2009

ARM compiler shoot-out, round 2

In my recent test of ARM compilers, I had to leave out Texas Instruments’ compiler since it failed to build FFmpeg. Since then, the TI compiler team has been busy fixing bugs, and a snapshot I was given to test was able to build enough of a somewhat patched FFmpeg that I can now present round two in this shoot-out.

The contenders this time were the fastest GCC variant from round one, ARM RVCT, and newcomer TI TMS470. With the same rules as last time, the exact versions and optimisation options were as follows:

  • CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros
  • TI TMS470 4.7.0-a9229, --float_support=vfpv3 -mv=7a8 -O3 -mf=5



Aug 10 2009

DRM the Big Blue way

A few months ago, I downloaded an evaluation copy of IBM’s XLC compiler to try it out on FFmpeg. The trial licence has now expired, so what better way to spend a few minutes than by cracking it?

The installation script, as expected, copied a number of files into a directory under /opt. More unusually, it also created a small shared library, libxlc101e.so.1, and placed it in /usr/lib. No other files from the installation package were modified, so this must be where the licence is hiding. Without further ado, we proceed to take it apart.


Aug 5 2009

ARM compiler shoot-out

A proper comparison of different compilers targeting ARM is long overdue, so I decided to do my part. I compiled FFmpeg using a selection of compilers, and measured the speed of the result when decoding various media samples. Since we are testing compilers, I disabled all hand-written assembler. The tests were run on a Beagle board clocked at 600 MHz.

These are the compilers I deemed worthy to participate in the test and the optimisation flags I used with each:

  • GCC 4.3.3, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • GCC 4.4.1, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • CodeSourcery GCC 2007q3 (based on 4.2.1), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-tree-vectorize
  • CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
  • ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros

I would have also included the ARM compiler from Texas Instruments, had it been able to compile FFmpeg.


Aug 4 2009

IJG is back

When FFmpeg released version 0.5 earlier this year, nearly five years had passed since the previous release, during which time the project had attracted frequent criticism for the lack of regular releases. There exists, however, a project whose release interval dwarfs that of FFmpeg. I speak of the Independent JPEG Group’s libjpeg, version 7 of which was recently released after 11 years of silence.

So what have they been doing during the last 11 years? Not a lot, it seems. The only change log entry I find noteworthy is the addition of arithmetic entropy coding, previously omitted due to patent concerns. Contrast this with the TO DO note from the previous release:

The major thrust for v7 will probably be improvement of visual quality. The current method for scaling the quantization tables is known not to be very good at low Q values.  We also intend to investigate block boundary smoothing, “poor man’s variable quantization”, and other means of improving quality-vs-file-size performance without sacrificing compatibility.

In future versions, we are considering supporting some of the upcoming JPEG Part 3 extensions — principally, variable quantization and the SPIFF file format.

As always, speeding things up is of great interest.

Eleven years is of course plenty of time for the developers to change their minds, or perhaps even lose them. The TO DO note in version 7 reads thus: