IJG swings again, and misses

Earlier this month the IJG unleashed version 8 of its ubiquitous libjpeg library on the world. Eager to try out the “major breakthrough in image coding technology” promised in the README file accompanying v7, I downloaded the release. A glance at the README file suggests something major indeed is afoot:

Version 8.0 is the first release of a new generation JPEG standard to overcome the limitations of the original JPEG specification.

The text also hints at the existence of a document detailing these marvellous new features, and a Google search later a copy has found its way onto my monitor. As I read, however, my state of mind shifts from an initial excited curiosity, through bewilderment and disbelief, finally arriving at pure merriment.

ARM compiler update

Since my last shootout, all the tested vendors have updated their compilers. Here is a quick update on each of them.

Both the 4.3 and 4.4 branches of FSF GCC have had bugfix releases, bringing them to 4.3.4 and 4.4.2, respectively. Neither update contains anything particularly noteworthy.

The CodeSourcery 2009q3 release sees an update to a GCC 4.4 base, a significant change from the 4.3 base used in 2009q1. The update is a mixed blessing. In fact, it is mostly a curse and hardly a blessing at all. On the bright side, the floating-point speed regressions in 2009q1 are gone, 2009q3 being a few per cent faster even than 2007q3. Unfortunately, this improvement is completely overshadowed by a major speed regression on integer code, a whopping 24% in one case. This ties in with the slowdown previously observed with FSF GCC 4.4 compared to 4.3.

ARM RVCT 4.0 is now at Build 697. This update fixes some bugs and introduces others. Notably, it no longer builds FFmpeg correctly. The issue has been reported to ARM.

Texas Instruments, finally, have made a formal release, v4.6.1, of their TMS470 compiler incorporating various fixes allowing it to build a moderately patched FFmpeg. The performance remains somewhere between GCC and RVCT on average.

In light of the above, my recommendations remain unchanged:

  • For a free compiler, choose CodeSourcery 2009q1. It beats GCC 4.3.4 by 5-10% in most cases.
  • GNU purists are best served by GCC 4.3.4, which is up to 20% faster than 4.4.2 and rarely slower.
  • When price is not a concern, ARM RVCT is a good option, outperforming GCC by up to a factor of 2.
  • In all cases, disable any auto-vectorisation features.

Regardless of which compiler is chosen, I cannot overstress the importance of testing. All compilers are crawling with bugs, and even the most innocent-looking code change can trigger one of them. When using a compiler other than GCC, extra caution is advised considering a lot of code is developed using only GCC and may thus fall prey to bugs unique to said other compiler.

Beware the builtins

GCC includes a large number of builtin functions allegedly providing optimised code for common operations not easily expressed directly in C. Rather than taking such claims at face value (this is GCC after all), I decided to conduct a small investigation to see how well a few of these functions are actually implemented for various targets.

For my test, I selected the following functions:

  • __builtin_bswap32: Byte-swap a 32-bit word.
  • __builtin_bswap64: Byte-swap a 64-bit word.
  • __builtin_clz: Count leading zeros in a word.
  • __builtin_ctz: Count trailing zeros in a word.
  • __builtin_prefetch: Prefetch data into cache.

To test the quality of these builtins, I wrapped each in a normal function, then compiled the code for these targets:

  • ARMv7
  • AVR32
  • MIPS
  • MIPS64
  • PowerPC
  • PowerPC64
  • x86
  • x86_64

In all cases, the compiler flags used were -O3 -fomit-frame-pointer plus any flags required to select a modern CPU model.
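
To make the setup concrete, here is a minimal sketch of what such wrappers might look like. It is not the exact test file used; the wrap_* names are hypothetical and chosen only for illustration. Each builtin sits in its own ordinary function so that the code generated for it can be read in isolation.

```c
#include <stdint.h>

/* Hypothetical wrappers: one plain function per builtin, so the
   assembly emitted for each can be inspected on its own. */

/* Byte-swap a 32-bit word. */
uint32_t wrap_bswap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Byte-swap a 64-bit word. */
uint64_t wrap_bswap64(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* Count leading zeros; the result is undefined for x == 0. */
int wrap_clz(unsigned x)
{
    return __builtin_clz(x);
}

/* Count trailing zeros; the result is undefined for x == 0. */
int wrap_ctz(unsigned x)
{
    return __builtin_ctz(x);
}

/* Hint that the data at p will be read soon. */
void wrap_prefetch(const void *p)
{
    __builtin_prefetch(p);
}
```

Compiling a file like this with -S in addition to the flags above produces an assembly listing in which each wrapper can be compared against what a hand-written version for the target would look like.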