Ever since Apple released their iPhone SDK, the FFmpeg mailing lists have seen a steady stream of error reports from users attempting to build FFmpeg for the iPhone, and eventually they got my attention.
The iPhone is built around an ARM1176 CPU, so the SDK includes an ARM cross-compiler and assembler. Most of the reported errors originate from the Apple assembler which appears to have trouble processing the assembler source files from FFmpeg.
The source files use the GNU assembler syntax, and the Apple assembler is based on an old GNU version, so one might reasonably expect it to work. What I had not realised was just how old a version Apple based their assembler on. The version they chose was 1.38.1, released in January 1991, 18 years ago. Features which have since been added to the GNU assembler, and there are many, have not been merged by Apple. As a result, many special directives and macro features used in FFmpeg are not recognised by the Apple assembler, and modifying the code to work with this assembler would render it unusable with modern GNU versions.
Why not replace the assembler in the SDK with a GNU version, one might ask. The answer is that this is not possible. The Apple system uses an object file format, Mach-O, not supported by the GNU tools. The chances of Apple updating their assembler to support the newer syntax appear slim, so our best hope is for the GNU binutils package to gain support for the Mach-O format. This will need a lot of work, and a working version cannot be expected for yet some time.
While this incompatibility persists, those wishing to run an optimised FFmpeg build on their iPhone will have to rely on patches to make it palatable to the Apple assembler. Supporting the Apple syntax directly in FFmpeg is unfortunately not feasible.
Press releases are always rich riddled infested with current buzz-words, but this one is better than many.
The analytics-enabled video lifestyle management of the title is, apparently, some kind of video surveillance system targeted at home users. According to the press release, it uses mobile video intelligence (MVI), which has got to be a good thing, even having been given an acronym. With all this power, it delivers proactive, video-based information, and does so in a manner that fits today’s connected, mobile lifestyle.
This must be a truly amazing device. It provides users with better lifestyle management, and to top it off, the surveillance footage it supplies is allegedly so great that it also changes how consumers view video – from a passive, entertainment form to a source of rich, real-time information. Not a bad feat for a video of your back door, I must admit.
It recently came to my attention that the GNU linker on ARM lacks support for several relocation types in shared libraries. Specifically, code using MOVW/MOVT instruction pairs to load the address of data symbols will not work in a shared library. The linker silently drops the necessary relocations, resulting in a runtime crash.
When I pointed out this shortcoming to Paul Brook of CodeSourcery, his response was that such relocations in shared libraries are not supported by the GNU tools, will never be, and that shared libraries should be built with position-independent code (PIC). This is an unfortunate attitude, and doubly so considering that the latest CodeSourcery GCC version will generate these instructions with default settings. In other words, the 2008q3 release of CodeSourcery GCC will, with default flags, build crashing shared libraries without so much as a warning.
The refusal to support non-PIC shared libraries is unfortunate also from a performance point of view. Position independent code is inherently slower than normal code.
In order to find out just how much slower PIC is on ARM, I made two builds of FFmpeg, one normal and one with PIC. The PIC build is about 1.7% slower in several tests, among them H.264 video decoding.
On typically resource-constrained ARM systems it would be nice to have the option of space-saving shared libraries without paying the PIC penalty in performance. Until now this option has been a reality. With CodeSourcery lazily refusing to support the relocations required by the latest version of their own compiler, this option may soon be a thing of the past, at least if the bugs that have haunted recent compiler releases are fixed in upcoming versions.
The NEON coprocessor found in the Cortex-A8 operates asynchronously from the ARM pipeline, receiving its instructions from the ARM execution unit through a 16-entry FIFO. Furthermore, the NEON unit has its own load/store unit. This suggests that some mechanism exists to resolve data hazards between the ARM and NEON units such that memory operations appear as if the instructions were executed entirely in order.
Although clearly important with a view to code optimisation, the Cortex-A8 Technical Reference Manual unfortunately does not mention any details about these hazards. In fact, it does not mention them at all.
To sched some light on the situation, I ran a simple benchmark to determine two important parameters of ARM-NEON memory hazard resolution: granularity and latency.
The bug I discovered in CodeSourcery’s 2008q3 release of their GCC version was apparently deemed serious enough for the company to publish an updated release, tagged 2008q3-72, earlier this week. I took it for a test drive.
Since last time, I have updated the FFmpeg regression test scripts, enabling a cross-build to be easily tested on the target device. For the compiler test this means that much more code will be checked for correct operation compared to the rather limited tests I performed on previous versions. Having verified all tests passing when built with the 2007q3 release, I proceeded with the new 2008q3-72 compiler.
All but one of the FFmpeg regression tests passed. Converting a colour image to 1-bit monochrome format failed. A few minutes of detective work revealed the erroneous code, and a simple test case was easily extracted.
The test case looks strikingly familiar:
extern unsigned char dst __attribute__((aligned(8)));
extern unsigned char src __attribute__((aligned(8)));
for (i = 0; i < 512; i++)
dst[i] = src[i] >> 7;