ARM - Hardwarebug

Cortex-A7 instruction cycle timings

Thursday, 15th May, 2014 - 3:15 pm | ARM

The Cortex-A7 ARM core is a popular choice in low-power and low-cost designs. Unfortunately, the public TRM does not include instruction timing information. It does reveal that execution is in-order which makes measuring the throughput and latency for individual instructions relatively straight-forward. Continue reading →

ARM inline asm secrets

Tuesday, 6th July, 2010 - 9:52 pm | ARM, Compilers

24 Comments

Although I generally recommend against using GCC inline assembly, preferring instead pure assembly code in separate files, there are occasions where inline is the appropriate solution. Should one, at a time like this, turn to the GCC documentation for guidance, one must be prepared for a degree of disappointment. As it happens, much of the inline asm syntax is left entirely undocumented. This article attempts to fill in some of the blanks for the ARM target.
Continue reading →

ARM compiler update

Friday, 15th January, 2010 - 6:48 pm | ARM, Compilers

2 Comments

Since my last shootout, all the tested vendors have updated their compilers. Here is a quick update on each of them.

Both the 4.3 and 4.4 branches of FSF GCC have had bugfix releases, bringing them to 4.3.4 and 4.4.2, respectively. Neither update contains anything particularly noteworthy.

The CodeSourcery 2009q3 release sees an update to a GCC 4.4 base, a significant change from the 4.3 base used in 2009q1. The update is a mixed blessing. In fact, it is mostly a curse and hardly a blessing at all. On the bright side, the floating-point speed regressions in 2009q1 are gone, 2009q3 being a few per cent faster even than 2007q3. Unfortunately, this improvement is completely overshadowed by a major speed regression on integer code, a whopping 24% in one case. This ties in with the slowdown previously observed with FSF GCC 4.4 compared to 4.3.

ARM RVCT 4.0 is now at Build 697. This update fixes some bugs and introduces others. Notably, it no longer builds FFmpeg correctly. The issue has been reported to ARM.

Texas Instruments, finally, have made a formal release, v4.6.1, of their TMS470 compiler incorporating various fixes allowing it to build a moderately patched FFmpeg. The performance remains somewhere between GCC and RVCT on average.

In light of the above, my recommendations remain unchanged:

For a free compiler, choose CodeSourcery 2009q1. It beats GCC 4.3.4 by 5-10% in most cases.
GNU purists are best served by GCC 4.3.4, which is up to 20% faster than 4.4.2 and rarely slower.
When price is not a concern, ARM RCVT is a good option, outperforming GCC by up to a factor 2.
In all cases, disable any auto-vectorisation features.

Regardless of which compiler is chosen, I cannot overstress the importance of testing. All compilers are crawling with bugs, and even the most innocent-looking code change can trigger one of them. When using a compiler other than GCC, extra caution is advised considering a lot of code is developed using only GCC and may thus fall prey to bugs unique to said other compiler.

ARM compiler shoot-out, round 2

Thursday, 20th August, 2009 - 8:20 pm | ARM, Compilers

12 Comments

In my recent test of ARM compilers, I had to leave out Texas Instrument’s compiler since it failed to build FFmpeg. Since then, the TI compiler team has been busy fixing bugs, and a snapshot I was given to test was able to build enough of a somewhat patched FFmpeg that I can now present round two in this shoot-out.

The contenders this time were the fastest GCC variant from round one, ARM RVCT, and newcomer TI TMS470. With the same rules as last time, the exact versions and optimisation options were like this:

CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros
TI TMS470 4.7.0-a9229, –-float_support=vfpv3 -mv=7a8 -O3 -mf=5

Continue reading →

ARM compiler shoot-out

Wednesday, 5th August, 2009 - 12:06 am | ARM, Compilers

16 Comments

A proper comparison of different compilers targeting ARM is long overdue, so I decided to do my part. I compiled FFmpeg using a selection of compilers, and measured the speed of the result when decoding various media samples. Since we are testing compilers, I disabled all hand-written assembler. The tests were run on a Beagle board clocked at 600 MHz.

These are the compilers I deemed worthy to participate in the test and the optimisation flags I used with each:

GCC 4.3.3, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
GCC 4.4.1, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
CodeSourcery GCC 2007q3 (based on 4.2.1), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-tree-vectorize
CodeSourcery GCC 2009q1 (based on 4.3.3), -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize
ARM RVCT 4.0 Build 591, -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a8 -std=c99 -fomit-frame-pointer -O3 -fno-math-errno -fno-signed-zeros

I would have also included the ARM compiler from Texas Instruments, had it been able to compile FFmpeg.
Continue reading →

Category Archives: ARM

Cortex-A7 instruction cycle timings

ARM inline asm secrets

ARM compiler update

ARM compiler shoot-out, round 2

ARM compiler shoot-out

Recent Posts

Recent Comments

Categories

Archives

Meta