Thumbs up

ARM processors have long supported the 16-bit Thumb instruction set, achieving smaller code size at the price of reduced performance. The Thumb-2 extension, introduced with the ARM1156T2-S processor, promises to regain most of this performance loss while retaining the small code size. This is accomplished by mixing 16-bit and 32-bit instructions.

Thumb-2 performance is claimed to reach 98% of the equivalent ARM code while being only 74% of the size. I decided to put this claim to the test with FFmpeg as the target and compiled the same source revision in ARM and Thumb-2 mode using the RVCT 4.0 compiler. For this test I disabled all hand-written assembler optimisations.

The Thumb-2 executable is 85% of the size of the ARM one, a substantial reduction that nonetheless falls somewhat short of the promised 74%. I tested the performance by measuring the time to decode a few sample media files on a Beagle board. Several of the samples actually decoded faster with the Thumb-2 build, with one H.264 video clip decoding 4% faster. Only one test, MP3 audio decoding, was significantly slower than the ARM code, by 15%. The speedups are likely due to reduced I-cache pressure: Thumb-2 and ARM instructions are executed identically after the initial decode stage, so no improvement can result from the change of instruction set alone.
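
The percentages above are plain ratios against the ARM build. A minimal sketch of that arithmetic follows. The function name and all input values are placeholders chosen for illustration, not the measurements from this test, and it assumes that "N% faster" means N% less decode time.

```python
def relative_change(baseline, candidate):
    """Percentage change of candidate relative to baseline.

    Negative means the candidate is smaller (or takes less time).
    """
    return 100.0 * (candidate - baseline) / baseline


# Placeholder values for illustration only, NOT measured data.
arm_size_bytes = 1_000_000
thumb2_size_bytes = 850_000
arm_decode_ms = 10_000
thumb2_decode_ms = 9_600

size_change = relative_change(arm_size_bytes, thumb2_size_bytes)
time_change = relative_change(arm_decode_ms, thumb2_decode_ms)
print(f"size: {size_change:+.1f}%  decode time: {time_change:+.1f}%")
```

Comparing each sample against its own ARM baseline in this way keeps a single outlier, such as the MP3 case, visible instead of averaging it away.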

In conclusion, the Thumb-2 performance is better than I had expected. Nevertheless, a 15% slowdown in even one case is reason enough to carefully benchmark the effects before deciding on a switch.

6 Responses to Thumbs up

  1. Regarding omitting hand-written assembler: It would be interesting to see the effects of recompiling it for a Thumb-2 target. I suspect the modifications necessary are minor (if any). Thanks to UAL, which most ARM toolchains support, the assembler source code differences between ARM and Thumb-2 are marginal in most cases.

    Regarding MP3 performance: Were you able to find out why MP3 is that much slower, unlike any of the other algorithms? Do you have access to RealView Profiler?


  2. Mans says:

    The hand-written assembler would be almost entirely 32-bit instructions, so I doubt there would be any gains there. Also, the current GNU assembler doesn’t fully support UAL. Including the assembler code as ARM in an otherwise Thumb-2 build works fine.

    I haven’t investigated the MP3 performance yet. I have the RealView software but no JTAG hardware. I will try oprofile first.

  3. Laurent says:

    Even if rewriting the assembly routines in Thumb-2 yields almost no size reduction, there might be a gain from not requiring ISA switching (though I don’t know how the A8 behaves in that respect).

  4. @Mans
    I don’t expect the assembler code to be significantly smaller than before when compiled for Thumb-2. The benefit of simply recompiling it would be that you could actually take advantage of the speed optimizations. And Thumb-2 wouldn’t be a special case any longer.


  5. Mans says:

    Switching to ARM state when entering the assembler functions works well. I did a quick benchmark comparing the overhead of calling a function with mode switching and without, and there wasn’t much difference.

  6. mat says:

    By any chance, do you have any numbers for plain Thumb?

    Now that the EABI forces Thumb interworking, I believe a clever compiler could do interesting things:
    – use 32-bit code for the hot path and 16-bit Thumb for the cold path (error cases, …)
