Although I generally recommend against using GCC inline assembly, preferring instead pure assembly code in separate files, there are occasions where inline is the appropriate solution. Should one, at a time like this, turn to the GCC documentation for guidance, one must be prepared for a degree of disappointment. As it happens, much of the inline asm syntax is left entirely undocumented. This article attempts to fill in some of the blanks for the ARM target.
Each operand of an inline asm block is described by a constraint string encoding the valid representations of the operand in the generated assembly. For example, the “r” code denotes a general-purpose register. In addition to the standard constraints, the ARM target accepts a number of special codes, only some of which are documented. The full list, with a brief description of each code, is available in the constraints.md file in the GCC source tree. The following table is an extract from this file consisting of the codes which are meaningful in an inline asm block (a few others are only useful in the machine description itself).
|Constraint||Description|
|f||Legacy FPA registers f0-f7.|
|t||The VFP registers s0-s31.|
|v||The Cirrus Maverick co-processor registers.|
|w||The VFP registers d0-d15, or d0-d31 for VFPv3.|
|x||The VFP registers d0-d7.|
|y||The Intel iWMMX co-processor registers.|
|z||The Intel iWMMX GR registers.|
|l||In Thumb state the core registers r0-r7.|
|h||In Thumb state the core registers r8-r15.|
|j||A constant suitable for a MOVW instruction. (ARM/Thumb-2)|
|b||Thumb only. The union of the low registers and the stack register.|
|I||In ARM/Thumb-2 state a constant that can be used as an immediate value in a Data Processing instruction. In Thumb-1 state a constant in the range 0 to 255.|
|J||In ARM/Thumb-2 state a constant in the range -4095 to 4095. In Thumb-1 state a constant in the range -255 to -1.|
|K||In ARM/Thumb-2 state a constant that satisfies the I constraint if inverted. In Thumb-1 state a constant that satisfies the I constraint multiplied by any power of 2.|
|L||In ARM/Thumb-2 state a constant that satisfies the I constraint if negated. In Thumb-1 state a constant in the range -7 to 7.|
|M||In Thumb-1 state a constant that is a multiple of 4 in the range 0 to 1020.|
|N||In Thumb-1 state a constant in the range 0 to 31.|
|O||In Thumb-1 state a constant that is a multiple of 4 in the range -508 to 508.|
|Pa||In Thumb-1 state a constant in the range -510 to +510.|
|Pb||In Thumb-1 state a constant in the range -262 to +262.|
|Ps||In Thumb-2 state a constant in the range -255 to +255.|
|Pt||In Thumb-2 state a constant in the range -7 to +7.|
|G||In ARM/Thumb-2 state a valid FPA immediate constant.|
|H||In ARM/Thumb-2 state a valid FPA immediate constant when negated.|
|Da||In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with two Data Processing insns.|
|Db||In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with three Data Processing insns.|
|Dc||In ARM/Thumb-2 state a const_int, const_double or const_vector that can be generated with four Data Processing insns. This pattern is disabled if optimizing for space or when we have load-delay slots to fill.|
|Dn||In ARM/Thumb-2 state a const_vector which can be loaded with a Neon vmov immediate instruction.|
|Dl||In ARM/Thumb-2 state a const_vector which can be used with a Neon vorr or vbic instruction.|
|DL||In ARM/Thumb-2 state a const_vector which can be used with a Neon vorn or vand instruction.|
|Dv||In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts instruction.|
|Dy||In ARM/Thumb-2 state a const_double which can be used with a VFP fconstd instruction.|
|Ut||In ARM/Thumb-2 state an address valid for loading/storing opaque structure types wider than TImode.|
|Uv||In ARM/Thumb-2 state a valid VFP load/store address.|
|Uy||In ARM/Thumb-2 state a valid iWMMX load/store address.|
|Un||In ARM/Thumb-2 state a valid address for Neon doubleword vector load/store instructions.|
|Um||In ARM/Thumb-2 state a valid address for Neon element and structure load/store instructions.|
|Us||In ARM/Thumb-2 state a valid address for non-offset loads/stores of quad-word values in four ARM registers.|
|Uq||In ARM state an address valid in ldrsb instructions.|
|Q||In ARM/Thumb-2 state an address that is a single base register.|
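As a minimal sketch of how a constraint from the table is used, the function below combines the generic “r” constraint with the ARM-specific “I” constraint. The name `add_255` is hypothetical and chosen only for illustration. The asm path is guarded so that it is used only when targeting ARM or Thumb-2 state; on any other target a plain C fallback is compiled instead, so the example builds anywhere.

```c
#include <stdint.h>

/* Hypothetical helper: returns x + 255.
 * "I" demands a compile-time constant that fits a Data Processing
 * immediate; 255 qualifies in ARM and Thumb-2 state. */
static inline uint32_t add_255(uint32_t x)
{
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    uint32_t result;
    __asm__ ("add %0, %1, %2"          /* %2 is printed as #255 */
             : "=r" (result)           /* output: any core register */
             : "r" (x), "I" (255));    /* inputs: register, immediate */
    return result;
#else
    return x + 255u;                   /* portable fallback off-target */
#endif
}
```

If the constant did not satisfy “I”, GCC would reject the asm statement at compile time rather than emit an instruction the assembler cannot encode.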
Within the text of an inline asm block, operands are referenced as %0, %1 etc. Register operands are printed as rN, memory operands as [rN, #offset], and so forth. In some situations, for example with operands occupying multiple registers, more detailed control of the output may be required, and once again, an undocumented feature comes to our rescue.
Special code letters inserted between the % and the operand number alter the output from the default for each type of operand. The table below lists the more useful ones.
|Modifier||Output|
|c||An integer or symbol address without a preceding # sign|
|B||Bitwise inverse of integer or symbol without a preceding #|
|L||The low 16 bits of an immediate constant|
|m||The base register of a memory operand|
|M||A register range suitable for LDM/STM|
|H||The highest-numbered register of a pair|
|Q||The least significant register of a pair|
|R||The most significant register of a pair|
|P||A double-precision VFP register|
|p||The high single-precision register of a VFP double-precision register|
|q||A NEON quad register|
|e||The low doubleword register of a NEON quad register|
|f||The high doubleword register of a NEON quad register|
|h||A range of VFP/NEON registers suitable for VLD1/VST1|
|A||A memory operand for a VLD1/VST1 instruction|
|y||S register as indexed D register, e.g. s5 becomes d2[1]|
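To illustrate the modifiers for multi-register operands, here is a hedged sketch of a 64-bit addition in which %Q and %R select the least and most significant registers of each pair. The function name `add64` is an assumption made for this example, and the C fallback exists only so that the code also compiles on non-ARM targets.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit add built from two 32-bit instructions. */
static inline uint64_t add64(uint64_t a, uint64_t b)
{
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    uint64_t sum;
    __asm__ ("adds %Q0, %Q1, %Q2\n\t"  /* low words, sets carry */
             "adc  %R0, %R1, %R2"      /* high words plus carry */
             : "=&r" (sum)             /* early clobber: low half is written
                                          before the high inputs are read */
             : "r" (a), "r" (b)
             : "cc");                  /* condition flags are modified */
    return sum;
#else
    return a + b;                      /* portable fallback off-target */
#endif
}
```

Using %Q and %R rather than %0 and %H0 keeps the code correct regardless of endianness, since which register of the pair holds the low word depends on the target configuration.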
You shouldn’t have to look in constraints.md; all constraints should be documented. See http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Constraints.html#Constraints
If something is missing, that’d be a bug.
Are you trolling? That list is so far from complete it’s not even funny. The documentation does not mention the existence of modifier codes used with the % references at all, and less than half of the constraints are covered.
Telling people they should be able to look up the correct information in a manual instead of the codebase for a gigantic compiler system is now trolling.
You ought to be able to look it up in the manual. In reality, you can’t. That makes the suggestion seem a bit trollish.
As I understand it, and based on exchanging some e-mails with gcc maintainers long ago (unless I got it wrong), most of this stuff is undocumented on purpose, so that nobody will have any right to complain if and when it gets changed in future versions of gcc without notice. The use of fancy undocumented constraints is strongly discouraged. It is for the most part internal gcc stuff which is not supposed to be exposed.
This is just like relying on some internal undocumented API of some library. Sure, everyone can read the sources and figure out how it works. But you are the only one at fault if anything goes wrong; blaming the developers is pointless.
Nevertheless, extending gcc documentation to add information about at least a few more useful constraints makes sense. If you want to get your opinion taken into account, it is better to use gcc bugzilla:
The currently documented constraints are mostly useless for ARM. Anything beyond a simple register operand requires something secret to work reliably.
That bugzilla entry has been open for almost two years with no action taken. Typical GCC behaviour…
And typical user behavior to complain that volunteers do not work on something they apparently do not care so much about as you do. What stops you from submitting a patch if you can write a good article about this?
Please do not give me the volunteer bullshit. CodeSourcery has an entire team of paid people working on gcc, as do other companies. The bug report is already there, yet nobody bothers to so much as confirm it, let alone do anything about it. One wonders what these people spend their days doing.
Especially in the case of ARM and CodeSourcery, one would expect that your complaint would be addressed if ARM thinks it is important.
Paid developers who get paid to care about something other than what you care about. I don’t see the difference. In both cases, you are pissing on people because they do not do what you want them to do, because they think they have better things to do. What gives you the right to complain about that?
I do not need permission from anybody to write about an undocumented feature in a piece of free software.
Using undocumented features that might change in future gcc versions isn’t pure evil as some people claim.
If you need some fancy features in a project you are working on, you probably lock down which versions of libraries and toolchains you work with. Jumping from gcc3.x to gcc4.x, for instance, is normally not recommended in a live project, especially in the embedded world.
Thanks for that list. I’ve struggled a long time before I finally found your site.
I use the %q, %e, %f constraints so that gcc will auto allocate registers (I don’t like hardcoding registers in my inline assembly). But GCC seems pretty dumb when it comes to loading/saving values. Consider the following:
GCC 4.5 will produce the following code:
Why all the vldr/vstr when it could do everything with just 2 vldmia and 1 vstmia? Do you know any way to tell it to use result.val as a range of 4 values, as if I were doing “r” (result.val), yet still use auto-allocated registers?
The GCC register allocator isn’t particularly clever (in fairness, it is a rather hard problem). My guess is that it allocates the registers requested by the constraints of the asm block first, then generates the necessary code to load the values.
Is this function intended to be inlined? If not, I recommend writing the entire function in pure assembler or, if you must use inline asm, manually do the loads and stores to hardcoded registers. There might also be some way to coerce better behaviour from GCC using range constraints, but those are a bit tricky. If the function is being inlined, you must examine the generated code where it is used as this can differ dramatically from what you get for the function on its own.
Yes, it is meant to get inlined. When I do chain-multiplication of matrices, GCC is smart enough not to store/load the intermediate results, but it still uses lots of vldr in front of the chain and lots of vstr to store the end result.
I can't find anything about range constraints in the GCC undocumentation, unfortunately. I think I'm going to dig in codesourcery forums and the gcc dev mailing list for more info.
Happy Christmas to you btw!
FWIW, the following inline assembly:
…seems to produce better code than using plain old
(note the use of “m” and “Us” versus “r”)
Be careful. Q registers don’t work in the clobber list. You must list the equivalent D pairs. Also using dummy memory input/output operands for the variables and dropping the memory clobber can improve the code.
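A rough sketch of what this advice could look like in practice, under stated assumptions: the function name `add4f` is hypothetical, the registers are hardcoded purely for illustration, and the NEON path is used only on a 32-bit ARM target with NEON enabled (a C loop is compiled otherwise). The clobber list names D registers instead of q0/q1, and dummy “m” operands describe the memory being read and written so that no “memory” clobber is needed.

```c
/* Hypothetical helper: dst[i] = a[i] + b[i] for four floats. */
static inline void add4f(float dst[4], const float a[4], const float b[4])
{
#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    __asm__ ("vld1.32  {d0, d1}, [%[a]]\n\t"
             "vld1.32  {d2, d3}, [%[b]]\n\t"
             "vadd.f32 q0, q0, q1\n\t"
             "vst1.32  {d0, d1}, [%[dst]]"
             : "=m" (*(float (*)[4]) dst)          /* dummy output: dst is written */
             : [dst] "r" (dst), [a] "r" (a), [b] "r" (b),
               "m" (*(const float (*)[4]) a),      /* dummy inputs: a and b are read */
               "m" (*(const float (*)[4]) b)
             : "d0", "d1", "d2", "d3");            /* q0 = d0:d1, q1 = d2:d3 */
#else
    for (int i = 0; i < 4; i++)                    /* portable fallback off-target */
        dst[i] = a[i] + b[i];
#endif
}
```

Because the dummy operands tell the compiler exactly which objects the asm touches, it remains free to keep unrelated values in registers across the statement, which a blanket “memory” clobber would prevent.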
I didn’t know that, thanks! I’m not using any hardcoded clobberlist anymore anyway (except for memory and cc).
This seems fixed in gcc 4.5.
…produces the following comment in the assembly:
GCC doesn’t seem to be able to produce a range big enough to hold a quadword, let alone a range of multiple quadword registers, so it’s kind of useless to use this IMO.
…and I think it’s the reason for GCC’s vld/vstr abuse! I hope codesourcery will fix that someday :-(
Thanks for your web page. The GCC constraints are unfathomably complex. I wanted 2 lines of assembly code as follows: "svc sym1", ".word symbol2". No constraint would allow symbol2 to be defined as either a symbolic address or a 32 bit constant. Some crashed gcc, e.g.:
will accept 32 bit constants, not symbols when invoked.
My solution was:
Invoked by, e.g. :
The main thing to realise is that gcc inline assembler constraints, however silly, cannot be circumvented. Quotes at least allowed the constraints to be dodged!
Your web page helped me realise this earlier – you are definitely not trolling – thanks again. GPE