Doing some PowerPC work recently, I wanted to use the lwbrx instruction, which loads a little-endian word from memory. An asm statement wrapped in an inline function seemed like the simplest way to do this.
The lwbrx instruction comes with a minor limitation. It is only available in X-form, that is, the effective address is formed by adding the values of two register operands. Normal load instructions also have a D-form, which computes the effective address by adding an immediate offset to a register operand.
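To make the difference between the two addressing forms concrete, here is a rough C model of how each one computes the effective address. This is only an illustration: the function names are made up, registers are modelled as plain integers, and the special case where a base register field of 0 means a literal zero is left out.

```c
#include <stdint.h>

/* Illustrative model only: registers and addresses are plain integers,
 * and the names ea_d_form/ea_x_form are hypothetical. */

/* D-form: base register plus a signed 16-bit immediate displacement. */
static inline uint64_t ea_d_form(uint64_t ra, int16_t d)
{
    return ra + (uint64_t)(int64_t)d;   /* displacement is sign-extended */
}

/* X-form: sum of two registers; there is no immediate field at all. */
static inline uint64_t ea_x_form(uint64_t ra, uint64_t rb)
{
    return ra + rb;
}
```

Since lwbrx only exists in the second form, any constant offset has to be placed in a register first.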
This means that my asm statement cannot use a normal “m” constraint for the memory operand, as this would allow GCC to use D-form addressing, which this instruction does not allow. I thus go in search of a special constraint to request X-form. GCC inline assembler supports a number of machine-specific constraints to cover situations like this one. To my dismay, the manual makes no mention of a suitable constraint to use.
Not giving up hope, I head for Google. Google always has answers. Almost always. None of the queries I can think of return a useful result. My quest finally comes to an end with the GCC machine description for PowerPC. This cryptic file suggests an (undocumented) “Z” constraint might work.
My first attempt at using the newly discovered “Z” constraint fails. The compiler still generates D-form address operands. Another examination of the machine description provides the answer. When referring to the operand, I must use %y0 in place of the usual %0. Needless to say, documentation explaining this syntax is nowhere to be found.
After spending the better part of an hour on a task I expected to take no more than five minutes, I finally arrive at a working solution:
static inline uint32_t load_le32(const uint32_t *p)
{
    uint32_t v;
    asm ("lwbrx %0, %y1" : "=r"(v) : "Z"(*p));
    return v;
}
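For code that also has to build on other targets, one possible arrangement is sketched below. The name load_le32_any and the preprocessor guard are my own assumptions, not part of the original solution. The asm path is only enabled for GCC-compatible compilers targeting big-endian PowerPC, because lwbrx reverses bytes relative to the current endian mode; everywhere else a plain byte-wise fallback is used.

```c
#include <stdint.h>

/* Hypothetical wrapper: lwbrx on big-endian PowerPC, portable C elsewhere. */
static inline uint32_t load_le32_any(const uint32_t *p)
{
#if defined(__GNUC__) && defined(__powerpc__) && \
    defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    uint32_t v;
    /* "Z" together with %y1 requests an X-form (indexed/indirect) operand. */
    __asm__ ("lwbrx %0, %y1" : "=r"(v) : "Z"(*p));
    return v;
#else
    /* Fallback: assemble the word from bytes, lowest address first. */
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
#endif
}
```

Both paths should return the same value for the same memory contents, so callers do not need to know which one was compiled in.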
Nice find, thanks! I was having the same problem.
The Z constraint has been documented since GCC 4.3.0 (I added the documentation for it and for all the other missing PPC constraints). Also, it is better to use __builtin_bswap32 with 4.3.0 and above, because GCC can optimize it better when the load does not need to happen (if the value is already in a register, it uses three instructions).
Here is the documentation for the Z constraint (from http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Machine-Constraints.html):
Z
Memory operand that is an indexed or indirect from a register (‘m’ is preferable for asm statements)
Thanks,
Andrew Pinski
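For reference, the __builtin_bswap32 approach suggested in this comment might look roughly like the following. This is a sketch, assuming GCC 4.3 or later (or a compatible compiler that provides the builtin and the __BYTE_ORDER__ macros); the name load_le32_bswap is hypothetical, and whether the compiler actually turns the load plus swap into a single lwbrx is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: native load, then a swap on big-endian targets. */
static inline uint32_t load_le32_bswap(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);        /* plain native-endian load */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    v = __builtin_bswap32(v);       /* reverse bytes only where needed */
#endif
    return v;
}
```

Because the swap is expressed as a builtin rather than opaque asm, the compiler is free to apply it to a value that is already in a register without going through memory.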
A few observations:
– There is still no mention of the magic ‘y’ modifier.
– GCC 4.3 is so buggy, particularly for PPC, that it would be unwise to use it widely.
– The day GCC is able to optimise anything at all reliably will be a day to celebrate.
What’s wrong with the traditional version?
from the gcc page
Z
Memory operand that is an indexed or indirect from a register (‘m’ is preferable for asm statements)
Both gave the same assembly
A simple test like that is likely to produce the same code for both. Try something slightly more complicated instead:
This gives the following assembly:
I’m not quite sure what’s going on here, but func_m is obviously incorrect.