GCC inline asm annoyance

Doing some PowerPC work recently, I wanted to use the lwbrx instruction, which loads a little-endian word from memory. Wrapping a one-line asm statement in an inline function seemed like the simplest way to do this.

The lwbrx instruction comes with a minor limitation. It is only available in X-form, that is, the effective address is formed by adding the values of two register operands. Normal load instructions also have a D-form, which computes the effective address by adding an immediate offset to a register operand.
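
To make the difference between the two forms concrete, here is a rough C model of how each computes its effective address. The function names are made up for illustration, and the model assumes 32-bit addresses and a 16-bit signed displacement; it also ignores the rule that a base register field of 0 means the value zero rather than the contents of r0.

```c
#include <stdint.h>

/* Illustrative model only: these helpers are hypothetical, not a real API. */

/* D-form: effective address = base register + sign-extended immediate. */
static uint32_t ea_dform(uint32_t ra, int16_t d)
{
    return ra + (uint32_t)(int32_t)d;
}

/* X-form: effective address = base register + index register.
   There is no immediate field, so any offset must first be put in a register. */
static uint32_t ea_xform(uint32_t ra, uint32_t rb)
{
    return ra + rb;
}
```

An instruction that only exists in X-form, like lwbrx, can therefore never encode something like 8(r31) directly.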

This means that my asm statement cannot use a normal “m” constraint for the memory operand, as this would allow GCC to use D-form addressing, which this instruction does not allow. I thus go in search of a special constraint to request X-form. GCC inline assembler supports a number of machine-specific constraints to cover situations like this one. To my dismay, the manual makes no mention of a suitable constraint to use.

Not giving up hope, I head for Google. Google always has answers. Almost always. None of the queries I can think of return a useful result. My quest finally comes to an end with the GCC machine description for PowerPC. This cryptic file suggests an (undocumented) “Z” constraint might work.

My first attempt at using the newly discovered “Z” constraint fails. The compiler still generates D-form address operands. Another examination of the machine description provides the answer. When referring to the operand, I must use %y0 in place of the usual %0. Needless to say, documentation explaining this syntax is nowhere to be found.

After spending the better part of an hour on a task I expected to take no more than five minutes, I finally arrive at a working solution:

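static inline uint32_t load_le32(const uint32_t *p)
{
    uint32_t v;
    /* "Z" restricts the operand to indexed or indirect addressing;
       %y1 prints it in the two-operand form that lwbrx expects. */
    asm ("lwbrx %0, %y1" : "=r"(v) : "Z"(*p));
    return v;
}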
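
For checking the result on a machine without lwbrx, a portable reference version is handy. The following is only a sketch, and the name load_le32_portable is my own invention for illustration; it assembles the value byte by byte, so it gives the same answer regardless of host endianness.

```c
#include <stdint.h>

/* Hypothetical portable reference: reads four bytes and combines them
   as a little-endian 32-bit value, independent of host byte order. */
static inline uint32_t load_le32_portable(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Comparing its output with the asm version on a PowerPC target is one way to sanity-check the constraint and operand modifier.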

7 Responses to GCC inline asm annoyance

  1. Johannes Rajala says:

    Nice find, thanks! I was having the same problem.

  2. Andrew Pinski says:

    The Z constraint has been documented since GCC 4.3.0 (as I added the documentation for it and all the missing constraints for PPC). Also, it is better to use __builtin_bswap32 for 4.3.0 and above, because GCC is able to optimize that better if the load does not need to happen (if the value is already in a register, it uses three instructions).

    Here is the documentation for the Z constraint (from http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Machine-Constraints.html):

    Z
    Memory operand that is an indexed or indirect from a register (`m’ is preferable for asm statements)

    Thanks,
    Andrew Pinski

    • Mans says:

      A few observations:
      – There is still no mention of the magic ‘y’ modifier.
      – GCC 4.3 is so buggy, particularly for PPC, that it would be unwise to use it widely.
      – The day GCC is able to optimise anything at all reliably will be a day to celebrate.

  3. Mike says:

    What’s wrong with the traditional version?

    static inline uint32_t __lwbrx(const register uint32_t *p)
    {
        register uint32_t v;
        asm ("lwbrx %0, 0, %1" : "=r"(v) : "b"(p));
        return v;
    }
    
  4. Manish says:

    from the gcc page
    Z
    Memory operand that is an indexed or indirect from a register (`m’ is preferable for asm statements)

    asm volatile("stwbrx %1,%y0": "=m"(*addr): "r"(reg): "memory" );
    asm volatile("stwbrx %1,%y0": "=Z"(*addr): "r"(reg): "memory" );
    

    Both gave the same assembly

       c:   81 3f 00 08     lwz     r9,8(r31)
      10:   7d 80 4d 2c     stwbrx  r12,0,r9
    
    • Mans says:

      A simple test like that is likely to produce the same code for both. Try something slightly more complicated instead:

      int func_m(int *p)
      {
          int x;
          asm volatile ("lwbrx %0, %y1" : "=r"(x) : "m"(*(p+1)));
          return x;
      }
      
      int func_z(int *p)
      {
          int x;
          asm volatile ("lwbrx %0, %y1" : "=r"(x) : "Z"(*(p+1)));
          return x;
      }
      

      This gives the following assembly:

      00000000 <func_m>:
         0:   7c 63 24 2c     lwbrx   r3,r3,r4
         4:   4e 80 00 20     blr
      
      00000008 <func_z>:
         8:   38 63 00 04     addi    r3,r3,4
         c:   7c 60 1c 2c     lwbrx   r3,0,r3
        10:   4e 80 00 20     blr
      

      I’m not quite sure what’s going on here, but func_m is obviously incorrect.
