kaashif's blog

Programming, with some mathematics on the side

HP PA-RISC Assembly Crash Course

2019-04-18

Since I have access to a machine that has the PA-RISC architecture, I thought I'd compile some test programs and see what sort of assembly code is produced. Some highlights:

  • A neat way to manage the stack pointer (and one surprise)
  • Every instruction seems to be shorthand for an or
  • Completers - a weird way of giving switches to your instructions

PA-RISC is considerably less popular than x86, MIPS, PowerPC, even SPARC. And being a RISC architecture means that humans hardly ever wrote assembly for it themselves. Most of the time, programmers probably never even gave (past tense since PA-RISC is dead) their binaries a second glance. Or really even any kind of look.

Well, that's about to change! The first program we'll look at is, of course, hello world.

Hello, world!

I wanted to compile, as a showcase, a program with some nontrivialities. That means function calls, string literals and non-leaf and leaf functions (functions that do/don't call other functions). This will hopefully let us discover the quirks of PA-RISC in a controlled environment. Here is the first one, which has a leaf function call with no arguments and a function call with an argument:

#include <stdio.h>

int f() {
        return 0;
}

int main() {
        printf("Hello, world!\n");
        return f();
}

And here is the binary, compiled with gcc -O0 -g test.c and dumped with objdump -S:

000105a8 <f>:
#include <stdio.h>

int f() {
   105a8:       08 03 02 41     copy r3,r1
   105ac:       08 1e 02 43     copy sp,r3
   105b0:       6f c1 00 80     stw,ma r1,40(sp)
        return 0;
   105b4:       34 1c 00 00     ldi 0,ret0
}
   105b8:       34 7e 00 80     ldo 40(r3),sp
   105bc:       4f c3 3f 81     ldw,mb -40(sp),r3
   105c0:       e8 40 c0 02     bv,n r0(rp)

000105c4 <main>:

int main() {
   105c4:       6b c2 3f d9     stw rp,-14(sp)
   105c8:       08 03 02 41     copy r3,r1
   105cc:       08 1e 02 43     copy sp,r3
   105d0:       6f c1 00 80     stw,ma r1,40(sp)
        printf("Hello, world!\n");
   105d4:       23 88 10 00     ldil L%10800,ret0
   105d8:       37 9a 01 b0     ldo d8(ret0),r26
   105dc:       e8 5f 1a ed     b,l 10358 <_end_init+0x14>,rp
   105e0:       08 00 02 40     nop
        return f();
   105e4:       e8 5f 1f 7d     b,l 105a8 <f>,rp
   105e8:       08 00 02 40     nop
}
   105ec:       48 62 3f d9     ldw -14(r3),rp
   105f0:       34 7e 00 80     ldo 40(r3),sp
   105f4:       4f c3 3f 81     ldw,mb -40(sp),r3
   105f8:       e8 40 c0 02     bv,n r0(rp)

Only the relevant part is included. Now, we have to go through the PA-RISC ISA Reference Manual and decipher what all of this means.

Note that there are some examples of C programs and resulting assembly in that manual, but they aren't explained too much since the manual is supposed to be a reference, not a beginner's guide. It's also a bit long, over 400 pages.

Also, I have no idea how to get my hands on the C compilers and assemblers they used, so I can't verify any of their examples. Moving on.

Registers

All registers are 64 bits wide on PA-RISC 2.0 CPUs (like the one I have).

If you recall my article about SPARC assembly, you'll notice that it's almost entirely about register windows and related coolness. There is no such magic on PA-RISC; it is rather similar to x86 in that respect. That is, there are just a number of registers and you have to remember what they're for.

Luckily, there are some helpful synonyms on page 28 of the manual. Here are the important ones:

  • ret0 is r28, the return value. This is set when a function wants to return something, as we will see.

  • sp is r30, the stack pointer. There is something weird about how this is used in the above code which I'll go over later. Can you guess what it is?

  • rp is r2, the return link. It holds the address a function should jump back to when it returns.

Next comes the argument convention, which is a bit odd: r26 is arg0, r25 is arg1, r24 is arg2, r23 is arg3. Yes, it's numbered backwards for some unusual reason.
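
The backwards numbering is at least regular. Here's a tiny C sketch of the mapping. The function name arg_register is made up for illustration, and it only covers the four register arguments listed above:

```c
/* Hypothetical helper: map an argument index (0..3) to the number of the
 * general register that carries it, following the list above
 * (arg0 = r26, arg1 = r25, arg2 = r24, arg3 = r23).
 * Returns -1 for indices this post doesn't cover. */
int arg_register(int arg_index)
{
    if (arg_index < 0 || arg_index > 3)
        return -1;
    return 26 - arg_index;
}
```

So arg_register(0) gives 26 and arg_register(3) gives 23.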

Now we can get started deciphering the code.

Which way does your stack grow?

On x86, the stack usually grows downwards. This means if you are at address 10 and you need more space, you, by convention, decrease the stack pointer (move it towards zero). The heap starts at the bottom and grows up. It's the same way on SPARC, PowerPC, MIPS and so on.

memory addresses growing -->
+---------------------------------------------------------------+
| heap grows -->                               <-- stack grows  |
+---------------------------------------------------------------+

On PA-RISC, somehow the convention is the opposite - you increase the stack pointer to allocate more memory. The heap starts at the top and grows down.

memory addresses growing -->
+---------------------------------------------------------------+
| stack grows -->                               <-- heap grows  |
+---------------------------------------------------------------+

This doesn't really matter and isn't a cool feature in any way. It's an interesting difference from the norm, though.
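
If you want to check which way the stack grows on whatever machine you have, something like the following C sketch should do. It's a heuristic rather than anything the C standard guarantees: the function names are my own, and an optimising compiler may inline the call and make the answer meaningless, so I'd build it with -O0.

```c
#include <stdint.h>

/* Compare the address of a local in a callee with one in its caller.
 * Returns +1 if the callee's local sits at a higher address (stack grows
 * up, as on PA-RISC), -1 if it sits lower (stack grows down, as on x86).
 * Heuristic only: the compiler is free to lay frames out as it likes. */
int callee_direction(volatile char *caller_local)
{
    volatile char callee_local = 0;
    uintptr_t caller_addr = (uintptr_t)caller_local;
    uintptr_t callee_addr = (uintptr_t)&callee_local;
    return callee_addr > caller_addr ? 1 : -1;
}

/* Hypothetical entry point: call this and look at the sign. */
int stack_direction(void)
{
    volatile char caller_local = 0;
    return callee_direction(&caller_local);
}
```

Going by the convention described above, I'd expect +1 on PA-RISC and -1 on x86.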

A leaf function

Leaf functions are simple: since we don't have to worry about setting up the registers for callees, we can just try our best to avoid messing things up for the caller and we're good.

int f() {
   105a8:       08 03 02 41     copy r3,r1
   105ac:       08 1e 02 43     copy sp,r3
   105b0:       6f c1 00 80     stw,ma r1,40(sp)

This is the function prologue: we stash the old value of r3 in r1, copy the stack pointer into r3, and then push the old r3 onto the stack. While copy may seem self-explanatory, it is actually a pseudo-operation, meaning the hardware doesn't know about it. Instead, copy x,y is shorthand for or x,0,y, which ors x with 0 and stores it in y.
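
You can see the or hiding in the raw bytes. Below is a small C sketch that pulls the register fields out of the instruction word. The field positions are my assumption, read off the bytes in the dump (opcode in the top 6 bits, then two 5-bit source registers, target in the bottom 5 bits), and the names decode_or and or_fields are invented for this post:

```c
#include <stdint.h>

/* Assumed field layout for the three-register or instruction:
 * opcode (6 bits) | r2 (5 bits) | r1 (5 bits) | ... | t (5 bits). */
struct or_fields {
    unsigned opcode;
    unsigned r1;
    unsigned r2;
    unsigned t;
};

struct or_fields decode_or(uint32_t word)
{
    struct or_fields f;
    f.opcode = word >> 26;          /* top 6 bits              */
    f.r2 = (word >> 21) & 0x1f;     /* second source register  */
    f.r1 = (word >> 16) & 0x1f;     /* first source register   */
    f.t = word & 0x1f;              /* target register         */
    return f;
}

/* copy x,y is or x,0,y: or-ing with zero leaves x unchanged. */
uint32_t or_with_zero(uint32_t x)
{
    return x | 0u;
}
```

Feeding in 0x08030241 (the bytes of copy r3,r1) gives r1 = 3, r2 = 0 and t = 1, which is or r3,r0,r1.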

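stw,ma r1,40(sp) stores the value of the register r1 at the address in sp and then adds 0x40 to sp (objdump prints displacements in hexadecimal, so that's 64 bytes). Without the completer, stw r1,40(sp) would simply store r1 at sp+0x40. Note that we have the x86-like memory address addition syntax. We can't do multiplications in this form, though, so there is no shortcut to accessing arrays like on x86, where you can write 4*eax+2 into a mov instruction. The stw instruction means "store word", fairly self explanatory. But what does ,ma mean?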

In some PA-RISC instructions, there are two bits labeled m and a. If you use the completer (what the ,ma or ,mb part is called), then this sets them in certain ways. What exactly this means varies for each instruction.

In our case, ,ma means "modify after". The before/after refers to whether the base register is modified before or after the memory access. Modify after means the effective address is just the base; once the access is done, the displacement is added to the base (actually modifying the base register). ,mb, or modify before, computes base + displacement and uses this as both the effective address and the value to write into the base register.

There's a diagram on page 113 of the manual.

This might seem like a pain, but this is essentially designed to make stack pointer manipulation a breeze: using modify before/after, the stack pointer can manage itself!

In this case, ,ma means the stack pointer is updated essentially automatically after we save r1.
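
To convince myself the stack pointer really does end up back where it started, I find it helps to model this in C. The sketch below is a toy: the machine struct and the function names are made up, memory is just a small array, and the two epilogue instructions are taken from the full dump of f above. Only the ,ma and ,mb behaviour is the point.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine: a small byte-addressed memory and three registers. */
enum { MEM_SIZE = 1024 };

struct machine {
    uint8_t mem[MEM_SIZE];
    uint32_t r1, r3, sp;
};

/* stw,ma src,disp(base): store at the old base, then base += disp. */
void stw_ma(struct machine *m, uint32_t src, int32_t disp, uint32_t *base)
{
    memcpy(&m->mem[*base], &src, sizeof src);
    *base += (uint32_t)disp;
}

/* ldw,mb disp(base),dst: base += disp first, then load from the new base. */
void ldw_mb(struct machine *m, int32_t disp, uint32_t *base, uint32_t *dst)
{
    *base += (uint32_t)disp;
    memcpy(dst, &m->mem[*base], sizeof *dst);
}

/* Prologue and epilogue of f, one line per instruction. */
void run_f_frame(struct machine *m)
{
    m->r1 = m->r3;                      /* copy r3,r1        */
    m->r3 = m->sp;                      /* copy sp,r3        */
    stw_ma(m, m->r1, 0x40, &m->sp);     /* stw,ma r1,40(sp)  */
    m->sp = m->r3 + 0x40;               /* ldo 40(r3),sp     */
    ldw_mb(m, -0x40, &m->sp, &m->r3);   /* ldw,mb -40(sp),r3 */
}
```

Starting with sp at 0x100 and any value in r3, running run_f_frame leaves both sp and r3 exactly as they were, with the old r3 having been parked at address 0x100 in between.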

Next, we need to return:

return 0;
   105b4:       34 1c 00 00     ldi 0,ret0

Again, this seems self explanatory: load 0 into ret0, right? But no, there is a little more going on here. The "instruction" ldi i,r (load immediate) is actually a pseudo-operation that generates an instruction ldo i(0),r. ldo d(b),t is the load offset instruction, which calculates the offset given by the expression d(b) and loads this into t.

In our case, ldi 0,ret0 calculates the offset 0(0), which is 0, and loads this into ret0. Due to the instruction encoding requiring all instructions to be 32 bits long (a common design decision in RISC architectures), the immediate d is limited to 14 bits in length.
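
The 14-bit field is easy to pick out of the raw instruction words. Here is a C sketch of how I believe it is decoded. The assumption, which is at least consistent with every displacement in the dump, is that the sign lives in the rightmost bit of the field rather than the leftmost. All three function names are hypothetical.

```c
#include <stdint.h>

/* Decode the 14-bit displacement in the low bits of an instruction word.
 * Assumption: the rightmost bit of the field is the sign bit and the
 * remaining 13 bits are the low bits of the two's complement value. */
int32_t decode_im14(uint32_t word)
{
    uint32_t field = word & 0x3fff;
    int32_t value = (int32_t)(field >> 1);
    if (field & 1)
        value -= 0x2000;
    return value;
}

/* ldo d(b),t just computes b + d; no memory is touched. */
uint32_t ldo(int32_t disp, uint32_t base)
{
    return base + (uint32_t)disp;
}

/* Does a value fit in a signed 14-bit immediate? */
int fits_im14(int32_t value)
{
    return value >= -8192 && value <= 8191;
}
```

For example, decode_im14(0x341c0000) is 0 (the ldi 0,ret0 above), decode_im14(0x347e0080) is 0x40 and decode_im14(0x4fc33f81) is -0x40, matching what objdump printed.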

   105b8:       34 7e 00 80     ldo 40(r3),sp
   105bc:       4f c3 3f 81     ldw,mb -40(sp),r3
   105c0:       e8 40 c0 02     bv,n r0(rp)

This loads r3+0x40 into sp, then uses ldw,mb to pop a value off the stack into r3: sp is first decreased by 0x40, then the word at the new sp is loaded. Note that ldw,mb is a real instruction with a completer, not a pseudo-instruction. You'll notice that this is the value we saved earlier. This is because r3 is callee-saved.

Also, r1 is caller-saved, so we don't have to worry about restoring it. That wasn't really that bad, right?

The main course

The main function showcases two features: calling a non-leaf function (printf) and calling a leaf function (f). Here we go:

000105c4 <main>:

int main() {
   105c4:       6b c2 3f d9     stw rp,-14(sp)
   105c8:       08 03 02 41     copy r3,r1
   105cc:       08 1e 02 43     copy sp,r3
   105d0:       6f c1 00 80     stw,ma r1,40(sp)

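This time we first save rp, the return link, at sp-0x14, because the calls we're about to make will overwrite it. Then, as before, we save r3 and update the stack pointer appropriately.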

printf("Hello, world!\n");
   105d4:       23 88 10 00     ldil L%10800,ret0
   105d8:       37 9a 01 b0     ldo d8(ret0),r26
   105dc:       e8 5f 1a ed     b,l 10358 <_end_init+0x14>,rp
   105e0:       08 00 02 40     nop

Here's the juicy bit. The string is stored in the data segment, so we need to build its address. An address doesn't fit in a single 32-bit instruction, so we start with the ldil instruction, "load immediate left". This takes a 21-bit immediate and places it in the upper 21 bits of a 32-bit value, leaving the lower 11 bits zero. Here that puts 0x10800 into ret0, which is only being used as a scratch register.

Next, ldo adds the remaining offset 0xd8 to get the address of the string, 0x108d8 (imagine it's a char *), and writes it to r26, which is arg0, the first argument of printf.
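
The two-instruction dance is easier to see as arithmetic. This is a sketch in C with invented function names; it treats the L% part as "the address with the low 11 bits cleared" and the ldo displacement as "the low 11 bits", which matches the numbers in the dump.

```c
#include <stdint.h>

/* ldil: put a 21-bit immediate in the upper 21 bits of a 32-bit value,
 * leaving the low 11 bits zero. */
uint32_t ldil(uint32_t imm21)
{
    return (imm21 & 0x1fffff) << 11;
}

/* Split an address into the part ldil loads and the part ldo adds on. */
uint32_t left_part(uint32_t addr)
{
    return addr & ~0x7ffu;
}

uint32_t right_part(uint32_t addr)
{
    return addr & 0x7ffu;
}
```

For the string at 0x108d8, left_part gives 0x10800 and right_part gives 0xd8, exactly the two constants in the ldil and ldo lines.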

The branch and link b,l instruction branches (i.e. unconditionally jumps to the address given) but also places the return point into the register rp, the link register.

The delay slot is the instruction immediately after a branch; it is executed before control actually reaches the branch target. In this case it's a nop, so nothing happens. But there is more to this nop than meets the eye: it's a pseudo-instruction! It really means or 0,0,0, which is a nop since nothing is changed.
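
Because of the delay slot, the return point is two instructions past the branch, not one. A minimal C sketch, with a made-up function name, and deliberately ignoring any low-order status bits the hardware may keep in the link register:

```c
#include <stdint.h>

/* Where execution resumes after a call made with b,l.
 * The branch and its delay slot are 4 bytes each, so the return point
 * is 8 bytes past the branch instruction.
 * Simplification: this models only the address, nothing else rp holds. */
uint32_t return_point(uint32_t branch_addr)
{
    return branch_addr + 8;
}
```

The call to printf sits at 0x105dc, so return_point gives 0x105e4, which is where the call to f starts in the dump.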

        return f();
   105e4:       e8 5f 1f 7d     b,l 105a8 <f>,rp
   105e8:       08 00 02 40     nop
}

Using the branch and link instruction, it's very easy to call f. Since f sets ret0 and main returns whatever f returned, there's no need to set it ourselves. Now there's only one thing left to do...

   105ec:       48 62 3f d9     ldw -14(r3),rp
   105f0:       34 7e 00 80     ldo 40(r3),sp
   105f4:       4f c3 3f 81     ldw,mb -40(sp),r3
   105f8:       e8 40 c0 02     bv,n r0(rp)

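We restore rp, sp and r3, the last being the only callee-saved register we touched! There is a new instruction here, though, bv. This is a vectored branch, which sounds interesting. In actual fact, bv,n x(b) just means that we jump to b added to x left shifted by 3 bits. Here x is r0, which always reads as zero, so this is a plain jump to the address in rp. The ,n completer nullifies the delay slot, which is why there is no nop after it.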
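
The target calculation for bv fits in one line of C. The function name is my own, and this sketch only models the address arithmetic described above:

```c
#include <stdint.h>

/* bv x(b): jump target is b plus x shifted left by 3 bits. */
uint32_t bv_target(uint32_t x, uint32_t b)
{
    return b + (x << 3);
}
```

With x being zero, as in bv,n r0(rp), bv_target(0, rp) is just rp. The shift by 3 would let you index into a table of 8-byte entries, e.g. bv_target(2, 0x1000) is 0x1010.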

That's a full program in PA-RISC assembly!

Conclusions

There are some commonalities with both x86 and SPARC.

SPARC:

  • Link registers
  • Everything is a pseudo-instruction
  • Delay slots

x86:

  • Two operand instructions
  • immediate(register) syntax, although no multiplications
  • Lots of arithmetic is done using instructions supposedly meant for calculating addresses.

Overall, I would say that PA-RISC isn't really that cool of an architecture at first glance. It doesn't have anything extra exciting like SPARC's register windows, except completers maybe, but those are more confusing than anything.

There's probably tons I've missed out, but I have a feeling that there won't be hordes of HP aficionados chasing me down.