HP PA-RISC Assembly Crash Course
2019-04-18
Since I have access to a machine that has the PA-RISC architecture, I thought I'd compile some test programs and see what sort of assembly code is produced. Some highlights:
- A neat way to manage the stack pointer (and one surprise)
- Every instruction seems to be shorthand for the or instruction
- Completers - a weird way of giving switches to your instructions
PA-RISC is considerably less popular than x86, MIPS, PowerPC, even SPARC. And being a RISC architecture means that humans hardly ever wrote assembly for it themselves. Most of the time, programmers probably never even gave (past tense since PA-RISC is dead) their binaries a second glance. Or really even any kind of look.
Well, that's about to change! The first program we'll look at is, of course, hello world.
Hello, world!
I wanted to compile, as a showcase, a program with some nontrivialities. That means function calls, string literals and non-leaf and leaf functions (functions that do/don't call other functions). This will hopefully let us discover the quirks of PA-RISC in a controlled environment. Here is the first one, which has a leaf function call with no arguments and a function call with an argument:
#include <stdio.h>

int f() {
    return 0;
}

int main() {
    printf("Hello, world!\n");
    return f();
}
And here is the binary, compiled with gcc -O0 -g test.c and dumped with objdump -S:
000105a8 <f>:
#include <stdio.h>
int f() {
105a8: 08 03 02 41 copy r3,r1
105ac: 08 1e 02 43 copy sp,r3
105b0: 6f c1 00 80 stw,ma r1,40(sp)
return 0;
105b4: 34 1c 00 00 ldi 0,ret0
}
105b8: 34 7e 00 80 ldo 40(r3),sp
105bc: 4f c3 3f 81 ldw,mb -40(sp),r3
105c0: e8 40 c0 02 bv,n r0(rp)
000105c4 <main>:
int main() {
105c4: 6b c2 3f d9 stw rp,-14(sp)
105c8: 08 03 02 41 copy r3,r1
105cc: 08 1e 02 43 copy sp,r3
105d0: 6f c1 00 80 stw,ma r1,40(sp)
printf("Hello, world!\n");
105d4: 23 88 10 00 ldil L%10800,ret0
105d8: 37 9a 01 b0 ldo d8(ret0),r26
105dc: e8 5f 1a ed b,l 10358 <_end_init+0x14>,rp
105e0: 08 00 02 40 nop
return f();
105e4: e8 5f 1f 7d b,l 105a8 <f>,rp
105e8: 08 00 02 40 nop
}
105ec: 48 62 3f d9 ldw -14(r3),rp
105f0: 34 7e 00 80 ldo 40(r3),sp
105f4: 4f c3 3f 81 ldw,mb -40(sp),r3
105f8: e8 40 c0 02 bv,n r0(rp)
Only the relevant part is included. Now, we have to go through the PA-RISC ISA Reference Manual and decipher what all of this means.
Note that there are some examples of C programs and resulting assembly in that manual, but they aren't explained too much since the manual is supposed to be a reference, not a beginner's guide. It's also a bit long, over 400 pages.
Also, I have no idea how to get my hands on the C compilers and assemblers they used, so I can't verify any of their examples. Moving on.
Registers
All registers are 64 bits wide on PA-RISC 2.0 CPUs (like the one I have).
If you recall my article about SPARC assembly, you'll notice that it's almost entirely about register windows and related coolness. There is no such magic on PA-RISC; it is rather similar to x86 in that respect. That is, there are just a number of registers and you have to remember what they're for.
Luckily, there are some helpful synonyms on page 28 of the manual. Here are the important ones:
- ret0 is r28, the return value. This is set when a function wants to return something, as we will see.
- sp is r30, the stack pointer. There is something weird about how this is used in the above code which I'll go over later. Can you guess what it is?
- rp is r2, the return link. It holds the address a function returns to.
Next comes the argument convention, which is a bit odd: r26 is arg0, r25 is arg1, r24 is arg2, r23 is arg3. Yes, it's numbered backwards for some unusual reason.
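The backwards numbering can be written down as a one-line rule. This is only a sketch of the four synonyms listed above; the function name arg_register is made up for illustration, and it says nothing about where any further arguments go.

```c
/* Maps a zero-based argument index to its general register number,
   following the arg0..arg3 synonyms above (arg0 = r26 ... arg3 = r23).
   Returns -1 for any index outside those four registers. */
int arg_register(int index) {
    if (index < 0 || index > 3) {
        return -1;
    }
    return 26 - index;
}
```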
Now we can get started deciphering the code.
Which way does your stack grow?
On x86, the stack usually grows downwards. This means if you are at address 10 and you need more space, you, by convention, decrease the stack pointer (move it towards zero). The heap starts at the bottom and grows up. It's the same way on SPARC, PowerPC, MIPS and so on.
memory addresses growing -->
+---------------------------------------------------------------+
| heap grows --> <-- stack grows |
+---------------------------------------------------------------+
On PA-RISC, somehow the convention is the opposite - you increase the stack pointer to allocate more memory. The heap starts at the top and grows down.
memory addresses growing -->
+---------------------------------------------------------------+
| stack grows --> <-- heap grows |
+---------------------------------------------------------------+
This doesn't really matter and isn't a cool feature in any way. It's an interesting difference from the norm, though.
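If you want to check which way the stack grows on your own machine, a rough C sketch like the one below can do it. The function names are made up for this example, and the result is only a hint: comparing addresses of unrelated locals is implementation-specific, and an optimizing compiler may inline the helper and lay things out differently, so compile it with -O0.

```c
#include <stdint.h>

/* Hypothetical helper: compares the address of a local variable in the
   caller with the address of a local variable one call deeper. */
static int compare_with_callee(volatile char *caller_local) {
    volatile char callee_local = 0;
    uintptr_t outer = (uintptr_t)caller_local;
    uintptr_t inner = (uintptr_t)&callee_local;
    /* A deeper frame at a higher address suggests an upward-growing stack. */
    return inner > outer ? 1 : -1;
}

/* Returns 1 if the stack appears to grow towards higher addresses,
   -1 if it appears to grow towards lower addresses. */
int stack_growth_direction(void) {
    volatile char caller_local = 0;
    return compare_with_callee(&caller_local);
}
```

Going by the description above, you would expect 1 on PA-RISC and -1 on x86, but treat the answer as a hint rather than proof.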
A leaf function
Leaf functions are simple: since we don't have to worry about setting up the registers for callees, we can just try our best to avoid messing things up for the caller and we're good.
int f() {
105a8: 08 03 02 41 copy r3,r1
105ac: 08 1e 02 43 copy sp,r3
105b0: 6f c1 00 80 stw,ma r1,40(sp)
This is us saving r3 and the stack pointer: the old r3 goes into r1, and sp is copied into r3. While copy may seem self-explanatory, it is actually a pseudo-operation, meaning the hardware doesn't know about it. Instead, copy x,y is shorthand for or x,0,y, which ors x with r0 (which always reads as zero) and stores the result in y.
stw,ma r1,40(sp) stores the value of the register r1 on the stack, using sp as the base and 0x40 as the displacement (objdump prints displacements in hex, so the 40 here is 64 bytes). Note that we have the x86-like memory address addition syntax. We can't do multiplications, though, so there is no shortcut to accessing arrays like on x86, where you can write 5*eax+2 into a mov instruction. The stw instruction means "store word", fairly self-explanatory. But what does ,ma mean?
In some PA-RISC instructions, there are two bits labeled m and a. If you use a completer (what the ,ma or ,mb part is called), then this sets them in certain ways. What exactly this means varies for each instruction.
In our case, ,ma means "modify after". This refers to whether the base register is modified before or after the effective address is calculated. Modify after means the effective address is just the base; the displacement is then added to the base (actually modifying the base register). ,mb, or modify before, computes base + displacement and uses this as both the final effective address and the value to write into the base register.
There's a diagram on page 113 of the manual.
This might seem like a pain, but this is essentially designed to make stack pointer manipulation a breeze: using modify before/after, the stack pointer can manage itself!
In this case, ,ma means r1 is stored at the current top of the stack and the stack pointer is then bumped by the displacement, essentially automatically.
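To make the two completers concrete, here is a minimal C sketch of a toy machine with a stack pointer and a small byte array for memory. The Machine type and the function names are invented for this example; it models only the behaviour described above, not the real instruction encoding.

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256

/* Toy machine: a stack pointer plus a little byte-addressed memory. */
typedef struct {
    uint32_t sp;
    uint8_t mem[MEM_SIZE];
} Machine;

/* stw,ma value,d(sp): store at the OLD sp, then sp = sp + d. */
void stw_ma(Machine *m, uint32_t value, int32_t d) {
    memcpy(&m->mem[m->sp], &value, sizeof value);
    m->sp += (uint32_t)d;
}

/* ldw,mb d(sp),t: sp = sp + d FIRST, then load from the new sp. */
uint32_t ldw_mb(Machine *m, int32_t d) {
    uint32_t value;
    m->sp += (uint32_t)d;
    memcpy(&value, &m->mem[m->sp], sizeof value);
    return value;
}
```

Calling stw_ma with a displacement of 0x40 and then ldw_mb with -0x40 gives back the stored value and leaves sp where it started, which is the push/pop pairing seen in the prologue and epilogue of f.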
Next, we need to return:
return 0;
105b4: 34 1c 00 00 ldi 0,ret0
Again, this seems self-explanatory: load 0 into ret0, right? But no, there is a little more going on here. The "instruction" ldi i,r (load immediate) is actually a pseudo-operation that generates the instruction ldo i(0),r. ldo d(b),t is the load offset instruction, which calculates the address given by the expression d(b) and loads this address (not the memory it points to) into t.
In our case, ldi 0,ret0 calculates 0(0), which is 0, and loads this into ret0. Because the instruction encoding requires all instructions to be 32 bits long (a common design decision in RISC architectures), the immediate d is limited to a signed 14-bit value.
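A small C sketch of what ldo and ldi compute, assuming 32-bit registers for simplicity. The function names are made up for illustration, and fits_im14 only checks the range of a signed 14-bit field; it does not model the actual bit layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if d fits in a signed 14-bit field: -8192 .. 8191. */
bool fits_im14(int32_t d) {
    return d >= -8192 && d <= 8191;
}

/* ldo d(b),t: t = b + d. Pure address arithmetic, no memory access. */
uint32_t ldo(int32_t d, uint32_t base) {
    return base + (uint32_t)d;
}

/* ldi i,t is shorthand for ldo i(0),t, and r0 always reads as zero. */
uint32_t ldi(int32_t i) {
    return ldo(i, 0);
}
```

This is also why ldo doubles as a general-purpose "add a small constant" instruction, as in ldo 40(r3),sp.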
105b8: 34 7e 00 80 ldo 40(r3),sp
105bc: 4f c3 3f 81 ldw,mb -40(sp),r3
105c0: e8 40 c0 02 bv,n r0(rp)
This loads r3 + 0x40 into sp, then uses ldw,mb (an ordinary ldw with the modify-before completer, not a pseudo-instruction) to pop a value off the stack into r3, updating the stack pointer appropriately. You'll notice that this is the value we saved earlier. This is because r3 is callee-saved.
Also, r1 is caller-saved, so we don't have to worry about restoring it. That wasn't really that bad, right?
The main course
The main function showcases two features: calling a non-leaf function (printf) and calling a leaf function (f). Here we go:
000105c4 <main>:
int main() {
105c4: 6b c2 3f d9 stw rp,-14(sp)
105c8: 08 03 02 41 copy r3,r1
105cc: 08 1e 02 43 copy sp,r3
105d0: 6f c1 00 80 stw,ma r1,40(sp)
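Again, we save r3 and update the stack appropriately. This time the first instruction also saves rp just below the incoming stack pointer, at sp - 0x14, because main makes calls of its own and each one overwrites rp.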
printf("Hello, world!\n");
105d4: 23 88 10 00 ldil L%10800,ret0
105d8: 37 9a 01 b0 ldo d8(ret0),r26
105dc: e8 5f 1a ed b,l 10358 <_end_init+0x14>,rp
105e0: 08 00 02 40 nop
Here's the juicy bit. The string is stored in the data segment, so we use the ldil instruction to "load immediate left". This loads a 21-bit immediate (the upper part of a pointer into the data segment) into the left 21 bits of the 32-bit word in ret0 and zeroes the remaining 11 bits on the right. Here ret0 is just used as a scratch register.
Next, ldo adds the low bits of the address (0xd8) and writes the complete address of the string (imagine it's a char *) to r26, which is arg0, the first argument of printf.
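The ldil/ldo pair splits one 32-bit address into a left and a right part. The sketch below shows the split for the address in the listing; the function names are made up, and the 21/11 split is the plain one, ignoring any rounding variants the assembler may offer.

```c
#include <stdint.h>

/* Left part, as in L%: the upper 21 bits, low 11 bits cleared. */
uint32_t left_part(uint32_t addr) {
    return addr & ~UINT32_C(0x7ff);
}

/* Right part: the low 11 bits, small enough for ldo's displacement. */
uint32_t right_part(uint32_t addr) {
    return addr & UINT32_C(0x7ff);
}

/* ldil loads the left part, then ldo adds the right part. */
uint32_t rebuild_address(uint32_t addr) {
    return left_part(addr) + right_part(addr);
}
```

For the string at 0x108d8, the left part is 0x10800 and the right part is 0xd8, matching ldil L%10800,ret0 followed by ldo d8(ret0),r26.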
The branch and link instruction b,l branches (i.e. unconditionally jumps to the address given) but also places the return point into the register rp, the link register.
The delay slot is the instruction right after the branch, which is executed before the branch/jump actually takes effect. In this case it's a nop, so nothing happens. But there is more to this nop than meets the eye: it's a pseudo-instruction! It really means or 0,0,0, which is a nop since nothing is changed.
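Both copy and nop boil down to or because register 0 always reads as zero and ignores writes. Here is a toy register file in C that shows the idea; the Cpu type and function names are invented for this sketch, and registers are 32 bits wide for simplicity.

```c
#include <stdint.h>

#define NUM_REGS 32

/* Toy register file. gr[0] is never written, so it stays zero. */
typedef struct {
    uint32_t gr[NUM_REGS];
} Cpu;

/* Reading r0 always yields zero. */
static uint32_t read_reg(const Cpu *cpu, int r) {
    return r == 0 ? 0 : cpu->gr[r];
}

/* Writes to r0 are silently discarded. */
static void write_reg(Cpu *cpu, int r, uint32_t value) {
    if (r != 0) {
        cpu->gr[r] = value;
    }
}

/* or a,b,t: t = a | b */
void op_or(Cpu *cpu, int a, int b, int t) {
    write_reg(cpu, t, read_reg(cpu, a) | read_reg(cpu, b));
}

/* copy x,y is or x,0,y */
void pseudo_copy(Cpu *cpu, int x, int y) {
    op_or(cpu, x, 0, y);
}

/* nop is or 0,0,0: the result lands in r0 and is thrown away. */
void pseudo_nop(Cpu *cpu) {
    op_or(cpu, 0, 0, 0);
}
```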
return f();
105e4: e8 5f 1f 7d b,l 105a8 <f>,rp
105e8: 08 00 02 40 nop
}
Using the branch and link instruction, it's very easy to call f. It sets ret0, so no need to set it ourselves. Now there's only one thing left to do...
105ec: 48 62 3f d9 ldw -14(r3),rp
105f0: 34 7e 00 80 ldo 40(r3),sp
105f4: 4f c3 3f 81 ldw,mb -40(sp),r3
105f8: e8 40 c0 02 bv,n r0(rp)
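We restore rp, and then r3 and sp, which the callee must preserve! There is a new instruction here, though, bv. This is a vectored branch, which sounds interesting. In actual fact, bv,n x(b) just means that we jump to b added to x left shifted by 3 bits. Since x is r0 here, we simply jump to the address in rp. The ,n completer nullifies the instruction in the delay slot, so nothing after the branch gets executed.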
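The call and return arithmetic can be sketched in a few lines of C. The function names are made up, and the sketch assumes that b,l links to the instruction after the delay slot (branch address + 8) and ignores any low-order bits of rp that the hardware uses for other purposes.

```c
#include <stdint.h>

/* b,l target,rp at address pc: rp receives the address of the
   instruction after the delay slot, i.e. pc + 8. */
uint32_t bl_link(uint32_t pc) {
    return pc + 8;
}

/* bv x(b): jump to b + (x << 3). */
uint32_t bv_target(uint32_t x, uint32_t b) {
    return b + (x << 3);
}
```

With the addresses from the listing, the call to printf at 0x105dc links to 0x105e4, and a later bv r0(rp) in the callee jumps straight back there because x is zero.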
That's a full program in PA-RISC assembly!
Conclusions
There are some commonalities with both x86 and SPARC.
SPARC:
- Link registers
- Everything is a pseudo-instruction
- Delay slots
x86:
- Two operand instructions
- immediate(register) syntax, although no multiplications
- Lots of arithmetic is done using instructions supposedly meant for calculating addresses.
Overall, I would say that PA-RISC isn't really that cool of an architecture at first glance. It doesn't have anything extra exciting like SPARC's register windows, except completers maybe, but those are more confusing than anything.
There's probably tons I've missed out, but I have a feeling that there won't be hordes of HP aficionados chasing me down.