kaashif's blog

Programming, with some mathematics on the side

Register windows: a cool feature of SPARC

2018-08-11

Everyone's studied x86 assembly (just objdump any program on your PC...) and maybe even some ARM or MIPS in a class somewhere, but there are a few features that exist in some CPUs that don't exist at all in any of these designs.

I'm talking about register windows! When you call a function on SPARC, the new function just magically gets its own registers neatly separated into input registers, output registers and local registers. You're allowed to mess up your local registers as much as you want and the CPU does all of the saving and swapping for you.

No more weird arbitrary calling conventions about r10 and r11 being caller-saved, rax being return, rqb being Cthulhu-saved, rpqwuqew being quantum entangled with r554 on Tuesdays...

History

You could just go to the Wikipedia article about these, there is some good info there. The basic rundown is that the idea of register windows originated with the Berkeley RISC design back in the first half of the 80s, then they were implemented in a few architectures of which SPARC is the only (barely) surviving example.

This post isn't supposed to be about history, it's supposed to be about actual nitty-gritty assembly code, so let's get to it.

How are the registers laid out?

Each window consists of 8 input registers (i0 to i7), 8 local registers (l0 to l7) and 8 output registers (o0 to o7).

There are also 8 global registers (g0 to g7) which are visible at all times. g0 is actually just 0 all the time and there is some spooky stuff going on with the others: g1 to g5 might change between caller and callee, so can't be used to pass parameters. g6 and g7 are reserved for OS use, so don't use them.

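Then there's sp, the stack pointer. It isn't a separate global register: %sp is just another name for o6, and the frame pointer %fp is another name for i6. That means the caller's stack pointer automatically shows up as the callee's frame pointer after a window switch.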

The registers are seen by functions like this:

input
local
output input
       local
       output input
              local
              output ...

This may be a bit confusing. By this diagram, I mean that if you are a function f and you call a function g, if g switches to the next window, g will "see" different registers. It will see your output registers as its input registers and it will have its own local and output registers.

This means a callee can clobber the caller's output registers (e.g. to return values), but cannot even see the caller's input and local registers. In fact, no other function can see your local registers if it's in a different register window.
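If the overlap is hard to picture, here is a toy model of it in C. This is only a sketch: the names (RegFile, enter_window and so on) are made up for illustration, the window counter goes up on a call purely for readability, and nothing here handles running out of windows, which real hardware deals with by trapping to the OS.

```c
#include <stdint.h>

/* Toy model, not real hardware indexing. Each window owns 16 fresh
 * registers (8 locals + 8 outs); its 8 "in" registers are simply the
 * previous window's 8 "out" registers. */
enum { NWINDOWS = 8, REGS_PER_WINDOW = 16 };

typedef struct {
    /* One extra block so the last window still has somewhere to put its outs. */
    uint64_t regs[(NWINDOWS + 1) * REGS_PER_WINDOW];
    int window; /* 0 = outermost function in this model */
} RegFile;

/* Window w: ins at w*16 + 0..7, locals at w*16 + 8..15,
 * outs at (w+1)*16 + 0..7, which is exactly where window w+1 finds its ins. */
uint64_t *in_reg(RegFile *rf, int n) {
    return &rf->regs[rf->window * REGS_PER_WINDOW + n];
}

uint64_t *local_reg(RegFile *rf, int n) {
    return &rf->regs[rf->window * REGS_PER_WINDOW + 8 + n];
}

uint64_t *out_reg(RegFile *rf, int n) {
    return &rf->regs[(rf->window + 1) * REGS_PER_WINDOW + n];
}

/* Stand-ins for the window switch on function entry and exit. */
void enter_window(RegFile *rf) { rf->window++; }
void leave_window(RegFile *rf) { rf->window--; }
```

Write to out_reg(&rf, 0), call enter_window, and the same value is sitting in in_reg(&rf, 0). The caller's locals, meanwhile, are simply not addressable from the new window.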

This is nice because it's not like x86 where there is just a convention on which registers are for what: this differs between operating systems and no program really has to follow it. On SPARC there is an easy and powerful way for functions to have their own registers.

An actual program!

Let's write a program in C, here is the source code:

int g() {
        return 0;
}

int main(int argc, char *argv[]) {
        return g();
}

Compile with debug symbols and objdump it (I use -O1 because it gets rid of a lot of writing to memory that is pointless and unnecessary):

$ cc -g -O1 test.c
$ objdump -S a.out

There's a lot of output, here are the relevant bits:

0000000000000580 <g>:
int g() {
 580:   9c 03 bf 30     add  %sp, -208, %sp
        return 0;
}
 584:   90 10 20 00     clr  %o0
 588:   81 c3 e0 08     retl 
 58c:   9c 23 bf 30     sub  %sp, -208, %sp

0000000000000590 <main>:

int main(int argc, char *argv[]) {
 590:   9d e3 bf 30     save  %sp, -208, %sp
        return g();
 594:   40 08 01 7b     call  200b80 <g@plt>
 598:   01 00 00 00     nop 
}

Let's analyze this bit by bit, starting with the main function.

The save instruction

On SPARC, much like x86, the stack grows downwards. So if we want to grow the stack to give a new stack frame to our function, we want to subtract from the stack pointer. At the start of our main function, we want to grow the stack by 208 bytes, so we want to subtract 208 from %sp.

We also want to move to a new register window, where we will be able to see our input parameters: argc will be i0 and argv will be i1. And of course, we'll have our own local and output registers.

This is exactly what this does:

 590:   9d e3 bf 30     save  %sp, -208, %sp

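This is called "save" because it saves the previous register window, making it inaccessible unless we go back to the previous window. It also does the arithmetic for us: save behaves like an add whose source registers are read from the old window and whose result is written to the new one, so our new %sp ends up 208 bytes below the caller's.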
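If you want to convince yourself that the bytes 9d e3 bf 30 really say this, here is a small sketch in C that pulls apart the fields of a register-plus-immediate instruction. The struct and function names are my own invention; the bit positions are the usual SPARC "format 3" layout, with registers numbered g0-g7 = 0-7, o0-o7 = 8-15, l0-l7 = 16-23 and i0-i7 = 24-31.

```c
#include <stdint.h>

/* Fields of a SPARC format 3 instruction (hypothetical struct name). */
typedef struct {
    unsigned op;   /* bits 31:30 */
    unsigned rd;   /* bits 29:25, destination register */
    unsigned op3;  /* bits 24:19, which operation */
    unsigned rs1;  /* bits 18:14, first source register */
    unsigned i;    /* bit 13, 1 = second operand is an immediate */
    int simm13;    /* bits 12:0, sign-extended immediate (only meaningful if i == 1) */
} Format3;

Format3 decode_format3(uint32_t insn) {
    Format3 f;
    f.op  = (insn >> 30) & 0x3;
    f.rd  = (insn >> 25) & 0x1f;
    f.op3 = (insn >> 19) & 0x3f;
    f.rs1 = (insn >> 14) & 0x1f;
    f.i   = (insn >> 13) & 0x1;

    /* Sign-extend the 13-bit immediate by hand. */
    int raw = (int)(insn & 0x1fff);
    f.simm13 = (raw & 0x1000) ? raw - 0x2000 : raw;
    return f;
}
```

Feeding it 0x9de3bf30 gives op3 = 0x3c (save), rs1 = rd = 14 (register 14 is o6, better known as %sp) and simm13 = -208. The add and sub in g differ from it only in op3: 0x00 and 0x04.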

We have a stack frame and register window. Now what?

Calling a function

Now we call some memory address where g is. This is similar to x86, it's really just a jump plus some convenience - you can return.

But where is the return address kept? On x86, it's kept on the stack. On SPARC (and many other RISC architectures you may be familiar with), it's stored in a link register.

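When you call a function, the address of the call instruction itself is written to your o7 register, and the return address is worked out from that later. So when the callee executes a save, it will be in its i7 register. No need to touch memory.

In fact, if you look it up in the SPARC V9 Manual, you'll find that the register form of call addr is literally a synonym for jmpl addr, %o7. jmpl means "jump and link": it writes its own address to the given register (in this case o7), then jumps to addr. The call in our listing is the dedicated PC-relative call instruction rather than a jmpl, but it links through o7 in exactly the same way.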
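As a sanity check on the listing, here is a sketch in C of how the target of the PC-relative call is computed from its encoding. The function names are hypothetical; the layout (a signed 30-bit word displacement in the low bits) is the standard one for call. The second helper computes the o7+8 address that retl jumps to, taking o7 to hold the address of the call.

```c
#include <stdint.h>

/* Target of a PC-relative call: pc + 4 * sign_extend(disp30). */
uint64_t call_target(uint64_t pc, uint32_t insn) {
    int64_t disp30 = (int64_t)(insn & 0x3fffffffu);
    if (disp30 & 0x20000000) {
        disp30 -= 0x40000000; /* sign-extend from 30 bits */
    }
    /* Displacement counts 4-byte instructions, not bytes. */
    return pc + (uint64_t)(disp30 * 4);
}

/* Where a leaf function returns to, given the value the call left in o7. */
uint64_t leaf_return_address(uint64_t o7) {
    return o7 + 8;
}
```

For the call at 0x594 with encoding 0x4008017b, call_target gives 0x200b80, which matches the g@plt address objdump printed, and leaf_return_address(0x594) gives 0x59c.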

Why is there a nop? Delay slot!

This is due to something weird on SPARC known as the delay slot. When you do a branch, the branch doesn't happen right away, the CPU actually executes the instruction after the jmp or call or whatever, then branches. This means you can fill it with a useful instruction or just whack a nop in there if it's too confusing.
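One way to think about it: the CPU keeps two program counters, PC (the instruction executing now) and nPC (the one after it), and a branch only changes nPC. Here is a toy interpreter in C that shows the effect. The instruction set and all the names are invented for this sketch; addresses are just array indices.

```c
/* Toy instruction set: a nop, and a jump to an absolute index. */
enum Kind { OP_NOP, OP_JUMP };

typedef struct {
    enum Kind kind;
    int target; /* only used by OP_JUMP */
} Insn;

/* Executes prog starting at index 0 and records the index of every
 * instruction executed into trace. Returns how many were executed. */
int run(const Insn *prog, int len, int *trace, int max_steps) {
    int pc = 0;  /* instruction executing now */
    int npc = 1; /* instruction that executes next */
    int n = 0;

    while (pc >= 0 && pc < len && n < max_steps) {
        const Insn *cur = &prog[pc];
        trace[n++] = pc;

        /* A jump only changes where nPC goes afterwards, so the
         * instruction already sitting in nPC (the delay slot) still runs. */
        int next_npc = npc + 1;
        if (cur->kind == OP_JUMP) {
            next_npc = cur->target;
        }
        pc = npc;
        npc = next_npc;
    }
    return n;
}
```

With a program of four instructions where index 0 is a jump to index 3, the trace comes out as 0, 1, 3: the instruction at index 1 runs even though it sits after the jump, and index 2 is the one that gets skipped.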

Inside g

We call g, then g executes this:

 580:   9c 03 bf 30     add  %sp, -208, %sp

This reserves some space on the stack without switching to a new register window. Notice that a new register window is not necessary since we do barely anything in g. In particular, we call no other functions, which means g is known as a "leaf" function.

Then we zero out the return value (we are returning 0):

 584:   90 10 20 00     clr  %o0

Notice that because we are not in another window, we just zero out our o0 which is the same as our caller's o0.

Now we need to return and we have a choice to make when we look up "return" in the manual: there are 2 return instructions, ret and retl. ret is for returning from functions that have gone to a new register window, it jumps to i7+8. retl is for leaf functions, it jumps to o7+8. Why +8? Instructions are a fixed 4 bytes long, and we need to skip both the call itself and the instruction in its delay slot. We're a leaf, so we use retl:

 588:   81 c3 e0 08     retl 

Remember the delay slot! Before the branch happens, we get 1 instruction to do something. Let's use it to throw away our stack frame:

 58c:   9c 23 bf 30     sub  %sp, -208, %sp

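Now the return value, 0, is in o0. Because g never switched windows, that's main's o0 too, so main doesn't have to fetch it from anywhere. The excerpt above doesn't show main's own return, but since main did execute a save, it still has to get the value into its i0 (its caller's o0) on the way out. That's typically done by a restore sitting in the delay slot of ret.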

Summary

There are a few mildly interesting things in this post you may not have seen before:

  • Register windows: stopping all of that confusing callee/caller-saved business

  • Leaf/non-leaf functions: you can still just not use register windows if you don't need them

  • Link registers: if you only stick to x86, having a register for the return address is a bit different

  • The delay slot: a quirk of SPARC (and of MIPS, for that matter), originating from the time when pipelines were simple and this let you save some stalling when a branch instruction comes along. Not really necessary nowadays (speculative execution in particular lets the processor just guess what's coming up). That's a whole 'nother blog post, though.

Credits

All of this information comes in part from my own experimentation and writing programs, but it all derives from the SPARC V9 Architecture Manual in the end. Props to Sun for writing some good documentation.