kaashif's blog

Programming, with some mathematics on the side

Porting OpenJK to sparc64

2018-01-13

It's a little known fact that there is actually no way in C or C++ to do an unaligned access without invoking undefined behaviour. It's true! Read it yourself here:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned [...] for the referenced type, the behavior is undefined.

C11 (n1570) 6.3.2.3 p7

Sadly, the authors of many programs ignore this and rely on it working. Which it does, on x86, with little performance impact in most cases. On some architectures, like MIPS and PowerPC, unaligned access instructions exist but are slow. But on SPARC...unaligned access is impossible and leads to this:

$ openjk.sparc64
Bus error (core dumped)

Solving these issues with OpenJK is very difficult, especially considering Jedi Knight was never meant to run on SPARC (or indeed OpenBSD, but that's less of an issue).

Dealing with custom allocators

The Jedi Knight engine has a custom allocator, which is a bit of a pain considering it was written to RELY ON UNDEFINED BEHAVIOUR (read: uses unaligned reads). Let's examine the first SIGBUS we hit:

ZoneTailFromHeader(pMemory)->iMagic = ZONE_MAGIC;

This requires a bit of context. Currently, we are in the Z_Malloc function, with the signature:

void *Z_Malloc(int iSize, memtag_t eTag, qboolean bZeroit /* = qfalse */, int iUnusedAlign /* = 4 */)

This is a replacement for malloc, which takes a size, a tag saying what "sort" of memory this is, and a boolean saying whether to zero the memory.

pMemory is a pointer to a zoneHeader_t, which is this:

typedef struct zoneHeader_s
{
    int iMagic;
    memtag_t eTag;
    int iSize;
    struct zoneHeader_s *pNext;
    struct zoneHeader_s *pPrev;
} zoneHeader_t;

typedef struct
{
    int iMagic;

} zoneTail_t;

Why is this a problem? It's kind of subtle if you're not used to these kinds of bugs. We know that pMemory and all its members are aligned correctly: it was created earlier and initialised without any pointer casting or anything like that. The bug reveals itself when we look at ZoneTailFromHeader:

static inline zoneTail_t *ZoneTailFromHeader(zoneHeader_t *pHeader)
{
    return (zoneTail_t*) ( (char*)pHeader + sizeof(*pHeader) + pHeader->iSize );
}

So when we do this:

ZoneTailFromHeader(pMemory)->iMagic = ZONE_MAGIC;

What we're really doing is interpreting a pointer (pHeader) that was just cast to char * and had some values added to it. Interpreting pHeader as something (e.g. int * or char *) would actually be fine: malloc gives us memory satisfying the strictest alignment requirements and we just made pHeader with malloc, so that's OK.

The problem arises when we start adding stuff to pHeader. The result, which we interpret as a zoneTail_t, might actually not lie on any particular boundary. In this case, I was getting a SIGBUS when the engine tries to allocate a block of size 11. Obviously that won't end up on any boundary, so when we try to access an int on that misaligned boundary (iMagic is an int):

ZoneTailFromHeader(pMemory)->iMagic = ZONE_MAGIC;

We get a SIGBUS! Mystery solved.

The fix

The engine code in OpenJK (and the original code from Raven) is split into multiplayer and singleplayer. The mechanics of the game work differently in MP and there is no netcode in SP, so this makes some kind of sense. However, there is a huge downside: bugs which are noticed and fixed in one codebase might be overlooked in the other.

This isn't exactly what happened here, but it's close. In the SP code, in the SP version of Z_Malloc, I saw the following line:

    // Add in tracking info and round to a longword...  (ignore longword aligning now we're not using contiguous blocks)
    //
//  int iRealSize = (iSize + sizeof(zoneHeader_t) + sizeof(zoneTail_t) + 3) & 0xfffffffc;
    int iRealSize = (iSize + sizeof(zoneHeader_t) + sizeof(zoneTail_t));

This is interesting! What this does is round iRealSize (the size of the memory plus the zone metadata we add) up to the nearest multiple of 4. This doesn't quite work as written, since the zone layout is like this:

+--------+----------------------------------------------+------+
| header |       block of malloc'ed memory              | tail |
+--------+----------------------------------------------+------+

The SIGBUS comes when we access the tail. The end of the header is already aligned to a 4 byte boundary, just by looking at its members. The problem is the bit in the middle, which pushes the tail off of a 4 byte boundary.

We can solve this by rounding iSize:

// Round size of allocation up to multiple of 4 bytes
int iRoundedSize = (iSize + 3) & 0xfffffffc;
int iRealSize = (iRoundedSize + sizeof(zoneHeader_t) + sizeof(zoneTail_t));

Then replace iSize with iRoundedSize in the rest of the function.

Did it work?

Yes, kind of. There are a few other similar errors with the memory allocator. We also have some "static" memory blocks, where the entire zone layout is written into a struct, like:

#define DEF_STATIC(_char) {ZONE_MAGIC,TAG_STATIC,2,NULL,NULL},{_char,'\0'},{ZONE_MAGIC}

This was kind of hard to spot a priori, but after running the game under gdb, the SIGBUSes were easier to track down.

The fix here is just to add more NULLs so the ZONE_MAGIC isn't misaligned.

The game does run now. There were a heap of other similar errors (mostly casting char * to int *), but nothing impossible to fix.

Isn't this still a bit dodgy?

Yes. I don't even see why the zone magic thing even exists. Was memory corruption that big a deal? Maybe it was on the Gamecube or Xbox, where this code also ran.