kaashif's blog: Computers, with some mathematics on the side

macroexpand-1 for C++ Coroutines

2024-07-27T00:00:00Z

I went to a talk recently about C++ coroutines, and I don't think it was very good. The talk went through some examples of C++ coroutines and had a surface-level handwavy explanation of what "really happens" when compilers see a coroutine.

But a non handwavy explanation is really easy - you can just look at what the compiler does to coroutine code to see what's really happening. No analogy, no handwave, just looking at real code.

How do we do that without looking at LLVM IR or something? Easy - compile the binary then decompile it into normal C++ and see what it looks like! So let's do it!

The title is a reference to macroexpand-1 from Lisp. Looking at the source code resulting from a coroutine kind of reminds me of expanding a macro in Lisp.

This post is not intended to actually be accessible to beginners or readable for anyone, but it does illustrate an approach to demystifying coroutines that I like.

Minimal coroutine example

A minimal coroutine example needs to demonstrate suspending and resuming execution at a minimum. It's not interesting if we just have a single co_return and it's all optimized away.

Here's my example in its entirety. First, the coroutine type.

#include <iostream>
#include <coroutine>

struct Coroutine {
    struct promise_type;
    std::coroutine_handle<promise_type> handle;

    Coroutine(std::coroutine_handle<promise_type>&& x) : handle(x) {}

    struct promise_type {
        Coroutine get_return_object() { return Coroutine{std::coroutine_handle<promise_type>::from_promise(*this)}; }
        void unhandled_exception() noexcept {}
        void return_void() noexcept { }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
    };
};

This post is not a coroutine tutorial, so I won't explain this code in depth. The point of this post is to look at the real code you get. A few points:

The coroutine_handle is what you can call resume() on to resume the execution of the coroutine. It must keep track of the state of the coroutine when it was suspended.
The other methods like return_void are just required by the standard and the compiler, but we don't really do anything interesting in them.

Here's the coroutine itself:

Coroutine test_coroutine() {
    std::cout << "started!\n";
    co_await std::suspend_always{};
    std::cout << "returning!\n";
    co_return;
}

Very simple conceptually - when we call the coroutine, we print something, suspend, then when we resume we print something else and return.

suspend_always is an awaitable whose await_ready always returns false, so using co_await on it always suspends the coroutine.

Finally, main:

int main() {
    auto coro = test_coroutine();
    std::cout << "main\n";
    coro.handle.resume();
    return 0;
}

Again, very simple - call the coroutine, it suspends, we print something to demonstrate that the coroutine really was suspended in the middle, then we resume it.

Decompiling

This is the point at which someone might be tempted to wax lyrical about state machines, pseudocode, draw an analogy, etc etc. No.

Let's compile the above and feed it into https://dogbolt.org/ which lets you run various decompilers on any executable you upload. I'll do that and walk through the nicest looking output I can find.

Compile and run the example:

$ g++ -o a.out -std=c++20 coro.cpp
$ ./a.out
started!
main
returning!

Perfect! It works. Let's decompile it. Upload a.out to https://dogbolt.org/ to follow along. I looked at the output of all of the decompilers and I think dewolf is the most informative and readable for this particular case.

Dewolf can be found here: https://github.com/fkie-cad/dewolf.

What's really happening?

Now we can walk through the decompiled code, which looks like C-ish code and not C++20 coroutine code. You'll notice some name mangling but it's actually very readable!

Let's start at main:

int main() {
    int var_0;
    long var_3;
    long var_4;
    var_4 = test_coroutine(/* frame_ptr */ var_0);
    std::operator<<<std::char_traits<char>_>(/* __out */ std::cout, /* __s */ "main\n");
    var_3 = var_4;
    std::__n4861::coroutine_handle<Coroutine::promise_type>::resume(/* this */ &var_3);
    return 0;
}

Something small and hard to notice happened here - where we wrote test_coroutine to take no arguments, here it appears to take an argument frame_ptr.

This is key to how coroutines work - when you call a coroutine, in this case test_coroutine, the compiler rewrites your code to add a frame pointer argument. In this coroutine frame, we keep track of where the coroutine was when it was suspended, and local variables. This allows us to resume the coroutine with the same state, from the same place as when it was suspended.

Let's look at test_coroutine:

Coroutine test_coroutine(_Z14test_coroutinev.Frame * frame_ptr) {
    long(void *) ** var_0;
    var_0 = operator_new(/* sz */ 40UL);
    *(var_0 + 34L) = 0x1;
    *var_0 = test_coroutine_actor;
    *(var_0 + 8L) = test_coroutine_destroy;
    *(var_0 + 32L) = 0x0;
    test_coroutine_actor(var_0);
    return Coroutine::promise_type::get_return_object(/* this */ var_0 + 16L);
}

This was rewritten significantly. Notice that no calls to std::cout appear here. The real work has been moved to test_coroutine_actor.

What's left in test_coroutine is just setting up the coroutine frame with:

A pointer to the function that does the real work test_coroutine_actor.
A pointer to the cleanup function test_coroutine_destroy.
The initial state of the coroutine, 0x0.
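As a rough picture, the frame being filled in here could be sketched as the struct below. The field names are hypothetical: they're inferred only from the offsets in the decompiled output (0, 8, 16, 24, 32, 34), and the real layout is a compiler implementation detail. This assumes a 64 bit target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the 40-byte coroutine frame on a 64 bit
// target. The names are invented; only the offsets come from the decompiled
// test_coroutine and test_coroutine_actor.
struct FrameSketch {
    void (*resume_fn)(FrameSketch*);   // offset 0:  test_coroutine_actor
    void (*destroy_fn)(FrameSketch*);  // offset 8:  test_coroutine_destroy
    struct Promise {} promise;         // offset 16: the (empty) promise_type
    void* self_handle;                 // offset 24: handle pointing back at the frame
    std::uint16_t state;               // offset 32: where to resume (0, 2, 4, ...)
    bool needs_free;                   // offset 34: set to 1 after operator new
    char scratch[4];                   // offsets 35..38: small flags and awaiter objects
};

// True if this sketch has the same offsets and size as the decompiled code.
inline bool layout_matches_decompiled_offsets() {
    return offsetof(FrameSketch, resume_fn) == 0 &&
           offsetof(FrameSketch, destroy_fn) == 8 &&
           offsetof(FrameSketch, promise) == 16 &&
           offsetof(FrameSketch, self_handle) == 24 &&
           offsetof(FrameSketch, state) == 32 &&
           offsetof(FrameSketch, needs_free) == 34 &&
           sizeof(FrameSketch) == 40;
}

// Resuming amounts to calling the function pointer stored at the start of
// the frame, passing the frame itself.
inline void resume(FrameSketch* frame) { frame->resume_fn(frame); }
```

With this picture, handle.resume() in main is nothing more than resume(frame): look up the first pointer in the frame and call it.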

We then call test_coroutine_actor, which is where the real work is. This is where the handwaving about a state machine ends and we can actually look at the real state machine the compiler gives us.

long test_coroutine_actor(void * arg1) {
    long var_9;
    void * var_0;
    void * var_1;
    void * var_2;
    void * var_3;
    void * var_4;
    void * var_5;
    var_1 = arg1 + 32L;
    if ((*var_1 & 1) == 0) {
        var_2 = arg1 + 38L;
        if (((unsigned short)*var_1 <= 6) && ((unsigned short)*var_1 != 6)) {
            var_3 = arg1 + 37L;
            var_4 = arg1 + 16L;
        }
        if (((unsigned short)*var_1 <= 4) && ((unsigned short)*var_1 != 4)) {
            var_0 = arg1 + 24L;
            var_2 = arg1 + 35L;
            var_5 = arg1 + 36L;
        }
        switch((unsigned short) *(var_1)) {
        case 0:
            *var_0 = data_0x1904(/* __a */ arg1);
            *var_2 = 0x0;
            Coroutine::promise_type::initial_suspend(/* this */ var_4);
            std::__n4861::suspend_never::await_ready(/* this */ var_5);
        case 2:
            *var_2 = 0x1;
            std::__n4861::suspend_never::await_resume(/* this */ var_5);
            std::operator<<<std::char_traits<char>_>(/* __out */ std::cout, /* __s */ "started!\n");
            std::__n4861::suspend_always::await_ready(/* this */ var_3);
            *var_1 = 0x4;
            data_0x18de(/* this */ var_0);
            std::__n4861::suspend_always::await_suspend(/* this */ var_3);
            return sub_158e(&var_9);
            break;
        case 4:
            std::__n4861::suspend_always::await_resume(/* this */ var_3);
            std::operator<<<std::char_traits<char>_>(/* __out */ std::cout, /* __s */ "returning!\n");
            Coroutine::promise_type::return_void(/* this */ var_4);
            *arg1 = 0x0;
            Coroutine::promise_type::final_suspend(/* this */ var_4);
            std::__n4861::suspend_never::await_ready(/* this */ var_2);
            break;
        }
        std::__n4861::suspend_never::await_resume(/* this */ var_2);
    }
    if (((*var_1 & 1) == 0) || ((unsigned short)*var_1 == 7) || ((unsigned short)*var_1 == 1) || ((unsigned short)*var_1 == 5) || ((unsigned short)*var_1 == 3)) {
        if ((unsigned char)*(arg1 + 34L) != 0) {
            operator_delete(/* ptr */ arg1);
        }
        arg1 = var_0 + 40L;
        return *arg1 - *arg1;
    }
}

This looks exactly like a standard switch/case state machine you might find in any C or C++ codebase, except with really poorly named variables.

We can see that the state variable arg1 contains a member at arg1 + 32L which indicates the point the coroutine has reached. Initially it's 0 so we execute the first case.

Case by case:

0: When writing the coroutine class, we set initial_suspend to return suspend_never - calling await_ready on that returns true and thus we don't suspend in the first case. No break means we fall through.
2: suspend_never does nothing on resume, we print started!, then we need to construct a coroutine handle to return when we suspend. This is critical - saving our state is what lets us resume. Our handle really only needs to keep track of one thing - where to resume. We want to resume at 4, which is the next state. That's what *var_1 = 0x4 saves.

We then return the coroutine handle.

This is where the first call to test_coroutine_actor ends.
4: When main calls handle.resume, handle.resume calls test_coroutine_actor with the saved frame from earlier, with state 4. That means the switch/case skips straight to case 4 and we print returning!. Next is a few lines of uninteresting cleanup.

The compiler isn't doing anything complex or clever here, it's just transforming your coroutine that uses co_* into a function that takes a state argument and has a switch/case.
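To make that concrete, here's a minimal hand-written sketch of the same state machine, with no coroutine support needed. Frame, actor and start are hypothetical names, and the log string stands in for std::cout so the effect can be inspected. It is not what the compiler emits, just the same shape.

```cpp
#include <cassert>
#include <string>

// Hand-written analogue of the coroutine frame.
struct Frame {
    int state = 0;      // resume point, like the value at arg1 + 32
    bool done = false;  // set once the body has run to completion
    std::string log;    // stands in for std::cout
};

// Analogue of test_coroutine_actor: run until the next suspend point or the end.
void actor(Frame& f) {
    switch (f.state) {
    case 0:
        f.log += "started!\n";
        f.state = 4;  // remember where to resume
        return;       // "suspend": hand control back to the caller
    case 4:
        f.log += "returning!\n";
        f.done = true;
        return;
    default:
        return;       // unknown state: do nothing
    }
}

// Analogue of test_coroutine: set up the frame, then run to the first suspension.
Frame start() {
    Frame f;
    actor(f);
    return f;
}
```

Calling start(), appending "main\n" to the log, then calling actor again gives the same started!/main/returning! order as the real program.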

Not magic at all!

Conclusion

Learning always benefits from motivation. When writing a state machine by hand to e.g. parse input or do something while waiting for non-blocking I/O, programmers often want something like language support for coroutines.

Problems with coroutines aside, I think a good coroutine talk would go something like this:

Start with the problem (whatever it is), show a pre C++20 solution, then show a C++20 solution, and finally show that C++20 coroutines are actually totally equivalent to something you could write yourself - coroutines are just syntactic sugar.

You can obviously write coroutines and use non blocking I/O or state machines even in C++98! It's just easier (sometimes) in 20.

Lots of coroutine talks do look like this, but some are beset by padding and nonsense that expand 20 minutes of content into an hour.

Again, no magic here. Don't handwave. You don't need to be a compiler engineer to understand this stuff and claiming otherwise is disingenuous. If you handwave and can't answer deeper questions, I'll lose respect for you. If half of your talk is filler but you can answer questions, that's fine but it's annoying.

Book Review: Children of Memory

2024-04-02T00:00:00Z

I recently read Children of Memory by Adrian Tchaikovsky. It's the third in the Children of Time series. Each book in the series shares a common backstory and general template:

Humanity sends out terraforming missions to the stars.
Earth collapses into war and disaster over hundreds of years.
Survivors send out ark ships to find and hopefully colonise the Edens waiting for them.
It doesn't go as planned: uplifted spiders are waiting for them, alien parasites, spacefaring octopuses, reality-bending alien technology does something weird.

I think this book was a bit disappointing and I'll explain why. Heavy spoilers, and this "review" makes no sense if you haven't read the book.

Tell, don't show

I really like the trope of a farm girl/boy living their normal, mundane life on a farm, and slowly discovering things aren't as they appear. I wish this book had more of that and that we learned what was going on with the characters on the colony rather than literally being shown what happened with the initial ark ship and colonisation through time skips.

This is almost the opposite of the usual complaint: people want books to show not tell. I think it would've been more engaging for us to learn about a hazy, barely remembered past through a little girl learning an oral history in a one room schoolhouse.

There was a great novel by Stephen Baxter with this kind of setup: Flux. A nomadic tribe of people are living inside a neutron star, but they don't know that. You work out what's really going on as they do. Baxter also did this kind of thing with Raft. I recommend both books! The setup is much weirder and sci-fi than the Children of Memory colony in both cases.

I tend to like books where you really have no clue what's going on. I realise this isn't everyone's preference - I've even seen Children of Memory reviews where people say it was too confusing and didn't explain what was going on enough.

More crow or less crow?

I liked the sentient crows but the story of how they gained sentience and the history of their society was nothing compared to the detailed treatment the spiders got in Children of Time or even what the octopuses got in Children of Ruin. I didn't really just want more of the crows necessarily, but injecting some crows as side characters with a shitty little side history means they feel like a gimmick at best.

Some possible solutions:

Make the crows a central part of the story rather than something that could just slot in or out. Spend much more time on them.
Remove them and have a separate book with a more detailed history and a different plot entirely.
Have them in the story as e.g. the Witch's familiars, but don't explain them.

I'm partial to (3). Have the crows, don't explain them. I don't know if we need every book to be a retread of Children of Time with a different species.

Missed opportunity for a depressing ending

The depressing and hopeless ending of the colony was great. Liff being a desperate little girl who was simply the last person to starve to death really tugged at my heart strings. I wish the book had the guts to end on this hopeless, soul-crushing note. Short stories ending on that kind of note have always been my favourite.

An example of a story with a memorably despairing tone is The Star by Arthur C Clarke. It's only a few pages long and I highly recommend finding it online and reading it. Very saddening.

Children of Memory hits you with a one-two punch of cop outs - first it was actually a simulation, then second Liff is actually brought out of the simulation into a cloned body and her pleas for food to eat are actually answered.

Totally unnecessary.

We get it already! Stop with the time loops!

Once we work out that there's a time loop or simulation, or something, we still go through another couple of iterations of the loop for no real reason. It's like if in the classic Star Trek: The Next Generation episode Cause and Effect, there were an extra two or three iterations of the time loop for 40 minutes in the middle of the episode for no reason.

Keep it tight! One loop where you don't know it's a loop and all of the context is set up, one loop where things are a bit weird and characters realise something is going on, then everything gets resolved.

Like I said, my preferred resolution would be that the colony was real, everyone died depressing and pointless deaths after a meagre existence. They tried their best and they failed.

This story seems like a much less emotionally impactful version of The Inner Light - instead of "it all happened long ago and none of them can be saved" we get "it's a simulation in a loop, let's clone all of the people and live together in a happy utopia".

I don't even get why cloning Liff is treated like saving her - those hundreds of iterations of death and suffering still happened and are still happening at the end! If the point is that simulations are sentient too, then cloning the simulations into real bodies doesn't do anything to save the simulations!

Kill the parasite!

I'm one of those people who thought the bad guys won at the end of Children of Ruin - the parasite is everywhere and could take control at some future date when it changes its mind. I do not for a second relate to it at all, I don't feel sorry for it when others distrust it, and I think the war against the parasite should never have ended.

There's a scene where Miranda hugs "Miranda" and consoles the parasite, saying it did a good job. I hated it. This is even after a scene where the parasite admits to itself it might take control of everyone at some point in the future, i.e. people's fears about it are rational.

I think the parasite perspective is interesting, but no-one else should even be remotely okay with it existing. Totally irrational.

I was also annoyed by a lot of "Miranda's" inner monologue, it felt very repetitive. Something being nonsense I can forgive, something being repetitive and a chore to read I can't forgive.

Verdict

Children of Time is really neat and has lots of cool ideas.

Children of Ruin has some cool ideas, a well done shift into horror halfway through, then a disappointing ending. Still well worth a read overall.

Children of Memory was just riddled with missed opportunities and I didn't like it nearly as much. I don't recommend reading it.

binfmt_misc: The magic behind Linux/Windows interop

2024-01-03T00:00:00Z

I was running something in WSL, as you do, then I thought about it for a second. When I'm doing this in WSL:

$ clip.exe < file.txt

How does that actually work? It turns out this is done using /init which is two things:

PID 1, it's the init system, the parent of all processes in WSL.
An "interpreter" for Windows executables. When you run clip.exe, that's the actual Windows binary you're running directly. This works via the binfmt_misc mechanism of Linux, which allows you to register runners for any binary with specific magic bytes.

/init is a bit hard to get at since it's a closed source component of WSL. We can get some idea of how it might work by looking at (1) a Microsoft blog post describing how this works at a high level and (2) cbwin, an open source implementation of this.

We can also do fun things, like make Java jars directly executable without needing to run them with java -jar. But beware - if you have "fully executable" jars with scripts embedded at the start (like the ones Spring Boot makes), binfmt_misc can't possibly tell that they're jars.

But java -jar still works on them! Weird. Here are the questions we want to answer:

What happens when you run a "normal" Linux executable? What about a shell script?
How does Linux tell that clip.exe is a Windows executable, and how does it run from inside Linux?
How can Java tell that a shell script with some binary junk at the bottom is really a jar, but the Linux kernel (via binfmt_misc) can't?!

Answers are below.

What happens when you run a normal executable?

Let's take cat as an example. From inside your shell, you execute:

$ cat file.txt

Your shell will probably find cat in the PATH and use the execve system call to execute that file.

This is not mysterious at all. You can see the source code of execve here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/exec.c?id=HEAD#n2030.

This blog post isn't supposed to be a deep dive into execve, the point is that execve executes executables.

What about a shell script?

Believe it or not, also execve! execve reads the first two bytes of the given file and, if they're #!, the file gets executed in the way we're all familiar with.

If file.txt is given to execve with these contents:

#!/bin/sh

echo Hello

then execve will run /bin/sh file.txt, and we go back to the first case: a normal executable.
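The check itself is tiny. Here's a simplified sketch of it in userspace C++; shebang_interpreter is a hypothetical helper, not the kernel's code, and it ignores the optional interpreter argument and the length limits the kernel applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the interpreter path from a "#!" first line, or "" if the
// contents do not start with "#!".
std::string shebang_interpreter(const std::string& contents) {
    if (contents.size() < 2 || contents[0] != '#' || contents[1] != '!') {
        return "";
    }
    // Take the rest of the first line, after the "#!".
    std::size_t end = contents.find('\n');
    std::string line = contents.substr(
        2, end == std::string::npos ? std::string::npos : end - 2);
    // Skip leading blanks, then stop at the first blank after the path.
    std::size_t start = line.find_first_not_of(" \t");
    if (start == std::string::npos) {
        return "";
    }
    std::size_t stop = line.find_first_of(" \t", start);
    return line.substr(
        start, stop == std::string::npos ? std::string::npos : stop - start);
}
```

For the file above this returns /bin/sh, which is the program that actually gets executed with file.txt as its argument.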

So far, so good, everyone should be familiar with this. The interesting part comes next.

What is binfmt_misc?

binfmt_misc is documented very well here: https://docs.kernel.org/admin-guide/binfmt-misc.html. At a high level, binfmt_misc is a feature of the Linux kernel that allows you to specify a rule matching either a filename suffix or magic bytes at an offset in the file, and an executable to use to run that file, similar to how a shell script is run.

For example, to match the .txt extension and cat the text file when "run", you could run:

$ sudo sh -c 'echo ":cattxt:E::txt::/bin/cat:" > /proc/sys/fs/binfmt_misc/register'
$ vim file.txt
$ chmod +x file.txt
$ ./file.txt
this is my file
it has content
hello

This isn't very useful. The next part is more interesting.

How does WSL tell clip.exe is a Windows executable?

Let's look at clip.exe:

$ vim /mnt/c/Windows/system32/clip.exe

Right at the start, you'll see the characters "MZ" - these are the first two bytes of any .exe file on DOS or Windows (and the initials of Mark Zbikowski).

MZ<90>^@^C^@^@^@^D^@^@^@ÿÿ^@^@¸^@^@^@^@^@^@^@@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@è^@^@^@^N^_º^N^@´   Í!¸^ALÍ!This program cannot be run in DOS mode.^M^M

...

Let's look at the binfmt_misc registrations (this example only works in WSL, of course):

$ ls /proc/sys/fs/binfmt_misc/
WSLInterop  register  status

It's too easy!

$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /init
flags: PF
offset 0
magic 4d5a

And 4d5a is hex for "MZ". So when you execve a Windows executable like clip.exe, Linux will invoke /init to run clip.exe. The magic is thus inside /init.
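The matching rule is simple enough to sketch. matches_magic and looks_like_windows_exe are hypothetical names, and this ignores the optional mask that binfmt_misc also supports; it only shows the "magic bytes at an offset" comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Does `file` contain the bytes `magic` starting at `offset`?
bool matches_magic(const std::string& file, std::size_t offset,
                   const std::string& magic) {
    if (offset > file.size()) {
        return false;  // the file is too short to contain the magic at all
    }
    return file.compare(offset, magic.size(), magic) == 0;
}

// The WSLInterop rule shown above: offset 0, magic 4d5a ("MZ").
bool looks_like_windows_exe(const std::string& file) {
    return matches_magic(file, 0, std::string("\x4d\x5a", 2));
}
```

Anything starting with MZ matches and gets handed to /init; a shell script starting with #! does not.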

/init is not open source. The blog post linked above has some hints and I encourage you to read it.

There's also https://github.com/ionescu007/lxss which contains some interesting proofs of concept for interacting across the Windows/Linux boundary.

How do fully executable jars work?

The interesting part about these is that they don't involve binfmt_misc at all, instead they use a different trick.

Go to https://start.spring.io/ and generate the example project. Add this section to the build.gradle to generate the "fully executable" jar:

bootJar {
  launchScript()
}

Run ./gradlew build to build the project. You get two jars:

$ ls build/libs/
demo-0.0.1-SNAPSHOT-plain.jar  demo-0.0.1-SNAPSHOT.jar

The first jar is not executable and has no main. The second jar is, with either java -jar or directly:

$ java -jar build/libs/demo-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v3.2.1)
...
^C
$ build/libs/demo-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v3.2.1)

But what gives, there was no binfmt_misc for Java jars?! The trick here is that the jar isn't a jar, it's a shell script:

$ less build/libs/demo-0.0.1-SNAPSHOT.jar
#!/bin/bash
...
<shell script>
...
exit 0
<what looks like binary data>

The binary data after the exit 0 is the jar. This is clever: when run directly, the shell script re-invokes the jar itself (the shell script itself!) with java -jar.

You can verify the binary data is a jar by looking at the magic bytes:

...
*)
  echo "Usage: $0 {start|stop|force-stop|restart|force-reload|status|run}"; exit 1;
esac

exit 0
PK^C^D^T^@^H^H^H^@
...

PK^C^D is exactly the magic byte string for a zip archive. A jar file is just a zip file with special contents.

This explains how directly invoking the jar executes it without involving binfmt_misc.

How does `java -jar` execute a jar with text at the start?

java isn't doing anything clever here, it just treats the jar as any other zip file - we can even extract the "fully executable" jar with unzip:

$ unzip build/libs/demo-0.0.1-SNAPSHOT.jar
Archive:  build/libs/demo-0.0.1-SNAPSHOT.jar
   creating: META-INF/
  inflating: META-INF/MANIFEST.MF
...

The cleverness here is in the zip file format itself, see https://en.wikipedia.org/wiki/ZIP_(file_format). A tool that reads a zip file doesn't start at the beginning of the file: it scans backwards from the end for the signature of the end of central directory record (some magic bytes) and reads the archive from there. This means that we are allowed to have whatever preamble we want at the start of the file, including executable code, commonly used for self-extracting archives (e.g. an .exe you can run or open with your archive viewer).
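Here's a minimal sketch of that lookup, assuming the whole file is already in memory. find_end_of_central_directory is a hypothetical helper; a real zip reader validates much more (comment length, directory offsets), this only shows why a preamble at the start doesn't matter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scans backwards for the end of central directory signature "PK\x05\x06".
// Returns its offset, or std::string::npos if there is no plausible record.
std::size_t find_end_of_central_directory(const std::string& bytes) {
    static const std::string signature("PK\x05\x06", 4);
    const std::size_t min_record_size = 22;  // fixed part of the record
    std::size_t pos = bytes.rfind(signature);
    if (pos == std::string::npos || bytes.size() - pos < min_record_size) {
        return std::string::npos;
    }
    return pos;
}
```

Because the search starts from the end, prepending a shell script just shifts the returned offset by the length of the script.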

This jar isn't self-extracting, but it is kind of self-running. I think it's a neat trick.

Conclusion: why we can't use `binfmt_misc` for jars

It's pretty common for fully executable jars to not have a .jar extension, since the whole point of being fully executable is that it's like a "normal" executable. This means we can't use binfmt_misc's extension matching.

We can't use the magic byte matching either since:

Jars are just zip files, they don't have any unique magic bytes! #! is at the start (which we can't and shouldn't hijack), and PK appears later, but we can't hijack that either, those are the zip file magic bytes and not all zip files are jars.
Even if there were jar specific magic bytes, we don't know the offset! The shell script at the start can be any length.

So binfmt_misc is useful for running files with a specific extension, magic bytes at a specific offset (e.g. Windows executables!) but jars don't have any of those.

Final verdict on `binfmt_misc`

binfmt_misc doesn't really seem incredibly useful if you ask me. One cool use case is registering QEMU as a handler for ARM executables while on an x86 machine, then you can run those binaries as if they were native. That doesn't seem like a real use case to me.

The WSL interop use case actually seems the most compelling to me, but is that a reason to have a whole kernel thing? I don't know.

Differences in backwards incompatibility between Rust and C++

2024-01-02T00:00:00Z

Why is the following change to a Rust struct backwards incompatible?

 struct S {
+    y: i8,
     x: i32,
 }

And why is the following change to a C++ struct backwards incompatible?

 struct S {
+    char y;
     int x;
 };

The answers are different and may surprise you. Rust provides fewer layout guarantees about structs by default, and more compile-time guarantees in code interacting with structs, than C++.

C++ is batshit crazy as always, providing guarantees no-one cares about while allowing you to invoke UB by accident.

Worth thinking about when deciding whether you need to bump the major version number of your Rust crate.

Why the C++ change is backwards incompatible

Let's start with the easy one. The C++ standard defines a standard layout struct as:

If you could write it in C, it's a standard layout class.

-- https://en.cppreference.com/w/cpp/language/classes#Standard-layout_class

Okay, that's not what it says - there are classes that are only expressible in C++ that are "standard layout" (e.g. using inheritance), but anything you could write in C is standard layout.

Standard layout classes have several guarantees: https://en.cppreference.com/w/cpp/language/data_members#Standard-layout but most importantly:

A pointer to an object of standard-layout class type can be reinterpret_cast to pointer to its first non-static non-bitfield data member

So given the struct:

struct S {
    int x;
};

the C++ standard guarantees that this code will work as expected:

int main() {
    S s{1};
    int x = *reinterpret_cast<int*>(&s);
    std::cout << x;
}

If we make the change above, this code compiles but is now undefined behaviour:

struct S {
    char y;
    int x;
};

int main() {
    S s{1};
    int x = *reinterpret_cast<int*>(&s);
    std::cout << x;
}

S* can be cast to a char* but not an int*! That's now UB.

What will likely actually happen on a little endian machine is that it will still print 1. We're reading the 0x01 at the start, and the rest of the struct is probably zeroes. But it's not guaranteed to be - the C++ standard says nothing about the presence or content of the padding in structs. It could be all 0xff!

This does start to "matter" on big endian machines (quotes because no-one cares), where the change is observable.

On a big endian machine where sizeof(int) == 4, the first struct may look like:

00 00 00 01

where reading the first 4 bytes gives an int with value 1, while the second struct looks like:

01 [00 00 00] 00 00 00 00

The [bytes] are padding bytes to make sure the int starts at an address divisible by 4.

Reading the first 4 bytes of this gives a completely different value: 0x01000000.
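You don't need a big endian machine to check the arithmetic. This sketch uses a hypothetical helper, read_big_endian, to interpret four bytes the way a big endian machine would, regardless of the host.

```cpp
#include <cassert>
#include <cstdint>

// Interprets four bytes as a big endian 32 bit integer, independent of the host.
std::uint32_t read_big_endian(const unsigned char bytes[4]) {
    return (std::uint32_t(bytes[0]) << 24) |
           (std::uint32_t(bytes[1]) << 16) |
           (std::uint32_t(bytes[2]) << 8) |
           std::uint32_t(bytes[3]);
}
```

Feeding it 00 00 00 01 gives 1, while 01 00 00 00 (y followed by three zero padding bytes) gives 0x01000000.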

Two notes:

Struct members are guaranteed to be initialized to zero if the struct is partially initialized.
The contents of padding bytes aren't guaranteed to be 0x00.
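The parts that are guaranteed can be checked directly. In this sketch, Before and After are hypothetical names for the two versions of S; it shows that x no longer sits at offset 0 and that S s{1} now initializes y while x is zeroed.

```cpp
#include <cassert>
#include <cstddef>

struct Before { int x; };          // the original S
struct After { char y; int x; };   // S after adding y at the front

// Offset of x in each version of the struct.
constexpr std::size_t x_offset_before() { return offsetof(Before, x); }
constexpr std::size_t x_offset_after() { return offsetof(After, x); }

// With the new layout, {1} initializes y; x is value-initialized to zero.
inline int x_after_partial_init() {
    After s{1};
    return s.x;
}
```

The exact offset of x in After depends on the alignment of int on your platform, so the only portable claims are that it is non-zero and suitably aligned.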

Overall, this is as expected. The surprising part comes next.

Why the Rust change is backwards incompatible

Let's suppose you swapped two elements in a struct:

struct S {
    y: i8,
    x: i32,
}

struct S {
    x: i32,
    y: i8,
}

This is totally backwards compatible if you're using safe Rust! Rust makes almost no guarantees about the layout of a struct in memory! Unlike C and C++, Rust doesn't even guarantee the order of elements in memory matches the order in the struct declaration.

It's true!

From https://doc.rust-lang.org/reference/type-layout.html

The only data layout guarantees made by this representation are those required for soundness. They are:

The fields are properly aligned.

The fields do not overlap.

The alignment of the type is at least the maximum alignment of its fields.

There is no guarantee about order or pointer conversions.

If you scroll down in that page, you'll notice Rust has #[repr(C)] which gives you the same layout as C, if you want some stronger guarantees.

But why is our first change above still backwards incompatible? This code works:

struct S {
    x: i32,
}

pub fn main() {
    let my_s = S{ x: 1 };

    match my_s {
        S{x} => x,
    };
}

but this code doesn't compile:

struct S {
    x: i32,
    y: i8
}

pub fn main() {
    let my_s = S{ x: 1 };

    match my_s {
        S{x} => x,
    };
}

because the initialization is missing y, and the pattern match is missing y.

This is also totally unsurprising.

Conclusion

The moral here is:

C and C++'s standard layout provides some guarantees, but not really that many.
Rust's default struct memory layout provides almost no guarantees. But your Rust code is still safe, since the compiler can make sure you don't accidentally assume things about structs.

Technically, Rust might write winning lottery ticket numbers as padding at the start of your structs. Make sure to check.

Addendum: sizeof

One naive answer is that adding fields is backwards incompatible because the sizeof the struct changes.

This isn't exactly right since nothing in the C or C++ standards guarantees that the sizeof a struct is the same even between compilations of the same program. This is trivially true because e.g. long may be 4 or 8 bytes depending on the platform and C implementation: long is always 4 bytes on Windows and is usually 8 bytes on 64 bit Linux. (Yes, I know, the standard doesn't say anything about "bytes" in relation to int and long, only value ranges.)

So the struct sizeof could change arbitrarily depending on the time of day you compiled the program - you can't rely on that anyway.

Same with Rust:

In general, the size of a type is not stable across compilations

-- https://doc.rust-lang.org/std/mem/fn.size_of.html

Funnily enough, Rust guarantees that the std::mem::size_of a #[repr(C)] struct is stable if all members are also #[repr(C)]. So that guarantee is actually stronger than the C standard's guarantee.

This comparison is unfair because GCC and Clang do guarantee a lot about struct layouts, it's just the C standard that doesn't. rustc and "the Rust language" are hard to separate because there's only really one real implementation that people use and no standard.

Comparing things unfairly is fun though.

How large are the arbitrage opportunities in Eve Online?

2023-09-23T00:00:00Z

I just noticed that we're now past the 10th anniversary of my first blog post, which I made on 2013-08-11! Maybe I'll write a retrospective. Moving on.

Recently, a friend suggested I'd be interested in EVE Online. For those not familiar, it's an MMO with lots of stuff but in particular, it has a player driven market economy. Prices are driven by market forces. Players place buy and sell orders for various items, other players fill those orders, market prices move over time.

It's exciting! CCP (the developers of EVE) even employ economists to help manage the in game money supply and inflation. Very exciting!

With any market the question is: how can I make the most money as quickly as possible for the least effort?

In real life, the answer may be to get a job. In EVE, I thought the answer might be to find and exploit arbitrages: market mispricings where someone is selling for a low price and someone else is buying for a high price.

Websites like eve-trading.net exist but don't let us answer the following questions:

Given an initial investment and fixed cargo space, which opportunity has the highest return (%) per jump? How does that vary with available capital? Warren Buffett famously said that having a small amount of money to invest is the trick to making good returns. Having $100B in cash is a curse - most "good" opportunities are simply too small.
Has the size of arbitrage opportunities changed over time?
Historically, where (in which systems) are the best opportunities?

These questions can be answered by downloading these datasets:

Market data from https://data.everef.net/market-orders/
Static data (e.g. about jump routes and how much cargo space items take up) from https://wiki.eveuniversity.org/Static_Data_Export

And doing some analysis. That's what I do in this post.

Getting the datasets

The market data from https://data.everef.net/market-orders/ is easily available, just download the files you want from your browser. For bulk downloads, follow the instructions here to use wget to download everything you want.

The static datasets don't require bulk download, just download them from your browser, you'll only need one file per dataset.

Setting up our analysis

I just used a Jupyter notebook and pandas to go through this dataset. The 2023 dataset up to the day of posting this is under 200 GiB, so it's not huge and can easily be analyzed on a laptop.

I also used the networkx package to create the EVE universe graph and calculate shortest paths.

The Jupyter notebook I'll be screenshotting graphs from is available in full from https://github.com/kaashif/eve-arbitrage-finder so I won't include the full code in this blog post. See there if you want to run the code.

Initially we'll just do all of the analysis for a single day; later we'll graph the results over time.

Finding arbitrages

The conditions for an arbitrage are simple. We're looking for a sell order and a buy order where:

The sell order has a lower price than the buy order, so we can buy from the seller low and sell to a buyer high.
The sell order has a higher available quantity than the minimum quantity for the buy order.

The code to find these is incredibly simple:

if sell_price < buy_price and int(sell["volume_remain"]) > int(buy["min_volume"]):
    # arbitrage found!
    arbitrages.append((sell, buy))

And now we have our list of sell/buy pairs that give us some profit if we manage to execute. I know I ignored taxes, but let's just ignore those for this post.
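To make the check concrete, here is a minimal sketch of the whole search over some toy orders. The volume_remain and min_volume fields are the ones used above; the type_id and price field names and the find_arbitrages helper are assumptions for illustration, not necessarily what the notebook does.

```python
from collections import defaultdict


def find_arbitrages(sells, buys):
    """Return (sell, buy) pairs where buying from `sell` and selling into `buy` is profitable.

    Orders are plain dicts. "type_id" and "price" are assumed field names.
    """
    # Only compare orders for the same item type.
    buys_by_type = defaultdict(list)
    for buy in buys:
        buys_by_type[buy["type_id"]].append(buy)

    arbitrages = []
    for sell in sells:
        for buy in buys_by_type[sell["type_id"]]:
            sell_price = float(sell["price"])
            buy_price = float(buy["price"])
            if sell_price < buy_price and int(sell["volume_remain"]) > int(buy["min_volume"]):
                arbitrages.append((sell, buy))
    return arbitrages


# Hypothetical orders: item 34 is mispriced, item 35 is not.
sells = [
    {"type_id": "34", "price": "5.0", "volume_remain": "1000"},
    {"type_id": "35", "price": "9.0", "volume_remain": "10"},
]
buys = [
    {"type_id": "34", "price": "6.5", "min_volume": "1"},
    {"type_id": "35", "price": "8.0", "min_volume": "1"},
]

pairs = find_arbitrages(sells, buys)
print(len(pairs))
```

Grouping the buy orders by item first avoids comparing every sell order against every buy order, which matters once the order book has millions of rows.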

Building the jump graph

In EVE, each system is connected by stargates, which you have to jump between. CCP provides a list of connected systems in the static data export linked above.

We can construct the universe graph:

import bz2
import csv
from io import StringIO

import networkx as nx

with bz2.open("mapSolarSystemJumps.csv.bz2", mode="r") as data_csv:
    route_contents = data_csv.read()

route_contents_f = StringIO(bytes.decode(route_contents, "utf-8"))
route_reader = csv.DictReader(route_contents_f)

# Undirected graph: one node per solar system, one edge per stargate connection.
G = nx.Graph()

for row in route_reader:
    G.add_edge(row["fromSolarSystemID"], row["toSolarSystemID"])

Then to calculate the number of jumps between two points (solar system IDs), we want the length of the shortest path in edges, not the number of systems on it:

nx.shortest_path_length(G, arb["from_system"], arb["to_system"])

Very simple and fast - the universe graph isn't actually that big or dense.
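As a dependency-free illustration of what the shortest-path query computes, here is a breadth-first search over a toy adjacency map. The system names and the jumps_from helper are made up for this sketch; the notebook itself uses networkx. Note that a route visiting k systems takes k - 1 jumps.

```python
from collections import deque


def jumps_from(adjacency, start):
    """Breadth-first search: minimum number of jumps from `start` to every reachable system."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        system = queue.popleft()
        for neighbour in adjacency.get(system, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[system] + 1
                queue.append(neighbour)
    return distances


# Hypothetical toy universe: A - B - C - D in a line, plus a shortcut B - D.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

distances = jumps_from(adjacency, "A")
print(distances["D"])
```

One search from a given system yields the jump count to every other reachable system, so running it once per trade hub would give a reusable lookup table.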

How much do these opportunities return?

We would expect smaller initial investments to be able to return more: there are probably opportunities to double or triple 1M ISK but few to do the same with 100M.

We can graph the best return % in each snapshot (over the course of a day) in 2023 with 1M, 10M, 100M, and 1B in initial capital.

Holy shit, 1000% return for each arbitrage? The picture changes when we take into account jump distances and instead graph the return per jump for each opportunity:

But the concept stays the same - having more capital means lower returns. The returns just look a little less crazy now.

This graph takes compounding into account. For example, a 100% return per jump over 2 jumps means you double your money twice, for a 300% return overall. Simply dividing the overall return by the number of jumps would give 300% / 2 = 150% per jump, which is wrong. Instead, we take the nth root of one plus the overall return, where n is the number of jumps, then subtract one.
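The nth-root calculation can be sketched like this. The function name is hypothetical; returns are expressed as fractions, so 3.0 means 300%.

```python
def return_per_jump(total_return, jumps):
    """Compounded per-jump return: the rate r with (1 + r) ** jumps == 1 + total_return."""
    return (1.0 + total_return) ** (1.0 / jumps) - 1.0


# 300% overall over 2 jumps is 100% per jump, not 150%.
per_jump = return_per_jump(3.0, 2)
print(f"{per_jump:.0%}")
```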

Are these opportunities really risk free?

No. Hauling anything valuable through low or null sec systems is always dangerous. People camp gates and will kill you. That's the main risk.

The other risk is that the buy order you want to sell into will disappear or get corrected before you get there, leaving you holding the bag. This risk can be detected and mitigated. If the sell order you're buying from is mispriced (price significantly lower than the regional average) but the buy order isn't, you're probably fine. If the buy order does disappear, you can sell to another averagely priced order and still make the money.

If the buy order is mispriced too high, then there is risk.

Given historical market data, we can detect arbitrages and calculate probabilities of whether the buy order will disappear by checking whether it did - either someone fulfilled the order or it was corrected. Then we can assign an expected profit to each opportunity based on those probabilities.
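A minimal sketch of that expected-profit idea, assuming we've already estimated from historical data the probability that the buy order is still there when we arrive. All names and numbers are hypothetical, and fallback_price stands in for whatever averagely priced order we'd sell into instead.

```python
def expected_profit(quantity, sell_price, buy_price, fallback_price, p_buy_survives):
    """Expected profit when the target buy order may vanish before we arrive.

    If the order survives we sell at buy_price; otherwise we sell at fallback_price.
    """
    profit_if_survives = quantity * (buy_price - sell_price)
    profit_if_gone = quantity * (fallback_price - sell_price)
    return p_buy_survives * profit_if_survives + (1.0 - p_buy_survives) * profit_if_gone


# Hypothetical numbers: buy 100 units at 5, hope to sell at 8, fall back to selling at 4.
ev = expected_profit(100, 5.0, 8.0, 4.0, 0.75)
print(ev)
```

When the fallback price is above the sell price (the sell side is the mispriced one), both branches are profitable and the survival probability barely matters; when it's below, the expected profit can go negative.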

I didn't implement this given there are (I think) better ways to make money, but it is interesting to think about.

If botting were allowed in EVE, no doubt all these opportunities would be gone. Or maybe people would regularly go full Knight Capital and create even more opportunities. Hard to say.

Conclusion

I'm pretty sure mining, PvP, doing raids, whatever, probably earns more money with much less analysis. This isn't the point. The analysis is the point!

I didn't answer all of the questions at the start.

Some things I want to implement in future:

Return per jump, taking into account your current position. A great opportunity near Jita isn't that great if you're 20 jumps from Jita.
Analysis of how fast a player can make money over time using a particular strategy. i.e. backtesting and simulating trading. Is this a waste of time long term?
Simulating other strategies e.g. making markets by placing buy and sell orders for the same item in the same place. Getting good enough at that may have higher returns for less risk.
Adding route security preferences and cargo size limits.
Determining whether the buy or sell side of the arbitrage is mispriced. If the sell is mispriced, you just have to buy at the low price and can sell at your leisure. If the buy is mispriced, you have to sprint from sell to buy - if it's gone (cancelled, amended) when you get there, you might just be fucked. I have to know whether to be nervous or not!

There's a lot of other stuff I'd like to know.

The code I'm using is also really slow, I haven't put any effort into optimising or profiling anything. The data could be re-encoded from .csv.bz2 (compressed text) files into Parquet or Avro files. Bzip2 is notoriously slow too, that can likely be improved.

The best route would probably be to convert all the files to a better file format (e.g. Parquet) then load everything and analyze it using e.g. Dask.

And jump distances could likely be pre-calculated.

Next time, I'll write a bit more about the technical details of analyzing a multi terabyte dataset with only 32 GB of memory, and try to get the analysis much faster so I can iterate. Right now I'm waiting minutes for a single day's worth of analysis.

Valuing converters in Sidereal Confluence

2023-08-08T00:00:00Z

(or: How I learnt to stop worrying and love opportunity cost)

I'm a fan of the board game Sidereal Confluence. It's a game where:

There are resources: small cubes, large cubes, ultratech (octagons)
You have converters that change sets of cubes into others (e.g. 2 white and one blue cube into 3 black cubes)
Your goal at the end of the game is to have the most victory points (VPs). Some converters make VPs, during the game you can research techs that give you VPs, and all resources you've accumulated convert to VPs at set rates.

The meat of the game is trading - players can trade anything, with any terms. The only rule is that trades are binding - you must honour your agreements. This leads to simple agreements like "I'll give you one white cube for one green cube", where the trade is obviously fair. You also sometimes get harder to value trades, like "This turn you give me one small white cube and next turn I'll give you one large black cube".

The rules have a guideline - three small cubes are worth two large cubes, which is worth the same as an ultratech. This helps with valuations.

One of my friends executed a strategy I found interesting:

He played the Eni Et, a race which has special converters. For example, a usual converter might take 3 small cubes and give 4 small cubes, for a ratio of 4/3, but an Eni Et converter might have a ratio of 2 or 3.
The catch is the Eni Et can't use their special converters and must trade them to other people who can.
He sold them permanently early on, valuing a 2 to 4 or 8 to 13 converter pretty highly. After all, they'll make dozens of cubes for you over the game, right?
He won. Did we pay too much for his converters?

Questions I want to answer quantitatively:

A simple one: if someone gives me a cube this turn in exchange for some number of cubes next turn, how many cubes should they demand?
A harder one: when buying an Eni Et converter permanently, how many cubes should I pay?

How to value future trades

Someone on BoardGameGeek has already done the hard work in this post. I don't know exactly how they did it, but they calculated average converter rates and color correction rates (the rate at which players trade non-convertible for convertible cubes). The reason that's difficult is that converters change as technologies are researched, and when exactly techs are done depends on the players, and which techs are drawn from the deck.

From those values, we can calculate the value of 1 small cube at the end of the game. See that forum post for the details, but the final results that I'll use are the values of 1 small cube at the end of the game on each turn:

Turn	Endgame Value per Cube
1	4.64
2	3.80
3	3.08
4	2.45
5	1.91
6	1.45

This means that on turn 1, on average (with average converters and trades), a single small cube will turn into 4.64 small cubes after all 6 turns are over.

We can now answer our first question. Assuming we want to keep our endgame value the same and we're paying 1 small cube on turn 1, we should demand 4.64 cubes of endgame value back on turn 2. On turn 2, each small cube is only worth 3.8 at endgame, so we need 4.64/3.8 = 1.22 small cubes back on turn 2.

We can make a table of the number of cubes we should demand on turn X+1 for a single cube on turn X.

Turn	# cubes to ask for next turn
1	1.2198
2	1.2355
3	1.2558
4	1.2815
5	1.3194
6	1.4500
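This table can be approximately reproduced from the endgame values. The sketch below uses the rounded values from the first table, so its results differ slightly from the figures above (which were presumably computed from unrounded values). Treating a cube after turn 6 as worth exactly 1 is an assumption that matches the 1.45 in the last row.

```python
# Endgame value of one small cube held on each turn (rounded values from the table above).
endgame_value = {1: 4.64, 2: 3.80, 3: 3.08, 4: 2.45, 5: 1.91, 6: 1.45}


def cubes_to_ask(turn):
    """Cubes to demand next turn in exchange for one cube given on `turn`."""
    # After the last turn a cube is just a cube, so its endgame value is taken to be 1.
    next_value = endgame_value.get(turn + 1, 1.0)
    return endgame_value[turn] / next_value


rates = {turn: cubes_to_ask(turn) for turn in endgame_value}
for turn, rate in rates.items():
    print(turn, round(rate, 2))
```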

One obvious fact is that the rate increases as the game goes on. This is because converters get better as the game goes on and techs get invented. If the best converter has a 1.5 ratio and someone invents a 1 to 10 converter, obviously you should start asking for more cubes in future trades for next turn - people will be able to pay.

Some conclusions:

I've played the Im'Dril Nomads a couple of times and they're very dependent on future trades. Usually people settle on a rate of something like 1 small cube this turn for 1 large cube next turn. This is a 1.5 rate, which is better than the "average" rate, strictly speaking.
These are rules of thumb. If, like the Im'Dril, future trades are crucial to run your high efficiency converters, it's fine to give out favourable rates. These calculations assume you make average trades and run an average converter - almost no situations are average.
People will not generally agree to trades more than 1 turn in the future. Situations change too drastically - techs come out, new converters are invented, colonies are colonized. So more involved yield curve calculations are pointless except maybe if you're playing with bond traders.

Valuing Eni Et converters

Let's say the Eni Et are selling their 2 small cube to 4 small cube converter on turn 1. If you run it on all 6 turns, you'll make a net 12 small cubes (24 out for 12 in).

The Eni Et will try to pitch that to you. "It's really worth 12 cubes," they'll say, "I'll give you a good price - only 4 cubes!" Maybe they'll hold an auction and get a bid of 5. "It'll pay for itself quickly and let you do those hard techs," they'll claim.

Let's make some assumptions:

You manage to run the converter every turn.
The converter you forego running is an average converter for that turn.

The average converter efficiency on each turn is:

Turn	Average Converter Efficiency
1	1.3
2	1.32
3	1.3
4	1.38
5	1.41
6	1.5

If you run the 2 to 4 converter and forego running an average converter, the number of extra cubes produced as a result of buying the converter is 4 - 2 * average converter efficiency. The cost of foregoing the average converter is the opportunity cost, the cost the Eni Et scammers want you to overlook.

We then multiply the extra cubes produced on each turn by the endgame value of a cube on that turn and sum all future extra cube production to find the value at endgame of a 2 to 4 converter.

Turn	Endgame Value per Cube	Avg Efficiency	Extra Cubes	Endgame Value	Converter endgame value	# cubes
1	4.64	1.3	1.4	6.50	22.73	4.90
2	3.80	1.32	1.36	5.17	16.23	4.27
3	3.08	1.3	1.4	4.31	11.06	3.59
4	2.45	1.38	1.24	3.04	6.75	2.75
5	1.91	1.41	1.18	2.26	3.71	1.94
6	1.45	1.5	1	1.45	1.45	1.00
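The last column of this table can be reproduced with a short sketch. It uses the rounded inputs from the tables above, so results may be off by a little in the last decimal place. The converter_price function and its parameter names are my own invention for illustration.

```python
# Rounded inputs from the tables above.
endgame_value = {1: 4.64, 2: 3.80, 3: 3.08, 4: 2.45, 5: 1.91, 6: 1.45}
avg_efficiency = {1: 1.3, 2: 1.32, 3: 1.3, 4: 1.38, 5: 1.41, 6: 1.5}


def converter_price(inputs, outputs, bought_on_turn, last_turn=6):
    """Fair price in small cubes for a converter bought on `bought_on_turn`
    and run on every turn from then on (inclusive)."""
    total_endgame_value = 0.0
    for turn in range(bought_on_turn, last_turn + 1):
        # Opportunity cost: the same inputs could have gone through an average converter.
        extra_cubes = outputs - inputs * avg_efficiency[turn]
        total_endgame_value += extra_cubes * endgame_value[turn]
    # Convert the endgame value back into cubes on the purchase turn.
    return total_endgame_value / endgame_value[bought_on_turn]


price_turn_1 = converter_price(2, 4, 1)
print(round(price_turn_1, 2))
```

Calling converter_price(8, 13, 1) would value the 8 to 13 converter mentioned earlier under the same assumptions.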

By endgame the value of a 2 to 4 converter should be obvious. If everyone has a 2 to 3 converter as standard, then you should obviously pay at most 1 cube for a 2 to 4 converter.

The values on the previous turns are much less obvious, at least to me.

On turn 1 you should only pay 4.9 cubes for a 2 to 4 converter that you'll run every turn, even though it's going to produce 24 cubes from 12 you put in.

It gets even worse when you realise you won't always run the converter. If you run the converter on only 4/6 turns, you should only be paying 4/6 * 4.9 = 3.27 cubes for it.

I distinctly remember these converters being sold for far more than that, and not always being run.

Conclusions

Opportunity cost matters.

People pitching complex transactions as "really good deals" are probably tricking you.

A converter that you always run is like a 6 turn bond. You put in the cubes to run the converter once (buy the bond at par), collect fixed payments every turn (coupon payments), and once the 6 turns are over, you keep the cubes you put in (you get the principal back).

This kind of means that when the Eni Et sell you a converter, you're not buying a bond, you're buying the option to buy the bond, you don't get the converter with the initial investment on it already. Well, if you negotiate well you might.

The converter valuation was a discounted cash flow analysis of a security.

Normalizing cube values to endgame values is exactly the same thing as calculating present values, where the present is the end of the game. A cash flow further in the past is worth more, just as a cash flow further into the future is worth less.

Cubes aren't granular enough for every trade to be fair. Those decimal places getting shaved off in every trade are important. People very often trade 2 small cubes for one large, just because trading 1.5 small cubes isn't possible. If you're on the winning side of those trades, you'll be very happy.

The same is true for future trades: if you have the choice between running a 1.3 converter and loaning a cube to someone at a 1.5 rate - loan them the cube! They'll likely think it's a good deal if it lets them run a converter.

Even worse: if you have a pile of cubes waiting for a tech you can't do right away, don't leave it in a pile earning 0%, loan it out! All of these calculations are based on the "average" cube making 30% to 50% per turn!

I wish there were more data on Sidereal Confluence and how pros play it! I mean computer-readable transaction data, full game logs, that kind of thing. But people in real life aren't going to fill out logs and the game is much worse online, so I'm not sure that's ever going to happen.

Unless we get AI to play it or something.

Is implementing alloca(3) in C really impossible?

2023-07-13T00:00:00Z

alloca is a function provided by several C libraries (e.g. glibc) that lets you allocate a block of memory that will be freed when the calling function exits. It's usually done by allocating memory on the stack.

But here are a couple of questions:

No C standard or POSIX standard mentions alloca, so what "should" it really do?
Given that no C standard mentions the stack, is it even possible to implement alloca in C, or do you need assembly to move the stack pointer?
Given that compiling code with -fomit-frame-pointer usually results in addresses being expressed as relative to the stack pointer rather than the frame pointer, is it safe to move the stack pointer ourselves?

TL;DR: The answer is that you need special support from the compiler to implement alloca and you can't do it yourself, in C or assembly.

What should `alloca` do?

There's no standard to refer to, so let's look at the man pages. From Linux: https://man7.org/linux/man-pages/man3/alloca.3.html

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

From OpenBSD: https://man.openbsd.org/alloca

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed on return.

gnulib says something interesting: https://www.gnu.org/software/gnulib/manual/html_node/alloca.html

The alloca module provides for a function alloca which allocates memory on the stack, where the system allows it. A memory block allocated with alloca exists only until the function that calls alloca returns or exits abruptly.

There are a few systems where this is not possible: HP-UX systems, and some other platforms when the C++ compiler is used. On these platforms the alloca module provides a malloc based emulation. This emulation will not free a memory block immediately when the calling function returns, but rather will wait until the next alloca call from a function with the same or a shorter stack length. Thus, in some cases, a few memory blocks will be kept although they are not needed any more.

That's weird, OpenBSD and Linux both support HP PA-RISC but never said anything about a stack based alloca being impossible. That must be a quirk of HP-UX i.e. the OS rather than the hardware. I don't really have an explanation for that, since HP-UX actually does have alloca, and it does mention the stack: https://www.unix.com/man-page/hpux/3C/alloca/

Allocates space from the stack of the caller

Note that the stack on PA-RISC grows the opposite way to x86.

So alloca is consistent across a lot of Unices - allocates memory on the stack and frees it when the calling function exits.

Why should it be difficult to implement in C? Let's try it.

Can `alloca` be implemented in C?

No C standard mentions the stack, so if we take the stack-based definition of alloca seriously, it's completely impossible to implement in standards-compliant, platform-independent, compiler-independent C.

Let's forget about the stack.

Our goal is to be able to do this:

void f(size_t size) {
    char* arr = alloca(size);
    // ...
}

and for arr to be automatically freed at the end of f, that's it.

If we restrict ourselves to GCC and clang, and allow __attribute__((cleanup)) https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html, we can kind of do it:

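#include <stdlib.h>

// Cleanup handlers receive a pointer to the variable, not the variable's value.
void free_memory(void* ptr) {
    free(*(void**)ptr);
}

void f(size_t size) {
    // free_memory(&arr) runs automatically when arr goes out of scope.
    __attribute__((cleanup(free_memory))) char* arr = malloc(size);
    // ...
}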

This is okay, but isn't exactly what we're looking for. We can define some macros to make it nicer, but (1) this is on the heap, (2) we'd still need to add something to the start of our declarations to get this to work.

I don't think there's any way to get this to work.

Variable length arrays (VLAs) let us allocate arrays of variable length on the stack, but that is done through the compiler doing magic. Let's at least try to avoid that for the moment.

Can `alloca` be implemented in x86 assembly?

We can't implement alloca as a normal function, since each function gets its own stack frame. Under the System V x86_64 calling convention, rsp is callee-saved: a function must restore it before returning, so a callee isn't allowed to change the size of its caller's stack frame.

Let's do it using inline assembly (this is Intel syntax, so GCC needs -masm=intel to assemble it):

#define grow_stack(s) asm("sub rsp, " #s)

This doesn't work because although it does grow the stack, and we can store things there, we aren't restoring the original stack pointer before we return.

This means code as simple as this will segfault:

#define grow_stack(s) asm("sub rsp, " #s)

int main() {
    grow_stack(1);
    return 0;
}

The compiler knows nothing about our stack pointer manipulation:

#include <stdio.h>

#define grow_stack(s) asm("sub rsp, " #s)

int main() {
    size_t s = 1;
    grow_stack(1);
    printf("%zu", s);
    return 0;
}

Although we grew the stack, when we get to the printf the compiler still believes s is at the top of the stack. If the compiler addresses s using an offset from rsp, printing s will print garbage. If it uses the frame pointer instead, it might work. Which one you get depends on the optimization level, which is a hint that we shouldn't be doing this.

How is alloca actually implemented?

We need help from the compiler. The compiler needs to know when we call alloca so that it can adjust all references to the stack pointer after the "call" to alloca.

The glibc "implementation" of alloca is:

#ifdef    __GNUC__
# define alloca(size)  __builtin_alloca (size)
#endif /* GCC.  */

But in gnulib (the GNU portability library, intended to work with lots of compilers), there's an actual implementation of alloca in C. It works very differently and has different semantics to the normal alloca: https://github.com/coreutils/gnulib/blob/master/lib/alloca.c

/* (Mostly) portable implementation -- D A Gwyn

   This implementation of the PWB library alloca function,
   which is used to allocate space off the run-time stack so
   that it is automatically reclaimed upon procedure exit,
   was inspired by discussions with J. Q. Johnson of Cornell.
   J.Otto Tennant <jot@cray.com> contributed the Cray support.

   There are some preprocessor constants that can
   be defined when compiling for your specific system, for
   improved efficiency; however, the defaults should be okay.

   The general concept of this implementation is to keep
   track of all alloca-allocated blocks, and reclaim any
   that are found to be deeper in the stack than the current
   invocation.  This heuristic does not reclaim storage as
   soon as it becomes invalid, but it will do so eventually.

   As a special case, alloca(0) reclaims storage without
   allocating any.  It is a good idea to use alloca(0) in
   your main control loop, etc. to force garbage collection.  */

This doesn't reclaim on caller exit, it reclaims garbage on the next call to alloca. If you read through the implementation, you'll also notice it uses malloc and free, not the stack.

If GNU couldn't make alloca work without cheating, then we can't either.

Final word

There's a page here about the advantages of alloca: https://www.gnu.org/software/libc/manual/html_node/Advantages-of-Alloca.html

But I don't buy it - the unavoidable downside of alloca is that if you try to allocate too much memory, it'll fail. And it's very hard to tell if it failed, you don't just get back NULL like with malloc, your program will likely crash in some way.

This is why every alloca man page warns against its use. From the OpenBSD man page:

The alloca() function is unsafe because it cannot ensure that the pointer returned points to a valid and usable block of memory. The allocation made may exceed the bounds of the stack, or even go further into other objects in memory, and alloca() cannot determine such an error. Avoid alloca() with large unbounded allocations.

Don't use alloca.

Booting the 1994 Dr Dobb's 386BSD 1.0 CD

2023-06-19T00:00:00Z

386BSD 1.0 was released in 1994 on a CD in an issue of Dr Dobb's Journal. There are guides on the internet on how to boot 386BSD 1.0 in QEMU, like http://gunkies.org/wiki/Talk:Installing_386BSD_1.0_on_Qemu but I don't think there are any guides on how to boot it like someone in 1994 would've booted it, from a real MS-DOS installation.

Rather funnily, 386BSD is listed as "theoretically bootable" here: https://gunkies.org/wiki/386BSD_1.0. And there's a post on WinWorld saying "Personally I have no idea how to boot it (honestly don't ask)" with no elaboration: https://forum.winworldpc.com/discussion/13240/offer-386bsd-reference-cd-rom.

It's time to put theory into practice and work out how to boot this OS. There are a couple of things I want to try:

DOSBox - maybe it'll work?
QEMU with MS-DOS 6.22
The instructions from gunkies.org

You can download the CD image here: https://archive.org/details/386BSD1.0 and follow along.

Also, RIP Bill Jolitz.

Poking around the CD image

Download that ISO above, then mount it:

$ sudo mount 386BSD-1.0.iso /mnt
$ ls /mnt
386bsd        arch      cd            etc       nbsd         setup1.ex_    SOFTSUB.TXT  vbrun200.dl_
386bsd.ddb    b         CONTRIB.TXT   INFO.TXT  RELEASE.TXT  setup.exe     tmp
386bsd.small  bin       COPYRGHT.TXT  install   root         setupkit.dl_  usr
a             boot.exe  dev           mnt       sbin         setup.lst     var

A few things to notice:

There are some EXEs around. We're probably supposed to run boot.exe from DOS:

  $ file boot.exe
  boot.exe: MS-DOS executable

And setup.exe from Windows:

  $ file setup.exe 
  setup.exe: MS-DOS executable, NE for MS Windows 3.x (EXE)

There are some BSD kernels. We can tell because they're executables and have a bunch of kernel-looking strings in them:

  $ file 386bsd
  386bsd: a.out little-endian 32-bit demand paged pure executable not stripped
  $ strings 386bsd
  ...
  /sbin/init
  ...
  %s: blkdev %d too big, not configured.
  %s: blkdev %d already used by %s, not configured.
  devif: config %s blkdev
  %s: chrdev %d too big, not configured.

These look like the messages you get when booting e.g. OpenBSD, which is a descendant of 386BSD.

There don't seem to be any instructions on the CD itself. We're probably supposed to read the magazine.

Reading the instructions

I cannot find any instructions anywhere. This section is a placeholder. If I find the issue of Dr Dobb's Journal or the instructions that came with the CD, I'll make another post.

Trying to boot it from DOSBox

First, install DOSBox:

$ sudo apt install dosbox

You need to change some settings in your DOSBox config which, for me, lives at ~/.dosbox/dosbox-0.74-3.conf. This is because boot.exe has a faulty (or maybe it's not faulty) Windows detection routine which aborts when you have certain DOSBox settings. A failed boot looks like this:

Z:\> imgmount E ~/tmp/3.iso -t iso
Z:\> E:
E:\> boot 386bsd
Text 466944
Data 20480
Start 0xfe000000
Cannot run from Windows DOS Shell

If you make sure the [dos] section looks like this:

[dos]
#            xms: Enable XMS support.
#            ems: Enable EMS support.
#            umb: Enable UMB support.
# keyboardlayout: Language code of the keyboard layout (or none).

xms=false
ems=false
umb=false
keyboardlayout=auto

Then you can get a bit further:

Text 466944
Data 20480
Start 0xfe000000
can't open emm
386BSD Release 1.0 by William & Lynne Jolitz [1.0.22 10/27/94 15:32]

After this we see some concerning probing messages:

probing for wd port 1f0
probing for fd port 3f0

And these probes don't find anything. It seems like the hardware DOSBox presents to the 386BSD kernel is too fake and too different from what's supported, meaning we can't boot.

panic: cannot mount root
press key to boot/dump

I don't think I'll be able to get this to work.

Running setup.exe

This is interesting, there's a setup.exe that does something mysterious. I installed Windows 3.11 into an MS-DOS 6.22 VM using QEMU and ran setup.exe to see what would happen.

A setup wizard runs and installs a bunch of manuals! This is what it looks like:

Kind of cool, but this doesn't actually help us run 386BSD.

Attempt 2: MS-DOS 6.22 in QEMU

I got a bit further with this but spoiler alert: I didn't actually manage to boot to a shell with this either.

Install QEMU:

$ sudo apt install qemu-system-x86

Get MS-DOS 6.22 install disk images: https://winworldpc.com/product/ms-dos/622.

Make a disk image:

$ qemu-img create msdos.disk 2G

2G is the maximum MS-DOS partition size.

Boot the first floppy and follow the instructions:

$ qemu-system-i386 -hda msdos.disk -fda disk1.img

When asked to swap the floppy, press CTRL-ALT-2, you'll see the QEMU monitor prompt:

QEMU 6.2.0 monitor - type 'help' for more information
(qemu)

Run this to change the floppy:

(qemu) change floppy0 disk2.img

And press CTRL-ALT-1 to switch back to the MS-DOS installer. Continue until MS-DOS is installed.

Get the MS-DOS CD Extensions: https://winworldpc.com/product/ms-cd-extensions-msc/125 and install them by booting with the floppy image attached and running the installer on it. This is so you can use your CD drive.

$ qemu-system-i386 -hda msdos.disk -fda mscdex.img

In DOS:

C:\> A:
A:\> install

Then it'll hang for a few minutes, but that's fine. Remove the floppy and reboot.

Now start the VM with the CD inserted:

$ qemu-system-i386 -hda msdos.disk -cdrom 3.iso

You'll see the CD-ROM driver stuff on boot:

Booting from Hard Disk...
Starting MS-DOS...


HIMEM is testing extended memory...done.


----------------------------------------------------------------
|         E-IDE/ATAPI  CD-ROM device driver,  Ver 1.25         |
| Copyright (C) LG Electronics Inc. 1997. All rights reserved. |
----------------------------------------------------------------
Unit 0:  QEMU      QEMU DVD-ROM      Product Rev.: 2.5+
Transfer Mode      : Programmed I/O


C:\>C:\DOS\SMARTDRV.EXE /X
MSCDEX Version 2.23
Copyright (C) Microsoft Corp. 1986-1993. All rights reserved.
Drive D: = Driver MSCD000 unit 0
C:\>

There's a problem: the CD drivers and other resident programs take up too much conventional memory, so we can't boot 386BSD:

D:\>boot 386bsd
boot: need 23632 more bytes of conventional memory
boot: cannot allocate enough DOS program memory - reduce DOS size

The problem is that a whopping 101K of conventional memory is filled:

D:\>mem 

Memory Type        Total  =   Used  +   Free
----------------  -------   -------   -------
Conventional         639K      101K      538K
Upper                  0K        0K        0K
Reserved               0K        0K        0K
Extended (XMS)    64,512K    2,112K   62,400K
----------------  -------   -------   -------
Total memory      65,151K    2,213K   62,938K

Total under 1 MB     639K      101K      538K

We can fix this by using EMM386 to move some stuff out of conventional memory. Make config.sys look like this:

DEVICE=C:\DOS\HIMEM.SYS
DEVICE=C:\DOS\EMM386.EXE NOEMS
DOS=HIGH,UMB
FILES=30

LASTDRIVE=Z
DEVICEHIGH=C:\CDROM\GSCDROM.SYS /D:MSCD000 /

The key is that the CD driver and as much of DOS as possible need to live in upper memory. This results in much less used conventional memory:

Memory Type        Total  =   Used  +   Free
----------------  -------   -------   -------
Conventional         640K       24K      616K
Upper                 99K       81K       18K
Reserved             384K      384K        0K
Extended (XMS)    64,413K    2,353K   62,060K
----------------  -------   -------   -------
Total memory      65,536K    2,842K   62,694K

Total under 1 MB     739K      105K      634K

Largest executable program size       616K (630,864 bytes)
Largest free upper memory block        18K  (18,048 bytes)
MS-DOS is resident in the high memory area.

And now we can boot 386BSD:

D:\>boot 386bsd
Text 466944
Data 20480
Start 0xfe000000
Warning: Too little RAM memory, running in degraded mode.
panic: pmap_ptalloc: kernel pmap
press key to boot/dump

We need to boot QEMU with more memory (8M should be plenty):

$ qemu-system-i386 -hda msdos.disk -cdrom 3.iso -m 8

Trying to boot again:

D:\>boot 386bsd
Text 466944
Data 20480
Start 0xfe000000
386BSD Release 1.0 by William & Lynne Jolitz. [1.0.22  10/27/94 15:32]
Copyright (c) 1989-1994 William F. Jolitz. All rights reserved.
clk:  irq0
pc: pc0 <color> port 60 irq1
aux:  port 310 irq12
wd: wd0 <QEMU HARDDISK>  port 1f0 irq14
fd: fd0:  port 3f0 irq6 drq2
com: com1: fifo port 3f8 irq4
lpt: lpt0  port 378 irq7
npx: npx: irq13
mcd:  port 300 irq10
wd0: cannot find label (no disk label)
panic: cannot mount root
press key to boot/dump

Now we're getting somewhere - 386BSD has scanned and detected the hard disk, although there's no BSD disklabel.

Trying the Gunkies instructions

At this point I'm curious as to whether the Gunkies instructions work: http://gunkies.org/wiki/Talk:Installing_386BSD_1.0_on_Qemu.

And they do!

$ qemu-system-i386 -fda ddbboot.img -hda disk -hdb 3.iso -m 8
A:\>boot 386bsd.ddb wd1d
Text 335872
Data 114688
Start 0xfe000000
can't open emm
386BSD Release 1.0 by William & Lynne Jolitz. [1.0.21  10/27/94 14:23]
Copyright (c) 1989-1994 William F. Jolitz. All rights reserved.
clk:  irq0
pc: pc0 <color> port 60 irq1
aux:  port 310 irq12
wd: wd0 <QEMU HARDDISK> wd1 <QEMU HARDDISK>  port 1f0 irq14
fd: fd0: 1.44M port 3f0 irq6 drq2
com: com1: fifo port 3f8 irq4
lpt: lpt0  port 378 irq7
npx: npx: irq13
mcd:  port 300 irq10
erase ^?, kill ^U, intr ^C

But the install hangs when partitioning the disk, when we try to create the backup superblocks:

# ./install
...
super-block backups (for fsck -b #) at:
32, 16224, 32416, 48608, 64800, 80992, 97184,

Then it just hangs.

Conclusion

Getting 386BSD 1.0 working is much more difficult than 0.1 and 0.2, which are much better studied and patched.

There may be a problem with newer versions of QEMU, specifically SeaBIOS. It looks like all of the guides which end in success involve compiling e.g. QEMU 0.12.3 and using the PC BIOS included with that version.

If I do something non-trivial on 386BSD like compiling a program or running an FTP server, I'll make another post. Right now, I've almost got the installer to work, which is a somewhat unsatisfying place to stop.

Adding keyword arguments to Java with annotation processing

2023-03-27T00:00:00Z

Java is a language missing a lot of features. One of those missing features is keyword arguments. By that, I mean something that lets you call functions like this:

my_function(x=1, y=2, z=3)

Or even:

my_function(z=3, x=1, y=2)

That is, arguments that are named, can be reordered, and are non-optional at compile time. You might quibble: Python doesn't have compile time. But you can run mypy to check types and if you're missing a required keyword argument, mypy will fail.
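For reference, here's a minimal sketch of the Python behaviour we're trying to imitate. The function body is made up for illustration. The bare `*` makes every parameter keyword-only, and with no defaults they're all required.

```python
def my_function(*, x: int, y: int, z: int) -> tuple:
    """Hypothetical function: the bare * makes x, y and z keyword-only."""
    return (x, y, z)


# Named and reorderable: both calls pass the same arguments.
a = my_function(x=1, y=2, z=3)
b = my_function(z=3, x=1, y=2)

# Leaving out z is the kind of mistake a type checker such as mypy reports
# before the code runs. Without one, it still fails, but only at call time.
try:
    my_function(x=1, y=2)  # type: ignore[call-arg]
    missing_z_rejected = False
except TypeError:
    missing_z_rejected = True
```

The goal for Java is to get the mypy-style failure out of javac itself, rather than the runtime `TypeError`.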

Let's limit our scope to constructors. Given code like this:

package org.example;

@ReorderableStrictBuilder
public record MyBuiltClass(String first, String second, String third) {
}

We want to be able to construct an object something like this:

// Named arguments
var x = Builder.create().setFirst("1").setSecond("2").setThird("3").build();

// Reorderable
var x = Builder.create().setSecond("2").setThird("3").setFirst("1").build();

// Compile time error if you miss out any arguments - this shouldn't compile.
var x = Builder.create().setSecond("2").setThird("3").build();

Is that even possible? I tried to find out. First, let's look at some solutions I don't like.

Errors at runtime - the worst kind of "builder"

The way the builder pattern is usually done is pretty bad. Someone said one time that design patterns are just evidence that the language isn't powerful or expressive enough for the programmer's ideas. Any structure or regularity in the code is repetition, and could be eliminated with a powerful enough language or macro system.

I don't know if that's true, but the builder pattern is usually obviously unsafe or unergonomic. Here's an unsafe or just plainly incorrect example: JAXB code generation.

JAXB is a way to (among other things) generate Java classes from XSD schemas. In an XSD schema you can mark fields as required or optional. JAXB classes can be used in a pseudo-buildery way like so:

var x = new MyClass().withX(1).withZ(2);

But there's a problem - there's no compile-time enforcement of required parameters. Meaning your schema can have all required types, but you can just do new MyClass() and even serialize it. No-one will complain, except at runtime, maybe. If you're lucky.

Terrible, terrible, we obviously don't want to just delegate all checks to runtime.

The usual builder pattern

I haven't done much research on this, but tutorials online (e.g. this one from DigitalOcean) show what's probably the typical approach.

The required arguments are in the constructor:

public ComputerBuilder(String hdd, String ram){
    this.HDD=hdd;
    this.RAM=ram;
}

You might think this is fine for a small number of arguments, but they have the same type! You can swap them around by mistake, and it's hard to tell at the call site. Your IDE probably can't help you except by simulating keyword arguments.

You type something like:

var x = new ComputerBuilder(ram, hdd);

and you can't tell it's wrong looking at the file or in the diff. Your IDE (if it's IntelliJ, at least) might annotate this as:

var x = new ComputerBuilder(hdd: ram, ram: hdd);

and you can tell something's up. Fine, but that's showcasing a glaring deficiency in the language that the IDE papers over.

One solution which isn't naming the arguments is to name the types better, perhaps have an Hdd class and Ram class. Maybe a good idea, but it's a lot of work and introduces a lot of code. A topic for another blog post.

This kind of builder pattern is out - it's not even really a builder pattern, the core is just a plain old Java constructor.

Staged builder

There's another solution which will get us named arguments with a compile time error if we miss anything required - staged builders.

A great example is https://immutables.github.io/immutable.html#staged-builder which can generate staged builder code for you. The way it works is that each builder method returns a new builder type with only one method that lets you set the next required field.

Immutables has a great example:

// under the hood
public final class ImmutablePerson implements Person {
  ...
  public static NameBuildStage builder() { ... }
  public interface NameBuildStage { AgeBuildStage name(String name); }
  public interface AgeBuildStage { IsEmployedBuildStage age(int age); }
  public interface IsEmployedBuildStage { BuildFinal isEmployed(boolean isEmployed); }
  public interface BuildFinal { ImmutablePerson build(); }
}

And they can generate that code for you with an annotation. This is fine, the errors are at compile time and you have to name all of your arguments. Maybe this is good enough, but we do lose the reorderability of the arguments. And there are a lot of extra types around.

Let's file that one away and move on.

My attempt at a builder annotation

I saw this StackOverflow answer and it was something I hadn't seen before in Java (I haven't seen all that much Java code). It kind of reminded me of template metaprogramming in C++. The example, as written, is this:

public static void test() {
    // Compile Error!
    Complex c1 = new Complex(Complex.Builder.create().setFirst("1").setSecond("2"));

    // Compile Error!
    Complex c2 = new Complex(Complex.Builder.create().setFirst("1").setThird("3"));

    // Works!, all params supplied.
    Complex c3 = new Complex(Complex.Builder.create().setFirst("1").setSecond("2").setThird("3"));
}

The interface isn't exactly what we wanted, but it's close. The trick is to simulate a non-type boolean template parameter to keep track of which values have been set:

public static class Builder<Has1,Has2,Has3> {
    public static class False {}
    public static class True {}
    
    private Builder() {}

    public static Builder<False,False,False> create() {
        return new Builder<>();
    }

    public Builder<True,Has2,Has3> setFirst(String first) {
        this.first = first;
        return (Builder<True,Has2,Has3>)this;
    }
    // ...
}

Then we only expose a constructor for the final result that takes the builder with all type parameters set to true:

public Complex(Builder<True,True,True> builder) {
    first = builder.first;
    second = builder.second;
    third = builder.third;
}

I thought that was pretty clever. Ideally I'd want something like:

public static class Builder<Has1,Has2,Has3> {
    if (Has1 && Has2 && Has3)
    public Complex build() {
        return new Complex(first, second, third);
    }

i.e. the build method only gets exposed after all fields are populated, but Java can't do that kind of thing. C++ can with e.g. enable_if.

I wrote an annotation processor here: https://github.com/kaashif/java-keyword-args/ which given:

@ReorderableStrictBuilder
public record MyBuiltClass(String first, String second, String third) {
}

generates this builder code:

public class MyBuiltClassBuilder<HasFirst, HasSecond, HasThird> {
    private MyBuiltClassBuilder() {}
    private static class True {}
    private static class False {}
    public static MyBuiltClassBuilder<False, False, False> create() {
        return new MyBuiltClassBuilder<False, False, False>();
    }
    private java.lang.String first;
    public MyBuiltClassBuilder<True, HasSecond, HasThird> setFirst(java.lang.String arg) {
        this.first = arg;
        return (MyBuiltClassBuilder<True, HasSecond, HasThird>) this;
    }
    private java.lang.String second;
    public MyBuiltClassBuilder<HasFirst, True, HasThird> setSecond(java.lang.String arg) {
        this.second = arg;
        return (MyBuiltClassBuilder<HasFirst, True, HasThird>) this;
    }
    private java.lang.String third;
    public MyBuiltClassBuilder<HasFirst, HasSecond, True> setThird(java.lang.String arg) {
        this.third = arg;
        return (MyBuiltClassBuilder<HasFirst, HasSecond, True>) this;
    }
    public static MyBuiltClass build(MyBuiltClassBuilder<True, True, True> builder) {
        return new MyBuiltClass(builder.first, builder.second, builder.third);
    }
}

Which seems to give us everything we wanted, except that the API is a little ugly:

MyBuiltClass c1 = MyBuiltClassBuilder.build(MyBuiltClassBuilder.create().setFirst("1").setSecond("2").setThird("3"));

It's really not that bad of a tradeoff given that we can enforce at compile-time that all setters must be called at least once.

The worst part of this process was using JavaPoet to generate this code. For those unfamiliar, Java annotation processors output text - they're barely a level above C preprocessor macros. JavaPoet is a library that helps with that. It was easier to just generate the text myself.

The billion dollar elephant in the room

Oh, we have compile-time enforced required parameters do we? WRONG! You can still do something like builder.setFirst(null) and no-one can stop you! This is sad.

No builder pattern will save you from null in Java. Switch to Kotlin.

Conclusion

I think you should probably just use Immutables - it's a great library with a great staged builder annotation. I think that's the best off-the-shelf solution even if it does mean we lose reorderability of arguments.

Writing an annotation processor in Java isn't that painful actually. Take a look at the code! https://github.com/kaashif/java-keyword-args/ This is my first time doing anything like this in Java, it was fun to learn about.

The problem with using splice(2) for a faster cat(1)

2023-03-12T00:00:00Z

A few weeks ago, I was reading a Hacker News post about a clipboard manager. I can't remember which one exactly, but an example is gpaste - they let you have a clipboard history, view that history, persist things to disk if you want, and so on.

One comment caught my eye: it asked why clipboard managers didn't use the splice(2) syscall. After all, splice allows copying the contents of a file descriptor to a pipe without any copies between userspace and kernelspace.

Indeed, replacing a read-write combo with splice does yield massive performance gains, and we can benchmark that. That got me thinking: why don't other tools use splice too, like cat? What are the performance gains? Are there any edge cases where it doesn't work? How can we profile this?

There are blog posts from a while ago lamenting the lack of usage of splice, e.g. https://endler.dev/2018/fastcat/ and interestingly enough, things may have changed since 2018 (specifically, in 2021), giving us new reasons to avoid splice.

The conclusion is basically that splice isn't generic enough, but the details are pretty interesting.

What's our performance metric?

The basic question we're trying to answer is how fast can a program take a filename and write the contents to stdout? We're measuring performance in bytes per second.

One important point is that we want to benchmark with the kernel read cache warmed, i.e. we run the benchmarks a few times until the number settles down. This is important because the only difference between any of our methods will be a memory-to-memory copy, which is always going to be many times faster than a disk-to-memory read, even with DMA. With a cold cache, the disk read would swamp the difference we're trying to measure.

Warming the read cache means everything is memory-to-memory and differences in how we do that will show up.

I'll create a file with 10,000M of zeroes and benchmark cat using pv as follows:

$ dd if=/dev/zero of=10g_zero bs=1M count=10000
$ cat 10g_zero | pv > /dev/null
...
$ !! # Repeat to warm cache
9.77GiB 0:00:02 [4.72GiB/s] [   <=>                                                             ]

So 4.72GiB/s is the number to beat!

read-write implementation

This is the dumb way you'd write a file to stdout. Make a buffer, open the file, read it out in chunks, and write those chunks to stdout. The only thing to tune here is really the buffer size I think. 32k seems to get the best performance on my machine.

Here's the code, no error handling:

#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>


int main(int argc, char* argv[]) {
    size_t buf_size = 32 * 1024;
    char *buf = malloc(buf_size);

    char *fname = argv[1];
    int fd = open(fname, O_RDONLY);

    while (1) {
        ssize_t bytes_read = read(fd, buf, buf_size);

        if (bytes_read == 0) {
            return EXIT_SUCCESS;
        }

        write(STDOUT_FILENO, buf, bytes_read);
    }
}

I called this slow.c. Here's the benchmark:

$ ./slow 10g_zero | pv > /dev/null
9.77GiB 0:00:01 [7.38GiB/s] [  <=>                                                              ]

So that's actually faster than cat already. 7.38 GiB/s vs 4.72 GiB/s. But this is doing unnecessary memory-to-memory copies from kernelspace to userspace on read, then from userspace to kernelspace on write. Our ideal solution would just move (not even copy) pages from the file to stdout, with all buffers owned by the kernel.

splice implementation

The splice implementation is a bit more complex, but not much. Looking at the man page for splice with man 2 splice, we can see the description:

splice() moves data between two file descriptors without copying between kernel address
space and user address space.  It transfers up to len bytes of data from the  file  de-
scriptor  fd_in  to  the file descriptor fd_out, where one of the file descriptors must
refer to a pipe.

Here's my code for my splice-based cat:

#define _GNU_SOURCE

#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(int argc, char *argv[]) {
    size_t buf_size = 16 * 1024;

    char *fname = argv[1];

    int fd = open(fname, O_RDONLY);
    off64_t offset = 0;

    while (1) {
        ssize_t bytes_spliced = splice(fd, &offset, STDOUT_FILENO, NULL, buf_size, SPLICE_F_MOVE | SPLICE_F_MORE);

        if (bytes_spliced == 0) {
            return EXIT_SUCCESS;
        }

        if (bytes_spliced < 0) {
            fprintf(stderr, "%s\n", strerror(errno));
            return EXIT_FAILURE;
        }
    }
}

I called this fast.c.

Some notes about this:

#define _GNU_SOURCE gives us access to splice, which is a non-standard (where the standard is POSIX) extension to fcntl.h. This is one reason splice probably isn't used more widely - it's not portable.
The flag SPLICE_F_MOVE is a no-op: it used to be a hint to the kernel to move pages where possible, but now does nothing. I added it because I do want a move, even though I know it has no effect.
SPLICE_F_MORE is a hint saying more data is coming in a future splice. It's true for most splices in our case (all but the last). Not sure how useful it is outside of socket programming, where it's sometimes not obvious to the kernel that more data is coming.

Enough with the notes! Let's see some performance numbers!

$ ./fast 10g_zero | pv > /dev/null
9.77GiB 0:00:00 [26.8GiB/s] [ <=>                                                               ]

Whoa, holy shit, 26.8 GiB/s? That's more than 5.6x as fast as cat! This warrants some further investigation.

Profiling, fast and slow

This section title is a reference to "Thinking, Fast and Slow" by Daniel Kahneman, which I haven't read.

fast is so fast I feel like we have to look into it to make sure nothing weird is going on.

We can use perf to profile our programs and see where we're spending time. You can install it by installing the linux-tools version specific to your kernel version. I'm on Ubuntu so I needed to do:

$ sudo apt install linux-tools-5.19.0-32-generic

Let's look at cat first. Here's the command to run your program and record performance in perf.data:

$ sudo perf record -- cat ../10g_zero > /dev/null

Why sudo? Without sudo, perf says something about kernel symbols and symbol map restrictions if you're not root, so I just run everything here as root. Sue me. It's not like we're running untrusted code here!

To generate a breakdown with the percentage of time spent in each function:

$ sudo perf report

For the above case, the report looks like:

Overhead  Command  Shared Object      Symbol
  75.43%  cat      [kernel.kallsyms]  [k] copy_user_generic_string                              
   3.22%  cat      [kernel.kallsyms]  [k] filemap_read                                          
   2.75%  cat      [kernel.kallsyms]  [k] filemap_get_read_batch

Then a bunch of negligible <1% stuff.

The function copy_user_generic_string copies to/from userspace. It's clear that's what's taking the vast majority of time. The perf report for slow looks the same:

Overhead  Command  Shared Object         Symbol
  70.53%  slow     [kernel.kallsyms]     [k] copy_user_generic_string
   3.86%  slow     [kernel.kallsyms]     [k] filemap_read
   3.82%  slow     [kernel.kallsyms]     [k] filemap_get_read_batch

This is as expected. Let's look at the perf report for fast:

$ sudo perf record ../fast ../10g_zero > /dev/null
Invalid argument

Oh, that's because at least one of the input and output has to be a pipe and in this case, both are files. Let's just throw a cat in there:

$ sudo perf record ../fast ../10g_zero | cat > /dev/null
Invalid argument

Huh? What? This is annoying, maybe perf does something dodgy to stdout so we can't splice to it? Let's try making perf output to a file:

$ sudo perf record -o perf.out -- ../fast ../10g_zero | cat > /dev/null

That finally works. What an ordeal. The report looks like this:

Overhead  Command  Shared Object         Symbol
  60.55%  fast     [kernel.kallsyms]     [k] mutex_spin_on_owner                                ▒
   7.86%  fast     [kernel.kallsyms]     [k] filemap_get_read_batch                             ▒
   2.95%  fast     [kernel.kallsyms]     [k] copy_page_to_iter                                  ▒
   2.86%  fast     [kernel.kallsyms]     [k] __mutex_lock.constprop.0                           ▒
   2.47%  fast     [kernel.kallsyms]     [k] copy_user_generic_string

Notice how little time we're spending copying pages between user and kernel. It's clear that the stories of increased performance are true.

The final straw: why splice isn't more widely used

Our journey has led us to a few reasons why splice isn't used more widely:

Not portable: this is kind of a non-reason because everyone just uses Linux, but maybe someone cares about this.
Not general: you can't splice between files and files (you can just use sendfile for that anyway), or sockets and sockets, you need to have a pipe at one of the ends of the splice. This means file-to-file operations like cat f1 f2 f3 > f4 are impossible with splice.
Not universally supported: not all filesystems actually let you splice to/from them. It's possible to try a fast implementation and fall back to a slow one if we're on a non-splice filesystem, but that adds complexity for little gain.
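As a rough sketch of that try-fast-then-fall-back idea, here's what it might look like in Python, whose os.splice (Linux only, Python 3.10+) wraps the same syscall. The name copy_fd and the demo at the bottom are assumptions for illustration, and treating any OSError from splice as a cue to fall back is cruder than a real tool would want.

```python
import os
import tempfile

CHUNK = 16 * 1024  # same chunk size as fast.c


def copy_fd(src_fd: int, dst_fd: int) -> int:
    """Copy src_fd to dst_fd until EOF, preferring splice; return bytes copied."""
    total = 0
    splice = getattr(os, "splice", None)  # missing off Linux or before 3.10
    while splice is not None:
        try:
            n = splice(src_fd, dst_fd, CHUNK)
        except OSError:
            break  # e.g. EINVAL: no pipe involved, or unsupported filesystem
        if n == 0:
            return total  # EOF reached on the fast path
        total += n
    # Slow path: plain read/write through a userspace buffer.
    while True:
        buf = os.read(src_fd, CHUNK)
        if not buf:
            return total
        view = memoryview(buf)
        while len(view) > 0:  # write() may accept only part of the buffer
            written = os.write(dst_fd, view)
            view = view[written:]
        total += len(buf)


# Demo: copy a small temp file into a pipe, then read it back out.
payload = b"0" * 10_000  # small enough to fit in the pipe buffer
fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.lseek(fd, 0, os.SEEK_SET)
read_end, write_end = os.pipe()
copied = copy_fd(fd, write_end)
os.close(write_end)
received = b""
while True:
    chunk = os.read(read_end, CHUNK)
    if not chunk:
        break
    received += chunk
os.close(read_end)
os.close(fd)
os.remove(path)
```

Even in this toy form, the slow path is most of the code, which is the "complexity for little gain" problem in miniature.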

And here's the kicker IMO: there are still bugs. Here's one: you still can't splice from /dev/zero to a pipe:

$ ./fast /dev/zero | pv > /dev/null
Invalid argument

Here's a thread on the kernel mailing list about that: https://lore.kernel.org/all/202105071116.638258236E@keescook/t/. It's slightly unfair to call this a bug since it was intentional - the death of generic splice was a planned affair:

The general loss of generic splice read/write is known.

The ultimate reason for this /dev/zero funkiness is that there's no real demand for it to work, I guess. Instead of directly using /dev/zero, I used actual zero files.

Conclusion

My advice is to use splice where you can, but keep in mind its drawbacks and lack of generality. If you control the types of fds passed in and the filesystem, then you can really go crazy and experience almost zero-copy file copies.

But if you're writing a general tool in the vein of cat or tee, it's probably best to stay away from splice unless you really handle all of the weird cases.

Searching for Planet X with the Z3 solver

2023-02-26T00:00:00Z

A few weeks ago, I played the board game "The Search for Planet X". The premise is that you have a circular board divided into 12 sectors, each containing one object. That object could be an asteroid, a gas cloud, and so on, but most importantly, it could be Planet X. Which object is in each space is hidden at the start of the game and you're racing your opponents to discover Planet X by scanning sectors and deducing information using a set of rules like "an asteroid is always adjacent to another asteroid". The winner is the first player to correctly guess the location of Planet X and the two adjacent objects.

The full rules can be found here: https://foxtrotgames.com/planetx/.

I don't find these kinds of games very fun, but it did get me thinking: what's the best strategy? How many possible boards are there, and how hard is this game?

This meant I had to write a program to:

Generate all possible boards
Come up with various strategies to pick the best action
See how good those strategies are

The source code is here: https://github.com/kaashif/search-for-planet-x-solver.

Step 1: Generating all possible boards

My first attempt at doing this was to randomly assign locations for comets, then the asteroids, then the gas clouds, and so on, and just run that process for a long while. Not very intelligent.

A better way is to use the Z3 theorem prover, encode the logic rules describing the possible boards, and generate all models satisfying those rules.

This is actually pretty easy: Z3 has a set of very ergonomic Python bindings.

To encode the board state for the standard board with 12 sectors, we can create 12 variables $X_i$, all integers ranging from 1 to 6 (representing the 6 types of object), and apply the constraints from the rulebook. For example, the rule saying that an asteroid must be adjacent to another asteroid can be encoded as:

z3.Implies(
    X[i] == ObjectType.ASTEROID.value,
    z3.Or(
        X[prev_sector(i)] == ObjectType.ASTEROID.value,
        X[next_sector(i)] == ObjectType.ASTEROID.value
    )
)
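To make the rule concrete, here's a plain-Python sketch that checks it against a fully known board instead of Z3 variables. The 0-indexed sectors, the wrap-around bodies of prev_sector and next_sector, and the string labels are assumptions for illustration. The real helpers and the ObjectType enum aren't shown in this post.

```python
NUM_SECTORS = 12  # standard board size


def prev_sector(i: int) -> int:
    """Sector before i on the circular board (assumes 0-indexed sectors)."""
    return (i - 1) % NUM_SECTORS


def next_sector(i: int) -> int:
    """Sector after i on the circular board (assumes 0-indexed sectors)."""
    return (i + 1) % NUM_SECTORS


def asteroids_ok(board: list) -> bool:
    """True if every asteroid has an asteroid in a neighbouring sector."""
    return all(
        board[prev_sector(i)] == "asteroid" or board[next_sector(i)] == "asteroid"
        for i in range(NUM_SECTORS)
        if board[i] == "asteroid"
    )


# Hypothetical boards; "other" stands in for every non-asteroid object.
paired = ["asteroid", "asteroid"] + ["other"] * 10
wrapped = ["asteroid"] + ["other"] * 10 + ["asteroid"]  # sectors 0 and 11 touch
lonely = ["asteroid", "other", "asteroid"] + ["other"] * 9
```

Here paired and wrapped satisfy the rule, while lonely doesn't. The Z3 version states the same condition once per sector, over unknowns rather than known values.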

Once we have all of the constraints, we can generate a model i.e. a set of values for the variables satisfying the constraints.

solver = z3.Solver()
solver.add(*constraints)
# model() is only available after check() has returned sat
assert solver.check() == z3.sat
model = solver.model()

But how do we get the other models? The trick is to save that model and add a new constraint saying that we only want new models, i.e. at least one variable is different from the model we've just seen. Then we can just generate models until the constraint set is unsatisfiable.

models = []
while solver.check() == z3.sat:
    model = solver.model()
    models.append(model)

    # At least one of the variables is different
    solver.add(
        z3.Or([
            x != model[x]
            for x in X
        ])
    )

Using this method, I get 4446 possible boards. There may also be an extra constraint present in any real game that it's always possible to know the location of Planet X. The reason that's sometimes not possible is that Planet X shows up in scans as "empty", and there are two "truly empty" sectors. It's possible to work out if a sector is truly empty most of the time, since gas clouds must be adjacent to a truly empty sector - if there are three empty sectors and only one of them is away from a gas cloud, that's Planet X.

It's unclear to me whether that constraint really exists, so I didn't use it.

Step 2: Picking the best actions

There are a few types of action possible:

You can survey a range of sectors for a particular kind of object and get back the number of objects of that type in your range. This takes less time the wider the search: it costs 3 days for a survey of 4, 5, or 6 sectors.
You can target a sector and know what's there - that takes 4 days.

Those two are easy to implement, since the set of possibilities is limited and well-defined. Target is just bad and never a good action given its time cost, as we'll see later, so I didn't implement it.

You can research a topic, getting a clue in the form of a new logic rule, e.g. "Planet X is next to a comet". I didn't implement this because I don't have the full set of research clues. The best I have is some clues I extracted from a set of PDFs I found on BoardGameGeek, but clues are very particular to the board - the creators don't give clues that narrow things down too much given the board. e.g. you won't get "Planet X is next to a comet" when the comets are next to each other.

I did have a fun time writing a script that uses pdfminer and some regexes to extract the clues from those PDFs, but there were only 50 boards in that set - not a big enough sample to be able to derive how research clues depend on the board.

What's the best action?

Intuitively, you might say something like you want the most information for the time taken. It turns out that has a rigorous definition: the self-information of an event $x$ is $-\log_b(p(x))$, where $p(x)$ is the probability of $x$ and $b$ is any base we choose. If we choose $b=2$, we measure information in shannons, where 1 shannon is the information content of 1 bit. I'll just say "bit" from now on.

We're trying to guess the board, and we can reasonably say that each possibility for a board has equal probability.

I claim (without proof, and I'm not going to prove it because I'm not completely sure it's true) that the best action to take is the one with the largest expected self-information. The expected information of a random variable $X$ is called its entropy and is denoted by $\mathcal{H}(X)$.

This makes some sense because you'd obviously never take an action with a guaranteed outcome since that tells you nothing. The entropy in that case is zero. On the other end of the spectrum, knowing which way a 50/50 choice went tells you much less than knowing which way a uniform choice among a billion choices went. Intuitively, playing the lottery isn't expected to have a surprising outcome so if there's one universe where you win and a million where you lose, playing the lottery isn't a great way to narrow down which universe you're in.
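The numbers behind those intuitions are easy to check. This is a small sketch using only the definitions above. The function names and the one-in-a-million lottery odds are chosen just for illustration.

```python
import math


def self_information(p: float) -> float:
    """Information, in bits, gained from seeing an outcome of probability p."""
    return -math.log2(p)


def entropy(probabilities) -> float:
    """Expected self-information, in bits, over a full set of outcomes."""
    return sum(p * self_information(p) for p in probabilities if p > 0)


certain = entropy([1.0])    # guaranteed outcome: nothing learned
coin = entropy([0.5, 0.5])  # 50/50 choice: exactly 1 bit

# Every outcome of a uniform billion-way choice is equally surprising,
# so its entropy equals the self-information of any single outcome.
billion_way = self_information(1e-9)

# One winning universe against a million losing ones.
p_win = 1 / 1_000_001
lottery = entropy([p_win, 1 - p_win])
```

The guaranteed outcome comes out at 0 bits and the coin at exactly 1 bit. The billion-way choice is worth roughly 29.9 bits, and the lottery only about 0.00002 bits.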

Now we're in good shape, our algorithm is:

Generate all boards.
Pick the action $A$ which maximises $\mathcal{H}(A)$ and execute it.
Filter down the set of possible boards given the result.
Continue until we know where Planet X is and what its two adjacent objects are.
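One round of that loop can be sketched as follows. This is a toy under stated assumptions, not the code from the repository: the 4-sector boards, the survey helper, and the action names are all made up, and every remaining board is treated as equally likely.

```python
import math
from collections import Counter


def entropy_of_action(action, boards) -> float:
    """Entropy, in bits, of an action's result over equally likely boards."""
    counts = Counter(action(board) for board in boards)
    total = len(boards)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_action(actions, boards):
    """Name of the action with the largest entropy over the remaining boards."""
    return max(actions, key=lambda name: entropy_of_action(actions[name], boards))


def filter_boards(action, result, boards):
    """Keep only the boards that would have produced the observed result."""
    return [board for board in boards if action(board) == result]


def survey(kind, start, length):
    """Hypothetical survey: count `kind` in `length` sectors from `start`, wrapping."""
    def run(board):
        return sum(board[(start + k) % len(board)] == kind for k in range(length))
    return run


# Toy candidate set: A = asteroid, G = gas cloud, E = empty.
boards = [tuple("AAGE"), tuple("GAAE"), tuple("EGAA"), tuple("AEGA")]
actions = {
    "asteroids in 0-3": survey("A", 0, 4),  # always 2 here, so it tells us nothing
    "asteroids in 0-1": survey("A", 0, 2),
    "gas clouds in 0": survey("G", 0, 1),
}

choice = best_action(actions, boards)
true_board = boards[0]  # pretend this is the hidden board
result = actions[choice](true_board)
remaining = filter_boards(actions[choice], result, boards)
```

On this toy set the full-width asteroid survey scores 0 bits because every board has two asteroids. The two-sector asteroid survey scores 1.5 bits and gets picked, and its result leaves only the true board standing.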

Step 3: How does it perform?

Without doing any research and just doing surveys, we find Planet X in ~30 days on average. That's after running a few thousand simulations.

It's nice to get some confirmation that some players' intuitions are good: the best action in terms of entropy per day is to make the widest possible search for asteroids. This makes sense: since there are four asteroids, there is a wide range of possible answers, so you can learn a lot.

Another confirmed intuition is that research is really good. Although I didn't implement research actions in our algorithm, we can calculate the expected information gain from some research clues by hand. Some clues give on the order of 2 or 3 bits of information per day compared to the best survey with an entropy of less than 1 bit per day.

It may be possible to kind of guess which research topics are usually better than others using our small sample of 50 sets of rules. If we knew the full set of rules for all boards, we could:

Know when to pick certain research projects given our current information
Deduce which board we're on from the clues - the clues are part of the board.

(1) is in keeping with the spirit of the game. (2) seems like cheating. In any case, doing an immediate research, a survey, then another research is sure to be a good move. Intuitively it seems like research projects for more numerous objects are better - the asteroid research projects in particular may be a strong starter.

Conclusions

We don't really have enough information (haha) to reliably test the performance of our algorithm against the Planet X web app "AI". That's because we don't have enough information on the strongest move: research.

Given that, there's only one possible next step: write a program to scrape the web app for all research projects, boards, and bot moves.

The bot moves are pre-determined, so given those we know whether we can beat the bot or not on that board. I think it's probably not fair to use advance knowledge of the bot's moves to deduce information about the board - the bot's moves take no time for us and would tell us something, so those moves would be infinitely good. That also wouldn't work against a real player, unless we knew exactly what their algorithm is (an easy countermeasure: randomly pick a slightly suboptimal move).

I'm thinking of using something like https://playwright.dev/, which I've already successfully used to auto-play some online puzzle games.

Why does Mockito need JVM bytecode generation?

2023-01-23T00:00:00Z

Mockito is a pretty popular Java mocking library. It lets you write code like this:

MyClass mockObject = mock(MyClass.class);
when(mockObject.myMethod(1)).thenReturn("one");

Which is pretty cool, even if it's a bit magic. It's not really that magical, conceptually - Mockito simply intercepts method calls and keeps track of which methods have been called globally, and with what arguments. The call to .thenReturn effectively writes to global state, so that the next call to mockObject.myMethod(1) will have the right behaviour.

My question is simple: Mockito uses bytecode generation libraries (cglib or bytebuddy) to construct the proxies - why do we need to go to those lengths? Can't we get by with something more mundane, meaning either in the standard library or higher level (where I consider JVM bytecode to be low level)?

Trying to implement Mockito.mock without anything fancy

The most magic method is mock, of course. That's what takes the class that we want to mock and makes an object which does all of the Mockito magic.

We could read through all of the code here: https://github.com/mockito/mockito/tree/f48d794ad14982a134fd14dd2aef03477b699dc6/src/main/java/org/mockito/internal/creation/bytebuddy and try to understand what it's doing line by line, but that's no fun. It seems like it'd be more fun to try to implement Mockito.mock without all of this bytecode generation voodoo and see where we run into trouble.

We can use https://docs.oracle.com/javase/7/docs/api/java/lang/reflect/Proxy.html to do the heavy lifting.

Let's call our mocker class Mocker. We need three methods: mock, which will take a class and give us a mock object; when, which will take any object and return a Mocker that we can call some expectation-setting methods on; and thenReturn, which allows us to set expectations.

when is really just to make things read a bit nicer, and so we can match the Mockito example.

Mocker might look like:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class Mocker {
    private static Map<CallKey, Object> callKeyToReturnValue = new HashMap<>();
    public static <T> T mock(Class<T> classToMock) {
        final InvocationHandler handler = new MockInvocationHandler(callKeyToReturnValue);

        return (T) Proxy.newProxyInstance(
                classToMock.getClassLoader(),
                new Class[]{ classToMock },
                handler);
    }

    public static Mocker when(Object mockReturnValue) {
        return new Mocker();
    }

    public void thenReturn(Object value) {
        final CallKey callKey = MockInvocationHandler.lastInvocation;
        callKeyToReturnValue.put(callKey, value);
    }
}

Where CallKey is defined as:

import java.util.Arrays;
import java.util.Objects;

public record CallKey(String methodName, Object[] args) {
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        CallKey callKey = (CallKey) o;
        return Objects.equals(methodName, callKey.methodName) && Arrays.equals(args, callKey.args);
    }

    @Override
    public int hashCode() {
        int result = Objects.hash(methodName);
        result = 31 * result + Arrays.hashCode(args);
        return result;
    }
}

(Thank you, IntelliJ, for generating these methods for me!)

CallKey represents a method call as a method name and a list of arguments. This is how we set expectations - we just map CallKeys to the expected values. More complex behaviours would require more work - executing some action on an invocation, for example.

We need to override equals and hashCode because we want two arg arrays with different identity but equal elements to compare as equal.
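To see why the override matters, here is a minimal sketch comparing a record that keeps the compiler-generated equals with one that compares array contents. NaiveKey, ArrayAwareKey and RecordArrayEqualityDemo are hypothetical names used only for illustration; they are not part of Mocker.

```java
import java.util.Arrays;
import java.util.Objects;

class RecordArrayEqualityDemo {
    // Keeps the generated equals/hashCode: the array component is compared
    // by reference, not by contents.
    record NaiveKey(String methodName, Object[] args) {}

    // Same shape as CallKey: compares the array contents instead.
    record ArrayAwareKey(String methodName, Object[] args) {
        @Override
        public boolean equals(Object o) {
            return o instanceof ArrayAwareKey other
                    && Objects.equals(methodName, other.methodName)
                    && Arrays.equals(args, other.args);
        }

        @Override
        public int hashCode() {
            return 31 * Objects.hashCode(methodName) + Arrays.hashCode(args);
        }
    }

    public static void main(String[] args) {
        // Two distinct arrays with equal elements, like two calls to myMethod(1).
        var naiveA = new NaiveKey("myMethod", new Object[]{1});
        var naiveB = new NaiveKey("myMethod", new Object[]{1});
        System.out.println(naiveA.equals(naiveB)); // false

        var awareA = new ArrayAwareKey("myMethod", new Object[]{1});
        var awareB = new ArrayAwareKey("myMethod", new Object[]{1});
        System.out.println(awareA.equals(awareB)); // true
    }
}
```

With the naive version, the HashMap lookup in the invocation handler would never find the expectation, because every invocation builds a fresh args array.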

MockInvocationHandler is where we fetch expected values:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.util.Map;

public record MockInvocationHandler(Map<CallKey, Object> callKeyToExpectedValue) implements InvocationHandler {
    static CallKey lastInvocation;

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        final var callKey = new CallKey(method.getName(), args);
        lastInvocation = callKey;
        return callKeyToExpectedValue.get(callKey);
    }
}

When we invoke a method on the mock object, we return the expected value if one exists, otherwise null.

Yes, I know, it would be great to just error out if there's no expectation set. That's what GoogleTest does (in C++), and I wish Mockito did that. Anyway, I went with returning null here because it's simpler. Sue me.
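For comparison, a strict version isn't much longer. This is only a sketch under assumptions: StrictMockDemo, Greeter and strictMock are hypothetical names, and to stay self-contained it keys expectations on a plain string (method name plus Arrays.toString of the arguments) rather than on CallKey.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Map;

class StrictMockDemo {
    // Hypothetical interface to proxy; java.lang.reflect.Proxy needs one.
    interface Greeter {
        String greet(int i);
    }

    // Builds a proxy that throws instead of returning null when no
    // expectation matches the call.
    static Greeter strictMock(Map<String, Object> expectations) {
        InvocationHandler handler = (proxy, method, args) -> {
            // e.g. "greet[1]" for a call to greet(1)
            String key = method.getName() + Arrays.toString(args);
            if (!expectations.containsKey(key)) {
                throw new IllegalStateException("No expectation set for " + key);
            }
            return expectations.get(key);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[]{Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = strictMock(Map.of("greet[1]", "one"));
        System.out.println(greeter.greet(1)); // one

        try {
            greeter.greet(2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No expectation set for greet[2]
        }
    }
}
```

Unchecked exceptions thrown by the handler propagate straight out of the proxied call, so the caller sees the IllegalStateException directly.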

Let's try it out! We can define a simple class to use for our example:

public class MyClass {
    public String myMethod(int i) {
        return String.valueOf(i);
    }
}

And let's write a simple main method:

public class Main {
    public static void main(String[] args) {
        // mock object
        final MyClass mockObject = Mocker.mock(MyClass.class);

        // set an expectation (different from the real implementation!)
        Mocker.when(mockObject.myMethod(1)).thenReturn("one");

        // should print "one"
        System.out.println(mockObject.myMethod(1));

        // should print null - we didn't set any expectation
        System.out.println(mockObject.myMethod(2));
    }
}

And let's run it!

Exception in thread "main" java.lang.IllegalArgumentException: MyClass is not an interface
    at java.base/java.lang.reflect.Proxy$ProxyBuilder.validateProxyInterfaces(Proxy.java:706)
    at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:648)
    at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:656)
    at java.base/java.lang.reflect.Proxy.lambda$getProxyConstructor$0(Proxy.java:429)
    at java.base/jdk.internal.loader.AbstractClassLoaderValue$Memoizer.get(AbstractClassLoaderValue.java:329)
    at java.base/jdk.internal.loader.AbstractClassLoaderValue.computeIfAbsent(AbstractClassLoaderValue.java:205)
    at java.base/java.lang.reflect.Proxy.getProxyConstructor(Proxy.java:427)
    at java.base/java.lang.reflect.Proxy.newProxyInstance(Proxy.java:1037)
    at Mocker.mock(Mocker.java:11)
    at Main.main(Main.java:10)

And there's the problem - java.lang.reflect.Proxy only works on interfaces, not classes. This isn't so bad, we can still test out our implementation by making MyClass an interface, then we get the expected output:

one
null

This sucks, we had to edit our code to be mockable. You don't have to do that with Mockito.

How do you mock a class, not an interface?

In our example, if we wanted to avoid Mockito, we'd have to make an interface MyInterface which is implemented by MyClass, and mock MyInterface - annoying! We really just want to have MyClass and mock it. The magic of Mockito is that you don't have to add all these extra interfaces to your code just for the purposes of testing.

Mockito does this by effectively re-implementing java.lang.reflect.Proxy, but making it work for classes as well as interfaces.

If you skim over https://github.com/mockito/mockito/blob/f48d794ad14982a134fd14dd2aef03477b699dc6/src/main/java/org/mockito/internal/creation/bytebuddy/SubclassBytecodeGenerator.java#L126 you effectively see what you'd do if you were writing a mock class by hand, written as bytecode generating code using ByteBuddy.
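As a rough picture of "writing a mock class by hand", here is a sketch of a subclass of MyClass that overrides the method and looks up a stubbed value. HandWrittenMyClassMock and its expect method are hypothetical names; the subclass Mockito actually generates is more involved than a plain map lookup.

```java
import java.util.HashMap;
import java.util.Map;

class HandWrittenMockDemo {
    // Same as the MyClass used in the example above.
    static class MyClass {
        public String myMethod(int i) {
            return String.valueOf(i);
        }
    }

    // A mock written by hand: subclass the real class and override each method.
    static class HandWrittenMyClassMock extends MyClass {
        private final Map<Integer, String> expectations = new HashMap<>();

        void expect(int arg, String returnValue) {
            expectations.put(arg, returnValue);
        }

        @Override
        public String myMethod(int i) {
            // Return the stubbed value, or null if none was set (like Mocker).
            return expectations.get(i);
        }
    }

    public static void main(String[] args) {
        var mock = new HandWrittenMyClassMock();
        mock.expect(1, "one");

        // The mock can stand in anywhere a MyClass is expected.
        MyClass asRealType = mock;
        System.out.println(asRealType.myMethod(1)); // one
        System.out.println(asRealType.myMethod(2)); // null
    }
}
```

Doing this by hand for every class and every method is the tedious part that bytecode generation automates. Note that a subclass, hand-written or generated, can only override methods that aren't final.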

Conclusions

Mockito needs ByteBuddy because Java lacks macros, meaning we need to resort to insane bytecode generation hacks just to generate proxies for classes. Maybe that's not the conclusion you were expecting, but them's the facts.

Clojure doesn't need this nonsense because it has macros and can generate code in a sane way.

Rust also has macros good enough for a great mocking library: https://docs.rs/mockall/latest/mockall/ and what would the equivalent of JVM bytecode generation be in a compile-to-native language, anyway? My first thought would be assembly, but that doesn't seem like it makes any sense.

In C++, there really isn't any mocking library as good as Mockito. This is because there are certain things a library can never do. gmock will never let you mock a class with non-virtual methods and use that polymorphically in place of real instances - it's just fundamentally impossible. The gmock cookbook suggests templatizing your code, which is terrible: https://github.com/google/googletest/blob/main/docs/gmock_cook_book.md#mocking-non-virtual-methods-mockingnonvirtualmethods.

Anyway, use Mockito, it's great! Ignore the bytecode wizard behind the curtain, he's on your side!

Java doesn't really get immutability

2022-10-23T00:00:00Z

This is a post from the perspective of a new Java programmer, so it is 100% likely that the concerns here are well-known and already addressed. Or at least discussed.

Java, as a language, doesn't get (understand) immutability and "delivers" it in a way that grants almost none of the benefits of immutability in other languages, like C++ or Rust. I picked those examples to show that the lesson was learnt a long time ago (C++) and the lesson is still valid and a good idea (Rust).

Java can, in some sense, be forgiven for its crimes because it's a pretty old language and is stuck with backwards compatibility. But that doesn't mean it doesn't commit those crimes.

The primary benefit of immutability is that the programmer knows that a value cannot be changed, so they no longer need to think about what would happen if it did.

Java doesn't give you that and worse, it pretends that it does. Let's look at some examples of Java lies and deception.

final

final is basically, from my perspective, useless. It protects against reassignment, which isn't even nearly the most common type of mutability.

final List<Integer> myList = new ArrayList<>();
myList.add(1);
myList.add(2);

I mean, no-one even pretends to think final is supposed to stop this, but this kind of mutation is 99% of all mutation, and final doesn't stop it, so what's the point of final?

There's also a rule that lambdas can't capture variables that aren't final or effectively final because of possible race conditions. That's fine, but you can capture a mutable object just fine, so that restriction seems to be completely pointless.

An example continuing on from before. This isn't fine because i changes:

int i = 1;
Supplier<Integer> badCapture = () -> myList.get(i);
i++;

Giving the error:

error: local variables referenced from a lambda expression must be final or effectively final

But this, despite being exactly the same thing, is fine:

class MyInteger {
    int i = 1;

    public void incr() {
        i++;
    }
}

var i = new MyInteger();
Supplier<Integer> badCapture = () -> myList.get(i.i);
i.incr();

Now tell me, does that make any sense? If the rationale of banning captures of non-final variables is that they might change, what's the rationale of allowing captures of other variables that change in slightly different ways?

Whatever it is (and I'm sure there is one, somewhere), it adds to the list of reasons to avoid Java. Either the language is so hobbled that true safety measures can't be implemented, or everyone thinks this is fine.

Immutable data structures

But (I hear you say) in my example I used an ArrayList, which is mutable! Why not use an immutable list instead? Sure, let's give that a try:

final List<Integer> myList = List.of();
myList.add(1);
myList.add(2);

This compiles, but when I run it:

Exception in thread "main" java.lang.UnsupportedOperationException
    at java.base/java.util.ImmutableCollections.uoe(ImmutableCollections.java:142)
    at java.base/java.util.ImmutableCollections$AbstractImmutableCollection.add(ImmutableCollections.java:147)
    at Main.main(example.java:9)

Oh, but I just used it wrong, right? NO! Java is wrong! This dreadful "implementation" of immutable data structures violates the basic tenets of programming. You can no longer treat a List as a List, because secretly it might be immutable and fail at runtime!

This means if you're calling a library function:

public int doACalculation(List<Integer> values);

You can't tell from the type signature whether it's safe to pass an immutable list in, despite the fact that would implement the interface List<Integer>. You haven't been saved by Java's immutable lists, they've just created a whole different problem.

The name for the principle Java egregiously violates here is the Liskov substitution principle: https://en.wikipedia.org/wiki/Liskov_substitution_principle:

It is based on the concept of "substitutability" - a principle in object-oriented programming stating that an object (such as a class) may be replaced by a sub-object (such as a class that extends the first class) without breaking the program.

Java breaks this because you can't replace a List<T> with any other List<T> in all situations. If you mutate, you can only take mutable lists. If you don't, you can take any list.

This is ridiculous because mutable operations are an obvious superset of immutable operations. Immutable and mutable lists can both be read, but only mutable ones can be written.

Mutable lists should be an extension of immutable lists!
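A minimal sketch of what that hierarchy could look like in Java. ReadOnlyList, MutableList and ArrayMutableList are hypothetical names, not types from the standard library.

```java
import java.util.ArrayList;
import java.util.List;

class ListHierarchyDemo {
    // Read operations only: every kind of list can implement this honestly.
    interface ReadOnlyList<T> {
        T get(int index);
        int size();
    }

    // Mutable lists add write operations on top of the read-only interface.
    interface MutableList<T> extends ReadOnlyList<T> {
        void add(T value);
    }

    // Simple mutable implementation backed by an ArrayList.
    static final class ArrayMutableList<T> implements MutableList<T> {
        private final List<T> items = new ArrayList<>();

        @Override public T get(int index) { return items.get(index); }
        @Override public int size() { return items.size(); }
        @Override public void add(T value) { items.add(value); }
    }

    // The signature tells the caller this never writes to values.
    static int sum(ReadOnlyList<Integer> values) {
        int total = 0;
        for (int i = 0; i < values.size(); i++) {
            total += values.get(i);
        }
        return total;
    }

    // The signature tells the caller this needs a list it can write to.
    static void appendZero(MutableList<Integer> values) {
        values.add(0);
    }

    public static void main(String[] args) {
        var list = new ArrayMutableList<Integer>();
        list.add(1);
        list.add(2);
        appendZero(list);
        System.out.println(sum(list)); // 3
    }
}
```

Passing a ReadOnlyList to appendZero is now a compile error rather than an UnsupportedOperationException at runtime. This only gives read-only views, though: whoever still holds the MutableList can change it underneath you.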

Kotlin kind of tried to fix this by making immutable collections the default (with an interface actually not supporting writing), and mutable ones an extension of that, but the pain is still there.

Kotlin is still missing decent language level immutability controls (like C++'s const), probably due to some bullshit (but entirely reasonable) concern about Java interop.

Conclusion

Please do not put up with the incredible farce that is Java immutability. If at all possible use some language that made a serious attempt to fix some of Java's problems, like Kotlin or (if at all possible, again) something descended from languages that already solved this problem, e.g. Rust. It's more likely that you can move to Kotlin if you're using Java, so do that. It's really easy.

I can't say with any authority if Kotlin is really the best we can do to fix the mistakes of Java, but it tries.

This isn't a very optimistic ending.

Don't hide things from people reading your code

2022-10-16T00:00:00Z

People write code that relies on all sorts of implicit or obfuscated knowledge. In the worst case, people write code that requires any caller to read through the entire source to work out how to use it or what it does.

What confuses me is that people often seem to do this intentionally, it's like they want to require omniscient knowledge of the codebase for anyone wanting to call or write tests for their code.

I can hear you saying that's ridiculous, and telling me to ask literally anyone whether they think they have the entire codebase in their heads: they'll definitely say no.

Everyone will say fitting a huge codebase into their mental working memory is impossible, but actions speak louder than words. Many people (I see it all the time) constantly choose programming patterns and idioms that only make sense if you think everyone coming after you will have read, digested, and memorised all of the code. There are a few really important ones:

Using nullability
Using mutable state
Using global (or static) variables

If you ever have to familiarise yourself with a codebase, or ever misremember anything, you should try to encourage authors to avoid these as much as possible. But people don't! They love to make things hard for reviewers and future generations of code readers.

In the same breath as complaining about something a coworker has written, people will go on to make the same faulty assumptions and obfuscate their code, perhaps in a slightly different but materially equivalent way.

Let's look at what the problems are. How to convince your coworkers to stop is left as an exercise for the reader.

Nullability

This is a Java-focused blog post, but this applies equally well to any language where things can be null - C++ has null pointers and std::optional, Python has None that people like to pass around. Even Rust often has people passing around None values of Option and going to great pains to handle them as unsafely as possible.

It's well known that null was itself a mistake in Java, and this is one of the reasons I agree with that. Here's some code that a programmer trying to confuse callers might write:

public Pet getPet() {
    if (!hasPet()) {
        // Means there's no pet
        return null;
    }

    if (hasDog()) {
        return getDog();
    }

    if (hasCat()) {
        return getCat();
    }

    // Means the pet type is unsupported by this function
    return null;
}

That seems reasonable. It even has comments. But the intention of this code is only encoded in comments, not in the null values, which mean different things but are the same to any caller. Suppose later the class is extended:

public void adoptPet(Snake snake);

And a caller writes code expecting their snake pet to come back from getPet:

petOwner.adoptPet(new Snake());
// ...
Pet pet = petOwner.getPet();

Obviously pet is null, but why? The original author of the getPet method assumed several things:

1. Future contributors would know what null meant and know where to look for the true meaning of null.
2. No-one would ever forget to update getPet to work for more than cats and dogs.

Solving (2) would be a different blog post entirely (sum types and enforced exhaustive checking), but (1) has made diagnosing the problem unnecessarily difficult - why is the value null? Comments don't appear in compiled Java classes, so if this quirk isn't in the PetOwner documentation, it may be impossible to diagnose.

The solution is to avoid implicitly assigning meaning to null. You won't even remember what you meant in 2 years when getPet gives you back null. If there's an error, please please please don't just return null, throw an informative exception. And no - logging and returning null is not as good as an exception.
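Here's a sketch of getPet rewritten along those lines, so that each failure is a distinct, descriptive exception. NoPetException, UnsupportedPetException and the simplified Pet types are hypothetical names for illustration.

```java
class PetLookupDemo {
    interface Pet {}
    record Dog(String name) implements Pet {}
    record Snake(String name) implements Pet {}

    // Thrown when the owner has no pet at all.
    static class NoPetException extends RuntimeException {
        NoPetException(String message) { super(message); }
    }

    // Thrown when the owner has a pet that getPet doesn't know how to return.
    static class UnsupportedPetException extends RuntimeException {
        UnsupportedPetException(String message) { super(message); }
    }

    static class PetOwner {
        private Pet pet; // null internally means "no pet adopted yet"

        void adoptPet(Pet newPet) {
            this.pet = newPet;
        }

        Pet getPet() {
            if (pet == null) {
                throw new NoPetException("This owner has no pet");
            }
            if (pet instanceof Dog) {
                return pet;
            }
            throw new UnsupportedPetException(
                    "getPet does not support pets of type " + pet.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        var owner = new PetOwner();
        try {
            owner.getPet();
        } catch (NoPetException e) {
            System.out.println(e.getMessage()); // This owner has no pet
        }

        owner.adoptPet(new Snake("Sid"));
        try {
            owner.getPet();
        } catch (UnsupportedPetException e) {
            System.out.println(e.getMessage()); // getPet does not support pets of type Snake
        }
    }
}
```

The caller with the snake now gets told exactly what went wrong without opening the source of getPet.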

This applies in exactly the same way to Optional.empty(). Just because you're being explicit that you can return a meaningless empty Optional doesn't mean that's actually much better for readability than returning null.

Generic nullability (e.g. null reference or Optional without any context) is never good in my opinion. Here are some alternatives:

In Rust, prefer Result (which can include a message) over Option
In Java, prefer exceptions over null and Optional
In Haskell, prefer Either (which can include a message) over Maybe
In Python, prefer exceptions over None

I hate generic nulls with a passion. If you see a null, it always means context has been thrown away! Don't assume that future programmers will remember what your null means or even be able to read your code to work it out.

A common theme in this post is that requiring callers to read through your source code to be able to use your code is bad - they won't do it. The caller should be violently and non-optionally confronted with relevant information through type signatures or exceptions.

Mutability

Java doesn't have support for immutability at the language level. This was in my opinion a huge mistake - C++ had const when Java was being designed! Even worse, some Java programmers believe that final is a substitute for immutability. It's not.

If an object is immutable, you only have to look at where the constructor was called to know the state of the object.

If an object is mutable, you have to read all code referencing that object, plus have full knowledge of the order that code is going to be called in. That requires full knowledge of the codebase and gets impossible to manage after a certain point.

Here's the interface of an object that relies on mutability:

public class Executor {
    public void setParameter(String name, String value);
    public void execute();
}

The caller will set some parameters, then execute. Fine, how hard could that be? Let's suppose this is the caller:

public class FileManager {
    final Executor executor = new Executor();

    public void sendFile(File file) {
        executor.setParameter("filename", file.getName());
        executor.execute();
    }

    public void deleteFile(File file) {
        executor.setParameter("deletion", "true");
        executor.execute();
    }
}

There's an obvious problem here - when we call deleteFile, we don't set the filename. Is that fine? Maybe, if we always call sendFile for a given file then we delete it right after.

The problem with the Executor is that it holds state, and that state changes. When you're reading the code and you want to know what executor is, or what will happen when you call executor.execute(), you don't just have to read that line of code, you have to read all lines of code that could possibly be executed before it.

(Note: final doesn't save us here and does almost nothing to help - Java has no language level immutability enforcement)

The solution here is to change the interface of Executor to avoid requiring mutation:

public class Executor {
    public void execute(Map<String, String> parameters);
}

Yes, you now have to provide a parameter map every time you call execute. But this means that the outcome of execute should only depend on one line of code: the line where you call execute.

This isn't exactly true because Java will still allow Executor to mutate itself, and there's no way to stop that. In C++, you can stop that by making execute const:

class Executor {
public:
    void execute(const std::map<std::string, std::string>&) const;
};

const gives us the guarantee that execute won't mutate its argument or itself. That means we can free ourselves from needing to hold the entire codebase in our minds just to work out what this one line does.

Rust has immutability by default, so this problem is solved by default there. You can unsolve it by making things mutable if you want, but that's discouraged.

Although Java doesn't allow you to make those kinds of guarantees at the language level, teams who enforce immutability (except where needed e.g. for high performance) will have an easier time understanding isolated lines of code.
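One way a team might enforce this by convention is sketched below, using a record that defensively copies its map. ExecutionRequest and ImmutableRequestDemo are hypothetical names, and the guarantee comes from Map.copyOf at runtime, not from the compiler.

```java
import java.util.HashMap;
import java.util.Map;

class ImmutableRequestDemo {
    // Immutable by convention: the map is copied into an unmodifiable one
    // when the record is constructed.
    record ExecutionRequest(Map<String, String> parameters) {
        ExecutionRequest {
            parameters = Map.copyOf(parameters);
        }
    }

    // Stateless executor: the result depends only on the argument.
    static String execute(ExecutionRequest request) {
        return "executing with " + request.parameters();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("filename", "report.txt");
        var request = new ExecutionRequest(params);

        // Mutating the original map afterwards doesn't affect the request.
        params.put("deletion", "true");
        System.out.println(execute(request)); // executing with {filename=report.txt}
    }
}
```

Trying to modify request.parameters() still compiles and only fails at runtime with UnsupportedOperationException, so this is a team discipline rather than a language guarantee.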

In particular, this is a godsend for pull request reviews - you no longer need to build up a mental model of the temporal ordering of everything that happens before every line of code. You can skip around and know no state is changing in the lines you skip over, which is helpful since looking at diffs inherently involves skipping over many lines.

Global variables

Singleton instances. Global variables. Static variables. These are all faces of the same demon, which is already well known to be evil. Testing is made particularly hard when the global variables and implicit dependencies are widespread, and that's the angle I'm coming at this from.

Suppose we have a database connection we want to share throughout our application. You can get it through a static method:

public class DatabaseConnection {
    public static DatabaseConnection getInstance();
}

And we have another class that already exists and has a lot of code:

public class Server {
    public Server(OtherServiceClient client) {
        // ...
    }

    public void serveRequest() {
        // ...
    }

    public void cleanUp() {
        // ...
    }
}

Now let's suppose we write some unit tests:

public class ServerTest {
    @Test
    void testServeRequest() {
        var server = new Server(mock(OtherServiceClient.class));

        // ...
    }
}

Has anything gone wrong here? Is this unit test isolated? We don't know unless we read the code under test. There may or may not be a hidden, undeclared dependency of Server on the database, and we'd need to mock that for our unit tests.

In the worst case, a test author might write a unit test, happen to run it in the right environment, and think they've isolated their test. In reality, they're writing and cleaning out a real database somewhere.

I thought by this point everyone knew that sharing global state was bad, but in Java land, somehow "Singleton" isn't a four-letter word like it should be.

If mutability is banned (which outlaws "setter injection" (which is utterly disgusting)) and global variables are banned (which outlaws implicit dependencies on shared state), the only route left is passing dependencies in as part of the constructor. i.e.

public class Server {
    public Server(OtherServiceClient client, DatabaseConnection connection) {
        // ...
    }
    // ...
}

There's no reason there can't still be a single DatabaseConnection, the only change here is that we're explicitly declaring a dependency on DatabaseConnection, and now there's no way a test author can accidentally forget to mock a database, they must explicitly provide a database.
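A sketch of what that buys us in a test. To keep it self-contained, DatabaseConnection is reduced to a hypothetical one-method interface, OtherServiceClient is left out, and RecordingConnection is a hand-written fake; none of these details come from a real codebase.

```java
import java.util.ArrayList;
import java.util.List;

class ConstructorInjectionDemo {
    // Hypothetical minimal interface for the shared database connection.
    interface DatabaseConnection {
        void write(String row);
    }

    // The dependency is declared in the constructor, so it can't be missed.
    static class Server {
        private final DatabaseConnection connection;

        Server(DatabaseConnection connection) {
            this.connection = connection;
        }

        void serveRequest(String request) {
            connection.write("served:" + request);
        }
    }

    // Test double that records writes in memory instead of touching a database.
    static class RecordingConnection implements DatabaseConnection {
        final List<String> rows = new ArrayList<>();

        @Override
        public void write(String row) {
            rows.add(row);
        }
    }

    public static void main(String[] args) {
        var fake = new RecordingConnection();
        var server = new Server(fake);
        server.serveRequest("ping");
        System.out.println(fake.rows); // [served:ping]
    }
}
```

The test can't construct a Server without deciding what the database is, which is the whole point.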

Conclusion

I consider these truths self-evident and will brook no criticism. Mutability hurts readability. Nulls hurt readability and usability. Globals hurt readability and testability. The most effective way to avoid these is to use a language where those problems have been solved - use languages that enforce immutability, allow a wide range of meaningful non-null return values (exceptions are fine, I guess), and never use global state!

Just say no to mutable state! It confuses you (even if you don't think it does) and definitely confuses your code reviewers.

Why object to the death of Venice?

2022-08-23T00:00:00Z

While in Venice, I picked up a book, If Venice Dies by Salvatore Settis. The main thrust of the book is that tourism is bad, Venice has died or is on the cusp of death, and changes need to be made to remedy the situation.

This is the kind of book that only works if you already agree with its premise, and read it as outrage porn rather than as a well-motivated, well-explained argument. That might seem a bit uncharitable, but the author makes a lot of claims that are only backed up by vibes and impassioned rhetorical questions like "Wouldn't that be a tragedy?" rather than any kind of reasoning.

There are two ways to look at each of the claims:

1. The author wants to enforce their vision of what Venice should be, and came up with various justifications post hoc to make it seem like it's in everyone's interests.
2. The author actually believes their own arguments, which largely amount to fluff, vibes, and vaguely authoritarian policy prescriptions.

We can go through the book and see if (1) the sinister explanation or (2) the naive explanation fits better.

There's of course the third explanation: that the author doesn't believe in any of it and just wrote the book for a quick buck, which we'll ignore. Settis is a lifelong archaeologist and art historian, so it's at least credible that he does believe in the conclusions of the book.

Anyway, let's look at some of the unmotivated fluff. I don't expect this blog post to be particularly entertaining or anything more than a rant.

Sometimes cities die

The first chapter is about the death of Athens, how it devolved into a barely populated village and its history was lost:

Yet nothing could be further from the truth: when Michael Choniates, who hailed from Constantinople, was appointed archbishop of Athens in the late twelfth century, he was astonished by the ignorance of the Athenians, who were unaware of their city's former glories, and weren't able to tell foreign visitors about their still intact temples, nor could they point out the places where Socrates, Plato, and Aristotle had preached their doctrines.

[...]

When Athens was conquered by the Ottoman Turks in 1456-and the Parthenon-Church was turned into a mosque-the city even lost its name. What remained was a wretched village with a few huts scattered among the ruins, while the local population, which had been reduced to a few thousand, had started to call the city Satines-or Sethines-a bastardization that Rome was never subjected to.

This stuff is kind of sad. But it's hard to see what the problem is or what the solution is - are there things missing from the Italian school curriculum that the author wishes to add? Should schools in Venice teach more Venetian history? These questions aren't asked or answered, so it's hard to understand the solution the author is proposing.

I think the point of the chapter is that the author thinks some cities look nice and he wants them to stay around after he dies:

No: on the contrary, we the living should nurture beauty on a daily basis if we want some of it to survive, so that we may enjoy it and ensure its survival after our death.

He uses the word "we" instead of "I" for some reason, presumably as a device to get us to agree with him.

There's no real argument or policy proposal made in this chapter, it appears to just be here to set the tone - cities die. I don't think Venice is at risk of dying so I consider this chapter to be irrelevant at best and an attempt at emotional manipulation at worst.

Population decline and tourism are bad

The tiny island of Venice used to have a population much larger than it does right now, peaking at maybe 150,000 a few hundred years ago, declining to 50,000 today. Why is this a problem? The author doesn't explain.

Although the extant demographic data is less reliable, the plague of 1348 proved to be equally devastating, after which it's estimated that the population dropped from about 120,000 to 58,000: just a little over today's figures. Yet starting in the 1970s, a new kind of plague broke out in Venice.

Settis apparently thinks population decline due to people moving away or not being born is in some way comparable to people literally dying of a plague. That wouldn't be such a problem if there were some other argument about why population decline is bad, but there isn't! The analogy of plague is used again and again without any sense of proportion.

Even worse, apparently tourism has "devastated" Venice like a "bomb" and has "annihilated" it. Someone should tell the tourists that the city has been destroyed and there's nothing to look at.

Why is it so bad that the population of Venice has migrated to the mainland and prefers to have more space, perhaps a bigger house and a garden, and so on? Settis also thinks that a tourism monoculture is harmful but doesn't explain why:

A tourist monoculture now dominates a city which banishes its native citizens and shackles the survival of those who remain to their willingness to serve.

This looks like it's about how Settis views it as beneath him to serve others and assumes we share his opinion. Unless he's a subsistence farmer, Settis is "shackled" to serving others too, which he doesn't seem to realise. There's also a diatribe about how we must return Venice's people to Venice when evidently they don't even want to live there.

He wants Venice's population to increase because he thinks it'd be cool, I guess.

By the way, a tourist monoculture might be bad (see: the pandemic) but it's by far the most productive industry Venice can offer. Trying to shoehorn anything else into Venice would reduce its prosperity greatly.

Big cities are bad

The author decries megalopolises and talks about how every city seems destined to become some kind of megacity like Tokyo or Shanghai.

I did enjoy how Settis mentioned Trantor from Asimov's Foundation series, it's one of my favourites. It makes sense that Settis would have read it, considering it's about the fall of a civilisation and the abandonment of a city.

A perverse continuity has established itself between the megalopolises and the shanty towns. Thus, was Isaac Asimov's ecumenopolis-the city-planet of Trantor featured in the science fiction Foundation series which numbered 40 billion inhabitants-a nightmare or a prophecy?

I don't know, I think it's neither, maybe it's a dream. This chapter has one of the worst examples of the repeated rhetorical question form of "argument":

Yet do we really want to think of this phenomenon as inevitable and assume it will conquer the world, supplanting all other urban forms? Or would it instead be worthwhile to keep other alternatives in mind when we think of the city of the future, analyzing its characteristics and effects on history and the notion of livability in our present times? Do we wish to nurture or destroy the multiplicity and diversity of urban forms?

Reading this is hilarious if you come up with different answers to the author. I consider large cities very livable, I'm even moving to New York.

There's another whole chapter about how skyscrapers are bad because the author thinks they're ugly and stupid. I agree that the government subsidising skyscraper construction as a vanity project is stupid, but that's not at all the angle Settis takes.

The acropolis of skyscrapers dominates the historic city from its heights, having situated itself there in a commanding position that calculatedly relegates the historic city to the sidelines.

He feels that the historic city, which hasn't been destroyed, is being humiliated in some way, and that's enough for him to want skyscrapers banned. Great. There can't be any supporting evidence for this since it's inherently subjective.

There's this bit which may be some kind of projection of Settis's own insecurities onto Venice:

But could Venice eventually be surrounded by a ring of skyscrapers, as envisioned in Aqualta, or will its historic center be dwarfed by a single high-rise, turning the former into an old dwarf who gets stared down by a young, muscled giant?

Is Settis really the old dwarf? What a bizarre analogy. The single high-rise he's talking about here is a tower on the mainland, in a derelict industrial zone, not the island itself. You'd barely be able to see it from the island.

Money matters, or does it?

This is really where the book starts to take a turn. Previously Settis limited himself to talking about how the city doesn't conform to his desires, and how he finds tall buildings ugly (except if old), but now he starts making economic statements.

We should counter the approximations of self-styled appraisers with the seriously pondered reflections of others on the true value of cultural heritage. One need only cross the Alps and head into France. A report entitled The Economy of the Immaterial: The Growth of Tomorrow, which was drawn up by Maurice Levy and Jean-Pierre Jouyet, reflects on immaterial values (meaning priceless ones) as the basis of all future growth.

[...]

The report was commissioned by the French Ministry of the Economy in 2006 under the presidency of Jacques Chirac and concluded that immaterial values are "concealing a huge potential for growth, which can stimulate the French economy by generating hundreds of thousands of jobs, while simultaneously preserving others that would otherwise be put at risk."

So immaterial values are good because they can stimulate the economy and create jobs. Isn't that exactly what's going on right now? The immaterial value of Venice is creating jobs and income! But Settis condemns that, he only approves of certain kinds of jobs and income, I guess.

Or should we just ignore that report? Literally the next paragraph says:

Yet Venice is now threatened by what John Maynard Keynes once termed the "parody of an accountant's nightmare," in other words the abject, prejudiced view that everything should have a price tag, or better yet, that money is the only thing that matters:

This is pretty much incoherent. Immaterial values will produce lots of money, except money doesn't matter. Sure, but I don't see what we should take away from that.

Policy

It goes on and on, there is a ton of fluff about how Venice has been humiliated and degraded by having copies of it made, and how theme parks have gondolas in them, but the real meat of the book comes when Settis makes some policy recommendations.

No development for tourism

Settis says he wants what's best for Venice's destiny, but that seems to be code for conforming to what he deems acceptable, and he doesn't want tourism:

Venice must know how to creatively construct its own destiny, tailoring each change it makes according to the best possible future for its citizens, and not what the tourists or real estate agencies want.

This implies that tourism isn't best for Venice, but never backs that up.

Ban development completely, actually

Settis supports plans to ban development in a belt around cities' historic limits due to some idea of the importance of the meeting of city and countryside:

It's time to "limit the endless expansion of suburban sprawl by returning cities to their margins," as Zanardi has written, while at the same time "soldering the historic center to its periphery" and reestablishing the connection between the city and its citizens.

Luckily this proposal can't apply to Venice since it has been naturally limited by the lagoon.

Enshrine a right to the city

He supports a right to the city, similar to the one Brazil recognised, which:

guarantee[s] the right to sustainable cities, understood as the right to urban land, housing, environmental sanitation, urban infrastructure, transportation and public services, to work and leisure for current and future generations; democratic administration by means of participation of the population and of the representative associations of the various segments of the community in the formulation, execution and monitoring of urban development projects, plans and programs.

This is incredibly vague but is presented by Settis as a panacea. It's nothing but a tool to block any and all development. Settis claims he doesn't want to block all progress, but only advocates laws and "rights" that'll enable all progress to be blocked.

The wishy-washiness of this part of the book is exemplified:

Even in Venice, the health of democracy is determined by how successful citizens prove in defending their rights, which include the common good and the social functions of property.

The social functions of property! What does that actually mean? Who gets to decide what the social functions of property are? Its owners or others? People voting to use the coercive power of the state to restrict what property owners can do is not moral unless the owner's plans will materially harm them. Having to look at a building on the horizon is not harmful in the way that a polluting factory is.

Tourism is bad

Yes, this same proposal to have the government distort things away from tourism crops up again:

In Venice's case, the work available to local residents can't be restricted to the tourist monoculture, but has to be worthy of the immense civic capital the city has accumulated over centuries. The social function of property, regardless of its ownership, can't be solely determined by increasing real estate value while decimating the local population and condemning the city to die. It must nurture creative and productive enterprises, repopulate the city with young people, and loosen the tourist monoculture's stranglehold.

The work isn't restricted to the tourist monoculture. Anyone is free to do whatever they want. But if they don't generate enough income to pay the bills, they'll have to leave.

The only solution that will eliminate the tourist monoculture is for the government to subsidise other industries and displace tourist-oriented industries. Why is that necessary? Because tourism is not worthy of Venice! Catering to tourists is unworthy of Venetians! This is enough to justify taking money from productive Italians and using it to subsidise Venetians to do something other than serve tourists.

An architectural code of ethics

Settis wants architects to have to take a Vitruvian Oath, like the Hippocratic Oath, wherein architects swear not to design ugly buildings. Yes, I'm serious.

We could easily adopt every single one of the professional requirements that Vitruvius lists in his book and compile them into a "Vitruvian Oath," turning it into the perfect equivalent of the Hippocratic Oath.

[...]

If those who build in Venice knew how to marry practice and theory, no architectural design would so flagrantly ignore the physical conditions and construction practices unique to that city.

The various elements of the proposed oath are extremely vague and impossible to criticise, except in how vague they are.

We should encourage people to live in Venice

Right at the end of the book, we get a very confused chapter which describes immense corruption in the MOSE project to protect Venice from flooding, then proceeds to advocate a range of subsidies and tax breaks the government should deploy to revive Venice:

In Venice's case, this new pact will have to begin from a strong sense of commitment to spur politicians and public institutions to adopt a more creative outlook toward the city, to bring the historic city back to life and gear it toward the future, the means to create a new kind of politics to stem the perverse logic causing the exodus of citizens, and to encourage the young to remain via strong incentives such as tax breaks. It would also mean curbing the rampant proliferation of second homes and the transformation of buildings into nothing more than hotels. It would mean encouraging manufacturing and private enterprise as well as generating opportunities for a wider range of creative jobs. It would mean reunifying the historic city, lagoon, and mainland by differentiating their functions, making more agricultural land available and investing in new fisheries, reutilizing old, vacant buildings, incentivizing research, launching new professional training schemes and apprenticeships and investing in universities, chiefly by making it affordable for students to actually live in the city. It would mean developing new models, analyzing situations, evaluating options, and emphasizing initiatives of a higher caliber (like the universities and the Biennale) and not just enslaving the city to "uncontrollable market forces." It would mean enshrining the right to the city and the common good as our first priority.

Why is manufacturing specifically singled out as a "good" kind of enterprise? Why are second homes bad? Why are hotels bad? Why are creative jobs good? Who is going to re-utilize the abandoned buildings, when they have been abandoned due to being useless? Why is it inherently good for students to live in Venice?

This list of proposals is unmotivated and unexplained.

Conclusion

Settis wants the government to ban stuff he doesn't like (tourism, tall buildings, subways) and subsidise stuff he does like (creatives, people living in Venice). He finds tourism degrading and would prefer Venice be a military (yes, he explicitly uses the word "military") and economic powerhouse like it was a thousand years ago.

As usual with these kinds of people, they get a raft of academics from their bubble to rant and rave about how great their book is, ignoring that it only works if you already agree with the conclusions.

The only motivations given are hyperbolic, borderline insane analogies about how skyscrapers are muscled young men overlooking dwarves, and how someone moving out of Venice is equivalent to being killed by a plague.

Truly crazy stuff. I guess I bought the book, so the author wins in the end.

We sent the worst YCombinator application possible

2022-08-02T00:00:00Z

There are many mistakes we made when halfheartedly trying to get funding for our startup. The worst was that we didn't actually have a business at all - we had no users yet due to not having regulatory approval for our financial product, and no practical plan to get that other than sending applications and hoping. The second worst was that our funding applications were really bad in many ways.

Not having a business is obviously a bigger problem than anything else, but being unable to convince investors is a big deal too, especially if you're really bad at it.

Here's a list of mistakes we made in our YCombinator application.

Preamble: How to avoid these mistakes

I watched this video: https://www.ycombinator.com/library/6t-how-to-apply-and-succeed-at-y-combinator but then we made all the mistakes and did the opposite of that advice.

I don't know what happened in my cofounder's case, but I think my excuse is that I was too focused on writing software rather than building a business. It should be as easy as building a real business, then laying out the facts so that YC can understand what you're doing.

I'll try to avoid much more commentary on how to succeed, since I didn't. Instead, let's focus on how to fail, which I do know something about.

Poor spelling and grammar

This should go without saying, but apparently not. My co-founder, a non-native speaker, wrote the application and gave it to me to proofread. I'm a native English speaker, but I was writing code on a 36 hour sleep cycle revolving around exhaustion, so I didn't spot many of the mistakes.

I spotted some of the mistakes, but this just led to unproductive arguments. For example: we had "we've done X in USA". It should be "the USA" or, more naturally, "the US". We talked about this mistake, but I hit my cofounder with the old "trust me I'm right" and brushed aside any request for explanation.

Our final application still had the mistake.

I think there's something deeper going on here - if there isn't a deep level of trust between cofounders, you can't get much done. Even if we both had the same level of English proficiency, something else would've gone wrong down the road and produced a suboptimal outcome.

This is why firms exist in the first place, it's easier for incentives to be aligned and for people to trust each other within a firm. A firm without trust may as well not exist.

Stretching the definition of user

An example: how many active users did we have? None, since we didn't have regulatory approval yet. But we put down 114 users, since hundreds of people had registered their interest and thus..."used" our website, right? I mean that's obvious bullshittery, don't do that.

The worst part is that we started to treat "interest" as a key performance indicator (KPI). We got excited and thought we were accomplishing something when thousands of people came to our site and signed up.

We hadn't done anything! We shouldn't have believed our own lies about user numbers. There's only one kind of user that matters, and that's one that's giving you money, because revenue and profit are the only real KPIs.

Embellishing progress

We had contacted several regulators about approval of our product. We had something that seemed to be workable. That was good. But we hadn't received approval from anyone yet. Indeed, we hadn't received replies from any regulators yet at the time of sending the application.

We said we were in "talks" with regulators, and wrote a long list, including regulators we hadn't contacted yet but we "surely" would. We of course didn't indicate that we hadn't contacted all regulators in our list.

We also claimed to have a minimal viable product (MVP), and that was true from a technical perspective. But from a regulatory perspective (the important one), it was unclear that our methods of sending money would be allowed. So it wasn't viable at all. We still claimed to have an MVP. This was obviously false, since we didn't have any paying users.

Delusional projections

An obsession with projections many years out is always a bad sign, and we didn't have that. We had to come up with some numbers for the "How much could you make?" question for the YC application, and there were some facts (the size of the market, average fees) but the "possible" market share we picked was completely made up.

We also made up a future growth rate (modest, we thought) and "demonstrated" that we'd be making billions within 5 years.

We knew the projections were unrealistic, but then something funny happened: we started to believe them just a bit. The act of projecting that we'd be billionaires made us think it'd be true, to a tiny extent. It's almost literally like writing "I'll be a billionaire" into a cell in Excel - meaningless in reality, but it may have some psychological effect.

This made us confident (for no reason) in our projected billions despite being pre-revenue. I can't help but think this kind of delusion has to hurt an application, it sure was obvious in how we phrased our predictions.

Complete humour incompatibility

This is important. I can laugh at something with my friends and say "that's fucking hilarious, can you believe that?" My cofounder and I never did that.

This made the "tell us something surprising or amusing" question very difficult.

I made a suggestion - it's literally cheaper to fly to some countries with a suitcase full of cash than it is to send money electronically. And it'll even be faster! And there are flights on weekends, but sometimes no bank transfers! Crazy, right?

We couldn't find a single thing we both found funny, and ended up writing that we'd both started and quit (different) degree programs.

Not being able to agree on something interesting or funny was in itself interesting to me. I wasn't able to run a company with someone I hadn't shared a real laugh with, despite not thinking that was particularly important beforehand.

A good lesson for me, I think.

Conclusion

Being friends with your cofounder is far, far, far more important than literally anything else. Doesn't matter what skills either of you have, whether they complement each other, whether you have the right backgrounds, etc.

The company can't survive if the cofounders don't like and trust each other. That means the founders should've known each other for a while and ideally have built something non-trivial together, so that they can trust in each other's abilities.

I think YC's co-founder matching is very stupid if you have any other choices. It should be a last desperate resort when you've asked everyone you know, and everyone they know too. It probably won't work and in the worst case, you only realise it's not working after months of work and slowly building distrust.

What does it mean for someone to "deserve" success?

2022-07-31T00:00:00Z

I recently read a couple of blog posts about deserving success, and I found them very interesting, mostly because of what they tell me about the people writing them.

I have some thoughts on these posts but reading them back, I think they're nonsense. I'll post them anyway. Here are the posts:

One from someone who hates the word "deserve" because they believe luck plays a much larger part in success than people want to believe: https://moontowermeta.com/my-personal-trigger/
A follow up from the same author on why talking about "deserving" things makes their skin crawl: https://moontowermeta.com/why-deserve-makes-my-skin-crawl/
And a series of blog posts where the author argues they don't deserve their success, full socialism wouldn't work, and that the way to make things better doesn't involve taxing him more: https://russroberts.medium.com/do-i-deserve-what-i-have-part-i-6553091dd85c

The first two are written by an options trader and the last by an economist, both wealthy. Both seem to feel it's obvious that they don't "deserve" their success, but neither seems to actually attempt to define whatever it is they're talking about.

I don't think it's possible to define "deserve" in a way that matches our intuitions but also means those two people don't deserve their success at all.

I'll try to define "deserve" but it'll probably go really badly.

Did you deserve to be born?

Russ Roberts (the economist) seems to take it for granted that you can't deserve something if you had to be lucky to get it:

I work pretty hard at what I do. But do I deserve credit for perseverance or grit? Or are they just another part of my genetic inheritance? So hard to say.

So do I deserve the life I have?

Of course not. I am so lucky.

He also considers it part of his "luck" that he was born to loving parents in a rich country. Putting these two together, you had to be lucky to be born at all, and therefore can't deserve anything. I don't think that's what he actually believes though.

The Moontower guy says something even more extreme:

At a society level, appreciating the role of chance is ultimately about empathy. It's the recognition that you could have hatched from an egg anywhere in the world in any time in history. Our policies should not amplify the extremes of cosmic dice but instead balance them.

Other people were/will be born, but the chance that I "could have" been them is exactly zero for any reasonable definition of "I" and "them". The only way this makes sense is if he believes that before a fetus becomes conscious, there's a random selection process among some set of possible consciousnesses and one "wins".

That may be a useful device for thinking up rules for a society (see the veil of ignorance - the idea that you should construct rules for a society without knowledge of where you'd end up in that society), mainly because doing that ensures that everyone will accept the rules as "fair".

But it's just a thought experiment, there aren't actually (as far as we know) free-floating consciousnesses that end up randomly embedded in babies. It's bizarre to me to state that you could have been another person, because I couldn't have been! What does that actually mean? I don't think he'd be able to even explain that if pressed.

Both blog post authors say things that lead me to believe they can't think anyone deserves anything at all. I think the Moontower guy really does think he believes that, but he may hesitate to apply his principle to the bitter logical end.

People who "obviously" deserve things

I generally have an intuition about what the results should be and try to work out what my principles really are based on that. I'll try to answer some questions using (1) my intuition and (2) the "cosmic dice" idea where no-one deserves anything.

I'll use the answers to (1) to try to work out what I mean by "deserve".

I am constructing a strawman out of (2) and will intentionally make the answers ridiculous.

Do rich people deserve success?

(1) It depends on how they became rich. Stealing? No. The lottery? No. Producing something valuable to others? Yes. Trading stocks? Maybe, if they actually had reasons behind their trades and those reasons were correct.
(2) No, they were lucky to be born as a person with those abilities or in that circumstance. Next.

That's too vague and not distasteful enough on either side to be interesting.

Did the architect of the Holocaust, Reinhard Heydrich, deserve assassination?

(1) Yes, he did something terrible and deserved to die.
(2) No, he was unlucky enough to be born as a person predisposed to genocide, and didn't deserve assassination.

Note that this is different from saying that he shouldn't be assassinated, it's just saying that he doesn't deserve it. I don't know if that makes sense or not, or when I should start talking about "cosmic dice".

Do welfare recipients deserve the money?

(1) Maybe, depending on whether they really need it or not. A single mother who's just been walked out on with kids to feed? Yes. Someone gaming the system and living in subsidised housing when they don't need it? No.
(2) No, because, uh...I'm actually finding it pretty hard to decide what strawman to place here. "They don't deserve welfare" is basically synonymous with "don't give them welfare", and both authors clearly think we should support the poor. But you still have to get lucky enough to live in a country with a welfare state to get welfare, so presumably they don't think recipients deserve it, but that we should still do it.

Do poor people deserve to be poor?

(1) It depends. Lied to a lot of people and got sued for everything? You probably deserve poverty. Murdered people and now you can't get a job? I think that's deserved. Born into poverty and abuse? That seems undeserved.
(2) This one's easy since they explicitly state it - no, poor people don't deserve to be poor. Even ones that are there due to their own actions are really there due to the initial conditions of their lives, which were out of their control.

Cheaters don't deserve to win

People in the real world think others deserve things more or less depending on how much their actions affected the results, and crucially, whether they played by the rules.

Cheats never deserve anything. Even if they spent a lot of time and effort honing their cheating skills.

Here are some examples:

People who cheat at board games don't deserve to win. Opinions about this may vary depending on which rules are the rules we're going by. The cheater has broken the game rules, but not the law. But their win was entirely due to their actions, and there might've been little or no luck involved. They still cheated.
Bank robbers don't deserve the money they steal. This is despite going to all that trouble to plan and execute a heist, which is a lot of work. If they don't "get lucky" and just planned really well, don't they deserve the money? Well no, because they're breaking a lot of rules.

That helps us find more people who don't deserve things, but that doesn't separate me from a cosmic dice believer who thinks no-one deserves anything.

Getting some desert

Deserving things is a spectrum. The two authors claim they don't deserve their success, but that's not a useful way to think about the world and doesn't match anyone's intuition. I don't even think it matches their intuitions.

I think a lot of people (maybe the authors included) would agree with the following statements:

People who earn income by working deserve it more than those who inherited wealth.
The guilty deserve punishment more than the innocent.

This is all possible without making the absolute statement "murderers deserve punishment" or "successful traders deserve success", which the authors must disagree with.

Desert is a half open interval ranging from zero to 100%. We can all think of people who don't deserve things at all and people who deserve things more than others, but so much is an accident of birth it's impossible to think of anything that is absolutely 100% deserved.

Summary

I don't know if any of that made sense, but I'll still go on saying that I deserve X or that others deserve Y, as long as they played by the rules and didn't just gamble their way to success.

That's a definition of what it means to deserve, not a value judgement, by the way.

I'm moving to New York, here's why

2022-06-05T00:00:00Z

No-one reads this blog, so this is a safe place to make a public announcement I don't really want anyone to see.

I'll be moving to New York in a few months when my visa goes through.

There are many reasons I'm moving to New York, the main ones are:

More money
Looking to run away from my startup failure
And the main one, I'm looking for something cool to do because I'm bored.

Read on for some elaboration.

Here's the elaboration: $$$$$$$$$$$$. I think that about sums it up.

But seriously, here are some reasons I want to get out of the UK and into the US.

More money

Software engineers are paid significantly more in the US than in the UK, especially in hubs like San Francisco and New York. It's not even close. The best career move any software engineer can make is moving to the US.

Salaries in Europe are generally disgraceful. My salary isn't bad, but there is no reason I should be leaving hundreds of thousands of dollars on the table to stay in the UK. The percentage raise from switching jobs would be significantly less than the raise from just moving to the US within the same company.

Maybe if I had a family I'd stay here, but I don't. I think that's the only case where millions or billions of dollars in lifetime earnings is a price people would actually pay.

Running away from my problems

I tried to be a solo startup founder for a bit, but all my ideas were ultimately stupid and I went about evaluating them the wrong way (see earlier blog posts on this).

I then tried to co-found a payments startup with someone I communicated with mainly online, but that didn't work out since we just weren't close enough to really trust each other or be able to support each other. That meant as things got worse, neither of us really had anyone to really open up to. Or, at least, I didn't.

After that didn't work out, I went back to my old job with the express purpose of moving to the US.

The problem I'm trying to fix is that it feels like I took a huge leap into the unknown (founding a startup), cracked my skull against a wall I should've seen (bad solo strategy, wrong co-founder), and was about to fall back into the same old job doing the same old stuff, living in the same old place.

The only way I can convince myself my life is going in an interesting direction (i.e. is worth living) is to do something drastic.

I'm bored

Is this really what life is? Just going around doing the same types of things until you die? The least I could do is change it up, a quarter of a century in more or less one place is enough.

I know people who are gearing up to buy a house and settle down. I could buy a house, but that feels more like tying a noose around my neck - tying up all of my capital in a single asset I don't even really need or want.

Once you have a house, that's it, you're tied down. The asset isn't particularly liquid, so there's a barrier to escape.

Settling down sounds like something a corpse with lead weights tied to it does at the bottom of a lake. People with partners and families see things differently, no doubt, but I don't have those.

I'm a practising neoliberal

It boils down to this. My religion, my ideology is liberalism. I truly believe that immigration and the price system allow workers to be allocated where they will be the most productive, and that workers actually do follow these signals.

In order to practise what I preach, I have to look at what the labor market is telling me and act accordingly.

Other options I considered

I considered going to work for CERN, who I did interview with several times while working at my job the first time around. Ultimately I gave up on that once I ran the numbers and saw the effect on my lifetime earnings. CERN pays well, but not as well as some companies in the US, or even Switzerland.

I considered trying another startup, but unfortunately that felt like more of the same. Founding a startup doesn't feel great a lot of the time, and having the right co-founder for mutual support (and/or shared delusion) is extremely valuable. And there's a huge pool of investors and co-founders I haven't tapped, across the pond in the US. I think the kind of borderline delusional conviction a startup founder needs is more common in the US. It's something I used to have and could build up again.

I also considered working at a startup, but that would've delayed my move to the US by at least a year. You need a year at the company to L1B visa into the US, and my 2 years at Bloomberg still count despite the gap. And I would've probably jumped at an okay looking offer just to have the company go bankrupt a few months later.

Conclusion

https://www.youtube.com/watch?v=EEjq8ZoyXuQ

LeetCode on a Z80 CPU from 1976

2022-02-10T00:00:00Z

A while ago, I soldered together a Z80 homebrew computer and ran some programs on it, then wrote a blog post about it.

I did end up running some toy programs and some BASIC games like Super Star Trek, but that got me thinking - how hard would it be to solve a real algorithmic problem on it?

It turns out it wasn't hard at all, thanks to the Z80 development kit (z88dk) and the hexload BASIC program that allows running arbitrary binaries from BASIC, without needing to reprogram the ROM. That's very handy since I don't have a ROM programmer!

Comparing compiled Z80 assembly to modern x86 assembly for the same C program is pretty cool - the Z80's instruction set is a slightly extended version of the 8080's, and x86 descends from the same family, so you might expect the programs to look pretty similar, but x86 is really a whole different beast.

Step 0: Building the computer

Step 0 is building the computer itself - look at that blog post I linked above for some tips, but I really recommend checking out the RC2014 website for the latest advice.

The computer has 8K ROM, 32K RAM, runs at 7.3728MHz and communicates over serial at 115,200 baud. It's not exactly a speed demon, but it's surprising what such a "slow" computer can do - that's still 7 million clock cycles per second.

Step 1: Writing the program

The problem I chose was the eternal classic Two Sum. The problem is as follows:

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

You can return the answer in any order.

This is a really simple problem with an easy O(n²) solution, and a slightly less easy O(n) solution. The quadratic solution is to loop over all pairs and check them: this requires no extra data structures and indeed no extra libraries, it's just looping over an array.

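The linear solution is to construct a hash set of the numbers in the array then loop over the array, checking if the required summand to get to the target result is in the set. (To return indices rather than values, the set really has to be a map from each number to its index.)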
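The linear approach can be sketched roughly as follows. This is C++ rather than C, purely because C++ ships a hash map in its standard library, and it is not the code that ran on the Z80. The name two_sum_linear is made up for this sketch, and it ignores integer overflow in target - nums[j].

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the original program: returns the indices
// of two distinct elements that sum to target, or nullopt if there are none.
std::optional<std::pair<int, int>> two_sum_linear(const std::vector<int>& nums, int target) {
    std::unordered_map<int, int> seen;  // value -> index of an earlier element
    for (int j = 0; j < static_cast<int>(nums.size()); j++) {
        // Look for the summand that would complete the pair with nums[j].
        auto it = seen.find(target - nums[j]);
        if (it != seen.end()) {
            return std::make_pair(it->second, j);
        }
        // Only remember nums[j] after the lookup, so an element never pairs with itself.
        seen.emplace(nums[j], j);
    }
    return std::nullopt;
}
```

Each element is looked up and inserted once, so the expected running time is linear, at the cost of the extra memory for the map.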

The linear solution isn't all that complicated, but involves a lot of extra stuff (there's no C standard library hash set). Here's my quadratic solution:

#include <stdio.h>
#include <stdlib.h>

void two_sum(int target, int length, int *nums) {
    for (int i = 0; i < length-1; i++) {
        for (int j = i+1; j < length; j++) {
            if (nums[i] + nums[j] == target) {
                printf("%d, %d\n", i, j);
                return;
            }
        }
    }
}

int main() {
    printf("twosum\n");

    int target;
    scanf("%d", &target);

    int length;
    scanf("%d", &length);

    int *nums = calloc(length, sizeof(int));

    for (int i = 0; i < length; i++) {
        scanf("%d", nums+i);
    }

    two_sum(target, length, nums);

    free(nums);
}

The first thing to notice is that I'm just using C standard library functions like printf and calloc - z88dk lets you write "normal" C code almost as if you're writing code for a PC. The difference is that the z88dk standard library is mostly hand-written Z80 assembly, with over 250k lines of code.

Take a look at the z88dk source on GitHub, it's crazy the amount of work that's gone into this. They claim it's the largest repo of Z80 assembler online, and I believe them.

That means we literally can just compile the same program for Z80 and x86 and compare them!

Step 2: Compiling the program

Once you install z88dk, you'll have the zcc compiler. Make sure to run make install to install the compiler to /usr/local, and you can invoke it with zcc. Full instructions are here: https://github.com/z88dk/z88dk/wiki/installation

Assuming the source code above is in twosum.c, you can compile it with this:

$ zcc +rc2014 -subtype=basic -clib=new twosum.c -o twosum -create-app --c-code-in-asm

This uses the "new" C library. I couldn't get the recommended command from the hexload instructions page to work: it used -clib=sdcc_iy and complained of a missing binary called zsdcpp - presumably a C preprocessor. Even after symlinking a C preprocessor to have that name, I was still getting errors, so I gave up and used the command above, which worked great.

This outputs a few .bin files and an .ihx file - we are interested in the .ihx file.

Step 3: Uploading the program to the computer

I have an RC2014 running Microsoft BASIC from 1978 and a serial connection. There's no way to run an HTTP server on my PC and download the files to the RC2014 or copy the file from a disk - there isn't even any non-volatile storage except the ROM, and certainly no network.

This is where hexload comes in: https://github.com/RC2014Z80/RC2014/tree/master/BASIC-Programs/hexload.

Hexload is a program that allows you to transfer binaries encoded in Intel HEX format (which is just hex encoded binary data with some metadata and other stuff, see the gritty details here) over a serial line to a computer running Microsoft BASIC, then run them.
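To give a flavour of the format: each line of an Intel HEX file is a record made of a colon, a byte count, a 16-bit address, a record type, the data bytes and a checksum, all written as hex digits. The checksum is chosen so that all the bytes of the record sum to zero modulo 256. Here is a minimal C++ sketch that validates one record. The names hex_value and ihx_record_ok are invented for this example, and this is not how hexload itself is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Converts one hex digit to its value, or returns -1 if it isn't a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: checks the length and checksum of a single record.
bool ihx_record_ok(const std::string& record) {
    // ':' plus at least 5 bytes (count, 2 address bytes, type, checksum),
    // two hex digits per byte, so the total length is odd and at least 11.
    if (record.size() < 11 || record[0] != ':' || record.size() % 2 == 0) {
        return false;
    }
    std::uint8_t sum = 0;
    std::size_t byte_count = 0;
    std::size_t declared_length = 0;
    for (std::size_t i = 1; i < record.size(); i += 2) {
        const int hi = hex_value(record[i]);
        const int lo = hex_value(record[i + 1]);
        if (hi < 0 || lo < 0) return false;
        const std::uint8_t byte = static_cast<std::uint8_t>(hi * 16 + lo);
        if (byte_count == 0) declared_length = byte;  // first byte is the data length
        sum = static_cast<std::uint8_t>(sum + byte);  // running sum modulo 256
        byte_count++;
    }
    // The record holds the data bytes plus 5 bytes of framing, and sums to zero.
    return byte_count == declared_length + 5 && sum == 0;
}
```

For example, ihx_record_ok(":00000001FF") accepts the standard end-of-file record, since 0x00 + 0x00 + 0x00 + 0x01 + 0xFF is 0x100, which is zero modulo 256.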

The key mechanism behind this is that, after we write the binary to memory, BASIC has a command USR(x) that tells the CPU to jump to an address with a user program and start executing. That is, we boot to BASIC, load the program into memory with hexload, then run

PRINT USR(0)

to run our program. This is pretty exciting. Full instructions are on the hexload page linked above, including the slowprint.py program needed to make sure the program isn't printed too fast (which would cause characters to be dropped).

First, I connect the serial cable, power on the RC2014, then connect with screen to my USB FTDI serial cable (sudo since I couldn't be bothered to fix permissions on the device file):

$ sudo screen /dev/ttyUSB0 115200

Then we press the reset button on the clock board or on the backplane to show the boot prompt:

Z80 SBC By Grant Searle

Memory top?

According to the hexload instructions, I put in 35071:

Z80 SBC By Grant Searle

Memory top? 35071
Z80 BASIC Ver 4.7b
Copyright (C) 1978 by Microsoft
1916 Bytes free
Ok

Now we're ready to send a BASIC program over the wire. For that we can use slowprint.py.

First we send hexload over:

$ python3 slowprint.py < hexload.bas | sudo tee /dev/ttyUSB0

I use tee so we can see if there are any discrepancies between what's sent and received. A few times one or two characters were missed and there was an SN Error in BASIC. If you get that, reset the RC2014 and try again, it'll probably work next time.

Once that finishes, you'll be greeted with this:

Loading Data
Start Address: 8900
End Address:   89D7
USR(0) -> HexLoad
HEX LOADER by Filippo Bergamasco & feilipu for z88dk
:

And now we can start sending over our two sum program. Run this:

$ python3 slowprint.py < twosum.ihx | sudo tee /dev/ttyUSB0

The upload may take a while, but after it's done, the program should automatically start executing and we can input our example (the first example from LeetCode):

twosum

9
4
2
7
11
15
0, 1
 0
Ok

That is, the target was 9 and the input array was [2,7,11,15]. We got 0, 1, which is correct - indices 0 and 1 are 2 and 7 which do add up to 9.

Z80 vs x86

That's great, but what does the machine code look like compared to the devil we know - x86? And how big is it?

For comparison, I compiled twosum.c on my PC, optimizing for binary size, with glibc and gcc 11:

$ cc twosum.c -g -Os -o twosum.x86
$ objdump -M intel -S twosum.x86

This prints the assembly we get from compiling twosum.c on a PC. The most interesting part to me is the two_sum function, since it should be rather similar - a few for loops, a branch, and a function call. How big could the differences be? I won't paste both binaries here in full, but there are a few interesting points.

Using the stack is horrific

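On Z80, there's no load instruction that reads through the stack pointer: push and pop (and a handful of special cases like ex (sp),hl) are the only ways to read or write the stack directly. You may wonder why we can't just do: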

ld hl, (sp)

to load whatever's at the stack pointer into the hl register, and the answer is that you just can't - the Z80 had to maintain binary compatibility with the Intel 8080 (an 8-bit CPU) and didn't have that much freedom in how it could extend the ld instruction. That's why, when we want to read something from the stack into hl, we end up doing this:

pop hl
push hl

which does a whole lot of reading and writing to the stack just to achieve the effect of reading the top of the stack into hl. Performance-focused Z80 programmers wrote assembler by hand and avoided doing things like this. One might even say that C was too slow since compilers weren't advanced enough to beat the best assembler programmers.

The x86 code didn't even use the stack at all - we have a full 16 8-byte general purpose registers available (though two of them, rsp and rbp, usually serve as the stack pointer and base pointer). There are even more registers beyond those, like the instruction pointer, various SIMD registers, etc.

This is in comparison to the Z80 which only really has 4 16-bit general purpose registers, and not all of them can be used with all instructions.

As an example, say we want to compile this statement in C:

int j = i+1;

(part of the for loop in the two_sum function)

In x86 it's simple:

lea r9,[rax+0x1]

We say we're keeping j in r9 and i in rax, we write rax+1 to r9. Easy.

With the Z80 we're severely constrained in how many registers there are, and we have to keep i and j both on the stack.

pop  hl
push    hl
inc hl
push    hl

So i was at the top of the stack. We set hl to i with the pop/push dance, increment hl, and push hl to the stack, so we're keeping j at the top of the stack.

I can see how all these extra registers make life easier.

Optimising for size hurts speed, a lot

This may seem obvious, but on the Z80 this is true to an extreme. When you have a limited space, like 32K of memory, you have to work hard to fit BASIC and whatever your program is into memory. That means it's often required to call a function where x86 (with no concern about space) can just inline.

One example is the l_gint function, which you can see here: https://github.com/z88dk/z88dk/blob/87003c95c1f3d9be8d4704beff94010159989ec2/libsrc/_DEVELOPMENT/l/sccz80/9-common/l_gint.asm. It's so trivial as to be ridiculous, it's just

l_gint:

   ld a,(hl+)
   ld h,(hl)
   ld l,a

   ret

This loads the 16-bit integer that hl points to into hl itself, incrementing hl along the way. Why a function? Because this sequence of operations is really common, and it makes for a much smaller binary if these operations are called as a function rather than inlined. This is absolutely required, as the Z80's primitive instruction set already makes it incredibly difficult to get anything done in a reasonable number of instructions.
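To make the effect of l_gint concrete, here is a toy C++ model of it. It assumes that ld a,(hl+) means "load from (hl), then increment hl", which matches the description above. The Z80Model struct and its members are made up for this sketch and model only the registers this routine touches.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of the Z80 state that l_gint touches (hypothetical, for illustration).
struct Z80Model {
    std::array<std::uint8_t, 65536> memory{};
    std::uint8_t a = 0, h = 0, l = 0;

    // hl is the 16-bit pair made of h (high byte) and l (low byte).
    std::uint16_t hl() const { return static_cast<std::uint16_t>((h << 8) | l); }
    void set_hl(std::uint16_t value) {
        h = static_cast<std::uint8_t>(value >> 8);
        l = static_cast<std::uint8_t>(value & 0xFF);
    }
};

// Mirrors the three instructions of l_gint, one comment per instruction.
void l_gint(Z80Model& cpu) {
    cpu.a = cpu.memory[cpu.hl()];                          // ld a,(hl+): read the low byte...
    cpu.set_hl(static_cast<std::uint16_t>(cpu.hl() + 1));  // ...and step hl to the high byte
    cpu.h = cpu.memory[cpu.hl()];                          // ld h,(hl): read the high byte
    cpu.l = cpu.a;                                         // ld l,a: hl now holds the int
}
```

With the bytes 0x34, 0x12 stored at address 0x9000 and hl set to 0x9000, calling l_gint leaves 0x1234 in hl, which is the little-endian int that hl pointed to.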

As an example, take this line of C:

if (nums[i] + nums[j] == target) {

This is just reading from some memory locations, adding, comparing and perhaps jumping. Easy. Yes, on x86:

mov    r8d,DWORD PTR [rdx+rcx*4]
add    r8d,DWORD PTR [rdx+rax*4]
cmp    r8d,edi
jne    1221 <two_sum+0x18>

We read the two values from memory, indexing into the array with the lovely CISC mov and add instructions, and jump past the printf if the sum isn't the target.

On Z80, it's a complete nightmare, and would be even worse without these l_gint-style functions to reduce the size:

    ld   hl,6    ;const
    call    l_gintspsp  ;
    ld  hl,4    ;const
    add hl,sp
    call    l_gint  ;
    add hl,hl
    pop de
    add hl,de
    ld  e,(hl)
    inc hl
    ld  d,(hl)
    push    de
    ld  hl,8    ;const
    call    l_gintspsp  ;
    ld  hl,4    ;const
    add hl,sp
    call    l_gint  ;
    add hl,hl
    pop de
    add hl,de
    call    l_gint  ;
    pop de
    add hl,de
    ex  de,hl
    ld  hl,10   ;const
    add hl,sp
    call    l_gint  ;
    call    l_eq
    jp  nc,i_8

All these function calls are slow, even if they save space. By the way, Z80 assembly is a complete disaster to write by hand. x86 assembly is much nicer in comparison.

It's important to note: the space optimizations work very well and although the code looks terrible for the Z80, it's actually much smaller:

$ du -h twosum.bin
8.0K    twosum.bin
$ du -h twosum.x86
20K twosum.x86

That's 8K for the Z80 code and 20K for the x86 code.

Conclusion

It is incredible that anyone ever wrote software for the Z80 in pure assembler. Hats off to Bill Gates and Paul Allen for actually starting a company based on software written for the Intel 8080, in assembler.

Like holy shit, Altair BASIC fit into 4K of memory and was an actual programming language kids could write programs in!

I could have maybe stomached writing software in assembler once more advanced 16-bit CPUs like the 8086 or 286 came around, but I'm afraid that writing real programs for the Z80 or even worse, 6502, seems really tedious.

I'm in shock that the much more powerful Motorola 68000 didn't become the standard.

Anyway, we've come a long way in 50 years, maybe in another 50 people will wonder how we coped with Python and Java.

Don't stick to what you're good at - my startup catastrophes

2022-02-01T00:00:00Z

The conventional wisdom is that if you're starting a business you should do something you're good at or know something about. If you're a software engineer, you should probably be the guy running engineering. If you're in sales, you should probably be the sales guy. I mean, that makes sense.

This is going to sound stupid and obvious, but people have a strong bias towards doing things they enjoy and are good at. And when you're doing those things it feels good and even worse, it feels productive.

Why is that bad? Because feeling productive can have almost no correlation to being productive.

I quit my job to do things that felt more productive. Spoiler alert: they were dismal failures and produced nothing except some bitter lessons! Isn't that fun?

I'm going to dissect some of my failed businesses:

I tried to re-sell NordVPN activation keys, passing on some of the bulk discount. What could go wrong - people already buy NordVPN and this is just NordVPN but cheaper, right? Not quite, which was a rather painful lesson.
I co-founded a remittance service (think TransferWise, Western Union) that was faster, cheaper, and aimed mainly at transfers from the UK to under-served African countries. We had people lining up to use it, we had all of the tech in place to actually provide the service, it was 10x cheaper, 10x faster, worked on weekends, and the Financial Conduct Authority would've never approved it. Oh.

Why did I keep (or even start) working on them? Because writing software felt good. It felt productive. But it wasn't! I'll go through what I think the warning signs were and how you (and I) can avoid making the same mistakes.

Warning sign 0: I hadn't talked to users before building software

When starting out with my VPN resale business, the first thing I did was build the website and set up payments with Stripe. That seems reasonable, doesn't it? I thought that people would love to buy a cheaper NordVPN plan. After all, NordVPN has lots of customers, so if I sell literally the same product but cheaper, I'll sell more. The only problem is getting the word out, right?

No, there's something missing from that plan: actually talking to potential users and seeing if they want it! After building the website, I did talk to several potential users and saw some gigantic red flags that I didn't recognise. Here are some people I talked to:

Myself! I already had a VPN: I'd bought something like a 5 year subscription to another provider. I didn't even want to buy my own product since I didn't need another VPN.
Friends: my friends with VPNs weren't interested since they...already had VPNs, and my friends without didn't care. But I pitched my idea to them and they said "That sounds great, the potential market is huge!" "NordVPN is great, I hear they make a lot of money!" but none of them bought anything from me.
People on Twitter: I ran a few ads on Twitter and some people retweeted, trying to alert NordVPN to the scam I was running, since it's obviously impossible that I could be selling for cheaper than buying directly. No-one bought anything, even after I contacted them and explained how I was doing it (NordVPN gives discounts to resellers who buy in bulk).
People on Reddit: I was banned from some subs for shilling my product. I tried to be subtle about it but I got caught! Whoops.

I never encountered anyone who actually wanted to buy the product, despite encountering many people who literally said to me that they "would" buy it. You should at least find one user before you build the product! And it is crucial that they agree to pay you, and then actually do pay you at some point, likely after you build the product.

I stuck with this VPN reselling idea because I enjoyed building it. I enjoyed building a web app, I enjoyed integrating with NordVPN's API, I enjoyed integrating with Stripe and PayPal. I should've tried to find users and got them to buy something before I spent all this time building software.

Building software felt productive, but didn't produce anything valuable. In fact, the value was negative since I spent weeks doing work that was ultimately just thrown away. If I had just spoken to people and really tried to find customers before building anything, I'd have avoided that wasted time.

Taking a day to speak to people and avoid two weeks of waste is an insanely high level of productivity. I knew talking to users was important, but I guess I didn't really get it until I felt that sinking feeling of despair when I realised I had wasted weeks of effort.

Is there a magic way I could've realised I was wasting my time? Yes. It turns out I just had to believe my own eyes.

Warning sign 1: I didn't believe my KPI

KPI means Key Performance Indicator.

For a startup where you're selling something (a service, a product), the only KPI that really makes sense is revenue. You can count users, hits, downloads, etc, all you want, but those don't mean anything unless they pay you. If they're not paying you, you aren't validating the core hypothesis which is that people will pay you for something!

You might think it's enough for people to say "I would pay for that" but it's not! They're lying! Read this book: http://momtestbook.com/. Your KPI is revenue, your experiments should be simple to evaluate: success means revenue goes up, failure means it doesn't. Vague non-committal compliments should not be taken as a good sign.

Yes, that means you are failing whenever your revenue isn't growing. It feels bad to realise this. It's easier to convince yourself that you built this thing, and that's progress. That's what I did: my eyes told me my KPI was zero, but I chose not to believe them. Instead, I was measuring my progress by the amount of work I had put into my minimum viable product (MVP). After all, your MVP needs to scale, and it needs to work for complete strangers.

Wrong! So wrong! Don't measure inputs (time, code), measure outputs (revenue)!

MVPs don't even have to involve code. I later realised that I could just ask friends and family to pay me for a VPN plan via a Stripe invoice, and I could send them the key. The first sale would've moved the needle and been real progress - it would've increased the KPI. After weeks of intensive coding, I had that epiphany and just started to try to sell by just speaking to people.

I asked several people to buy a plan and they all said no. Not in as many words: some people changed the topic, deflected, claimed they'd buy at an unspecified future date, and so on. These are all no. Rejection hurts so I moved on to finishing the site, and plastering Google, Reddit, and Twitter with ads. Thousands of people went to my site, hundreds went to the checkout page, no-one bought anything. Interaction with users led me to believe that everyone thought it was some kind of scam, or they weren't actually interested.

From another perspective, I ran thousands of experiments, all indicating my hypothesis that people would buy VPNs from some random site just because it was cheaper was false. I even tried planting trees with a portion of the revenue, but that didn't work either - that's just a nonsense gimmick or marketing ploy that doesn't really solve a valuable problem for anyone.

After revenue failed to budge, I eventually learnt the lesson that if my KPI was telling me my idea wasn't working, I should move on. Again, the reason it took me so long to realise this was that the novelty of running ads, building a website, building a checkout, and so on, hadn't worn off.

I just like building stuff and would do it for free. So that's what I ended up doing inadvertently.

This warning relates to market risk - the risk that the market won't need or buy your product. The product itself was simple enough, the MVP was trivial (speaking to people and getting them to buy something). Even the website wasn't that hard to build, although it wasn't minimal. I could easily build the product but no-one wanted it.

There's another kind of risk - the risk that one can't even deliver the product. That's what I learnt about in my next attempt.

Warning sign 2: High barriers to entry and product risk

I moved on. While using YCombinator's Startup School forum, I met a few possible co-founders, but only one had a problem he was solving which I personally had. I had this problem and never even considered that I could be the one to solve it. This sense that I could actually improve my own life in a valuable way was intoxicating. My problem was sending money to Africa cheaply and quickly.

I'm a dual citizen of the UK and Mauritius (a lovely little island in the Indian Ocean). Sending money via banks incurs foreign exchange fees and vaguely defined fees for whatever it is that banks do (it turns out they do a lot). Fintech startups like Wise didn't offer transfers to Mauritius, and the ones that did (e.g. Remitly) offered pretty bad FX rates. Money sent on Friday would arrive the next week despite the fact that domestic bank transfers did work on weekends.

This was a problem I had personally. It was a high value problem since a lot of people send a lot of money back home, and do it regularly. We spoke to over a hundred people (friends, family, and relatives) who not only wanted a solution, but were willing to trust us with their money if we could deliver. This is all great!

Next step: we spent a few months building the product. AAH NO! The KPI is still at zero, we're supposed to get people to pay for something. Why didn't we, despite the lesson I ostensibly learnt earlier?

One word: regulation.

Consumer fintech is extremely regulated, especially in the UK. It would have been literally illegal for us to start charging people and moving their money abroad without registering with the Financial Conduct Authority. To register, we had to prove that we had the ability to build and deliver the product. They weren't concerned about market risk, they were concerned about product risk.

We focused on the tech, building an Android app, a website, a backend integrating with Plaid and Visa, bank APIs, all kinds of stuff. It all looked good, it seemed technically possible to initiate a transfer and actually send money quickly, using something resembling a digital hawala system.

We realised too late, after months of work, that the FCA would never approve such a product. Not one built by us, a couple of guys with no experience in regulatory compliance and no idea what we actually had to do to comply with the regulations. We thought we knew what we had to do to comply, but as it turns out, we didn't.

What went wrong?

Talking to users felt good but didn't actually move our KPI. You can breathe only when that number goes up. Talking to users is a basic requirement that we fulfilled, but it's not enough.
Building the software felt good and we forgot about our KPI. Is it really this easy to build a working fintech service? It sure is, technology sure is great! We must be doing well, look at all of this code!
We never considered, not for a second, that we couldn't build the product. We knew we had the technical ability (and were right), but we had a big case of the Dunning-Kruger effect - being a computer whiz doesn't actually grant you magic powers to convince the regulator that you're totally not building a money laundering service.

It's tempting to blame the regulator, but there are reasons for these regulations. We were really killed by our lack of appreciation for product risk. We failed to seriously consider issues outside our expertise that meant we just couldn't deliver the product.

It's probably for the best. I once saw a video of Michael Seibel (co-founder of justin.tv, later called Twitch) and he said that once you mess up with people's money, that's it. You get the Wikipedia page with "your company messed up big time" and maybe even broke the law. Googling your name brings up the fact that you can't be trusted with customers' money. That's permanent, it doesn't go away, and is an outcome that we avoided at least.

Conclusion

There's a clear trajectory here and clear lessons learnt, so these experiences weren't all bad. The key points are startlingly simple, to the point that I feel like an idiot even writing this stuff down.

Here's the checklist:

Talk to users
Get them to pay you
Don't break the law

Here are some things that you shouldn't do unless ABSOLUTELY necessary:

Build software
Run ads
Spend money

Note that this stuff seems somewhat easier if you have a B2B startup, since the number of users to speak to is usually much smaller, and they are much more willing to pay you if your product is valuable to them. I haven't personally done that, maybe I'll try that at some point.

I console myself with something I saw in a lecture from Steve Jobs (I know, I know): you can't fake scar tissue. I've got a lot of very real (figurative) scar tissue now, and I got it all crawling through the razor-blade riddled crawlspace that is the real world.

Hunt people down and solve their problems!

Why doesn't GCC do this "easy" NRVO optimization?

2022-01-25T00:00:00Z

Since C++17, we C++ programmers have rejoiced in the fact that when we return something from a function, the standard now guarantees that the value won't be copied. This is known as return value optimization (RVO) or copy/move elision, and happens in cases like this:

MyType myfunc() {
    // ...
    return MyType{arg1, arg2};
}

That is, MyType is constructed once and never copied or moved.

But some compilers still don't perform RVO in some cases. It turns out this is because RVO refers only to when you return unnamed values. Named RVO is apparently not considered RVO by the standard's definition. Named means something like:

    MyType x{};
    x.do_something();
    return x;

And gcc (11.2) doesn't always perform NRVO, even if it "obviously" can. Why? Do other compilers do better? I tried to find out.

An example that works

Let's look at a code sample to see what the problem isn't, i.e. an example where NRVO does happen. We have this code:

void doSomething(char*);

struct MyType {
    char buffer[100000];
};

MyType wantNrvo() {
    MyType x;
    x.buffer[0] = '\0';
    return x;
}

int main() {
    auto x = wantNrvo();
    doSomething(x.buffer);
}

We are interested in what the wantNrvo function compiles down to with -O3, since we want gcc to give it its best shot. The doSomething function is there to stop x being optimised away. We could have instead added volatile to x, but I hardly ever see that in real code - that would just confuse us. A function call is more realistic.

Anyway, the generated assembly is:

wantNrvo():
        mov     BYTE PTR [rdi], 0
        mov     rax, rdi
        ret
main:
        sub     rsp, 100008
        mov     rdi, rsp
        mov     BYTE PTR [rsp], 0
        call    doSomething(char*)
        xor     eax, eax
        add     rsp, 100008
        ret

We can see there's obviously no copying of the huge 100,000-byte buffer there, since there's no memcpy (or equivalent) anywhere. That's what we're looking for.

It's interesting to note that NRVO means the caller has to make some space for the object to be returned on the stack before the function is even run, then the object is built in the space given.

The function is being passed the location to build the object as rdi. Thus, RVO can be viewed as a rewriting of our function to take a pointer to a block of memory big enough for MyType, and using placement new.

That might look like this:

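#include <new> // for placement new

// Sketch only: the caller provides suitably aligned storage for one MyType.
void wantNrvo(char *memory) {
    MyType *x = new(memory) MyType{};
    x->buffer[0] = '\0';
}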

The point is that there is one single block of memory where the return value is constructed, and this is allocated by the caller. That seems fine: the caller knows the size of the block and can allocate it - what could go wrong?

What goes wrong

Sometimes it's hard to tell which object out of multiple objects is going to be returned, meaning we don't know which object to construct in the return value area.

That is kind of non-obvious, so here's an example:

MyType wantNrvo(bool test) {
    MyType x;
    MyType y;
    x.buffer[0] = '\0';
    return test ? x : y;
}

There is obviously no way to know at compile time which of x or y we should construct in the return value area. Using our knowledge of this program, you might think: hey, why don't we just check test and decide which of x and y to construct in the return value? Alas, gcc, even with -O3, isn't that smart. This is the machine code produced:

wantNrvo(bool):
        sub     rsp, 200008
        mov     r9d, esi
        mov     edx, 100000
        lea     rax, [rsp+100000]
        test    r9b, r9b
        mov     rsi, rsp
        mov     BYTE PTR [rsp], 0
        cmove   rsi, rax
        call    memcpy
        add     rsp, 200008
        ret

It's there, plain as day: memcpy! The huge buffer is being copied!

It gets worse!

There are other, even more obvious cases where gcc fails. Maybe the problem is that we're constructing two objects, then picking one to return. Maybe that's confusing for some reason. We can attempt to rewrite the code:

MyType wantNrvo(bool test) {
    if (test) {
        MyType x;
        x.buffer[0] = '\0';
        return x;
    } else {
        MyType y;
        return y;
    }
}

This code is starting to get a bit contrived. But let's see what that compiles to:

wantNrvo(bool) [clone .part.0]:
        sub     rsp, 100008
        mov     edx, 100000
        mov     rsi, rsp
        call    memcpy
        add     rsp, 100008
        ret
wantNrvo(bool):
        push    r12
        mov     r12, rdi
        test    sil, sil
        je      .L5
        mov     rax, r12
        mov     BYTE PTR [rdi], 0
        pop     r12
        ret
.L5:
        call    wantNrvo(bool) [clone .part.0]
        mov     rax, r12
        pop     r12
        ret

Now this is bizarre! There are two branches of this function:

The test = true branch which uses NRVO and writes directly to the block of memory pointed to by rdi.
The test = false branch which allocates new memory on the stack, and copies the new, just-allocated memory into the area pointed to by rdi with memcpy.

Surely this is just optimized badly because we don't actually call it, right? Don't be so sure. When we call it, it gets inlined but still uses memcpy:

int main() {
    auto x = wantNrvo(false);
    doSomething(x.buffer);
}

main:
        sub     rsp, 200008
        mov     edx, 100000
        lea     rsi, [rsp+100000]
        mov     rdi, rsp
        call    memcpy
        mov     rdi, rsp
        call    doSomething(char*)
        xor     eax, eax
        add     rsp, 200008
        ret

Why, God? Why?

I am not a compiler author. I have never written a compiler and am much dumber than those behind gcc's NRVO.

But NRVO is pretty important, and I expect it sometimes! That's backed up by this comment on this gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51571

I would like to strongly oppose the notion that this "just a missed optimisation and not very critical, really".

NRVO is not just "an optimisation". It's actually one that is explcitly permitted to change observable behaviour of the program and it's extremely powerful.

And it it required for performant C++. Just try to return a std::vector by value to see the importance of this optimisation. This is not missed optimisation. This is premature pessimisation.

You could just as well stop all optimisation work for the C++ frontend until this is implemented, because any other optimisation effords are dwarfed by the overhead when NRVO is expected by the developer but not applied.

Please make this a top priority. Every C++ program will benefit both in text size and in runtime performance - dramatically.

-- Marc Mutz

But then again, there are often hyperbolic, borderline rude comments left on projects saying "make this a top priority", so who knows?

Let's see clang's business card

Let's see clang's machine code, does it do any better? Yes!

(using clang 13.0.0)

wantNrvo(bool):                           # @wantNrvo(bool)
        mov     rax, rdi
        test    esi, esi
        je      .LBB0_2
        mov     byte ptr [rax], 0
.LBB0_2:
        ret

This is exactly the perfect machine code we want. Give us some memory in rdi, collapse those branches down to writing or not writing a nul byte to it. Easy, no memcpy!

Lessons to learn

Although you might see some text on cppreference.com saying:

Return value optimization is mandatory and no longer considered as copy elision; see above. (since C++17)

-- https://en.cppreference.com/w/cpp/language/copy_elision

This doesn't refer to NRVO! Assuming that NRVO will happen can be very dangerous, resulting in copies of huge objects, and that hurts performance a lot.

Try as hard as possible to return prvalues to take advantage of C++17's guaranteed RVO.

It's a dog-eat-RVO world out there. Be careful.

Representing complex numbers exactly on a computer

2020-06-03T00:00:00Z

There are a lot of ways to represent numerical values on a computer: you've got the various fixed-size integer types and floating point types, but you also have arbitrary precision arithmetic, where the size of the number is limited only by the memory of the machine.

To represent the real numbers, $\mathbb{R}$, programmers often choose floating point numbers. But of course, floating point is terrible: addition and multiplication are commutative but not associative, most numbers don't have exact reciprocals, and so on. Useless for exact arithmetic of the kind you need when doing algebra.
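To make the associativity complaint concrete, here is a minimal sketch in Python (my choice for illustration, using only the standard `fractions` module). It regroups the same three-term sum with floats and with exact rationals.

```python
from fractions import Fraction

# Floating point: regrouping the same sum changes the answer.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
print(float_left == float_right)  # False

# Exact rationals: regrouping never matters.
a, b, c = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)
exact_left = (a + b) + c
exact_right = a + (b + c)
print(exact_left == exact_right)  # True
```

Rationals alone don't give you roots of unity, of course, which is where the rest of this post comes in.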

This blog post is about a way to represent cyclotomic fields, which provide (among other things):

All rational numbers
$\sqrt{m}$ for any integer $m$
All of the field axioms: multiplication and addition with proper associativity, commutativity, and invertibility
Enough numbers to represent any finite group

They're still a bit tricky to represent on a computer though, as you'll see.

What is a cyclotomic field?

I refer to $K$ as a cyclotomic field if there exists some $n > 0$ such that $K = \mathbb{Q}(\zeta_n)$. That is, $K$ is the rational numbers with one primitive $n$th root of unity added. Essentially, that means that we have $\mathbb{Q}$ plus all solutions to the equation $x^n - 1 = 0$, and we take the closure under all field operations (addition, multiplication, inversion).

Why is this a field? The proof of this requires a bit of mathematical knowledge, but it's because our definition of $K$ was a bit non rigorous. In fact, $K$ is the splitting field of $x^n - 1$ over $\mathbb{Q}$. There's a decent amount of Galois theory machinery you need to build up before it becomes clear why $K$ should exist, but essentially you just factor $x^n - 1$ into irreducible factors and, at each stage, add a root to $K_i$ by constructing $K_{i+1} = K_i[x]/(f_i(x))$. The details aren't as important as the result, which is that $K$ exists and is a field.

At this point, we can prove that we have all of the field axioms. Great! We also get the square root of every rational in some cyclotomic field by the Kronecker-Weber theorem (non-trivial to prove, but a nice result). What do I mean by "we have enough numbers to represent any finite group"?

If you read my other blog post or know a thing or two about representation theory, you'll know what a representation is. The cool thing is that you can represent any finite group with coefficients in some cyclotomic field $K$.

More specifically, if $G$ is a finite group whose elements have orders with least common multiple $m$, then if $K$ contains the $m$th roots of unity, we can realise all representations of $G$ using coefficients from $K$. This is a theorem due to Brauer which I saw in a textbook on representation theory by Serre.

How do you represent one in a computer?

The answer to this is simple, you just represent it the same way as you do on paper - as a sum of powers of $\zeta_n$ with coefficients from $\mathbb{Q}$. So if we have $z \in K$, we can represent an element as:

$$z = \sum_{i=0}^{n-1} q_i \zeta_n^i$$

where $q_i \in \mathbb{Q}$. So we can just store a vector of the coefficients $q_i$, right?

WRONG! If you did this, the representations of elements would not be unique, so you wouldn't be able to check equality. The problem is that $\{ \zeta_n^i : 0 \leq i < n \}$ isn't a basis for $K$ as a $\mathbb{Q}$-vector space. This is obvious if you factor the polynomial defining $K$ a bit:

$$\frac{x^n - 1}{x-1} = 1 + x + \ldots + x^{n-1}$$

So we have a linear dependency: $-1 = x + \ldots + x^{n-1}$ when $x = \zeta_n$. We can try to fix this by seeing if $\{ \zeta_n^i : 0 < i < n \}$ is a basis. Since we got rid of the constant term, that's it, this is a basis now, right?

WRONG! Well, wrong in all except one case. If $n=p$ prime, then this actually is a basis. This is because all of the roots are primitive, $p$ is coprime to all smaller numbers (except 1), and thus $K/\mathbb{Q}$ is a degree $p-1$ extension i.e. a dimension $p-1$ vector space over $\mathbb{Q}$. That's not exactly rigorous, but you get the idea.

From this point, we assume $n=p$ prime, so this basis is actually a basis. Otherwise we have to choose a better basis and things get complicated.

Operating in the field

It's clear how to compute $z + z'$ and $-z$: you just add or negate the lists of coefficients in $\mathbb{Q}$. $zz'$ isn't much more difficult, you just write out the sums and multiply them:

$$ \begin{align} zz' &= \left(\sum_{i=1}^{p-1} q_i \zeta_p^i\right)\left(\sum_{i=1}^{p-1} q_i' \zeta_p^i\right) \\ &= \sum_{i, j} q_i q_j' \zeta_p^{i + j} \\ \end{align} $$

After computing this, you just reduce all of the $i + j$ modulo $p$, and get rid of any constant terms produced using the $-1 = x + \ldots + x^{p-1}$ relation.
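Here is a minimal sketch of that multiplication in Python, assuming $p$ is prime. The representation is my own choice for illustration: an element is a list of $p-1$ `Fraction`s, where entry `k` is the coefficient of $\zeta_p^{k+1}$. The names `mul` and `from_rational` are hypothetical helpers, not taken from any library.

```python
from fractions import Fraction

def mul(z, w, p):
    """Multiply two elements of Q(zeta_p) given in the basis zeta^1..zeta^(p-1)."""
    # Accumulate coefficients of zeta^0 .. zeta^(p-1), exponents reduced mod p.
    acc = [Fraction(0)] * p
    for i, qi in enumerate(z, start=1):
        for j, qj in enumerate(w, start=1):
            acc[(i + j) % p] += qi * qj
    # Remove the constant term using 1 = -(zeta + ... + zeta^(p-1)).
    constant = acc[0]
    return [acc[k] - constant for k in range(1, p)]

def from_rational(q, p):
    """Write a rational q in the same basis: q = -q * (zeta + ... + zeta^(p-1))."""
    return [-Fraction(q)] * (p - 1)

p = 5
zeta = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
zeta4 = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]
one = mul(zeta, zeta4, p)          # zeta * zeta^4 = zeta^5 = 1
z = [Fraction(1), Fraction(0), Fraction(0), Fraction(1)]
z_squared = mul(z, z, p)           # (zeta + zeta^4)^2 = 2 + zeta^2 + zeta^3
```

Note that the rational number 1 shows up as the list `[-1, -1, -1, -1]`, which is the price of dropping the constant term from the basis.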

The only kind of tricky operation is multiplicative inversion, that is, computing $1/z$. At this point we must rely on some Galois theory. The Galois group $G = \text{Gal}(K/\mathbb{Q})$ is, in this case, the group $\mathbb{Z}_p^\times$ of multiplicative units of the integers modulo $p$, which is cyclic and so isomorphic to the additive group of the integers modulo $p-1$.

We can determine a field automorphism by saying where $\zeta_p$ goes, and it can go to any other nontrivial root. Since $p$ is prime, all of the maps given by $\zeta_p \mapsto \zeta_p^i$ for $1 \leq i \leq p-1$ are invertible field automorphisms. $G$ is generated by $\zeta_p \mapsto \zeta_p^g$ whenever $g$ is a primitive root modulo $p$, so let's pick $f \in G$ to be such a map. Taking $g = 2$ works for $p = 3, 5, 11, 13$, but not for $p = 7$, where $2$ only has order $3$.

If we just multiply all elements in the $G$-orbit of $z$, then we get something that's fixed by all of $G$, so must live in $\mathbb{Q}$. Let $g \in G$, then:

$$g\left(\prod_{h \in G}h(z)\right) = \prod_{h \in G}gh(z) = \prod_{h \in G}h(z)$$

where the last equality is just a relabelling of the elements of $G$.

Thus $\prod_{i=0}^{p-2}f^i(z) = q \in \mathbb{Q}$. And now we have a formula for the inverse of $z$, since:

$$q^{-1}\prod_{i=0}^{p-2}f^i(z) = q^{-1}q = 1$$

But also:

$$q^{-1}\prod_{i=0}^{p-2}f^i(z) = z\left(q^{-1}\prod_{i=1}^{p-2}f^i(z)\right)$$

So finally, in terms of only taking reciprocals of rationals, multiplication, and addition:

$$z^{-1} = q^{-1}\prod_{i=1}^{p-2}f^i(z)$$

Nice! Now we have all of the field axioms.
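The inversion formula can be sketched in Python as follows, again assuming $p$ is prime and using my own illustrative representation (a list of $p-1$ `Fraction`s, entry `k` being the coefficient of $\zeta_p^{k+1}$). Rather than iterating powers of a single generator $f$, this sketch applies each automorphism $\zeta_p \mapsto \zeta_p^k$ for $k = 2, \ldots, p-1$ once, which is the same product whenever $f$ generates $G$. All function names are hypothetical.

```python
from fractions import Fraction

def mul(z, w, p):
    """Multiply in Q(zeta_p), reducing exponents mod p and removing constants."""
    acc = [Fraction(0)] * p
    for i, qi in enumerate(z, start=1):
        for j, qj in enumerate(w, start=1):
            acc[(i + j) % p] += qi * qj
    return [acc[k] - acc[0] for k in range(1, p)]

def conjugate(z, k, p):
    """Apply the automorphism zeta -> zeta^k (1 <= k < p) to z."""
    out = [Fraction(0)] * (p - 1)
    for i, qi in enumerate(z, start=1):
        out[(i * k) % p - 1] = qi      # zeta^i goes to zeta^(i*k mod p)
    return out

def inverse(z, p):
    """Invert a nonzero z as q^-1 times the product of its other conjugates."""
    others = [-Fraction(1)] * (p - 1)  # the number 1 in this basis
    for k in range(2, p):
        others = mul(others, conjugate(z, k, p), p)
    norm = mul(z, others, p)           # rational, so every entry equals -q
    q = -norm[0]
    return [c / q for c in others]

p = 7
z = [Fraction(1), Fraction(2), Fraction(0), Fraction(0), Fraction(0), Fraction(0)]
z_inv = inverse(z, p)
product = mul(z, z_inv, p)             # should be the number 1
```

Only rational reciprocals are ever taken, as promised: the single division is by `q`.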

Other things to explore

How do we represent the list of coefficients? Sparsely with a hash map, densely with an array, maybe sorted with respect to the exponents using a heap?

And how can this work for non-prime $n$? Like I said earlier, the key is to pick a good basis. The obvious candidate is $\{ \zeta_n^i : 0 < i < n, \text{gcd}(n, i) = 1 \}$, which has the right number of elements but is only linearly independent when $n$ is squarefree (for $n = 4$ it is $\{i, -i\}$). There is a much nicer basis we can choose, see if you can work it out. I definitely couldn't, but I'll write another blog post about it at some point.

There is a lot of existing work in this field (haha) out there, but particular thanks go to the authors of the GAP computer algebra system for providing an open source implementation of cyclotomic fields.

Writing a parser for a function call is surprisingly hard

2020-05-17T00:00:00Z

I was recently trying to get to grips with the GAP programming language. For those not familiar, it's the programming language for the GAP computational algebra system. It has tons of algorithms implemented for group theory, representation theory, algebraic number theory, and so on. I was thinking about implementing a TypeScript-style transpiler so I could program with some types, and the first step is to parse the syntax.

To get the most elegant parser, I went for a parser written in Haskell using Parsec, a parser combinator library that behaves like an LL(1) parser unless you explicitly ask for backtracking.

The first problem I ran into was that GAP supports several function call syntaxes:

f(x, y, z); # positional
g(p := 1, q := 2); # named-ish parameters
h(x, y, z : p := 1); # mixed

This is surprisingly non-trivial to parse in general! The path is fraught with infinite recursion, ambiguities, and backtracking.

First problem: infinite recursion

First, we need to understand how Parsec works: it's an LL(1) parser. That means it can be used to parse (some) context-free languages, and it tries to perform a leftmost derivation with 1 token of lookahead. It's easier to explain this with an example. I'll use the notation of production rules for formal languages.

We want to say that a function call is some expression, with a parenthesis delimited argument list which is how we tell we're calling something. So we have things like f(x) but also f(x)(y) which is calling the result of f(x) with the argument y. Using start symbol $E$ (for "expression"), the function call grammar looks like:

$$ \begin{align} E &\to E(A) \\ E &\to L \\ A &\to E, A \\ A &\to E \end{align} $$

where $L$ can be any literal, and $A$ represents an argument list.

We'd write this in Parsec as something like:

data Expr = Variable String | FuncCall Expr [Expr]

expression = functionCall <|> (Variable <$> identifier)
functionCall = FuncCall <$> expression <*> (parens $ commaSep expression)

where we're parsing a string into the data type Expr.

The parameters to FuncCall are: the expression we're calling, and the argument list. This is a cut down example, so I say we only have variables as the literals.

The problem comes when we try to run this on a string:

parse expression "" "f(x)"

This will just hang at 100% CPU usage. Why? Because the definition of an expression is immediately left-recursive. Parsec is LL(1) so tries to recurse as far as possible on the left when parsing, and we can just keep going: try to parse an expression, the first step is to check if it's a function call. To check if it's a function call, we need to parse an expression, the first step to that is check if it's a function call, and so on.

What's the solution? We can cleverly transform the grammar to avoid the immediate left recursion.

When we actually see a function expression, what do we see? A non-function call expression on the far left, then a lot of argument lists. As production rules:

$$ \begin{align} E &\to LM \\ E &\to L \\ M &\to (A)M \\ M &\to (A) \end{align} $$

where $A$ is the same as before. In Parsec:

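-- try lets a bare identifier fall through to the second alternative
-- foldl' comes from Data.List
expression = try functionCall <|> (Variable <$> identifier)
functionCall = foldl' FuncCall <$> (Variable <$> identifier) <*> many1 argList
  where argList = parens $ commaSep expression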

Some explanation: if we have a string "f(x)(y)" this is parsed as:

Variable "f" when we see the f
FuncCall (Variable "f") [Variable "x"] when we see the argument list (x).
FuncCall (FuncCall (Variable "f") [Variable "x"]) [Variable "y"] when we see the argument list (y).

Perfect! This avoids the infinite recursion. Next problem!
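The same idea can be sketched outside Parsec. Below is a small hand-written parser in Python (my choice for illustration; the tokenizer, tuple-based tree shape, and function names are all hypothetical). The loop in `parse_expression` plays the role of `many1 argList` followed by the left fold, although unlike `many1` it also accepts zero argument lists. Error handling is deliberately minimal.

```python
import re

def tokenize(text):
    # Identifiers, parentheses and commas; whitespace is skipped.
    return re.findall(r"[A-Za-z_]\w*|[(),]", text)

def parse_expression(tokens, pos=0):
    """Parse a name followed by zero or more argument lists.

    Returns (tree, next_pos). Trees are ("var", name) or
    ("call", callee_tree, [argument_trees]).
    """
    tree = ("var", tokens[pos])
    pos += 1
    # Loop instead of recursing on the left: each argument list wraps
    # the tree built so far, like folding FuncCall from the left.
    while pos < len(tokens) and tokens[pos] == "(":
        args, pos = parse_arg_list(tokens, pos)
        tree = ("call", tree, args)
    return tree, pos

def parse_arg_list(tokens, pos):
    """Parse "(" [expr {"," expr}] ")" starting at tokens[pos] == "("."""
    pos += 1  # skip "("
    args = []
    if tokens[pos] != ")":
        while True:
            arg, pos = parse_expression(tokens, pos)
            args.append(arg)
            if tokens[pos] != ",":
                break
            pos += 1  # skip ","
    if tokens[pos] != ")":
        raise SyntaxError("expected ')'")
    return args, pos + 1

tree, _ = parse_expression(tokenize("f(x)(y)"))
print(tree)
```

The recursion that remains (arguments are themselves expressions) always consumes a token first, so it can't loop forever.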

Horrific named argument grammar

To implement the keyword arguments, we need to make a few changes to the argList parser. The obvious thing to do is just implement the alternatives in a big list:

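-- needs: import Control.Monad (unless, void)
argList :: Parser ([Expr], [(String, Expr)])
argList = parens $ do
    args <- commaSep expression
    -- a colon separates the positional arguments from the named ones
    unless (null args) (void colon)
    named <- commaSep namedArg
    return (args, named)
  where namedArg = (,) <$> identifier <* reservedOp ":=" <*> expression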

where argList now returns a tuple of the non-named arguments and the named arguments.

But this doesn't work! How do you parse the following argument list:

(x := 1)

The "correct" parser would see this as a single named argument, but our parser currently sees this as a variable x, then the colon separating the non-named from named arguments, then an =. That's a syntax error - the parser fails!

The solution is to try the cases in order, explicitly.

argList :: Parser ([Expr], [(String, Expr)])
argList = parens $ try justNamed <|> try both <|> justNotNamed
  where
    justNamed = (,) [] <$> commaSep1 namedArg
    namedArg = (,) <$> identifier <* reservedOp ":=" <*> expression
    justNotNamed = (,) <$> commaSep expression <*> pure []
    both = do
      notNamed <- justNotNamed
      colon
      named <- justNamed
      return (fst notNamed, snd named)

Ah, there we go.
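For readers who don't speak Parsec, here is a rough Python sketch of the same "try the cases in order, rewinding on failure" strategy. Everything in it is an assumption made for illustration: the tokenizer, the `ParseError` class, and the simplification that an argument is a single identifier or number rather than a full expression.

```python
import re

class ParseError(Exception):
    pass

def tokenize(text):
    # ":=" is listed before ":" so it is never split in two.
    return re.findall(r":=|[A-Za-z_]\w*|\d+|[(),:]", text)

def expect(tokens, pos, token):
    if pos >= len(tokens) or tokens[pos] != token:
        raise ParseError(f"expected {token!r} at token {pos}")
    return pos + 1

def atom(tokens, pos):
    # Stand-in for `expression`: a single identifier or number.
    if pos >= len(tokens) or not re.fullmatch(r"\w+", tokens[pos]):
        raise ParseError(f"expected a value at token {pos}")
    return tokens[pos], pos + 1

def named_arg(tokens, pos):
    name, pos = atom(tokens, pos)
    pos = expect(tokens, pos, ":=")
    value, pos = atom(tokens, pos)
    return (name, value), pos

def comma_sep1(item, tokens, pos):
    first, pos = item(tokens, pos)
    items = [first]
    while pos < len(tokens) and tokens[pos] == ",":
        nxt, pos = item(tokens, pos + 1)
        items.append(nxt)
    return items, pos

def just_named(tokens, pos):
    named, pos = comma_sep1(named_arg, tokens, pos)
    return ([], named), pos

def just_not_named(tokens, pos):
    if pos < len(tokens) and tokens[pos] == ")":
        return ([], []), pos  # empty argument list
    args, pos = comma_sep1(atom, tokens, pos)
    return (args, []), pos

def both(tokens, pos):
    (args, _), pos = just_not_named(tokens, pos)
    pos = expect(tokens, pos, ":")
    (_, named), pos = just_named(tokens, pos)
    return (args, named), pos

def arg_list(text):
    tokens = tokenize(text)
    start = expect(tokens, 0, "(")
    for alternative in (just_named, both, just_not_named):
        try:
            # Every alternative restarts at `start`: that is the backtracking.
            result, pos = alternative(tokens, start)
            break
        except ParseError:
            continue
    else:
        raise ParseError("no alternative matched")
    expect(tokens, pos, ")")
    return result

print(arg_list("(x := 1)"))
print(arg_list("(x, y, z : p := 1)"))
```

As in the Parsec version, the order matters: the named-only case has to be attempted before anything that would happily consume the leading identifier as a positional argument.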

Conclusion

Parsing is non-trivial, surprisingly. To see my code in action, check out the full source of my parser for GAP (not yet fully tested).

Please don't complain that the code in this post doesn't work, look at that repo for the real working source!

Decomposing representations: a slice of computational group theory

2020-04-19T00:00:00Z

For my Master's degree, I (helped greatly by my supervisor) implemented some algorithms and even invented some new algorithms to decompose representations of finite groups. I wrote an extremely long (well, relative to other things I've written) and technical thesis about this, but I find myself increasingly unable to understand what any of it means or why I even have a degree.

I thought being forced into a short-form blog post would help me remember whatever it is I spent a few years studying to do. There are some foundational questions:

What is a group?
What is a representation?
What is a decomposition of a representation?

And some more interesting questions, involving some computational tricks relevant to a wider audience:

Why is this useful?
How do you get a computer to do it?
How do you get a computer to do it, quickly?

These are the questions I'll attempt to answer in this blog post. It'll be fun!

What is a group?

Any undergraduate mathematician should know this, but if you don't do algebra and need a refresher, read the Wikipedia page. For non-mathematicians, a group is:

A set $G$ (for example, the real numbers or the rational numbers)
With a binary operation $\ast$ (like addition or multiplication) which takes two elements of $G$ and gives you another element of $G$
Such that $\ast$ is associative, meaning bracketing doesn't matter - $(a \ast b) \ast c = a \ast (b \ast c)$. True for multiplication or addition, but NOT for subtraction
Such that $G$ has an identity element with respect to $\ast$ - an element $e$ such that applying it with $\ast$ does nothing, like multiplying by 1 or adding 0
Such that elements of $G$ all have inverses with respect to $\ast$, meaning any element $g \in G$ has another element $g^{-1} \in G$ such that combining them with $\ast$ gives the identity. For addition, the inverse of $x$ is $-x$. For multiplication, it's $\frac1{x}$.

Examples of groups include the real numbers $\mathbb{R}$ with addition and the non-zero rational numbers $\mathbb{Q}^\ast$ with multiplication.

There are also finite examples, like the set $\{+1, -1\}$ with multiplication, and the set of permutations of $n$ elements with composition - $S_n$, the symmetric group on $n$ elements.
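For a finite group you can simply check the axioms by brute force. Here is a minimal Python sketch for $S_3$ (Python is my choice for illustration; a permutation is stored as a tuple `p` where `p[i]` is the image of `i`, and `compose` is a hypothetical helper).

```python
from itertools import permutations

# Elements of S_3: tuples p where p[i] is the image of i.
elements = list(permutations(range(3)))
identity = tuple(range(3))

def compose(p, q):
    """The permutation "apply q, then p"."""
    return tuple(p[q[i]] for i in range(len(q)))

closed = all(compose(a, b) in elements for a in elements for b in elements)
associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a in elements for b in elements for c in elements
)
has_identity = all(
    compose(identity, a) == a == compose(a, identity) for a in elements
)
has_inverses = all(
    any(compose(a, b) == identity == compose(b, a) for b in elements)
    for a in elements
)

print(closed, associative, has_identity, has_inverses)  # True True True True
```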

What is a representation?

A representation $\rho : G \to \text{GL}(V)$ of a group $G$ is a homomorphism (a function respecting the group structure) from $G$ to the group of linear automorphisms of a vector space $V$. You can imagine "vector space" to mean $\mathbb{C}^n$ (the space of vectors with $n$ entries in the complex numbers), since that's usually what we're using.

"Linear automorphism" is a coordinate-independent way of saying "invertible matrix". So you can rephrase the definition of a representation as a map from a group $G$ to $n \times n$ matrices (the $n$ is fixed for all elements), such that the group structure is respected.

Sue me if you don't like it, but you'll never get your vector space into a computer without picking a basis, meaning you are forced to think of linear maps as matrices.
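As a concrete example, the permutation matrices give a representation of $S_3$ on a 3-dimensional space. The Python sketch below (illustrative only, with hypothetical helper names) checks the homomorphism property $\rho(gh) = \rho(g)\rho(h)$ for every pair of elements.

```python
from itertools import permutations

def compose(p, q):
    """The permutation "apply q, then p"."""
    return tuple(p[q[i]] for i in range(len(q)))

def perm_matrix(p):
    """Matrix sending basis vector e_i to e_{p[i]}."""
    n = len(p)
    return [[1 if p[col] == row else 0 for col in range(n)] for row in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

group = list(permutations(range(3)))
is_homomorphism = all(
    perm_matrix(compose(g, h)) == matmul(perm_matrix(g), perm_matrix(h))
    for g in group for h in group
)
print(is_homomorphism)  # True
```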

What is a decomposition of a representation?

An irreducible representation is a representation that doesn't have any subrepresentations other than zero and itself. What does that mean? We could go through the definitions, but for complex representations of finite groups, irreducible means indecomposable: you can't write the representation as a direct sum of other representations. That is the same thing as saying you can't simultaneously block diagonalise all $\rho(g)$ for $g \in G$ such that the blocks are smaller than the whole matrix.

Now that the basics are out of the way, we can ask an interesting question:

Why is this useful?

Irreducible representations come up in all sorts of places:

Solving problems in quantum chemistry involving symmetries (oh, hello $S_n$)
Solving completely computationally intractable optimization problems by spotting interesting symmetries ($S_n$ again?), then reducing them
Turning groups into linear algebra, then studying them through how their representations behave (this is just a vague statement encompassing all of representation theory)

Anything with symmetries probably has something to do with groups and thus, something to do with representations.

I keep mentioning $S_n$, but the people who paid attention in algebra might note that it is kind of misleading to talk about it as if it is special. In fact, all finite groups appear as a subgroup of some symmetric group due to Cayley's theorem. The proof is actually fairly simple (for such a deep-seeming result) and I encourage you to look at it. So really, saying $S_n$ pops up everywhere in group theory is the same thing as saying groups pop up everywhere in group theory - not surprising.

Sadly, physicists are interested in infinite groups and in particular, Lie groups. You might have heard of the Lorentz group $\text{O}(1,3)$, a certain special unitary group $\text{SU}(2)$, or some other Lie groups - these pop up all the time in physics.

I say "sadly" because my algorithms are not directly applicable to those cases, since they're too continuous to fit inside my poor computer. It's great that there is such strong motivation to study Lie groups coming from physics, but my work isn't really relevant there.

How do you get a computer to compute a decomposition?

The first question to ask is: how do you do it by hand?

Let's say we have a finite group $G$ and a representation $\rho : G \to \text{GL}(V)$ where $V$ is a finite dimensional complex vector space (I'm skipping over something very important here, see if you can guess what it is, or skip to the end for the answer).

If you already have the complete list of irreducible representations of $G$ (up to isomorphism), $(\rho_i)$, you can just combine all $\rho_i$ as blocks in all possible ways that give matrices of the correct size. That is, you try to find some $i_j$ such that:

$$\rho(g) = \begin{pmatrix} \rho_{i_1}(g) & & \\ & \ddots & \\ & & \rho_{i_m}(g) \end{pmatrix}$$

Where anything not labelled above is zero. This is easy for some small groups. For example, if you have the representation of $\{\pm 1\}$ given by:

$$\pm 1 \mapsto \begin{pmatrix} 1 & 0 \\ 0 & \pm 1 \end{pmatrix}$$

It's easy to spot this is given by two $1 \times 1$ blocks. But that's because we already know all of the irreducible representations (let's shorten that to irrep, as is standard), there are only two. Even better, the matrices are already given in the nicest basis possible. It's not really obvious how to "observe" the structure of a representation like this in general.

Here's the dirty trick. Define $\chi_\rho(g) := \text{trace}(\rho(g))$. $\chi_\rho$ is called the character of $\rho$. It has the useful property that for a finite $G$, a finite dimensional representation (meaning the matrices are finite size), and coefficients in $\mathbb{C}$: $\rho$ is isomorphic to $\tau$ (as representations) if and only if their characters are the same.

This gives rise to this simple algorithm:

List all irreps of $G$ using e.g. Dixon's algorithm (IrreducibleRepresentationsDixon in GAP)
Put them together in all possible ways adding up to the correct matrix dimension.
Compute all of the characters - there will be exactly one matching the character of $\rho$, that's your decomposition!

This involves a lot of listing and trial and error - it is highly inefficient and intractable for even small groups and small representations.
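To see the trial-and-error algorithm in action on a tiny case, here is a Python sketch that decomposes the 3-dimensional permutation representation of $S_3$. The three irreducible characters of $S_3$ are hardcoded as an assumption rather than computed with Dixon's algorithm, and all names are hypothetical.

```python
from itertools import combinations_with_replacement, permutations

group = list(permutations(range(3)))

def fixed_points(p):
    """Trace of the permutation matrix of p."""
    return sum(1 for i, image in enumerate(p) if i == image)

def sign(p):
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1

# name -> (dimension, character); hardcoded for S_3.
irreps = {
    "trivial": (1, lambda p: 1),
    "sign": (1, sign),
    "standard": (2, lambda p: fixed_points(p) - 1),
}

def decompose(dimension, character):
    """Try every multiset of irreps whose dimensions add up correctly."""
    names = sorted(irreps)
    matches = []
    for count in range(1, dimension + 1):
        for combo in combinations_with_replacement(names, count):
            if sum(irreps[name][0] for name in combo) != dimension:
                continue
            if all(
                sum(irreps[name][1](g) for name in combo) == character(g)
                for g in group
            ):
                matches.append(combo)
    return matches

print(decompose(3, fixed_points))  # [('standard', 'trivial')]
```

Even here the loop considers six candidate combinations for a 3-dimensional representation, and the count grows quickly with the dimension and the number of irreps.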

Even ignoring efficiency, this tells you the decomposition up to isomorphism. Telling you the basis change from $\rho$ to actually get the smallest possible blocks is another problem. There's one algorithm due to Serre to do that, with a full proof of correctness in his textbook on Linear Representations of Finite Groups.

How do you get a computer to compute it, quickly?

I (and my supervisor) came up with a bag of tricks to speed up the computation by taking advantage of some extra information that's usually available.

Trading space for speed

$G$ is always isomorphic to a permutation group, but in the real world, $G$ is usually actually already a permutation group. This lets us specialise our algorithms using the vast array of tools available for computing with permutation groups. The main trick is that a good base and strong generating set can be computed with the Schreier-Sims algorithm, letting us sum over $G$ quickly (computing $\sum_g f(g)$ for certain $f$). This lets us compute some of Serre's formulas much, much faster.

Let's say we have two representations $\rho$ and $\tau$ as earlier, which are isomorphic but we don't know the basis change (the isomorphism). $G$ is a permutation group, so summing over it is fast. But how can we reduce finding an isomorphism to a sum over $G$? It's quite a neat trick.

We're looking for a matrix $A$ such that $A^{-1} \tau(g) A = \rho(g)$ for all $g \in G$. The "dumb" way would be to forget the representation structure and just try to simultaneously diagonalise the $\tau(g)$ and $\rho(g)$. I spent a whole six months hammering away before noticing the smart way, which is to notice that there's another representation $\alpha$ defined by:

$$g \mapsto (X \mapsto \tau(g)X\rho(g^{-1}))$$

If we say that $\rho$ and $\tau$ map to $\text{GL}(V)$ then this is a representation $G \to \text{GL}(\text{End}(V))$, where $\text{End}(V)$ is the vector space of all linear maps $V \to V$ (all $n \times n$ matrices). Crazy! The funny thing is that this representation is actually $\tau \otimes \rho^\ast$, the Kronecker product of $\tau$ and the inverse transpose of $\rho$, which is just the complex conjugate when $\rho$ is unitary (tensor product and dual representation, if you're still coordinate-free). The key thing here is that an invertible $A$ is a suitable basis change matrix if and only if $\alpha(g)A = A$ for all $g \in G$.

Even better, the projection from $\text{End}(V)$ to the subspace of the suitable $A$ is given, up to a scalar factor of $1/|G|$ that doesn't change the image, by the map (a specialisation of a more general theorem from Serre):

$$p = \sum_g \alpha(g)$$

This is where the key speed gain is: we can use the summation trick from before. The sacrifice is that the matrices $\alpha(g)$ are absolutely huge. If $n$ is the dimension of $V$, this uses $\text{O}(n^4)$ space. Terrible unless you come up with another trick to cut down the space usage.

Be lazy where possible to get the space back

We don't actually care what the matrices $\alpha(g)$ are, we just need to know how to apply them to compute the image of the $p$ above, then pick an invertible element from this image.

Even better, you can pick a random $B$ and $pB$ will almost always be invertible in a rigorous sense: the determinant map $B \mapsto \text{det}(pB)$ is polynomial, nonzero (we know $pB$ is invertible for some $B$), so its zero set is measure zero. In practice, you can just generate a random matrix and it will always work as a basis change matrix.

So in the end we don't need to ever compute the matrices $\tau \otimes \rho^\ast$ in full all at the same time, we can just compute:

$$A = pB = \sum_g (\tau(g) \otimes \rho^\ast(g))(B)$$

The tensor product has the neat property that $(A \otimes B)(C \otimes D) = AC \otimes BD$. We can write the matrix $B$ as $\sum_{i,j} B_{ij} e_i \otimes e_j$, that is we "vectorise" the matrix $B$ into an $n^2$ length vector. Then the equation above simplifies to:

$$A = \sum_g \sum_{i,j} B_{ij} \tau(g)e_i \otimes \rho^\ast(g)e_j$$

So we have ended up with a formula for $A$ (the basis change matrix) which doesn't require the computation of any matrices with $n^4$ entries, and is amenable to the fast group summing method vaguely mentioned earlier.
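Here is a toy Python sketch of the whole recipe, under several assumptions of mine: $G = S_3$, $\rho$ is the permutation representation, and $\tau$ is the same representation conjugated by a made-up matrix `P`. It applies $\alpha(g)$ to $B$ directly as $\tau(g) B \rho(g^{-1})$, so no $n^2 \times n^2$ matrix is ever built. It sums naively over all six group elements rather than using a base and strong generating set, so it only illustrates the algebra, not the speed.

```python
import random
from fractions import Fraction
from itertools import permutations

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(a):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in a]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return result

def perm_matrix(p):
    n = len(p)
    return [[1 if p[col] == row else 0 for col in range(n)] for row in range(n)]

group = list(permutations(range(3)))
rho = {g: perm_matrix(g) for g in group}

# tau is rho in a different (made-up) basis: tau(g) = P rho(g) P^-1.
P = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
P_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
tau = {g: matmul(matmul(P, rho[g]), P_inv) for g in group}

def averaged(B):
    """Sum of tau(g) B rho(g^-1) over G; rho(g^-1) is rho(g) transposed here."""
    total = [[0] * 3 for _ in range(3)]
    for g in group:
        term = matmul(matmul(tau[g], B), transpose(rho[g]))
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total

rng = random.Random(0)
A = [[0] * 3 for _ in range(3)]
while det(A) == 0:  # almost every B works, so this loop rarely repeats
    B = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(3)]
    A = averaged(B)

# A is invertible and satisfies tau(g) A = A rho(g), i.e. A^-1 tau(g) A = rho(g).
intertwines = all(matmul(tau[g], A) == matmul(A, rho[g]) for g in group)
print(intertwines)  # True
```

The retry loop is there because a random integer matrix from a small range can occasionally land on the measure-zero bad set.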

Since we still have to do matrix multiplications many times, this whole algorithm is still at least $\text{O}(n^3)$ but using clever tensor product tricks, we got it down from $\text{O}(n^4)$. Not bad!

One thing to mention is that the independence of the run time from $|G|$ frees us to compute with huge groups. My algorithm can operate on small degree representations of groups like $S_{10}$, with $10!$ elements (3.6 million elements).

Hopefully this has been some kind of insight into what computational representation theory might involve. Take a look at the source code for the gritty details.

What did you gloss over?

There's one huge thing I haven't explained. Have you guessed what it is?

You can't represent $\mathbb{C}$ in a computer! Floating point is useless for algebra! So how were we multiplying matrices and adding up coefficients in $\mathbb{C}$? There's another trick: restrict your attention to the cyclotomic numbers.

But why is that good enough? Is multiplication still constant time? Was it ever constant time?

I'll answer these questions at some point.

How to accidentally become a maintainer of a project

2020-04-15T00:00:00Z

I'm somehow one of the maintainers of rss2email, a popular Python program for reading RSS feeds and sending feed updates to your email address. I think I reached this point via an unusual route, so I thought I'd write a little about how it happened.

For those not familiar, back in 2004, Aaron Swartz (yes, that one, rest in peace) wrote a short Python script that would read an RSS feed and send you emails. It had a few options, but was fairly simple. It was a few hundred lines long.

After 16 years of features being added and being passed around various maintainers (see here for a complete list of contributors), rss2email is still around, with many more features, many more lines of code, and many (many) more bugs.

How did I get involved?

How I got involved

Back in 2017, when I was still studying for my Master's degree, I was looking for open source projects to take part in to pad my CV, mainly to impress interviewers for summer internships. This led to me looking at the list of software I used and trying to contribute to all of them, fixing any bugs that annoyed me, updating any packages that were out of date, and so on.

At the time, OpenBSD (which I've always had a soft spot for) had a severely outdated version of rss2email in its packages repository: 2.70. The current version was 3.9. I sent an email to ports@openbsd.org with some updates, expecting it to be accepted without fanfare. Since I was already a user of the new version, I didn't realise the sheer number of breaking changes introduced, so I couldn't predict the long discussion about how to upgrade that ensued.

Eventually, a consensus was reached that the changes were too breaking, so a new package would have to be made. Even worse, the new version had many more bugs than the old version, so existing users on the ports@ mailing list questioned why they should even upgrade.

The OpenBSD package was shelved.

Around this time, @wking's fork was the most updated fork of rss2email - there had been a Python 3 conversion, a test suite had been added, lots of features. However, the fork wasn't very active. Users would open issues, make PRs, but there weren't enough maintainers and not enough manpower to keep the project going. In the past, when this happened, another lone maintainer took over. This is essentially what @wking did, contributing in a big way for many years.

The bus factor

Python is the 2nd most famous example of a project which was run by a BDFL, a benevolent dictator for life, Guido van Rossum. In 1994, someone made this post on a newsgroup:

What if you saw this posted tommorrow.

Guido's unexpected death has come as a shock to us all. Disgruntled members of the Tcl mob are suspected, but no smoking gun has been found...

I just returned from a meeting in which the major objection to using Python was its dependence on Guido. They wanted to know if Python would survive if Guido disappeared. This is an important issue for businesses that may be considering the use of Python in a product.

-- Michael McLay (mclay@eeel.nist.gov)

This led to the popularity of the term "bus factor". A bus factor of n indicates that n team members would have to "get hit by a bus" for the project to stall. In our case a core team member had already passed away, before most of us had started contributing: Aaron Swartz. So the bus factor wasn't a joke - a low bus factor had already almost stalled rss2email for good.

Enter @jsbackus, who, in 2018, decided to create the rss2email GitHub org and add some erstwhile contributors to the project, in the hopes of revitalising it and increasing the bus factor.

Probably by accident, I was included in this effort. See here. I guess the strategy was to add anyone who had shown an interest in maintaining the project. Here's the fateful comment by @Profpatsch:

An update: Jeff has added @Ekleog, @kaashif, @Yannik and myself to the rss2email organization, so we'll hopefully be able to administrate the project with more manpower.

-- @Profpatsch

And there you have it, I was a maintainer!

What does maintaining a project mean, anyway?

After a sizeable effort by the other maintainers to migrate Linux distros to point to our repo, and to claim the package on PyPI, we were all set to start writing PRs and fixing bugs! Off to the OpenBSD mailing list to update to 3.9 (again), but this time with new and improved links to the new repo.

It didn't go too well: this package never made it. The objections can really be summarised by some feedback I received from an OpenBSD user of the old rss2email:

This is at least the second unreleased fix I noticed for newer python3+ versions. This begs the question as to how this newer version is better than what we already have in ports...

-- jca

Version 3.9 was the last version released by @wking, from before the effort to get all of the forks and maintainers under one roof to fix all the bugs. No surprise that it was riddled with bugs then. Back to the drawing board.

Eventually, a year later, we fixed a lot of bugs. The other maintainers put a lot of work in to incorporate forks and many users' personal bugfixes and we released our first version, 3.10.

This time I had a bit more success in my effort to get the package added to ports. At some point, I managed to convince sthen@ (or he decided independently of me) that rss2email didn't need a new package and could just work as a major upgrade. This was because there was a working upgrade path. So if you look at the OpenBSD port of rss2email, you'll see that:

I am the maintainer
It's not a new package: it's a straight upgrade

Amusingly, someone updated it from 2.70 to 2.71 one month before I updated it to 3.10.

Plans for the future

Nowadays, when I get time to work on rss2email, I respond to users' issues on GitHub, review PRs, merge PRs, fix bugs, and so on. It's not glamorous, but it's honest work.

Our plan is simple:

Fix bugs
Don't make users' lives difficult
Add new features

In that order. If something's nonsense, sometimes you've got to break it to fix it...

As always, contributors are welcome! Happy hacking!

Electromagnetic weaponry for fun and profit

2019-10-29T00:00:00Z

Anyone who has played Halo and fired the Gauss cannon on the Warthog has experienced a strong desire to do it in real life. Usually this is cut down by the idea that Gauss guns aren't real. But of course, they are, and I've built a few. Feel free to replace all mentions of "coilgun" with "Gauss superaccelerator" or "magnetic spaceship cannon" if this will satisfy a childhood fantasy of yours.

There are two main types of coilgun I've built:

The "standard" kind, which involves turning an electromagnet on, attracting a ferromagnetic projectile down a barrel towards the coil, then turning the magnet off once it reaches the centre of the coil. This is a reluctance coilgun.
The other kind, which involves turning an electromagnet on and using the sudden change in magnetic field to induce eddy currents in a non-ferromagnetic projectile. The projectile must be an inductor of some kind (e.g. a shorted coil) so that the eddy currents form a magnetic field that repels the projectile away from the coil. This is an inductance coilgun.

I thought an induction coilgun would be easier to time, since we don't have to quickly turn off the coil, but intractable problems led me to turn my coilgun into a reluctance coilgun.

Why not a railgun?

I actually am trying to build a railgun, but it is extremely obvious to me why coilguns are more popular among hobbyists: you don't need moving contacts for a coilgun! It's that simple!

Inductance or reluctance, a coilgun is not a railgun and doesn't require any electrical contact between the projectile and the coil.

What do I need to build it myself?

The bill of materials for a minimal version of this project is actually quite...minimal. My very first coilgun was a low voltage (30V) design, built with parts I had on hand. The voltage had to be low because I had not yet acquired any high voltage diodes or capacitors. All I had was:

Thyristors salvaged from a power drill battery charger (BT151-500R) - decent voltage capability but disappointingly low non-repetitive on-state current capability. This is not strictly necessary for a minimal design, so I won't use it right now.
Some enamelled magnet wire from a transformer from an old computer power supply.
Some 200V 680uF capacitors from the same power supply. That voltage tolerance is weird because it's below what would come from the mains but really high for a "low voltage" project. Nothing special about the capacitance, it was just what I had.
A 120W halogen lightbulb for current limiting.
Various light switches and wiring for the SPDT and DPDT switches you will see later on.
PVC pipe. Get pipe as thin as possible, you want the projectile to be light and fill the barrel, which is much easier if the barrel is smaller. The fact that it doesn't conduct is important to avoid eddy currents (see Lenz's law).

My first Gauss rifle had this design:

The idea is that all switches are open to start off. Then you close SW1 to charge the capacitor, open SW1 and close SW2 to discharge. There is no timing or triggering circuitry involved, we rely on the capacitor current petering out by the time our projectile passes the coil (for a reluctance launcher) or we just don't care at all (for an inductance launcher).

On a practical level, you wrap the coil around the pipe, put the projectile somewhere in it, and experiment with different capacitances and projectile starting positions until you get something that works as well as possible. I did mention that I started out only using 30V, so you get a stored energy (with two capacitors in parallel) of:

$$E = \frac12 C V^2 = \frac12 (1360 \mu \text{F}) (30 \text{V})^2 = 0.612 \text{ joules}$$

Is that a lot? Who knows? At 100% efficiency (completely preposterous), this would propel a 1 gram iron nail to:

$$v = \sqrt{\frac{2E}{m}} = \sqrt{\frac{2(0.612 \text{J})}{1 \text{ gram}}} = 35 \text{ m s}^{-1}$$

Pretty decent, right? Wrong! So many things were wrong with this initial design it's incredible. Here's a rundown so you don't have to repeat my mistakes.

Efficiency? What's that?

Hobbyist coilguns peak in efficiency at about 1%. Even really good hobbyists can only squeeze out a few percent with really well-designed, multi-stage, low-friction, narrow barrel designs. Those of us building with random pipes and scavenged components can only hope for fractions of a percent.
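To see what those efficiency figures do to the numbers above, here's a small Python sketch. The capacitor values, voltage and nail mass are the ones from this post; the 0.1% figure is just my stand-in for "fractions of a percent", not something I measured.

```python
import math


def stored_energy(capacitance_f: float, voltage_v: float) -> float:
    """Energy in joules stored in a capacitor: E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def muzzle_velocity(energy_j: float, mass_kg: float, efficiency: float = 1.0) -> float:
    """Speed in m/s if `efficiency` of the stored energy ends up as kinetic energy."""
    return math.sqrt(2 * efficiency * energy_j / mass_kg)


# Two 680 uF capacitors in parallel, charged to 30 V, firing a 1 gram nail.
energy = stored_energy(2 * 680e-6, 30.0)
print(f"stored energy: {energy:.3f} J")

# 100% is the preposterous ideal, 1% is a good hobbyist build,
# 0.1% is an assumed value for a scavenged-parts build.
for efficiency in (1.0, 0.01, 0.001):
    speed = muzzle_velocity(energy, 1e-3, efficiency)
    print(f"{efficiency:.1%} efficient: {speed:.1f} m/s")
```

Since speed goes with the square root of energy, dropping to 1% efficiency costs a factor of ten in speed, so the 35 m/s nail becomes a roughly 3.5 m/s nail.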

The solution is to get bigger caps: higher voltages and capacitances. Voltage is more important since it's the highest order term in the equation. Too much capacitance isn't desirable since this increases the time constant $RC$ of the circuit and will lead to slower discharge.
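A quick sketch of that trade-off, comparing doubling the voltage with doubling the capacitance. The circuit resistance of 0.5 ohms is a made-up value purely for illustration; I haven't measured mine.

```python
def stored_energy(capacitance_f: float, voltage_v: float) -> float:
    """E = 1/2 * C * V^2, in joules."""
    return 0.5 * capacitance_f * voltage_v ** 2


def time_constant(resistance_ohm: float, capacitance_f: float) -> float:
    """RC time constant in seconds."""
    return resistance_ohm * capacitance_f


ASSUMED_RESISTANCE = 0.5  # ohms, hypothetical coil + wiring resistance

# (label, capacitance in farads, voltage in volts)
options = [
    ("baseline", 1360e-6, 30.0),
    ("double voltage", 1360e-6, 60.0),
    ("double capacitance", 2720e-6, 30.0),
]

baseline_energy = stored_energy(1360e-6, 30.0)
for label, capacitance, voltage in options:
    gain = stored_energy(capacitance, voltage) / baseline_energy
    tau_ms = time_constant(ASSUMED_RESISTANCE, capacitance) * 1e3
    print(f"{label}: {gain:.1f}x energy, time constant {tau_ms:.2f} ms")
```

Doubling the voltage quadruples the energy and leaves the time constant alone. Doubling the capacitance only doubles the energy and doubles the time constant too.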

Safety? Never heard of it!

If you accidentally close both switches, you blow the fuse in your plug or, if you're unlucky, you trip your house's circuit breaker and all of the lights shut off. Very dramatic and easily avoided if you wire your switches differently or add some current limiting (in the form of a bulb).

Back EMF

As I learnt (explosively), when you shut off current to an inductor (e.g. by opening a switch), there is a large back EMF. Essentially, a large voltage is induced in the reverse direction, damaging/destroying any polarised components, like that electrolytic capacitor.

The reason for this is apparent if you look at the inductor equation:

$$V = L \frac{dI}{dt}$$

If you cut off the current suddenly, the rate of change is negative infinity, and so is the voltage over the inductor. In the real world, you don't really get an infinite back EMF, but it'll be more than enough to fry your caps.
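To get a feel for "more than enough", here's a rough sketch that treats the current as falling linearly to zero over the time the switch takes to open. The coil inductance (100 uH) and the current (50 A) are invented for illustration, not measurements of my coil.

```python
def back_emf(inductance_h: float, current_a: float, switch_time_s: float) -> float:
    """Magnitude of V = L * dI/dt for a current falling linearly to zero."""
    return inductance_h * current_a / switch_time_s


ASSUMED_INDUCTANCE = 100e-6  # henries, hypothetical coil
ASSUMED_CURRENT = 50.0       # amps, hypothetical discharge current

# The faster the switch opens, the larger the induced voltage.
for switch_time in (1e-3, 1e-5, 1e-7):
    volts = back_emf(ASSUMED_INDUCTANCE, ASSUMED_CURRENT, switch_time)
    print(f"opens in {switch_time:.0e} s -> about {volts:,.0f} V")
```

Under these assumptions the voltage scales inversely with the switching time, which is why a fast, clean switch is exactly what kills your capacitor.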

How does the current traverse the gap when you open your switch? The inductor always wins and closes the gap with an arc, probably damaging your switch too.

The solution is to add a flyback diode: a diode connected in reverse parallel with the inductor, so that the current driven by the back EMF circulates through it and the energy stored in the inductor dissipates safely. Diagrams will come later to make this clearer.

A better design

Some lessons learnt:

Bigger caps
Bigger voltages
Bigger currents
...and also more safety I guess

Sounds good. Here's what I did to apply these lessons:

Take apart some servers to get some better caps. I managed to find some 450V 470uF caps like this, good enough to charge directly from the mains.
Replace SW2 with a high-current thyristor, so we can fire the gun many times without needing to replace the switch. I ended up choosing the TYN640RG which can handle 600V (more than enough) and a peak current of ~500A, which should be fine.
Add a current-limiting lightbulb so even if the thyristor and switch fail closed, it just safely turns on the lightbulb. I just found a 120W 240V halogen bulb in a drawer.
Add a flyback diode! I bought a reel of 1N5408G rectifying diodes that can block 1000V and have a fairly small (1.1V) forward voltage. They can also handle fairly large currents.

Here's what my final single-stage design looks like in circuit form:

The idea of the circuit is the same, close SW1 to charge, open it, then close SW2 to fire. The difference is that now, if you forget and close everything, the lightbulb just turns on and everything remains safe.

It goes without saying that you should use a ton of wire, otherwise this will draw a huge current and destroy your thyristor. You could also add a tiny resistor if you feel like it, but beware of a high time constant slowing down your projectile, don't make R too large.

Where to now?

I would do some calculations and measurements to work out how good my coilgun is, but there's no point: I already know the path to higher efficiency is having more stages. And bigger voltages will mean faster speeds and maybe something worth making a video about.

I'll also have learnt something about how to make a practical high-speed switching circuit with a photodiode, IR LED and maybe IGBTs instead of SCRs (thyristors), since IGBTs can be turned off.

Or maybe I'll make an induction ring launcher or something.

Finding uses for neodymium magnets

2019-09-04T00:00:00Z

I was decommissioning a few of my older computers and servers, stripping them down for parts. The hard drives in them were mostly IDE drives which had long stopped working. There are almost no useful parts you can strip from a non-functional hard drive except maybe the IDE connector and of course, the extremely strong rare-earth magnets.

But what could I do with them? Here are some things I did and photographed:

Book light that holds on with magnets
Converting a Star Trek combadge from using pins to magnets
Sticking an amplifier to the bottom of my desk

Book light

I read a lot of books, not all of them during the day. An easy solution for reading books in bed is to have a lamp, but sometimes you want a more portable solution. Who am I kidding, this is just a flimsy excuse to use some magnets.

I got the battery pack from a small torch, a PCI card filler from a Sun workstation, and a light from a torch built into a radio I got from the Science Museum in London 15 years ago. After a few generous applications of hot glue and solder, this is what I ended up with:

In that picture, it's attached to a book using two magnets. One glued to the light itself and the other inside the book. The idea is that once you get past halfway in the book, you switch the side the light is on. And there's no need to worry about the light falling out, the magnets can lift several kilos.

Here's one of me reading Dune:

Converting a combadge

Here's the product I have at the moment:

I got it at Destination Star Trek in Birmingham a couple of years ago. It's a good badge, but you have to actually poke holes in your shirt to wear it. This is suboptimal. An improvement would be to have one magnet glued to the badge and one you put inside your shirt.

I cut the pins and filed the stumps down, applied some hot glue and the magnet and voila! A magnetic combadge!

You may wonder why I used hot glue instead of something else, like superglue. The answer is that hot glue is very forgiving and not too strong: you can easily remove it with a heat gun and some pliers.

Below-desk amplifier

Recently I've been trying to design and build some amplifiers for various speakers. I have a class B amplifier in the works, but all of the amplifiers I have managed to bias correctly and so on are class A.

Currently, I'm using a class A common emitter amplifier with some 4 ohm speakers I found. A problem I had is that I kept knocking it off my desk, I needed to either stick it down or put it under my desk. Using magnets, I managed to do both:

If you're interested, each speaker is driven by a 2N2222 transistor, with just a base-collector bias resistor (not a very good or complex design, I know, I know) and a 5 ohm 5 watt power resistor in series with the speaker so we don't exceed the current limits of the transistor. I did that once and my transistors let out their magic smoke, I decided it was better to live with inefficiency than have nothing at all. Some bigger transistors are coming in the post though.

I'm looking to design some more complex amplifiers and use them with random speakers from eBay - some huge speakers go for very cheap with no bids, it's very bizarre. Stay tuned. Or maybe watch the local news for an article about a man killed by overpressure from dodgy speakers.

Other magnet fun

I've also repaired fridge magnets, made a screw holder, magnetised some screwdrivers and more! But most of those applications are too mundane to write a blog post about.

Modding a Sun Ultra 45 fan module

2019-08-14T00:00:00Z

I have a Sun Ultra 45, the last and most powerful Sun SPARC workstation. Even though mine doesn't have both of the two CPU slots filled, the fans are still really loud. Is there a way to control the fans? How could I slow them down?

I first tried to find a software solution on OpenBSD and later had to resort to soldering and adding some resistors to the fans to slow them down.

The Problem

To understand the problem, you just need to know how many fans there are. There's a fan on the CPU heatsink: this one is fairly quiet and I have no problems with it. The loud fans are the three huge monster fans between the motherboard and the hard drive/disc drive bays.

Here's a picture of the three-fan module removed with various crap around for scale:

Those fans are huge, they take 12V DC and draw 1.3 Amps! That's a power drain of:

$$3 \times 12 \text{V} \times 1.3 \text{A} = 46.8 \text{W}$$

50W just for fans? No wonder they sound like jet engines!

Software fan control

The first port of call is a software solution: can we use a fan control program to slow the fans? The support for fan control on OpenBSD is already sparse, even on amd64, so this was unlikely.

On amd64, there is generally good fan sensor support: you can view how fast your fans are going. I tried to get this working on sparc64, but couldn't, so I'd have to content myself with knowing my fans were "loud".

If I couldn't control the fans in OpenBSD, could I control them in Solaris? I took a closer look at the fan module to see where it connected to the backplane:

Using a 12V bench power supply and some crocodile clips, I tested the connections to make sure the wire colours were true to what I knew. They were. Ground was black, 12V was yellow. The other two were clearly fan speed control pins: pulse width modulation (full voltage, but switched on and off rapidly, with the fraction of time spent on setting the fan speed) and voltage control (varying voltage to control speed). There are 3 blocks of 4 pins on the connector, one block for each fan. The top two pins are ground and 12V, the bottom two are for speed control.
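As a rough illustration of the PWM idea, here's a sketch using the simplest possible model, where the fan just sees the time-averaged voltage. That model is my assumption; real fans with a dedicated PWM pin interpret the signal in their own control circuitry.

```python
SUPPLY_V = 12.0  # the fan supply voltage from this post


def average_voltage(duty_cycle: float, supply_v: float = SUPPLY_V) -> float:
    """Time-averaged voltage of a square wave that is on for `duty_cycle` of each period."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    return duty_cycle * supply_v


for duty in (0.25, 0.5, 1.0):
    print(f"{duty:.0%} duty cycle -> {average_voltage(duty):.1f} V average")
```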

It seems to me like it might be possible to use PWM to control fan speed, but not out of the box with OpenBSD. I didn't want to try Solaris since every time an Oracle logo appears on my screen, a puppy dies. Next!

Hardware fan "control"

There is one other way to reduce fan speed: reduce the voltage across the fan. The current draw of the fan varies so ideally we'd have a voltage regulator, but I didn't have any on hand.

A workable candidate would've been something like a 7806 regulator to halve the fan voltage to 6V. The 78xx range of linear voltage regulators can provide up to 1.5A of current, so that'd be more than enough. How much is "enough" exactly? Modelling the fan as a resistor (sue me):

$$R = \frac{12 \text{V}}{1.3 \text{A}} = 9.2 \Omega$$

So the current draw at 6V would be:

$$I = \frac{6 \text{V}}{9.2 \Omega} = 0.65 \text{A}$$

We could even get away with the low power versions of the 78xx regulators at that sort of current. The issue is, of course, that I don't have any spare voltage regulators lying around!
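One thing worth checking before buying a regulator is how much heat it would have to get rid of, since a linear regulator burns off the voltage it drops. Here's a sketch using the same fan-as-a-resistor model as above; the function names are mine.

```python
SUPPLY_V = 12.0     # fan supply voltage
FAN_RATED_A = 1.3   # rated fan current at 12 V


def fan_resistance() -> float:
    """Treat the fan as a plain resistor: R = V / I."""
    return SUPPLY_V / FAN_RATED_A


def regulator_load(v_in: float, v_out: float, load_ohm: float) -> tuple[float, float]:
    """Return (load current in A, heat dissipated in the regulator in W)."""
    current = v_out / load_ohm
    return current, (v_in - v_out) * current


current, heat = regulator_load(SUPPLY_V, 6.0, fan_resistance())
print(f"fan current at 6 V: {current:.2f} A")
print(f"heat in the regulator: {heat:.1f} W")
```

Under this model the regulator dissipates about as much power as the fan itself consumes, so it would want a heatsink.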

My other option was using some power resistors I had lying around. From the earlier calculation, we know the fan is something like a 10 ohm resistor. So to halve the speed I added one 10 ohm resistor in series with each fan. Problem solved.

Power dissipation

Resistors get hot. How hot? To work this out, we need to know the power being dissipated by the resistor. Voltage splits in proportion to resistance, so we can say there are 6V across the resistor, so:

$$P = \frac{(6 \text{V})^2}{10 \Omega} = 3.6 \text{W}$$

Kind of a lot for a resistor, certainly a 1/4 W resistor wouldn't cope. But the resistors I had lying around were 10W, so no problem.
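The 6V and 3.6W figures come from rounding the fan up to 10 ohms. Here's a sketch of the same voltage divider using the 9.2 ohm estimate instead, still assuming the fan behaves like a fixed resistor (which it almost certainly doesn't, exactly).

```python
def series_divider(supply_v: float, fan_ohm: float, series_ohm: float) -> dict[str, float]:
    """Current, voltages and resistor power for a fan with a resistor in series."""
    current = supply_v / (fan_ohm + series_ohm)
    return {
        "current_a": current,
        "fan_v": current * fan_ohm,
        "resistor_v": current * series_ohm,
        "resistor_w": current ** 2 * series_ohm,
    }


rounded = series_divider(12.0, 10.0, 10.0)          # fan rounded to 10 ohms
estimated = series_divider(12.0, 12.0 / 1.3, 10.0)  # fan at 12 V / 1.3 A

for name, result in (("rounded", rounded), ("estimated", estimated)):
    print(
        f"{name}: fan {result['fan_v']:.2f} V, "
        f"resistor {result['resistor_w']:.2f} W at {result['current_a']:.2f} A"
    )
```

Either way the resistor dissipates somewhere under 4W, comfortably within a 10W rating.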

How do we deal with the heat? Well, we are powering fans, so I just thought I could dangle the resistors in front of the fans and cut some holes in the plastic so everything lines up nicely. If that goes badly for me, I'll certainly write a post about it.

Does it work?

So far, yes. Here's a picture of the final fan module:

I had to cut away some of the plastic for the resistors to fit. A more elegant solution was probably possible, but I'm not going to sell this Ultra 45, so I'm fine with it being a bit damaged aesthetically.

Like I said, the dream solution would be to provide the fans with a regulated 6V, perhaps with a 78xx regulator. As long as the fans work ok with my hack job solution, I probably won't end up going that way though.

Hacking into a Sky router

2019-08-06T00:00:00Z

Like everyone, I have a ton of old routers lying around. It pains me to see these very useful computers go to waste, so I made it my business to hack into all of mine and replace the firmware. Maybe the title is a bit dramatic, but it's technically accurate.

My first target was an old Sky router, a Sagemcom F@ST2504n.

Which OS?

The obvious choice is OpenWRT since they have a page with easy instructions to get started.

There is one issue though, the pictures on that page don't work! This wasn't too much of an issue, since the pads on the PCB were all labelled so I was able to solder without incident.

Getting a console

I mentioned soldering. This is needed because there is no way to access the bootloader to flash the firmware without getting a console. There isn't a serial port on the case, but there are pads on the PCB conspicuously labelled VCC, Tx, Rx and GND. This is where you can solder some wires and connect your favourite 3.3V TTL UART. I used an FT232, but I'm sure others work too.

Interestingly, if you set your UART to 5V, you can actually power the whole router from the serial port! I discovered this by accident, I had my FT232 set to 5V and connected it without having power connected to the router. Rather spookily, the router powered on without any power. But this is possibly unsafe, don't do it! Who knows how much current the router needs, it's best to use the barrel jack and a real power supply!

Anyway, after changing to 3.3V, I got a console, here is what it looks like:

CFE version 5.14.7 for BCM96362 (32bit,SP,BE)
Build Date: Tue Mar 29 15:03:07 CST 2011 (zouchenbo@SZ01007.DONGGUAN.CN)
Copyright (C) 2005-2010 SAGEM Corporation.

HS Serial flash device: name MX25L64, id 0xc217 size 8192KB
Total Flash size: 8192K with 128 sectors
Chip ID: BCM6362B0, MIPS: 384MHz, DDR: 320MHz, Bus: 160MHz
Main Thread: TP0
Memory Test Passed
Total Memory: 67108864 bytes (64MB)
Boot Address: 0xb8000000

Board IP address                  : 192.168.1.1:ffffff00
Host IP address                   : 192.168.1.100
Gateway IP address                :
Run from flash/host (f/h)         : f
Default host run file name        : vmlinux
Default host flash file name      : bcm963xx_fs_kernel
Boot delay (0-9 seconds)          : 1
Board Id (0-2)                    : F@ST2504n
Number of MAC Addresses (1-32)    : 11
Base MAC Address                  : 7c:03:4c:ad:19:f6
PSI Size (1-64) KBytes            : 40
Enable Backup PSI [0|1]           : 0
System Log Size (0-256) KBytes    : 0
Main Thread Number [0|1]          : 0

*** Press any key to stop auto run (1 seconds) ***
Auto run second count down: 1
web info: Waiting for connection on socket 0.
CFE>

Some interesting info there, but we actually don't care about any of it except the prompt telling us to stop the auto boot. Mash enter or something and you'll end up at the CFE prompt.

I wasn't able to get minicom to work with this console, I couldn't write anything, leading me to waste some time checking my soldering. I switched to cu (from the cu package on Debian, it's in base on OpenBSD), with the recommended 115200 8N1 settings and it worked as above. I don't really know how to use minicom, so it was probably just user error.

Flashing the firmware

From here, you can follow the instructions. For a TFTP server, I used the one from the tftpd-hpa package on Debian. The one in the base system on OpenBSD also works. Here is what I did, I erased the flash then wrote the image from the TFTP server:

CFE> e a
Erase all flash (except bootrom)? (y/n):y
...............................................................................................................................

Resetting board...HELO
CPUI
L1CI
HELO
CPUI
L1CI
DRAM
----
PHYS
ZQDN
PHYE
DINT
LASY
USYN
MSYN
LMBE
PASS
----
ZBSS
CODE
DATA
L12F
MAIN


CFE version 5.14.7 for BCM96362 (32bit,SP,BE)
Build Date: Tue Mar 29 15:03:07 CST 2011 (zouchenbo@SZ01007.DONGGUAN.CN)
Copyright (C) 2005-2010 SAGEM Corporation.

HS Serial flash device: name MX25L64, id 0xc217 size 8192KB
Total Flash size: 8192K with 128 sectors
Chip ID: BCM6362B0, MIPS: 384MHz, DDR: 320MHz, Bus: 160MHz
Main Thread: TP0
Memory Test Passed
Total Memory: 67108864 bytes (64MB)
Boot Address: 0xb8000000

** Flash image not found. **

Board IP address                  : 192.168.1.1:ffffff00
Host IP address                   : 192.168.1.100
Gateway IP address                :
Run from flash/host (f/h)         : f
Default host run file name        : vmlinux
Default host flash file name      : bcm963xx_fs_kernel
Boot delay (0-9 seconds)          : 1
Board Id (0-2)                    : F@ST2504n
Number of MAC Addresses (1-32)    : 11
Base MAC Address                  : 7c:03:4c:ad:19:f6
PSI Size (1-64) KBytes            : 40
Enable Backup PSI [0|1]           : 0
System Log Size (0-256) KBytes    : 0
Main Thread Number [0|1]          : 0

web info: Waiting for connection on socket 0.
CFE> f 192.168.1.4:/home/kaashif/firm.bin
Loading 192.168.1.4:/home/kaashif/firm.bin ...
Finished loading 3932164 bytes

Flashing root file system and kernel at 0xb8010000: .............................................................


Flashing File Tag....
*** Image flash done *** !

192.168.1.4 is the address of my laptop, I connected to the router directly with an Ethernet cable. Now (after waiting for it to boot) you end up at the OpenWRT shell:

BusyBox v1.28.3 () built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 18.06.0-rc2, r7141-e4d0ee5af5
 -----------------------------------------------------
=== WARNING! =====================================
There is no root password defined on this device!
Use the "passwd" command to set up a new password
in order to prevent unauthorized SSH logins.
--------------------------------------------------
root@OpenWrt:/# uname -a
Linux OpenWrt 4.9.111 #0 SMP Sat Jul 14 13:48:14 2018 mips GNU/Linux

Fantastic. That's one router set free, I'll do the rest (and not write posts about them, since I imagine they'll all be the same as this one).

One last thing, here's a picture of the router in action, with the serial terminal connected:

My first homebrew computer!

2019-07-10T00:00:00Z

I've always wanted to design and build my own homebrew computer. By this, I mean buying some ICs and soldering together something like an Apple I or a ZX Spectrum.

This post isn't going to be about my struggles designing a homebrew PC, since I haven't done that. Instead, I bought a new Z80 homebrew kit, the RC2014. That's right, there is still someone out there making these kits.

Why the Z80?

There are a few reasons I picked a Z80 as my first homebrew. The first is that you can buy a Z80 new from any reputable supplier: Zilog is still in business and still makes them new! You can just log onto DigiKey or whatever and add it to your basket.

The other is that you don't have to breadboard it: you can buy the RC2014 kit I linked to earlier, you just have to solder it together. I admit it's been a while since I've soldered anything but after getting this kit working, I can feel a desire to get the stripboard out and design some peripherals. So soldering is not a turn-off for me. This means you could theoretically build an entire Z80 computer without touching any wires (except the serial cable).

The other other reason is that there is an active community not just around the Z80 but around the RC2014 kit specifically. There are tons of modules people have designed and some you can even buy from them. And the creator of the RC2014 has a few available including an HDMI output, keyboard input, and more.

The build

I ordered online and I got my kit in the post. What did I get? A nice set of instructions, all of the resistors, capacitors, buttons and headers you need (no spares) and some PCBs. All as expected.

The instructions are very easy to follow and even include some troubleshooting tips.

The soldering iron I use is a cheap Chinese clone of a Hakko. It's temperature controlled and works with Hakko tips, so I would recommend it. It only cost 30 GBP and served me well for the (admittedly very small) amount of soldering I have done so far. No doubt if I get serious I'll end up having to buy something better. I would link it, but I am entirely unqualified to recommend soldering irons, so I won't. That's not the point of this post anyway.

I put the reset button I was given on the clock board and also added my own reset button (and 2k2 resistor) on the backplane. This is just because I found the clock board button hard to press without feeling like I was bending pins.

I chose not to use the barrel jack initially, instead opting to use power over the FTDI cable. This meant I had to put the jumper on the serial board to connect the 5V line from the FTDI cable.

And a side note: I didn't solder the right angle headers onto the ROM board very well, so it "leans back" and touches the serial board. I just used a folded up piece of card as a spacer to stop any shorts.

Problems

I connected my serial cable, and hit the reset button: nothing happened. Yes, my serial settings were correct and the cable was in the right way. Now what?

I don't have an oscilloscope capable of reading a 7 MHz signal, I just have a crappy handheld one that does maybe 200 kHz. This meant that I was restricted to the instructions for troubleshooting with a multimeter. Those instructions are very comprehensive by the way and did end up solving my problem.

I checked connectivity of all the lines: all good. No solder bridges either. But then I noticed that the voltage from the 5V line to ground was actually 3.3V. This should've been my first clue that my cheap Chinese FTDI cable was junk. Instead, I soldered on the barrel jack, moved the jumper so the board would take power from the jack instead of the FTDI cable, and moved on. All 5V lines were actually 5V now, at least.

The next step was building the signal tracer circuit on that troubleshooting page. As I understand it, this is just a spin on a differentiator circuit, since it gives a voltage only when the input voltage is changing, i.e. the output is related in some way to the derivative of the input voltage with respect to time.
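To illustrate the idea (not the actual circuit, which I haven't modelled), here's a sketch of a discrete first-order high-pass filter, which is roughly what an RC differentiator does. The filter coefficient and the sample signals are invented for the example.

```python
def high_pass(samples: list[float], alpha: float = 0.5) -> list[float]:
    """First-order discrete high-pass filter: responds to changes, ignores steady levels."""
    output = [0.0]  # assume the filter has settled before the first sample
    for previous, current in zip(samples, samples[1:]):
        output.append(alpha * (output[-1] + current - previous))
    return output


def activity(samples: list[float]) -> float:
    """Largest output magnitude, i.e. how brightly the tracer would light up."""
    return max(abs(value) for value in high_pass(samples))


stuck_high = [5.0] * 8     # a line held at 5 V: nothing happening
toggling = [0.0, 5.0] * 4  # a line switching between 0 V and 5 V

print(f"stuck line: {activity(stuck_high):.2f}")
print(f"toggling line: {activity(toggling):.2f}")
```

A line stuck at 5V and a line stuck at 0V both give zero output, which is exactly why a tracer like this is more useful than a multimeter for spotting activity.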

Here is a photo of my breadboarded signal tracer:

It worked like a charm and gave me some very useful info:

There was a clock signal (good)
Activity on the address lines, so the CPU was running a program
Activity on the data lines, so the RAM and ROM probably worked (this wasn't conclusive, but they actually did work, so I never needed to check them in detail)

And here was the kicker that took me an age to notice: there was activity on the TX_Data pin of the serial communications controller (the MC68B50CP, on the serial board) but only if the FTDI cable was not connected. So the whole thing worked except the serial board!

I tried a different cable (a generic "Raspberry Pi debug cable", really a USB to TTL serial adapter) to no avail. Distraught, I emailed the RC2014-Z80 Google group and got some very good advice from the designer of the RC2014 himself, Spencer Owen. One of the things he wrote was that it was probably the serial cable not being sensitive enough - I should lower the values of the resistors on the TX and RX lines or just short them.

I shorted them and still my FTDI cable wasn't working! But then I tried my other generic Chinese cable and it worked!

The finished result

Here it is:

The resistor on the serial board is sticking out in an ugly way since this was while I was still in the middle of trying out different resistor values to see if I could get away with not shorting the connections. Right now I just have them shorted and the serial controller gets a bit warm, which is probably not desirable.

Future steps

I look forward to writing some Z80 assembly programs, playing some BASIC games (including my favourite: Star Trek!) and designing some peripherals to use.

And if you're reading this, maybe you should check out the RC2014, it's a great kit for beginners (like me).

Containerizing my transcript search app

2019-06-07T00:00:00Z

Until recently, my transcript search web app was running (at https://transcripts.kaashif.co.uk, check it out) in a tmux session, with a PostgreSQL server running on the same machine in the usual way, as a daemon.

The web app knows nothing about its dependency on the database, this information is not recorded anywhere except in the code itself. And the database knows nothing about the web app. This isn't a huge problem except the database has a config of its own which isn't recorded anywhere in the source repo. If you try to get the web app to work with a misconfigured database, it won't work, of course.

Wouldn't it be nice if all of that configuration were in one place? And if the services all restarted themselves if they failed? And if you could migrate the entire blob of interconnected web apps and databases to a different machine with a single command?

That's where Docker comes in!

What problem are we trying to solve?

I was thinking of migrating VPS providers, and I realised that my strategy of running my web app in a tmux window was poor. All of the configs are spread out over the system, the database has been configured to run a certain way and I have a certain way of running the web app which isn't even recorded in any script anywhere. This means someone trying to host the app on their own would have to do a lot of guesswork or rely on me to tell them what to do.

There's also a lack of fault tolerance: if the web app crashes, I have no idea and it certainly doesn't restart itself. If the database crashes, I similarly have no idea. It would be nice if we could get some auto-restarting behaviour.

Which tools will we use?

The solution I chose was Docker Compose. In the words of that page I just linked:

Compose is a tool for defining and running multi-container Docker applications.

In our case, one container is the web app and one the database. This is a very common setup and is probably exactly the use case Docker Compose was designed for. This is apparent if you go through the examples in the docs.

For those who don't know what Docker is, here is a helpful explanation:

Enterprise Container Platform for High-Velocity Innovation: Securely build, share and run any application, anywhere

-- https://www.docker.com/

Just kidding, that's weapons-grade business nonsense. A container is essentially a really lightweight copy of your machine where exactly one program is running. It shares the kernel and hardware, but has its own network and bundles its own filesystem (with all its dependencies). Here is an actually good explanation.

What does the end result look like?

Rather than describing the long and tedious process of trying to get everything to work and learning how to use Docker, let me just show you the end result.

The git repo for my web app looks like:

.
|-- build.sh
|-- cli
|-- data
|-- docker-compose.yml
|-- Dockerfile
|-- lib
|-- LICENSE
|-- make_pretty.sh
|-- README.md
|-- stack.yaml
|-- transcripts
|-- transcript-search.cabal
`-- web

Assuming you have the transcript parser built (just install Haskell Stack and run stack build --copy-bins, which will get the GHC compiler, all dependencies, build and copy the binaries to the right place), we only need to focus on:

data: contains the transcript data, ready to be loaded into the SQL database
docker-compose.yml: defines the relationships between the containers and what the names of the images used are
Dockerfile: defines our web app container

The data directory

This is produced using transcript-parse, the Swiss Army Knife of sci-fi TV show transcript parsers. A small niche, but a very important one. There are only two files here:

transcripts.tsv, which is a tab-separated values file with the entire transcript database. This is the biggest file, at 72 MB. Not quite big data yet.

load_data.sh, which the database container will pick up (more on this later) and use to load the TSV file.

docker-compose.yml

The docs for Docker Compose aren't bad, you should check them out if you want to see some more examples. My code is essentially just another example to add to the list. They actually don't have a Python Flask web app as one, so maybe my code will be instructive.

Here is the entire file:

version: '3.1'

services:
  web:
    depends_on:
      - postgres
    build: .
    ports:
      - "1234:8000"
    environment:
      DB_HOST: postgres
      DB_PASS: [redacted]
    restart: always
  postgres:
    image: postgres:11
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: transcripts
      POSTGRES_PASSWORD: [redacted]
      POSTGRES_DB: postgres
    volumes:
      - ./data:/docker-entrypoint-initdb.d
    restart: always

For a full reference of what all of these keywords mean, check out the compose file reference: https://docs.docker.com/compose/compose-file/.

For now, I'll just focus on how it solves my problems.

The full relationship between the services is defined in one file. This is great, since it means there is no need to trawl through the code to see where the database connections are made.
restart: always! So if the web app crashes, it will just restart and the database won't know. If the database crashes, it will restart. Maybe the web app will crash too, but then it will restart and at some point the system will be working again.
depends_on means the services get started in the correct order: database then web app. Note that this only waits for the database container to start, not for Postgres to be ready to accept connections, so the web app can still win the race. With restart: always that isn't a big deal: if the web app fails to connect and crashes, it gets restarted until the database is up.
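If you'd rather not lean on restart: always to paper over a database that isn't accepting connections yet, the web app can read its settings from the environment and retry the connection itself. This is only a sketch: db_settings and connect_with_retry are hypothetical helpers I made up for illustration, not code from the actual app. The only things taken from the compose file are the DB_HOST and DB_PASS variable names.

```python
import os
import time


def db_settings(environ=os.environ):
    """Read the settings that docker-compose.yml passes to the web container."""
    return {
        "host": environ.get("DB_HOST", "localhost"),
        "password": environ.get("DB_PASS", ""),
    }


def connect_with_retry(connect, attempts=10, delay=1.0):
    """Call connect() until it succeeds or we run out of attempts.

    `connect` is any zero-argument callable, for example a lambda that
    wraps psycopg2.connect(...). The last exception is re-raised if
    every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == attempts:
                raise
            # Give the database a moment before trying again.
            time.sleep(delay)
```

In the app this might be called as connect_with_retry(lambda: psycopg2.connect(host=settings["host"], password=settings["password"], user="transcripts")), with the user name matching POSTGRES_USER in the compose file.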

A technical note: when the database container is started for the first time, it mounts the data directory into its filesystem at a mount point with a special name. The Postgres image has a script somewhere that scans this directory looking for scripts to run. There is a lot of complexity hidden inside the prebuilt Postgres image, but we do not need to worry about any of this.

Our custom container definition

This is very simple, we just need something to run a Flask app. I picked Gunicorn, a decent enough WSGI HTTP server for Python apps. Here's the entire Dockerfile:

FROM ubuntu:18.04

MAINTAINER Kaashif Hymabaccus "kaashif@kaashif.co.uk"

RUN apt-get update && \
    apt-get install -y gunicorn3 python3-flask python3-psycopg2

COPY ./web /web
WORKDIR /web

EXPOSE 8000
ENTRYPOINT [ "gunicorn3" ]
CMD [ "-b", "0.0.0.0:8000", "app:app" ]

Building the container just involves copying the scripts and assorted goodies (in the web directory) into the container. Running the container just runs the web app.
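The app:app argument in the CMD line is Gunicorn's MODULE:VARIABLE convention: it imports the module app (so /web/app.py) and serves the WSGI callable named app inside it. A Flask application object is such a callable. As a rough stand-in that needs nothing outside the standard library (this is not the real app, just the smallest thing Gunicorn would accept), app.py could look like this:

```python
def app(environ, start_response):
    """A stand-in WSGI application; the real one is a Flask app."""
    body = b"Hello from the web container\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI applications return an iterable of byte strings.
    return [body]
```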

Getting this up and running

$ docker-compose up

No, really, it's that easy! This will build the containers and start them in the right order for you. With full fault-tolerance, isolation and so on. Feels too easy.

Conclusion

Docker is great and I hope to use it in more projects in the future. Maybe at some point I'll make it big and have to delve into the world of Kubernetes and Docker Swarm - orchestration on a larger scale.

Until then, I'm happy with this small-scale success and I highly encourage you to try containerizing your web apps.

Happy hacking!

HP PA-RISC Assembly Crash Course

2019-04-18T00:00:00Z

Since I have access to a machine that has the PA-RISC architecture, I thought I'd compile some test programs and see what sort of assembly code gets produced. Some highlights:

A neat way to manage the stack pointer (and one surprise)
Every instruction seems to be shorthand for an or instruction
Completers - a weird way of giving switches to your instructions

PA-RISC is considerably less popular than x86, MIPS, PowerPC, even SPARC. And being a RISC architecture means that humans hardly ever wrote assembly for it themselves. Most of the time, programmers probably never even gave (past tense since PA-RISC is dead) their binaries a second glance. Or really even any kind of look.

Well, that's about to change! The first program we'll look at is, of course, hello world.

Hello, world!

I wanted to compile, as a showcase, a program with some nontrivialities. That means function calls, string literals and non-leaf and leaf functions (functions that do/don't call other functions). This will hopefully let us discover the quirks of PA-RISC in a controlled environment. Here is the first one, which has a leaf function call with no arguments and a function call with an argument:

#include <stdio.h>

int f() {
        return 0;
}

int main() {
        printf("Hello, world!\n");
        return f();
}

And here is the binary, compiled with gcc -O0 -g test.c and dumped with objdump -S:

000105a8 <f>:
#include <stdio.h>

int f() {
   105a8:       08 03 02 41     copy r3,r1
   105ac:       08 1e 02 43     copy sp,r3
   105b0:       6f c1 00 80     stw,ma r1,40(sp)
        return 0;
   105b4:       34 1c 00 00     ldi 0,ret0
}
   105b8:       34 7e 00 80     ldo 40(r3),sp
   105bc:       4f c3 3f 81     ldw,mb -40(sp),r3
   105c0:       e8 40 c0 02     bv,n r0(rp)

000105c4 <main>:

int main() {
   105c4:       6b c2 3f d9     stw rp,-14(sp)
   105c8:       08 03 02 41     copy r3,r1
   105cc:       08 1e 02 43     copy sp,r3
   105d0:       6f c1 00 80     stw,ma r1,40(sp)
        printf("Hello, world!\n");
   105d4:       23 88 10 00     ldil L%10800,ret0
   105d8:       37 9a 01 b0     ldo d8(ret0),r26
   105dc:       e8 5f 1a ed     b,l 10358 <_end_init+0x14>,rp
   105e0:       08 00 02 40     nop
        return f();
   105e4:       e8 5f 1f 7d     b,l 105a8 <f>,rp
   105e8:       08 00 02 40     nop
}
   105ec:       48 62 3f d9     ldw -14(r3),rp
   105f0:       34 7e 00 80     ldo 40(r3),sp
   105f4:       4f c3 3f 81     ldw,mb -40(sp),r3
   105f8:       e8 40 c0 02     bv,n r0(rp)

Only the relevant part is included. Now, we have to go through the PA-RISC ISA Reference Manual and decipher what all of this means.

Note that there are some examples of C programs and resulting assembly in that manual, but they aren't explained too much since the manual is supposed to be a reference, not a beginner's guide. It's also a bit long, over 400 pages.

Also, I have no idea how to get my hands on the C compilers and assemblers they used, so I can't verify any of their examples. Moving on.

Registers

All registers are 64 bits wide on PA-RISC 2.0 CPUs (like the one I have).

If you recall my article about SPARC assembly, you'll notice that it's almost entirely about register windows and related coolness. There is no such magic on PA-RISC, it is rather similar to x86 in that respect. That is, there are just a number of registers and you have to just remember what they're for.

Luckily, there are some helpful synonyms on page 28 of the manual. Here are the important ones:

ret0 is r28, the return value. This is set when a function wants to return something, as we will see.
sp is r30, the stack pointer. There is something weird about how this is used in the above code which I'll go over later. Can you guess what it is?
rp is r2, the return pointer. This is the link register, holding the address a function returns to.

Next comes the argument convention, which is a bit odd: r26 is arg0, r25 is arg1, r24 is arg2, r23 is arg3. Yes, it's numbered backwards for some unusual reason.

Now we can get started deciphering the code.

Which way does your stack grow?

On x86, the stack usually grows downwards. This means if you are at address 10 and you need more space, you, by convention, decrease the stack pointer (move it towards zero). The heap starts at the bottom and grows up. It's the same way on SPARC, PowerPC, MIPS and so on.

memory addresses growing -->
+---------------------------------------------------------------+
| heap grows -->                               <-- stack grows  |
+---------------------------------------------------------------+

On PA-RISC, somehow the convention is the opposite - you increase the stack pointer to allocate more memory. The heap starts at the top and grows down.

memory addresses growing -->
+---------------------------------------------------------------+
| stack grows -->                               <-- heap grows  |
+---------------------------------------------------------------+

This doesn't really matter and isn't a cool feature in any way. It's an interesting difference from the norm, though.

A leaf function

Leaf functions are simple, since we don't have to worry about setting up the registers for callees, we can just try our best to avoid messing things up for the caller and we're good.

int f() {
   105a8:       08 03 02 41     copy r3,r1
   105ac:       08 1e 02 43     copy sp,r3
   105b0:       6f c1 00 80     stw,ma r1,40(sp)

This is us saving the stack pointer: the old r3 is stashed in r1 and r3 takes a copy of sp. While copy may seem self-explanatory, it is actually a pseudo-operation, meaning the hardware doesn't know about it. Instead, copy x,y is shorthand for or x,0,y, which ors x with 0 and stores it in y.

stw,ma r1,40(sp) stores the value of the register r1 on the stack, using sp as the base and 40 as the displacement (objdump prints displacements in hex, so that's 0x40, or 64 bytes). Note that we have the x86-like memory address addition syntax. We can't do multiplications, though, so there is no shortcut to accessing arrays like on x86, where you can write 5*eax+2 into a mov instruction. The stw instruction means "store word", fairly self explanatory. But what does ,ma mean?

In some PA-RISC instructions, there are two bits labeled m and a. If you use the completer (what the ,ma or ,mb part is called), then this sets them in certain ways. What exactly this means varies for each instruction.

In our case, ,ma means "modify after". This refers to whether the base register is modified before or after the effective address is calculated. Modify after means the effective address is just the base, and only then is the displacement added to the base (actually modifying the base register). ,mb or modify before computes the base + displacement and uses this as both the final effective address and the value to write into the base register.

There's a diagram on page 113 of the manual.

This might seem like a pain, but this is essentially designed to make stack pointer manipulation a breeze: using modify before/after, the stack pointer can manage itself!

In this case, ,ma means the stack pointer is updated essentially automatically after we save r1.
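To convince myself of the before/after semantics, here is a toy model of f's prologue and epilogue. It's a sketch under some assumptions: registers and memory are plain dictionaries, the starting values of sp and r3 are made up, and the displacement 40 in the listing is treated as hex (0x40), which is how objdump prints it. The helper names are mine, not anything from the manual.

```python
def stw_ma(regs, mem, src, disp, base):
    """stw,ma src,disp(base): store at the base, then bump the base."""
    mem[regs[base]] = regs[src]
    regs[base] += disp


def ldw_mb(regs, mem, disp, base, dst):
    """ldw,mb disp(base),dst: bump the base first, then load from it."""
    regs[base] += disp
    regs[dst] = mem[regs[base]]


def run_f(sp=0x1000, r3=0xAAAA):
    """Step through f, returning (state after prologue, final state, memory)."""
    regs = {"sp": sp, "r3": r3, "r1": 0, "ret0": 0xDEAD}
    mem = {}
    regs["r1"] = regs["r3"] | 0          # copy r3,r1  (or r3,0,r1)
    regs["r3"] = regs["sp"] | 0          # copy sp,r3  (or sp,0,r3)
    stw_ma(regs, mem, "r1", 0x40, "sp")  # stw,ma r1,40(sp)
    after_prologue = dict(regs)
    regs["ret0"] = 0                     # ldi 0,ret0
    regs["sp"] = regs["r3"] + 0x40       # ldo 40(r3),sp
    ldw_mb(regs, mem, -0x40, "sp", "r3")  # ldw,mb -40(sp),r3
    return after_prologue, regs, mem
```

After the prologue the old r3 sits at the original sp and sp has moved up by 0x40; after the epilogue both sp and r3 are back to the values they had on entry.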

Next, we need to return:

return 0;
   105b4:       34 1c 00 00     ldi 0,ret0

Again, this seems self explanatory: load 0 into ret0, right? But no, there is a little more going on here. The "instruction" ldi i,r (load immediate) is actually a pseudo-operation that generates an instruction ldo i(0),r. ldo d(b),t is the load offset instruction, which calculates the offset given by the expression d(b) and loads this into t.

In our case, ldi 0,ret0 calculates the offset 0(0), which is 0, and loads this into ret0. Due to the instruction encoding requiring all instructions to be 32 bits long (a common design decision in RISC architectures), the immediate d is limited to 14 bits in length.

   105b8:       34 7e 00 80     ldo 40(r3),sp
   105bc:       4f c3 3f 81     ldw,mb -40(sp),r3
   105c0:       e8 40 c0 02     bv,n r0(rp)

This loads 40+r3 into sp, then uses the ldw,mb instruction (a plain ldw with the modify before completer, not a pseudo-operation) to pop a value off the stack (updating the stack pointer appropriately) into r3. You'll notice that this is the value we saved earlier. This is because r3 is callee-saved.

Also, r1 is caller-saved, so we don't have to worry about restoring it. That wasn't really that bad, right?

The main course

The main function showcases two features: calling a non-leaf function (printf) and calling a leaf function (f). Here we go:

000105c4 <main>:

int main() {
   105c4:       6b c2 3f d9     stw rp,-14(sp)
   105c8:       08 03 02 41     copy r3,r1
   105cc:       08 1e 02 43     copy sp,r3
   105d0:       6f c1 00 80     stw,ma r1,40(sp)

Again, we save r3 and update the stack appropriately. This time we also store rp at sp-14 first, since the calls we're about to make will overwrite it.

printf("Hello, world!\n");
   105d4:       23 88 10 00     ldil L%10800,ret0
   105d8:       37 9a 01 b0     ldo d8(ret0),r26
   105dc:       e8 5f 1a ed     b,l 10358 <_end_init+0x14>,rp
   105e0:       08 00 02 40     nop

Here's the juicy bit. The string is stored in the data segment, so we use the ldil instruction to "load immediate into left". This means we load the immediate (the upper part of some pointer into the data segment) into the left part of the ret0 register. The left part, in this case, is the top 21 bits of the 32-bit word; the bottom 11 bits are zeroed.

Next, ldo adds the remaining low bits (d8 in hex) to ret0 and writes the result, the full address of the string (imagine it's a char *), to r26, which is arg0, the first argument of printf.
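The way a 32-bit address gets built in two steps can be sketched in a few lines. The assumption here is that ldil supplies the top 21 bits and ldo adds the bottom 11, which matches the L%10800 and d8 in the listing; I'm ignoring the assembler's rounding variants of these selectors, and the function names are made up.

```python
LOW_BITS = 11  # ldil fills the upper 21 bits of a 32-bit word


def split_address(addr):
    """Split a 32-bit address into the parts used by ldil and ldo."""
    low_mask = (1 << LOW_BITS) - 1
    left = addr & ~low_mask & 0xFFFFFFFF  # what ldil L%... loads
    right = addr & low_mask               # what ldo adds afterwards
    return left, right


def ldil_then_ldo(left, right):
    """Rebuild the address the way the two instructions do."""
    reg = left          # ldil L%left,reg
    return reg + right  # ldo right(reg),dst
```

Splitting 0x108d8 this way gives 0x10800 and 0xd8, the two constants in the disassembly.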

The branch and link b,l instruction branches (i.e. unconditionally jumps to the address given) but also places the return point into the register rp, the link register.

The delay slot is the instruction right after the branch, which is executed before the branch/jump actually happens. In this case it's a nop, so nothing happens. But there is more to this nop than meets the eye: it's a pseudo-instruction! It really means or 0,0,0, which is a nop since nothing is changed.
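That's three pseudo-operations so far: copy, ldi and nop. As a little macroexpand-style toy, here they are written out as a function. It only knows about these three, uses the same register notation as the text above, and leaves everything else untouched; expand is a name I made up, not a real tool.

```python
def expand(instr):
    """Rewrite the pseudo-operations seen so far into real instructions."""
    op, _, rest = instr.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "copy":
        src, dst = args
        return f"or {src},0,{dst}"
    if op == "ldi":
        imm, dst = args
        return f"ldo {imm}(0),{dst}"
    if op == "nop":
        return "or 0,0,0"
    # Anything else is assumed to be a real instruction already.
    return instr
```

Running it over the listing turns copy r3,r1 into or r3,0,r1 and ldi 0,ret0 into ldo 0(0),ret0, while stw and friends pass through unchanged.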

        return f();
   105e4:       e8 5f 1f 7d     b,l 105a8 <f>,rp
   105e8:       08 00 02 40     nop
}

Using the branch and link instruction, it's very easy to call f. It sets ret0, so no need to set it ourselves. Now there's only one thing left to do...

   105ec:       48 62 3f d9     ldw -14(r3),rp
   105f0:       34 7e 00 80     ldo 40(r3),sp
   105f4:       4f c3 3f 81     ldw,mb -40(sp),r3
   105f8:       e8 40 c0 02     bv,n r0(rp)

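We restore rp, then sp and r3, which we (the callee) are responsible for preserving! There is a new instruction here, though, bv. This is a vectored branch, which sounds interesting. In actual fact, bv,n x(b) just means that we jump to b added to x left shifted by 3 bits. Since x is r0 here, which always reads as zero, we just jump to the address in rp. The ,n completer nullifies the instruction in the delay slot, so no nop is needed after it.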
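The target calculation for bv is small enough to write down directly. In this sketch the register file is a dictionary and the values are invented; the shift by 3 means the index register counts in units of 8 bytes, which I assume is there to make jump tables of two-instruction entries convenient.

```python
def bv_target(regs, x, b):
    """Target of bv x(b): base register plus index register times 8."""
    return regs[b] + (regs[x] << 3)
```

With x being r0 (always zero) this collapses to a plain jump to rp, which is all our return needs. With a base of 0x2000 and an index of 3 it would land on 0x2018.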

That's a full program in PA-RISC assembly!

Conclusions

There are some commonalities with both x86 and SPARC.

SPARC:

Link registers
Everything is a pseudo-instruction
Delay slots

x86:

Two operand instructions
immediate(register) syntax, although no multiplications
Lots of arithmetic is done using instructions supposedly meant for calculating addresses.

Overall, I would say that PA-RISC isn't really that cool of an architecture at first glance. It doesn't have anything extra exciting like SPARC's register windows except completers maybe, but those are more confusing than anything.

There's probably tons I've missed out, but I have a feeling that there won't be hordes of HP aficionados chasing me down.

Reviving an HP PA-RISC server

2019-04-13T00:00:00Z

A while ago, I got my hands on a beast of a machine, a 7U HP L3000 (rp5470) PA-RISC server. These were released in the year 2000 and came with up to 16GB (whoa) of RAM and up to 4 CPUs.

The best site for information on PA-RISC machines is, no doubt, OpenPA.net, and they have a fantastic page on my machine.

This is the story of how I managed to install Gentoo GNU/Linux on this classic UNIX server.

OK, maybe classic is too strong a word, but it is a fairly unique machine. I've written a few posts about SPARC machines and PA-RISC is in the same vein - a RISC CPU architecture with machines and OS both sold by a single vendor. In this case, the vendor is HP, the CPU has the PA-RISC architecture, and the OS is (or was) HP-UX.

I don't have any disks of HP-UX around and HP doesn't provide them on their website. Oracle (!!) provides Solaris freely, but I had no such luck with HP. Using OpenPA.net to work out which OSs were compatible with my L3000, I eventually settled on Gentoo GNU/Linux.

The Guardian Service Processor

Similarly to most enterprise servers, there is a kind of service processor that you can connect to the network and use to access the console, administrate the server, etc, without powering it on.

HP calls it the Guardian Service Processor or GSP for short.

The first task for me was to reset the password. I had to open up the server and press the GSP reset button. I then connected the GSP to the network, determined its IP address, and was able to telnet in. This is what it looks like:

$ telnet gsp.gondolin
Trying 192.168.1.18...
Connected to gsp.gondolin.int.kaashif.co.uk.
Escape character is '^]'.

Service Processor login:
Service Processor password:




             Hewlett-Packard Guardian Service Processor

  (c) Copyright Hewlett-Packard Company 1999-2001.  All Rights Reserved.

                      System Name: gsp



*************************************************************************
                        GSP ACCESS IS NOT SECURE
   No GSP users are currently configured and remote access is enabled.
             Set up a user with a password (see SO command)
                                   OR
       Disable all types of remote access (see EL and ER commands)
*************************************************************************

You can just hit enter twice to login without a user or password, since those were reset.

Booting from a CD

Luckily, my server has a working CD drive, so I could download the Gentoo minimal installation CD (https://wiki.gentoo.org/wiki/Handbook:HPPA/Installation/Media), pop it in, and get started.

The first step is to power on the machine. To do this, I connected to the console through telnet, pressed CTRL-E then typed CF to activate console write access, and pressed CTRL-B to access the GSP prompt:

[Read only - use ^Ecf for console write access.]

[bumped user -  ]


Leaving Console Mode - you may lose write access.
When Console Mode returns, type ^Ecf to get console write access.

GSP Host Name:  gsp
GSP>

Now you can type he for a help menu. The list of commands is:

==== GSP Help ============================================(Administrator)===
AC  : Alert display Configuration       MS  : Modem Status
AR  : Automatic System Restart config.  PC  : Remote Power Control
CA  : Configure asynch/serial ports     PG  : PaGing parameter setup
CL  : Console Log- view console history PS  : Power management module Status
CO  : COnsole- return to console mode   RS  : Reset System through RST signal
CSP : Connect to remote Service Proc.   SDM : Set Display Mode (hex or text)
DC  : Default Configuration             SE  : SEssion- log into the system
DI  : DIsconnect remote or LAN console  SL  : Show Logs (chassis code buffer)
EL  : Enable/disable LAN/WEB access     SO  : Security options & access control
ER  : Enable/disable Remote/modem       SS  : System Status of proc. modules
EX  : Exit GSP and disconnect           TC  : Reset via Transfer of Control
HE  : Display HElp for menu or command  TE  : TEll- send a msg. to other users
IT  : Inactivity Timeout settings       VFP : Virtual Front Panel display
LC  : LAN configuration                 WHO : Display connected GSP users
LS  : LAN Status                        XD  : Diagnostics and/or Reset of GSP
MR  : Modem Reset                       XU  : Upgrade the GSP Firmware

====
(HE for main help, enter command name, or Q to quit)

The important command is PC, for remote power control. Turn the power switch on, then execute the PC command to turn it on.

If you're doing this for the first time, you'll need to follow the instructions in the Gentoo page to install Gentoo. That is out of the scope of this post.

The interesting parts of the boot process are the hardware detection:

Firmware Version  42.06

Duplex Console IO Dependent Code (IODC) revision 1

------------------------------------------------------------------------------
   (c) Copyright 1995-2000, Hewlett-Packard Company, All rights reserved
------------------------------------------------------------------------------

  Processor   Speed            State           CoProcessor State  Cache Size
  Number                                       State              Inst    Data
  ---------  --------   ---------------------  -----------------  ------------
      0      550  MHz   Active                 Functional         512 KB   1 MB

  Central Bus Speed (in MHz)  :        133
  Available Memory            :    2097152  KB
  Good Memory Required        :      25000  KB

   Primary boot path:    0/0/1/1.2
   Alternate boot path:  0/0/2/0.2
   Console path:         0/0/4/1.0
   Keyboard path:        0/0/4/0.0


Processor is booting from first available device.

To discontinue, press any key within 10 seconds.

10 seconds expired.
Proceeding...

Then Linux starts booting. There are a ton of errors, but somehow it boots up fine and gives me a login prompt.

Can you actually use it for anything?

It would be a really bad idea to use this server for anything real. It's huge, it's heavy, it has a slow CPU. There's not really anything special or revolutionary about the CPU architecture, as far as I can tell. I haven't measured it, but it probably uses a few thousand watts.

There aren't really any binary packages available for Gentoo, so you have to compile everything, which is a huge time sink. Debian might be a better choice in this regard.

Let's get on to the really important question:

Why not OpenBSD?

There's no support for hppa64! You may wonder, then, how did Linux get support? HP helped out. They supplied documentation and code, eventually leading to Debian and Gentoo being ported (and they both still work on hppa, to this day!).

OpenBSD has support for most workstations, 32 bit and 64 bit running in 32 bit mode. Server support is a bit lacking, but this is understandable given the lack of hardware and interest.

What does the machine code look like?

I'm glad you asked. After snooping around the binaries, I objdumped some interesting-looking ones. Here's /bin/sh:


/bin/sh:     file format elf32-hppa-linux


Disassembly of section .init:

000112e4 <.init>:
   112e4:       6b c2 3f d9     stw rp,-14(sp)
   112e8:       6f c4 00 80     stw,ma r4,40(sp)
   112ec:       6b d3 3f c1     stw r19,-20(sp)
   112f0:       e8 40 14 78     b,l 11d34 <_GLOBAL_OFFSET_TABLE_@@Base-0x19fa0>,rp
   112f4:       08 00 02 40     nop
   112f8:       e8 4b 0b a0     b,l 278d0 <_GLOBAL_OFFSET_TABLE_@@Base-0x4404>,rp
   112fc:       08 00 02 40     nop
   11300:       4b c2 3f 59     ldw -54(sp),rp
   11304:       08 04 02 53     copy r4,r19
   11308:       e8 40 c0 00     bv r0(rp)
   1130c:       4f c4 3f 81     ldw,mb -40(sp),r4

Disassembly of section .text:

00011310 <.text>:
   11310:       2b 60 00 00     addil L%0,dp,r1
   11314:       48 35 06 50     ldw 328(r1),r21
   11318:       ea a0 c0 00     bv r0(r21)
   1131c:       48 33 06 58     ldw 32c(r1),r19
   11320:       2b 60 00 00     addil L%0,dp,r1

Wow, I don't recognise any of those instructions!

Expect a blog post in the near future (less than a year) explaining some quirks of PA-RISC. I'm sure there's no shortage of weirdness and oddities... Maybe there'll even be a cool feature or two not found in modern processors.

If I ever get my hands on a working copy of HP-UX, expect a post about that, too.

Using PostgreSQL to search transcripts

2019-03-31T00:00:00Z

Remember my transcript search engine, https://transcripts.kaashif.co.uk?

I'm not a database expert, but even I realised that spending hours and hours trying to optimise my homebrew database transcript search engine was a waste of time. For no reason at all other than to try it out, I went with ElasticSearch. The astute reader will notice that ElasticSearch is meant for "big" data. My collection of transcripts tops out at a few dozen megabytes at most - this is most certainly not big (or even medium-sized) data, really.

So after getting some real-world experience with SQL databases (at a real company) and taking an in-depth database algorithms course at university, I decided to convert my web app to use a PostgreSQL database to store the transcripts, searching with bog-standard SQL queries.

There were a couple of neat tricks I used to speed things up, too.

The ultimate optimisation

I wrote an article about how I optimised my transcript parser. Lots of great tips there, but the greatest optimisation of them all is not running the code at all. Why do we need any parsing done while the web app is running, anyway? Of course, we don't - the transcripts are static and unchanging, we can just pre-parse everything.

What happens now is the parser runs once and parses the transcripts into a table with these columns:

create table raw_transcripts (
  series text,
  season_number int,
  episode_number int,
  episode_title text,
  scene_number int,
  line_number int,
  person text,
  present text,
  place text,
  speech text);

The old data was nested: a series has episodes, an episode has scenes, each scene has lines. Scenes have the same people present and take place in a single place. So of course, there is a lot of redundancy introduced when we de-nest the transcript structure like this.

The key thing to notice is that the data is so small that none of this matters!
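To make the de-nesting concrete, here is a rough sketch of turning a nested series into flat rows with the same column order as the table above. The real parser is written in Haskell; this is only an illustration, and the nested layout (dicts of seasons and episodes, scenes as dicts) and the per-episode line numbering are assumptions I made up for the example.

```python
def flatten(series_name, seasons):
    """Yield one flat row per line of speech.

    `seasons` is a hypothetical nested structure:
    {season_number: {episode_number: (title, [scene, ...])}}
    where each scene is a dict with "present", "place" and "lines",
    and each line is a (person, speech) pair.
    """
    for season_number, episodes in sorted(seasons.items()):
        for episode_number, (title, scenes) in sorted(episodes.items()):
            line_number = 0  # assumed to count lines within an episode
            for scene_number, scene in enumerate(scenes, start=1):
                for person, speech in scene["lines"]:
                    line_number += 1
                    # present and place are repeated on every row of
                    # the scene: this is the redundancy mentioned above.
                    yield (series_name, season_number, episode_number,
                           title, scene_number, line_number, person,
                           scene["present"], scene["place"], speech)
```

Every row of a scene carries the same present and place values, which is exactly the duplication that turns out not to matter at this size.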

Instead of doing some parsing for each request, we now just do some database calls. An upside to this is that the Haskell code only runs once, offline, so it can be written for maximum readability rather than speed.

Essentially, the whole of that optimisation article is obsolete.

Expected workload

We need to know what the expected workload is so that we can organise our indices, primary keys, clustering, etc. This has a huge impact on performance.

In a typical session with the transcript search engine, a user will search for a fragment of speech, maybe with place, speaker, episode, series or season specified. Indexing on speech is essentially useless - a user will probably never want to search for a line of speech using exact equality. Since this is the most important column, indexing is basically useless for speeding up searches, we'll always have to do a full table scan.

While the other columns like season number and episode number seem mildly indexable, the reality is that a user will basically never use these. No-one remembers a line by which season it was in, they usually just remember some fragment of the line.

The most important optimisation here is related to the form the user expects the results in: the line of speech with some surrounding context.

Context is for kings

Our web app wants to make queries that look like:

SELECT * FROM transcripts WHERE speech ILIKE '%somestring%';

As discussed before, this is always going to incur a full table scan, indices can't help.

Then we want to get the context:

SELECT * FROM transcripts
WHERE line_number >= x-2 AND line_number <= x+2;

But here is an opportunity to cluster the table. Why not order the lines on-disk according to their line numbers? Then we can simply look at the surrounding records, which have probably already been loaded into memory with the matching record - databases load records into memory in batches or pages.

To take advantage of this, I associate an ID to each line which is globally unique and orders each line in chronological (as they appear in each script) order.

An informed reader might notice that PostgreSQL doesn't really support clustered indices. This is true, but we can cluster the table on an index and, as long as the table never changes (this is true in our case), the clustering will be maintained. Inasmuch as never interacting with the order counts as "maintaining" it.

In PostgreSQL, this is easy, just add an id column as the PRIMARY KEY, and you get indexing automatically. We can cluster the table using CLUSTER transcripts USING transcripts_pkey (assuming the primary key index has PostgreSQL's default name). Since we are exclusively going to be doing very small range queries (maybe for 5 records max), we might want a hash index or something. But this never caused a performance issue, so I left it as-is, with the default B-Tree index.
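The search-then-context pattern can be tried out without a Postgres server. This sketch swaps in SQLite (from Python's standard library) and a cut-down table with only id, person and speech, so it is not the real schema or the real app code. SQLite's LIKE is case-insensitive for ASCII by default, which stands in for ILIKE here, and the function names are invented.

```python
import sqlite3


def build_db(lines):
    """Load (person, speech) pairs into an in-memory table, ids in script order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transcripts "
        "(id INTEGER PRIMARY KEY, person TEXT, speech TEXT)"
    )
    conn.executemany(
        "INSERT INTO transcripts (id, person, speech) VALUES (?, ?, ?)",
        [(i, p, s) for i, (p, s) in enumerate(lines, start=1)],
    )
    return conn


def search_with_context(conn, fragment, radius=2):
    """Find lines containing `fragment`, each with `radius` lines either side.

    Note: % and _ in `fragment` are treated as wildcards by LIKE.
    """
    hits = conn.execute(
        "SELECT id FROM transcripts WHERE speech LIKE ? ORDER BY id",
        (f"%{fragment}%",),
    ).fetchall()
    results = []
    for (hit_id,) in hits:
        # The small range query on the primary key: this is the part
        # that benefits from neighbouring lines being stored together.
        context = conn.execute(
            "SELECT id, person, speech FROM transcripts "
            "WHERE id BETWEEN ? AND ? ORDER BY id",
            (hit_id - radius, hit_id + radius),
        ).fetchall()
        results.append((hit_id, context))
    return results
```

The first query is the unavoidable full scan; the second only ever touches a handful of adjacent ids.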

End result

I switched out the Haskell web app stuff with a bog-standard Python web app written with Flask about a hundred lines long. I'd never want to write a parser in Python and I didn't really want to write a web app in Haskell, so this situation is much better.

Performance is through the roof compared to the old handrolled searching code and even the ElasticSearch engine (I'm sure it's fantastic for big data...).

What an effect a tiny bit of database knowhow can have!

Register windows: a cool feature of SPARC

2018-08-11T00:00:00Z

Everyone's studied x86 assembly (just objdump any program on your PC...) and maybe even some ARM or MIPS in a class somewhere, but there are a few features that exist in some CPUs that don't exist at all in any of these designs.

I'm talking about register windows! When you call a function on SPARC, the new function just magically gets its own registers neatly separated into input registers, output registers and local registers. You're allowed to mess up your local registers as much as you want and the CPU does all of the saving and swapping for you.

No more weird arbitrary calling conventions about r10 and r11 being caller-saved, rax being return, rqb being Cthulhu-saved, rpqwuqew being quantum entangled with r554 on Tuesdays...

History

You could just go to the Wikipedia article about these, there is some good info there. The basic rundown is that the idea of register windows originated with the Berkeley RISC design back in the first half of the 80s, then they were implemented in a few architectures of which SPARC is the only (barely) surviving example.

This post isn't supposed to be about history, it's supposed to be about actual nitty-gritty assembly code, so let's get to it.

How are the registers laid out?

Each window consists of 8 input registers (i0 to i7), 8 local registers (l0 to l7) and 8 output registers (o0 to o7).

There are also 8 global registers (g0 to g7) which are visible at all times. g0 is actually just 0 all the time and there is some spooky stuff going on with the others: g1 to g5 might change between caller and callee, so can't be used to pass parameters. g6 and g7 are reserved for OS use, so don't use them.

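Then there's sp, the stack pointer. It looks global, but it's really just another name for o6 (and the frame pointer, fp, is i6), so it moves with the window like the rest of the output registers.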

The registers are seen by functions like this:

input
local
output input
       local
       output input
              local
              output ...

This may be a bit confusing. By this diagram, I mean that if you are a function f and you call a function g, if g switches to the next window, g will "see" different registers. It will see your output registers as its input registers and it will have its own local and output registers.

This means a callee can clobber the caller's output registers (e.g. to return values), but cannot even see the caller's input and local registers. In fact, no other function can see your local registers if it's in a different register window.

This is nice because it's not like x86 where there is just a convention on which registers are for what: this differs between operating systems and no program really has to follow it. On SPARC there is an easy and powerful way for functions to have their own registers.
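The overlap between windows is easier to see in a toy model. This is a sketch, not how the hardware is actually organised: the global registers are left out, the window pointer counts upwards on save purely for simplicity, and there is no handling for running out of windows (real hardware traps and spills to the stack). The class and method names are made up.

```python
class RegisterWindows:
    """Overlapping windows on top of one flat register file."""

    def __init__(self, windows=8):
        # Each window adds 16 new registers (ins + locals); the outs
        # are the ins of the next window, plus 8 spare at the end.
        self.file = [0] * (16 * windows + 8)
        self.cwp = 0  # current window pointer

    def _index(self, name):
        kind, n = name[0], int(name[1])
        offset = {"i": 0, "l": 8, "o": 16}[kind]
        return 16 * self.cwp + offset + n

    def get(self, name):
        return self.file[self._index(name)]

    def set(self, name, value):
        self.file[self._index(name)] = value

    def save(self):
        """Move to the next window: the old outs become the new ins."""
        self.cwp += 1

    def restore(self):
        """Go back to the previous window."""
        self.cwp -= 1
```

If the caller sets o0 to 42 and the callee does a save, the callee reads 42 from i0 and sees fresh locals; anything the callee writes to i0 shows up in the caller's o0 after restore.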

An actual program!

Let's write a program in C, here is the source code:

int g() {
        return 0;
}

int main(int argc, char *argv[]) {
        return g();
}

Compile with debug symbols and objdump it (I use -O1 because it gets rid of a lot of writing to memory that is pointless and unnecessary):

$ cc -g -O1 test.c
$ objdump -S a.out

There's a lot of output, here are the relevant bits:

0000000000000580 <g>:
int g() {
 580:   9c 03 bf 30     add  %sp, -208, %sp
        return 0;
}
 584:   90 10 20 00     clr  %o0
 588:   81 c3 e0 08     retl 
 58c:   9c 23 bf 30     sub  %sp, -208, %sp

0000000000000590 <main>:

int main(int argc, char *argv[]) {
 590:   9d e3 bf 30     save  %sp, -208, %sp
        return g();
 594:   40 08 01 7b     call  200b80 <g@plt>
 598:   01 00 00 00     nop 
}

Let's analyze this bit by bit, starting with the main function.

The `save` instruction

On SPARC, much like x86, the stack grows downwards. So if we want to grow the stack to give a new stack frame to our function, we want to subtract from the stack pointer. At the start of our main function, we want to grow the stack by 208 bytes, so we want to subtract 208 from %sp.

We also want to move to a new register window, where we will be able to see our input parameters: argc will be i0 and argv will be i1. And of course, we'll have our own local and output registers.

This is exactly what this does:

 590:   9d e3 bf 30     save  %sp, -208, %sp

This is called "save" because it saves the previous register window, making it inaccessible unless we go back to the previous window.

We have a stack frame and register window. Now what?

Calling a function

Now we call some memory address where g is. This is similar to x86, it's really just a jump plus some convenience - you can return.

But where is the return address kept? On x86, it's kept on the stack. On SPARC (and many other RISC architectures you may be familiar with), it's stored in a link register.

When you call a function, the address of the call instruction itself is written to your o7 register, and the callee later returns to that address plus 8. So when the callee executes a save, that address will be in its i7 register. No need to touch memory.

In fact, if you look it up in the SPARC V9 Manual, you'll find that call addr is literally synonymous with jmpl addr, %o7. jmpl means "jump and link": it writes the address of the jmpl instruction itself to the given register (in this case o7), then jumps to addr.

Why is there a nop? Delay slot!

This is due to something weird on SPARC known as the delay slot. When you do a branch, the branch doesn't happen right away, the CPU actually executes the instruction after the jmp or call or whatever, then branches. This means you can fill it with a useful instruction or just whack a nop in there if it's too confusing.

Inside g

We call g, then g executes this:

 580:   9c 03 bf 30     add  %sp, -208, %sp

This reserves some space on the stack without switching to a new register window. Notice that a new register window is not necessary since we do barely anything in g. In particular, we call no other functions, which means g is known as a "leaf" function.

Then we zero out the return value (we are returning 0):

 584:   90 10 20 00     clr  %o0

Notice that because we are not in another window, we just zero out our o0 which is the same as our caller's o0.

Now we need to return and we have a choice to make when we look up "return" in the manual: there are 2 return instructions, ret and retl. ret is for returning from functions that have gone to a new register window, it jumps to i7+8. retl is for leaf functions, it jumps to o7+8. Instructions are a fixed 4 bytes long, so adding 8 skips both the call instruction and the instruction in its delay slot. We're a leaf, so we use retl:

 588:   81 c3 e0 08     retl

Remember the delay slot! Before the branch happens, we get 1 instruction to do something. Let's use it to throw away our stack frame:

 58c:   9c 23 bf 30     sub  %sp, -208, %sp

Now the return value, 0, is in o0 and main can just leave it there and do nothing more to return 0.
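The delay slot and the link register are easier to follow with a trace. This is a minimal Python sketch of the mechanism, under a few assumptions: it models the PC/nPC pair that SPARC uses for delayed control transfers, it only understands call and retl (everything else is treated as a no-op), and it pretends the call goes straight to g at 0x580 rather than through the PLT. The run function and the tuple format are hypothetical, made up for this example.

```python
def run(program, start, max_steps=20):
    """Execute a {address: (op, arg)} program; return the addresses visited."""
    pc, npc = start, start + 4
    o7 = 0
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break  # fell off the end of our tiny program
        op, arg = program[pc]
        trace.append(pc)
        next_npc = npc + 4
        if op == "call":
            o7 = pc             # link: remember where the call was
            next_npc = arg      # the jump only takes effect after the delay slot
        elif op == "retl":
            next_npc = o7 + 8   # skip the call and its delay slot
        pc, npc = npc, next_npc
    return trace


PROGRAM = {
    0x594: ("call", 0x580),  # in main (simplified: direct call to g)
    0x598: ("nop", None),    # delay slot of the call
    0x580: ("add", None),    # g: reserve stack space
    0x584: ("clr", None),    # g: zero the return value
    0x588: ("retl", None),   # g: leaf return
    0x58c: ("sub", None),    # delay slot of retl: free stack space
}

trace = run(PROGRAM, 0x594)
print([hex(address) for address in trace])
# visits 0x594, 0x598, 0x580, 0x584, 0x588, 0x58c, then stops at 0x59c
```

Notice that the nop at 0x598 and the sub at 0x58c both run after the branch instruction before them but before control actually arrives at the branch target.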

Summary

There are a few mildly interesting things in this post you may not have seen before:

Register windows: stopping all of that confusing callee/caller-saved business
Leaf/non-leaf functions: you can still just not use register windows if you don't need them
Link registers: if you only stick to x86, having a register for the return address is a bit different
The delay slot: a quirk of SPARC, originating from the time when pipelines were simple and this let you save some stalling when a branch instruction comes along. Not really necessary nowadays (speculative execution in particular lets the processor just guess what's coming up). That's a whole 'nother blog post, though.

Credits

All of this information comes in part from my own experimentation and writing programs, but it all derives from the SPARC V9 Architecture Manual in the end. Props to Sun for writing some good documentation.

Porting OpenJK to sparc64

2018-01-13T00:00:00Z

It's a little-known fact that there is actually no way in C or C++ to do an unaligned access without invoking undefined behaviour. It's true! Read it yourself here:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned [...] for the referenced type, the behavior is undefined.

C11 (n1570) 6.3.2.3 p7

Sadly, the authors of many programs ignore this and rely on it working. Which it does, on x86, with little performance impact in most cases. On some architectures, like MIPS and PowerPC, unaligned access instructions exist but are slow. But on SPARC...unaligned access is impossible and leads to this:

$ openjk.sparc64
Bus error (core dumped)

Solving these issues with OpenJK is very difficult, especially considering Jedi Knight was never meant to run on SPARC (or indeed OpenBSD, but that's less of an issue).

Dealing with custom allocators

The Jedi Knight engine has a custom allocator, which is a bit of a pain considering it was written to RELY ON UNDEFINED BEHAVIOUR (read: uses unaligned reads). Let's examine the first SIGBUS we hit:

ZoneTailFromHeader(pMemory)->iMagic = ZONE_MAGIC;

This requires a bit of context. Currently, we are in the Z_Malloc function, with the signature:

void *Z_Malloc(int iSize, memtag_t eTag, qboolean bZeroit /* = qfalse */, int iUnusedAlign /* = 4 */)

This is a replacement for malloc, which takes a size, a tag saying what "sort" of memory this is, and a boolean saying whether to zero the memory.

pMemory is a pointer to a zoneHeader_t, which is this:

typedef struct zoneHeader_s
{
    int iMagic;
    memtag_t eTag;
    int iSize;
    struct zoneHeader_s *pNext;
    struct zoneHeader_s *pPrev;
} zoneHeader_t;

typedef struct
{
    int iMagic;

} zoneTail_t;

Why is this a problem? It's kind of subtle if you're not used to these kinds of bugs. We know that pMemory and all its members are aligned correctly: it was created earlier and initialised without any pointer casting or anything like that. The bug reveals itself when we look at ZoneTailFromHeader:

static inline zoneTail_t *ZoneTailFromHeader(zoneHeader_t *pHeader)
{
    return (zoneTail_t*) ( (char*)pHeader + sizeof(*pHeader) + pHeader->iSize );
}

So when we do this:

ZoneTailFromHeader(pMemory)->iMagic = ZONE_MAGIC;

What we're really doing is dereferencing a pointer derived from pHeader: it was cast to char *, had some values added to it, and was then cast to zoneTail_t *. Interpreting pHeader itself as something (e.g. int * or char *) would actually be fine: malloc gives us memory satisfying the strictest alignment requirements and we just made pHeader with malloc, so that's OK.

The problem arises when we start adding stuff to pHeader. The result, which we interpret as a zoneTail_t, might actually not lie on any particular boundary. In this case, I was getting a SIGBUS when the engine tries to allocate a block of size 11. Obviously that won't end up on any boundary, so when we try to access an int at that misaligned address (iMagic is an int):

ZoneTailFromHeader(pMemory)->iMagic = ZONE_MAGIC;

We get a SIGBUS! Mystery solved.
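The arithmetic can be checked without a SPARC machine. Here is a rough Python sketch using ctypes to mirror the header layout. It assumes memtag_t is the size of an int, and the names ZoneHeader, tail_offset and is_int_aligned are mine, not OpenJK's. The header size it computes is for whatever machine you run it on, not necessarily sparc64.

```python
import ctypes


class ZoneHeader(ctypes.Structure):
    # Mirrors zoneHeader_t; memtag_t is assumed to be int-sized.
    _fields_ = [
        ("iMagic", ctypes.c_int),
        ("eTag", ctypes.c_int),
        ("iSize", ctypes.c_int),
        ("pNext", ctypes.c_void_p),
        ("pPrev", ctypes.c_void_p),
    ]


HEADER_SIZE = ctypes.sizeof(ZoneHeader)
INT_ALIGN = ctypes.alignment(ctypes.c_int)


def tail_offset(size):
    """Byte offset of the tail from the start of the header."""
    return HEADER_SIZE + size


def is_int_aligned(offset):
    """True if an int stored at this offset would be correctly aligned."""
    return offset % INT_ALIGN == 0


print("header size:", HEADER_SIZE)
print("tail offset for an 11 byte block:", tail_offset(11))
print("aligned for int?", is_int_aligned(tail_offset(11)))
```

The header size is a multiple of the int alignment, so the header itself is harmless; it's the 11 in the middle that leaves the tail three bytes past a 4 byte boundary.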

The fix

The engine code in OpenJK (and the original code from Raven) is split into multiplayer and singleplayer. The mechanics of the game work differently in MP and there is no netcode in SP, so this makes some kind of sense. However, there is a huge downside: bugs which are noticed and fixed in one codebase might be overlooked in the other.

This isn't exactly what happened here, but it's close. In the SP code, in the SP version of Z_Malloc, I saw the following line:

    // Add in tracking info and round to a longword...  (ignore longword aligning now we're not using contiguous blocks)
    //
//  int iRealSize = (iSize + sizeof(zoneHeader_t) + sizeof(zoneTail_t) + 3) & 0xfffffffc;
    int iRealSize = (iSize + sizeof(zoneHeader_t) + sizeof(zoneTail_t));

This is interesting! The commented-out line rounds iRealSize (the size of the memory plus the zone metadata we add) up to the nearest multiple of 4. Simply restoring it doesn't quite work, since the zone layout is like this:

+--------+----------------------------------------------+------+
| header |       block of malloc'ed memory              | tail |
+--------+----------------------------------------------+------+

The SIGBUS comes when we access the tail. The end of the header is already aligned to a 4 byte boundary, just by looking at its members. The problem is the bit in the middle, which pushes the tail off of a 4 byte boundary.

We can solve this by rounding iSize:

// Round size of allocation up to multiple of 4 bytes
int iRoundedSize = (iSize + 3) & 0xfffffffc;
int iRealSize = (iRoundedSize + sizeof(zoneHeader_t) + sizeof(zoneTail_t));

Then replace iSize with iRoundedSize in the rest of the function.
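The rounding trick itself is easy to sanity-check. This is the same expression written in Python (round_up_4 is a made-up name, and ~3 stands in for the 0xfffffffc mask, which is equivalent for sizes that fit in 32 bits):

```python
def round_up_4(size):
    """Round size up to the next multiple of 4 (adding 3 then clearing the low 2 bits)."""
    return (size + 3) & ~3


for size in (0, 1, 4, 11, 12, 13):
    print(size, "->", round_up_4(size))
```

Sizes that are already a multiple of 4 are left alone, and everything else grows by at most 3 bytes, so the tail always starts on a 4 byte boundary relative to the end of the header.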

Did it work?

Yes, kind of. There are a few other similar errors with the memory allocator. We also have some "static" memory blocks, where the entire zone layout is written into a struct, like:

#define DEF_STATIC(_char) {ZONE_MAGIC,TAG_STATIC,2,NULL,NULL},{_char,'\0'},{ZONE_MAGIC}

This was kind of hard to spot a priori, but after running the game under gdb, the SIGBUSes were easier to track down.

The fix here is just to add more NULLs so the ZONE_MAGIC isn't misaligned.

The game does run now. There were a heap of other similar errors (mostly casting char * to int *), but nothing impossible to fix.

Isn't this still a bit dodgy?

Yes. I don't even see why the zone magic thing even exists. Was memory corruption that big a deal? Maybe it was on the GameCube or Xbox, where this code also ran.

Playing with LDoms, OpenBSD and Solaris

2017-12-03T00:00:00Z

A few weeks ago, I got my hands on a Sun T2000 server. It's got an UltraSPARC T1 CPU, 32 threads, 32 GB of memory, a Sun XVR-300 GPU and what sounds like a huge jet engine mounted at the front.

It's a great machine (although maybe not as a workstation...), and there are a few things unique to SPARC that I've really been looking forward to playing around with. Mostly LDoms (logical domains - Sun's virtualization technology), OpenBSD on a beefy sparc64 (compared to my older UltraSPARCs anyway) and Solaris (just as a curiosity).

Which OS to run on the bare metal?

I have a few choices here, but it essentially boiled down to OpenBSD or Solaris. There are a few factors to compare, but I settled on OpenBSD, for a few reasons.

Better software support. No software nowadays has Solaris in mind and I have no clue where to get updates for free. Oracle makes it impossible to download anything related to Solaris without having a support contract. OpenBSD has regular updates (for free) and a ports tree filled with up-to-date software.
I don't have any clue how to use Solaris and learning it is a waste of time.
If there's a problem, I won't know where to ask for help. I can't contact the devs and there isn't really a big Solaris community online.

And OpenBSD supports all the cool stuff I want to do fairly well anyway, so picking it is a no-brainer.

Someone not acquainted with the state of affairs when using old Sun hardware might ask why I'm not using some Linux distro. Surely there must be one out there that does what I need, and isn't Linux more popular and thus better supported than OpenBSD? Well there is a distro that does LDoms...Oracle Linux.

I think I'll pass on that one. And I can't find anyone running Debian as a primary domain on SPARC, so I guess you can't do it.

Installing the primary OS

This went so painlessly I don't even feel the need to write anything about it. Just put the disk in and everything works as expected. I guess you could netboot, it's not too difficult, but using an install DVD is way easier.

Setting up LDoms

This is actually really easy too. OpenBSD supports LDoms remarkably well, due to the efforts of Mark Kettenis (and others, probably). Take a look at this guide to see how I did it. With the T2000, there are a few differences, but these are confined to the ILOM/ALOM.

Where Ted types start /SYS, I type poweron. He types start /SP/console, I type console.

There is little real difference between the ALOM of the T2000 and the ILOM of the T5120 as far as I can tell. ldomctl from OpenBSD can't see the I/ALOM anyway, since that lives below OpenBoot.

I ended up with the following config:

domain primary {
    vcpu 16
    memory 16G
}
domain openbsd {
    vcpu 8
    memory 8G
    vdisk "/home/kaashif/vm/install62.iso"
    vdisk "/home/kaashif/vm/openbsd.img"
    vnet
}
domain solaris {
    vcpu 8
    memory 8G
    vdisk "/home/kaashif/vm/install-solaris-10.iso"
    vdisk "/home/kaashif/vm/solaris.img"
    vnet
}

Where solaris.img and openbsd.img are files created with GNU truncate (installed as gtruncate on OpenBSD):

$ gtruncate -s 10G openbsd.img

This creates a sparse file rather than actually filling some file with zeroes.
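If you don't have GNU truncate handy, the same effect can be had from Python. This is a small sketch: create_sparse_image is a hypothetical helper name, and whether the resulting file is actually sparse depends on the filesystem, so treat the block count it prints as informational only.

```python
import os


def create_sparse_image(path, size_bytes):
    """Create (or overwrite) a file of size_bytes without writing any data."""
    with open(path, "wb") as image:
        image.truncate(size_bytes)  # extends the file; no zeroes are written by us
    return os.stat(path)


if __name__ == "__main__":
    info = create_sparse_image("openbsd.img", 10 * 1024 ** 3)
    print("apparent size:", info.st_size)
    # st_blocks is not available on every platform, hence getattr
    print("blocks allocated:", getattr(info, "st_blocks", "unknown"))
```

On a filesystem that supports holes, the apparent size is 10G while the number of allocated blocks stays tiny until the guest starts writing to its disk.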

Networking

For this, I just bridged vnet0, vnet1 and em0 (the interface connected to my LAN):

# ifconfig bridge0 up add vnet0 add vnet1 add em0

This lets the LDoms speak to my LAN as if they were real. I haven't got DHCP to work, but that hasn't been an issue.

What to do with the LDoms?

As of now, I haven't actually got the Solaris LDom to boot. I boot from the install media, install, but then it refuses to boot from the disk. Since this was just a toy, I might just abandon it.

In future, I might give the Debian SPARC port a spin, just to see if it's not a steaming mess. Same goes for NetBSD. Expect a post or two in the future about these.

Dmesg!

Here is the dmesg from before I sliced it up with LDoms:

console is /virtual-devices@100/console@1
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2017 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.2-current (GENERIC.MP) #307: Wed Oct 11 15:17:26 MDT 2017
    deraadt@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/GENERIC.MP
real mem = 34225520640 (32640MB)
avail mem = 33610276864 (32053MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: SPARC Enterprise T2000
cpu0 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu1 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu2 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu3 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu4 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu5 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu6 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu7 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu8 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu9 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu10 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu11 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu12 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu13 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu14 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu15 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu16 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu17 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu18 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu19 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu20 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu21 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu22 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu23 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu24 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu25 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu26 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu27 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu28 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu29 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu30 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
cpu31 at mainbus0: SUNW,UltraSPARC-T1 (rev 0.0) @ 1200 MHz
vbus0 at mainbus0
"flashprom" at vbus0 not configured
cbus0 at vbus0
vldc0 at cbus0
vldcp0 at vldc0 chan 0x0: ivec 0x0, 0x1 channel "hvctl"
"ldom-primary" at vldc0 chan 0x1 not configured
"fmactl" at vldc0 chan 0x3 not configured
vldc1 at cbus0
"ldmfma" at vldc1 chan 0x4 not configured
vldc2 at cbus0
vldcp1 at vldc2 chan 0x14: ivec 0x28, 0x29 channel "spds"
"system-management" at vldc2 chan 0xd not configured
vcons0 at vbus0: ivec 0x111, console
vrtc0 at vbus0
"fma" at vbus0 not configured
"sunvts" at vbus0 not configured
"sunmc" at vbus0 not configured
"explorer" at vbus0 not configured
"led" at vbus0 not configured
"flashupdate" at vbus0 not configured
"ncp" at vbus0 not configured
vpci0 at mainbus0: bus 2 to 7, dvma map 80000000-ffffffff
pci0 at vpci0
ppb0 at pci0 dev 0 function 0 "PLX PEX 8532" rev 0xbc
pci1 at ppb0 bus 3
ppb1 at pci1 dev 1 function 0 "PLX PEX 8532" rev 0xbc
pci2 at ppb1 bus 4
em0 at pci2 dev 0 function 0 "Intel 82571EB" rev 0x06: ivec 0x795, address 00:14:4f:e1:c8:82
em1 at pci2 dev 0 function 1 "Intel 82571EB" rev 0x06: ivec 0x796, address 00:14:4f:e1:c8:83
ppb2 at pci1 dev 2 function 0 "PLX PEX 8532" rev 0xbc
pci3 at ppb2 bus 5
ppb3 at pci1 dev 8 function 0 "PLX PEX 8532" rev 0xbc: msi
pci4 at ppb3 bus 6
ppb4 at pci1 dev 9 function 0 "PLX PEX 8532" rev 0xbc
pci5 at ppb4 bus 7
mpi0 at pci5 dev 0 function 0 "Symbios Logic SAS1064E" rev 0x02: msi
mpi0: UNUSED, firmware 1.9.0.0
scsibus1 at mpi0: 63 targets
sd0 at scsibus1 targ 0 lun 0: <HITACHI, H101414SCSUN146G, SA25> SCSI3 0/direct fixed naa.5000cca00098ddfc
sd0: 140009MB, 512 bytes/sector, 286739329 sectors
vpci1 at mainbus0: bus 2 to 9, dvma map 80000000-ffffffff
pci6 at vpci1
ppb5 at pci6 dev 0 function 0 "PLX PEX 8532" rev 0xbc
pci7 at ppb5 bus 3
ppb6 at pci7 dev 1 function 0 "PLX PEX 8532" rev 0xbc
pci8 at ppb6 bus 4
ppb7 at pci8 dev 0 function 0 "Intel 41210 PCIE-PCIX" rev 0x09
pci9 at ppb7 bus 5
ebus0 at pci9 dev 2 function 0 "Acer Labs M1533 ISA" rev 0x00
com0 at ebus0 addr 3f8-3ff ivec 0x2: ns16550a, 16 byte fifo
ohci0 at pci9 dev 5 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7c1, version 1.0, legacy support
ohci1 at pci9 dev 6 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7c3, version 1.0, legacy support
pciide0 at pci9 dev 8 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc4: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7c4 for native-PCI interrupt
atapiscsi0 at pciide0 channel 0 drive 0
scsibus2 at atapiscsi0: 2 targets
cd0 at scsibus2 targ 0 lun 0: <TEAC, DW-224SL-R, 1.0B> ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
usb0 at ohci0: USB revision 1.0
uhub0 at usb0 configuration 1 interface 0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
usb1 at ohci1: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
ppb8 at pci8 dev 0 function 2 "Intel 41210 PCIE-PCIX" rev 0x09
pci10 at ppb8 bus 6
ppb9 at pci7 dev 2 function 0 "PLX PEX 8532" rev 0xbc
pci11 at ppb9 bus 7
em2 at pci11 dev 0 function 0 "Intel 82571EB" rev 0x06: ivec 0x7d6, address 00:14:4f:e1:c8:84
em3 at pci11 dev 0 function 1 "Intel 82571EB" rev 0x06: ivec 0x7d7, address 00:14:4f:e1:c8:85
ppb10 at pci7 dev 8 function 0 "PLX PEX 8532" rev 0xbc: msi
pci12 at ppb10 bus 8
radeondrm0 at pci12 dev 0 function 0 "ATI FireGL V3100" rev 0x80
drm0 at radeondrm0
radeondrm0: ivec 0x7d4
ppb11 at pci7 dev 9 function 0 "PLX PEX 8532" rev 0xbc: msi
pci13 at ppb11 bus 9
umass0 at uhub0 port 2 configuration 1 interface 0 "USB2.0 Flash Disk" rev 2.00/0.00 addr 2
umass0: using SCSI over Bulk-Only
scsibus3 at umass0: 2 targets, initiator 0
sd1 at scsibus3 targ 1 lun 0: <USB2.0, Flash Disk, 2.60> SCSI2 0/direct removable serial.1221323400000000847F
sd1: 998MB, 512 bytes/sector, 2043904 sectors
uhub2 at uhub1 port 1 configuration 1 interface 0 "Atmel UHB124 hub" rev 1.10/3.00 addr 2
vscsi0 at root
scsibus4 at vscsi0: 256 targets
softraid0 at root
scsibus5 at softraid0: 256 targets
bootpath: /pci@780,0/pci@0,0/pci@9,0/scsi@0,0/disk@0,0
root on sd0a (8b6ac481aa65550f.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: clock lost 6494 days -- CHECK AND RESET THE DATE!
BIOS signature incorrect 0 0
error: [drm:pid0:r100_cp_init_microcode] *ERROR* radeon_cp: Failed to load firmware "radeon-r300_cp"
error: [drm:pid0:r100_cp_init] *ERROR* Failed to load firmware!
drm:pid0:r300_startup *ERROR* failed initializing CP (-2).
drm:pid0:r300_init *ERROR* Disabling GPU acceleration
drm:pid0:radeon_bo_unpin *WARNING* 0x40027322390 unpin not necessary
radeondrm0: 1024x768, 8bpp
wsdisplay0 at radeondrm0 mux 1
wsdisplay0: screen 0 added (std, sun emulation)

Using vmm(4) to target old OpenBSD releases

2017-09-10T00:00:00Z

This server (the very one you are reading this post on), at the time of writing this post, runs OpenBSD 6.1-stable. It's fully patched and updated and everything, so it's a perfectly fine OS to run. But the VPS has limited memory and disk space, and the CPU isn't very fast, so compiling large projects on it, especially Haskell ones, is impractical.

This post describes a way to build fully-functional, dynamically linked (so you get all those security updates, super important for a public-facing web service), native Haskell binaries with cabal-install, Stack and...OpenBSD's (kind of) new native hypervisor, vmm.

Why can't I just do this on my local machine?

Well, I don't have any machines running 6.1-release or 6.1-stable, so that's right out. The reason I don't keep any around is that I need them all on -current to test ports and random patches I see on tech@. There have been a lot of major version bumps, not to mention the switch to clang and probably some other security features, and they all conspire to make 6.2-beta very binary incompatible with 6.1-stable.

Act 1: The setup

Very simple actually. I want to install 6.1 on a virtual machine with 4G of memory, 15G of disk (overkill, but disks are huge nowadays) and a network interface.

First, start up vmd:

# rcctl enable vmd
# rcctl start vmd
vmd(ok)

cd somewhere with more than 15G of free space (and get more than 4G of memory).

# vmctl create disk.img -s 15G

Get a ramdisk to boot from. We want the amd64 6.1 one:

$ ftp http://ftp.fr.openbsd.org/pub/OpenBSD/6.1/amd64/bsd.rd

Now spin up a VM (here we name it "vm61"):

# vmctl start vm61 -b bsd.rd -m 4G -d disk.img -i 1

Intermission: Networking

You do need some networking if you want to get anything useful done. If you read the vmctl man page, you'll notice that vmctl's -i option creates a network interface inside the VM which uses the vio(4) driver and a tap(4) interface in your host machine. Since you want the VM to access the real world, you probably want to bridge the tap and your real interface.

My real interface is re0, my tap interface is tap0. I created a bridge interface bridge0 as follows:

# cat > /etc/hostname.bridge0 <<EOF
add re0
add tap0
up
EOF
# sh /etc/netstart bridge0

Now the network is bridged and you can proceed as normal.

Act 2: Building the binaries

To get a console on the VM, run:

# vmctl console vm61

From here, you install as normal, using the vio interface to set up the network (look up how to do this elsewhere if you don't already know).

You might recognise the console program as cu. Accordingly, look at the cu(1) man page for instructions on escape sequences to send files, exit the console and so on.

Note: since this is essentially a throwaway VM I'm just going to use to build Haskell binaries, I overrode the default partition layout and just made one huge 14G root partition with wxallowed on and a 1G swap partition. I don't know how much space exactly cabal, stack and so on will take up (it's a lot though) and I know they need wxallowed. A more elegant solution is possible where you allocate the bare minimum space (15G is way too much), but this works well enough.

After installing, you have to boot from the disk. We need to kill and start the VM again without the bsd.rd argument:

# vmctl stop vm61
# vmctl start vm61 -m 4G -d disk.img -i 1

You may also notice that the tap0 interface was destroyed when the VM halted. So add it back to the bridge:

# ifconfig bridge0 add tap0

And now re-enter the console (or SSH in).

Here's what I did next (after adding ~/.cabal/bin to PATH):

$ doas pkg_add ghc cabal-install git
$ cabal install stack
< heaps of output >
$ stack setup
$ git clone git://github.com/kaashif/stargate-search.git
$ cd stargate-search
$ stack install

Surprisingly, the binaries installed to ~/.local/bin Just Worked (tm) on this server.

Perhaps that's not so surprising, since they're both OpenBSD 6.1 amd64 machines and all computers are the same nowadays.

Reviving a Sun Ultra 5 workstation

2017-08-10T00:00:00Z

I recently got an old Sun Ultra 5 working. It wasn't too difficult, but I needed to dig up a few old serial cables...

It already had SunOS 5.8 installed, but I put OpenBSD 6.1 on it, since I need a modern OS to actually do anything with it.

The first boot

The first step was obviously to just turn it on and see what happened. This went well, it turned on, the CD drive and floppy drives made some noises, then it stayed on. Of course, I didn't have access to a Sun keyboard anymore, so this is basically all I could do at this point.

I decided to connect a monitor to it and see what it said.

No keyboard detected. Redirecting output to ttya

Then the screen went blank.

Of course, how could I forget! Sun workstations aren't piece of crap i386 machines, they're right and proper workstations, designed to work properly over serial ports.

I consulted the OpenBSD faq page for serial consoles. There was a bit about sparc64 machines:

These machines are designed to be completely maintainable with a
serial console. Simply remove the keyboard from the machine, and
the system will run serial.

I don't have a keyboard, so this is perfect. The serial port on the back of a Sun Ultra 5 is DB25. There is a DB9 port, but it's ttyb, so isn't used automatically. I dug up a DB25 to DB9 adapter and a USB to DB9 cable, connected it and hey presto, it worked!

I was able to boot into SunOS! I was greeted with a curses-type setup screen, I was asked about the network (it failed to connect) and the hostname. Then I booted into SunOS 5.8!

This was great, but since I had no network, I had to get a modern OS on it, one I knew how to use.

Getting OpenBSD install media

There are a couple of options here:

CD
Floppy
Network

I tried to find a CD-R to no avail. I went to a nearby OpenBSD mirror to download the install61.iso for sparc64, but since I could only find already-written CD-Rs (useless) and CD-RW, I decided to give it a shot with a CD-RW. It didn't work:

ok boot cdrom
Resetting ... 
Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), No Keyboard
OpenBoot 3.11, 512 MB memory installed, Serial #1653024.
Ethernet address 8:0:20:19:39:20, Host ID: 80193920.

Rebooting with command: boot cdrom
Boot device: /p...@1f,0/p...@1,1/i...@3/cd...@2,0:f  File and args: 
Can't read disk label.
Can't open disk label package
Evaluating: boot cdrom

Can't open boot device

As I suspected, I couldn't get the drive to read a CD-RW.

Next up was floppy. I got the floppy61.fs from there and tried it out:

ok boot floppy bsd
Resetting ... 
Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), No Keyboard
OpenBoot 3.11, 512 MB memory installed, Serial #1653024.
Ethernet address 8:0:20:19:39:20, Host ID: 80193920.

Rebooting with command: boot floppy bsd
Boot device: /p...@1f,0/p...@1,1/e...@1/fdthree  File and args: bsd
Bad magic number in disk label
Can't open disk label package
Evaluating: boot floppy bsd

So what option did I have left? miniroot61.fs, of course.

On my laptop:

$ ftp http://mirror.bytemark.co.uk/OpenBSD/6.1/sparc64/miniroot61.fs
$ uuencode -o miniroot61.fs.uue miniroot61.fs miniroot61.fs

On the Ultra (using cu as a serial terminal):

# uudecode

Then I hit ~> which is the cu command to send a file. I chose miniroot61.fs.uue, which was then uudecoded and placed into the file miniroot61.fs in the directory where I ran uudecode.
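Why bother with uuencode at all? The serial console is a text terminal, so raw binary bytes would be mangled or interpreted as control characters on the way through. uuencoding turns every 45 bytes of input into one line of printable ASCII. Here is a rough Python sketch of just the line encoding, using the standard binascii module. It leaves out the begin/end framing that the real uuencode adds, and the two function names are made up.

```python
import binascii


def uuencode_lines(data):
    """Yield uuencoded lines for data, 45 input bytes per line, no begin/end framing."""
    for start in range(0, len(data), 45):
        yield binascii.b2a_uu(data[start:start + 45])


def uudecode_lines(lines):
    """Reverse uuencode_lines, returning the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


payload = bytes(range(256))  # every possible byte value, including control characters
encoded = list(uuencode_lines(payload))
decoded = uudecode_lines(encoded)
print(encoded[0].decode("ascii"), end="")
```

Each encoded line is plain ASCII plus a newline, which is exactly what a serial terminal can carry safely.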

Next I wrote it to the disk. In case it didn't work, I decided to write it to the swap partition. This way, if it failed to boot, I wouldn't have hosed my system.

# dd if=miniroot61.fs of=/dev/rdsk/c0t0d0s1

Then enter the ok prompt:

# init 0

...some output...

ok boot disk:b
Resetting ... 
Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 270MHz), No Keyboard
OpenBoot 3.11, 512 MB memory installed, Serial #1653024.
Ethernet address 8:0:20:19:39:20, Host ID: 80193920.

Rebooting with command: boot disk:b bsd
Boot device: /pci@1f,0/pci@1,1/ide@3/disk@0,0:b  File and args: bsd
OpenBSD IEEE 1275 Bootblock 1.4
..>> OpenBSD BOOT 1.9
open /pci@1f,0/pci@1,1/ide@3/disk@0,0:b/etc/random.seed: No such file or directory
Booting /pci@1f,0/pci@1,1/ide@3/disk@0,0:b/bsd
4045496@0x1000000+1352@0x13dbab8+3251904@0x1800000+942400@0x1b19ec0 
symbols @ 0xfff62300 120 start=0x1000000
console is /pci@1f,0/pci@1,1/ebus@1/se@14,400000:a

From here I saw the usual OpenBSD installer output. The network was detected without a hitch, the installer downloaded the sets and everything worked perfectly.

It's almost a bit disappointing how well it works, I wanted to have to solve some problems.

Anyway, now I have a real big-endian system to test my software on! Of course I could have picked up an old macppc PowerBook, but that would be even less fun than this!

Here is the sysctl hw:

hw.machine=sparc64
hw.model=SUNW,UltraSPARC-IIi (rev 1.3) @ 269.804 MHz
hw.ncpu=1
hw.byteorder=4321
hw.pagesize=8192
hw.disknames=wd0:0d5853015d2b2605,cd0:
hw.diskcount=2
hw.cpuspeed=269
hw.vendor=Sun
hw.product=SUNW,Ultra-5_10
hw.physmem=536870912
hw.usermem=536854528
hw.ncpufound=1
hw.allowpowerdown=1

And here is the dmesg.

I hope I can get my hands on a 440 MHz CPU at some point to give it an upgrade, but it is already fast enough for daily use.

Sorting a ton of mail

2017-08-04T00:00:00Z

Migrating mail servers is a tricky business, especially when one server doesn't have IMAP set up. The easiest way is to download all the mail and reupload it to the new mail server.

This seems simple enough, but I ran into problems. After all, I was going from an IMAP server somewhere to a maildir (no IMAP sync tool supports mbox for some reason) to an mbox through procmail to a directory of mboxes. Not trivial.

The Problem

I am a user of Zoho's mail service. It serves me well but there are usage limits. I set up my own mail server so I wouldn't have to deal with their limits. But how do I get my mail from an IMAP server to a non-IMAP server?

I'm obviously not going to set up IMAP, another public service which increases my attack surface, just to migrate mail.

The Solution Part 1: Downloading the mail

This was supposed to be simple. I installed isync and wrote an .mbsyncrc which would fetch mail and deliver to a maildir. But there are usage limits, and I ran into them while fetching my 100k messages.

IMAP command 'AUTHENTICATE PLAIN <authdata>' returned an error: NO [ALERT] Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.
*** IMAP ALERT *** Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.

Argh! So annoying. I noticed that I could try again after a few minutes. So what if I slowed down mbsync to the point where it takes longer to hit the usage limit than the time it takes for the limit to reset? This turned out to be simple: I just added this line to the account section of my .mbsyncrc:

PipelineDepth 1

This entirely disables pipelining, i.e. only one IMAP command can be in flight at a time, whereas by default the number is unlimited.

Converting to mbox

So I have all my mail in ~/.mail. But now what? My mailserver deals in mboxes, not maildir.

I came across a Python script that converts a maildir to an mbox:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Usage: mailconv.py <maildir> <output mbox> (Python 2)

import mailbox
import sys
import email

# Open the maildir, parsing each message file with the email module
mdir = mailbox.Maildir(sys.argv[-2], email.message_from_file)
outfile = open(sys.argv[-1], 'w')

for mdir_msg in mdir:
    # parse the message:
    msg = email.message_from_string(str(mdir_msg))
    outfile.write(str(msg))
    outfile.write('\n')

outfile.close()

It's so simple and speaks to the simplicity of the mbox format. So I ran it:

$ cd mail
$ for d in *; do python2.7 mailconv.py "$d" "${d}.mbox"; done

It eventually completed and I was left with a mess of disjointed mailboxes.
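The script above only runs on Python 2 (file() no longer exists in Python 3). For anyone repeating this today, here is a minimal Python 3 sketch of the same conversion using the standard mailbox module. The function name maildir_to_mbox is my own invention, and the sketch does no locking, so it assumes nothing else is writing to either mailbox while it runs.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Append every message in a maildir to an mbox file; return the count."""
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)  # the mbox file is created if missing
    count = 0
    try:
        for msg in src:
            # mailbox converts the MaildirMessage to an mbox message,
            # writing the "From " separator line for us
            dst.add(msg)
            count += 1
    finally:
        dst.close()  # close() flushes pending writes to disk
        src.close()
    return count
```

Calling maildir_to_mbox("INBOX", "INBOX.mbox") would stand in for one iteration of the shell loop above.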

Sorting

Now I had to sort them. But of course, mails get sent one at a time, so if my final intention is to pipe all of them through procmail to apply my new filters, I had to smush them all together first.

Funnily enough, I came across a Python script that sorts mbox:

#!/usr/bin/env python2.7
from email.utils import parsedate
import mailbox, sys

def extract_date(email):
    date = email.get('Date')
    return parsedate(date)

the_mailbox = mailbox.mbox(sys.argv[1])
sorted_mails = sorted(the_mailbox, key=extract_date)
the_mailbox.update(enumerate(sorted_mails))
the_mailbox.flush()

Using this script:

$ cat *.mbox > all.mbox
$ python2.7 sort.py all.mbox

And there we have it, a sorted mbox! That script uses a lot of memory and CPU; Python isn't the best language for this.
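Part of the memory cost comes from sorting the parsed messages themselves. A rough Python 3 sketch of an alternative is below: it sorts only the mailbox keys by their Date header and then copies messages one at a time into a new mbox. The names date_key and sort_mbox are hypothetical, undated mail is arbitrarily sorted first, and I haven't measured whether this is actually faster on 100k messages.

```python
import mailbox
from email.utils import mktime_tz, parsedate_tz


def date_key(msg):
    """Return the Date header as a UTC timestamp; undated mail sorts first."""
    parsed = parsedate_tz(msg.get("Date", ""))
    return mktime_tz(parsed) if parsed else 0


def sort_mbox(src_path, dst_path):
    """Write the messages of src_path to dst_path in Date order."""
    src = mailbox.mbox(src_path)
    # Sort the keys, not the messages, so only one message is parsed at a time
    order = sorted(src.keys(), key=lambda k: date_key(src[k]))
    dst = mailbox.mbox(dst_path)
    try:
        for k in order:
            dst.add(src[k])
    finally:
        dst.close()
        src.close()
    return len(order)
```

Writing to a second file rather than updating in place doubles the disk usage, but it leaves all.mbox untouched if something goes wrong halfway through.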

Getting it to the mail server

This wasn't too difficult. First I sent the blob:

$ gzip all.mbox
$ scp all.mbox.gz mail.kaashif.co.uk:~/

Then, on the server, I used a nifty program called formail which comes with procmail and applies a command to each mail in an mbox:

$ gunzip all.mbox.gz
$ formail -ds procmail < all.mbox

And the mail was sorted as I specified and it's all there!

Moving to my own email server

2017-08-03T00:00:00Z

There I was, a loyal user of http://mail.zoho.com, when I decided to download all of my emails. For archival purposes, you know.

So I fired up mbsync, set everything up, let 'er rip, but after only about 10,000 emails downloaded, I got this error:

IMAP command 'AUTHENTICATE PLAIN <authdata>' returned an error: NO [ALERT] Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.
*** IMAP ALERT *** Your account is currently not accessible via IMAP due to excessive usage. Kindly try after some time.

What sort of...anyway, this was unacceptable, so I decided to set up my web server as a mail server.

Prerequisites

Everything you need to run a mail server is already installed on OpenBSD. Of course, if you want to use DKIM (and you really do, or your mail will be sent to spam or just not received by anyone), you need to install dkimproxy.

$ doas pkg_add dkimproxy

That's basically it.

Setting up OpenSMTPD

OpenSMTPD has a pf-inspired config file. That is to say, it's very easy to grok.

Step 1: Set up an SSL certificate

This is very easy with acme-client, which comes with OpenBSD.

Your /etc/acme-client.conf should already have the letsencrypt authorities in it, so you just need to add your domain to the bottom.

It's so easy, look at man acme-client.conf. For reference, here is my domain entry:

domain earendil.kaashif.co.uk {
    alternative names { kaashif.co.uk git.kaashif.co.uk
    www.kaashif.co.uk mail.kaashif.co.uk }
    domain key "/etc/ssl/private/earendil.kaashif.co.uk.key"
    domain certificate "/etc/ssl/earendil.kaashif.co.uk.crt"
    domain full chain certificate "/etc/ssl/earendil.kaashif.co.uk.fullchain.pem"
    sign with letsencrypt
}

Then you run acme-client -vFAD earendil.kaashif.co.uk as the man page suggests and everything will Just Work, with your new certificates ending up where you specified.

Put that command in /etc/monthly.local and it will run every month, keeping everything valid and non-expired.

Step 2: Set up your virtual users

My mail server will be single user, so I just slapped this into /etc/mail/virtuals:

kaashif@mail.kaashif.co.uk kaashif

I don't even think you really need it, but in the future you may want to add a webmaster email, or mailing lists for specific projects, etc, so it's good to have it.

Although you didn't add anything to /etc/mail/aliases, you should still run newaliases just to generate the aliases.db that OpenSMTPD uses.

Step 3: /etc/mail/smtpd.conf

This is the big one, where the magic happens.

Actually, this file is so simple, I can just dump it here and explain it:

pki mail.kaashif.co.uk certificate "/etc/ssl/earendil.kaashif.co.uk.crt"
pki mail.kaashif.co.uk key "/etc/ssl/private/earendil.kaashif.co.uk.key"

These are the letsencrypt certs you made earlier that smtpd will use.

table aliases file:/etc/mail/aliases
table virtuals file:/etc/mail/virtuals

This declares the tables so you can use them later on in the file as <aliases> and <virtuals>.

listen on all
listen on all port 10028 tag DKIM

This requires some explanation. We listen on all interfaces on the usual smtp ports for mail (this is where mailservers will send us mail). But on port 10028, we listen for mail tagged DKIM. This is where dkimproxy comes in.

When we send mail from this server, we want it signed with DKIM. So the dkimproxy_out daemon listens on port 10027, we send it mail and it sends it back (signed) on port 10028 and the session has a DKIM tag.

accept from any for domain "mail.kaashif.co.uk" virtual <virtuals> deliver to mda "/usr/local/bin/procmail"

Procmail is a well-known mail filtering agent. Explaining how to configure it is out of the scope of this post, but basically you just make a ~/.procmailrc in the user's home directory and put some rules in there. Where the mail ends up (mailbox, maildir, somewhere else) is entirely up to Procmail; smtpd forgets about it after that.

accept from local for local alias <aliases> deliver to mbox

We don't want to mix local and non-local mail, so all that local-only daily output and insecurity output will end up in our normal mbox in /var/mail. But of course, there's nothing stopping you from configuring Procmail to put some non-local mail in /var/mail too.

accept tagged DKIM for any relay
accept from local for any relay via smtp://127.0.0.1:10027

The only sessions that will ever be tagged DKIM are ones made by dkimproxy. So it is safe to just relay them all. Any mail sent from here is sent to dkimproxy on port 10027, to be signed and returned later.

Setting up DKIM and SPF

Step 1: Generating a key

DKIM works by signing messages with a private key stored locally. Then untrusting mail servers get the public key from a DNS TXT record and check the signature. We don't have any keys yet.

$ mkdir -p /etc/mail/dkim
$ cd /etc/mail/dkim
$ openssl genrsa -out private.key 1024
$ openssl rsa -in private.key -pubout -out public.key

Why only 1024 bits? Because that's all that fits in a single TXT record; it's a bit inconvenient otherwise.

Now, make those readable by dkimproxy and no-one else:

$ chmod 0600 *.key
$ chown _dkimproxy:_dkimproxy *.key

Now edit /etc/dkimproxy_out.conf's keyfile line to point to the private key you just made:

keyfile   /etc/mail/dkim/private.key

Step 2: Putting everything into DNS

This is important since people will have no idea how to verify your sigs if they don't know your public key.

Make a record on selector1._domainkey.your.domain.tld, type TXT, with the following content:

v=DKIM1; k=rsa; p=<your public key>

When I say your public key, I mean go to your public key file, ignore the header and footer lines, concatenate each line then paste it there.
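If you'd rather not do the concatenation by hand, here is a small Python sketch of that step. The function name dkim_txt_record is made up for illustration, and it assumes the usual PEM layout where the header and footer lines start with five dashes.

```python
def dkim_txt_record(pem_text):
    """Build the DKIM TXT record content from a PEM-encoded public key."""
    lines = [line.strip() for line in pem_text.splitlines()]
    # Drop blank lines and the -----BEGIN/END PUBLIC KEY----- markers
    body = [line for line in lines if line and not line.startswith("-----")]
    return "v=DKIM1; k=rsa; p=" + "".join(body)
```

You would feed it the contents of public.key, for example open("/etc/mail/dkim/public.key").read(), and paste the result into the TXT record.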

Also take the opportunity to set up SPF. Add a TXT record to your.domain.tld, with:

v=spf1 a mx ip4:<your ip address> ~all

Where your IP address is your static IP. If you don't have a static IP address, you'll probably forever be on every spam list ever. Contact your ISP to get a static IP, or get a VPS.

Final touches

Now just start everything:

$ rcctl enable dkimproxy_out
$ rcctl restart dkimproxy_out smtpd

And everything should work!

But now you wonder how I read mail? Easy, I SSH in and use mutt. Couldn't be any simpler.

But how do I check it on my phone? Dude, I just told you! SSH and mutt. I can use mail(1) if I really can't deal with a complicated TUI.

To my surprise, everything worked. http://www.mail-tester.com gave me a clean bill of health and I found my mails were able to reach all my friends and family.

And no arbitrary usage limits! I can download my many gigabytes of mail as many times as I want!

But of course, I do have to still download them from Zoho to upload them to my new server...

Hardware Census

2017-08-02T00:00:00Z

Before I get my Sun Ultra 5 working and can write something about that, I thought I'd go through all the hardware I'm using right now and the OSes I'm running on them. Spoilers: it's all OpenBSD and Debian.

I have accumulated quite a few laptops in various states of disrepair. I prefer to install OpenBSD on the working ones. The non-working ones become spare parts for the working ones. Occasionally I see a great laptop on eBay for 99p with the comment "for parts". Except the "problem" is that the memory is bad, or the HDD died, or something easy to fix, so I snag it and get it working.

Enough with the backstory, let's see some dmesgs!

thinkpad-t61

hw.machine=amd64
hw.model=Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz
hw.ncpu=2
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=cd0:,sd0:6ec4c14a8e9df601
hw.diskcount=2
hw.cpuspeed=2001
hw.setperf=100
hw.vendor=LENOVO
hw.product=765912G
hw.version=ThinkPad T61
hw.serialno=L3B4859
hw.uuid=c4ede381-495a-11cb-a14e-9b55a480804b
hw.physmem=1047199744
hw.usermem=1042952192
hw.ncpufound=2
hw.allowpowerdown=1
hw.perfpolicy=manual

This was my main laptop for quite a while and I still use it occasionally. It's got enough RAM to get work done and a strong enough CPU so you don't die waiting for things to compile. But while the graphics are well supported (Intel GMA), they are very weak.

Now, 2/3 times it fails to boot with a bad memory error. So I moved on. I still test a port or two on it occasionally.

Here is the dmesg. It runs OpenBSD-current.

thinkpad-r61

This one runs Debian, since it has an Nvidia Quadro graphics card (unaccelerated on OpenBSD). I suppose I could run NetBSD or FreeBSD on it, but it works fine. Besides, an OS monoculture is bad, right?

It has 2GB of RAM, a decent enough CPU and the GPU is actually capable of running some games at an OK speed. But when you have to turn the graphics to low on 15 year old games, you get the feeling this laptop wasn't designed for gaming.

Here is the dmesg.

thinkpad-760el

hw.machine = i386
hw.model = Intel Pentium (P54C) ("GenuineIntel" 586-class)
hw.ncpu = 1
hw.byteorder = 1234
hw.physmem = 33140736
hw.usermem = 32948224
hw.pagesize = 4096
hw.disknames = wd0,fd0
hw.diskcount = 2

This is the old one. Since I only use it as a serial terminal, it really doesn't matter what OS I run on it. I run OpenBSD 3.3 on it. This is because I tried to install NetBSD, but there were network problems even though my card (see the dmesg if you want to know which card) is supported. So I went the easy route and installed OpenBSD 3.3. It installed from a single floppy, which is great. Even OpenBSD 6.1 installs from a single floppy, so that hasn't changed at least.

Here's the dmesg.

acer-vn7-591g

hw.machine=amd64
hw.model=Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
hw.ncpu=8
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=sd0:050a15853dd14169,sd1:
hw.diskcount=2
hw.cpuspeed=2601
hw.setperf=99
hw.vendor=Acer
hw.product=Aspire VN7-591G
hw.version=V1.15
hw.serialno=NXMUVEK046514021B66600
hw.uuid=64980eb6-aa08-4cf3-80f6-ff6cd8648269
hw.physmem=12788166656
hw.usermem=12738244608
hw.ncpufound=8
hw.allowpowerdown=1
hw.perfpolicy=manual

This is the laptop I do most of my work on. It has an Intel GPU (the one I use most of the time) and an Nvidia GPU (I only use this to boot into Debian and play the occasional game of whatever).

I, of course, run OpenBSD-current on it. This is where I play with ports and stuff.

Here is the dmesg.

I have a few more laptops, but since I never use them for anything, listing them here would be pointless.

ibm-system-x3455

hw.machine=amd64
hw.model=Dual-Core AMD Opteron(tm) Processor 2218
hw.ncpu=4
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=wd0:a216e88897876460
hw.diskcount=1
hw.cpuspeed=2593
hw.vendor=IBM
hw.product=IBM System x3455-[7984W20]-
hw.serialno=KDYHXP7
hw.uuid=0b2a5418-7277-3bc8-a15a-85dc8760d5d2
hw.physmem=12867076096
hw.usermem=12867051520
hw.ncpufound=4
hw.allowpowerdown=1

I picked this up for cheap to be my build server, file server, etc. It doesn't have many places to put HDDs, but that's not an issue since I only have 2 hard drives to put in it, and I keep one disconnected from it (long term backups like photos, videos, etc). I also keep off-site backups and backups on other people's computers (also known as the cloud).

Here is the dmesg, it's fairly boring.

So there we have it, that's my in-use hardware.

Soon, I'll be adding a Sun Ultra 5 workstation to that. Finally, I'll be able to see what specs it has (I forgot and don't have the documents). I could open it up and look inside, but who knows how many of the disks, sticks of RAM, etc actually work. Maybe the CPU doesn't work.

Hopefully the next post won't be a post-mortem of the workstation.

Cutting down memory usage of a Haskell web app

2017-06-28T00:00:00Z

You may have heard of the Star Trek script search tool at http://scriptsearch.dxdy.name. I'm writing a similar thing for Stargate. The difference, of course, is that my tool will be running on some crappy Amazon AWS t2.nano with no RAM.

The first prototype was written in Python, but the parsing code was always written in Haskell (Parsec is great). I decided to move everything into Haskell so there wouldn't be this redundancy of parsing in Haskell then serializing to disk then reading it from Python...one codebase for everything would be much simpler.

I wrote up a quick Haskell version, but there was one small problem when I tried to use it:

$ ./stargate-search
Setting phasers to stun... (port 5000) (ctrl-c to quit)
Killed

That's right, the OOM killer had to step in and put my app down like the sloppy wasteful piece of junk it was.

How could I fix this?

Profiling

Obviously the first step when trying to fix code is trying to work out what's wrong. I compiled my program with profiling options and ran some stress tests.

$ stack build --profile
$ stack exec -- stargate-search +RTS -p -s &
$ ./tools/stress.sh

(All stress.sh does is shoot 25 requests to the server so we can see how it runs)

The results were deeply troubling. Here are some highlights (bear in mind, all this code does is search a few dozen MB of fairly structured data).

75,013,509,424 bytes allocated in the heap
 5,751,785,424 bytes copied during GC
   528,085,912 bytes maximum residency (17 sample(s))
     5,809,384 bytes maximum slop
          1523 MB total memory in use (0 MB lost due to fragmentation)

Wow, that's really something. 75 GB allocated? That seems like a lot, but remember that the garbage collector in GHC is really good, we can afford to allocate a ton as long as we get rid of it fairly quickly.

But the 528 MB residency is concerning. Surely all of that is during the initial parse of the Stargate transcripts, right? I'd better take a look at the graph...

Oh...I guess the peak residency is in the parsing stage, but it doesn't go down much. Let's take a look at the cost centre breakdown to see where the low-hanging fruit is.

COST CENTRE       MODULE             SRC                                   %time %alloc

match             Web.Stargate.Main  lib/Web/Stargate/Main.hs:95:1-87       71.3   56.4
satisfy           Text.Parsec.Char   Text/Parsec/Char.hs:(140,1)-(142,71)    6.0   13.0
string            Text.Parsec.Char   Text/Parsec/Char.hs:151:1-51            5.2   14.5
manyAccum.\.walk  Text.Parsec.Prim   Text/Parsec/Prim.hs:(607,9)-(612,41)    2.3    5.3

So this basically tells us that the vast majority of work is done in the string matching code. Not a surprise. Let's take a look at what that code is, exactly:

match :: String -> String -> Bool
match query body = (null query) || ((map toUpper query) `isInfixOf` (map toUpper body))

So many things wrong with this...everyone always says premature optimization is the devil, but I think I took it a bit too far in the other direction.

String is rubbish

The obvious low-hanging fruit here is String. This is for several reasons:

type String = [Char]: There is no secret optimization going on here, you get all of the downsides of O(n) access with none of the upsides of a linked list since we never do anything with the head, where everything is O(1).
map toUpper query: This is blatantly inefficient. Every time we want to do a case-insensitive search, we map over the query but then throw away the result? I guess the compiler might optimize this away, but it would be smarter to upper case it before running the match function a billion times.
isInfixOf from Data.List is literally implemented like this: isInfixOf needle haystack = any (isPrefixOf needle) (tails haystack). Sweet Jesus, that's the most naive implementation of anything I've ever seen. I suppose that, since this is from Data.List, no assumptions can be made about the data, so fair enough. But this is another clue that I shouldn't be using String.

Let's get on with fixing this. Essentially, I just imported Data.Text qualified with T at the start of every file, replaced String with T.Text everywhere, dealt with all of the type errors and it Just Worked.

With no further optimization, here is the memory graph for the new Text app:

A pretty big improvement, right? But it's still pretty slow.

Highly hanging fruit

Now that the low-hanging fruit was done with, let's move on. I thought it would improve the program if I did away with all of the inefficient list stuff and replaced it with Data.Vector where I could. After all, I just build these things once when parsing then access them millions of times: the cost is in the access, which is O(1) with Data.Vector. I also replaced some hash tables with (key, value) vectors, since 95% of the work is iteration, and all this hashtable stuff is unnecessarily complex for that.

So in summary, I:

Replaced [a] with Vector a where possible.
Replaced HashMap a b with Vector (a,b) where it made sense.

This resulted in:

Uh, that's a bit better I guess, but not really. The execution time, in particular, is still really bad for such a simple app. The memory usage I can forgive, since the data is about that big so we need to keep it all in memory.

Writing some imperative code for more speed

At this point, I was just assuming that the RAM usage was OK (the initial parsing thing happens only once, we can just ignore it) and I set my sights on reducing the run time.

Looking at the search function, it's filled with folds and stuff. People are always saying that the unpredictability of Haskell's performance is due to laziness and inherent to the functional style, since it's so far from the metal.

So why not write an imperative-style algorithm to search? We can do this very easily using the ST monad. For those not in the know, this is basically the IO monad, but you can escape it.

Essentially, you can have pointers, write and read to them, modify stuff in memory, but still have the function you write be pure. This is done by not allowing you to use the pointers or mutable values from one ST monad in another one, so the mutability can't leak out of your pure function.

For example, here's a Fibonacci function that runs in constant space:

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

fibST :: Integer -> Integer
fibST n =
    if n < 2
       then n
       else runST $ do
           x <- newSTRef 0
           y <- newSTRef 1
           fibST' n x y
 
   where fibST' 0 x _ = readSTRef x
         fibST' n x y = do
             x' <- readSTRef x
             y' <- readSTRef y
             writeSTRef x y'
             writeSTRef y $! x'+y'
             fibST' (n-1) x y

So I rewrote the searching function in this style. The resulting code was much easier to read, but the performance, well:

Yeah, that's right, it's worse! I guess GHC does a good job with all of those folds and stuff, better than I can do.

Remember to check all of your profiler's output

I hadn't actually looked at the profiler output for the execution time. What I saw when I did surprised me and made me facepalm a bit.

COST CENTRE      MODULE                                SRC                                                       %time %alloc

match            Web.Stargate.Search                   lib/Web/Stargate/Search.hs:36:1-87                         41.5   30.4
caseConvert      Data.Text.Internal.Fusion.Common      Data/Text/Internal/Fusion/Common.hs:(398,1)-(405,45)       15.4    0.0
upperMapping     Data.Text.Internal.Fusion.CaseMapping Data/Text/Internal/Fusion/CaseMapping.hs:(16,1)-(219,53)   14.2   24.8

So it turns out 24.8% of the allocations and 29.6% of the runtime of the program are spent mapping text to upper case. That's ridiculous! How didn't I see this?

So I just added a field to the transcript data structure containing an uppercased version, and made match use this instead of computing it itself every time. Look at the result:

So now my program is faster. The memory usage is still bad, but that's a one-time thing. I can solve that by just switching to Attoparsec or some other actually performant parsing library.

Conclusion

Don't assume you know better than your tools: trust your profiler and compiler to tell you where you're wrong!

Trying out DragonflyBSD

2017-06-22T00:00:00Z

So I was setting up a laptop I had just picked up, when it came to deciding what OS to install on it. Obviously, I'd probably end up installing some Linux and maybe also OpenBSD (hard drives are huge nowadays, I could fit hundreds of OSs on there). While I had tried out FreeBSD and NetBSD, DragonflyBSD had never been on my radar.

It still isn't, really, but I thought I'd try it out on an old laptop, just to see what it was like. It went pretty well, but there were a few oddities and one or two kind of weird design choices.

Getting installation media

I expected this to be a breeze, since all OSs offer all sorts of images. Dragonfly is no exception: on their downloads page, they offer an ISO for DVDs and a raw disk image for USB disks. I went for the latter.

$ wget http://mirror-master.dragonflybsd.org/iso-images/dfly-x86_64-4.8.0_REL.img.bz2
$ bunzip2 dfly-x86_64-4.8.0_REL.img.bz2
$ sudo dd if=dfly-x86_64-4.8.0_REL.img of=/dev/sdb

This should work, but when I booted it up on my laptop, the bootloader screen came up, some kernel messages were printed, but then I got an error saying da8s1a (the root partition of the USB disk) was not found.

This was weird, so I typed the following:

mountroot> ?
da8
da0s1
da0s2

Uh, so apparently the USB disk doesn't have any partitions. I was not able to solve this problem, probably because all of my disks were bad - when I wrote the image to each of 3 USB sticks, each of them was different and none of them matched the actual image. I didn't manage to overcome this, so I just used the DVD image:

$ wget http://mirror-master.dragonflybsd.org/iso-images/dfly-x86_64-4.8.0_REL.iso.bz2
$ bunzip2 dfly-x86_64-4.8.0_REL.iso.bz2
$ sudo cdrecord dev=/dev/sr0 dfly-x86_64-4.8.0_REL.iso

Your device name may vary: the machine I'm doing this from is running Gentoo Linux, so the CDRW drive is /dev/sr0.

This disk booted and brought me to the installation menu without a hitch.

Installing

The menus were fairly pleasant and user-friendly. There's even a nice dragonfly logo behind the ncurses menu.

I was prompted about whether I wanted UFS or HAMMER. I've heard a lot of good things about HAMMER, so I went for it, mainly to see if it would break. It didn't break. I don't plan to use DragonflyBSD too much (since it's very close to FreeBSD, which I tried for a period of time and later abandoned), so this is the most I can really say.

I noticed that the partition layout was simple, just / (taking up 90% of the space), /boot, swap and something called /build taking up the rest of the space. I assume this is something to do with building from source, since it contains a usr.obj directory, and some other buildy stuff.

Something very strange was that I was not allowed to use any non-alphanumeric characters in my passwords. This means no ;'[]"! and so on. I cannot fathom why this is the case.

Also, the default shell is tcsh, which is just csh plus some things. It goes without saying that there are a lot of reasons not to use csh (at least ten). So this is less than ideal. We can change this later, so I wasn't too fussed about it at this point. But ugh, I hate csh.

Next, I was asked which network interface I wanted to configure. There were a ton of them, including faith0, sl0, ppp0, but I knew I was looking for em0, so there was no confusion here. For a beginner, it might be a bit weird that these pseudo-devices are enabled by default. I don't think many beginners install DragonflyBSD, though...

The rest of the installation (time zone, installing the bootloader, etc) was problem-free, so I just had to reboot and get cracking.

Using the system

Obviously I won't be using it long-term but I just wanted to confirm I was able to get everything set up with a minimum of pain.

The first order of business is to install X and some other stuff. I settled on Xfce, since it's so hassle-free to set up.

I realised that sudo wasn't installed (neither is doas from OpenBSD), so I tried su. I was prohibited from doing this, so I logged out and back in as root to install sudo. It's very simple:

$ pkg update
$ pkg install sudo

It already comes with package mirrors preconfigured and everything, so this is all you need.

Then I logged back in as kaashif and installed Xfce:

$ sudo pkg install xfce

After installing, I tried to run it with startxfce4, but I got complaints that xinit wasn't installed. And neither was X, actually! This is weird since I thought Xfce should depend on X.

$ sudo pkg install xorg xinit

Surely now X would run! No, I get an error that libssl.so isn't found. How is that possible? Anyway, I install libressl:

$ sudo pkg install libressl

And try again, but now I get a different error:

...
/etc/machine-id does not exist
abort
...

I thought dbus was supposed to make that for me. No matter, we can solve that:

$ dbus-uuidgen | sudo tee /etc/machine-id
$ startxfce4

And voilà! It works! Except for the clit mouse...but I guess I can make do with a USB mouse.

Conclusion

You know, that wasn't too bad, I remember struggling with video drivers and monitor refresh rates and dreadful Xorg.conf files...things have really improved not just on Linux, but on BSD, too.

Not perfect, but I expected it to be a lot worse, given how few people use it. I guess that's the result of being a fork of the most popular BSD.

Playing around with distcc

2017-03-20T00:00:00Z

Today, I decided to install Gentoo on a spare machine I had lying around, since I was bored. Obviously, the first issue I ran into was that emerge x11-base/xorg-server was taking a really long time to run since the Xorg server is a pretty bloated program. Then emerge firefox was taking forever too.

One solution (for Firefox anyway) was to use the provided binary packages, this meant firefox-bin for Firefox. But this means I abandon all of the nice features (for me, that means USE flags) that Gentoo offers. If I'm going to download a load of binaries that I can't customize, why not just install Debian?

So the solution is to speed up compilation. That means putting more CPU cores to work. But my poor old ThinkPad only has 2 cores! This is where distcc comes in.

What is distcc?

It's a way of distributing compilation jobs to build servers which are supposed to have huge beefy CPUs. This saves a lot of time.

To explain how to set it up, I only need quote the distcc man page:

1 For each machine, download distcc, unpack, and install.

2 On each of the servers, run distccd --daemon with --allow
  options to restrict access.

3 Put the names of the servers in your environment:
  $ export DISTCC_HOSTS='localhost red green blue'

4 Build!
  $ make -j8 CC=distcc

This is an OK explanation of the general process, but the specifics can get a bit more fiddly than that.

My setup

Because I didn't want to put too much effort into standardising the OSs of my machines, my ~~pile of shitty laptops~~ build cluster runs a whole host of different Linux distros. There's Debian, Slackware, Ubuntu, Arch. I really wanted to use my OpenBSD server in here, but sadly distcc doesn't abstract away the build server OS that much: the files it sends just get compiled by the server's compiler then sent back - nothing too fancy.

My instructions will focus on Debian, since my Xeon desktop and dual-Opteron server both run Debian - all the other machines probably contribute less than 10% of the total CPU power.

Installing and configuring on Gentoo should be basically the same if you're using systemd.

Installing

The package and ebuild are both named distcc, so just use apt or emerge as appropriate. Also install distcc-pump, for an extra speed boost: it offloads even more of the build process to the server (take a look at this).

Configuring the client

There are a few things that cause distcc to fail.

On the client (where the jobs get sent from), in /etc/distcc/hosts, there may be a line reading +zeroconf. If you don't use Zeroconf, just delete this line. It will likely cause distcc to fail to distribute jobs even though it's supposed to realise you're not using Zeroconf. Type the IP addresses or hostnames (if they will resolve correctly) here, separated by spaces. For each IP, add ,cpp,lzo to the end, so that pump mode will work correctly and compression will be used to send files.

So an example config might look like:

192.168.0.32,cpp,lzo big-server,cpp,lzo my-desktop,cpp,lzo

Then go to /etc/portage/make.conf and add distcc distcc-pump to the FEATURES variable (or create it if it doesn't exist).

Change MAKEOPTS to -jN -lM where N is double the total number of CPU cores available and M is the local load-average limit, which roughly caps how many jobs run locally (in my case, 2). We pick double the number of CPU cores since the jobs are relatively small: it's possible the network will bottleneck us if the number of jobs is too small and we have to wait for many round trips for small batches of work.
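As a worked example of that arithmetic, here is a tiny Python sketch. The function name makeopts and the core counts in the usage note are invented for illustration; plug in your own machines.

```python
def makeopts(server_cores, local_cores):
    """Return a MAKEOPTS value: -j is double all cores, -l is the local cores."""
    total = sum(server_cores) + local_cores
    return "-j%d -l%d" % (2 * total, local_cores)
```

For a hypothetical cluster of an 8-core and a 4-core server plus a 2-core client, makeopts([8, 4], 2) gives "-j28 -l2".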

Configuring the server

After installing distcc, you need to edit /etc/default/distcc.

Add your subnet to ALLOWEDNETS. Mine looks like:

ALLOWEDNETS="192.168.0.0/24"

Change LISTENER to your IP address. That is, the IP address the client will use to access the server. For me that's:

LISTENER="192.168.0.83"

And set ZEROCONF to "false", just in case it messes something up.

Now, enable and start the service:

$ systemctl enable distcc
$ systemctl start distcc

There is one small thing left to do. The platform on my Gentoo installation is x86_64-pc-linux-gnu, so when it sends out a job, it expects the compiler to be x86_64-pc-linux-gnu-gcc. This is not actually the case on my Debian installation, where the compiler is called x86_64-linux-gnu-gcc. Note that the pc is missing. This makes no difference - it really is the same platform. But unless we fix this, distcc will fail.

The solution (maybe a bit of a hack) is to just symlink all the platform-specific stuff (ld, nm, gcov, gcc, g++ etc). Here's how I did that:

$ ls | grep x86_64-linux | while read a; do sudo ln -sf $a ${a//linux/pc-linux}; done

This certainly works, though maybe there's a better way to do it.
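For what it's worth, here is a slightly more cautious variant of the same idea as a Python sketch. The function name link_pc_names is hypothetical, and unlike ln -sf it refuses to overwrite anything that already exists. It assumes you point it at the directory holding the toolchain binaries and run it with enough privileges to write there.

```python
import os


def link_pc_names(bin_dir, old="x86_64-linux-gnu-", new="x86_64-pc-linux-gnu-"):
    """Create pc-linux aliases for every tool whose name starts with `old`."""
    created = []
    for name in sorted(os.listdir(bin_dir)):
        if not name.startswith(old):
            continue
        alias = os.path.join(bin_dir, new + name[len(old):])
        # Never clobber an existing file or link
        if not os.path.lexists(alias):
            os.symlink(name, alias)  # relative link, as `ln -s` would create
            created.append(alias)
    return created
```

It returns the list of links it created, so running it a second time is a no-op that returns an empty list.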

Since we are working in a single LAN, I assume you don't firewall traffic between computers in your LAN (if you're its only user, for example). If you do, you want to open port 3632 on your servers.

The moment of truth

Now, try to emerge a random package using pump and distcc as follows:

$ pump emerge firefox

This should transparently just use distcc and send some jobs over to one or more of your servers.

Take a look at /var/log/distcc.log on one of your servers. It should be filled with lines like this:

distccd[14465] (dcc_job_summary) client: 192.168.0.82:36466 COMPILE_OK exit:0 sig:0 core:0 ret:0 time:169ms x86_64-pc-linux-gnu-g++ main.cpp

My build time is 1/30 of what it was before, it's incredible. Of course, what I save in time, I lose in electricity. So maybe next time I'll just use a binary distro. But then, of course, I wouldn't be able to make use of USE flags to stop 20 scripting languages getting pulled in when I install sudo.

Happy compiling!

Backing up PGP private keys

2017-03-10T00:00:00Z

There are hundreds of blog posts about backing up your PGP keys around on the internet. Most of them just say something like: put a passphrase on it, keep it on a USB stick, a CD, a floppy disk, or something like that. These are all very useful ways to back important stuff up - in fact, I just restored a backup of my GPG keys from a CD after deleting them by accident. These mediums are, however, not going to survive for many decades like paper can. And if you store to a writeable medium like a rewritable CD, DVD, USB stick or floppy, there is still the danger of you trying to restore then accidentally deleting your backup. Or, more likely, you write over it without realising many months later that you wrote a Linux distro or movie to that DVD that had your only PGP key backup.

There are dozens of blog posts floating around about paperkey, which is a program intended to extract the important bits of a PGP key, output this binary data in a pleasing human-readable format (basically a hex dump), then allow you to print it out and have a reasonably short thing you could recover your private PGP key from. The reason paperkey has a use at all is that without it, the whole "private" key (which includes a copy of the public key) would be ludicrously long and impossible to type into a computer correctly. By this, I mean it will be about 3000 bytes long. This is short for a computer, but very long for a human.

There is still the question of how to recover this paper key. The difficulty of this will be heavily dependent on how exactly you choose to encode your private key before printing it out. Here are a few methods of encoding binary data as something easily retrievable from a printout.

Getting started

First, I'll generate a dummy key so I can walk you through the steps of backing everything up.

$ gpg --gen-key
gpg (GnuPG) 2.1.18; Copyright (C) 2017 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Note: Use "gpg --full-generate-key" for a full featured key generation dialog.

GnuPG needs to construct a user ID to identify your key.

Real name: Joe Bloggs
Email address: joe@bloggs.com
You selected this USER-ID:
    "Joe Bloggs <joe@bloggs.com>"

Change (N)ame, (E)mail, or (O)kay/(Q)uit? o

...unimportant output...

pub   rsa2048 2017-03-11 [SC] [expires: 2019-03-11]
      B39C0AB5ED8C797D25A559B7DDEC6B207C796781
uid                      Joe Bloggs <joe@bloggs.com>
sub   rsa2048 2017-03-11 [E] [expires: 2019-03-11]

There are several choices at this stage. You can either just use GPG's ASCII output and mangle it some way yourself, or you can use paperkey.

Using GPG's built in output

Simply run:

$ gpg -a --export-secret-key "Joe Bloggs"
-----BEGIN PGP PRIVATE KEY BLOCK-----

lQOYBFjDP+0BCACgxDb16nklo76aud9Fr0trxU2orqMA9JuDjNm4bkKoBcFkeEAe
cK4N36fpV7IWFGgYyNqd576EaC+83Trgv1YuDQ6m0dn4gFnWDN5OquTyvjcrU5/z
STIS38DcicuMKT1SUoGhX6/zBtghVH3dEDuqiwMgT8gelKt12kOy/ZnpTnBLgIEv
V2Y6zLjNqVoG8RXKeop+pyzRkIeKE667O7hPMez5ODKNnLtJDBTADxkxVyZx5ZAk
afmY74ChEotNBJNyDOri2KJ0TEeH0nKR3e7El5lZNeUzXP84y5DxQ/ufusKtTlFg
mIrtU3LurMw2Sdvkr0HqBrxFveeNMMRXaBbRABEBAAEAB/sGHtI01INeMnisLU1Z
us21QaKuPE/KVoWhIXicc94nxWhSad2PCx0lPBGJaaRHAOnhn6vq/Qqcwdanawi1
y7L9N9QJ981Dj6db5cuE1S64KxOwm5NoUK4OV+RgwQI1yNAj1S5INXteVjFeO3g7
NUYAPSCWV1M4DtLkPrX7F3qHjjx2nSrRD7H5umEAi7xjwTyc9CQtwLaWRf0MqMhj
h4AWqB6NED7OGz5pB0ExxEab6CgMqX4Qhmn5GadqlS5BdzUEIQ8iVNSiaKsw4pZE
+H9XvQTlq/iPekjVKG6F4Zh9dlPllAFjo3R7CHwwe+zgAen6kM8dilugBkBkSx7k
sl0RBADF6uBsa+Rn5tKw4rtU21ox4qwLiPBeircXdWa8TIFsuzW04SJ5TMkUswKR
QrdWeiMmVWjAOv8UTJBPNuswZR9hDpK84Ohtkn9l9p91Z4kNQmqOsPxUDk4++Vx0
q4TiTnDYk6MvfY9VjU8Avq62suV3aXs3Ikj/O/FTYIKRYUzNNQQAz/JCkEbVfZ4f
tvhLmk7SQsu3zn1eh8nRMj73VLJA2ZFL0ccmdjdHiqZIfYlMpASXberfpiZQ4q4x
YMtCdZ6Wcv7QV1FBeTFJyu/tM5dswXiy+pRvg3PFj8XR2XHAf00ZwXpva2ODgG6X
QFMSXO7xrmkQXkPYS9NUE9yyzBErAq0D/0lAApgs6UyMuiyMFRLDHc5lWYAHID6m
0FWOXSkhBrMTAPBfx8/W67+Gp+zJ0NufC4HsP8oJ1J9Uu/TY0HrmxyFuIZSGv/KI
+gFmjKjPoaUCAPvGNFbmRfLLK7wmz3n8RVTgl1Z8SXB/3cpxryTTEa+8ikhdv+cB
PVdfpVsMcM7cPYi0G0pvZSBCbG9nZ3MgPGpvZUBibG9nZ3MuY29tPokBVAQTAQgA
PhYhBLOcCrXtjHl9JaVZt93sayB8eWeBBQJYwz/tAhsDBQkDwmcABQsJCAcCBhUI
CQoLAgQWAgMBAh4BAheAAAoJEN3sayB8eWeB7WMH+wcwrQx/lvjRdCa43vmSReD8
dJN1jgkz6h9Q6jJCdwLVl8DXz7IO8vAJ7nbpDlHPOZ+t4FndrgbHw0kiwbIk9HUp
HueWxhf60cmOJtUnvKmVKnhie3jsd/T/W+N4DbNv1vO+cwtVFTC2KKhT0I5xCPd1
7q/L6zszpeBk4fUyrNN4vM3wwoNykmOENtEv9095Q99ccLxIYuhwMKaqTFV1XhNT
gPczshvyxfznReSAC5dRGLf/dG7tgoYEBtS7gc6ejOuMSfJe1b4Xp4hwY6QU3d/A
mTSguXEP3qqZAh5ZRMc2M723ACu906y4OWLPtP2Ip5DGWH5JWYrUR7hZ8yr6zyOd
A5gEWMM/7QEIAKjsUvglQZ00JPotnDGiGNE95ZGXnSn7bkOSWGHOdNkHW3Z7kmmZ
H/R619Xc8TNQ7rN6sQp2uz3wcvqYF5k5B+gG1st3RKii5tIwTykvgIn4ddoitn/h
agrvUjxJmh9pyS8HlhHfp12qnZ7Upyzl0DfHg8Qpht61Wr9/tbnUkuHTzEYOqag1
5rVBSKqB/yRaYG8XPOYN8pn5AUWdnyHxT0MBLbwfDeA5HuHYz2yx0QCZHjiJga3B
zPe570GvsE9spoDjYyN8cdpO4FG4NhMqwFwQMSWWH02/ouM+93XXwLnlFUaCwP6h
AdJfjbHMiMx5UFn/bY8MJi9Ia7GbA4dQQhMAEQEAAQAH/RnYmvNL5Ae1IElFLEZt
2mU9lsAZlhsD1QGyxSIl8Dv6w7RTwPm2S6zhFOAsn50t72/3wFntA8Y84aLVHZs8
niiSz0+vbops7mtPp/URxxWVNhcLw6e6ajrFFmySCGpxCa7P9tbCRT3wKpDQUcnt
WdgHB3K+tduinQF6/WezDkxOF5aTCMiocdmx68ufAmgDrOFhY2o13/i1ODaOwt1b
2izlWGHYoSCye4tADqXGr6zLENlwm5xilV1nn+XQSDKqTksLSarVebKUn0bJJpDx
lKbiMO2rTDkLL37V8XetREU/wiOREnu1XY1y4Y7ciKDR4fTXEHVVDRU6rpXun9PN
NZUEAMXX0UWJgmCtLoJaF8+toshE6o23UYllZyv4JCuVv2KANiq9IFKX0lnKnffa
CRivKTuRhD/Qfao7hK9TkRDfIhQTvRPp6xTjqkfe1BNE9+wco0bTa6ECGDCkZ9eV
mV8mKD+931kHIKL0XaXEAdadDXS0NJIb93DhO2hDEx/c78VXBADalDZV8ivldVV7
+GSehIoaXYIS7Mj+FImj4fTTeQeE3G/qD7Q5OchnBTDCqxHdnh9V0TJkhW0KcaLA
hnoR+rn8LLKXHRk01EWNPlpKp7Iky4//gGHEolfETDTtDksfMwEnmo4JUhkINgku
icA96d14pnFhZOykMVk8wYkbrgbXpQP/QsyHzvST5Xzk31qhAbIv5nUQHOCcQUrc
qf//U9OSk87/BFV+iL65ILFlKrpjG27rIfnxCO9s220FEkeXa9nio26PkpJdtKwg
4BeBIUmBozX+Z+qnwEilzwOt1pbdVIWjbsfCK83zeLGSv+7fcMaWf4UM+Bhi9zHj
LApKiZSl77lHJ4kBPAQYAQgAJhYhBLOcCrXtjHl9JaVZt93sayB8eWeBBQJYwz/t
AhsMBQkDwmcAAAoJEN3sayB8eWeBwI4H/2FZvwPtZRkUbBdZmrdgQZXYPE0Qi++8
JIT0PVAfF2oUGYPWMTks+G5FaoBaWC3cqkNU9pym63FTTNu84z1V54bXwJy8Czuu
9BWNxe4oEu4vKqqW/cTF4gXIruET5uywC+2RiO6XRlSlaR3pD8LgLYQ6LmZ/JdKB
eyj65AO7lnkaP6BrEhEF0dMrY5hR6afSoqp0DyhvBajUSLDnMvq9hl5MAIxEZ1Mc
Ix7sEC+9Pa0daKBXp+8t80Sqp+Qpj/Dk4LYs3IIMp95F51qxF2b0+eimpV7SWEdT
s7/Duz3ZCmtk87r2NJkw2UzY3mXL2i4tQYPiu8m8riFfJiUkPk8SF5s=
=bte1
-----END PGP PRIVATE KEY BLOCK-----

You could print that out and try to type it in to recover. That would be impossible. You could also print it in a nice font and try to use OCR to read it. That might work, but if the OCR software makes even a tiny error, you'd be incredibly hard-pressed to find it. I do not recommend doing this.

You could pipe this into xxd, which would turn it into a hex dump. OCR software may have an easier time with only letters and numbers, but I still don't recommend it.

If you do go for this method and try to restore from backup, here's how you do that (after getting your paper key into text file form by either typing or OCRing):

$ gpg --import my-key.txt

Cutting down the size using paperkey

First, we install paperkey. You could go over the sources yourself to verify it's not malware then compile it yourself, but I have a nominal level of trust in the Debian package maintainers. Thus this is fine for me:

$ sudo apt install paperkey

Next, we export our secret key, pipe it through paperkey to make sure we only end up with the secret bits, then pipe that into a file. There are a couple of different ways to do this.

Using paperkey's built-in base16 output

This is perhaps the easiest way to proceed. Run the following:

$ gpg --export-secret-key "Joe Bloggs" | paperkey
# Secret portions of key B39C0AB5ED8C797D25A559B7DDEC6B207C796781
# Base16 data extracted Sat Mar 11 00:27:46 2017
# Created with paperkey 1.3 by David Shaw
#
# File format:
# a) 1 octet:  Version of the paperkey format (currently 0).
# b) 1 octet:  OpenPGP key or subkey version (currently 4)
# c) n octets: Key fingerprint (20 octets for a version 4 key or subkey)
# d) 2 octets: 16-bit big endian length of the following secret data
# e) n octets: Secret data: a partial OpenPGP secret key or subkey packet as
#              specified in RFC 4880, starting with the string-to-key usage
#              octet and continuing until the end of the packet.
# Repeat fields b through e as needed to cover all subkeys.
# 
# To recover a secret key without using the paperkey program, use the
# key fingerprint to match an existing public key packet with the
# corresponding secret data from the paper key.  Next, append this secret
# data to the public key packet.  Finally, switch the public key packet tag
# from 6 to 5 (14 to 7 for subkeys).  This will recreate the original secret
# key or secret subkey packet.  Repeat as needed for all public key or subkey
# packets in the public key.  All other packets (user IDs, signatures, etc.)
# may simply be copied from the public key.
#
# Each base16 line ends with a CRC-24 of that line.
# The entire block of data ends with a CRC-24 of the entire block of data.

  1: 00 04 B3 9C 0A B5 ED 8C 79 7D 25 A5 59 B7 DD EC 6B 20 7C 79 67 81 F905AA
  2: 02 8B 00 07 FB 06 1E D2 34 D4 83 5E 32 78 AC 2D 4D 59 BA CD B5 41 84698B
  3: A2 AE 3C 4F CA 56 85 A1 21 78 9C 73 DE 27 C5 68 52 69 DD 8F 0B 1D 5CBE5F

... lots more output ...

This is a pretty good printout to have: it has instructions on how to recover, it has CRCs for every line and for the whole thing, everything's in nicely formatted base16. There is a problem: it can't be easily read by a machine.

So you could OCR it and feed it into paperkey again, but this is error-prone. There are CRCs, so you can easily check if each line is correct and guess the correct line (since the errors are not random, OCR will mistake similar-looking letters for one another).
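
Checking a line by hand could look something like this Python sketch. It assumes the per-line checksum is the OpenPGP CRC-24 from RFC 4880 computed over just the data bytes on that line; I haven't confirmed that against paperkey's source, so verify before relying on it. The function names are made up, and the final whole-block CRC line is not handled.

```python
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data):
    """OpenPGP CRC-24 as described in RFC 4880, section 6.1."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def parse_line(line):
    """Split '  1: 00 04 ... F905AA' into (line number, data bytes, CRC)."""
    number, _, rest = line.partition(":")
    fields = rest.split()
    data = bytes(int(field, 16) for field in fields[:-1])
    return int(number), data, int(fields[-1], 16)

def line_ok(line):
    """True if the trailing CRC matches the data (assumed per-line scheme)."""
    _, data, expected = parse_line(line)
    return crc24(data) == expected

print(hex(crc24(b"123456789")))  # 0x21cf02, the standard check value
```

A line that fails the check is where you'd start swapping look-alike characters (0 and O, 8 and B, and so on) until it passes.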

Here is how you would do that (again, after converting your paper backup to a text file):

$ paperkey --pubring pubring.gpg --secrets printout.txt --output secretkey.gpg

Here, pubring.gpg is the file located in ~/.gnupg after you've imported your public key. You can then import secretkey.gpg as usual.

Converting to a machine-readable paper format

You might want to convert to a QR code, barcode or some other machine-readable image format. For this section, I'll assume you have your secrets in a secrets.bin file, made as follows:

$ gpg --export-secret-key "Joe Bloggs" | paperkey --output-type raw --output secrets.bin

There are a few ways to convert this to a machine-readable image: let's start off with the most well-known, QR codes.

Encoding as a QR code with qrencode

Install qrencode:

$ sudo apt install qrencode

Now you can encode your secrets.bin as follows:

$ qrencode --8bit --level=M -o key.png < secrets.bin

It looks like this:

Note: the reason I used the error correction level of M instead of the highest, H, is that the resulting image would be obnoxiously large and hard to print out (except on a poster). Printing a large image on a small sheet of paper would only defeat the whole purpose of error correction: you would be introducing more errors (squashed image) in order to reduce errors.

If you find that your key's image is way too big, turn the error correction level to L (this is actually the default).

This is great, but if your key is huge (maybe your chosen key format is just big), you might notice that the largest type of QR code, a version 40 177x177 one, can only store 2953 bytes. This is a serious problem since if your key is too big, you're simply SOL - there is no workaround. We'll have to find a different format.
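
To see whether a file will fit before you even try, a small Python sketch helps. The capacities below are the byte-mode limits for a version 40 symbol at each error correction level as I understand the QR specification; treat them as assumptions and check them if it matters. The function name is hypothetical.

```python
# Byte-mode capacity of a version 40 (177x177) QR code per correction level.
V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}

def levels_that_fit(n_bytes):
    """Return the error correction levels that can hold n_bytes of data."""
    return [level for level, cap in V40_BYTE_CAPACITY.items() if n_bytes <= cap]

print(levels_that_fit(1200))  # ['L', 'M', 'Q', 'H']
print(levels_that_fit(2953))  # ['L']
print(levels_that_fit(3000))  # []
```

In practice you'd pass in something like os.path.getsize("secrets.bin") instead of a literal number.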

If this problem doesn't affect you, then I highly recommend this method: it's really easy to restore a QR code backup with any QR code reader.

Encoding as a data matrix with dmtxwrite

Install the program:

$ sudo apt install dmtx-utils

Now, encode your secrets:

$ dmtxwrite -e 8 -f PNG -o key2.png < secrets.bin

You'll end up with this:

This has fewer problems than QR codes, in my opinion, since there is no limit to the size of the data you're encoding. It is just as easy to restore the data, though, so really this is an almost purely aesthetic choice if either will work for you. If your key is huge, use this. If your key is not huge, it doesn't matter which you pick.

Encoding with a linear barcode

No, this is stupid. Do you know how long that barcode would have to be? And half of it would need to be error correction stuff. And folding the paper in half would probably triple the number of errors you get.

Conclusion

I would personally slap my private key into a paper data matrix and CD-R and call it a day. I keep a folded up copy of my private key in my wallet, just in case. In case of what, I have no idea.

Just don't put your private key online (even if it has a passphrase), that's just idiocy. Even if you're storing it with someone you trust, this doesn't mean the network between you and them is also trustworthy. The only type of offsite backup of this sort of sensitive data you should be keeping is one that you physically transport offsite yourself.

Making a list of the websites of people on nixers.net

2016-05-05T00:00:00Z

I wanted to make a list of the websites of the people on the website http://nixers.net, and I decided to solve it not by asking people to tell me what their sites were called, but by scraping the forum.

I didn't scrape the whole forum, I just scraped one topic on the forum that I created a few years ago: https://nixers.net/showthread.php?tid=1547

There really isn't much to it, all I had to do was to fetch all of the pages of the forum post and somehow go through them and retrieve all of the URLs, then do some filtering.

Getting the pages wasn't very difficult, I just used my good old friend cURL:

$ curl "https://nixers.net/showthread.php?tid=1547" > 1.html
$ for i in {2..8}; do curl "https://nixers.net/showthread.php?tid=1547&page=$i" > ${i}.html; done

So I had all of the pages in the current directory. All I needed to do now was some text processing. As always, the first tool in my toolbox when I need to do complicated string matching involving regexes is Perl. Sure enough, there is a Perl module on CPAN for this. It even comes with a script to make running it super easy.

The module is URI::Find and after CPANing that, the script it installs is urifind. The documentation for the script can be found here.

What I need to do is find all the URLs, remove all of the ones that aren't personal sites, remove all of the duplicates, then store the result in a file. Psh, no problem.

$ urifind -n * | \
grep -vE 'nixers.net|github|imgur|openbsd|tumblr' | \
sed -e 's/https/http/' -e 's,/$,,' -e 's,http://,,' -e 's,/.*$,,' | \
sort | uniq > sites

I also snuck in a sed command that removes trailing slashes and all of the URI path stuff, just leaving us with the domain names, which is all we really want, anyway.
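
For comparison, the same filter-and-dedupe step can be sketched in Python. This is not what I ran, just an illustration of the logic; it matches the excluded words as plain substrings rather than as a regex, and the function name and sample URLs are made up.

```python
from urllib.parse import urlsplit

# Same words as the grep -vE pattern, matched as literal substrings here.
EXCLUDE = ("nixers.net", "github", "imgur", "openbsd", "tumblr")

def personal_domains(urls):
    """Drop excluded URLs, keep only the host part, dedupe and sort."""
    domains = set()
    for url in urls:
        if any(word in url for word in EXCLUDE):
            continue
        host = urlsplit(url).netloc
        if host:
            domains.add(host)
    return sorted(domains)

sample = [
    "https://xero.nu/",
    "http://xero.nu/blog",
    "https://github.com/someone/dotfiles",
    "http://kaashif.co.uk",
]
print(personal_domains(sample))  # ['kaashif.co.uk', 'xero.nu']
```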

This gave me a file that I then went through to remove anything the command didn't get rid of. The list I got is reproduced faithfully below:

albertocg.com
andrew.harrison.nu
arcetera.moe
arcetera.party
b4dtr1p.tk
blog.neeasade.net
blog.xero.nu
bugsofberk.net
charliethe.ninja
code.xero.nu
elliottpardee.me
eyenx.ch
fontvir.us
git.b4dtr1p.tk
icetimux.com
jona.io
josm.xyz
kaashif.co.uk
literallyryan.weebly.com
lugm.org
neeasa.de
neeasade.net
nullball.nu
pluviophile.xyz
ports.brianctomlinson.com
pub.iotek.org
punkweb.co
purestench.blogspot.com
qoob.nu
quitter.se
redpanduh.com
rocx.rocks
s0lll0s.me
stenchforums.net
strangequark.tk
thevypr.com
u2620.net
venam.1.ai
wildefyr.net
www.brianctomlinson.com
www.dafont.com
www.letterheadfonts.com
www.unixcri.me
xcelq.org
xero.nu
xero.owns.us

So there you have it, a list of websites you might want to check out. I'll also put this on my about page, in case this post gets buried (by me, in the future).

Sharing /home between OpenBSD and Debian

2016-05-03T00:00:00Z

Some people reading this might be thinking: "hey, it's really easy to do this, why is he writing an article on this?". You are partially right, this should be really easy, but there are some weird things that happened while I set this up that I feel should have been written down somewhere, so my fear that I was completely borking my system would have been assuaged.

Anyway, let's get onto the first thing I had to do:

Resizing my Windows partition

Yes, I have Windows on my laptop. Yes, Stallman wouldn't be too impressed. But come on, occasionally (very occasionally nowadays), I have to run a program that only has a Windows version. In any case, I don't have to justify myself to you, dear reader.

Doing it from inside Windows turned out to be less than easy, since I was unable to shrink the C partition even after disabling and cleaning out the pagefile, some system restore points and a load of other files the Disk Cleanup Utility removed. I don't know what sort of hidden, immutable system files were around stopping me, but I decided to just boot into Debian, forcibly shrink it, and worry about it later.

I booted into Debian, started up Gparted and resized the partition without issue. I then created a new partition (around 5GB in size, I don't really keep too much stuff in my /home, this was really too big).

What filesystem should I use?

This is the next big issue: there is not that much overlap between the filesystems supported by OpenBSD and Linux. Well, certainly every filesystem OpenBSD supports is supported by Linux to some extent, but there are a few problems with most of them.

FAT32

Really, I can't use FAT32. It's 2016, how can I be caught storing my files on FAT?

Seriously, though, it has no journaling, it's slow, it's really crap and my data would all be corrupted and irretrievable after 5 mins.

On the other hand, it is very well supported by both OSes, but no, let's move on.

NTFS

This is the really perverted option. Using NTFS to store the home partition of 2 different Unix-like OSes. This doesn't seem like too bad of an option really, since it is a fairly modern filesystem compared to FAT32.

But most programs are designed with Unix permissions and ownership in mind. This is possible to set up with NTFS, but it doesn't work out of the box, which is what I'd prefer. Moving on.

ReiserFS, XFS, BTRFS, ZFS, others

OpenBSD has very limited (read: none) support for any of these.

Just using the filesystems I already use

Why should I go through all of this hassle when I could just format the partition with ext4 or UFS and be done with it?

Well, rw support for UFS does exist under Linux, but very little attention is given to the default ufs (the FreeBSD one, as far as I know), let alone the ones where you have to add the option ufstype=something (44bsd for OpenBSD's UFS). I really don't trust the Linux support for UFS at all. Obviously there is no support for soft updates or any of the SSD stuff OpenBSD has either.

ext4 support technically does partially exist under OpenBSD, since you can mount an ext4 partition with all of the ext4-specific features (journaling, some extended attributes, basically everything in /etc/mke2fs.conf) disabled, but then you might as well be using ext2.

The only choice remaining

Yes, that's right, I decided that plain ext2 was the best for my needs.

Now all I had to do was format it and change my /etc/fstabs to point to the right place. A piece of cake, right? Not quite, I ran into some compatibility issues.

Partition table woes

I had already formatted the disk as ext2 while I was booted into Debian, so I expected to just have to copy over my files, change the fstab and that would be that.

Since OpenBSD was my main OS, I had to boot into OpenBSD to copy that home directory to the partition I had made (and formatted) in Debian. The partition was /dev/sda7.

Now, for those of you who don't use BSD, you won't be familiar with BSD disklabels. They come from a time when you weren't allowed more than 4 partitions per disk due to the limitations of the MBR (this is still true for a good number of computers, so I guess it isn't a complete anachronism). On my laptop, I can have a lot more partitions than that thanks to GPT: up to 128, more than anyone reasonable would need. So really, the disklabels just get in my way, since they are basically just a way of cramming 16 partitions into 1 MBR or GPT partition.

I had 7 GPT partitions already and 10 OpenBSD "partitions" in the disklabel in OpenBSD's little area of my disk. This makes a total of 17.

This shouldn't really be a problem, except due to the way OpenBSD's partitioning system works, it relies entirely on the disklabel to tell it where the partitions are on the disk, and since Debian obviously has no reason to update the OpenBSD disklabel, there is no record of the ext2 partition we can see.

But what does that have to do with there being 17 partitions? Well, disklabels can only keep track of 16 partitions, so I have to get rid of some of the partitions we already know about.

Anyway, the only place I could get the info to add the ext2 partition was in the GPT, the very place the partitions are stored. OpenBSD should really just look at this itself every so often:

$ doas fdisk sd0

Disk: sd0       Usable LBA: 34 to 117231374 [117231408 Sectors]
   #: type                                 [       start:         size ]
----------------------------------------------------------------------
   0: EFI Sys                              [        2048:       204800 ]
   1: e3c9e316-0b5c-4db8-817d-f92df00215ae [      206848:       262144 ]
   2: DOS FAT-12                           [      468992:     68757504 ]
   3: Win Recovery                         [    95830016:       921600 ]
   4: 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 [    96753664:     20475904 ]
   5: OpenBSD                              [    77418496:     18411520 ]
   6: OpenBSD                              [    69226496:      8192000 ]

Yeah, I gave the partition the type OpenBSD in the hopes that this would help OpenBSD see it. No, that didn't work.

Anyway, the actual solution was really simple, I just had to edit the disklabel and add a record of this partition:

$ doas disklabel -e sd0

At that moment, my disklabel looked something like this:

# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: LITEON IT L8T-64
duid: 0000000000000000
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 7297
total sectors: 117231408
boundstart: 77418496
boundend: 95830016
drivedata: 0 

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:           350400         77418496  4.2BSD   2048 16384    1 # /
  b:           536980         77768896    swap                   
  c:        117231408                0  unused                   
  d:           544256         78305888  4.2BSD   2048 16384    1 # /tmp
  e:           648896         78850144  4.2BSD   2048 16384    1 # /var
  f:          2029760         79499040  4.2BSD   2048 16384    1 # /usr
  g:          1160512         81528800  4.2BSD   2048 16384    1 # /usr/X11R6
  h:          4567424         82689312  4.2BSD   2048 16384    1 # /usr/local
  i:           204800             2048   MSDOS                   
  j:           262144           206848 unknown                   
  k:         68757504           468992   MSDOS                   
  l:           921600         95830016 unknown                   
  m:         20475904         96753664 unknown                   
  n:          2171776         87256736  4.2BSD   2048 16384    1 # /usr/src
  o:          2811648         89428512  4.2BSD   2048 16384    1 # /usr/obj
  p:          3589760         92240160  4.2BSD   2048 16384    1 # /home

I just made this up, but you get the idea, it's basically a complete mess and I've exhausted the number of partitions I can have. I just deleted partitions m and n, since I'll probably never need those on separate partitions, and added a line with the fdisk info for my new partition:

m:            8192000         69226496 ext2fs
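
The numbers in that line come straight from the fdisk output: size first, then offset, both in 512-byte sectors. Here's a small Python sketch of that mapping, with hypothetical helper names; the column widths are only cosmetic, and the arithmetic assumes the "bytes/sector: 512" value shown in the disklabel.

```python
SECTOR_BYTES = 512  # "bytes/sector: 512" in the disklabel above

def disklabel_entry(letter, start, size, fstype):
    """Format an fdisk (start, size) pair as a disklabel line: size, then offset."""
    return f"  {letter}: {size:>16} {start:>16} {fstype}"

def size_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_BYTES / 2**30

print(disklabel_entry("m", start=69226496, size=8192000, fstype="ext2fs"))
print(size_gib(8192000))  # 3.90625
```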

Then I added the line to my /etc/fstab:

/dev/sd0m /home ext2fs rw,nodev,nosuid 1 2

And I deleted the old /home line, of course.

After copying the files, I rebooted and everything worked.

Or did it?

panic: system on fire

After booting into OpenBSD and using it for a bit, it worked, so I decided to boot into Debian and see if it worked there. It did.

But then I tried to boot back into OpenBSD and got a rather scary error:

** /dev/rsd0m
BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
/dev/rsd0m: BLOCK SIZE DETERMINED TO BE ZERO

The system could boot, though, but all of my files had mysteriously vanished. Running fsck_ext2fs gave the same error, leading me to believe something was actually wrong.

I booted back into Debian and forced a fsck of the partition, but it came out clean.

The moral of the story is that Linux and OpenBSD sometimes do things differently. In this case the difference is harmless, since the filesystem works correctly from both OSes: it's just that the number of backup superblocks Debian decided to keep was different from the number OpenBSD was expecting. I could look up the right way and the exact options to use to have perfect interop, but it works as is, so I can worry about that later.

All I needed to do was disable the fscking of the partition at boot in the /etc/fstab on OpenBSD, and both OSes booted up fine.

Conclusion

Hopefully, someone at some point will benefit from this post, even if it's only that someone looks at this and decides it's too much trouble.

Converting my blog to frog

2016-04-29T00:00:00Z

What is frog?

Frog is a static website generator written in Racket. It does the same sort of thing as Jekyll, Hakyll and other software like that, some of which I've used in the past.

What does "frog" mean?

There's a simple answer to this on their README:

Q: "Frog"? A: Frozen blog.

I'm not 100% on what that means exactly, but the program works great, so who am I to say that their name doesn't make sense?

What this blog post isn't

They have some really great documentation on how to get started using frog and I'm not trying to write a blog post replicating all of their effort.

What I really want to focus on is how I converted my Hakyll website into a frog site.

Converting all of my posts into the frog format

The bulk of the work was going to be converting all of the metadata I had from the Hakyll/Jekyll style:

---
title: My Post Title
date: 1970-01-01
comment: something witty
---

into the frog format, which doesn't have a comment field (I added that on top of Hakyll's default config myself anyway), but does have a tags field and looks like this:

    Title: My Post Title
    Date: 1970-01-01T00:00:00
    Tags: some, tags

So the header at the top of every file had to change almost completely. Let's think, what tool can we use to go through a load of files and transform each line based on a series of regexes and maybe something more complicated?

That's right, I wrote a Perl script!

The aim of this exercise is just to come up with a script where I can do ./script.pl post and it'll output the changed post, so that I can then just use sponge to overwrite the post itself.

(If you don't know about sponge, it's a really cool program for when you want to do something to a file in place, you can't be bothered with a temporary file, and the program you're using doesn't support in-place operations. Install it! It's usually in the moreutils package on most distros, or at least it is on Debian.)
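
The trick sponge relies on is simple: soak up all of the input before opening the output file, so reading and writing the same file can't trample each other. Here's a minimal Python sketch of that idea. It is not how moreutils implements it, and the function and demo file are made up for illustration.

```python
import os
import tempfile

def sponge(chunks, path):
    """Consume every chunk first, then overwrite path, in the spirit of sponge."""
    data = "".join(chunks)        # all input is read before the file is truncated
    with open(path, "w") as out:  # only now is the target opened for writing
        out.write(data)

# Demo on a throwaway file: upper-case it "in place".
fd, demo_path = tempfile.mkstemp(suffix=".md")
with os.fdopen(fd, "w") as f:
    f.write("hello frog\n")

with open(demo_path) as src:
    sponge((line.upper() for line in src), demo_path)

with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result, end="")  # HELLO FROG
```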

Writing the script

It's a pretty standard script to be honest, since this is what Perl is built for, processing text. Also, there's a long tradition of sysadmins whipping up quick Perl scripts to do things, since Perl is everywhere.

I wrote the following in script.pl.

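#!/usr/bin/perl
use strict;
use warnings;

my $fname = $ARGV[0];
open(my $file, "<", $fname) or die "Cannot open $fname: $!";

while (my $line = <$file>) {
    if ($line =~ /^---$/) {
        # Drop the Hakyll/Jekyll metadata delimiters.
        next;
    } elsif ($line =~ /^title:/) {
        # $line keeps its trailing newline, so none is added here.
        # The indented "Title:" layout is assumed from frog's header format.
        my $title = ($line =~ s/title: //r);
        print "    Title: $title";
        next;
    } elsif ($line =~ /^date:/) {
        # Strip the line ending, then emit a full timestamp and an empty
        # Tags line (assumed; adjust to whatever frog expects).
        my $date = ($line =~ s/date: //r =~ s/\R//r);
        print "    Date: ${date}T00:00:00\n";
        print "    Tags: \n";
        next;
    }
    print $line;
}
close($file);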

I don't know if that warrants much explanation, really, it's an easy to understand script. I made sure not to use $_, since invisible variables really frighten me.

Anyway after that, I just typed something into the shell:

for f in *.md; do
    ./script.pl $f | sponge $f
done

And that was that, all of my posts were now frog.

That's essentially it

Everything else worked as I'd expect, I just needed to convert my templates to frog's style, which is really Racket's style, and that was very easy, since I kept barely any of the logic of my site in the templates.

Well, uh, there's not much more to say. Really, using frog is that easy, I'd really recommend you use it if your current blogging platform is giving you trouble.

Happy frogging!

Hacking StumpWM with Common Lisp

2015-06-28T00:00:00Z

Before a few weeks ago, I was always one of those people who said that Lisp isn't useful, it's not type-safe, it's not pure, Haskell is better etc etc ad nauseam. All of that may be true for writing some sorts of programs, but Lisp (well, Common Lisp anyway) provides something a lot more pervasive.

What does pervasive mean? Well, right now, I'm controlling my window manager and browser through a Lisp REPL from Emacs, and it's a lot more useful (and fun) than it sounds.

Setting up Emacs

You too can get in on this action very easily. First, install a Common Lisp implementation: I recommend SBCL, usually available in repos as "sbcl" (e.g. apt install sbcl). Next, type M-x package-install RET slime RET into Emacs and you'll have already installed the Superior Lisp Interaction Mode for Emacs. It's as good as it sounds, trust me. Next, add the following to your Emacs init file to make sure slime knows where to find Lisp:

(setq inferior-lisp-program "sbcl")

As an aside, you may also want to install slime-company, the completion backend for company-mode for SLIME. Without it, company-mode completion doesn't really work for SLIME. If you do do that, then you'll also want to add the following to your Emacs init file:

(slime-setup '(slime-company))

I assume you already have run global-company-mode (why wouldn't you), but if not, just add (global-company-mode) to the above to turn it on.

Also, you will definitely want to install rainbow-delimiters and paredit-mode, they are essential to any Lisp programming experience. They are, however, not impossible to do without and I won't go over how to use them in this article. Do install them, though, they are really cool.

Quicklisp

The de facto Common Lisp library installer is Quicklisp. You'll definitely need it, and need SLIME set up to work with it. Here's how to do that. First, download quicklisp.lisp and run it (this is copied and pasted from http://www.quicklisp.org/beta/):

$ curl -O https://beta.quicklisp.org/quicklisp.lisp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 49843  100 49843    0     0  33639      0  0:00:01  0:00:01 --:--:-- 50397

$ sbcl --load quicklisp.lisp
This is SBCL 1.0.42.52, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.

  ==== quicklisp quickstart loaded ====

    To continue, evaluate: (quicklisp-quickstart:install)

* (quicklisp-quickstart:install)

There will be more output, but unless it's something glaringly errorful, you're good to go, just tell Quicklisp to add itself to your .sbclrc and quit:

* (ql:add-to-init-file)
I will append the following lines to #P"/Users/quicklisp/.sbclrc":

  ;;; The following lines added by ql:add-to-init-file:
  #-quicklisp
  (let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp"
                                         (user-homedir-pathname))))
    (when (probe-file quicklisp-init)
      (load quicklisp-init)))

Press Enter to continue.


#P"/Users/quicklisp/.sbclrc"
* (quit)
$

And that's that, you don't need to worry about telling SBCL about any library stuff again. You do, however, need to add the following to your Emacs init file:

(load (expand-file-name "~/quicklisp/slime-helper.el"))

That will make sure SLIME plays nice with anything you do with Quicklisp in the future (it will see all of the libraries you install, completion will work etc).

Some light Lisp hacking

Now you're ready to get hacking with Lisp! Just to test out SLIME, open a file and input the following:

(defun hello-world ()
  (format t "Hello, world!"))

Now, M-x slime. You'll see something like this:

; SLIME 2015-02-19
CL-USER>

That's the fabled REPL everyone always talks about. You should still be in the buffer with the hello-world function, so type C-c C-l to load it, then C-c C-z to switch to the REPL it was loaded to. You could just use C-x o, but you may have multiple buffers open and it may be more convenient to switch right to the REPL.

Now you can execute the procedure you just wrote by typing (hello-world) and hitting return. You should end up with something like this:

CL-USER> (hello-world)
Hello, world!
NIL

Some of what makes Lisp special

Well that wasn't exciting, you can do the same thing with Python and inf-python, or Ruby and inf-ruby! Anyone can load code from a buffer and play around with it, what makes Lisp special? Well, here's a quote from some guy who works on space stuff:

Debugging a program running on a $100M piece of hardware that is 100 million miles away is an interesting experience. Having a read-eval-print loop running on the spacecraft proved invaluable in finding and fixing the problem.

That's actually a quote from a Lisper at JPL talking about how useful a REPL is for debugging.

Think about it: you're running a window manager, you want to change a tiny bit in the configuration, but you don't want to restart the window manager, that could take seconds if not tens of seconds. You're already in Emacs, so wouldn't it be great if you could control and modify the state and configuration of your WM while it runs?

I certainly think so. There aren't millions of dollars on the line here if your WM crashes, but you could definitely save some time.

Anyway, onto the reason the post exists: StumpWM.

Getting StumpWM set up

You installed Quicklisp earlier, and for good reason. You could install StumpWM using the system package manager, but it tends not to work out well. I couldn't even get StumpWM to start with Debian's stumpwm package, because of errors involving the also installed cl-asdf package. I assume it was out of date, or something wasn't being loaded properly or maybe I'm just an idiot.

Anyway, to install StumpWM, open up SBCL and eval the following:

$ sbcl
* (ql:quickload "stumpwm")

It will take care of all dependencies and everything for you. Best of all, you don't need to fiddle with any manual loading of libraries, since Quicklisp takes care of all of that for you.

Now, replace the last line of your .xinitrc with the following:

exec sbcl --load /path/to/startstump

In that startstump script, place the following:

(require :stumpwm)
(stumpwm:stumpwm)

SBCL already knows about all of the libraries Quicklisp installed, so this will start StumpWM. Just one more thing: you want to be able to debug it live, right? Add the following to your .stumpwmrc (well, create it with the following content):

(in-package :stumpwm)

(require :swank)
(swank-loader:init)
(swank:create-server :port 4004
                     :style swank:*communication-style*
                     :dont-close t)

This won't work until you install the swank library:

$ sbcl
* (ql:quickload "swank")

Going back to the .stumpwmrc, notice how the port is set to 4004? I do that so that when you start SLIME in Emacs, there are no errors because the default port is actually 4005. This ensures you can't mess up your WM by accident while writing unrelated code.

OK, ready for the moment of truth? Kill your X session and run startx and you should see a "Welcome to StumpWM" message. If it didn't work, there are probably some errors in the TTY you started X from. Kill X and look at them if something went wrong. Chances are something went wrong before the swank server started, so you wouldn't be able to use SLIME to fix those errors.

If everything worked, fantastic!

A taste of what's possible

So you're sitting around coding up the next Node.js webscale NoSQL business synergy application when you notice something you want fixed with your window manager. You want to fix it right now with minimal hassle. No worries, you can do it from within Emacs!

M-x slime-connect. When prompted for host, accept 127.0.0.1. When prompted for port, put in 4004 (not 4005). You are now inside the live Lisp image of your WM. Exciting, right? Why not see if you can really control it?

CL-USER> (require :stumpwm)
NIL
CL-USER> (stumpwm:select-window-by-number 1)
NIL

That should've switched to window number 1...so you are in control! Why not rebind a key?

CL-USER> (stumpwm:define-key stumpwm:*root-map* (stumpwm:kbd "u") "exec urxvt")
NIL

Try it: press your prefix key then "u" (by default, C-t u) and a urxvt (replace with your favourite terminal) will spawn.

And you did it all without leaving Emacs or restarting your WM!

I hope this has opened up a whole new world of Lisp hacking for you. For me, it was the gateway drug. I now dream about macros and s-expressions.

Happy hacking!

How to get a list of processes on OpenBSD (in C)

2015-06-18T00:00:00Z

Is it portable?

First off, the information in this post definitely doesn't apply to Linux (as it has a completely different way of doing things) and may or may not apply to other BSDs (I see that NetBSD and FreeBSD both have similar, maybe identical, kvm(3) interfaces). There certainly isn't anything in POSIX to make this standard. The only real reason any UNIX-like OSes share this particular interface is given in the kvm(3) man page in OpenBSD:

The kvm interface was first introduced in SunOS. A considerable number of programs have been developed that use this interface, making backward compatibility highly desirable. In most respects, the Sun kvm interface is consistent and clean. Accordingly, the generic portion of the interface (i.e., kvm_open(), kvm_close(), kvm_read(), kvm_write(), and kvm_nlist()) has been incorporated into the BSD interface. Indeed, many kvm applications (i.e., debuggers and statistical monitors) use only this subset of the interface.

So even with that, only the "generic portion" of the interface is "standardised" (although it's not really standardised, it's just de facto). Hence, using kvm_openfiles(3) and the like has no reason to work on any other OS the same way it does on OpenBSD.

Actually writing some code

The information about running processes is stored somewhere: the kernel. Even in Linux, with procfs (/proc), the info all really comes from the kernel.

On OpenBSD, you want to access the running system's kernel image. This is done using kvm_openfiles(3), which takes a host of parameters detailing the file you want to load the kernel image from. Obviously, you don't want to load a file, but the running kernel. To do this, just pass in NULL as the parameters that have anything to do with files: the function will know this means you want the running system:

#include <stdio.h>
#include <kvm.h>
#include <limits.h>
#include <sys/param.h>
#include <sys/sysctl.h>

int
main(void)
{
    char errbuf[_POSIX2_LINE_MAX];
    kvm_t *kernel = kvm_openfiles(NULL, NULL, NULL, KVM_NO_FILES, errbuf);

You should really check if it's null or whatever, but for didactic purposes, I'll just leave out all of the boring error handling - it's obvious where the error string would be stored (hint - errbuf), so you can handle that.

Next, you want to get a list of processes from that kernel image using kvm_getprocs(3):

    int nentries = 0;
    struct kinfo_proc *kinfo = kvm_getprocs(kernel, KERN_PROC_ALL, 0, sizeof(struct kinfo_proc), &nentries);

Again, check if null, handle errors. The number of processes obtained is stored in nentries. If you're wondering, the "0" in the arguments to kvm_getprocs(3) actually doesn't matter - KERN_PROC_ALL is an operation that doesn't take an argument. There are other useful operations which do take an argument, so see kvm_getprocs(3) for info on those.

Now, you obviously want to go through the processes and do something with them: you know how many processes there are and you have the pointer to the first one, so a simple for loop with a counter will do. Why don't we just print the binary name for every process?

    int i;
    for (i = 0; i < nentries; ++i) {
        printf("%s\n", kinfo[i].p_comm);
    }

Now that was easy. There are a lot of fields in the kinfo_proc struct, and there actually isn't a man page for them, since the full definition is available in <sys/sysctl.h>. Look there (/usr/include/sys/sysctl.h) to see what information you can get.

Almost forgot: this is supposed to be a valid program, so return something:

    return 0;
}

The final code

#include <stdio.h>
#include <kvm.h>
#include <limits.h>
#include <sys/param.h>
#include <sys/sysctl.h>

int
main(void)
{
    char errbuf[_POSIX2_LINE_MAX];
    kvm_t *kernel = kvm_openfiles(NULL, NULL, NULL, KVM_NO_FILES, errbuf);
    int nentries = 0;
    struct kinfo_proc *kinfo = kvm_getprocs(kernel, KERN_PROC_ALL, 0, sizeof(struct kinfo_proc), &nentries);
    int i;
    for (i = 0; i < nentries; ++i) {
        printf("%s\n", kinfo[i].p_comm);
    }
    return 0;
}

Compile that with cc main.c -lkvm and run it to get some output (hopefully).

Iodine is cool

2015-04-19T00:00:00Z

I know that sometimes, I've been stuck in an airport or in a coffee shop without internet. That's annoying in and of itself, but it's even more annoying when there's a WiFi hotspot nearby, but it requires you to pay £4/hour or something crazy like that.

You can still connect to the network, it's just that whatever URL you visit in your browser, you get redirected to the "pay for internet" page. Obviously, you can't SSH tunnel somewhere, or use an ordinary HTTP proxy: those things are all blocked.

In desperation, I would ping google.com, since that's what everyone does to check internet connectivity. There, I noticed something:

$ ping google.com
PING google.com (216.58.210.46) 56(84) bytes of data.

How did it know the IP of google.com? It must have resolved it using a DNS server, it's not like I have a big list of sites in my /etc/hosts.

That means I can request information from a server, and get back information from any server on the internet. All it would take is someone running a DNS server (or similar) putting some data in the records it sends back and I'll have a way of communicating to the outside... without even logging into the hotspot.
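To make the idea concrete, here is a minimal Python sketch of smuggling bytes out inside the hostname of a DNS lookup. It is not how iodine actually encodes its traffic: the function names and the base32 scheme are assumptions chosen for illustration, t1.mydomain.com is just a placeholder domain, and the 253-character limit on a full hostname is ignored.

```python
import base64

MAX_LABEL = 63  # DNS limits each dot-separated label to 63 bytes


def encode_query(payload: bytes, tunnel_domain: str) -> str:
    """Pack payload into the labels of a hostname under tunnel_domain."""
    # Base32 only uses letters and digits, which are safe in hostnames.
    text = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    return ".".join(labels + [tunnel_domain])


def decode_query(hostname: str, tunnel_domain: str) -> bytes:
    """Recover the payload from a hostname built by encode_query."""
    # Drop the trailing ".tunnel_domain", then glue the labels back together.
    data = hostname[: -len(tunnel_domain) - 1].replace(".", "").upper()
    data += "=" * (-len(data) % 8)  # restore the stripped base32 padding
    return base64.b32decode(data)


query = encode_query(b"hello", "t1.mydomain.com")
print(query)  # nbswy3dp.t1.mydomain.com
print(decode_query(query, "t1.mydomain.com"))  # b'hello'
```

A server that is authoritative for the tunnel domain sees every such lookup, so it can decode the labels and put its reply in the DNS answer.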

Iodine - how to install and set up

I was ready to write a solution myself, but it already existed. Iodine! Here's how to set it up.

First, set up an A record pointing to your public server and an NS record pointing to that A record. Basically set it up like the readme tells you to.

Now, on the server pointed to by the A record (assuming Debian):

$ sudo apt install iodine

Obviously that's not going to do anything. On Debian, there is a config file at /etc/default/iodine. Fire up your editor and open it, and make the relevant variables look like this:

IODINED_ARGS="-c -d tap0 192.168.233.1/24 t1.mydomain.com"
IODINED_PASSWORD="mypassword"

Where t1.mydomain.com is an NS record, like in the README. Put in something secure for the password. Now, all you need to do is start it up:

$ sudo systemctl start iodined

And the server should be running. Check systemctl status iodined for any errors if you're really paranoid.

The client

On the client, setup is just as easy. To make your life a bit easier, why not make iodine a service? Install iodine and edit /etc/systemd/system/iodineclient.service and put in the following:

[Unit]
Description=Iodine DNS proxy

[Service]
ExecStart=/usr/sbin/iodine -fP mypassword t1.mydomain.com

[Install]
WantedBy=multi-user.target

Now, just systemctl start iodineclient and wait a while for it to connect and you'll be golden. You'll get an IP of 192.168.233.2, if you used the same settings I did. The server it's running from should be at 192.168.233.1, so you can run whatever web proxy you want there, and you'll be able to access it so long as you can query DNS servers for t1.mydomain.com.

Setting up Squid

The whole point of this was to access the web from access points where all I have is DNS, so I need a web proxy. I chose Squid. Setting it up is really simple. On the server:

$ sudo apt install squid
$ echo 'http_access allow localnet' | sudo tee -a /etc/squid/squid.conf
$ sudo systemctl enable squid
$ sudo systemctl start squid

I know there are a lot of other things you can do with Squid, but with this, you can just fire up iodine and use 192.168.233.1 on port 3128 as your HTTP proxy and it'll be like you paid that £400/second to access the internet, but your wallet will stay full.
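For a client that isn't a browser, pointing at the proxy looks something like this in Python. This is only a sketch: proxied_opener is a hypothetical helper name, and the address and port are the ones from the setup above (3128 being Squid's default port).

```python
import urllib.request


def proxied_opener(host: str = "192.168.233.1", port: int = 3128):
    """Build a urllib opener that sends HTTP and HTTPS requests via the proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage, once the iodine tunnel is up and Squid is running on the server:
#   opener = proxied_opener()
#   page = opener.open("http://example.com", timeout=30).read()
```

Nothing is sent over the network until open is called, so the opener can be built before the tunnel has finished connecting.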

Rainbow brackets in Emacs

2015-04-11T00:00:00Z

You know something that really annoys me? When I'm writing some Racket, Clojure, or any other Lispy language, and my editor won't cooperate. Emacs is far, far, better than most other editors for this sort of thing, mostly due to paredit-mode and SLIME (and geiser-mode, and clojure-mode, and evil-mode, and...), but there's still one problem I hadn't solved until recently.

Matching up sets of parentheses.

I know what you're thinking, a quick 2 second Google search would have told me to install rainbow-delimiters, and I did, with M-x package-install RET rainbow-delimiters RET. Then I added it to my .emacs file:

(require 'rainbow-delimiters)
(global-rainbow-delimiters-mode)

And that was that. Or was it? Take a look at this:

How can you tell the colours apart when they're so...gray?

How could I fix that? All the colours look the same! I don't want to have to set all the colours manually, I just want to programmatically make all of the colours brighter and less drab and indistinguishable from each other. Is that so much to ask?

It turns out that it's not, and there's an easy way to do that, just M-x package-install RET cl-lib RET (although you might already have it) and add the following to your .emacs (after requiring rainbow-delimiters):

(require 'cl-lib)
(require 'color)
(cl-loop
 for index from 1 to rainbow-delimiters-max-face-count
 do
 (let ((face (intern (format "rainbow-delimiters-depth-%d-face" index))))
   (cl-callf color-saturate-name (face-foreground face) 30)))

That increases the saturation of every colour by 30%, so they look nice and bright.
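The effect of the 30 passed to color-saturate-name can be sketched in Python rather than Emacs Lisp. This is an illustration only: saturate is a hypothetical helper, and treating the percentage as an amount added to the HSL saturation (clamped at 1.0) is an assumption about how the Emacs function behaves, not something taken from its source.

```python
import colorsys


def saturate(rgb, percent):
    """Return rgb (floats in 0..1) with its HSL saturation raised by percent points."""
    hue, lightness, saturation = colorsys.rgb_to_hls(*rgb)
    # Assumption: add percent/100 to the saturation and clamp it to 1.0.
    saturation = min(1.0, saturation + percent / 100.0)
    return colorsys.hls_to_rgb(hue, lightness, saturation)


drab = (0.4, 0.5, 0.6)       # a greyish blue
bright = saturate(drab, 30)  # same hue and lightness, more colourful
print(bright)
```

The hue and lightness are left alone, which is why the delimiters keep their relative colours and only stop looking grey.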

Much better.

OpenShift vs Heroku: Haskell

2015-03-30T00:00:00Z

There are a few Platform as a Service (PaaS) services out there, and the most famous is probably Heroku. I know I've had people come up to me and suggest using Heroku for the next big thing they're planning. There is a problem with Heroku: it's not free (libre). That didn't stop me from at least trying it out to see what all the fuss was about.

Deploying a Haskell app on Heroku

Obviously, it wasn't as easy as just git push heroku master, or this post wouldn't exist. No, it was more complicated than that.

I couldn't just push my Happstack app (a toy app, now hosted at http://quenya.kaashif.co.uk) to Heroku, so I had to do a bit of searching around. The best way to do this was to create an app with the buildpack set to either https://github.com/begriffs/heroku-buildpack-ghc or https://github.com/mietek/haskell-on-heroku. The latter was apparently faster, so I tried that out.

Setting the buildpack was as easy as:

$ heroku buildpack:set https://github.com/mietek/haskell-on-heroku.git -a quenya

Apparently, it was now just as easy as pushing. So I tried:

$ git push heroku master

And it failed, with the following errors:

*** ERROR: Cannot build sandbox directory
*** ERROR: Failed to deploy app
*** ERROR: Deploying buildpack only

There was some other output, too, but those were the important lines. This wasn't supposed to happen. After looking around on the tutorial page, I saw that this error occurs when you're trying to build using a one-off dyno. Maybe I could still make it work!

$ heroku run -s 1X build

And that failed, telling me private storage was expected (an S3 bucket). Well, I should've expected that, I suppose, considering the errors I got when the push failed are only supposed to occur when trying to use private storage.

Well anyway, let's try the other buildpack.

Trying the other buildpack

I set the buildpack to the first one I looked at:

$ heroku buildpack:set https://github.com/begriffs/heroku-buildpack-ghc -a quenya

That worked fine. Then, I pushed.

$ git push heroku master

It seemed to be going well, GHC was downloaded, cabal-install was downloaded, and it was about to run when suddenly:

cabal: error while loading shared libraries: libgmp.so.3: cannot open shared object file: No such file or directory

And it was going so well! If I got this error on my machine, I'd just do some dirty symlink trick to make cabal think it had the right library, but I couldn't do that on Heroku, since I didn't have permission to do anything.

Now that I think about it, this means that the haskell-on-heroku buildpack would have probably failed too, even if I set up an S3 bucket for it.

Anyway, I'd had enough of Heroku and decided to try out OpenShift.

Getting started with OpenShift

I visited https://openshift.redhat.com and signed up for a free account. It was simple enough to make an application, I just had to select a "cartridge" (basically a set of preinstalled libraries and programs), add my SSH key, and pick a name.

There was already a Happstack cartridge which came with happstack-server, blaze-html, and all the libraries I needed, bar a few. Before pushing my app, I decided to install rhc, the command line tool for OpenShift, like heroku.

It has some cool features, like rhc port-forward, which sets up a tunnel so you can access the internal services (Redis, PostgreSQL etc) from your own computer, on localhost. For example, if you run rhc port-forward, you could access PostgreSQL on 127.0.0.1:5432. Cool, right? I suppose Heroku must have something similar, but this isn't supposed to be a balanced blog post.

Anyway, I sshed into the application server to check things out, and it looked like a normal RHEL server, without anything weird or virtual-looking (although it probably is virtual). I had a home directory in /var/lib/openshift/, alongside a number of others. cabal was actually installed, unlike on Heroku, so I didn't need any "buildpacks": the build would take place exactly as it would do on my computer.

All I had to do was git push <really long uri> and it worked. All that happened was that cabal run <host> <port> was run (or something to that effect), and it started the app. Simple, right?

Moving a static site to OpenShift

This is easy, although there isn't a "static site" cartridge. All you have to do is pick a PHP cartridge, and Apache just serves the root directory of the repository (with mod_php enabled). So I just generated my blog, pushed to a new PHP app, and Apache served my site. Easy.

Conclusions

OpenShift is good, use it instead of Heroku. I can't speak for the paid options, but it's a lot easier to deploy Haskell on OpenShift.

Note: I have almost no experience in using either, except for what I've written in this article.

Quaternions, spinors and rotations

2014-12-26T00:00:00Z

Earlier, I was trying to find something I could talk about at my school's maths society. It had to be something exciting, useful, or at least beautiful in some way. I really wanted to do something on quaternions and vectors, because it seemed fun. The problem came when I realised I had to do something more substantial than stand there and explain something that boring. Then I saw this quote:

No one fully understands spinors. Their algebra is formally understood but their general significance is mysterious. In some sense they describe the "square root" of geometry and, just as understanding the square root of -1 took centuries, the same might be true of spinors. Michael Atiyah

Wow, that sounds cool. Maybe I'll be the one to explain spinors to everyone.

Some basics

Before delving into the nitty gritty of what a spinor actually is, there are a few things that need to be understood. This is a matrix:

$$ M = \begin{bmatrix} a & b \\ c & d \end{bmatrix} $$

This is that same matrix, transposed:

$$ M^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix} $$

This is the conjugate transpose, usually denoted by an asterisk or dagger:

$$ M^{\dagger} = \begin{bmatrix} a^\ast & c^\ast \\ b^\ast & d^\ast \end{bmatrix} $$

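This matrix $M$ is a unitary matrix if and only if either of the following equivalent conditions holds:

$M^{\dagger}M = MM^{\dagger} = I$
Multiplying by $M$ preserves the inner product of two complex vectors, i.e. $Mx \cdot My = x \cdot y$

Every unitary matrix also satisfies $|\det(M)| = 1$, although that condition alone is not enough to make a matrix unitary.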

In case you're wondering, the inner product of two vectors in $\mathbb{C}^2$ is defined as:

$$(a,b) \cdot (c,d) = ac^\ast + bd^\ast$$

This is known as the Hermitian inner product.

There are a few other conditions for a matrix to be unitary, but they'd require a longer explanation. In any case, the most important parts are there.
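These conditions are easy to check numerically. The following is a small sketch in plain Python for the $2 \times 2$ case; the helper names and the example matrix are my own choices for illustration, not anything standard.

```python
def conj_transpose(m):
    """Conjugate transpose of a 2x2 matrix stored as nested lists."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Multiply a 2x2 matrix by a vector in C^2."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def inner(x, y):
    """Hermitian inner product: the second argument is conjugated."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


M = [[0, 1j], [1j, 0]]                 # an example unitary matrix
x, y = [1 + 2j, 3], [2, 1 - 1j]

print(matmul(conj_transpose(M), M))    # equal to the identity matrix
print(inner(x, y))                     # (5+7j)
print(inner(apply(M, x), apply(M, y))) # (5+7j) again
```

The same example matrix has determinant 1, so it also happens to be special unitary.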

Importantly, all of these properties mean that unitary $n \times n$ matrices form a group, $U(n)$ under matrix multiplication.

If we take all of the elements of $U(n)$ with determinant 1, we get the special unitary group, $SU(n)$.

Special cases of special groups

There are loads of things you can do with these groups (mostly to do with quantum mechanics), but something that really caught my eye was the fact that there exists an injective mapping from $SU(2)$ to $\mathbb{H}$. That is interesting. Obviously, the mapping can't be a bijection, since not all quaternions correspond to unitary matrices. Maybe that's not obvious yet.

To see how $\mathbb{H}$ relates to $2 \times 2$ matrices, let's look at the mapping from $\mathbb{C}$. Complex numbers in the form $a+bi$ can be represented by a real matrix in the form:

$$ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} $$

Using that, it seems natural to say that $\mathbb{H}$ is isomorphic to $2 \times 2$ complex matrices in the form:

$$ X = \begin{bmatrix} \alpha & \beta \\ -\beta^\ast & \alpha^\ast \end{bmatrix} $$

Where $X = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$, $\alpha = a+bi$ and $\beta = c + di$.

If we remember the handy quaternion multiplication table, use the right hand rule for vector multiplication, or use this image (very intuitive), we remember that:

$$ \mathbf{i}\mathbf{j} = \mathbf{k},\;\; \mathbf{j}\mathbf{k} = \mathbf{i},\;\; \mathbf{k}\mathbf{i} = \mathbf{j} $$

The only way you can get a matrix in the form above and stick to these rules is to define the basis quaternions as:

$$ \mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}\!,\;\; \mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\!,\;\; \mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix} $$

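Obviously, the only quaternions that will end up with determinant 1 will be the ones of norm 1 - the unit quaternions, since $\det X = |\alpha|^2 + |\beta|^2$. Every matrix in $SU(2)$ has the form above, so it corresponds to exactly one quaternion and the mapping is injective. It is not surjective, since not all quaternions have norm 1.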

The norm of a quaternion is simple to work out, since (intuitively) norm is a generalised idea of the length of a vector. For example, for the quaternion $X$ I mentioned earlier, its norm would be $\sqrt{a^2+b^2+c^2+d^2}$.
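If you don't trust the algebra, the matrix picture is easy to check by machine. Here is a short sketch in plain Python; as_matrix, matmul and det are helper names I've made up for it.

```python
def as_matrix(a, b, c, d):
    """2x2 complex matrix for the quaternion a + b*i + c*j + d*k."""
    alpha, beta = complex(a, b), complex(c, d)
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]


def matmul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[r][n] * y[n][c] for n in range(2)) for c in range(2)]
            for r in range(2)]


def det(x):
    """Determinant of a 2x2 matrix."""
    return x[0][0] * x[1][1] - x[0][1] * x[1][0]


I = as_matrix(0, 1, 0, 0)
J = as_matrix(0, 0, 1, 0)
K = as_matrix(0, 0, 0, 1)

print(matmul(I, J) == K, matmul(J, K) == I, matmul(K, I) == J)  # True True True
print(det(as_matrix(1, 2, 3, 4)))  # (30+0j), i.e. 1 + 4 + 9 + 16
```

The determinant comes out as the square of the norm, which is why determinant 1 and norm 1 pick out the same quaternions.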

Just a note: if you want to multiply quaternions $X$ and $Y$, it's easier to treat their real parts and imaginary parts separately. If we say $X = (x,\vec{x})$ and $Y=(y,\vec{y})$:

$$ XY = (xy - \vec{x} \cdot \vec{y}) + (x\vec{y} + y\vec{x} + \vec{x} \times \vec{y}) $$

This is a lot easier because $\vec{x}$ and $\vec{y}$ are just ordinary vectors in $\mathbb{R}^3$, which are very easy to deal with.
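The product formula above translates almost word for word into code. This is a sketch in plain Python, with a quaternion stored as a (scalar, vector) pair; the function names are mine.

```python
def dot(u, v):
    """Dot product of two vectors in R^3."""
    return sum(p * q for p, q in zip(u, v))


def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def qmul(X, Y):
    """Multiply quaternions given as (scalar, (x, y, z)) pairs."""
    (x, xv), (y, yv) = X, Y
    scalar = x * y - dot(xv, yv)
    xy = cross(xv, yv)
    # x * yv + y * xv + (xv cross yv), component by component
    vector = tuple(x * q + y * p + r for p, q, r in zip(xv, yv, xy))
    return (scalar, vector)


i = (0, (1, 0, 0))
j = (0, (0, 1, 0))

print(qmul(i, j))  # (0, (0, 0, 1)), which is k
print(qmul(j, i))  # (0, (0, 0, -1)), which is -k
```

The cross product term is the only part that changes sign when the arguments are swapped, so that is where the non-commutativity comes from.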

What use is this?

That seems really useless, but remember that unit quaternions are widely used in computer graphics (among other places) to represent rotations in three dimensions, which means $SU(2)$ is closely related to $SO(3)$. It isn't quite isomorphic: $X$ and $-X$ represent the same rotation, so $SU(2)$ covers $SO(3)$ twice. Not only that, there are dozens of uses in quantum mechanics to do with spin, QCD and, you guessed it, spinors.

But what is a spinor? You could find out on the Wikipedia page, but chances are you won't understand any of it without either spending a long time trawling through a massive tree of articles or having someone explain it.

I would explain it now, but I have other things to do.

That's it?! But I still have no idea what a spinor is!

Well then, I guess you should come to Maths Society on Thursday the 5th of February!

While I definitely won't have anything finished before the day, you can check out any work in progress at http://kaashif.co.uk/code/spinor.git/.

ShareLaTeX on OpenBSD

2014-12-06T00:00:00Z

The other day, I was trying to access http://sharelatex.com at school, and it didn't really work, probably due to a combination of Internet Explorer and possibly an overzealous filter that could have been blocking something. That's what I thought, anyway, until I tried it on Chrome and it still didn't work. Odd. The best solution was obviously to set up my own ShareLaTeX instance on my server.

Dependencies

On the ShareLaTeX wiki page about dependencies, I saw that I needed Node.js, Grunt, Redis, MongoDB, Aspell, and TeXLive. All of these are packaged in OpenBSD and they can all be easily installed:

$ sudo pkg_add node redis mongodb texlive_texmf-full aspell latexmk

After that, I installed Grunt using npm:

$ sudo npm install -g grunt-cli

That's that. I read through the instructions and they said I needed to configure MongoDB, but that's not actually necessary.

The hard part

I chose to install into /var/www, as per the recommendations on the wiki, so I cloned the repo into /var/www/sharelatex and ran the commands. It's important to note that Grunt expects the make binary in the PATH to be GNU make, and I couldn't be bothered to find some way to change this expectation, so I moved BSD make and symlinked GNU make in its place:

$ mv /usr/bin/make /usr/bin/bmake
$ ln -s /usr/local/bin/gmake /usr/bin/make

Now I was ready to install billions of npm packages and let Grunt set up trillions of config files:

$ git clone -b release \
https://github.com/sharelatex/sharelatex.git \
/var/www/sharelatex
$ cd /var/www/sharelatex
$ npm install
$ grunt install

That shouldn't show too many errors. Now, I followed the rest of the instructions on the wiki page I linked earlier. In case you're too lazy to go there, they're reproduced below (and edited for BSD):

Make a sharelatex user, and chown all files to it:

$ useradd -b /var/www/sharelatex -G sharelatex sharelatex
$ chown -R sharelatex:sharelatex /var/www/sharelatex

Move the config files to a better place:

$ mkdir /etc/sharelatex
$ mv /var/www/sharelatex/config/settings.development.coffee \
/etc/sharelatex/settings.coffee

Edit that config file and make sure the dir variables read as follows:

DATA_DIR = '/var/lib/sharelatex/data'
TMP_DIR  = '/var/lib/sharelatex/tmp'

Make all of the directories:

$ mkdir -p /var/lib/sharelatex/data/{user_files,compiles,cache}
$ mkdir -p /var/lib/sharelatex/tmp/{uploads,dumpFolder}
$ chown -R sharelatex:sharelatex /var/lib/sharelatex

That's it, there are only a few things left to do.

A problem and a fix

I tried to run ShareLaTeX, and it wouldn't work. It seems like there's a well-known bug in Node.js (or something like that) that causes it to fail unless you do the following:

$ cd /var/www/sharelatex
$ rm -rf web/node_modules/bcrypt
$ npm install

Apparently, it's something to do with dependencies, but it doesn't matter, I just ran the above commands and everything ended up working.

Init script

Obviously, the Upstart script supplied is impossible to use on OpenBSD, but it's not too hard to whip up an rc.d script. Here's the one I use:

#!/bin/sh
# /etc/rc.d/sharelatex
daemon=/usr/bin/tmux
daemon_user=sharelatex
daemon_flags="new-session -s sharelatex -d 'grunt run'"

. /etc/rc.d/rc.subr

rc_stop(){
    ${rcexec} "sudo -u sharelatex ${daemon} kill-server"
}

rc_cmd $1

Obviously, it's a bit primitive, since it just runs in tmux with no logging and is killed by killing tmux, but it does work, and you can start and stop it.

The last thing I needed to do was add everything to /etc/rc.conf.local:

pkg_scripts="redis mongod sharelatex"

And start it up with:

$ /etc/rc.d/sharelatex start

Now, it should work. If you've been following along, browse to port 3000 on your server to check it out. I advise setting up Apache or Nginx (or some other web server) as a reverse proxy and using TLS to access it. Configuring web servers isn't within the scope of this post, though.

Check out http://sharelatex.com, though, it's really cool. Maybe you'll want to set it up yourself, too (if you haven't already)!

Switching to Mercurial

2014-11-15T00:00:00Z

UPDATE: I now use Mercurial on the client side (i.e. everything I do locally involves hg) and Git on the server-side. It just makes it easier to mirror to Gitorious, GitLab, etc. You can view all the repos at http://kaashif.co.uk/code.

Earlier in the year, I was getting curious about version control systems other than Git. Git was my first VCS, and the one I was most familiar with, but I didn't want to be...locked in (cue gasps). Obviously, there isn't any potential for actual lock-in, but there is a Git (and even more worryingly, a GitHub) monoculture developing that I don't want to be part of.

I could lie and say that it was for practical reasons, but at that time, I was concerned with the fact I was becoming mainstream in my choice of VCS. The first alternative I looked to was Darcs.

Migrating to Darcs

When I switched to Darcs, the first thing I had to do was convert all of my repos from Git. This posed a bit of a problem, as 99% of guides are for converting from Darcs to Git, which is obviously not what I want. In the end, I just settled for:

$ for repo in *; do
> (cd "$repo" && rm -rf .git && darcs initialize);
> done

That's not ideal, because I'm essentially nuking all of my history for no reason, but as a lone developer, it doesn't matter. So that was fine. I guess I can't hold it against them that no-one wants to migrate to Darcs. Well actually, I can, but I chose not to.

Eventually, after setting up a darcs account and pushing all my repos to my server, I decided I wanted some sort of cgit-esque web interface. I was pleasantly surprised to discover that you don't necessarily need anything special to browse Darcs repos, you could just use a normal web server's directory listing capabilities. I did that for a while, but it was pretty bad. In the absence of any bundled solution, I took to the web and found darcsweb, which wasn't too bad. Sure the last release was in 2008, but that means it's feature-complete and mature.

Darcs: the good, the bad and the exponentially complex

Darcs is good. It's written in Haskell. Everyone loves Haskell. You know who loves Haskell the most? The guys who wrote the Glasgow Haskell Compiler. But, uh, they don't use Darcs, they use Git. They used to use Darcs, but it's apparently too slow for them.

I didn't think that would affect me: GHC is an absolutely massive project.

I was wrong.

After trying to commit the entirety of the OpenBSD source tree (a VCS should be able to handle this) and doing some merge-fu, it was time to go back to Git. Evidently, Darcs' merging algorithms and patch algebra needed some work.

Why switch?

After the recent news that the Go team decided to switch to Git, I decided once again to switch away from Git. This time, I wouldn't come back. The obvious choice was now Mercurial: it is widely touted as being easy to use, competitive with Git in terms of speed, and overall, it's supposed to be actually usable. Also, Mozilla use Mercurial for their massive projects, so at least someone other than the creators use Mercurial.

You might be wondering why I told that story about Darcs. Well, it's mostly just so that you have more than 2 data points to compare. Let's get comparing, then.

Mercurial is really, really easy

Git has a propensity for being really hard to use and cryptic. There's the classic example of reverting a file. How do you do that?

(Git)

$ git checkout file

(Mercurial)

$ hg revert file

Well I suppose you'd eventually remember that checkout means revert... not that bad, right? Well, putting aside the fact that there exists a git revert command that does something completely different, sure. It's not that bad.

Let's look at reverting to the last commit.

(Git)

$ git reset --hard c0mm1th45h

(Mercurial)

$ hg rollback

I know, I know, I've cherry-picked examples that paint Git in a bad light. Whatever fruit I've been picking, the fact is that Mercurial is a lot easier to use, and a lot harder to mess up than Git is.

Also, it's shockingly easy to convert a Git repo (or any other repo) to Mercurial. Let's say there's a repo in "myrepo":

$ hg convert myrepo

Then there'll be a converted Mercurial repo in myrepo-hg. That's fantastic!
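If you have a whole directory of Git repos, the same command can be looped. Here's a rough sketch, assuming every repo sits directly under one directory. convert_all is a made-up helper name, and the --config flag is only there in case the convert extension (which ships with Mercurial but isn't always switched on) hasn't been enabled in your hgrc. As written, it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch: print the hg convert command for every Git repo directly under $1.
# convert_all is a hypothetical helper; drop the "echo" to actually run hg.
convert_all() {
    for repo in "$1"/*/; do
        # Only directories that contain a .git directory are Git repos.
        [ -d "${repo}.git" ] || continue
        repo=${repo%/}
        # --config extensions.convert= enables the bundled convert extension
        # for this one invocation only.
        echo hg --config extensions.convert= convert "$repo" "$repo-hg"
    done
}
```

Calling convert_all ~/src would print one line per Git repo found there, each naming the repo and its -hg destination.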

So, after some quick Googling and playing around in test repos, I converted all of my Git repos and pushed everything to BitBucket. But I needed to set Mercurial up on my server.

hgweb

Hgweb comes with Mercurial and is, as the name sort of suggests, the gitweb of Mercurial. It was a breeze to set up, you just fill in the path to the config in the script itself, set it up as a CGI script, and write a config file. Mine looks sort of like this:

[paths]
blog = /home/hg/blog
tau = /home/hg/tau

And a load of other repos. You can take a look at http://hg.kaashif.co.uk.
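Writing those lines by hand gets old with a lot of repos. As a sketch, assuming all of your repos live directly under one directory like /home/hg, a small (hypothetical) helper could generate the section:

```shell
#!/bin/sh
# Sketch: emit an hgweb [paths] section for every Mercurial repo under $1.
# gen_hgweb_paths is a hypothetical helper name.
gen_hgweb_paths() {
    echo "[paths]"
    for repo in "$1"/*/; do
        # A Mercurial repo has a .hg directory at its root.
        [ -d "${repo}.hg" ] || continue
        repo=${repo%/}
        echo "$(basename "$repo") = $repo"
    done
}
```

Running gen_hgweb_paths /home/hg should print one line per repo, which you can paste into the config.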

Wow, that was barely a paragraph. Mercurial really is easy!

Committing to BitBucket from a hook

I really like it when I can self-host 90% of everything and just use GitHub, Gitorious and now BitBucket as a glorified backup. I already have an online repo browser and SSH server, all I need is for my server to push to BitBucket whenever there are incoming changes.

This was surprisingly easy. All I had to do was edit the ~/.hgrc of the hg user and add a global hook and path to a script:

[hooks]
incoming = /usr/local/bin/push_to_bitbucket

Now, when there are incoming changes, they'll be received then pushed to BitBucket. Well, they will be after I write the script:

#!/bin/sh
# Push to the BitBucket repo named after the current directory.
hg push "ssh://hg@bitbucket.org/kaashif/$(basename "$(pwd)")"
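
To see what that command substitution works out to, here's the URL-building part pulled out into a function. bitbucket_url is a name I've made up for illustration, and it assumes, like the script does, that the hook runs from the repo's top-level directory:

```shell
#!/bin/sh
# Sketch: build the push URL the hook script uses from a repo directory.
# bitbucket_url is a hypothetical helper; the account name comes from the post.
bitbucket_url() {
    echo "ssh://hg@bitbucket.org/kaashif/$(basename "$1")"
}

# Example: the repo at /home/hg/blog maps to the "blog" repo on BitBucket.
bitbucket_url /home/hg/blog
```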

Within 30 mins, I had everything set up. A lot quicker than I was able to do it with Git, although I suppose I now have the benefit of hindsight and experience.

I've even heard that with hg-git, I never have to use Git, even when upstream is Git-only. Maybe this time, my migration will be permanent.

Pkgsrc on Slackware

2014-10-11T00:00:00Z

Back in the day, I used to use Slackware. It was the best distro around and all the cool kids used it. Nowadays, it's rather different: a much lower proportion of people use Slackware. Despite the efforts of Eric and Patrick (and whoever else), Slackware isn't really all that popular. It's still a solid distro, though. There is one problem that seems to annoy people.

No dependency management.

But isn't that a selling point of Slackware?

Maybe it is, but it causes a lot more problems than it solves. If you're going for a system where not a single thing was installed that you didn't want and where you've built all the packages and kernels from source, then you're a lot better off with Gentoo (which has a much cooler selling point: USE flags).

Using binary packages without dependency management just causes problems. If you try to use sbopkg (the SlackBuild package manager), you quickly find that having to build from source and not being able to manage dependencies is the worst of every single world: installing anything takes ages and the builds will all probably fail halfway through.

Of course, there is a "solution" in place: queues. You're supposed to look up the whole dependency tree beforehand and queue up all of the packages, one by one. Pretty tedious.

There are alternatives, like slapt-get, but those require you to surrender all control to the package manager, at which point you might as well go back to Ubuntu. So we want dependency management and control. Where can we turn? Of course, NetBSD's pkgsrc.

Getting started

You might think that since it's part of the NetBSD project, pkgsrc can only run on NetBSD. That's not true, it can run on any vaguely POSIXy OS. To get started on Slackware, you need to make sure you have a few of the package sets installed. I'm not sure how many you actually need, but this is what I got it down to: a, ap, d, l and n.

The need for "a" is obvious: that's the base. You need "d" and "l" for compiling, which we'll be doing a lot of. "n" for downloading pkgsrc. "ap" for, well, being able to do things while you build everything (which could take a while). Eventually, you could uninstall "ap" and "n" and replace everything with programs from the pkgsrc tree. You can't do that yet, though.

Next, get and extract the pkgsrc tree:

# cd /usr
# wget ftp://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc.tar.gz
# tar xzvf pkgsrc.tar.gz

Now, there is a ports-looking sort of thing under /usr/pkgsrc. Nifty. But you need NetBSD's make to use it. Luckily, there's a bootstrap script to get you started.

# cd /usr/pkgsrc/bootstrap
# ./bootstrap

Something should happen and at the end, there should be some binaries in /usr/pkg/bin and /usr/pkg/sbin. That's where all of the pkgsrc installed stuff will go, so you might want to add the following to your shell's profile:

export PATH=$PATH:/usr/pkg/bin:/usr/pkg/sbin

Additionally, you might want man pages.

export MANPATH=$MANPATH:/usr/pkg/man
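
If your profile gets sourced more than once, those exports keep tacking the same directories on the end. A minimal sketch of a guard against that, with path_append being a hypothetical helper rather than anything pkgsrc provides:

```shell
#!/bin/sh
# Sketch: append a directory to a colon-separated list only if it is missing.
# path_append is a hypothetical helper; it prints the resulting list.
path_append() {
    case ":$1:" in
        *":$2:"*) echo "$1" ;;     # already present, leave the list unchanged
        *) echo "$1:$2" ;;         # otherwise append, as the plain export does
    esac
}

PATH=$(path_append "$PATH" /usr/pkg/bin)
PATH=$(path_append "$PATH" /usr/pkg/sbin)
export PATH
```

The same helper works for MANPATH, since it's just another colon-separated list.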

Now everything's set up to start building. You might not have an X server yet and even if you do, you'll want to replace it with pkgsrc's. Installing a port is easy:

$ cd /usr/pkgsrc/x11/modular-xorg-server
$ sudo bmake install

On my old laptop, this took several hours. If you have a more recent machine, it shouldn't take too long.

While building, you might think "Hey, why's it installing Perl, I already have that!". That's a valid question, but pkgsrc doesn't detect dependencies by looking in /usr/bin/, it searches its own package database, which is completely empty.

This means that you'll end up building Perl, Python and dozens of libraries you already have. After pkgsrc is done, you might as well uninstall all of the Slackware packages you don't need. Go to /var/log/packages and removepkg everything you don't want.
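
Working out what's installed is easier with bare package names. Slackware names the entries in /var/log/packages as name-version-arch-build, so a sketch like this (slack_pkg_name is a made-up helper) strips the last three fields and lists the names, leaving the decision about what to removepkg to you:

```shell
#!/bin/sh
# Sketch: strip the version, arch and build fields from a Slackware package
# name (name-version-arch-build). slack_pkg_name is a hypothetical helper.
slack_pkg_name() {
    echo "$1" | sed 's/-[^-]*-[^-]*-[^-]*$//'
}

# List the names of installed Slackware packages, if the directory exists.
if [ -d /var/log/packages ]; then
    for f in /var/log/packages/*; do
        slack_pkg_name "$(basename "$f")"
    done
fi
```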

What now?

You can search the ports tree with /usr/pkgsrc/pkglocate for all your favourite software and install it. There's more software in pkgsrc than there is in the default Slackware repos and it's a lot more convenient than sbopkg. For more info: http://www.netbsd.org/docs/pkgsrc/.

Disclaimer: I do not use pkgsrc on Slackware on a day-to-day basis, so don't blame me for your problems. Send something in to the pkgsrc mailing list or some NetBSD IRC channel instead.

NetBSD "Review"

2014-10-07T00:00:00Z

Earlier today, I was discussing operating systems and came onto the subject of ease of installation. Which OS had the easiest installer? The obvious answers would tend towards OSs with GUI installers, but is that really easy? Sure, it could be familiar, but there's a lot more you have to do with GUI installers compared to, say, OpenBSD's installer, where you can just mash enter (other than typing the root password) and get a usable system. But then, my attention was drawn to the text-based faux-GUI installers of FreeBSD and NetBSD. Were they any good? I had no idea.

Essentially, what started out as idle thinking over whether NetBSD was any good (I already tried out FreeBSD and indeed used it a lot for servers a while back) turned into this "review" of NetBSD. I'm careful with that word, because that implies my opinion is to be trusted when in fact, I don't know all that much about OS design.

The machine

I'll be installing NetBSD on (not a ThinkPad this time) an old Dell Studio 15 laptop with a Core 2 Duo, ATI Radeon graphics and a reasonably OK 1200x800 screen. OK, that's not great, but I've used a lot worse - this is at the higher end.

Getting installation media

This bit was pretty easy. I just searched "NetBSD", clicked on the first link that mentioned mirrors, found one close to me, and downloaded an ISO, which I burnt to a disk. Just for reference, the location of the ISO I downloaded was ftp://ftp2.fr.netbsd.org/pub/NetBSD/NetBSD-6.1.5/amd64/installation/boot.iso, which is a link to the amd64 image from a French mirror.

While some people might think burning to a disk is anachronistic, I like having a little memento of my first NetBSD installation. Maybe I'll even buy a set of NetBSD CDs if I end up using NetBSD enough.

Installing

Booting from the CD went fine. There were no surprises when it came to picking a language, keyboard layout and choosing what sort of installation (installation without X, full install etc). I did run into a bit of a problem when it came to partitioning the disk, though.

The installer prompts you whether you want to use the whole disk (yes), whether you want to overwrite the partitions that already exist (yes) and whether you want to look at the finished partition table (yes).

It was all well and good up to this point, where I saw that NTFS and ext2 partitions were still present on my disk, and that the NetBSD part of my disk had not used the whole disk: only 300GB out of the 500GB. There wasn't any obvious "delete partition" button, so I changed the type of the partition to "unused", which seemed to delete the partitions. The NetBSD part was still a bit undersized, but it really didn't matter: I was "warned" that I needed at least a few GB of space to install NetBSD at the start.

I suppose I could live with a bit of unused space - I probably wouldn't use it anyway.

Getting the sets

For the uninitiated, installation sets are the actual stuff that is extracted and installed onto the hard disk: the stuff on the CD is just the installer, a way of getting the network set up to download the sets and install them.

There was a bit of a problem, however: iwn wifi adapters don't tend to work out of the box with completely free OSs.

Surprisingly, when I selected DHCP to automagically set up my network, something seemed to be happening: I got an IP from somewhere, a gateway, a default DNS server... this must have been from someone's passwordless network down the road. I tried to change the settings to those of my network, but I couldn't find where to enter the SSID or WPA key. I could have looked harder, but I had an Ethernet cable so I just used that.

My troubles didn't end there, however. The installer pinged the router to make sure it was working. It was. Reassured, I tried to install the sets from FTP using the ftp2.fr.netbsd.org mirror. After a few minutes of waiting on "Connecting to ftp2.fr.netbsd.org", it told me it couldn't connect. Maybe they suddenly went down? It's possible. Next, I tried ftp.netbsd.org, the master site. There's no way that could be down, but I was told that I couldn't connect.

The router was fine, the firewall was fine, but no internet on my NetBSD installer, despite it working on my laptop an inch away. Since I could still ping things on the LAN from the installer, I assumed it was some network config mistake and that I'd better just download the sets to my HTTP server and point the installer there, inside the LAN.

That seemed to work fine and I was able to get to the "Congratulations ... you have installed NetBSD ... reboot". So I did.

The first boot

While rebooting, I decided to read the massive warnings that appeared, warning me to change some rc.conf variables or not be able to boot to multiuser. Sure enough, I rebooted and it wouldn't go into multi-user. My bad, I suppose. I remounted the root read-write and edited rc.conf, changing "rc_configured=NO" to "YES". Why that is needed is beyond me, it seems to serve no purpose. Maybe it does, but that shouldn't be my problem.
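
For the record, the edit itself is a one-line substitution. A sketch, assuming the file contains the line rc_configured=NO exactly as quoted above; enable_rc_configured is a hypothetical helper and takes the file to edit as its argument:

```shell
#!/bin/sh
# Sketch: flip rc_configured from NO to YES in an rc.conf-style file.
# enable_rc_configured is a hypothetical helper; pass the file to edit.
enable_rc_configured() {
    tmp=$(mktemp) || return 1
    # Rewrite only the rc_configured line, leaving everything else alone.
    sed 's/^rc_configured=NO/rc_configured=YES/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

With the root filesystem mounted read-write, enable_rc_configured /etc/rc.conf would do the same thing as editing the file by hand.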

Now I had to add a user (why wasn't this done already, by the installer?) and configure sudo access. Wait, no, sudo isn't installed! So Lua is in the kernel, but sudo isn't in base. Great priorities. Maybe there's an actual reason for this, but again, I don't really care.

After doing all of that, I do end up with a mildly usable system with vi, su, csh... Who am I kidding, an OS isn't even almost usable unless you can get onto the internet, so let's do that next.

Networking

OpenBSD uses /etc/hostname.if files to manage configs for each interface. Debian uses /etc/network/interfaces. Everyone seems to use something different, and I was right to suspect NetBSD was not unique in its uniqueness. This was something that couldn't be fumbled through by trial and error, I had to look it up on the internet, using the (actually quite good) NetBSD guide http://www.netbsd.org/docs/guide/en/chap-net-practice.html .

So it turns out they use /etc/ifconfig.xxx files, which is actually a whole lot more sensible than what OpenBSD calls them. The configuration syntax is 99% the same as what OpenBSD uses, so I can't fault it there. This is, of course, because it's essentially a script that runs ifconfig for each line.

After consulting this page and finding out that NetBSD uses wpa_supplicant, I worked out how to set it up from the examples and now I could start the network!

# /etc/rc.d/network restart
Stopping network.
...
add net default: gateway 192.168.0.1: Network is unreachable

Uh... I suppose I have to add an /etc/ifconfig.iwn0 that reads "up", just to make sure the OS knows I want iwn0 up. No, that still doesn't work. Maybe if I reboot, it'll "just work".

The second boot

After the second boot, everything works as I'd expect. Now networking works, I have a user (kaashif, obviously). All I need to do is to get package management working and I'll be able to get everything exactly the same as I have it on my current box. After consulting the FTP mirror to find the right package path (/pub/pkgsrc/something/version/ or something similar), I looked here only to discover that it works almost the same as it does on OpenBSD, the environment variable even has the same name: PKG_PATH. So I set that and...

# pkg_add sudo

Wow, everything seems to work! Let's try X:

$ startx

That works and starts up twm. From here on out, everything would be the same as on any Unix-like OS.

Conclusions

Quite a bit more hassle than OpenBSD, I think, although it could just be that I'm more used to OpenBSD. There is no doubt, however, that OpenBSD's installer lets you get a system up and running faster. That's not too useful in and of itself, since I don't know how good NetBSD is as a day-to-day OS and I don't plan to use it.

I like OpenBSD's WPA implementation more, there are fewer programs to worry about and fewer knobs to fiddle with in rc.conf (none, actually). Also, I prefer OpenSMTPD to Postfix. And a recent version of pf to whatever NetBSD comes with now (the configs are ever-so-slightly different).

These are all tiny complaints, though. With more usage, I'm sure I'd come to see that NetBSD is at least usable.

Learning Racket

2014-09-25T00:00:00Z

I've heard a lot of things about Racket (well really, things about many different Lisp dialects), and most of them were good. Recently, I decided to try to decide between Haskell and a Lisp once and for all. I wanted to go for a Lisp-1, since they keep functions and values in the same namespace, which is how it should be. Eventually, after trying out Guile, Chicken Scheme, and a few others, I settled on Racket.

Setting Racket up

This was pretty easy. Unlike with Guile, OpenBSD has a very up-to-date Racket package, so I just installed that.

# pkg_add racket

Now I had a Racket interpreter living at /usr/local/bin/racket. Sure, I could have just run that and started to play around, but I really wanted to install a mode for Emacs, which is what all serious Lisp users do. A quick Google revealed geiser-mode as the premier Scheme interaction mode. Opening up Emacs, I installed that with a quick "M-x package-install RET geiser-mode RET", and that was that.

There were a few bindings I had to get used to, but the real interesting part of my experience was using Racket, not my editor (although geiser-mode is pretty good).

After playing around for a bit and installing paredit-mode, I was ready to get started.

First impressions

The first thing you notice when using a Lisp is the large number of parentheses. In Racket, sometimes square brackets are used to clear things up, which really helps. I've never seen this in Common Lisp or other Schemes, so this must be something Racket-specific.

(let ((a 5)
      (b 10))
  (+ a b))

That seems pretty trivial and non-threatening, right? Right? Well, it doesn't stay very nice for long unless you use square brackets or some other distinct type of bracket.

(let ((a (map (lambda (x) (equal? x "hello")) some-list))
      (b (get-field info some-struct)))
  (map list a b))

Maybe you don't think that's too bad, but it looks a lot better with square brackets. Mainly, it lets you keep track of which parens close which expressions. Even with rainbow-delimiters mode, it's really difficult! It's a bit less difficult in Racket, though.

Coming from Haskell, it was a bit jarring to see that there was no type safety at all in default Racket. You could have something like this:

(if (equal? (read-line) "error out please")
  (read-line 4 5)
  (+ 3 4))

Of course, since the optional arguments to read-line are an input port and a mode, not numbers, you'll get an error at runtime complaining about the arguments you gave it. But that's the point, you only get the error at runtime. In fact, if you never type in "error out please" you'll never get an error. Oh man, that's scary: you're supposed to catch all errors at compile-time with a good type system.

Typed Racket

In Racket, you start every file with a declaration of what language you're using. Usually:

#lang racket

But sometimes you might be writing a document or info file for raco, so you'll want a different one. Here's one I always use, especially after that horror story about runtime errors:

#lang typed/racket

Pretty self-explanatory, I think. Here's how some normal Racket code might look:

(define (square n)
  (* n n))
(define my-numbers (list 1 2 5 3 8 5))

In Typed Racket, you can put in type annotations, too! It's usually pretty easy to see what's meant. Here are the appropriate type annotations for the two bindings above:

(: square (-> Number Number))
(: my-numbers (Listof Integer))

Those actually do look similar to type annotations in Haskell:

square :: (Num a) => a -> a
myNumbers :: [Int]

Personally, I like the Haskell syntax more. Of course, since Racket is a Lisp, the syntax can be changed infinitely, so that's not really a complaint.

All in all, the Racket REPL is cool (although nothing special nowadays), the language is good if unsafe, and Typed Racket is nicer. Haskell, ML, OCaml and other "real" functional languages still feel nicer to me, though.

A few days isn't enough to really get to know a language, though, so maybe I'm wrong. Until someone tells me so, I won't use any Lisp as extensively as I do Haskell, mostly because it feels unsafe.

Emacs is great

2014-09-18T00:00:00Z

I've seen many completely stupid articles where people furiously circlejerk over how Vim is the best and nothing will ever come close, but it's rare that I see anyone write much about Emacs, probably because fewer people use it (or maybe Emacs attracts a different demographic). It's rare I see an article like this one, where the author's stupidity is front and center.

I admit that Atom can't replace Vim, as the title says, but here is the kicker:

If this sounds a bit commonplace, it's because Emacs' big idea has been widely influential and extensibility is today a standard feature in any serious editor. Sublime Text uses Python instead of Lisp, and Atom uses Coffeescript, but the fundamentals of commands and keymaps are built in to the core. Even Vim has absorbed Emacs' extensibility: Vim script can define new functions, which can be mapped to command keystrokes.

While Emacs' big idea has caught on, hardly any program, let alone any editor, lets you customise it in the way Emacs does. For example, it is entirely possible (although stupid) to use Emacs as your web browser or email client. It's even possible to use Vim in Emacs, with evil-mode. Hell, I'd argue that with Emacs' nigh-infinite extensibility, evil-mode is even better than the "real deal" - you get all of the fantastic Emacs modes you can't get in Vim, due to its limited extensibility.

Here's another gem:

This philosophy of minimalist commands that can be composed together is the fundamental originating philosophy of Unix, and Vim exemplifies it like no other editor.

Unix philosophy = good. OK. I get it. I hear the same thing from anti-systemd zealots who spout this tripe without knowing what it means. The Unix philosophy tells you to write programs that do one thing and do it well. Programs. One thing. What we have learnt from the blog post is that you shouldn't find the best editor, you should find the most minimalistic editor with composable commands. Somehow, I think the author uses Vim with many plugins instead of piping text through program after program, each of which only does one thing.

The lesson here is that you should look for the best tool, not the one that most fits into your twisted, misinterpreted view of a philosophy almost entirely unrelated to what your tool should do.

Patching OpenBSD

2014-08-24T00:00:00Z

Recently, I've been trying to understand the ins and outs of CVS in order to be able to contribute to OpenBSD without messing up anything. I have sent a few patches to ports@, but anything complex was beyond my abilities until recently.

How does OpenBSD's contributing system work?

OpenBSD uses a tried and tested method of accepting contributions from users that has remained unchanged for many years - a mailing list. Anyone can send in a patch to ports@openbsd.org or tech@openbsd.org and it will be looked at by the people with commit access and (hopefully) committed into the tree.

But where's this "tree"? What is it? OpenBSD uses CVS as its version control system. While it is a bit old, they have their reasons for using it, so it's best not to complain. Essentially, all history is stored on the server, not locally, which is very different from how a distributed VCS would do it. It doesn't affect actual usage of CVS, though, so don't worry about it.

Checking out the source

From here on out, I'll assume you're trying to update a program in the ports tree you might use. The example program I'll be using is "editors/joe", which is Joe's Own Editor, a very simple editor inspired by Emacs and WordStar (yeah, it's not the newest editor either).

The first step is finding a nearby CVS repo. A good place to look is on this list. Pick one that's near you and export the CVSROOT variable where it says "CVSROOT=anoncvs@blahblah". For example, I'm in the UK, so I'll pick a server in Europe and export CVSROOT:

$ export CVSROOT=anoncvs@ftp5.eu.openbsd.org:/cvs

The next step is checking out the ports tree from the server:

# cd /usr
# cvs checkout ports

Now you have a fully updated ports tree in /usr/ports! Note that I didn't specify any flags: on OpenBSD, a default ~/.cvsrc comes in every user's home. Mine looks like this:

# $OpenBSD: dot.cvsrc,v 1.1 2013/03/31 21:46:53 espie Exp $
#
diff -uNp
update -P
checkout -P

Making a patch

Now, you might want to find the file you want to change. In this example, it's in editors/joe, so let's go there in the ports tree:

# cd /usr/ports/editors/joe

I'm not going to explain how to edit Makefiles and update ports in OpenBSD, a guide for that already exists here. Instead, let's assume you already know how to make changes (and you will, after reading that guide).

# vi Makefile
**make some changes**

If you delete any files or add any files, you have to do cvs delete <file> or cvs add <file> for the changes to be tracked properly. Do this before making a patch in the next step.

Now you have to get the changes you just made into an email, which is easy using cvs diff. Run cvs diff and redirect its output to a file somewhere.

# cvs diff > /tmp/joe.diff

If you look at /tmp/joe.diff, you'll see that all the changes you made have been recorded in that file, in the format used by patch(1). You don't need to know how to apply the patch, that will be done by whoever wants to try out your changes. For completeness' sake, you do that by running patch -p0 < /tmp/joe.diff in the right directory (/usr/ports/editors/joe, in this case).

Mailing that patch to ports@

In case you don't know how mailing lists work, let me explain. You send an email to an address (in this case, ports@openbsd.org), and it is processed by the mail server and sent out to everyone subscribed to the list, who can then apply the patch.

In your email client, write a new message with the subject "UPDATE: joe 3.7 -> 3.8" (with the right versions, of course), to be sent to "ports@openbsd.org" and CC'ed to whoever was listed as the port maintainer in the Makefile. Write something short and descriptive in the body of the message, like this:

Hello ports@,

This is an update to joe which fixes this bug and that bug and adds a
useful feature.

OK?

In the rest of the message, paste your diff. It's easier for everyone if you just put your diff in the body of the message, because they can then just apply your email as the patch and everything will work out fine.
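
Put together, the message could look like what this sketch prints. compose_update_mail is a hypothetical helper, the body text is a placeholder you should rewrite, and you still need to CC the maintainer and send the result with your own mail client:

```shell
#!/bin/sh
# Sketch: print a ports@ update mail with the diff inline in the body.
# compose_update_mail is a hypothetical helper:
#   compose_update_mail PORT OLD_VERSION NEW_VERSION DIFF_FILE
compose_update_mail() {
    echo "To: ports@openbsd.org"
    echo "Subject: UPDATE: $1 $2 -> $3"
    echo
    echo "Hello ports@,"
    echo
    echo "This is an update to $1 from $2 to $3."
    echo
    echo "OK?"
    echo
    # The diff goes straight into the body, not in as an attachment.
    cat "$4"
}
```

For the example in this post, that would be compose_update_mail joe 3.7 3.8 /tmp/joe.diff.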

Now send the mail and wait.

If you get a response, take into consideration any criticism and send a new diff. If someone important (usually with an @openbsd.org email address) tells you it's OK, congratulations! They'll apply your patch, commit it to CVS and you'll have contributed to OpenBSD!

Samba

2014-08-22T00:00:00Z

Recently, I've been trying to get away from pre-packaged file sharing solutions (e.g. FreeNAS) and trying to set up the services they provide from scratch. While I obviously won't be able to write a web GUI or create a whole distro, that simply isn't necessary. What is necessary is setting up a file share and appropriate read/write permissions.

What is Samba?

Among other things, Samba is a collection of daemons - winbindd, nmbd and smbd - that let you provide file shares and printers to any client capable of communicating with Samba. I don't really care about the printer part, but it's nice to know about.

Essentially, if you have a Windows machine, Samba lets a Unix machine announce itself and provide file shares when you click on your Unix box in the file manager. You can do the same with free tools, on Unix, but I don't think any Unix has a default that's as easy to use as Windows', as much as it pains me to say.

Setting it up

If you have a Windows PC, you might have noticed that your Unix box doesn't show up on the network, because it's not broadcasting its name. That's easy to fix, just start nmbd! Assuming OpenBSD:

# /etc/rc.d/nmbd -f start
nmbd(ok)

Now you should see your hostname appear on the network from a Windows machine or other Samba client.

While that may be cool, it isn't very useful. You might want to share some files. Let's assume you have a directory you want to share, at /home/samba. If someone with no valid credentials walks into your house with their laptop, do you want them to be able to read your share? Chances are you do, since you might want people to see your pictures from their phone or whatever. You don't necessarily want everyone to have write access, though.

Keeping that in mind, let's take a look at /etc/samba/smb.conf, which is the config for smbd, the "main" Samba daemon that takes care of authentication, printers and, what we want, file shares.

$ cat /etc/samba/smb.conf
# This is the main Samba configuration file. You should read the
# smb.conf(5) manual page in order to understand the options listed
# here. Samba has a huge number of configurable options (perhaps too
# many!) most of which are not shown in this example
#
# For a step to step guide on installing, configuring and using samba, 
# read the Samba-HOWTO-Collection. This may be obtained from:
#  http://www.samba.org/samba/docs/Samba-HOWTO-Collection.pdf
#
# Many working examples of smb.conf files can be found in the

That goes on and on and on... So let's empty the file first.

# : > /etc/samba/smb.conf

Now there's nothing in it, and you're free to put this in there with whatever editor you want:

[global]
workgroup = WORKGROUP
server string = Kaashif's Server
security = user
map to guest = Bad User
guest account = nobody
log file = /var/log/samba/smbd.%m
max log size = 50

[Public]
comment = Public Files
path = /home/samba
public = yes
writable = no
browseable = yes
guest ok = yes
write list = @staff

Let's go through that line by line:

[global]

These are settings global to all shares and printers.

workgroup = WORKGROUP

This doesn't mean anything significant to a home user, but it should be set to the same as whatever your other PCs are set to. WORKGROUP is the default, so leave it like this.

server string = Kaashif's Server

A short description of your server.

security = user
map to guest = Bad User
guest account = nobody

If a user trying to access your shares is not recognised, it is mapped to the guest user, which is an alias for the "nobody" user locally. So whatever permissions you set for the /home/samba directory, it should be readable by the "nobody" user.

log file = /var/log/samba/smbd.%m
max log size = 50

This creates a log file for each machine that accesses your shares. Have a look in /var/log/samba after you've accessed the share to see what the logs look like.

[Public]
comment = Public Files
path = /home/samba
public = yes
writable = no
browseable = yes
guest ok = yes
write list = @staff

"Public" is the name of the share. The rest is self-explanatory, other than the "write list" line, which means any user in the login class "staff" will be allowed to write. If your user (for example, "fred") is not in this class, then add him:

# usermod -L staff fred

This means that if you give Samba the username "fred" and his password, you can write to the share, assuming the file system permissions allow fred to write there.
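
If you end up with more than one share, the stanza is easy to template. A rough sketch, assuming you want the same read-only-for-guests setup as above; share_stanza is a made-up helper, not part of Samba:

```shell
#!/bin/sh
# Sketch: print a guest-readable share with a single @NAME write list entry.
# share_stanza is a hypothetical helper: share_stanza SHARE PATH WRITERS
share_stanza() {
    cat <<EOF
[$1]
path = $2
writable = no
browseable = yes
guest ok = yes
write list = @$3
EOF
}

# Example: the share from this post.
share_stanza Public /home/samba staff
```

Append the output to /etc/samba/smb.conf once you're happy with it.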

Starting Samba

Instead of starting smbd and nmbd separately, you can start them together using the "/etc/rc.d/samba" script:

# /etc/rc.d/samba -f start
smbd(ok)
nmbd(ok)

If you want it to start at boot, edit "/etc/rc.conf.local" and add the line:

samba_flags=""

That should be that, and you should be able to see your server on the network, click on it, and be able to see and download any files from "/home/samba" on your server.

Haskell isn't difficult

2014-07-25T00:00:00Z

People like to say that learning Haskell is hard, because a pure, lazy, functional language isn't something most people are used to. The simple fact of the matter is that most people (nowadays, at least) will be introduced to programming through Python, Javascript or maybe even Java or C#. The problem with learning dynamically typed languages that don't tell you about types is that you don't get to see an integral part of how your program works. It's worse if you program in a weakly typed language like Perl or Javascript - you won't have any concept of types unless you read around, which is a poor substitute for actually using types.

"But I tried to learn Haskell and it was hard!"

Most people who try to learn Haskell fail to "get it" for a while. After writing programs in Haskell, going through a few tutorials, seeing some examples of code, and having the power of purity and lazy evaluation click, people tend to find Haskell a lot easier. The problem is that most people don't get past the initial barrier to entry, which is "unlearning" the paradigms they're used to.

Coming from Java and C#, they'll have to unlearn that style of OOP and get used to Haskell's way of doing things. Coming from Javascript and PHP, they'll have to learn about types. In fact, unless the beginner we're talking about comes from ML, OCaml or similar, they will have to get used to how much types are used in Haskell in ways they may not be used to.

Nullable Types

In C, let's say, you might want a function which takes a name and does something with it, returning something else, perhaps a boolean indicating whether the name is valid and in some database somewhere:

bool validate(char* name)
{
    // some code
}

That is all well and good, but what if the pointer passed to the function, instead of pointing to an array like you'd expect, is null? You'd have to add in a null check.

bool validate(char* name)
{
    if (name == NULL)
        return false;
    // some code
}

That's not terribly shocking, but it's a bit inconvenient. That seems like a low-level problem you'd never get in a higher-level language like Java, for example. That is wrong. A similar function would look like this:

boolean validate(String name)
{
    if (name == null)
        return false;
    // code
}

Why are our types nullable? That seems like a terrible design decision which results in NullPointerExceptions for everyone. Abstracting the hardware away hasn't solved this problem automatically, it seems.

In Haskell, types are not nullable. You can instead use optional types. For example, if you have a function that could fail, you either return "Just" the answer or "Nothing":

validate name = if condition then Just answer else Nothing

While this sort of thing can be, and is, done in other languages, beginners tend not to be exposed to it. Even if they were, the fact is that most code out there is riddled with checks for null.

Even worse, most people are unaware of how easy it is to code such a solution when you have a good algebraic type system:

data Maybe a = Just a | Nothing

In other languages, this sort of thing would probably be done via sentinel values, which brings with it a whole host of problems. Beginners would have to unlearn that habit, because it's all too easy to code a "getIndex" function that returns "255" when there's an error, despite the fact that 255 is a valid list index for an element.
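To make the sentinel problem concrete, here's a rough sketch in Python rather than Haskell. Everything in it is made up for illustration: the names get_index_sentinel and get_index_maybe are hypothetical, and the Just and Nothing classes are only a loose imitation of Haskell's Maybe, not a real library.

```python
from dataclasses import dataclass
from typing import Any, List

SENTINEL = 255  # the "error" value from the getIndex example above


@dataclass(frozen=True)
class Just:
    """Wraps a successful result (imitation of Haskell's Just)."""
    value: Any


@dataclass(frozen=True)
class Nothing:
    """Represents failure (imitation of Haskell's Nothing)."""


def get_index_sentinel(items: List[Any], target: Any) -> int:
    # Returns 255 on failure, which is also a perfectly valid index.
    try:
        return items.index(target)
    except ValueError:
        return SENTINEL


def get_index_maybe(items: List[Any], target: Any):
    # Failure is a different kind of value, so it can't be mistaken for an index.
    try:
        return Just(items.index(target))
    except ValueError:
        return Nothing()


items = list(range(300))
print(get_index_sentinel(items, 255), get_index_sentinel(items, 999))
print(get_index_maybe(items, 255), get_index_maybe(items, 999))
```

With the sentinel version, a caller cannot tell "found at index 255" apart from "not found". With the second version, the two cases are different values and have to be handled separately.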

Loops? What loops?

This really threw me off when I started learning Haskell. Back when I started to learn Haskell, I thought it looked simple enough to write a simple text adventure in. In Python, such a game could be boiled down to:

player = Player()
while player.alive:
    # get input, update world etc

The key part is that there is a main game loop. This is how it is in most games: there is some sort of loop that is run over and over until the player dies or quits. Simple enough, right?

My first experience with Haskell and IO was with the typical "Hello, world!" program:

main = putStrLn "Hello, world!"

That seems simple, too. Surely it won't be too hard to do something like this:

player = new Player
main = while (alive player) (play game)

Well, that would be very hard for a beginner to do, considering that "player" is a constant (there are no variables in Haskell) and there are no loops in Haskell, unless you install "monad-loops" and use whileM, untilM and the like.

The preferred solution, I now know, is to write a recursive function that uses the State monad to pass along a Player data type. A beginner has no idea how to approach a problem in Haskell when their usual tools are not present.
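The shape of that solution can be sketched without any Haskell at all. The following Python is only an illustration of "recursion that passes the state along" - it is not the State monad, which automates exactly this kind of plumbing. The Player type, the "hit" command and the scripted command list are all assumptions made up for the example.

```python
from typing import NamedTuple, Sequence


class Player(NamedTuple):
    """Immutable player state; a new value is built instead of mutating."""
    health: int


def alive(player: Player) -> bool:
    return player.health > 0


def step(player: Player, command: str) -> Player:
    # One turn of the game: returns the next state, never changes the old one.
    if command == "hit":
        return Player(health=player.health - 1)
    return player


def play(player: Player, commands: Sequence[str]) -> Player:
    # The "loop" is a recursive call carrying the updated state.
    if not alive(player) or not commands:
        return player
    return play(step(player, commands[0]), commands[1:])


print(play(Player(health=2), ["hit", "wait", "hit", "hit"]))
```

Python doesn't optimise recursion the way Haskell does, so this would hit the recursion limit in a long game. The point is only the structure: no variable is ever reassigned, and the game stops as soon as the player is no longer alive.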

Monads?

In imperative, "traditional" languages, there is no concept of the monad. In Haskell, it's everywhere, even the "Hello, world!" program was 100% monadic: the main function always has type "IO ()". The "IO" means we're in the IO monad.

While this would be a great time to have a terrible monad analogy tutorial, the easiest way of understanding monads, I think, is to come at them from a mathematical angle. I think I did this in that other post about functors here. A monad is just a container for a value with a few functions defined: "return" takes a value and puts it into a container, and "join" turns a value in a container in a container into a value in a container.
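As a small illustration, here is that description applied to one particular container, the list, written in Python. The function names unit, fmap, join and bind are my own choices for this sketch ("unit" stands in for Haskell's "return", which is a keyword in Python), and fmap is the functor part that the definition above quietly relies on.

```python
def unit(x):
    """Haskell's 'return': put a value into a container."""
    return [x]


def fmap(f, xs):
    """Apply f to every value inside the container."""
    return [f(x) for x in xs]


def join(xss):
    """Flatten a container of containers into a single container."""
    return [x for xs in xss for x in xs]


def bind(xs, f):
    """Haskell's >>= can be built from fmap and join."""
    return join(fmap(f, xs))


print(join([[1, 2], [3]]))
print(bind([1, 2], lambda x: [x, x * 10]))
```

Wrapping and then flattening gets you back where you started, whichever way round you wrap: join(unit(xs)) and join(fmap(unit, xs)) both give xs again.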

That's really simple, but also really useful. Most beginners are not exposed to anything like monads or functors, so they are naturally very hard to grasp.

Conclusion

Don't be discouraged because you find learning Haskell, Lisp, APL, Forth, or any other esoteric but useful language hard. The only reason it seems difficult is that you're starting from scratch: it gets easier once you know the basics.

How I share a file, simply

2014-06-08T00:00:00Z

Earlier, I saw this article claiming to describe how to share a file "simply" by running Python's web server module (with either Python 2 or 3). While that may be easy, it's not simple, and certainly not fast.

The largest problem with using Python or Ruby to run a web server instead of a single-purpose program is initialisation time. Regardless of performance under load or latency or whatever, the fact is that running "python3 -m http.server" takes tens of seconds to reach a usable state on an older machine, and minutes on even older ones. This is unacceptable if all you want to do is share a text file which would take less than a second to download - it's ludicrous to spend more time waiting for a server to start than actually using it.

The alternative

I use thttpd. While the site may not look modern and it seems like they went out of their way to make it look ugly, the software that runs it is very performant, portable, and everything you could ever want in a quick-and-dirty web server solution.

Even better, it starts within a second even on really old hardware. If you want to share a directory, here's what you do:

$ thttpd

That was very simple, fast and easy! The problem is that you need to be root to bind to port 80, so just kill that process and start another one, like this:

$ thttpd -p 8000

That can be run as any user, since port 8000 is high enough not to need root to be used. Since it daemonizes straight away, to kill it, you need to use pkill.

$ pkill thttpd

And that's that. A lot simpler than starting up an interpreter for a programming language and loading modules within modules of code just to serve a couple of files.

Scripting it

While that's all well and good, just like the author of the article I linked, I want to be able to run share and share my files, so I wrote a script. This is going beyond the realm of simple, but it'll certainly be convenient.

#!/bin/sh
echo -n "Starting thttpd: "
if thttpd -p 8000; then
    echo "done"
else
    echo "failed"
fi

That does start the server, but doesn't give us any way to kill it and it doesn't tell us our IP address (which we might not know due to DHCP on an unfamiliar network). After echo "done", add in something to tell us our IP, then wait for the user to press enter and kill the server:

#!/bin/sh
echo -n "Starting thttpd: "
if thttpd -p 8000; then
    echo "done"
    ifconfig | grep inet
    read line
    pkill thttpd
else
    echo "failed"
fi

That is OK, but we can do better to show us our IP - as it is, the script outputs this:

inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
inet 127.0.0.1 netmask 0xff000000
inet6 fe80::213:e8ff:fe73:485%iwn0 prefixlen 64 scopeid 0x2
inet 192.168.0.9 netmask 0xffffff00 broadcast 192.168.0.255

Sure the IP is in there, but it's too messy. I replaced grep inet with something a little more sophisticated:

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'

That regex shows us all of the IPs in ifconfig's output, but it also shows the broadcast address and doesn't show us the port. To fix that, we can pipe the whole thing into this:

awk '!/255$/{ print $0 ":8000"}'

The finished script looks like this:

#!/bin/sh
echo -n "Starting thttpd: "
if thttpd -p 8000; then
    echo "done"
    ifconfig \
    | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' \
    | awk '!/255$/{ print $0 ":8000"}'
    read line
    pkill thttpd
else
    echo "failed"
fi

And its output looks like this:

Starting thttpd: done
127.0.0.1:8000
192.168.0.9:8000

Now you can type share and have a web server up in a minimum of time.
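If the grep and awk stages look opaque, the same filtering can be sketched in Python. This is only an illustration of what the pipeline does to the ifconfig output shown earlier, not a replacement for the script. The function name share_addresses is made up for the example.

```python
import re

# Same pattern as the grep -o stage: four dot-separated groups of 1-3 digits.
IPV4 = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

SAMPLE = """\
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
inet 127.0.0.1 netmask 0xff000000
inet6 fe80::213:e8ff:fe73:485%iwn0 prefixlen 64 scopeid 0x2
inet 192.168.0.9 netmask 0xffffff00 broadcast 192.168.0.255
"""


def share_addresses(ifconfig_output, port=8000):
    found = []
    for line in ifconfig_output.splitlines():
        # grep -o prints every match on its own line
        found.extend(IPV4.findall(line))
    # awk '!/255$/' drops anything ending in 255 (the broadcast address),
    # then appends the port to what is left
    return [f"{ip}:{port}" for ip in found if not ip.endswith("255")]


for address in share_addresses(SAMPLE):
    print(address)
```

Like the awk filter, this throws away any address that happens to end in 255, not just broadcast addresses, which is good enough for a quick script.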

Workflow

2014-05-31T00:00:00Z

The inspiration for this came from this blog post, where the author describes how he uses his computer. While he does use CRUX, a GNU/Linux distro, and I use OpenBSD, our workflows are actually surprisingly similar, which can, in part, be attributed to the inspiration for the ports system both CRUX and OpenBSD took from FreeBSD, although OpenBSD is obviously a lot closer to the original.

Using OpenBSD

OpenBSD is a very traditional Unix-like OS that doesn't get in your way or do anything for you. In contrast to the author of the blog post I linked to, I appreciate a good set of defaults and a smooth install. Indeed, you can install OpenBSD by mashing enter and only stopping to enter a root password, and you'll end up with a very sane set of defaults - an X server, some X programs, tmux, and not to mention the other software that comes in the base system - an FTP server, Nginx, and so on.

Even with all of this "bloat", the install footprint is well under a gigabyte and RAM usage (checked from the text console) stays under 32 MB. So either most GNU/Linux distros are using all that disk space and RAM to do things I don't want or need, or they do the minimum, but a lot less efficiently than OpenBSD.

To summarise, if you want to use OpenBSD, here are the steps to follow:

Download an install image from here
Boot it from USB, CD or wherever
Hit enter until it tells you to stop
Reboot
Install packages

After that, it's really identical to using Solaris, AIX or another BSD - OpenBSD is fully POSIX compliant. That does, however, mean that it's slightly different from the GNU userland (cp -R instead of cp -r, for instance).

Note that I didn't mention ports: the idea with OpenBSD is that you use the packages - the maintainers put a lot of effort into making sure you don't need to compile packages from ports yourself. Largely, every flavour of package you want already exists and has been compiled. Maybe you want vim with Perl, Python and Ruby support - good news! There's a package with that compiled in. You want rsync to use libiconv? They've accounted for that use case too! Ports should only be downloaded and used in very niche cases.

Window Manager

I'm not very adventurous with my WM, I just use i3, a very popular WM. I say that like it's a bad thing, but there's a reason it's popular - it does what it says on the tin very well. The default keybindings are fine for me, although I use Mod4 (super) instead of Mod1 (alt). You can find more about it here, and to install it, you should be able to find it in your OS's repos. It's in OpenBSD under "i3", unsurprisingly.

For the uninitiated, it's a tiling window manager, which means that, most of the time, my windows take up the whole screen and when there are multiple, they "tile", meaning they are sized such that there are no gaps and no windows behind another. Here's a GIF of i3 in action:

Web Browsing

This is the activity I spend most of my time doing. Nowadays, I use luakit, which is a Webkit-based browser with Vim-like keybindings, mostly based on Pentadactyl, a popular Firefox plugin I used to use.

Checking Email

The activity I spend the second-most amount of time on while on my laptop is checking my mail. I'm subscribed to a few mailing lists, but the one I check the most is misc@openbsd.org. A close second is tech@openbsd.org. While these aren't overly high-volume lists, it takes a while for each message to be fetched from the server, and with IMAP this has to happen one by one as I view them.

Those seconds add up. Maybe they don't, but waiting a second between reading messages is very annoying, so my only option was to store the messages locally, allowing me to read them with no network delay. Setting this up was easy, I just used the same Mutt config as I used when on IMAP and configured it for local mail in a Maildir located at ~/mail/. Even better, this meant I could configure my system MTA to deliver all system mail there, too (cron job errors, security output, nightly backups).

Here are the relevant lines from my .muttrc:

set mbox_type=Maildir
set spoolfile = "~/mail/"
set folder = "~/mail/"

To actually fetch the mail, I used getmail, a small Python script (can be installed with pkg_add getmail). Setting that up was very simple, too, just install and place the following in .getmail/getmailrc:

[retriever]
type = SimplePOP3SSLRetriever
server = pop.example.com
username = username
password = password

[destination]
type = Maildir
path = ~/mail/
user = yourusername

[options]
read_all = false

Getmail can be run by simply running getmail, but this is very annoying to do every time you want to read mail, and sort of defeats the point - this was supposed to save time. I set up a cron job for it - place the following in the file opened via crontab -e:

*/30 * * * * getmail

That runs getmail every 30 minutes, meaning your mail stays up to date. If you're ever waiting for an important message, you can still run getmail to refresh your inbox directly, although it is mildly annoying to open your shell every time you want to do that - why not bind it to a key in Mutt?

To do that, add the following to your .muttrc:

macro index "^" "!getmail<enter>"

That means that, whenever in the index, you can press "^" to get any new mail.

No place like ~

You already know I have a ~/mail/ directory, but I have a few more:

~ >> tree -L 2
.
|-- bin -> etc/scripts/bin
|-- etc
|   |-- README.md
|   |-- [...]
|   `-- zsh
|-- mail
|   |-- cur
|   |-- new
|   `-- tmp
|-- notes
|-- sent
|-- src
|   |-- advent
|   |-- [...]
|   `-- www
`-- var
    |-- documents
    `-- music

47 directories, 4 files

As you can see, I keep my source code in ~/src/, data files in ~/var/, mail in ~/mail/, scripts in ~/bin/ and configs in ~/etc/. My etc directory is filled with configs which are symlinked to from their expected locations. I'm sure I've written a post about it at some point.
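For anyone wanting to copy the symlink arrangement, here's a rough Python sketch of the idea. It is not the script I actually use. The function name link_configs, the mapping format and the zsh/zshrc file name are all assumptions made up for the example.

```python
import os
import tempfile


def link_configs(etc_dir, home_dir, mapping):
    """Symlink each config in etc_dir to its expected location in home_dir.

    mapping maps a path relative to etc_dir to a path relative to home_dir.
    Returns the list of links that were created.
    """
    created = []
    for source, target in mapping.items():
        src = os.path.join(etc_dir, source)
        dst = os.path.join(home_dir, target)
        if os.path.lexists(dst):
            # Never clobber a file or link that is already there.
            continue
        os.symlink(src, dst)
        created.append(dst)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real home directory.
    with tempfile.TemporaryDirectory() as home:
        etc = os.path.join(home, "etc")
        os.makedirs(os.path.join(etc, "zsh"))
        open(os.path.join(etc, "zsh", "zshrc"), "w").close()
        print(link_configs(etc, home, {"zsh/zshrc": ".zshrc"}))
```

Running it a second time does nothing, since existing files are skipped, so it is safe to re-run after adding a new config to the mapping.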

Just like most people, I'm a fan of text files, but not any ordinary text files - most of my notes are written using Emacs' org-mode, which I believe is a note-taking environment second to none. Unless I'm programming in a REPL language (Ruby, Python, Lisp), I tend to use Vim. I favour Emacs for those languages because...well, you have to use it to really understand what I mean - look up some cool things you can do with SLIME.

You can check out my dotfiles here.

Installing OpenBSD on a T61

2014-05-08T00:00:00Z

I'm sure lots of people (dozens, perhaps) have installed OpenBSD on ThinkPad T61s of some description, but with the recent release of OpenBSD 5.5, lots of documentation has become (or already was, and now is even more so) obsolete, like this article, which deals with OpenBSD 4.5, which is really old - it's from slightly over 5 years ago! OK, that may not seem too bad, but much has changed in OpenBSD since then, so you might want to reevaluate the situation if you're thinking of upgrading or something.

Hardware

The ThinkPad I bought isn't the best T61, but it is a decent, reliable laptop that served its previous owner for 7 years and will serve me for a while longer. The important bits of the laptop:

1280x800 screen - not quite the 1680x1050 IPS screen of the better model of T61, but good enough for me, certainly.
Intel GMA 965 graphics
1 GB of RAM
250 GB hard drive
Smart card reader
Fingerprint reader
Intel WiFi Link 4965

I made sure that everything I actually use is supported by free drivers. Funnily enough, the T61 I have also supports coreboot, but if you want help using flashrom or something, you'd be better served by looking at their wiki, taking advice from me would probably result in a bricked laptop.

OpenBSD Installation

With the release of OpenBSD 5.5, the project also released USB disk images you can dd straight to install media instead of having to install to the USB drive then boot from the ramdisk kernel. Now, it's a lot easier to install! Assuming you have a 64-bit CPU, you'll want to get the install media here. Every T61 comes with a 64-bit Core 2 CPU, so this is the one you want if you've got a T61. If you have a T60, you might need to get the i386 install media instead, but as far as I'm aware, they're all Core 2 as well.

After getting the install media, you'll want to dd it to a disk. Make sure to modify the following command for your disk:

# dd if=install55.fs of=/dev/rsd1c bs=1m

After that, the USB drive is bootable, so reboot, wait for the kernel messages to scroll by (all of that text with a blue background), and you'll be taken to a screen where you'll be asked the following question:

Welcome to the OpenBSD/i386 5.5 installation program.
(I)nstall, (U)pgrade, (A)utoinstall or (S)hell?

Here, you'll want to type "i" and hit enter, since you're installing. Unless you're upgrading, but then, you probably don't need my help.

The majority of the questions are things any GNU/Linux or *BSD user should be able to answer, so I won't go into detail on those. Instead, I'll focus on the T61-specific parts. In case you want help with a general OpenBSD installation, look to the OpenBSD project's guide.

Getting the network to work

Eventually, the installer will ask you about the network:

Available network interfaces are: iwn0 re0 vlan0.
Which one do you wish to configure? (or 'done') [re0]

If you try to use the wireless card straight away, it will not work, and the kernel will spit out the error "iwn-4965: could not read firmware". Due to licensing issues, the OpenBSD project is not allowed to distribute the firmware for Intel wireless cards on the installation media. The best they can do is provide firmware packages here.

You don't actually need the network to install OpenBSD, so I advise that you type "done" and leave the networking be (unless you can connect to the internet using Ethernet) until you boot into OpenBSD for the first time.

When you reboot, you have a few choices:

Find an Ethernet cable to get the firmware
Never use the internet
Put the firmware on a USB drive using another PC

I didn't have any spare Ethernet cables on my desk, but I did have another laptop, so I went for the third option. If you do have an Ethernet cable and a router, pick "re0" as the network interface to configure during the installation, and everything will work fine and you'll be able to download the WiFi firmware through the internet, on your laptop.

Assuming you went the same route I did, you'll want to create a new file system on the now-useless installation media, so you can use it to store firmware.

# newfs_msdos /dev/sd1c

Then take out the USB drive, download the firmware on another PC onto the drive, and install the firmware package on your laptop with pkg_add:

$ sudo pkg_add -Dnosig iwn-firmware-5.10p0.tgz

You need to tell pkg_add to ignore signatures, because the firmware packages are signed by a different key to the normal packages, so pkg_add will reject firmware packages' signatures.

Graphics acceleration

Since my T61 has an Intel integrated graphics card, it just works automatically, without me having to do anything special. You can tell if OpenBSD detects your Intel card during boot because the font changes to something that looks a bit different.

If you're still in doubt, try this:

$ dmesg | grep intel

My laptop shows this:

inteldrm0 at vga1
drm0 at inteldrm0
inteldrm0: 1280x800

That means that the Direct Rendering Manager works, and has managed to work out the resolution of the laptop's screen. This means that X will work without any xorg.conf hacks. If you have an Nvidia card, my condolences. The nv driver sucks, despite the stellar work the OpenBSD project does keeping it maintained. The fact is that reverse engineering is no substitute for actual documentation, so users who have ATI/AMD or Intel cards are far better off than Nvidia users.

Extra features

I haven't tried the smart card reader since I don't have a smart card, and I haven't gotten around to trying the fingerprint reader. It might work, but I don't really care. The one time I did try to use a fingerprint reader, it was cool but pointless.

The PCMCIA port works: I tried it with a 3Com EtherLink III card I had lying around, and it works perfectly.

The trackpad works very well - two finger scrolling works and all three buttons work fine out of the box. Even that odd scrolling thing that happens when you press button 3 and use the TrackPoint works. No-one ever uses it, but it's nice to know that it works.

Summary

All in all, my experience with the T61 just goes to show that OpenBSD works great as a laptop OS, especially compared to the OS that came with the laptop (Windows 7 and driver CDs everywhere).

Maybe at some point I'll buy a newer ThinkPad and see how the radeondrm support is firsthand.

Configuring Mutt

2014-04-18T00:00:00Z

On all of my computers, I like being efficient. That means eliminating everything which uses all that precious CPU time and using applications which are very customisable and configurable. These sorts of applications tend to be text-based, which is, in my eyes, a good thing, since they'll show you the information you need with a minimum of distraction.

Mutt is a text-based email client, and I think it's very well suited to the task - generally, when reading emails, you are dealing with text. The few times there are images, they're usually in annoying HTML emails from websites or in attachments, meaning they're not integral to viewing the email body itself.

Now that you're sold on the idea of a text-only email client, we can get started installing, configuring, and using it.

Installing Mutt

Mutt should be in the package manager repository for your OS. On FreeBSD, you can install it with pkg install mutt. On Debian, it's apt-get install mutt. The command to run should be similar on most operating systems. If you're on Windows, I suggest you install Cygwin and build from source, since that's really the only way you can get Mutt.

Configuring Mutt

If you just run mutt, you'll see that you've opened the local mail box for your current user. In decades past, local mail was actually really important - people sent mail to and from each other's servers, not centralised servers like Outlook or Gmail. In any case, you don't want this - you likely want to access your Gmail account, which is what I'll be showing you how to do. The instructions for Yahoo or Outlook should be similar, but consult their help pages for the server address and port.

You need to create/edit the file ~/.muttrc and input the following:

# The bit before @gmail.com in your address
set imap_user = "myname" 

# Google's IMAP server, used to view mail
set folder = "imaps://imap.gmail.com:993"

# Google's SMTP server, used to send mail
set smtp_url = "smtp://myname@smtp.gmail.com:587/"

# Displayed in the "from" field of your mail
set from = "myname@gmail.com"

# Change this to your preferred editor
set editor = "vim + -c 'set textwidth=72' -c 'set wrap'"

# Sets your default inbox to the one on the IMAP server
set spoolfile = "+INBOX"

# Makes sure all your folders are polled for new mail
set imap_check_subscribed=yes

# Interval to check for new mail
set mail_check = 120

# How long Mutt waits for a keypress before checking for new mail, and
# how often it polls open IMAP connections to keep them alive
set timeout = 300
set imap_keepalive = 300

# Where Mutt keeps drafts
set postponed = "+[GMail]/Drafts"

# Mutt's cache settings, important for speed
set header_cache=~/.mutt/cache/headers
set message_cachedir=~/.mutt/cache/bodies

# Where mutt stores SSL certificates
set certificate_file=~/.mutt/certificates

# Mutt will never move mail from Gmail to your local mailbox
set move = no

# Include message you're replying to in the reply
set include

# Sorting by threads is very useful for keeping your mail in order
set sort = 'threads'
set sort_aux = 'reverse-last-date-received'

# Number of index lines kept visible above a message while viewing it
set pager_index_lines = 10

# I prefer not to show some of the headers, they get in the way
ignore "Authentication-Results:"
ignore "DomainKey-Signature:"
ignore "DKIM-Signature:"
hdr_order Date From To Cc

# A useful key binding to refresh your inbox
bind index "^" imap-fetch-mail

That might seem like a lot of variables, but they're all well-named and well-documented in Mutt's online manual. The manual is very extensive, and documents every variable you see above.

Opening Mutt and logging in

With the settings above, you're all set to open Mutt again. Type mutt into a terminal and you'll see a prompt telling you whether to accept Google's SSL certificates. Press "a" for "accept always" a few times until all the certs are accepted.

Next, you'll see a password prompt, like this:

Password for kaashifhymabaccus@gmail.com@imap.gmail.com:

Of course, type in your password, and it'll be sent off over an SSL connection to Google's IMAP server.

If all goes well, the next thing you see should be your inbox. If it hasn't worked, I advise that you ask on an IRC channel. Although it isn't strictly related to Arch, #archlinux on irc.freenode.net is very, very active, so you're likely to get an answer there.

Using Mutt

You can use the up and down arrow keys to scroll through the mail, as well as the standard Page Down and Page Up keys. You can also type in a number and a prompt should appear like this:

Jump to message: 1

If you look at the far left of Mutt, you should see a column of numbers. Type in any one of those, and Mutt will jump to it.

If you want to open a message, press enter over one, and the bottom 4/5 of the screen should display the message, with the message index open in the top 1/5.

When you're at the index (the screen you see just after logging in), you should see this at the top:

q:Quit  d:Del  u:Undel  s:Save  m:Mail  r:Reply  g:Group  ?:Help

You'll see bars like this at all screens of Mutt, so press "?" if you forget what a key does.

To send a message, press "m". You'll have to say who it's for and what its subject is, then your editor will open, where you can type out the body of the message. It's best to keep the message to 72 columns wide, to maximise viewing pleasure for everyone involved.

After quitting your editor, you'll be taken to a screen with this at the top:

y:Send  q:Abort  t:To  c:CC  s:Subj  a:Attach file  d:Descrip  ?:Help

It should be obvious how to send the mail or change the headers.

If you use mailing lists a lot (like I do), then you'll want to change mailboxes. You do that by pressing "c", typing "=mymailbox", and pressing enter. Of course, replace "mymailbox" with the actual name of the mailbox.

At this point, you should be able to use your new email client for most day-to-day tasks. If you need to do something more advanced, you should look at the manual I linked earlier.

For other interesting and useful text-based programs, this blog isn't a bad place to go.

Ideas for a project

2014-04-18T00:00:00Z

There is always a lot of buzz around the idea of "learning to program". While I think it's very important that children learn logical thinking and problem solving, I also realise that the majority of children, and people in general, would probably not benefit from a very language-specific, rote-learning-based, generally old-fashioned approach to teaching programming. Those that would enjoy programming largely have the aptitude to learn it themselves, with a bit of guidance. I think that the best way to learn is to do - write programs that serve some purpose or have some goal in mind - like a game or web app.

Obviously, my aim isn't to convert people who hate computers into hackers, but to offer some ideas and advice to the people who are trying to learn, but only find books and articles which walk them through syntax and have some exercises, but no larger projects to complete to draw together everything that they know.

Since writing simulations and AI programs is probably a bit beyond most beginners, and boring system utilities are generally too tedious for a beginner to feel motivated to write, I think the best direction to go in for a project would be a game.

But games aren't serious programming!

I've never heard anyone say the above, but someone might. If they did, they'd be completely and utterly wrong. If you're trying to learn about object oriented programming, a game is perfect! OOP is intuitive if you think of the objects as real things (a player, enemy, level etc). That might not be entirely in the spirit OOP was conceived in, but it works for teaching the basics of OOP: encapsulation, polymorphism etc.

The gist of it is that when you tell an object to do something and fetch an answer, you don't need to know how the answer is obtained, just the answer - the functionality is encapsulated. In turn, this means that you can change the internal structure of the object without repercussions, provided the interface remains the same. Here's an example, in Python.

class Enemy:
    def search(self):
        # Very long and complicated search algorithm
        return direction_moved

enemy = Enemy()

while True:
    direction = enemy.search()
    print("Enemy moved " + direction + " while searching for you")

In the while loop, the Enemy instance's search method is called and a result obtained. The person programming the while loop does not have to know anything about search algorithms or how to write one - the search method takes care of all of that. Even better, if the programmer who wrote the search algorithm changes his mind about it, whoever wrote the while loop doesn't have to change anything.

Games, while they don't have a reputation as serious business among laymen, can be very useful as vehicles for learning about maths, physics and, most importantly for me, computer science.

Where should I start?

That small example, while sort of neat to someone who's never seen a working program before, isn't even valid code, so you can't use it as a starting point.

Before you start a project, I'd advise you learn how to program and solve some simple exercises. A good website for this is codeabbey.com. Their exercises are quite good for getting to grips with the basics of a language, and get harder as you go.

After you've done that, you can start thinking about what sort of game or program you want to make. Since it's best not to over-complicate things, I suggest you write a text-based game. Since that is a bit broad, let me suggest an adventure game, like "adventure" or "battlestar", which you may have heard of. You probably haven't, so here is an example. That's a screenshot of Zork, a classic text-based adventure game. Essentially, you get a bit of description of your surroundings, then you can type in commands like "go north" or "eat leaflet", and the game responds accordingly. Not very hard to program, but very entertaining.

To make such a game, all you need is to be familiar with standard input and output. Everything else is optional, but it's very easy for such a game to be very deep and complex, both for the programmer and player, believe it or not. Here is a very basic example, in Python.

print("Welcome to Generic Text Adventure!")

running = True

while running:
    command = input("Enter command: ")
    if command == "quit":
        running = False
    elif command == "die":
        print("You died!")
        running = False
    else:
        print("Command not recognised!")

That's not a very fun game, but it's easy to see how it could be extended using your programming knowledge. Perhaps the command processor should be extracted into a separate function? Maybe you could put it into an object of its own and load the list of commands from a text file? The possibilities are endless, and the end result will always be something workable, because the game is so simple that you'll never get into problems with graphics drivers or network latency or anything you didn't create yourself.
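One way the command processor might be pulled out into its own function is sketched below. The names COMMANDS, handle_command and run are made up for this illustration, as is the extra "look" command, and a lookup table is only one possible design, so treat it as a starting point rather than the right answer.

```python
# A sketch of the same game with the command processor extracted.
# COMMANDS, handle_command and run are hypothetical names for this example.

# Each command maps to (message to print, whether the game keeps running).
COMMANDS = {
    "quit": (None, False),
    "die": ("You died!", False),
    "look": ("You are in a dark room.", True),  # made-up extra command
}


def handle_command(command):
    """Return (message, keep_running) for a single command."""
    return COMMANDS.get(command, ("Command not recognised!", True))


def run(read=input, write=print):
    """Main loop: read a command, respond, repeat until told to stop."""
    write("Welcome to Generic Text Adventure!")
    running = True
    while running:
        message, running = handle_command(read("Enter command: "))
        if message is not None:
            write(message)
```

Calling run() starts the game. Adding a command now means adding an entry to the table, and loading that table from a text file is a small step from here.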

If you're looking for a simple project and you haven't already made a game - make one! It can't hurt you, and it might expose some weaknesses in your knowledge or just make you feel better about yourself.

How this blog works

2014-04-13T00:00:00Z

UPDATE: I now use Hakyll. See http://kaashif.co.uk/about for more. Also, 100% of the information in this blog post is now wrong or outdated.

When I decided I wanted to write a blog, I had to come up with some way of writing posts (in a markup format which isn't HTML), and serving them somehow.

First, I turned to Jekyll, a static blog generator written in Ruby and used by GitHub. That turned out to work quite well, but I wanted more control over the process of post generation. Next, I went for a CGI application written in good old tried-and-tested Perl. That worked too, but CGI is a bit of a primitive technology, and didn't scale well when I stress-tested it. Next on my list was a Python WSGI app, which ran in a container which served pages. That was OK, and was the most stable incarnation of my website, until I realised how incredibly insecure it was compared to some of the other options I had available. Also, I was getting bored of a solution which worked too well to be any fun.

Obviously, I had the right idea in the first place, with generating static pages, which I could then transfer (using rsync or similar) to a chrooted Nginx server running in a virtual machine (it isn't any more or less secure than running it normally, but it helps me sleep at night), but I couldn't lose face and go back to Jekyll, I'd be humiliated (in my eyes only, though - most other people wouldn't care)! So I had to write a Jekyll clone in a language that was safe, not prone to bugs, had a rich package ecosystem (not unlike the Python package index or Ruby's gems) and was pleasing to write. Obviously, the only sane option was Haskell, and it's the language I chose to write my blog generator in. I called it Muon, because it seems like a cool name - catchy, 4 characters and easy to type.

Getting Muon

At some point, when I consider Muon to be ready for use by normal people, or at least, as normal as a Haskell package user can be, I will upload it to Hackage and you'll be able to install it with a simple:

$ cabal install muon

As it is, you have to get the latest snapshot from my darcs repository. You can still use cabal to install it, though, so dependencies are handled automatically.

$ darcs get http://repos.kaashif.co.uk/muon
$ cd muon
$ cabal install

It might take a while to build the dependencies from source, so if your OS's package manager already has prebuilt binary packages for them, look inside muon.cabal to see what you need to install, and install those.

Using Muon

If you want a quick rundown of what you can do with Muon, have a look at the README file in the darcs repo. If you're too lazy for that, just access Muon's help:

$ muon help

In reality, it doesn't matter what you type as the command - muon outputs a help message for all inputs that aren't already a command. So that's everything other than "generate", "init" and "upload".

It's worth noting that, at the moment, my server is hard coded into Muon's upload command, so you'll have to rsync, scp or FTP the files to your web server yourself, unless you happen to also have a web server called "webserver" serving from /var/www/htdocs, which is entirely possible.

Lessons learnt

Before writing Muon, I had the idea in my head that Haskell was useless for writing anything useful. That is clearly not the case. In fact, I'd say that my Haskell code is cleaner and easier to read (for me) than the code I write in other languages, and it is that way by necessity - Haskell's syntax demands it. Also, I find the "name arg = blah $ blah arg" easier to read than "def name(arg): return blah(blah(arg))", due to an allergy to parentheses.

There is always this talk of explaining monads using analogies and shit to make them simpler or easier to understand, but the reality is simple: monads are boxes you put types into. If you have an IO String, it's just a String inside a box labeled IO that you have to use <- and fmap to get at. Once that is realised, you can abandon all of this "monad is a burrito" crap and actually start using them. I'm sure that doesn't count as an analogy; if it does, I'm a complete hypocrite.

Anyway, Haskell isn't too hard to learn. Neither is Common Lisp. It gets really hairy when you move onto Template Haskell and macros in Lisp, but even then, once you get the hang of it, you'll be defining new syntax constructs without much effort.

Happy Haskelling!

How not to run a website

2014-04-12T00:00:00Z

After a few months of running a website, there are a few things I have realised about how I ran my server when it was first delivered, and how I run it now. The changes have been, for the most part, for the better. Needless to say, when I first started out, I was clueless, overeager and far too ambitious with my plans for "the next Facebook" or something equally stupid. Mistakes were made along the way, and I'll make sure I never make them again.

Using FTP

Using FTP at all is a mistake, but my blunder was twofold - I had an account with write access to my file store which was not backed up. Worse yet, the account in question had a six character password consisting of letters and numbers which could have been brute-forced within seconds by a decent CPU.

Of course, when I first set up my FTP server, I was thinking more about how cool it was to be able to transfer files using command line FTP clients than about security. It's not really that cool and one day, the weight of the security vulnerabilities I had exposed came crashing down on me. In every directory of files.kaashif.co.uk, there were "index.php" and "index.aspx" files containing links to some online Bible shop. It could have been much worse - I could have been hosting my site using FTP to some VPS somewhere and my site could have been turned into an advert for Bibles. Luckily, my server was only set up to serve static files, with no index pages and certainly no PHP or other scripting languages installed.

The fix was simple: I stopped using FTP and went back to using the chrooted Apache server that comes with OpenBSD 5.4. Of course, now that it's going to be removed from base, I should probably switch to Nginx, but that can wait - OpenBSD's Apache, while old and probably insecure, isn't anywhere near as bad as having an FTP server with a weak password.

I count myself lucky that I chrooted my account into the FTP user's home directory. If I hadn't, the entire machine I was running the server on would have been compromised and I'd have had to spend 10 minutes spinning up a new one. Terrifying to think about.

Backups

As I mentioned, I didn't back much of anything up. Mostly, my backups consisted of spreading files over every PC I owned in the hopes that none of them would fail. This worked for static, never-changing files like pictures and videos, but had no place as a strategy when dealing with ever-changing code and documents. The first crisis I faced involved a power cut. Now, as you may know, FreeBSD systems use the Unix File System (UFS2) by default. While it's OK, a far better alternative is ZFS. I was not using ZFS at the time of the power cut, and just happened to be doing something disk IO intensive at the time. This was a recipe for disaster, as this caused the file system on one of my USB disks to die.

Eventually, the power came back on and I had to pick up the pieces. After booting into single-user mode and fscking the disks, all but one disk survived with only a few jumbled inodes here and there. The last disk could not be recovered for some reason. I ran some data recovery tools and I did manage to, eventually, save most of the data, but I learnt three important lessons:

Keep 3 copies of everything
Keep those copies on different storage media
Have some off-site backups

I heard those somewhere (probably from /usr/games/fortune), and they seem reasonable.

Nowadays, I tend to back my storage array to:

Another storage array (this is more redundancy than backup)
Tarsnap, an online backup site
For the really static data, I have a pile of DVDs

It is impossible for me to erase some of those backups without physically destroying them, which can't be done by accident. I think that, if there is another power cut, I'll be fine.

Those are the big two things I've done. It's worth noting that, since I only deal in public key authentication, not transmitting passwords over SSL, my site and SSH access to it were not affected by the recent OpenSSL debacle. Perhaps there's an OpenSSH issue lurking in the shadows, revealing private keys to hackers. That's unlikely, though: black hats would have already exploited it if such a thing were possible.

In any case, my pile of DVDs is safe from electronic intrusion, unless you count lightning.

Creating a GNU/Linux distro

2014-04-03T00:00:00Z

UPDATE: The project died, it went nowhere.

On nixers.net, whose IRC channel I spend a bit of time in, there has been a bit of a stir as the community tried to decide on a project to commit to. The idea of creating a distro of some OS came up. A few people wanted a BSD-based distro, but it was decided that the Linux kernel was where the hardware support was, so that's the direction we went. After working out that we wanted a musl libc based distro with a focus on minimalism, someone drew our attention to morpheus, a GNU/Linux distro that seems awfully similar to what we were doing, and had been established for a very long time compared to our still-hypothetical project (started in September 2013). After that, it seems that the buzz around making our own distro has died down. Nevertheless, I'm still up for it, and played around with building a few live USB images of TinyCore GNU/Linux (basically just Linux plus a rather fully featured initrd) and FreeBSD (something more substantial). Although I have nothing concrete yet, I have a few ideas on what I want in a distro, and what my environment usually ends up like after I'm done customising the package manager, programs, and so on.

Package Manager

This was a subject of much contention, but I think the best path for a new distro run by a small community would not be to write our own package manager, but to use one that already exists and works. I suggested pkgsrc, the NetBSD package manager, which has been ported to GNU/Linux and has been proven to work well. Indeed, I have personal experience using pkgsrc on Slackware, and I'm confident it'd be perfect for our needs. It:

has over 12000 packages ready for use
comes with a binary package manager, pkgin
is a tried and tested system for packages
is bug-tested and security-audited by the NetBSD project

Writing our own package manager would require a monumental effort if we ever wanted to support anything more than a handful of programs, which is a large failing of morpheus' ports tree, and a reason it can never be anything more than a toy distro. Installing a package from source would be as simple as:

# cd /usr/ports/category/program
# make install clean

Similarly, installing from binary packages would require a simple:

# pkgin install program

This is assuming that we have built all packages, hosted them on a file server, and pointed pkgin there by default, which is definitely doable.

pkgsrc has ports for an X11 server, thousands of applications and scripts, and, most importantly, a way to keep the tree up to date. All of these features are things we'd have to poorly reproduce in custom, buggy, hacked-together scripts if we wanted to roll our own solution. Thankfully, this is not necessary.

Init System

A while back, there was a big kerfuffle about Debian switching to systemd, then Ubuntu doing the same. This came as a shock to many, but to others, it represents a necessary shift towards a more modern init system. While I wasn't shocked, I wouldn't use systemd either. My belief is that /sbin/init should do as little as possible before handing off init to less important userspace programs (e.g. /etc/rc) and maybe waiting to run a shutdown script. This is exactly what sinit does and it is incredibly simple. We may end up writing our own init for fun and profit, but it'll be very, very similar to sinit. The plan, I'd imagine, is to populate /etc/rc.d/ with shell scripts, similar in format to OpenBSD's or FreeBSD's, but with our own /etc/rc.subr functions.

Alternatively, the init scripts could take the form of sysvinit services, which are essentially self-contained shell scripts. The latter option is the easiest to implement, even if the first isn't that hard to implement. In any case, the distro will end up with:

Minimal /sbin/init binary
Simple service files requiring little parsing
/etc/rc.conf based service activation

Or at least something similar. Slackware's system of adding execute permissions to init scripts to activate them is also something worth considering. Also, the chmod +x /etc/rc.d/service could be aliased to initctl enable service, to make it look like a lot of work has been done. At this point, we have basically overtaken sysvinit in features, and are catching up with systemd rapidly.

File System

Back when a BSD kernel was still on the table, ZFS was also on the table. Unfortunately, even in FreeBSD, ZFS root is experimental, and it's even more so on Linux. Thus, ZFS is not an option, but btrfs may be. It's stable, has many of the features of ZFS, and is used in many high-profile distros, like Fedora. In the end, though, considering that ease-of-use needs to be a consideration, it would probably be easier to use ext4. That said, btrfs is a very attractive file system, but for now, ext4 is a lot easier.

Default Shell

The default shell, while not discussed at length yet, will no doubt be the subject of many an argument on the IRC channel. After a few discussions, I got the impression that others had the impression that zsh was bloated and slow, and that OpenBSD's ksh derivative, pdksh, was lighter. While this may be true in relative terms, I urged them to take into account that both shells used less than 1 MB of memory and both were responsive, even on an old 1995 ThinkPad running other programs in the background.

In fact, here is the line from top(1) for a zsh instance on my laptop right now:

PID  USERNAME THR PRI NICE   SIZE   RES STATE C TIME  WCPU COMMAND
1457 kaashif    1  52    0 37172K 1724K ttyin 0 0:00 0.00% zsh

So you can see, zsh uses 1724K of memory. Not exactly "bloated", is it?

Evidently, giving users a choice of modern shells like bash and zsh is far more important than saving the minuscule amount of memory that switching to a gimped shell like pdksh would give. Perhaps it'd be worth it on an embedded system, but that is most certainly not the target audience.

The Plan

Taking into account what I have written here, I think the distro is headed in a good direction, provided development ever gets off the ground. While there are some disagreements about specifics, the overall sentiment is one of agreement, considering the sorts of people that congregate on #nixers (nee #unixhub) and their desires: minimalism, pragmatism and utility.

I hope our hacking will be happy and productive.

Vim

2014-03-01T00:00:00Z

When I first started programming, I barely had any idea of what constituted a good text editor, or why I'd want to use some old, texty editor from the 90s which didn't even have most of the features I took for granted in the IDE I was using at the time. Maybe this had something to do with one of my first languages being Java, which is widely considered an IDE language, but I went through the same thing when I started learning C and Python. Why take all this time to learn Vim when I could just open Gedit or Kate and get to work immediately?

As always, with this type of "Vim changed my life" blog post, I could go on to describe my experience of slogging through hours of using Vim, getting used to it, then wondering how I ever coped using terrible editors which weren't even modal and didn't even have macros. I won't do that, since I'd just be regurgitating very generic stuff. Instead, I'll distill all of that down into a few points:

Modal editing
Text objects and motions
Plugins

Modal editing is actually very simple. At its most basic level, there are three modes (there are really more, but those aren't as important) - normal mode, insert mode and command line mode. Normal mode is where you should spend most of your time - it's where you move around the text, yank and put (copy and paste), execute macros and so on. Insert mode is where you type text and it is written to the file you have open. While in normal mode, typing "dw" would delete a word, doing the same in insert mode actually inserts a literal "dw" into the text. A mistake most beginners make is trying to spend all of their time in insert mode, because it's the most similar to what they already know (Notepad, Gedit, etc). Command line mode is where you type commands into a command line prompt, which is needed for more complex commands which can't be accessed through key combinations unless you remap keys manually.

Text objects are key combinations that represent structures found in text. For example, "i(" represents the text inside brackets. Motions are keys you press to move around text - there are the usual "hjkl" keys to move left, down, up and right, but also "w" and "gg", to move to the next word or the start of the file, and "t.", to move up to the next full stop. There are many, many more objects and motions.

Plugins are scripts which extend Vim's functionality. You are probably already familiar with plugins or extensions for web browsers, so there isn't really any need to explain what they are or do; suffice it to say that they add to or improve Vim's features.

Vim in action

Reading about text editing is all well and good, but it doesn't allow you to really get a full grasp of the power of Vim. Neither does looking at a static series of screenshots - you can't really see a comparison between your current editor and Vim unless you see it in action. So here are some GIFs of Vim in action, demonstrating the power I mentioned.

Underlining text

If you're using Markdown or reStructuredText or anything similar, it's a common task to "underline" a title by putting a few equals signs or dashes under it. Using Notepad (for instance), you'd probably press enter to go to the next line, mash "=" until you get the desired look, and call it a day there. In Vim, you can yank the line, put it on the next line, and replace that line's characters with "=". This is quick and simple, but does seem like a bit much to learn just to underline some text, especially when this sequence in Vim translates to a key combination of "yypv$r=o".

Repeating commands

In Vim, any sequence of actions you do can be recorded to a macro. It's very easy, you just press "q" followed by the key you want the macro to be assigned to (I always choose "w" since it's next to "q" on most keyboards), then do whatever it is you want to do, and press "q" again to stop the recording. Let's say I recorded the action of underlining a title into a macro. If I want to repeat the macro 3 times, I just type "3@w". If I want to repeat it 1000 times, I'd just type "1000@w". This comes in very useful for more complicated macros, like if you wanted to format a large list into a nicely formatted table.

Changing text within delimiters

This is another very common problem - haven't you ever wanted to change the text inside quotes, or inside brackets? The answer to that question is always yes, and Vim has a very easy way of doing it. Changing the text inside double quotes is simply "ci"", for "change inside " ". Notice how I didn't have to move to the first quote - Vim moves automatically to the first quote on the same line. You can do this with any other delimiter, like curly braces, single quotes, but also for blocks and sentences. When I say "block", I'm referring to a block of a program, which is a useful text object to have when your blocks aren't delimited by curly braces.

Using regular expressions

You might be familiar with regular expressions, so you might not be surprised to hear that Vim can use them extensively - not a surprise for a program that deals with text. If not, this is a quick demonstration of reversing the lines of a file. You might find this useful if you're viewing a log file and you want the lines to be from newest to oldest, but the point is that Vim has a native regex engine that can be accessed quickly in command mode by using ":g".
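To make the mechanics a bit less magical, here is a small Python simulation of a command like ":g/^/m0", which is the usual way to reverse a buffer with ":g" (I'm assuming that is the sort of command the demonstration used). The pattern "^" matches every line, and "m0" moves each matched line to the top, one after another. The function name is made up for this sketch.

```python
# Simulates what a command like :g/^/m0 would do to a buffer.
# reverse_like_global_move is a hypothetical name for this example.


def reverse_like_global_move(lines):
    """Visit each original line in order and move it to the top."""
    buffer = list(lines)
    for line_index in range(len(buffer)):
        # After the first line_index lines have been moved to the top,
        # the next original line still sits at position line_index.
        line = buffer.pop(line_index)
        buffer.insert(0, line)
    return buffer
```

Since every line ends up above all the lines that were moved before it, the last line of the file finishes at the top and the first line at the bottom.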

Git integration

Lots of programmers use Git, so it's natural that Vim, a programmer's tool, has a variety of plugins that claim to integrate with Git. Fugitive is one of the best, in my opinion. You can see how effortless it is to view the authors of every line in a file (although in this case, all of the authors are "kaashif"), add the file to the staging area using ":Gwrite" and commit it, using ":Gcommit". I don't see how it could get any easier, especially considering the tab completion.

This was by no means a Vim tutorial, just a demonstration of what you can do if you know Vim, and a pretty limited one at that. Learning the basics of Vim is quite easy: all you need to do (assuming Vim is installed) is run the "vimtutor" program in a terminal and it'll walk you through the basics. If you're on Windows, you can go to the Vim website and download their installer for gVim, which is Vim with a slightly graphical interface. After that, I think you can run "vimtutor" from cmd.exe, but I'm not sure.

I'm tempted to sign off by saying "happy hacking!", but that's far too cheesy. Happy editing!

My Desktop

2014-02-09T00:00:00Z

Over the years, I've used a few different OSes and desktop environments, and the one I use currently is portable to many operating systems, mostly due to the efforts of the writers of i3 over at i3wm.org, but also the standards-compliance of POSIX, meaning that my shell scripts (which you can find here) work on all operating systems worth using (naturally, this excludes Windows).

EDIT: That link doesn't work. This does.

You might already know how to manage dotfiles effectively with GNU Stow, because that's exactly how I manage mine - it allows me to install the config files I need, and no more. This isn't what I want to write about, though, I've already written something about that - I want to show off the final result of my dotfiles - my desktop.

This is what my blank desktop looks like. There isn't much to look at, and I haven't put any fancy conky widgets or anything there because I hardly ever just look at my wallpaper anyway. That's the Haskell logo, it seems as good a logo as any to have on a desktop I never see (other choices include Puffy, Tux and Beastie, all OS logos).

At the top is a bar, created by piping conky output into dzen2. The music player is mpd (music player daemon), which I can control using my phone, command line clients, GUI clients, scripts, and so on. Some might say running a service to play music is overkill, but it's actually more lightweight than most solutions, and is very Unixy, too.

If there is an application I want to launch that I haven't already bound to some keyboard chord (for example, Super-C for Firefox and Super-G for gVim), I use dmenu to get to it. I had to apply a plethora of patches to get dmenu to support colour changing, Xft fonts, and variable height/width. It turns out that you can get a pre-patched version of dmenu here. While it is called "dmenu2", it's only dmenu with a few patches, and it's not an official suckless.org tool anyway. All of the output on the bar comes from all the normal Unix status programs - df, uptime, date, and so on. I don't remember where I got the icons, but I saw Bill Indelicato using icons which looked similar, so maybe I got them there or we both got them from the same place.

Using the magic of mpc (media player client) and dmenu, I managed to get dmenu to create a randomly shuffled list of artists in my music collection. When I select one of them, the current playlist is replaced by the discography of the chosen artist. It's very convenient, only a Super-Shift-D away - far faster than opening a music player, going to a list of artists... I still do that, but less often, since most of the time I just want some randomly chosen music, not anything specific.
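The idea could be sketched roughly like this. It's written in Python rather than shell, and it's an illustration, not the script described above: the function names are made up, and it assumes mpc and dmenu are installed and on the PATH.

```python
import random
import subprocess


def shuffled(artists, rng=random):
    """Return the artists in random order, dropping blank entries."""
    names = [name for name in artists if name.strip()]
    rng.shuffle(names)
    return names


def mpc_commands_for(artist):
    """The mpc invocations that swap the playlist for one artist's tracks."""
    return [
        ["mpc", "clear"],
        ["mpc", "findadd", "artist", artist],
        ["mpc", "play"],
    ]


def main():
    # Ask mpd for every artist in the collection, one per line.
    listing = subprocess.run(
        ["mpc", "list", "artist"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    # dmenu reads the choices on stdin and prints the selected one.
    choice = subprocess.run(
        ["dmenu"], input="\n".join(shuffled(listing)),
        capture_output=True, text=True,
    ).stdout.strip()
    if choice:  # empty when the menu is dismissed
        for command in mpc_commands_for(choice):
            subprocess.run(command, check=True)
```

Binding a call to main() to a key chord in the window manager's config would give the same "pick an artist, replace the playlist" behaviour.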

I'd like to say this is what my laptop usually looks like, but the reality is that I've just opened some random files in /usr/include with gVim. Well, this might be what a kernel hacker's laptop looks like sometimes - mine is mostly just the one shell script or Haskell program open, and usually a browser with Reddit or 4chan on the side. The window manager is i3, by the way, a WM I like because of its simple and efficient manual tiling - none of that XMonad dynamic crap, way too inflexible and sometimes annoying.

I hope this has given you some insight into how I use my laptop, but realistically, you've probably just commented on the similarities between my desktop and someone on /g/'s. Fair enough, I haven't worked hard enough on my desktop for it to be very unique, and I don't intend to, because my setup works for me, and lets me work without using a mouse, so it's a win-win-win for me, my "hacker cred" (I hear the kids using Ruby and node.js say that nowadays) and my laziness.

Functors in Haskell

2014-02-05T00:00:00Z

Whenever you hear something about Haskell, chances are it sounds arcane and involves lots of complicated and intimidating mathematical language. Well, the truth is that all this talk of 'endofunctors' and 'monoids' is really unnecessary, if the concept of functors is explained using a simple analogy.

If you have a situation where a type contains other types, you are dealing with a functor. For example, when you have a list of integers, the list is the functor. If you have a herd of sheep, the herd is the functor. At its core, this is what a functor is - something which contains types and can be mapped over. That last part is crucial and is central to how functors are defined in Haskell, but is not central to how one should think about functors.

In practice

At a GHCi prompt, you can import a few modules here and there with the word 'functor' in them, but at this stage, you do not require any of the more complex functions, you only need one - fmap. The functor we'll be using as an example is the 'Maybe' functor, which contains either a single value ('Just' the value) or Nothing. If this is confusing to you, just think of the 'Just' constructor as representing a box you put types into, and the 'Nothing' constructor as representing an empty box. So 'Just 1' can be represented as 'put the value 1 into a box'. This isn't entirely contrived: the 'Maybe' functor does have a use in functions that can fail, so can return an empty box or a box with a result in it.

Let's examine the type of 'Just' using ':t'.

Prelude> :t Just
Just :: a -> Maybe a

If you are familiar with Haskell syntax, you should know that this means that 'Just' takes any type and turns it into a 'Maybe' functor containing that type. No surprises here, that is the definition of a functor.

Let's say we have a few functions that return numbers contained in a Maybe functor, and we want to combine them in some way. When mapping over a single functor, the function you use is fmap, like so:

Prelude> fmap (+2) (Just 3)
Just 5

This is fine for functions that only take 1 argument, but remember, we want to apply this to multiple arguments. If we define a function like so, which takes three (non-Maybe) numbers and multiplies them together, it becomes obvious that using fmap on its own won't work:

Prelude> let f x y z = x*y*z

If we just try to fmap our three-operand function over a Maybe functor, and examine its type, we see the following:

Prelude> :t (fmap (f) (Just 3))
(fmap (f) (Just 3)) :: Num a => Maybe (a -> a -> a)

This means that fmap has transformed our function which took things of typeclass Num into a partially applied function, which still takes things of typeclass Num, but the function itself is within the box of the Maybe functor. This means that we want to map a Maybe function over a Maybe Num. If the function takes yet another argument, doing this will return yet another function. If not, it'll give us the answer, wrapped in a functor. If we want to apply a function with multiple arguments to functors, we need to use Applicative functors.

Prelude> import Control.Applicative

And then, we can use the <*> operator, which takes a function in a functor and applies it to a value in a functor. We can use it in more trivial cases like so:

Prelude> Just (*4) <*> Just 3
Just 12

We can see that (*4) takes one value, so we are given the answer we want. Applying this to our situation gives us this:

Prelude> (fmap (f) (Just 3)) <*> Just 4 <*> Just 5
Just 60

We can improve the look of this ugly statement by using fmap as an infix function.

Prelude> f `fmap` Just 3 <*> Just 4 <*> Just 5

Because this is such a common use case, there is some syntactic sugar provided - fmap can be replaced with <$>.

Prelude> f <$> Just 3 <*> Just 4 <*> Just 5

At this point, we just have the function and its arguments as we usually would, just with some extra operators thrown in. The most important thing you should take away from this is that using Applicative is far better than using liftM and its numbered equivalents, as Applicative can be applied to an arbitrary number of arguments. Monads are useful in other cases, and Maybe is actually a monad, but thinking about it like a monad rarely helps.
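If the operators are getting in the way, a rough analogy in Python might help show what fmap and <*> are doing to the boxes. This is only a sketch: Just, NOTHING, fmap and ap are names invented for it (Python has no Maybe type), and f has to be written as nested lambdas because Python functions aren't curried the way Haskell's are.

```python
# A rough Python analogy for fmap and <*> on Maybe.
# Just, NOTHING, fmap and ap are hypothetical names for this sketch only.
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Just:
    value: Any


NOTHING = None  # stands in for Haskell's Nothing


def fmap(function, box):
    """Apply a plain function to the value in the box, if there is one."""
    return NOTHING if box is NOTHING else Just(function(box.value))


def ap(boxed_function, box):
    """Like <*>: apply a function in a box to a value in a box."""
    if boxed_function is NOTHING or box is NOTHING:
        return NOTHING
    return Just(boxed_function.value(box.value))


# Haskell functions are curried, so f is written as nested lambdas here.
f = lambda x: lambda y: lambda z: x * y * z

# The analogue of: f <$> Just 3 <*> Just 4 <*> Just 5
result = ap(ap(fmap(f, Just(3)), Just(4)), Just(5))
```

Here result is Just(60), matching the GHCi session, and replacing any of the three arguments with NOTHING makes the whole expression NOTHING, which is the point of using Maybe for computations that can fail.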

Updating DNS records

2014-01-22T00:00:00Z

When running a website on a residential connection, a problem one might run into is the dynamic IP address usually assigned by one's ISP. There are a few dynamic DNS services which basically let you have a subdomain (e.g. mydomain.example.com) and let you update it to point to your IP address whenever it changes. At one time, your IP might be 10.0.0.1, and your domain correctly resolves to 10.0.0.1, but after you reboot your modem or router, your IP may change to 10.0.1.2 and your domain will still point to your old IP (thus won't work), until your dynamic DNS client somehow updates the A record of your domain when your IP changes.

Assuming that getting a static IP is impossible, there is only one solution: dynamic DNS. While it is possible to transfer your records to someone other than your domain registrar to manage DNS records, I just keep all things related to my domain with Namecheap, which is where my domain was registered.

Choosing a client

Several websites, DynDNS and NoIP being examples, offer their own clients. You can install these, put your site-specific username and password into a config file, and let your DNS be handled automatically. Usually, these programs can do a lot more than update your DNS, and are needlessly complicated if that's all you want to do. The same goes for most multi-service dynamic DNS clients, for example, ddclient.

Most services provide a URL with which you can update your A records. Fetch this URL, and the A records on your domain will be changed to match the IP you fetched the URL with. It's a very simple system, and can be automated within minutes, with a simple script. Because I like to keep my servers bare, and avoid installing anything bulky (like Python, Ruby, and the like), I'm going to use Perl, which is in the base system of every good *BSD, and comes installed by default on most GNU/Linux distros.

Writing the client

There is some Perl boilerplate we have to get out of the way - the shebang and some "use" statements.

#!/usr/bin/env perl
use strict;
use warnings;

It's worth commenting on my use of "env". On FreeBSD (the OS my servers use), Perl is located at /usr/local/bin/perl, while on most GNU/Linux systems, it's at /usr/bin/perl. Using "env" avoids the issue of finding the Perl binary, and is more cross-platform.

Next, we have to find the URL, password, hosts, and domain we will use. Namecheap uses the word "host" to mean the subdomain, e.g. the "www" part of "www.fsf.org". For this example, let's just say we want "ftp", "www" and "@", "@" being no subdomain, e.g. "mydomain.com".

my $password = "blahblahblah";
my $domain = "mydomain.com";
my @hosts = ("ftp", "www", "@");
my $update_url = "http://dynamicdns.com/update?domain=$domain&password=$password&host=";

Notice we didn't attempt to put the hosts into the initial definition of the update URL. We will do that when we loop over the array and fetch the URL using cURL, thus updating the domain.

foreach(@hosts)
{
    my $final_url = "$update_url$_";
    my $output = `curl -s "$final_url"`;

So far, we have fetched the URL, so the IP of the host's A record should be updated at this point. We still need to check for errors and such, so let's do that next.

    if ($output =~ /<ErrCount>0/)
    {
        print("Update of $_.$domain succeeded.\n");
    }
    else
    {
        print("Update of $_.$domain failed!\n");
    }
}

It's possible to use XML::Simple and actually parse the XML output by the server handling DNS updates, but that would be overkill and waste everyone's time. So I just regex the output for errors, it works fine, and the problem is generally obvious if it fails, which is why I didn't print the output. It's perfectly possible to add some verbosity and do all of this "correct" stuff, but no-one, not even you, will ever care (unless you just do it for practice).

So save this script in /usr/local/bin/dnsup, or somewhere else appropriate and memorable.

Scheduling updates

Your IP address can change instantly at any moment, so you must balance the downtime users experience with the practical considerations of running a script repeatedly. Personally, I run the script every 10 minutes. My IP only seems to change once in a blue moon, and when it does (due to me rebooting the firewall or router), I tend to notice right away, get impatient, and run the script manually. That's beside the point, though, just add this to your /etc/crontab to make it run every 10 minutes.

*/10 * * * * nobody /usr/local/bin/dnsup

I run it as "nobody" to avoid any unnecessary root usage. What if you made a mistake in the script and overwrote / with garbage? A program having privileges is bad, don't do it.

At this point, you should be free of any possibly proprietary, bloated DNS clients and you should have an automatically updating domain.

What's a monoid?

2014-01-14T00:00:00Z

If you have spent time on programming or technology boards like /g/ or /r/programming, chances are you have heard the word "monad" thrown around a lot. You may have even heard the oft misquoted phrase "A monad is a monoid in the class of endofunctors", intended to be a joke, or to scare programmers away from scary functional languages like Haskell. The truth is that monads and monoids aren't even slightly the same sort of thing in Haskell, but they do relate somewhat. Monoids are, in actual fact, very, very simple.

Sets and operators

Let's take an example: suppose you have a set of things S and a single binary operator. This binary operator takes two things from your set and outputs a third thing, which is also in your set. There is no pair of operands for which the result is outside the set you started with - the set is closed under the operator. The set also contains an identity element - an element which does not change other elements when combined with them, on either side, using your binary operator. The operator must also be associative, i.e. the way in which a chain of operations is grouped does not change the result. These laws are expressed more clearly as follows: $$\circ : S \times S \to S$$ $$i \circ e = e \circ i = e \text{ for all } e \text{ in } S$$ $$a \circ (b \circ c) = (a \circ b) \circ c \text{ for all } a, b, c \text{ in } S$$

We could make this an even more relatable example by using the set of real numbers and the addition operation, forming a monoid and satisfying the laws as follows: $$+ : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$$ $$0 + e = e + 0 = e \text{ for all } e \text{ in } \mathbb{R}$$ $$a + (b + c) = (a + b) + c$$

This means that whenever you're adding real numbers (which we all do on a daily basis), you're utilising monoids. Notice that subtraction and division are not associative, so cannot form monoids. We aren't limited to sets of numbers either, you can form monoids from matrices, vectors and functions. Indeed, a set can contain any sort of structure, meaning monoids can be very general or very specific, depending on the sort of structures it contains. For example, defining a monoid of numbers and addition is useful in a small number of cases, while a monoid of functions and function composition is very useful in many cases. Monoids can be generalised, and the resulting generalisation will be very useful, generally.
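To see concretely why subtraction fails, it is enough to find one case where the grouping matters. The numbers here are an arbitrary choice, picked only for illustration:

$$(5 - 3) - 1 = 1 \quad \text{but} \quad 5 - (3 - 1) = 3$$

Function composition, by contrast, passes both tests. Assuming we take functions from some set to itself, with the identity function $\mathrm{id}(x) = x$ as the identity element:

$$f \circ \mathrm{id} = \mathrm{id} \circ f = f$$ $$f \circ (g \circ h) = (f \circ g) \circ h$$

Both sides of the last equation send $x$ to $f(g(h(x)))$, so the grouping makes no difference.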

Generalising monoids

While it is all well and good having a single set and a single binary operator, which maps members of that one set to members of the same set, it would be useful to have a structure defined as an arbitrary number of sets and an arbitrary number of functions. We could define such a structure as consisting of the following:

A set of sets
A set of functions mapping sets to other sets

In some cases, rather than having sets of sets within our structure, we may want to use magmas or semigroups, or something unrelated to groups entirely. For this reason, it is better to say that our new structure contains a set of objects, and morphisms between these objects. "Object" is a term which refers to any algebraic structure, and "morphism" refers to any possible mapping between these objects. So now, our structure consists of:

A set of objects
A set of morphisms mapping objects to other objects

This structure is not all that different to a monoid, and it becomes very obvious if we say a monoid merely consists of:

A singleton, containing a single set (e.g. all real numbers, all functions)
A single morphism (e.g. addition, function composition)

Since there is only one object, the morphism can only map the object to itself, thus is always an endomorphism. We can also think about the binary operation (e.g. addition) as a set of unary operations (e.g. add 1, add 2, add 3), and see the monoid as a set S, and a set of unary functions equal in order to S. We get the same result whether we think about a single endomorphism or multiple, so it doesn't really matter.

What we are slowly working towards, as our definitions get progressively more general, is a category. Categories are incredibly useful and can become so general that they are used to formalise all of mathematics. They actually have uses too, such as considering the objects as representing types in a programming language, and the morphisms as representing functions.

Writing Unix manual pages

2014-01-08T00:00:00Z

There are a few very important things that everyone involved in software (particularly free software) can do to help out. The most important is to file detailed and helpful bug reports, so the developers working on your favourite program can get the problem fixed. Since it is not very hard to write a bug report, and projects generally have their own bug report guidelines, I won't talk about that. The second most important thing you can do is write guides, manuals and the like, in order to get more people using the software without headaches. Readily-available documentation not only saves the users time, but helps the developers to spread their software, since it might develop a reputation for being easy to learn to use, or any number of other things.

I'm a bit biased, since I'm a Unix user, but I think that a good place to start is to write manual pages. There have been a few times when I was trying to do something with a program, I had trouble, typed man program, saw that there was no manual page, and installed another program, one that did not force me to rely on guides written on blogs and such, which may not be accurate or up to date. I usually write one in Pod, compile it into groff, and submit it to the devs upstream. It's not hard, in fact, it's very easy.

What does a manual page look like?

Here is the manual page for write(1), a pretty vanilla Unix command.

WRITE(1)                  BSD General Commands Manual                WRITE(1)

NAME
     write -- send a message to another user

SYNOPSIS
     write user [tty]

DESCRIPTION
     The write utility allows you to communicate with other users, by copying
     lines from your terminal to theirs.

     When you run the write command, the user you are writing to gets a mes-
     sage of the form:

     ...

Most man pages follow the convention of "NAME", "SYNOPSIS", "DESCRIPTION", "SEE ALSO", and optionally "AUTHOR" and "BUGS".

What does this look like in Pod?

Pod is very simple and was designed to be used for Perl. Despite that, it is very readable and it should be possible to get the gist of what this Pod markup means just by looking at it:

=head1 NAME

program -- does something

=head1 SYNOPSIS

program [-o I<optional argument>] positional argument

=head1 DESCRIPTION

Describes what program does in more detail.

-o, --optional
    describes what option does

=head1 SEE ALSO

ls(1), write(1)

=head1 AUTHOR

Kaashif

=head1 BUGS

Report bugs to rms@gnu.org

As you can see, "=head1" denotes a top level header. This is all we really need for writing a simple manual page. To convert it to groff, you can use the following command:

$ pod2man my_first_page.pod > my_first_page.man

pod2man prints to standard output by default, so the redirect writes the groff markup to a file called "my_first_page.man". It is my opinion that Pod is much better than groff if we're just writing a manual page, and this can be verified if you look at the groff output - it's not very readable unless you know your groff.

You might not always want to create manual pages: you can also use pod2html and pod2latex to create web pages and LaTeX source, which can be put into another, longer document. In fact, I could have written this post in Pod (but I didn't). It's very powerful, considering its simplicity.

Reviving an old ThinkPad

2013-12-23T00:00:00Z

While I did have some old hardware lying around, I had never committed to actually getting that hardware usable. By that, I mean I had never tried to browse the web, read emails and that sort of day-to-day stuff on anything older than a few years. To see if it were really possible, I decided to buy an old ThinkPad (a 760EL from 1995) and see if I could get it working. Before I started, I remembered that people like K.Mandla had already done things like this before, but he had such luxuries as 32 MB of RAM, a CD drive, a USB port. Really, he was pushing the boundaries of what could be considered low-end (right?). Here is what my ThinkPad 760EL had when I started off (I have upgraded it a bit since then):

Pentium I 133 MHz
16 MB RAM
2.1 GB hard drive
Trident TGUI 9660 graphics
3.5 inch floppy drive
800x600 TFT display

The lack of network support and any removable mass storage meant installing anything was impossible, so I bought a 3Com Etherlink III PCMCIA Ethernet card, which is well supported by both NetBSD and OpenBSD, the two candidates for the OS that would eventually be on my laptop.

Installing OpenBSD

The minimum memory requirements for OpenBSD 5.4, as listed in INSTALL.i386, are either 24 or 32 MB of RAM. NetBSD 6.1.2 requires 20 MB. I had only 16. After some consulting with misc@, I was advised to either get more memory or install an older release. Since the latter would involve running 5+ year old unsupported software (a terrible idea in all cases), I decided to open up my ThinkPad and take a look at the RAM modules. After reseating the only module to make sure I was familiar with the process, I rebooted to find that...I had 32 MB of RAM?! Apparently, 16 MB was soldered on the mainboard and another 16 was removable.

That was very fortunate, because that meant I could install OpenBSD 5.4! I could have also gone the NetBSD route, but it required 5 floppies, while OpenBSD only required one (floppies are actually quite expensive nowadays). The installation went well - my NIC was automatically detected and configured and the sets downloaded and extracted without a hitch.

Configuring the text console

OpenBSD doesn't have much framebuffer support, and there isn't much interest in writing drivers for images (and other complex 2D graphics) in the framebuffer either. This means we're limited to text unless we set up X, which performs abysmally on this hardware. The first order of business is getting more characters on the screen. There are instructions to do that here. It boils down to doing three things:

Loading a half-height font
Deleting all the screens configured to use the full-height font
Creating new screens which use the new font

Here is what the relevant part of my /etc/rc.local looks like:

wsconscfg -dF 1
wsconscfg -dF 2
wsconscfg -dF 3
wsconscfg -t 80x50 1
wsconscfg -t 80x50 2
wsconscfg -t 80x50 3

So I just did that for screens 1, 2, and 3, so I can now see twice the amount of text.

Text Applications

The OpenBSD project provides an extensive collection of packages, including web browsers and text editors which can function in the text console. Here are a few applications I make use of:

tmux (terminal multiplexer)
vim (the best text editor)
elinks (text web browser)
mutt (email client)
ssh (remote shell access)

Using those, I can access the web, check my mail, and write these posts. There is not much else I need to do on a day-to-day basis.

Getting X to work

Unsurprisingly, I could not find much documentation on getting an X server to work on a Trident video card from 18 years ago. This meant I had to fiddle around with the files in xorg.conf.d a bit. X -configure segfaults every time I run it, meaning it's not much help. That doesn't matter, though, because the configuration file it generates only does the obvious - it changes the driver to "trident", sets up the screen with the correct resolution, these are all things I already knew. To get my card to work, I had to add several options to the "Device" section. Here is what that part of my config looked like:

Section "Device"
    Identifier "gfxcard"
    Driver "trident"
    Option "NoAccel" "True"
    Option "ShadowFB" "Enable"
    Option "NoPciBurst" "Enable"
    Option "FramebufferWC"
EndSection

To clarify, this did not go into my xorg.conf, I created a new file in /usr/X11R6/share/X11/xorg.conf.d called 99-trident.conf. The "99" ensured that this file would be sourced last, and would override any other device settings. Documentation on what these options do can be found here. If the resolution is incorrect (it wasn't in my case), you may have a timing issue, which can be fixed by adding one of these options:

Option "UseTiming1024" # For 1024x768
Option "UseTiming800"  # For 800x600

With these settings, one finds that the performance of X is very, very bad on this hardware. After spending hours getting it to work, I decided to stick to the text console, where performance was far greater. I felt it was worth the tradeoffs (fonts, resolution, colours).

The final result

It's worth mentioning that the screenfetch script took about 30 seconds to display anything, and that elinks is completely unusable unless I let it take up the whole screen. These screenshots are really just for show. I can't show you what my screen looks like when I'm doing real work, because OpenBSD has no facilities in place to take framebuffer shots (like fbgrab).

Using GNU Stow

2013-12-01T00:00:00Z

stow is a cool little Perl script which basically just creates and deletes symlinks. That sounds pointless, but let me explain with an example. Let's say you want to install a program using the usual make install, which probably installs into /usr/local, which means it's separated from the rest of your system, which is managed with a "real" package manager. Unless you're using a system like BSD ports or Gentoo portage, you cannot have packages which are both managed by the package manager and customised according to your needs. In fact, if you need to do any in-depth modification of the source, I'd say you have no other option than using make, meaning you have a whole mess of files under /usr/local. This wouldn't be a problem with package managers, since this "mess" of files can be removed by uninstalling the package, which usually has a list of files it installed, ensuring all stray man pages and example configs are purged from your filesystem. If you used make install, and there is no make deinstall or similar, then you're out of luck. The only way to remove the programs you just installed is to wade through the mess that is /usr/local and remove the files manually. That's where stow comes in.

The idea of stow is that you compile your programs normally using make, but install them into a different directory, usually something like /usr/local/stow/$PKGNAME. This means there is an entire directory tree in that directory containing all of the files that would have been installed by that Makefile. You then enter the /usr/local/stow directory and use the command stow $PKGNAME, which creates appropriate symlinks in the directory /usr/local to make it seem as if you just installed the package with make install into /usr/local. At first, this seems completely useless, but it means you have a guaranteed way of deleting that package - tell stow to delete all the symlinks. You do this by going into /usr/local/stow and typing stow -D $PKGNAME, which removes all of the symlinks in /usr/local. Here is a series of commands someone using stow might execute.

$ sudo mkdir -p /usr/local/stow/my_program
$ cd /tmp/my_program
$ make
$ sudo make install prefix=/usr/local/stow/my_program
$ cd /usr/local/stow
$ sudo stow my_program

This makes it very simple to upgrade packages or keep multiple concurrent versions lying around.

Using Stow for dotfile management

Admittedly, stow is not really that useful on a bog standard GNU/Linux desktop or laptop, since most packages would be compiled with the usual combinations of compile-time features enabled and available in package repos. The behaviour of stow can be exploited to provide some other uses, however, one of which involves dotfiles.

Let's say a user named bob wants to check his dotfiles into his favourite version control system and deploy them easily and quickly on other computers. Right now he has a home directory which looks like this:

bob
|-- .bash_profile
|-- .bashrc
|-- .config
|-- .mutt
|   |-- config_1
|   |-- config_2
|   `-- config_3
|-- .vim
|   |-- bundle
|   `-- ftplugin
`-- .vimrc

Since stow works by recreating a tree of symlinks in the directory above where it is invoked, bob can make a directory called "~/dotfiles" or similar, put all of his dotfiles in there, and stow will create symlinks for him in the directory above "~/dotfiles", his home. His ~/dotfiles should look something like this:

bob
`-- dotfiles
    |-- bash
    |   |-- .bash_profile
    |   `-- .bashrc
    |-- config
    |   `-- .config
    |-- mutt
    |   `-- .mutt
    |       |-- config_1
    |       |-- config_2
    |       `-- config_3
    `-- vim
        |-- .vim
        |   |-- bundle
        |   `-- ftplugin
        `-- .vimrc

This means that he can deploy his dotfiles by going into the dotfiles directory and executing stow bash config mutt vim, which creates symlinks in ~/, duplicating the hierarchy he had in place before. Advantages include being able to selectively deploy configs only for the programs you need on that PC, and easy integration with Git, Mercurial and the like, as the dotfiles directory can be checked into VCS. Stow has certainly saved me a few headaches, and I recommend it for anyone who needs to use multiple machines.

Emacs

2013-11-30T00:00:00Z

For a few years now, I have been using Vim to edit config files, program in C, Python, even Lisp (people apparently think that Vim isn't the best for programming in Lisp). This isn't because I took a side in the so-called "editor wars", it's just because it came preinstalled on the first GNU/Linux system I used, Debian. Over time, I abandoned IDEs and moved to a full Vim setup, which works well for programming in C and Python, which is what I do most of the time. I like to think my text-editing environment is actually quite good, and that Vim's modal editing model has aided my programming. But there was always the elephant in the room - Emacs. I had always heard about it from the usual "Why I use Emacs" and "Vimscript considered harmful" blog posts but I had never used it, mostly because I was too entrenched in Vim's editing model. Then I heard about evil-mode and viper-mode, the Emacs vi emulation modes, and I wondered whether Emacs' sheer flexibility was a good reason to use it. It seemed that this was the reason most people used it (not specifically for vi emulation, but for the many, equally exciting modes), so maybe I should try it out.

Modal editing

For the uninitiated, here is a quick rundown of Vim. There are 2 modes you need to know about: Insert and Normal. Normal mode is the mode you are supposed to spend most of your time in: you can execute motions, editing commands, macros, searches, everything you need to edit text apart from actually inserting (typing) any text. Insert mode is entered by typing "i" and exited by typing ESC. Between these actions, you can type text and it will be inserted into the file you're editing. Most newbies attempt to spend most of their time in Insert mode, but this eschews the real power of Vim, the easy composition of motions and commands that allow efficient editing. For example, to delete the next 10 words in Normal mode, you type "10dw". In Insert mode, you must tediously hold down the arrow key and backspace. It gets better, too, you can delete everything in quotes with "di'" for "delete inside '" and so on.

Emacs' model

Emacs has a completely different model, when you type anything, it appears in the buffer, just as if you were in Gedit, Kate, or any other "normal" editor. Emacs is just as powerful as Vim, however, it just relies on a system of key chords to execute commands and motions. For example "C-k" deletes to the end of the line (C- means hold down Control and press a key, M- means the same thing, but with Alt), and "C-s /" moves to the next "/". Some might say that these key chords have a habit of twisting your hands into unnatural positions, but the advantage of having a built-in Lisp interpreter is that you can script anything you want, however complex, resulting in the included viper-mode and easily installable evil-mode, which "fix" Emacs for Vim users.

My setup

I haven't opted to go for any vi emulation mode in my use of Emacs, I'm using Emacs as Stallman intended. Surprisingly, my hands don't hurt as much as I thought they would, and I certainly don't have RSI or any wrist injuries. The first thing I noticed when I opened Emacs was...well, just look at this:

It's not very pretty, not in the slightest. The first order of business was to change the colour theme. It was not immediately apparent how to do this using the ".emacs" init file, so I resorted to using the GUI, which was quite a bit easier. While there, I also changed my font and hid some of the GUI elements, so it almost looked like a terminal, but with more colours and variable font sizes. Surprisingly, saving the settings did not save them in some unreadable binary format or weird markup language, it just appended some elisp (Emacs Lisp) to my ".emacs" file, which was nice. Here is the result of that:

(custom-set-variables
'(blink-cursor-mode nil)
'(custom-enabled-themes (quote (wombat)))
'(inhibit-startup-screen t)
'(tool-bar-mode nil)
'(tooltip-mode nil))
(custom-set-faces
'(default ((t (:family "Terminus" :foundry "xos4" :slant normal :weight normal :height 90 :width normal)))))

Now I had to get line numbers, which was as simple as appending (global-linum-mode 1) to my ".emacs". To get a space between the numbers and the buffer text (which isn't the default, oddly), I had to append (setq linum-format "%d "). At this point, Emacs was starting to look good. Its real test would be the ease of use of its package system. At the end of all of this customisation, my Emacs looks like this:

package.el

Emacs comes with a package manager called, imaginatively, package.el. It doesn't come with a very expansive list of repos, so I added a few, using the example on the EmacsWiki to guide me. Here is what I had to add:

(setq package-archives '(("gnu" . "http://elpa.gnu.org/packages/")
 ("marmalade" . "http://marmalade-repo.org/packages/")
 ("melpa" . "http://melpa.milkbox.net/packages/")))

Now I had 3 repos to install my favourite packages from. I needed something to auto-pair parentheses and the like, so I installed autopair, which was as simple as "M-x package-install autopair RET". You can also browse a long list of all packages using "M-x list-packages", and search it with the usual C-s and C-r. This is better, in some respects, than my method of managing Vim bundles. Vundle requires you to edit ".vimrc" and add something to a list of bundles, then run ":BundleInstall" in Vim. This is a longer process than Emacs' centralised package repo system, which is more convenient. On the other hand, Vundle lets you install bundles from any Git repo, meaning it's a lot easier to install random bundles you find on the internet. I haven't noticed anything regarding plugin/bundle/package quality, all of the extensions/add-ons/scripts I use are quite good, probably because they're all free software and anyone can contribute. That's one thing all people on all sides of the Editor War can agree on.

Conclusion

I can't really say anything about Emacs until I use it more, but I do like the windows more than Vim's odd buffer system. That's all for now. Maybe I'll have more of an opinion when I get to writing large projects.

Introduction to C

2013-11-27T00:00:00Z

This tutorial is designed for those who have programmed before, perhaps in a higher level language like Python or Ruby. It's not too hard to understand for those who are completely inexperienced, but some knowledge of functions, data structures and pointers might help. Most of the low-level stuff will be new to high level programmers, however.

Setting up the compiler

On GNU/Linux, most BSDs and quite a few other operating systems, gcc, the C compiler of the GNU Compiler Collection, is included. If not, it'll be accessible through your package manager. For example, on FreeBSD, where clang is the default, you'd install gcc with a simple pkg install gcc. OpenBSD comes with gcc by default, and so do the vast majority of GNU/Linux distros. If not, you know what to do. The process of actually compiling source will be elaborated on later. For now, here is a command you can use to compile programs:

$ gcc -o hello hello.c

This takes the source code in the text file hello.c and compiles it into the executable hello. The use of libraries will be covered later, or maybe not at all. Look it up on StackOverflow or something.

Hello, World!

The "Hello, world!" program is the de facto standard for introducing a programming language. Here is a basic implementation:

// This is a comment
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}

The #include line pulls in the standard header containing IO functions. In C, when you include a header, the C preprocessor just inserts the text contained in that header wherever you place the #include. There really isn't anything more to it than that, no complex module importing hoops to jump through, like in Python or Java.

The main function is self-explanatory, but requires a little exposition. Main functions in C always return an int. This is the return code of the program, which is used in scripts and in the shell to determine whether the program succeeded. Generally, zero means a success while anything else means a failure. You could also use some macros (basically constants) defined in stdlib.h to denote failure and success exit codes. If we did that, we could rewrite the source as follows:

/* You can also do comments like this */
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return EXIT_SUCCESS;
}

The advantages of using EXIT_SUCCESS and EXIT_FAILURE generally include things like portability across platforms where 1 could mean failure on one, while -1 could mean the same thing on another. These constants are still ints, so you don't have to change anything else to get the main function working. I'd advise you use these constants, since you'll have included stdlib.h most of the time anyway. I'll use them for the rest of this tutorial.

The printf function isn't very complicated. Its function is obvious, but it does have a few quirks you might need to know about.

Printing

Let's say you had an int and you wanted to print it out, along with some words. In Python this would be simple:

print("The value of my integer is %d" % my_int)

That syntax actually comes from C, printf has a very similar syntax:

printf("The value of my integer is %d", my_int);

The format string and its arguments are all arguments to printf, and you can have as many of them as you like. For example:

printf("My car cost %d pounds and weighs %f kg", my_int, my_float);

Simple so far, right? You may already be familiar with the idea of output streams if you use the command line. Everything printf prints, by default, goes to standard output or stdout. This is the output that is piped to other programs and written to text files when you redirect the output using ">" or "|". Sometimes you want to output debug or error information which is supposed to only be read by a human. For this you use stderr, or the standard error output stream. Since printf only prints to stdout, we must use another function, fprintf. It is used as follows:

fprintf(stderr, "This is an error");

That is essentially all the printing you'll ever need to know.
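To tie the two streams together with the exit codes from earlier, here is a minimal sketch. The function name print_age and its messages are made up for this example, not something from a library: normal output goes to stdout, the complaint goes to stderr, and the return value is one of the stdlib.h constants.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints a valid age to stdout, or an error to stderr.
   Returns EXIT_SUCCESS or EXIT_FAILURE so main can pass the result straight on. */
int print_age(int age)
{
    if (age < 0) {
        /* Diagnostics go to stderr so they never end up in piped output */
        fprintf(stderr, "error: age cannot be negative (got %d)\n", age);
        return EXIT_FAILURE;
    }
    printf("Age is %d\n", age);
    return EXIT_SUCCESS;
}
```

A main function could simply do return print_age(30);. If you redirect the program's output with ">", only the "Age is" line lands in the file, while the error message still shows up on the terminal.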

Functions

C is a statically typed language: every variable, parameter and return value has a type that is fixed at compile time, and the compiler will not choose types for you. It does perform some implicit conversions (between the arithmetic types, for example), but it will reject obviously mismatched types, such as a struct passed where a pointer is expected. This is relevant because function definitions start with a return type and a list of parameters, all specifying their type. If we look at the following function, we can see what I mean.

void say_hello(char* name)
{
    printf("Hello, %s!", name);
}

You should know what void means. If you tried to pass a float or a struct of some kind to this function, it would not compile. I assume returning other types doesn't require any explanation or patronising "exercises".

Arrays, Strings and Pointers

You might have noticed that the syntax for passing a string to a function is a bit odd. Well, that's because all strings in C are really just arrays of characters, represented by the char data type. This will become clear in the following code snippet:

char hello[10] = "hello";

That compiles and behaves as expected. You can pass the variable hello to any function that expects a C-style string and it will work. But isn't hello an array, not a char*, whatever that means? It is, but in C, an array used in most expressions (including as a function argument) is automatically converted to a pointer to its first element, so the function receives a char*.
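Here is a small sketch of what a function sees when hello is passed to it. The name count_chars is invented for this example (the standard library's strlen does the same job): the function gets a pointer to the first character and walks forward until it hits the terminating '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters in a C string,
   not including the terminating '\0'. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    /* s points at the first char; s[n] is the char n places further on */
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

With the declaration above, count_chars(hello) gives 5, even though sizeof hello is 10. The function only ever sees a pointer to the first character, and has no idea how big the array behind it is.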

The thing which confuses most new C programmers is the concept of pointers. It all becomes very simple if you just ditch the analogies and realise that a pointer is a variable which stores the address of the data being pointed to. So the pointer points to the data in RAM, but is not actually the data itself, merely an address. If you know that, the act of "dereferencing" a pointer is also very easy, it's just accessing the data being pointed to. Here is some code to explain:

int my_int = 4;
// Creates an integer with the value of 4

int* to_int = &my_int; 
// Creates a pointer to an integer, which is then set to the 
// address of the integer

printf("%d", *to_int);
// Dereferences the pointer and prints the value

There are some new operators in there: the & operator, which takes a variable of any type and returns the memory address at which the variable is stored. The * operator takes a pointer and returns the data stored there. It essentially reverses the & operation.
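One place where both operators earn their keep is a function that needs to change its caller's variables. This is only a sketch, and swap_ints is a name chosen for illustration.

```c
/* Hypothetical helper: swaps two ints through pointers. The function
   receives addresses, and dereferencing them reaches the caller's
   variables rather than copies of them. */
void swap_ints(int *a, int *b) {
    int tmp = *a;   /* read the int that a points to */
    *a = *b;        /* overwrite it with the int that b points to */
    *b = tmp;
}
```

The caller passes addresses with &, as in swap_ints(&x, &y), after which x and y have exchanged values.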

Memory allocation

We have established that pointers hold the address of a block of memory, which usually has some data in it. Letting the compiler allocate the appropriate amount of memory is fine in some cases, but what if we need to allocate an amount of memory which we only know at runtime? An obvious use case is when copying large files - we cannot simply allocate 4 GB at compile time and hope the file fits, we must get the file size from the filesystem and allocate that much memory. This is done using dynamic memory management.

The two most important functions for this are malloc and free. malloc takes the number of bytes to allocate (as a size_t, an unsigned integer type) and returns a pointer to the allocated block of memory, or NULL if the allocation fails. There is a problem here - you do not always know the number of bytes taken up by a data type, it could vary from platform to platform. You can find out using the sizeof operator, which takes a data type and gives you a size_t telling you how many bytes that data type takes up per instance. Let's say you wanted to allocate memory for an array of 24 ints. You'd combine sizeof and malloc to produce the following:

int* my_array = malloc(24*sizeof(int));

And now the pointer my_array points to an allocated block of memory which can fit 24 ints. But how do you access the ints? Well, the variable's name should give you a clue - you can index it just like an array, so my_array[0] is the first int and my_array[23] is the last. From the point of view of the code using it, that is all an array is: a pointer to a block of memory.

After you're done with the array of ints, you may not want the memory to stay allocated. If you allocate memory throughout your program but never deallocate it, this causes the memory usage of your program to grow and is known as a memory leak. These can be avoided by deallocating (or freeing) memory using the free function, which takes a pointer and frees the memory it points to.

free(my_array);

In your programs, every call to malloc should be accompanied by a call to free at some point, to eliminate memory leaks. This is very important when you're dealing with large files or databases, where leaking memory could cause gigabytes of damage, causing the OS to swap pages to disk and slow down or even crash.
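Putting malloc, sizeof and free together, here is a minimal sketch of the allocate, use, free cycle for a size known only at runtime. The function sum_of_squares and its -1 error convention are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical example: allocates an array of n ints at runtime, fills it
   with the squares 0, 1, 4, ..., sums them, and frees the array again.
   Returns -1 if the allocation fails. */
long sum_of_squares(size_t n) {
    if (n == 0)
        return 0;   /* nothing to allocate */

    int *squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        return -1;  /* malloc could not provide the memory */

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        squares[i] = (int)(i * i);
        total += squares[i];
    }

    free(squares);  /* the matching free for the malloc above */
    return total;
}
```

Checking the result of malloc against NULL before using it matters as much as the free at the end: a failed allocation used as if it had succeeded is undefined behaviour.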

Email is the Future

2013-11-12T00:00:00Z

Often, people look at me oddly when I suggest that they email me something. "Why can't I just send it to you on Facebook or Skype?" they say. Well, it doesn't have to be those two media of communication, but it's usually something like that. When I say often, I also mean that this has only happened on two occasions, so bear that in mind as I make things up about the types of people who ask me this.

I'd like to think that the people who advocate use of Facebook or Skype are the sorts of people who move from social media site to site, forgetting about their old posts from years ago on Livejournal or MySpace, saying they don't matter. Maybe the content of the messages didn't matter, but if they did, wouldn't you want them preserved somewhere permanent, somewhere you could always access them, regardless of how successful the site is this quarter and whether it has to shut down? That's not the best argument for using email, since it means you'd have to save and back up your emails yourself - this is all irrelevant, since my point is not that you should use email because it lets you archive all of your creepy private messages. No, my point relates to the control the social media websites have over you.

Isn't this just another rant about free software?

No, not exactly. You could have a proprietary email client and it would still be better than using a proprietary social media SaaSS (Service as a Software Substitute). Those social media websites, while they may restrict your freedom for other reasons, primarily rob you of control over your computing. When you upload a picture to Facebook, then edit it, should you not have done that using a program on your own PC, instead of a less functional, half-baked "web app" version of the program? You could argue that it's more convenient, but this is only true because you are already entrenched in Facebook's lock-in scheme. If you did all of your editing locally, you would not be able to make the same complaint.

Doing your editing (whether it be of text, images, or music) on your own PC, using a program you have some control of, offers you far more choice and control over your experience than using a webpage to do it. If said website (this does not only apply to Facebook) changes their UI, you are powerless to protest, let alone change it back. There are two ways to avoid this situation: only use websites licensed under free copyleft licenses (such as the Affero GPL) or do everything locally, where you have the most control over what software is used to do the editing. In an ideal world, you'd use a free software program to edit your content, removing the issue of someone having control over you entirely.

After you edit your content, uploading it to a SaaSS social network would nullify any advantages gained by editing locally. Ideally, you'd want to do as much as possible locally - editing content, addressing it, but not sending it, as doing that locally is impossible, and the only thing which should be done over a network. A protocol exists to send messages over the internet: it's called SMTP.

Email is limited and inconvenient

I'd argue that logging into a website or using an app which I cannot examine or audit is a far greater inconvenience than being "forced" to resort to mailing lists and email clients. Of course, the majority of people do not think this, so I'll have to convince them with arguments which appeal to laziness or convenience. As Larry Wall said, "The three principal virtues of a programmer are Laziness, Impatience, and Hubris". Hubris doesn't really come into this issue, but the other two definitely do.

I'll use an example to illustrate. What if you want to put all messages starting with the word "Important", containing the word "deadline" and with a date in the format "YYYY-MM-DD" into a separate folder? Maybe that's a bit specific, but I'm sure that filtering for something that specific is impossible without access to scripting tools. Lo and behold, such tools exist on your machine, but not on the SaaSS website! Using utilities like procmail, or even just a Perl script, you could easily walk a Maildir and put all messages matching the above conditions into a separate folder. Usually, it will just be things like moveto("programming") if ($subject =~ /^GitHub/), but it is important to have tools which work in all cases, not just the cases the programmers at Proprietary Inc. see fit to program in. Any arguments relating to the odd SaaSS tool which provides greater functionality than a local program are flawed because of the inherent limitations and danger of putting a group in charge of communication for another large group.
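The post suggests procmail or Perl for this. Purely to show that the matching logic itself is small, here is a sketch of the three conditions in C. Both function names are hypothetical, the message is assumed to be a single string, and the date check only looks at the shape of the text, not whether the date exists.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if s contains ten characters shaped like
   YYYY-MM-DD (digits and dashes in the right places). */
static bool has_iso_date(const char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i + 10 <= len; i++) {
        bool ok = true;
        for (size_t j = 0; j < 10 && ok; j++) {
            unsigned char c = (unsigned char)s[i + j];
            ok = (j == 4 || j == 7) ? (c == '-') : (isdigit(c) != 0);
        }
        if (ok)
            return true;
    }
    return false;
}

/* Hypothetical filter: starts with "Important", mentions "deadline",
   and contains a YYYY-MM-DD date somewhere. */
bool is_important_deadline(const char *msg) {
    return strncmp(msg, "Important", 9) == 0
        && strstr(msg, "deadline") != NULL
        && has_iso_date(msg);
}
```

A real filter would still need to walk the Maildir and move the matching files, which is where a scripting language is usually less work than C.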

My social network is free, is there still a problem?

People in charge of projects like Diaspora and GNU Social might tell you there isn't, but I think there is. Who actually uses Diaspora? Who has even heard of GNU Social? Maybe I should kickstart them and get all my friends to use them, but what's the point when I have another method of communication which is almost as ubiquitous as internet access - email. Everyone has an email address and everyone will for the foreseeable future. The first email standard was published in 1973 and is still followed (with some revisions and additions) to this day. That's forty years of use, something Facebook, Google+ and Twitter can never hope to achieve.

In 2005, it may have been tempting to say that MySpace would still be in use in 10 years. Right now, that might be said about Twitter or LinkedIn. No-one would ever think of saying the same things if asked whether they would be used in 2053, 40 years into the future. Not only would people say this about email, it has already happened. A standards compliant email client from 2013 would not fail to read an email from 1973, or send an email using SMTP (standardised in 1982). Email was the past, it is the present, and it will be the future. What more could you ask for in a communication protocol?

How to Wipe a Disk

2013-11-06T00:00:00Z

This article is not only about disk wiping, it will hopefully teach you something about using some GNU command line tools. This tutorial was written on my ThinkPad, which runs Debian, so the output should be pretty similar to what you'd get on Ubuntu, Mint or any other Debian- or Ubuntu-based systems. Basically, if you're using something non-standard, you already know that you are, since those types of operating systems are few and far between.

Getting into the right environment

Since you can't reliably wipe a disk with the OS which is on the disk you'll be wiping, you need some way to run an OS from something other than the hard drive. Enter live CDs, DVDs, and USB drives. You can download a disk image and write it to a drive any way you want to, it doesn't matter, as long as the disk boots into a GNU userland of some sort. I recommend Debian, you can get a selection of disk images here. Burn a disc, write it to USB or whatever. After you do that, insert it into your PC and reboot. Make sure the removable media of your choice is higher in the boot order of the BIOS than the hard drive, or this won't work. When you reboot, you will hopefully be confronted with a bootloader menu with several options. Pick the one which sounds most like "Try before you install" or "Live DVD", and you will be put into a GNU/Linux environment, hopefully with some sort of shell prompt. You are now ready to execute some commands!

Which drive are we wiping?

The first thing you need to do is find out which disk you need to wipe. Your USB drive may be one of the drives detected by the OS, so it's important you wipe the right thing. Even if you only have one drive, it's best to check which drive you're wiping just to make sure. The command to list block devices is lsblk. It is used as follows:

$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 232.9G  0 disk
`-sda1   8:1    0 232.9G  0 part
sdb      8:16   1  14.4G  0 disk
`-sdb1   8:17   1   739M  0 part /

Here we see two disks: sda and sdb. With the Linux kernel, drives are given names based on what type of drive they are and the order in which they were detected. In this case, the 250 GB hard drive is detected first, so it ends in "a". All disks are given the "sd" prefix, unless they're really old IDE hard drives. The USB drive was detected 2nd, so its name is "sdb". So we have established that we want to wipe the drive "sda". Note that we are targeting the drive itself, not any of its partitions ("sdaX").

But how do we get to that drive? This is a question which really exposes the convenient and useful nature of the Unix philosophy. Specifically, the part which says all programs should aim to represent as much as possible using files. With this in mind, it's logical to say that the drive "sda" is represented somewhere on the filesystem. It happens to be in the directory "/dev", with all the other device files. Essentially, we will have to wipe the file "/dev/sda", which is, for all intents and purposes, the disk.

Wiping the disk

To wipe a disk, we have to first consider what we actually want to do to the disk. Of course, we want to erase the data on that disk permanently. Since erasing it by replacing the partitions doesn't actually erase the data (it is still there, it just cannot be read without restoring the partition table), we have to actually overwrite all of the data with some other data. We could write lots of random data, but remember that generating random noise takes time and CPU power. Instead, let's just overwrite it all with zeroes, which is very simple. Linux users do things like this so often that the devs saw fit to add a virtual device consisting entirely of zeroes, at "/dev/zero". You can think of this as a disk of infinite size consisting entirely of zeroes. Our aim has changed from the initial "I want to wipe a hard drive" to "I want to overwrite /dev/sda with data from /dev/zero".

There is a command to copy and convert data, and that program is dd. We'd use it to copy data from /dev/zero to /dev/sda as follows. Be careful, because the following command will irreversibly overwrite all of the data on your primary hard drive. Make sure you have the right drive and that you really want to do this.

$ sudo dd if=/dev/zero of=/dev/sda

You won't see any output unless there's an error, so don't worry about dd's complete silence. A lack of errors means it's working! Now, you should switch to a different terminal, and let dd run on its own. You can do this by simply opening another terminal window, if you're using a GUI, or press CTRL-ALT-F2, to switch to the 2nd virtual terminal, if you're in text-only mode.
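If you're curious what "overwrite it all with zeroes" looks like as code, here is a rough sketch in C. It is not how dd is implemented: zero_fill is a hypothetical function, it works on an already-open ordinary file rather than a device, and it needs to be told how many bytes to write, whereas dd simply keeps going until the disk is full.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: overwrite the first nbytes bytes of an open file
   with zeroes, one buffer at a time. Returns 0 on success, -1 on a
   write error. */
int zero_fill(FILE *f, size_t nbytes) {
    unsigned char buf[4096];
    memset(buf, 0, sizeof buf);   /* in-memory stand-in for /dev/zero */

    rewind(f);                    /* start from the beginning of the file */
    while (nbytes > 0) {
        size_t chunk = nbytes < sizeof buf ? nbytes : sizeof buf;
        if (fwrite(buf, 1, chunk, f) != chunk)
            return -1;
        nbytes -= chunk;
    }
    return fflush(f) == 0 ? 0 : -1;
}
```

Try it on a scratch file, never on a device file you care about.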

What to do while you wait

Now is as good a time as any to tell you about several useful features of Unix systems which let you find and learn about commands without resorting to the internet (they'd probably point you to this anyway). I am talking about the Unix manual pages. They are accessed through the man command, followed by the name of the program you want to learn more about. For example:

$ man man
MAN(1)               Manual pager utils              MAN(1)

NAME
    man - an interface to the on-line reference manuals

And a lot more information. You can exit the manual pager by pressing "q". This is not all you can do while you wait: if you read the man page for "dd", you might find a way to make it print how much it has copied. The man page is intended to be a reference for experienced users, so don't worry if you don't understand it.

On any OS, processes are not only known by the human-readable names of the programs, they are also known by Process IDs, or PIDs. When a program is run, it is assigned a PID. This means that the higher the PID, the later in the boot process or interactive session it was run. After a program terminates, its PID is recycled and given to the next program to be spawned, or just left unused. To find out the PID of the dd process you ran earlier, we can use the program pgrep which takes a program name, and outputs all of the numerical PIDs associated with programs with that name.

$ pgrep dd

It's safe to assume that the most recently started instance of dd is the one we just started - the one with the highest PID. Now that we know its PID, we can start sending it signals. The program to send a signal to a running program is kill, which is a bit of a misnomer, because not all of the signals it's capable of sending actually kill the process. In this next command, substitute "$pid" with the PID of your dd process.

$ sudo kill -USR1 $pid

This command won't output anything in the terminal you run it in. Instead, it sends a signal to dd to make it print out how much it has copied. This information would be in the terminal that dd was run from. If you're using a GUI terminal, switch to the window dd was run from. If you're in text-only mode, switch back to the 1st virtual terminal by pressing CTRL-ALT-F1.
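For the curious, this is roughly what the receiving side of a signal can look like in a C program. It is a sketch of the general pattern on a POSIX system, not dd's actual source, and every name in it (progress_requested, on_usr1, install_progress_handler, take_progress_request) is made up for illustration.

```c
#include <signal.h>

/* Set by the handler. A copy loop would poll this between writes and
   print its progress when it is set. */
static volatile sig_atomic_t progress_requested = 0;

/* Hypothetical handler: only sets a flag, since very little else is
   safe to do inside a signal handler. */
static void on_usr1(int signo) {
    (void)signo;
    progress_requested = 1;
}

/* Install the handler for SIGUSR1. Returns 0 on success, -1 on failure. */
int install_progress_handler(void) {
    return signal(SIGUSR1, on_usr1) == SIG_ERR ? -1 : 0;
}

/* Returns 1 once for each report requested since the last call. */
int take_progress_request(void) {
    if (progress_requested) {
        progress_requested = 0;
        return 1;
    }
    return 0;
}
```

A program without a handler installed is terminated by SIGUSR1 by default, which is why "kill" is not quite as much of a misnomer as it seems.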

Closing remarks

I hope you've learnt about more than just how to wipe a disk, although that is a useful skill, too. If you want to learn about Unix using the built-in system tools, you always have your trusty man pages, but also another new tool: apropos. It takes a list of keywords, and parses the man pages, searching for commands which match your description of what you want to do. For example:

$ apropos extract archive
unrar (1)           - extract files from rar archives
unar (1)             - extract archive file content

It generally tends to output many results, not all of which are commands. Remember "man man", the manual page for man? It had a list of section numbers and what they mean. We can see these section numbers in the search results for apropos, in the brackets just after the program name. To save you some effort...

$ man man
1   Executable programs or shell commands
2   System calls (functions provided by the kernel)
3   Library calls (functions within program libraries)
4   Special files (usually found in /dev)
5   File formats and conventions, e.g. /etc/passwd
6   Games
7   Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
8   System administration commands (usually only for root)

So out of all the results in that list, only the ones in sections 1 and 8 are usable as programs from the shell prompt. Using these tools, your knowledge of Unix can grow without resorting to searching the internet for hacks other people have made. Even after you know the ins and outs of a program, the man pages are still useful for when you can't remember the order the arguments go in, or the switch to make a program change behaviour, or things along those lines.

Why I Use FreeBSD

2013-10-28T00:00:00Z

Installing packages from source

Recently, I had SSHed into one of the Debian Stable virtual machines I was using as a file server. The main services I was running were FTP, an HTTP server with a directory listing and a Samba server, with shares set up for a few users on my LAN. All in all, it wasn't anything very complex, it was extremely run-of-the-mill, and could have almost been installed by tasksel file-server or some similar list of packages almost certainly installed on thousands of servers.

A problem arose with the server after I realised I needed to enable several options in ProFTPd, the FTP server I was using. There isn't a way to do this easily on Debian, as far as I'm aware. I know about the apt-get source command, but, as far as I have read, that command can only build packages with the options that the person who made the source package specified. Essentially, it's a system focused not on customisation of packages, but on staying on the bleeding edge of updates for that package. I don't really see any point in this, so apt-get source seems useless for my purposes. It would be useful in a context similar to that of sbopkg, the SlackBuilds package manager, which is useful because it provides access to a wider variety of packages than official mirrors. Debian has all of the packages one could desire prepackaged and mirrored, so the one possible use of apt-get source is null.

At this point, the only option available to me was to download the source from proftpd.org and use the Makefiles provided with the source to compile and install ProFTPd. There are several problems with this approach:

dpkg, the Debian package manager, is never involved, meaning dependencies and upgrades cannot be handled automatically - it is up to me to fetch the tarball and recompile it.
Having a multitude of extracted tarballs, CVS repositories and Git repositories lying around on my system causes clutter and confusion. One might say that naming and organising my directories should be my responsibility. The fact of the matter is that it should not be handled manually - that is a time-wasting and error-prone approach.
This solution is not scalable. With dozens of packages across many servers needing to be recompiled whenever there is an important security fix, I could find myself without any time to devote to real, important server maintenance tasks. Eventually, manual compilation of packages would become impossible.

This is a stark contrast to FreeBSD's solution to the problem of package customisation. On FreeBSD (and any other *BSD), the ports collection provides a convenient way to compile your own packages. Quite simply, you search for your package using "whereis" or "make search", navigate to its location under /usr/ports, then make it. That is obviously not the only way to make use of ports - that method suffers from essentially all of the same problems as before!

If we look back to the original example, that of ProFTPd, I can demonstrate the power of ports. Using sysutils/portmaster, a handy script for building, then installing packages from source, I can install ProFTPd with my desired configuration:

$ sudo portmaster ftp/proftpd
===>>> Port directory: /usr/ports/ftp/proftpd
===>>> Gathering distinfo list for installed ports
===>>> Launching 'make checksum' for ftp/proftpd in background

<Most of the output removed for brevity>

===>>> The following actions were performed:
Installation of ftp/proftpd (proftpd-1.3.4d)

Basically, I specified which port to install, which caused portmaster to begin the process of downloading the source, checking its integrity and most importantly, launching make config, which opens a menu with the compile options for ProFTPd. After I select the ones I want, I can then let portmaster compile the port into a package, which is installed using the FreeBSD package manager. From this point, I can either update it from a mirror of binary packages (this would be faster, but the package would not have my compile-time options enabled) or upgrade it using portmaster. The process of upgrading hundreds of packages from source is a single command: portmaster -a. The convenience and power of FreeBSD ports is only one of the reasons I use it and the other *BSDs on my servers.

Jails

On Debian and indeed all other GNU/Linux operating systems, you have a few choices when it comes to virtualisation. You can go with KVM, Xen or VirtualBox, all of which are the standard, heavyweight, fully virtualised solutions. Alternatively, if you think the immense overhead that comes with fully virtualising a server is too much, you can use Linux containers. There are problems with using LXC, however, problems which are extremely important for security conscious sysadmins.

If a malicious hacker gains access to your LXC and manages to use a privilege elevation exploit to gain root on that box, it is not only the virtual box which is compromised - root within an LXC translates to root outside the container. This means that LXCs are essentially useless if you are virtualising for the sake of security. Similarly, shutting down an LXC using the shutdown binary results in the host machine shutting down. While this is less serious from a security perspective than gaining root (since you need to be root to shut down in the first place), it can be irritating.

Jails on FreeBSD face none of these issues. FreeBSD jails have existed since FreeBSD 4.0, which was released 13 years ago. A massive amount of development has been targeted at adapting the FreeBSD kernel and userspace to playing nicely within a jail and when hosting jails; this effort has resulted in an excellent virtualisation solution. Unlike a traditional virtual machine, a jail shares the host's kernel, but it has its own userland, its own procfs, its own devices and everything else you would find in a real FreeBSD system. Running make buildworld && make installworld with an appropriate jail directory as the destination is all that is needed to create a jail. After this is done and the networking is set up, the environment viewed from within a jail is indistinguishable from a real FreeBSD system.

What is LaTeX?

2013-09-26T00:00:00Z

You probably have not heard of LaTeX before now. If you have, then it is likely that you have no idea what LaTeX is, save for a vague feeling that it relates to documents in some way. By the end of this short post, you will not only know what LaTeX is, but be able to understand why people use it, and what its advantages are. While you probably won't switch to LaTeX immediately, you might be able to see yourself doing it if you ever have to write a non-trivial document. Non-trivial means, in this context, something longer than a few dozen pages, or something requiring a considerable amount of formatting which may take a while in your WYSIWYG editor of choice.

What is LaTeX?

LaTeX is a document preparation system, in the words of latex-project.org. That seems a bit vague, so I'll say that it's the name of a system involving two things: the writing of source "code" files in the LaTeX "language" and the compilation of that source into a final, prepared document. So you'd have your text file (referred to as LaTeX source) which may look something like this:

\documentclass{article}
\title{My Article}
\author{Me}
\date{\today}
\begin{document}
\maketitle
This is my document.
\end{document}

Then you'd run a LaTeX compilation command, like pdflatex myarticle, resulting in something like this:

Why use LaTeX?

It appears very pointless and silly to install a suite of document preparation programs and learn how to use a very large set of typesetting commands just to be able to write documents. Despite what you may think, it is not. LaTeX confers several advantages to the document writer.

The process of writing a document then compiling it divorces formatting and content. This means that, if you just want to write a 10,000 word essay, you will only have to type the 10,000 words into your favourite text editor and you won't have to worry about faffing around with fonts, margins and whatnot. You might argue that the defaults in a WYSIWYG office program (such as LibreOffice Writer) are good enough that you won't have to faff around anyway. You'd be wrong. When you are writing a title, for example, you must first center the text, make it bold or underlined, maybe increase the font size. This is all pointless and gets in the way of creating the content, which is what actually matters. With LaTeX, you write all of the content then worry about formatting. Even better, you could leave that to someone else.

The clear divide between source files and final documents allows you to write one source file and compile it into any of a variety of formats. This is not possible with formats such as Microsoft's OOXML. It is impossible to 100% accurately convert documents to and from this format. That is a rather extreme example, but it is clear that the plain text standards (ASCII, Unicode) are completely open and that thousands of plain text editors exist. This allows you to write LaTeX source in whichever editor you want, and to compile it into whichever format you require, whether it be PDF, HTML, PostScript, or any other format.

The vast number of packages available for LaTeX online is staggering. Visiting the LaTeX website tells you that packages for everything from mathematical formula rendering to bibliography generation exist and are available freely. In LibreOffice Writer, for example, to create a section, you'd have to create a human readable line of text with some sort of formatting to distinguish it from the surrounding text. In LaTeX, you'd use a section command, like so:

\section{Why I love LaTeX}
Reasons.

\section{Why I hate clouds}
More reasons.

These commands are machine-readable, so the LaTeX compiler you are using can act on them. This means it is possible for a list of sections, subsections and subsubsections to be generated - this uses the simple \tableofcontents command provided by the standard document classes. The flexibility and power of LaTeX becomes obvious when you start dealing with generating glossaries and indexes of thousand-page books.

How do I learn LaTeX?

I'm not going to write a sub-par LaTeX tutorial just for the sake of it. It'd be a waste of both my time and yours. Instead, I'll point you to this site, which is a far better intro to LaTeX than I could ever write. There is also substantial documentation of many LaTeX features on that Wikibook, making it useful as a reference as well as a tutorial.

Web Servers

2013-09-17T00:00:00Z

Imagine you're a person on some sort of device, using the internet. You see all of these websites and what do you ask? "How can I set up a web server?", of course. If you did not ask that question, then this guide is not for you. Anyway, down to business. You will need:

A server (an old laptop, desktop or other computing device)
Access to the internet

See, it's not that hard, despite what all of these hosting companies would have you believe. The only money you might have to spend is for a domain, which I'll get into after you've set up everything.

Installing an OS

The first thing you have to do when setting up a server is to choose the OS which will form a base for all of your services to run from. I recommend Debian GNU/Linux, because it's pretty stable, there are many, many packages for it, and lots of guides tend to be written for it due to its popularity as a server OS. Feel free to choose something else, but remember that "apt-get" won't usually work on CentOS or anything else. I'm not going to walk you through installing Debian since it's so easy due to the way the installer works. There are a few things you should note.

Please connect to a router or switch via Ethernet. You don't want a wireless connection between your router and server to become a bottleneck.
Either set up a static IP when the Debian installer prompts you, or set up a static IP for your server's MAC address in your router's DHCP server settings.

Both of these things are quite important. Without a static IP, your router won't know where to send packets it receives on the HTTP port.

Installing a web server

There are quite a few web servers to choose from. The most popular two are Apache and Nginx (pronounced "engine x"), and for the purposes of this guide, I'll be using Apache. This isn't really for any reason, I'm just more familiar with it. Run apt-get install apache2 as root to install the apache2 web server. It should enable itself as a service which runs at boot, but in case it doesn't, update-rc.d apache2 defaults should do it.

Serving content

You should be familiar with writing HTML, CSS and JavaScript, so I won't bore you with the details of how to make your first website. All of the content served by your server lives in /var/www/. If you list the files in that directory, you should find only one file named "index.html". If you visit the IP address of your server on your local area network (for example, 192.168.0.5, or similar) in your browser, you should see the test web page for Apache. You can edit index.html, add some more pages and some images, then you'll have a website. But what good is a website if you can't access it from the internet?

IPv4 addresses and NAT

Before you can get your server online, you have to understand how computers communicate with each other across the internet. Each packet which is sent anywhere on the internet contains the address of the recipient. This address is an IP address, usually an IPv4 address, which you have come across before (think xxx.xxx.xxx.xxx patterns of numbers). When you ping a website, say google.com, that domain name is resolved into an IPv4 address using DNS. The details of DNS are irrelevant, but it basically looks at the DNS records associated with a domain, finds the A record, and translates the domain name into the IP of the A record. This is what happens:

So that's what you see when you try to communicate with someone else over the internet. You see a single IP address which refers to their LAN. But what if there are multiple computers on the LAN? The public, or external, IP address usually points to the router. This router takes packets and routes them to the computers on the LAN. This means that the computers on the LAN are not directly accessible from the internet: packets for them must first go through the router. This also means that the LAN is completely separate from the rest of the internet - there is no end to end connectivity, all packets go through a router. The translation between the external IP (the one everyone else sees) and the internal IP (the one you and your router see) is known as Network Address Translation, or NAT.

The main consequence of this is that you cannot point someone to your server using the internal IP address (192.168.x.x, usually). Instead, you must set up a rule on your router to pass all incoming requests on a port (port 80 for HTTP) to the internal IP address of your server.
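To make the forwarding rule concrete, here is a toy Python model of what the router does with incoming traffic. Everything in it is an assumption for illustration: the Router class is hypothetical, 203.0.113.7 is a documentation-only address standing in for your external IP, and a real router does this in firmware, not Python.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Router:
    """Toy model of a NAT router with a port forwarding table."""

    public_ip: str
    # Maps an external port to the (internal IP, internal port) it forwards to.
    forwards: Dict[int, Tuple[str, int]] = field(default_factory=dict)

    def forward(self, external_port: int, internal_ip: str, internal_port: int) -> None:
        """Add a rule sending traffic on external_port to a machine on the LAN."""
        self.forwards[external_port] = (internal_ip, internal_port)

    def route_incoming(self, dest_port: int) -> Optional[Tuple[str, int]]:
        """Return where an incoming packet goes, or None if no rule matches."""
        return self.forwards.get(dest_port)


router = Router(public_ip="203.0.113.7")
router.forward(80, "192.168.0.5", 80)  # HTTP goes to the web server on the LAN

print(router.route_incoming(80))  # forwarded to the server
print(router.route_incoming(22))  # no rule for this port, so the packet is dropped
```

Without the rule for port 80, every request from outside hits the second case, which is why the site is unreachable until you forward the port.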

Getting online

The first order of business is getting a domain name. If you visit this website and create an account, you will be able to choose from a wide variety of free subdomains (e.g. your-name.mooo.com). You cannot create a free domain (your-name.com), since those require registration at a domain registrar (like Namecheap or GoDaddy) in exchange for money. This means you are stuck with "my-cool-site.mooooo.com" or whatever you pick, until you stop being cheap. There are instructions on setting up your DNS records to update automatically, but for now you can visit this page and put that into the A record for your domain. If you do try to go to this domain, you won't be able to, because your ports aren't open! Since instructions vary from router to router, you should just go to this helpful website, find your router and follow the instructions to forward port 80 to the static IP of your server.

Your website may or may not be fully functional at this point. If not, search the internet, ask on StackOverflow or whatever your preferred help website is. The important thing is that you do not link to this post, because I won't take any responsibility for what you do with your server.

How to use GPG

2013-08-11T00:00:00Z

Why am I writing this?

I have looked up "how to use gpg" so many times, on so many websites, and have found every guide to be focused on something I don't use or worded in such a way that I get confused and revoke all of my keys (that hasn't actually happened...yet). I thought I'd whip up a quick guide that could serve as a reference for future Kaashif, who may not remember anything about GPG other than gpg -ear and gpg -d.

Installing GPG

This is easy. Most distros come with it, for package signing among other things. The ones that don't ship it have it easily installable from their package repos as either "gpg", "gpg2", "gnupg" or "gnupg2". While GPG 1 and GPG 2 are actually different programs, many distros don't make the distinction, since hardly anyone uses GPG 1 anymore.

Generating a key

Run gpg --gen-key. You have to be an idiot to get this wrong. Defaults are fine, unless someone has broken RSA with quantum magic. Make sure the email is right.

After generating a key

Two things:
1. Create a revocation certificate: gpg --output revokecert --gen-revoke $KEY
2. Back up everything

I somehow managed to lose two GPG private keys, and had only generated a revocation certificate for one of them. I'll never make that mistake again - I have it backed up on a CD, on a USB drive and on a server. Nothing off-site, though, so someone could theoretically burn down my house and I'd lose everything.

How to use your newfound encryption powers

To encrypt plain text from stdin, just do gpg -ear $KEY. The $KEY refers to the recipient. It's fine to use your own pubkey when testing, but for a real message you have to use the pubkey of the person who will decrypt the text! That's the cornerstone of everything to do with keys. Imagine someone saying "I'll send you this lock only I have the key to". That would be idiotic when they have the means available to send you a lock only you have the key to.

If someone sends you a properly encrypted message, invoke gpg -d. It'll take input from stdin which, hopefully, has been encrypted with your pubkey and can be decrypted with your private key. Since you should only have one private key at this point, there's no need to specify one.
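If the lock-and-key idea still feels abstract, here is a toy Python sketch of it using textbook RSA. This is not what gpg does internally (real tools use padding, much bigger keys, and encrypt the message with a symmetric session key), and the key here is far too small to be secure. The function names and the choice of primes are mine, for illustration only.

```python
# Toy RSA keypair built from two known Mersenne primes.
# Illustration only: unpadded and far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def encrypt(message: bytes, public_key=(N, E)) -> int:
    """Encrypt with the RECIPIENT's public key."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this toy key")
    return pow(m, e, n)


def decrypt(ciphertext: int, private_key=(N, D)) -> bytes:
    """Decrypt with the matching private key, which only the recipient has."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


ciphertext = encrypt(b"hello")
print(decrypt(ciphertext))
```

Anyone can call encrypt, because it only needs the public half of the key. Only the holder of D can undo it, which is the whole point of encrypting to the recipient's pubkey rather than your own.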

Signatures

Let's say someone doesn't want to use GPG because they're too lazy (a very realistic scenario). Maybe you're posting on a mailing list, where encryption isn't necessary and just annoys everyone. You still want people to know that you sent the message and not an imposter with fake headers, correct? Well you're in luck, you can attach a GPG signature to your messages. This is basically a hash of the message encrypted with your private key, which anyone can check using your public key. Since you are the only person with the private key, you must have been the person to sign the message. The command to use is gpg --clearsign. No need to specify a key, because you only have one private key.
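Here is the same idea as a toy Python sketch: hash the message, transform the hash with the private key, and let anyone check it with the public key. As before, this is textbook RSA with a tiny key and no padding, so treat it as a sketch of the concept and not of gpg's actual signature format. All the names are made up for illustration.

```python
import hashlib

# Toy RSA keypair built from two known Mersenne primes (illustration only).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_key=(N, D)) -> int:
    """Only the holder of the private key can produce this value."""
    n, d = private_key
    return pow(digest(message), d, n)


def verify(message: bytes, signature: int, public_key=(N, E)) -> bool:
    """Anyone with the public key can check the signature."""
    n, e = public_key
    return pow(signature, e, n) == digest(message)


signature = sign(b"I really did write this")
print(verify(b"I really did write this", signature))   # matches
print(verify(b"I really did write that", signature))   # message was altered
```

Note that the message itself stays readable. A signature proves who wrote it and that it wasn't changed, but it hides nothing.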

Encrypting files into a binary format

Remember using gpg -ear? The "a" means ASCII. Take that out and it magically outputs a binary file, with the input filename and a ".gpg" extension.
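For a rough picture of what the "a" adds, ASCII armour is essentially Base64 text wrapped in BEGIN/END lines. The Python sketch below only mimics that shape: the real format also carries headers and a checksum line, and the armor and dearmor names and the "EXAMPLE DATA" label are hypothetical.

```python
import base64


def armor(data: bytes, label: str = "EXAMPLE DATA") -> str:
    """Wrap binary data in printable text, 64 Base64 characters per line."""
    body = base64.b64encode(data).decode("ascii")
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join([f"-----BEGIN {label}-----", "", *lines, f"-----END {label}-----"])


def dearmor(text: str) -> bytes:
    """Strip the BEGIN/END lines and decode the Base64 body back to bytes."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("-----")]
    return base64.b64decode("".join(lines))


binary = bytes(range(256))      # stand-in for encrypted binary output
text = armor(binary)
print(text)
```

The armoured version is bigger than the binary one, but you can paste it into an email without anything mangling it.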

What is my key?

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.14 (GNU/Linux)

mQENBFH38X4BCAC25Ra3yhPjXtqWmYbxHHG3Esn4en9z0yWCE4AukNUm8MX1kPna
1TqBFkw8WhhQKV1v+U0T18zoWwpMm1tUJdVWaUVc2/4iyR49d3SI81K+g7CuQuz1
YjyMG1zOzWeswfcJjZF4+Ti/fZIR7fNc+neAfGg9WMpAvdfMWAjuuV44vrAZQey+
bgpVN3uEJYntyzJRkgkqTNS2JatrL0IOVmfKtrNzuHQy5THJn2uDm9+Eg/tVe0Jp
bfz5AJKqqGUpE9jitCKc55n5xvJrlOQyWaIWuSiaKRRRiupswVAVoEa5y8JOckyb
l5q10F6hCIWq+ohGV/huGSvMNAMoU9+8vdXbABEBAAG0R0thYXNoaWYgSHltYWJh
Y2N1cyAoUHJvZ3JhbW1lciwgc3lzYWRtaW4pIDxrYWFzaGlmaHltYWJhY2N1c0Bn
bWFpbC5jb20+iQE4BBMBAgAiBQJR9/F+AhsDBgsJCAcDAgYVCAIJCgsEFgIDAQIe
AQIXgAAKCRBa0dhzPoELBO6IB/40BRr2DrajYJ6y9yGpUHJIPT+KC90i62r0D3vM
raHz3/shBOOJEeyqhxcmwByWkhBRjEkkt3xgaWlj6XzvuBY457p54f6bIKeKwXpT
WJAVdhM2VSQdTyX/Svo3lnVYv0bozbRIb88M6FvTF8Cv631zSImAAKuPD2X7ZYl1
2p3gLVWB//vkAr2WAJjq1qrcmoVtixbs5HeM65MR7hcE30vCJzswev7m+4mQXFR3
LoMNoC1Zx2iYBNgUNMpoGaGdPTohMD8gCklr86R+OzCORrWyKBl4qc608Dmt+myG
Rs5OH6c9yBYiHIfc9UYaMoPXIdvQBwO/4bOOOZbwqp7Onjy7uQENBFH38X4BCADo
Obc//asuZBCJtf2GSZGrWvWJVYv06a3noIBb9TG+6fAZ40c1zlPJzCSa6wU8aeXn
6UvbQI7W014wWlO+JvvpaZzEsJ+qnxkZQqEne1BqTb32OmIJLInDQhsZsoR/PNSQ
KRUshS9kLBSez2EBA7rV2cJ6X2a2Dtb75PlzysjmHrws1ZOelYRu3DorYfUQ05jL
IhCFQRCHgryK0yD+mZy55F9JHGHWooTGTStBmNW5dDpXdPUcHeZm5ICXkuQC1tzV
apW/vy77PB3ZVTxX/wEuATgbfbPiGpvqVQWr+CTnQOpVGoAMByHdlkHaKtGnjUlI
vp9Xep+cbS4GOjpJ0khZABEBAAGJAR8EGAECAAkFAlH38X4CGwwACgkQWtHYcz6B
CwTqWgf/ROZ5mmloJ/86iCeGzxzIHMWF4m8dwMGa3SZ310umsl83ydM5hixmRM43
cABzEq7sbgirmg31GAkGwU0dQ6z9R4TSSbDFS8nz1EztvuNabMTPfc9AdtE/ig2P
o6Hul8n3A6330RDk94QtqBw3Eppsr6PgQ+hA2rbfy7YRca6p100cC9yOAdc5gvmr
0qTfFtX71/Nxrcok+88uOk3PMwyvW/6HCs9m7jfx2RZe3DmQ5ykZ7qMe5YM4xGKy
SteLT0+yQNETungL5lyC0V/JAgAIQXItQytEXL9TEKeEP2jCrRD4lDa98qMkbNDI
ba2AGJ2aj3+u9787VaY7bSRFQH8Kwg==
=1hN6
-----END PGP PUBLIC KEY BLOCK-----

How did I generate that?

gpg --export -a $KEY. Once again, you can take out the "-a" and add an "--output " to get binary output.

Where do I find more public keys?

Go to a keyserver, like pgp.mit.edu. You should also submit your public key there by invoking gpg --send-keys --keyserver pgp.mit.edu $KEY. The key will propagate to other servers, so you cannot delete or edit a key once it's there. Make sure everything is correct and backed up. Don't search my name, I don't want to be embarrassed. If you must, my key is the most up to date one, I lost the old one, and revoked the other one...due to losing it. Do what I say, not what I do.

Summary

gpg -ear $KEY - Encrypt plaintext from stdin

gpg -d - Decrypt ciphertext from stdin

gpg --export -a $KEY - Export ASCII-armoured key to stdout

gpg --import $FILE - Import key from a file

gpg --clearsign - Sign a message from stdin, leaving the message human-readable

gpg --output $SIGFILE --detach-sign $FILE - Sign a file and create a detached signature in another file

gpg --some-sort-of-command --output $FILE - Do something, then output to a file
