kaashif's blog

Programming, with some mathematics on the side

binfmt_misc: The magic behind Linux/Windows interop

2024-01-03

I was running something in WSL, as you do, then I thought about it for a second. When I'm doing this in WSL:

$ clip.exe < file.txt

How does that actually work? It turns out this is done by /init, which plays two roles:

  1. PID 1: it's the init system, the ancestor of every process in WSL.

  2. An "interpreter" for Windows executables. When you run clip.exe, that's the actual Windows binary you're running directly. This works via the binfmt_misc mechanism of Linux, which allows you to register runners for any binary with specific magic bytes.

/init is a bit hard to get at since it's a closed source component of WSL. We can get some idea of how it might work by looking at (1) a Microsoft blog post describing how this works at a high level and (2) cbwin, an open source implementation of this.

We can also do fun things, like make Java jars directly executable without needing to run them with java -jar. But beware: if you have "fully executable" jars with scripts embedded at the start (like the ones Spring Boot makes), binfmt_misc has no way to tell that they're jars.

But java -jar still works on them! Weird. Here are the questions we want to answer:

  • What happens when you run a "normal" Linux executable? What about a shell script?

  • How does Linux tell that clip.exe is a Windows executable, and how does it run from inside Linux?

  • How can Java tell that a shell script with some binary junk at the bottom is really a jar, but the Linux kernel (via binfmt_misc) can't?!

Answers are below.

What happens when you run a normal executable?

Let's take cat as an example. From inside your shell, you execute:

$ cat file.txt

Your shell will probably find cat in the PATH and use the execve system call to execute that file.

This is not mysterious at all. You can see the source code of execve here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/exec.c?id=HEAD#n2030.

This blog post isn't supposed to be a deep dive into execve; the point is that execve executes executables.
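As a rough sketch of what the shell does here (not how any particular shell is actually implemented), this Python snippet resolves a command name against PATH and then hands the result to execve in a child process. The helper name run is made up for illustration.

```python
import os
import shutil


def run(argv):
    """Resolve argv[0] on PATH, then fork and execve it, roughly like a shell."""
    path = shutil.which(argv[0])
    if path is None:
        raise FileNotFoundError(argv[0])
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the program.
        try:
            os.execve(path, argv, os.environ)
        finally:
            # Only reached if execve itself failed.
            os._exit(127)
    # Parent: wait for the child and turn its wait status into an exit code.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Calling run(["cat", "file.txt"]) would print the file and return cat's exit code. Real shells do a lot more (builtins, job control, redirections), but the core is still "find the file, execve it".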

What about a shell script?

Believe it or not, also execve! execve reads the first two bytes of the given file. If they're #!, the rest of that line names an interpreter, and the file gets executed in the way we're all familiar with.

If file.txt is given to execve with these contents:

#!/bin/sh

echo Hello

then execve will run /bin/sh file.txt, and we go back to the first case: a normal executable.
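To make that rewrite concrete, here's a small sketch of the argv the kernel ends up building for a #! script. It's a simplified model, not the kernel's code: shebang_argv is a hypothetical helper, and head stands for the first chunk of the file's bytes.

```python
def shebang_argv(script_path, head):
    """Return the argv used to run a #! script, or None if it isn't one.

    head is the first chunk of the file's contents, as bytes.
    """
    if not head.startswith(b"#!"):
        return None
    # Everything after "#!" up to the first newline is the interpreter line.
    line = head[2:].split(b"\n", 1)[0].strip()
    if not line:
        return None
    # Linux splits off the interpreter and passes the rest as ONE argument.
    parts = line.split(None, 1)
    argv = [parts[0].decode()]
    if len(parts) == 2:
        argv.append(parts[1].decode())
    # The script's own path goes last, so the interpreter knows what to read.
    argv.append(script_path)
    return argv
```

For the file above, shebang_argv("file.txt", b"#!/bin/sh\n\necho Hello\n") gives ["/bin/sh", "file.txt"], which is exactly the command line described.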

So far, so good, everyone should be familiar with this. The interesting part comes next.

What is binfmt_misc?

binfmt_misc is documented very well here: https://docs.kernel.org/admin-guide/binfmt-misc.html. At a high level, binfmt_misc is a feature of the Linux kernel that allows you to specify a rule matching either a filename suffix or magic bytes at an offset in the file, and an executable to use to run that file, similar to how a shell script is run.

For example, to match the .txt extension and cat the text file when "run", you could run:

$ sudo sh -c 'echo ":cattxt:E::txt::/bin/cat:" > /proc/sys/fs/binfmt_misc/register'
$ vim file.txt
$ chmod +x file.txt
$ ./file.txt
this is my file
it has content
hello

This isn't very useful. The next part is more interesting.
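Before moving on, the registration string from the example above is worth picking apart. Per the kernel documentation it has the shape :name:type:offset:magic:mask:interpreter:flags, where the first character sets the delimiter. Here's a sketch that splits one into named fields; BinfmtRule and parse_rule are names I've made up, not part of any library.

```python
from typing import NamedTuple


class BinfmtRule(NamedTuple):
    name: str         # shows up as /proc/sys/fs/binfmt_misc/<name>
    type: str         # "E" = match on extension, "M" = match on magic bytes
    offset: str       # where the magic starts (empty means 0)
    magic: str        # the extension (for E) or the byte sequence (for M)
    mask: str         # optional bitmask applied before comparing magic
    interpreter: str  # program that gets run with the file as its argument
    flags: str        # optional flags such as P or F


def parse_rule(line):
    """Split a binfmt_misc registration string into its seven fields."""
    delimiter = line[0]
    fields = line[1:].split(delimiter)
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    return BinfmtRule(*fields)


rule = parse_rule(":cattxt:E::txt::/bin/cat:")
print(rule.type, rule.magic, rule.interpreter)
```

For the cattxt rule, the offset, mask and flags fields are all empty, which is why there are so many colons in a row.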

How does WSL tell clip.exe is a Windows executable?

Let's look at clip.exe:

$ vim /mnt/c/Windows/system32/clip.exe

Right at the start, you'll see the characters "MZ" - these are the first two bytes of any .exe file on DOS or Windows (and the initials of Mark Zbikowski).

MZ<90>^@^C^@^@^@^D^@^@^@ÿÿ^@^@¸^@^@^@^@^@^@^@@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@è^@^@^@^N^_º^N^@´   Í!¸^ALÍ!This program cannot be run in DOS mode.^M^M

...

Let's look at the binfmt_misc registrations (this example only works in WSL, of course):

$ ls /proc/sys/fs/binfmt_misc/
WSLInterop  register  status

It's too easy!

$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /init
flags: PF
offset 0
magic 4d5a

And 4d5a is hex for "MZ". So when you execve a Windows executable like clip.exe, Linux will invoke /init to run clip.exe. The magic is thus inside /init.

/init is not open source. The blog post linked above has some hints and I encourage you to read it.

There's also https://github.com/ionescu007/lxss which contains some interesting proofs of concept for interacting across the Windows/Linux boundary.
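While /init itself is a black box, the matching step that leads to it is easy to model. The sketch below parses the text of the WSLInterop entry shown earlier and checks a file's first bytes against it. It's a simplified model under my own assumptions: parse_entry and matches are hypothetical helpers, and masks are ignored entirely.

```python
def parse_entry(text):
    """Parse the contents of a /proc/sys/fs/binfmt_misc/<name> file."""
    entry = {"enabled": False}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        key = key.rstrip(":")  # "flags:" -> "flags"
        if key in ("enabled", "disabled"):
            entry["enabled"] = key == "enabled"
        elif key == "offset":
            entry["offset"] = int(value)
        elif key == "magic":
            entry["magic"] = bytes.fromhex(value)
        else:
            entry[key] = value
    return entry


def matches(entry, head):
    """True if the bytes in head carry the entry's magic at its offset."""
    start = entry["offset"]
    magic = entry["magic"]
    return entry["enabled"] and head[start:start + len(magic)] == magic


wsl = parse_entry("enabled\ninterpreter /init\nflags: PF\noffset 0\nmagic 4d5a\n")
print(matches(wsl, b"MZ\x90\x00"), wsl["interpreter"])
```

Anything that starts with MZ matches and gets handed to /init; an ELF binary (which starts with \x7fELF) doesn't, and is run the normal way.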

How do fully executable jars work?

The interesting part about these is that they don't involve binfmt_misc at all. Instead, they use a different trick.

Go to https://start.spring.io/ and generate the example project. Add this section to the build.gradle to generate the "fully executable" jar:

bootJar {
  launchScript()
}

Run ./gradlew build to build the project. You get two jars:

$ ls build/libs/
demo-0.0.1-SNAPSHOT-plain.jar  demo-0.0.1-SNAPSHOT.jar

The first jar (the -plain one) is not executable and has no main. The second jar is executable, with either java -jar or directly:

$ java -jar build/libs/demo-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v3.2.1)
...
^C
$ build/libs/demo-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v3.2.1)

But what gives? We never registered a binfmt_misc rule for Java jars! The trick here is that the jar isn't just a jar, it's also a shell script:

$ less build/libs/demo-0.0.1-SNAPSHOT.jar
#!/bin/bash
...
<shell script>
...
exit 0
<what looks like binary data>

The binary data after the exit 0 is the jar. This is clever: when run directly, the shell script re-invokes the jar itself (the shell script itself!) with java -jar.

You can verify the binary data is a jar by looking at the magic bytes:

...
*)
  echo "Usage: $0 {start|stop|force-stop|restart|force-reload|status|run}"; exit 1;
esac

exit 0
PK^C^D^T^@^H^H^H^@
...

PK^C^D is exactly the magic byte string for a zip archive (strictly, the signature that starts each file entry in one). A jar file is just a zip file with special contents.

This explains how directly invoking the jar executes it without involving binfmt_misc.
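Here's a toy version of that layout, to show there's nothing magical about it. The launcher below is a made-up three-line script of my own, far simpler than the one Spring Boot generates, and the "jar" is just a tiny zip built in memory.

```python
import io
import zipfile

# Hypothetical minimal launcher: re-run this very file ($0) through java -jar.
LAUNCHER = b'#!/bin/sh\nexec java -jar "$0" "$@"\nexit 0\n'


def tiny_jar():
    """Build a minimal zip in memory to stand in for a real jar."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    return buf.getvalue()


def make_self_running(jar_bytes, launcher=LAUNCHER):
    """Glue a shell launcher in front of the jar's bytes."""
    return launcher + jar_bytes


blob = make_self_running(tiny_jar())
print(blob[:2])                                          # the #! the kernel sees
print(blob[len(LAUNCHER):len(LAUNCHER) + 4])             # the zip signature
print(zipfile.ZipFile(io.BytesIO(blob)).namelist())      # still readable as a zip
```

The kernel only ever sees the #! at the top, and the shell never reads past exit 0, so neither of them cares about the binary data at the end.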

How does java -jar execute a jar with text at the start?

java isn't doing anything clever here, it just treats the jar as any other zip file - we can even extract the "fully executable" jar with unzip:

$ unzip build/libs/demo-0.0.1-SNAPSHOT.jar
Archive:  build/libs/demo-0.0.1-SNAPSHOT.jar
   creating: META-INF/
  inflating: META-INF/MANIFEST.MF
...

The cleverness here is in the zip file format itself, see https://en.wikipedia.org/wiki/ZIP_(file_format). A tool that reads a zip file doesn't start at the beginning. It scans backwards from the end of the file for the end-of-central-directory signature (some magic bytes), follows that to the central directory, and reads the entries from there. This means that we are allowed to have whatever preamble we want at the start of the file, including executable code. This is commonly used for self-extracting archives (e.g. an .exe you can run or open with your archive viewer).

This jar isn't self-extracting, but it is kind of self-running. I think it's a neat trick.
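To see why the preamble is harmless, here's a sketch that does the "start from the end" step by hand: find the end-of-central-directory record, read the central directory's size and recorded offset, and work out how many extra bytes were glued on the front. It assumes a small archive with no zip64 extensions, and preamble_length is a name I've made up.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"


def preamble_length(data):
    """Number of non-zip bytes sitting in front of the archive."""
    pos = data.rfind(EOCD_SIGNATURE)  # search from the end of the file
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # EOCD layout: signature, four 16-bit counters, central directory size,
    # central directory offset, comment length (22 bytes in total).
    (_sig, _disk, _cd_disk, _n_here, _n_total,
     cd_size, cd_offset, _comment_len) = struct.unpack_from("<4s4H2LH", data, pos)
    # The central directory ends where the EOCD begins. Any gap between where
    # it actually starts and where it claims to start is the preamble.
    return pos - cd_size - cd_offset


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
plain = buf.getvalue()
script = b"#!/bin/sh\necho pretend this launches java\nexit 0\n"

print(preamble_length(plain), preamble_length(script + plain), len(script))
```

The offsets stored inside the archive still describe the original, preamble-free file, so a tolerant reader just shifts everything by that gap.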

Conclusion: why we can't use binfmt_misc for jars

It's pretty common for fully executable jars to not have a .jar extension, since the whole point of being fully executable is that it's like a "normal" executable. This means we can't use binfmt_misc's extension matching.

We can't use the magic byte matching either since:

  1. Jars are just zip files, they don't have any unique magic bytes! #! is at the start (which we can't and shouldn't hijack), and PK appears later, but we can't hijack that either, those are the zip file magic bytes and not all zip files are jars.

  2. Even if there were jar specific magic bytes, we don't know the offset! The shell script at the start can be any length.

So binfmt_misc is useful for running files with a specific extension or with magic bytes at a specific offset (e.g. Windows executables!), but fully executable jars have neither.
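The offset problem is easy to demonstrate with a sketch. The two launcher scripts below are invented stand-ins of different lengths, and rule_matches is a hypothetical model of a fixed-offset magic rule, not kernel code.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"


def rule_matches(data, offset, magic):
    """Model of a binfmt_misc 'M' rule: magic must sit at one fixed offset."""
    return data[offset:offset + len(magic)] == magic


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
jar = buf.getvalue()

short_script = b"#!/bin/sh\nexit 0\n"
long_script = b"#!/bin/sh\n# a launcher with more stuff in it\nexit 0\n"
jar_a = short_script + jar
jar_b = long_script + jar

# A rule tuned to the first file's layout...
offset_a = len(short_script)
print(rule_matches(jar_a, offset_a, ZIP_MAGIC))  # matches the file it was tuned for
print(rule_matches(jar_b, offset_a, ZIP_MAGIC))  # misses the other one
print(rule_matches(jar, 0, ZIP_MAGIC))           # a bare zip matches at offset 0
```

Any rule that works for one launcher script breaks as soon as the script changes length, and a rule at offset 0 only ever sees #!.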

Final verdict on binfmt_misc

binfmt_misc doesn't really seem incredibly useful if you ask me. One cool use case is registering QEMU as a handler for ARM executables while on an x86 machine, then you can run those binaries as if they were native. That doesn't seem like a real use case to me.

The WSL interop use case actually seems the most compelling to me, but is that a reason to have a whole kernel thing? I don't know.