macroexpand-1 for C++ Coroutines
2024-07-27
I went to a talk recently about C++ coroutines, and I don't think it was very good. The talk went through some examples of C++ coroutines and had a surface-level handwavy explanation of what "really happens" when compilers see a coroutine.
But a non-handwavy explanation is really easy - you can just look at what the compiler does to coroutine code to see what's really happening. No analogy, no handwave, just looking at real code.
How do we do that without looking at LLVM IR or something? Easy - compile the binary then decompile it into normal C++ and see what it looks like! So let's do it!
The title is a reference to macroexpand-1 from Lisp. Looking at the source code resulting from a coroutine kind of reminds me of expanding a macro in Lisp.
This post is not intended to actually be accessible to beginners or readable for anyone, but it does illustrate an approach to demystifying coroutines that I like.
Minimal coroutine example
A minimal coroutine example needs to demonstrate suspending and resuming execution at a minimum. It's not interesting if we just have a single co_return and it's all optimized away.
Here's my example in its entirety. First, the coroutine type.
#include <iostream>
#include <coroutine>

struct Coroutine {
    struct promise_type;
    std::coroutine_handle<promise_type> handle;
    Coroutine(std::coroutine_handle<promise_type>&& x) : handle(x) {}

    struct promise_type {
        Coroutine get_return_object() {
            return Coroutine{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        void unhandled_exception() noexcept {}
        void return_void() noexcept {}
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
    };
};
This post is not a coroutine tutorial, so I won't explain this code in depth. The point of this post is to look at the real code you get. A few points:
The coroutine_handle is what you can call resume() on to resume the execution of the coroutine. It must keep track of the state of the coroutine when it was suspended.
The other methods like return_void are just required by the standard and the compiler, but we don't really do anything interesting in them.
Here's the coroutine itself:
Coroutine test_coroutine() {
    std::cout << "started!\n";
    co_await std::suspend_always{};
    std::cout << "returning!\n";
    co_return;
}
Very simple conceptually - when we call the coroutine, we print something, suspend, then when we resume we print something else and return.
suspend_always is an awaitable whose await_ready always returns false, so using co_await on it always suspends the coroutine.
Finally, main:
int main() {
    auto coro = test_coroutine();
    std::cout << "main\n";
    coro.handle.resume();
    return 0;
}
Again, very simple - call the coroutine, it suspends, we print something to demonstrate that the coroutine really was suspended in the middle, then we resume it.
Decompiling
This is the point at which someone might be tempted to wax lyrical about state machines, pseudocode, draw an analogy, etc etc. No.
Let's compile the above and feed it into https://dogbolt.org/ which lets you run various decompilers on any executable you upload. I'll do that and walk through the nicest looking output I can find.
Compile and run the example:
$ g++ -o a.out -std=c++20 coro.cpp
$ ./a.out
started!
main
returning!
Perfect! It works. Let's decompile it. Upload a.out to https://dogbolt.org/ to follow along. I looked at the output of all of the decompilers and I think dewolf is the most informative and readable for this particular case. Dewolf can be found here: https://github.com/fkie-cad/dewolf.
What's really happening?
Now we can walk through the decompiled code, which looks like C-ish code and not C++20 coroutine code. You'll notice some name mangling but it's actually very readable!
Let's start at main:
int main() {
    int var_0;
    long var_3;
    long var_4;
    var_4 = test_coroutine(/* frame_ptr */ var_0);
    std::operator<<<std::char_traits<char>_>(/* __out */ std::cout, /* __s */ "main\n");
    var_3 = var_4;
    std::__n4861::coroutine_handle<Coroutine::promise_type>::resume(/* this */ &var_3);
    return 0;
}
Something small and hard to notice happened here - where we wrote test_coroutine to take no arguments, here the decompiler shows it taking an argument, frame_ptr.
Don't read too much into the parameter itself: callers still call test_coroutine() exactly as written, and the decompiled body never reads frame_ptr - it allocates the frame itself. The name is the important hint. This is key to how coroutines work - when you call a coroutine, in this case test_coroutine, the compiler-generated code sets up a coroutine frame. In this coroutine frame, we keep track of where the coroutine was when it was suspended, and local variables. This allows us to resume the coroutine with the same state, from the same place as when it was suspended.
Let's look at test_coroutine:
Coroutine test_coroutine(_Z14test_coroutinev.Frame * frame_ptr) {
    long(void *) ** var_0;
    var_0 = operator_new(/* sz */ 40UL);
    *(var_0 + 34L) = 0x1;
    *var_0 = test_coroutine_actor;
    *(var_0 + 8L) = test_coroutine_destroy;
    *(var_0 + 32L) = 0x0;
    test_coroutine_actor(var_0);
    return Coroutine::promise_type::get_return_object(/* this */ var_0 + 16L);
}
This was rewritten significantly. Notice that no calls to std::cout appear here. The real work has been moved to test_coroutine_actor.
What's left in test_coroutine is just allocating the 40-byte coroutine frame with operator_new and filling it in with:
A pointer to the function that does the real work, test_coroutine_actor.
A pointer to the cleanup function, test_coroutine_destroy.
The initial state of the coroutine, 0x0.
A flag at offset 34 set to 0x1, which test_coroutine_actor checks before calling operator_delete on the frame.
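Putting the offsets from the decompiled code together, the frame might look something like the struct below. This is a guess reconstructed only from those offsets, not GCC's actual definition: every field name is invented, the interpretation of the bytes at offsets 24 and 35 to 38 is my reading of how the actor uses them, and the static_asserts assume a typical 64-bit target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the 40-byte frame. Field names are invented;
// only the offsets come from the decompiled output.
struct FrameSketch {
    void (*resume_fn)(FrameSketch*);   // +0:  test_coroutine_actor
    void (*destroy_fn)(FrameSketch*);  // +8:  test_coroutine_destroy
    char promise;                      // +16: promise_type (empty, so one byte)
    void* self_handle;                 // +24: appears to hold a handle built from the frame address
    std::uint16_t resume_index;        // +32: the state the switch reads
    bool needs_free;                   // +34: checked before operator_delete
    char scratch[4];                   // +35..+38: single-byte awaiter objects and a flag
};

// These hold on a typical 64-bit target (8-byte pointers, 1-byte bool).
static_assert(offsetof(FrameSketch, destroy_fn) == 8);
static_assert(offsetof(FrameSketch, promise) == 16);
static_assert(offsetof(FrameSketch, self_handle) == 24);
static_assert(offsetof(FrameSketch, resume_index) == 32);
static_assert(offsetof(FrameSketch, needs_free) == 34);
static_assert(sizeof(FrameSketch) == 40);

// Helper so the size can also be checked at run time.
constexpr std::size_t frame_sketch_size() { return sizeof(FrameSketch); }
```

The total comes out to the same 40 bytes passed to operator_new, which is at least consistent with this reading of the layout.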
We then call test_coroutine_actor, which is where the real work is. This is where the handwaving about a state machine ends and we can actually look at the real state machine the compiler gives us.
long test_coroutine_actor(void * arg1) {
    long var_9;
    void * var_0;
    void * var_1;
    void * var_2;
    void * var_3;
    void * var_4;
    void * var_5;
    var_1 = arg1 + 32L;
    if ((*var_1 & 1) == 0) {
        var_2 = arg1 + 38L;
        if (((unsigned short)*var_1 <= 6) && ((unsigned short)*var_1 != 6)) {
            var_3 = arg1 + 37L;
            var_4 = arg1 + 16L;
        }
        if (((unsigned short)*var_1 <= 4) && ((unsigned short)*var_1 != 4)) {
            var_0 = arg1 + 24L;
            var_2 = arg1 + 35L;
            var_5 = arg1 + 36L;
        }
        switch((unsigned short) *(var_1)) {
        case 0:
            *var_0 = data_0x1904(/* __a */ arg1);
            *var_2 = 0x0;
            Coroutine::promise_type::initial_suspend(/* this */ var_4);
            std::__n4861::suspend_never::await_ready(/* this */ var_5);
        case 2:
            *var_2 = 0x1;
            std::__n4861::suspend_never::await_resume(/* this */ var_5);
            std::operator<<<std::char_traits<char>_>(/* __out */ std::cout, /* __s */ "started!\n");
            std::__n4861::suspend_always::await_ready(/* this */ var_3);
            *var_1 = 0x4;
            data_0x18de(/* this */ var_0);
            std::__n4861::suspend_always::await_suspend(/* this */ var_3);
            return sub_158e(&var_9);
            break;
        case 4:
            std::__n4861::suspend_always::await_resume(/* this */ var_3);
            std::operator<<<std::char_traits<char>_>(/* __out */ std::cout, /* __s */ "returning!\n");
            Coroutine::promise_type::return_void(/* this */ var_4);
            *arg1 = 0x0;
            Coroutine::promise_type::final_suspend(/* this */ var_4);
            std::__n4861::suspend_never::await_ready(/* this */ var_2);
            break;
        }
        std::__n4861::suspend_never::await_resume(/* this */ var_2);
    }
    if (((*var_1 & 1) == 0) || ((unsigned short)*var_1 == 7) || ((unsigned short)*var_1 == 1) || ((unsigned short)*var_1 == 5) || ((unsigned short)*var_1 == 3)) {
        if ((unsigned char)*(arg1 + 34L) != 0) {
            operator_delete(/* ptr */ arg1);
        }
        arg1 = var_0 + 40L;
        return *arg1 - *arg1;
    }
}
This looks exactly like a standard switch/case state machine you might find in any C or C++ codebase, except with really poorly named variables.
We can see that the coroutine frame arg1 contains a member at arg1 + 32L which indicates the point the coroutine has reached. Initially it's 0 (test_coroutine stored 0x0 there before calling the actor) so we execute the first case.
Case by case:
0: When writing the coroutine class, we set initial_suspend to return suspend_never - calling await_ready on that returns true and thus we don't suspend in the first case. No break means we fall through.
2: suspend_never does nothing on resume, we print started!, then we need to construct a coroutine handle to return when we suspend. This is critical - saving our state is what lets us resume. Our handle points at the frame, and the frame really only needs to keep track of one thing - where to resume. We want to resume at 4, which is the next state. That's what *var_1 = 0x4 saves. We then return the coroutine handle. This is where the first call to test_coroutine_actor ends.
4: When main calls handle.resume, handle.resume calls test_coroutine_actor with the saved frame from earlier, with state 4. That means the switch/case skips straight to case 4 and we print returning!. Next are a few lines of uninteresting cleanup.
The compiler isn't doing anything complex or clever here. It's just transforming your coroutine that uses co_* into a function that takes a state argument and has a switch/case.
Not magic at all!
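To show how little is going on, here's a rough sketch of the same coroutine written by hand with no co_* keywords at all. This is my simplification, not what GCC emits: HandFrame, hand_actor, hand_coroutine and run_by_hand are invented names, the promise and awaiter calls are dropped, and output goes into a string instead of std::cout so the result can be checked.

```cpp
#include <string>

// Hypothetical hand-written frame: a resume function, a state, and whatever
// the body needs to keep between suspensions.
struct HandFrame {
    void (*resume_fn)(HandFrame*);  // what handle.resume() would call
    int state;                      // where to continue on the next resume
    std::string* log;               // stands in for std::cout
};

// The "actor": one switch, one case per resume point.
void hand_actor(HandFrame* f) {
    switch (f->state) {
    case 0:
        *f->log += "started!\n";
        f->state = 4;   // remember where to continue, then "suspend" by returning
        return;
    case 4:
        *f->log += "returning!\n";
        delete f;       // the coroutine is finished, so free its frame
        return;
    }
}

// The "ramp": allocate the frame, run to the first suspension point, hand it back.
HandFrame* hand_coroutine(std::string& log) {
    HandFrame* f = new HandFrame{hand_actor, 0, &log};
    f->resume_fn(f);
    return f;
}

// Plays the role of main and returns everything that was "printed".
std::string run_by_hand() {
    std::string log;
    HandFrame* coro = hand_coroutine(log);
    log += "main\n";
    coro->resume_fn(coro);  // the equivalent of coro.handle.resume()
    return log;
}
```

run_by_hand() returns "started!\nmain\nreturning!\n", the same output as the real coroutine. The state values 0 and 4 are only there to mirror the decompiled switch; any two distinct values would do.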
Conclusion
Learning always benefits from motivation. When writing a state machine by hand to e.g. parse input or do something while waiting for non-blocking I/O, programmers often want something like language support for coroutines.
Problems with coroutines aside, I think a good coroutine talk would go something like this:
Start with the problem (whatever it is), show a pre-C++20 solution, then show a C++20 solution, and finally show that C++20 coroutines are actually totally equivalent to something you could write yourself - coroutines are just syntactic sugar.
You can obviously write coroutines and use non-blocking I/O or state machines even in C++98! It's just easier (sometimes) in C++20.
Lots of coroutine talks do look like this, but some are beset by padding and nonsense that expand 20 minutes of content into an hour.
Again, no magic here. Don't handwave. You don't need to be a compiler engineer to understand this stuff and claiming otherwise is disingenuous. If you handwave and can't answer deeper questions, I'll lose respect for you. If half of your talk is filler but you can answer questions, that's fine but it's annoying.