note: the article may contain errors, you have to keep in mind that it is a really barebone implementation of fibers, and the overview of coroutines is really basic. If you want to complete your research, you should read the sources at the end of the article. If you find an error do not hesitate to make an issue or a pr to github.com/supercip971/website ❤️
TLDR: if you like example code, or if you just want a plug'n'play library. You can find my fibers library here: github.com/supercip971/fibres
What are coroutines/fibers?
What is a coroutine?
In a programming language, a coroutine is a concurrency abstraction. Generally, in an operating system, you have a lot of processes running concurrently. The kernel has the ability to force a process to pause, it is responsible to save its context and then and switch to another process. This is called preemptive multitasking.
But in a coroutine, the process is responsible to save its context and switch to another coroutine. The distinction is that instead of being forced to pause at "random" times, the process can decide when to pause. This is called cooperative multitasking.
This means that in a code you could have :
And the output would be :
The yield function is responsible to save the context of the current coroutine and switch to another one (in symmetric coroutines) or the caller (in asymmetric coroutines). The kernel is not responsible whatsoever to switch between coroutines, everything is done in userspace.
What is a fiber?
A fiber is generally a coroutine, but it has its own scheduler. The scheduler is responsible to switch between fibers when they call the yield function.
The advantage of this scheduler is that you can have a blocking condition. It's like a kernel scheduler but in userspace. For example, if you have a fiber 'A' that needs to wait for 10 sec, the scheduler would switch to other fibers and select fibers only when their blocking conditions are met.
You could also have complex blocking conditions such as waiting for a network packet, a file to be writable, etc...
But what are the advantages of coroutines/fibers?
Fibers are generally used when you have a lot of concurrent tasks that are not CPU intensive. They can switch between tasks without having to go to the kernel, which is way faster. For example, if you have a lot of network tasks, you could use fibers to switch between them.
Fibers are also used in game engines to have a lot of concurrent tasks. For example, if you have a game with a lot of entities, you could use fibers to switch between them.
Generally, in multithreading, you need to lock resources because you can't control when a thread is going to access it. But with fibers, as you control the control flow with the yield function, you don't need to lock content.
(except if you have multiple thread using fibers).
How to implement coroutines/fibers in C?
The implementation of fibers in C is not hard, it's divided into multiple parts:
- The context switching
- The fiber scheduler
You need a scheduler to select which fiber to use next, and a way to save the fiber.
And in a CPU architecture, you can't just move the instruction pointer by just setting a new value.
The CPU is storing temporary data in registers and you need to save them before
moving to somewhere else when you call
Here we are going to use the x86_64 architecture with the system V ABI , but the same principle is used in other architectures and ABIs.
This is the hardest part of the implementation of fibers. First, we need to know what we need to store and load when we switch context in the
In general, each CPUs have registers, a stack and an instruction pointer.
For short, The instruction pointer is used to tell where we execute code, the stack is used to save data across calls and registers are used to pass and manipulate data.
This is true for x86 (64-bit), the architecture consists of 16 (64-bit) registers.
rbp(stack pointer and base pointer)
But what about the stack?
The stack is in memory, and the
rsp register points to it. When you
want to use another stack, you just have to use another
So switching the stack is like switching a register:
When you need to save the context, you could save all the registers, but it would be a waste of time. The ABI (Application Binary Interface) is a set of rules that define how to call a function. An ABI is like a low-level standard that is used to tell how different parts of the code are communicating with each other (like when you call a function, how do we pass the arguments ? how do we return a value ? ...). There are multiple ABIs, but the most common is the System V ABI. (Windows use another one called the Microsoft ABI, if you need to implement fibers for windows, you should rework the code to support it).
But what is important for us is that an ABI defines which registers are preserved and which are not during a call.
For example, if an ABI says that the
rax register is not preserved, you don't need to save it when you switch contexts.
Because the compiler will automatically save it when you call a function:
Here saving rax in
my_magic_function is useless, because the compiler will do it for you.
In the SystemV ABI for AMD64 the preserved registers are:
x86-CW(they are special registers, we will talk about them later).
And those are the only registers that you need to save when you switch contexts.
If we need to make a structure that represents the CPU context of a fiber we could do something like this:
But what is this
rip register? It's the instruction pointer.
The way to load and save it is harder than loading other registers.
We will talk about it when we will implement the switch function.
Imagine a function
switch_context that takes two arguments:
switch_context will save the current context in
old_context and load the context in
new_context. And run the
For this, we are going to implement the function with assembly. Hold on! I know that assembly may be hard, but I will try to explain it as much as I can.
The Implementation of the context switching
Note: I'm using nasm syntax, but you can use any other syntax you want.
First, we are going to use a neat feature of NASM:
structs. It may be used for representing structs in memory, like structs in C.
If you see this, it's the same as the C struct we defined before.
Each time we have a
resq that means that it is a
uint64_t (8 bytes) in memory.
Then we implement the function
When we call the function
switch_context, the first argument will always be stored in
rdi and the second argument in
That's how we will read and write values in
FiberCtx* old_context and
Saving general-purpose registers
When we want to store a value in a register, we use the
If we want to set the attribute
0x1234567890, we can do it like this:
mov instruction will store the value
0x1234567890 in the memory address
rdi + fibers_ctx.rbx.
We use the
] to tell that we are using a memory address. It's like dereferencing a pointer in C.
rdi register is the first argument of the function, so it's the address of
old_context, we also add the
offset of the
fibers_ctx.rbx to access the attribute.
We need to do it for each register we want to save, it should look like this:
For loading registers, we do the same thing but in the other direction.
We also use the
rsi register instead of
rdi, because it's the second argument of the function.
We want to use
new_context instead of
That's great, but we still have the
rsp registers to save and load, how do we do that?
We can't use the same method as before, because it's a little bit more complicated.
Before we can load the
rsp registers, we need to understand how the calling of a function is represented in the stack.
If we call a function
foo(), the CPU will automatically push the
rsp registers on the stack.
It's used to know where to return when the function is done, and to know where to store the return value.
It should look like this:
|rsp + 0 |
|rsp + 8 |
Note: we need to remember that the stack grows downwards, so, for accessing the last pushed value, we need to add the offset to the stack top.
Using this we can manipulate the
That mean that
[rsp + 8].
We need to store in the temporary register
r8because in x86 we can't read and write memory in the same mov.
rsp register, we need to store the value of
rsp in the
rsp attribute of
But, if we look at the stack, we will see that the
rsp register was already modified during the call of the assembly code.
call induced a push of the
That mean that if we want to find the original value of the
rsp register, we need to add 8 to the current value of
But for loading, it's also a little bit different.
We saw that when we are calling a function we push the return address. That's why we need to add 8 to the value of
rsp to get the original value of
When we call the
ret instruction, it will pop the top value of the stack and it will use it as the
So, for writing
rip we need to push the value, then we need to call
That means that we are going to set the value of
rsp, then push the value of
rip on the stack for the
If you don't understand, here is what the stack looks like during the code execution:
For loading the
rsp register, we do not need to add 8 to the value of
For loading the
rip register, we need to push the value of
rip on the stack, then call
That's it, we have now a working switching of context between two fibers !
But hold on, we still have a problem, if you try to use any float value, you may get something wrong. That's because the CPU also needs to save and load the floating point context.
For that, we need to save the
They are used to store the floating point context and status (like rounding mode, flag...). They are different than general purpose registers, that's why they have special instructions to load and save them.
They are both using:
stmxcsrto save the value of the
mxcsrregister in memory
ldmxcsrto load the value of the
mxcsrregister from memory
fnstcwto save the value of the
fcwregister in memory
fldcwto load the value of the
fcwregister from memory
But in the end, you should have something like this:
The final code
That's it, you now have a working fiber system that can switch between fibers! It's the hardest part of the project, and in the end, the assembly file should look like this:
Now that we have a working way of switching between fibers, we need to make a scheduler. A scheduler in software is a component that will manage the order in which tasks are executed.
But before that, we need to define what a fiber is in our code.
The Fiber struct
First, we need to characterize a fiber, a fiber has:
- a context (the
- a stack
- a state (running, waiting...)
- and a piece of code to execute.
- A unique ID (like a PID but for fibers).
The different states of a fiber
First, we need to define which states a fiber can have:
FIBER_STATE_DEAD: The fiber is dead, it can be deleted.
FIBER_STATE_FREE: The fiber is free, it can be reused for another fiber.
FIBER_STATE_RUNNING: The fiber is running
FIBER_STATE_WAITING: The fiber is waiting for something to happen (like a mutex, a condition variable, a fiber to end...)
FIBER_STATE_IDLE: The fiber is idle, it's only selected when nothing else is running.
Note: when a fiber is running, it doesn't mean that it's currently being executed, it means that the scheduler can select it.
We can use those states to know if a fiber is free or not and if it's waiting for something or not. It should be defined in an enum, or by using macros.
Now that we have a way to describe the different states of fiber, we need to represent it in our code:
The fiber struct
As we said before, a fiber has a context, a stack, a state, and a function to execute.
We also need to store its arguments and a unique id.
We may add more attributes to the fiber struct, but for now, this is enough. Now that we have a struct, we can make a way to allocate and free them.
Before doing a scheduler, we need to make some utility functions that will be used by the scheduler, and by the user.
The fiber allocator
First, we are going to store all our fibers in a vector. This vector will be used when we need to schedule and when we need to create a new fiber:
Note: the vector is from the rxi/vec library.
We also add a variable that will be used to give a unique id to each fiber (we increase it each time we create a new fiber).
So for allocating a new fiber:
- First, we check if there is a fiber that is free in the table, if there is, we return it.
- If there is no free fiber, we allocate a new one and add it to the fiber table.
Note: the fiber alloc is like a malloc function, it's not responsible for initializing the fiber, it's just responsible for allocating it.
Initializing the table when needed
We need to initialize the fiber table when we need to use it. This means that we also need to transform the current thread into a fiber. Because, if we don't, when we switch to a fiber we will never be able to return to the main function, it will never be saved to a fiber.
To solve this problem, we allocate a new fiber and treat it like it was always there.
Getting the currently running fiber
We need a way to get the current fiber, so we can know which fiber is currently running when we try to switch to another one.
For that, we are storing the currently running Fiber offset in the
fiber_table in a global variable.
We can then get the current fiber by doing:
There is also the necessity to change the
make_current_as_fiber function to set the
fiber_current_table_id to the current
running fiber id:
Getting a fiber from its id
This will be used by the scheduler, and by the user to get a fiber from its id.
Creating a fiber
Now that we have some utility functions, we can start creating a fiber.
First, we allocate a new fiber and initialize it:
- We set its state to
- We set its member's value like its function and its arguments.
- We allocate a stack for the fiber.
- Generally, you want a stack that is at least
4096bytes, but in some cases, it's better to have a bigger stack. In my case, I used
- Generally, you want a stack that is at least
- We put the
next_fiber_uidas the fiber id. Then we increase the global variable by one.
Then you need to initialize the fiber context:
- The stack pointer is at the top of the stack, so we need to set the stack pointer to the end of the stack. That's why we add the
FIBER_STACK_SIZEto the pointer. (Because CPUs stacks are growing downwards).
- You also need to set the base pointer (
rbp) to the end of the stack.
x86_fcwshould be set to their default value at reset.
mxcsris the x86 control and status register it should be set to
x86_fcwis the x86 control word it should be set to
- And for last you need to set the instruction pointer (
rip) to a wrapper function that we will call
If you want to know more about the mxcsr and x86_fcw, you can read the intel manual volume 1 - 11.6.4 and 8.1.5. It's just bits flipped to their default value.
But why do we use
_fiber_entry for the instruction pointer? and not the
It's because there is the necessity to set up and clean up the fiber state, and
_fiber_entry would be responsible for that.
You also have this sort of wrapper function for the
main function in C. Generally it's called
_entry and it's responsible for calling the
main function and cleaning up the state of the program.
In the end, you should have a function that looks like this:
_fiber_entry function is responsible for calling the fiber function and cleaning up the fiber state.
For now, it's just a simple function that calls the fiber function and set the fiber state to
FIBER_STATE_DEAD. But later, we will add more stuff to it.
Note: you'll se later why we need to set the fiber state to
The scheduler is responsible for switching between fibers, but also for selecting which fiber to switch to.
We will implement a round-robin scheduler, that will select which fiber is the next fiber to run. A round-robin is a scheduling algorithm that will switch between fibers in a circular fashion. That means that the next fiber will be the one after the current fiber in the fiber table, and if the current fiber is the last one, the next fiber will be the first one. There is no concept of priority in a round-robin scheduler, so it's not the best scheduling algorithm, but it's simple to implement and it's good enough for our use case. Some scheduling algorithms are more efficient, but they are generally more complex to implement, and some of them are too complex for the intended use of fibers (which may be to rapidly switch between tasks).
Find the next fiber to run
So now, we need a function that, with a fiber id, will return the next fiber to run.
Our algorithm will work as follow:
start_frombe the table index of the current fiber.
- Search through
start_fromto the end of the fiber table for a fiber that is in the
- If we didn't find any fiber, retry but this time searches from the beginning of the fiber table.
- If we didn't find any fiber, return
Switching between fibers
Now that we have a function that will return the next fiber to run, we need to switch between fibers.
We will implement the
This function will link the scheduler and the fiber switching function:
Note: You may spot issues with this function, and you are right, there are issues with this function. And we will address them later.
And that's it for this part, you should now have a working fiber library that can create fibers and switch between them.
In the next part, we will implement a way to have a way to block fiber, we will fix issues (such as some memory leaks), a way to join fibers and much more!
That's one of the biggest articles I've written, It may be a bit too long, but I wanted to make sure that it was clear enough for everyone. I hope you enjoyed it, and I hope you learned something new! Have a great day!
Note: If you want to see the full code, you can find it on github. It's finalized, so if you can't wait for the next part (or if I'm too slow), you can check it out.