CS 134

Threads

  • Dog speaking

    We've talked about processes, but what about threads?

  • PinkRobot speaking

    Good question. Let's begin with a bit of history.

We can think of processes as an abstraction that creates an idealized machine for running a program. In the early days of computing, these idealized machines were inspired by the actual machines of the day, which were uniprocessors (i.e., single core). So it made sense to have a single execution state per process.

Quickly enough, multi-processor machines arrived, but these were initially rare and expensive, and the idea that programmers might want the complexity that arises from multiple execution states per process was not immediately obvious. If programmers wanted multiple tasks to run concurrently, they could always create multiple processes, and various mechanisms were devised to let processes communicate and synchronize with each other.

  • Rabbit speaking

    Special mechanisms were even created to allow processes to share memory, so that they could communicate more easily.

But programming-language designers liked the idea of providing a more lightweight way to run multiple tasks concurrently within the same program (and thus within the same process). That goal led to the development of threads.

Green Threads

In the early days of threads, they were implemented entirely in user space, without any support from the operating system's kernel. User-space threads are sometimes called green threads (or virtual threads).

  • Pig speaking

    I want to know MORE about why they called them green threads. Are they plant-based? (Yuck!)

  • Jo speaking

    That was the name Java used for its user-space threads, because they were designed by the “Green Team” at Sun Microsystems! The name stuck, and it's now used more generally to refer to user-space threads!

Here's some example code showing a simple implementation of user-space threads from within a program:

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
#include <signal.h>
#include <unistd.h>
#include <sys/time.h>

#define STACK_SIZE 16384 /* Stack size for each thread */

/* Contexts for main and two threads */
ucontext_t contexts[3];           /* 0: main, 1: thread 1, 2: thread 2 */
int current_thread = 1;           /* Tracks the current thread */
volatile int num_threads = 0;     /* Number of threads */

void
scheduler(int signum)
{
        if (num_threads < 2) {
                printf("Too few threads to schedule\n");
                return;
        }
        int prev_thread = current_thread;

        /* Toggle between thread 1 and thread 2 */
        current_thread = (current_thread == 1) ? 2 : 1;
        printf("-- Switching from thread %d to thread %d\n", prev_thread, 
                current_thread);

        /* Swap context to the next thread */
        swapcontext(&contexts[prev_thread], &contexts[current_thread]);
}

void
thread1(void)
{
        for (int i = 1; i <= 42; i++) {
                printf("Thread 1: %d\n", i);
                usleep(100000); /* Simulate work */
        }
        printf("Thread 1 exiting\n");
        --num_threads;
}

void
thread2(void)
{
        for (int i = 1; i <= 42; i++) {
                printf("Thread 2: %d\n", i);
                usleep(100000); /* Simulate work */
        }
        printf("Thread 2 exiting\n");
        --num_threads;
}

void
newthread(void (*entrypoint)(void), ucontext_t *contextptr)
{
        void *stack = malloc(STACK_SIZE);
        if (stack == NULL) {
                perror("malloc");
                exit(EXIT_FAILURE);
        }
        if (getcontext(contextptr) == -1) {
                perror("getcontext");
                exit(EXIT_FAILURE);
        }
        contextptr->uc_stack.ss_sp = stack;
        contextptr->uc_stack.ss_size = STACK_SIZE;
        contextptr->uc_link = &contexts[0]; /* Return to main if thread exits */
        makecontext(contextptr, entrypoint, 0);
        ++num_threads;
}

int
main(void)
{
        struct sigaction sa;

        newthread(thread1, &contexts[1]);
        newthread(thread2, &contexts[2]);

        /* Set up the SIGALRM signal handler */
        sa.sa_handler = scheduler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART; /* Restart interrupted system calls */
        if (sigaction(SIGALRM, &sa, NULL) == -1) {
                perror("sigaction");
                exit(EXIT_FAILURE);
        }

        /* Configure the timer to trigger SIGALRM every 0.5 seconds */
        ualarm(500000, 500000);

        /* Start the first thread */
        current_thread = 1;
        swapcontext(&contexts[0], &contexts[1]);

        /* If our scheduler weren't running, we'd run the first thread all
         * the way to completion.  But since it is, both threads will run
         * concurrently.  When one of them exits (we don't know which one
         * will exit first), we'll return here. */

        /* Finish the other thread */
        current_thread = (current_thread == 1) ? 2 : 1;
        swapcontext(&contexts[0], &contexts[current_thread]);

        printf("Main thread exiting\n");
        return 0;
}

This code uses

  • makecontext, swapcontext, and getcontext (provided by the C library) to manipulate the execution context of the program. Importantly, these functions work just by saving and restoring the CPU registers, so they don't require any special support from the operating-system kernel itself.
  • ualarm to set a timer that will interrupt the program and switch to a different thread by calling the scheduler function.
  • thread1 and thread2, two thread functions that each count up to 42 before quitting.

This code is just a simple demonstration. We could write a more full-featured scheduler that can deal with more than two threads, and we could add synchronization primitives to allow threads to communicate and synchronize with each other. In fact, the code would look a lot like what you've already seen and implemented in OS/161.

Run the code above in Online GDB. Think a little about what it does. What's the biggest issue you notice with this approach to threads?

  • Cat speaking

    It feels like there's a bunch of code duplication to me. We've written a scheduler, context-switching code, and so on that mirrors what already exists in the operating system kernel. I already wrote locks and CVs for OS/161; I don't want to write them again!

  • Duck speaking

    And there can only ever be one thread running at a time, since the program is interrupted by the timer and switches to a different thread. So it can't take advantage of a multicore machine!

  • PinkRobot speaking

    These are good points!

  • Horse speaking

    Hay! You forgot to free the memory allocated for the stacks. That's a memory leak!

  • PinkRobot speaking

    True, but the program was exiting anyway, and keeping track of the memory would have made the code even more complicated. That issue isn't one we were worried about here, but it's good that you noticed it.

  • Rabbit speaking

    I've got one: What if a thread blocks because it's doing I/O? Either the timer is going to interrupt the I/O, which doesn't seem good, or the thread will be stuck until the I/O operation is complete and thus prevent any other threads from running, which also doesn't seem good.

  • PinkRobot speaking

    And those issues are all due to the kernel not knowing that there are multiple threads running. It's just the program that's doing the switching.

For some time, these user-space threads were the only game in town, and even after alternatives arrived, they remained popular because they were lightweight and less prone to race conditions (precisely because nothing actually happens simultaneously).

This approach was used by

  • C (via the cthreads library, which later inspired the pthreads library);
  • Early versions of Java;
  • Python (via the greenlet library);
  • Early versions of Go (at least if you didn't adjust the GOMAXPROCS environment variable);

and many others.

Native Threads

Eventually (mostly in the 1990s), operating systems began to provide support for threads in the kernel itself. These threads are called native threads.

But having multiple execution states per process was a significant change. Adding native threads meant retrofitting this new functionality into existing operating systems, which in turn meant revisiting prior design decisions made in those OSs.

Let's consider the task of adding threading to an existing operating system that previously supported only processes. For each of the per-process attributes below, decide whether it should remain per-process or become per-thread.

Things that Need to be Per-Thread

  • CPU registers: Each thread needs its own execution state.
  • Stack: Each thread needs its own stack (for local variables and function calls).
  • Thread state: Each thread needs its own state (e.g., ready, running, waiting).

Things that Need to be Per-Process

  • Process ID (PID): A unique identifier for the process as a whole.
  • Memory for program code: The program code is shared among all threads in the process.
  • Memory for program data: The program data is shared among all threads in the process.
  • Duck speaking

    Couldn't we make those things per-thread too, for maximum flexibility?

  • PinkRobot speaking

    Well, if we did that, our “threads” would basically just be old-school processes, and we'd lose the benefits of having proper threads. The whole point of threads is that they're inside the same process, so they can share memory and communicate more easily.

  • Rabbit speaking

    That said, sometimes threads do need their own specific data. Although that can often be done via local variables and function arguments, sometimes it's more convenient to have a per-thread global data structure, so it's nice to have the option, which means it's something to add to our per-thread list.

Things that Unix Decided to Share

Many of the remaining things on our list are shared among all threads in a process in Unix-like operating systems, including

  • File descriptors: All threads in a process share the same set of open files.
  • User and Group ID: All threads in a process run with the same user and group permissions.
  • Current working directory: All threads in a process share the same working directory.
  • Root directory: All threads in a process share the same root directory.
  • Signals waiting and the signal mask: pending signals (and, traditionally, the mask that blocks them) belong to the process as a whole.
  • Time of next alarm signal: a single alarm timer serves the entire process.
  • Horse speaking

    So, like, if one thread changes the current working directory, all the other threads in the process will see the change?

  • PinkRobot speaking

    That's right. In practice, this sharing means that the chdir system call is effectively off-limits in a multithreaded program, because it changes the CWD for every thread and would likely lead to chaos. In some situations (e.g., web servers) this restriction forces programs to use full, absolute file paths instead of changing the working directory.

  • Duck speaking

    Maybe it would have been better to make the current working directory per-thread?

  • PinkRobot speaking

    Possibly the whole concept of a single “current working directory” is a bit of a historical artifact, and it would have been better to have a more flexible system.

Things that Unix Decided to Make Per-Thread

Unix-like operating systems also have some per-thread attributes, including

  • Priority: Each thread has its own priority.
  • CPU time used: Each thread has its own CPU time usage.
  • Signal mask
  • Horse speaking

    Hay! You said that the signal mask was shared among all threads in a process, but now you're saying it's per-thread. Which is it?

  • PinkRobot speaking

    Actually, it's a mess.

Deeper Dive: Signal Handling

Signals are used in a Unix system for two main purposes:

  • Asynchronous notifications: Signals are used to notify a process that a particular event has occurred, such as a user interrupt (e.g., SIGINT), a child process terminating (e.g., SIGCHLD), or a timer expiring (e.g., SIGALRM).
  • Synchronous exceptions: Signals are used to handle synchronous exceptions, such as illegal memory access (e.g., SIGSEGV) or division by zero (e.g., SIGFPE).

These elements are, in some sense, quite different things, but the Unix developers thought that it was simpler to have a single mechanism for both—a mechanism that was designed before threads were a thing. But the waters got murkier when threads were introduced.

Asynchronous notifications (especially those about external events, such as the user pressing Ctrl-C or a child process terminating) are something the operating system just wants to tell the process as a whole. There is nothing thread-specific about them. So that requirement suggests that signal handling should be done on a per-process basis.

But synchronous exceptions are caused by an error in a specific thread (e.g., a particular thread divides by zero or makes an illegal memory access). So that reasoning suggests that signal handling should be done on a per-thread basis.

As a result, Unix has some rather strange rules about how signals are handled in multithreaded programs. First, if an asynchronous signal is delivered to a process, it is delivered to just one thread in the process that doesn't have the signal blocked (which thread is unspecified!). Synchronous signals are delivered to the thread that caused the exception.

Pending signals, meanwhile, are tracked at two levels: there is a process-wide pending set (for signals directed at the process) and a per-thread pending set (for signals directed at a specific thread). The signal mask itself is per-thread: each thread can block its own set of signals, e.g., with pthread_sigmask.

Although the signal mask has a per-thread option, the signal action (what happens when a signal is delivered) is per-process. So if a signal is delivered to a thread, the same function is always called, regardless of which thread received the signal. We can't have one thread call one function if there is a division by zero, and another thread call a different one.

Overall, it mostly makes sense, but it's a bit of a mess.

  • Duck speaking

    Are there other tough decisions like this that operating systems had to make when adding threading support?

  • PinkRobot speaking

    Yes. Let's think about another one.

In a previous lesson, we talked about the fork() system call, which creates a new process by duplicating the current process. Recall that the code looked like

int pid = fork();
if (pid == 0) {
    // Child process
} else {
    // Parent process
}

So the piece of code that calls fork() gets to easily know whether it's the parent or the child process. But what if there are also other threads running? Those threads might be in the middle of doing something else. Should they be duplicated, too? Should these other threads be told that a fork has happened or left in the dark?

What do you think?

  • Duck speaking

    I think fork should duplicate all the threads. It's supposed to be an exact duplicate of the original and it's hardly an exact duplicate if some threads are missing.

  • Dog speaking

    But wait, what about the other threads? If they're in the middle of something, they might not be in a good state to be duplicated. Like, what if they're writing to a file? Then we'll get two sets of writes, which is probably not what we want.

  • PinkRobot speaking

    Good point, Dog. That does seem like an issue.

  • Cat speaking

    Okay, so instead we should just duplicate the calling thread. That way, we don't have to worry about the other threads being in a bad state, and we don't have to worry about them being duplicated either.

  • Duck speaking

    But what if one of the other threads is holding a lock? If it suddenly vanishes, the lock will never be released in the forked process.

  • PinkRobot speaking

    That's a problem, too.

  • Dog speaking

    So, what if we just tell the other threads that a fork has happened, and let them decide what to do? Maybe send them a SIGFORK signal or something?

  • PinkRobot speaking

    That's an interesting idea, although the details would need to be worked out carefully.

  • Hedgehog speaking

    Gah, it doesn't feel like any of these options are great.

  • PinkRobot speaking

    Let's look at what POSIX actually does.

Honestly, there are no great options here. POSIX decided to duplicate only the calling thread, and to leave the other threads in the dark that the fork even happened.

Generally, it's considered a bad idea for a multithreaded program to call fork() while other threads are actively working. It's okay, however, if the child immediately calls exec() after the fork to replace its memory space with a new program. That is, after all, the most common use of fork().

  • Cat speaking

    This thread stuff is pretty tricky. But at least we're done. We've seen user-space threads that are lightweight compared to native threads, but can't take advantage of multicore machines. And we've seen native threads, which are heavier but can take advantage of multicore machines, even if they were a bit of a retrofit to existing operating systems.

  • PinkRobot speaking

    But wait, there's more!

Hybrid Approaches

There are also hybrid approaches that try to combine the best of both worlds. We could, for example, have a user-space thread library that uses native threads behind the scenes. Options for doing this vary.

Synchronization in User Space

One place where thread libraries try to be more efficient is in synchronization. For example, if a thread is waiting for a lock, it might busy-wait for a short while before giving up and going to sleep. This approach can be more efficient than making a system call to the kernel to put the thread to sleep and wake it up again.

  • Cat speaking

    But this is code duplication again, right? We're writing our synchronization logic in two places: once in the user-space thread library and once in the kernel.

  • PinkRobot speaking

    That's right. It makes the code more complex and harder to maintain. But it can be more efficient.

N:M Threading

Another approach is an N:M threading model, where N user-space threads are mapped onto M native (kernel-supported) threads. This can be cheaper than a 1:1 threading model, because many switches between user-space threads can be handled entirely by the user-space scheduler, without ever entering the kernel.

For example, if we have communication channels, and one thread wants to send a message to another thread and wait for the response, we can do that without involving the kernel at all, switching between the sender and receiver entirely in user space.

This is the approach used by the Go programming language, whose runtime includes a user-space scheduler that maps goroutines onto native threads. The environment variable GOMAXPROCS can be used to control how many native threads are used.

  • Rabbit speaking

    Fun fact: GOMAXPROCS used to default to 1, which meant that unless you knew about this environment variable, your Go programs would only ever use one core. Go 1.5 changed the default to the number of cores on the machine.

A similar approach is used by Apple's Grand Central Dispatch (GCD) library, which is used in macOS and iOS, and also behind the scenes to support Swift's concurrency model.

The Windows Thread Pool API, which is used by the .NET framework and other Windows libraries, uses a similar approach.

Takeaways

  • Threads are a way to run multiple tasks concurrently within the same program.
  • User-space threads are lightweight and can be more efficient, but can't take advantage of multicore machines.
    • It's actually not that hard to write a simple user-space thread library, as we saw in the example code. But writing a full-featured one is quite tricky.
  • Native threads are heavier because every thread action is a system call, but they can take advantage of multicore machines.
  • There are hybrid approaches that try to combine the best of both worlds.
  • There are many design decisions to be made when adding threading support to an existing operating system, and no perfect answers.
