CS 134

Starvation

  • Pig speaking

    This sounds like a problem I want to avoid at all costs.

Starvation is a problem that can occur when a thread is perpetually denied access to a resource it needs to make progress. This can happen when other threads competing for the same resource always manage to grab it first, so the starved thread never gets a turn.

  • Cat speaking

    But wait, when we defined the critical section problem, we had a rule, bounded waiting, that said that a thread that requests access to a critical section should eventually be able to enter it. How can starvation happen if we have this rule?

  • PinkRobot speaking

    Well, in theory that rule does help, but let's contrast theory with practice…

Example: Two Threads Competing for a Resource

In this code, we use a semaphore to protect access to a gourmet buffet. Two threads, a hungry thread and a greedy thread, compete for access to the buffet. Both threads

  • Acquire the semaphore to access the buffet.
  • Print a message to the console (indicating that they are eating at the buffet).
  • Release the semaphore.

Each thread also waits for 10ms during the process (so we aren't overwhelmed with messages).

The main driver code starts both threads, waits for two seconds, and then ends the experiment by canceling both threads (you could increase this time for a longer experiment).

Here's the actual code:

#include <assert.h>
#include <pthread.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define WRITE_STRING(str) write(STDOUT_FILENO, str "\n", sizeof(str))

sem_t *food_sem;    // Imagine this protects access to a gourmet buffet

void *
hungry_thread(void *arg)
{
        (void) arg;
        for (;;) {
                sem_wait(food_sem);
                WRITE_STRING("I am the hungry thread, eating...");
                sem_post(food_sem);
                usleep(10000);  // wait for 10ms
        }
}

void *
greedy_thread(void *arg)
{
        (void) arg;
        for (;;) {
                sem_wait(food_sem);
                WRITE_STRING("I am the greedy thread, eating...");
                usleep(10000);  // wait for 10ms
                sem_post(food_sem);
        }
}

static const char *FOOD_SEM_NAME = "buffet-table";

int
main()
{
        pthread_t greedy_id, hungry_id;
        int err;

        sem_unlink(FOOD_SEM_NAME);  // remove any stale semaphore left by a previous run
        food_sem = sem_open(FOOD_SEM_NAME, O_CREAT, 0600, 1);
        if (food_sem == SEM_FAILED) {
                perror("sem_open failed");
                exit(1);
        }
        sem_unlink(FOOD_SEM_NAME);  // we have our handle now, so the name can go away
        err = pthread_create(&greedy_id, NULL, greedy_thread, NULL);
        assert(err == 0);
        err = pthread_create(&hungry_id, NULL, hungry_thread, NULL);
        assert(err == 0);
        WRITE_STRING("Threads created, parent sleeping...");
        usleep(2000000);  // wait for two seconds for experiment to run
        WRITE_STRING("Parent waking up, stopping experiment...");
        pthread_cancel(hungry_id);
        pthread_cancel(greedy_id);
        pthread_join(hungry_id, NULL);
        pthread_join(greedy_id, NULL);
        sem_close(food_sem);
        WRITE_STRING("Threads joined, parent exiting...");
        return 0;
}

(We're not going to try to run this code yet, just look at it.)

  • Horse speaking

    Hay! Why did you use sem_open and sem_close instead of sem_init and sem_destroy?

  • PinkRobot speaking

    Two reasons. First, POSIX provides two slightly different ways to make a semaphore. The sem_open and sem_close functions are used when you want to share a named semaphore between two (or more) separate programs. (You can use sem_unlink to remove that semaphore name when you're done using it; kind of like deleting a file when you no longer need it.) The name lets the two programs find the same semaphore.

    The sem_init and sem_destroy functions are used when you only need the semaphore within a single process.

  • Hedgehog speaking

    Now I'm even more confused! We're not using multiple programs or processes, so why not just use sem_init and sem_destroy?

  • PinkRobot speaking

    That leads us to the second reason. The sem_init and sem_destroy functions aren't available on all systems, so using sem_open and sem_close is more portable. In particular, sem_init and sem_destroy don't work on Macs.

  • Duck speaking

    I'm still a bit lost. Back to named semaphores. Why have them? When would two programs want to share the same semaphore? And where would it live?

  • PinkRobot speaking

    Gah.

    The real semaphore itself would probably be inside the kernel so it could be shared across processes. But in reality, almost no one wants or uses this off-beat feature.

  • L-Chippy speaking

    I think Prof. Melissa so wishes the Mac supported sem_init and sem_destroy.

In a system that was fair, what would you expect to see in the output of this program?

Now, let's try running it...

What do you see in the output when you really run the code?

The output ought to look something like

I am the greedy thread, eating...
Threads created, parent sleeping...
I am the greedy thread, eating...
   ... repeats 195 times ...
I am the greedy thread, eating...
Parent waking up, stopping experiment...
I am the greedy thread, eating...
Threads joined, parent exiting...

So we can see that Linux is failing to live up to the bounded waiting rule. The hungry thread is being starved—it never gets to eat.

  • Duck speaking

    I noticed that the code for the greedy thread is different from the code for the hungry thread: the usleep is in a different place. I tried moving the usleep after the sem_post in the greedy thread and reran the code and the results seemed a lot fairer.

  • PinkRobot speaking

    That experiment is interesting! It might help us realize how Linux goes wrong here.

Try it yourself!

Fork the code in OnlineGDB so you can edit it, then change the greedy thread's code so that the usleep call comes after the sem_post and see if that changes the behavior.

What we've seen is that Linux can, with suitably contrived code, starve one thread. Can you explain what it is about the original code that causes this starvation?

  • Cat speaking

    The problem is that the greedy thread is holding the semaphore almost all the time. It releases it and then immediately reacquires it, so the hungry thread has almost no chance to get the semaphore.

  • Duck speaking

    That's part of it, but it's worse! The greedy thread's usleep is kinda sneaky. Because it sleeps just before releasing the semaphore, the scheduler has only just woken the greedy thread up when the sem_post happens. So the scheduler isn't going to switch to another thread any time soon, and the greedy thread reacquires the semaphore before the hungry thread ever gets a chance.

  • PinkRobot speaking

    That's right.

  • Goat speaking

    Meh. Life isn't fair, get used to it.

  • PinkRobot speaking

    Actually…

If we run our original code on a Mac, we see

Threads created, parent sleeping...
I am the hungry thread, eating...
I am the greedy thread, eating...
I am the greedy thread, eating...
I am the hungry thread, eating...
I am the greedy thread, eating...
I am the greedy thread, eating...
I am the hungry thread, eating...
I am the greedy thread, eating...
    ... more lines ...
I am the hungry thread, eating...
I am the greedy thread, eating...
I am the hungry thread, eating...
I am the greedy thread, eating...
Parent waking up, stopping experiment...
Threads joined, parent exiting...

Here, the hungry thread gets to eat about 150 times, and the greedy thread gets to eat about 160 times, or roughly 50/50. So it is possible to write a fairer semaphore implementation that adheres fully to the bounded-waiting rule.

  • Hedgehog speaking

    So why does Linux get it wrong?

  • Goat speaking

    Meh. Linux is worse because Worse Is Better, didn't you know that?

  • PinkRobot speaking

    Actually, Goat is kinda right here. To produce the unfair behavior, we wrote slightly contrived code that isn't very common in practice. Usually people don't release locks only to immediately reacquire them, and they don't put sleeps or thread_yield calls right before releasing the lock just to get an unfair advantage.

  • BlueRobot speaking

    So Linux is optimized for the common case, where threads are trying to do useful work, not for the case where they're trying to cheat.

  • PinkRobot speaking

    By ignoring this edge case, Linux is simpler, and possibly faster.

  • Hedgehog speaking

    I still think it's wrong.

What do you think, is it okay for Linux to starve one thread in this case?

  • Cat speaking

    Wait, so how does the Mac get it right?

  • PinkRobot speaking

    We'll leave that as something for you to think about.

  • BlueRobot speaking

    When you write your code for implementing locks in OS/161, you'll have to decide whether or not you want to address this issue. And if you do, you'll have to figure out how…

But there are other ways processes can end up starving…
