Starvation
This sounds like a problem I want to avoid at all costs.
Starvation is a problem that can occur when a thread is perpetually denied access to a resource it needs to make progress. This can happen when a thread is always preempted by other threads that are also trying to access the same resource.
But wait, when we defined the critical section problem, we had a rule, bounded waiting, that said that a thread that requests access to a critical section should eventually be able to enter it. How can starvation happen if we have this rule?
Well, in theory that rule does help, but let's contrast theory with practice…
Example: Two Threads Competing for a Resource
In this code, we use a semaphore to protect access to a gourmet buffet. Two threads, a hungry thread and a greedy thread, compete for access to the buffet. Both threads
- Acquire the semaphore to access the buffet.
- Print a message to the console (indicating that they are eating at the buffet).
- Release the semaphore.
Each thread also waits for 10ms during the process (so we aren't overwhelmed with messages).
The main driver code starts both threads, waits for two seconds, and then ends the experiment by canceling both threads (you could increase this time for a longer experiment).
Here's the actual code:
#include <assert.h>
#include <pthread.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define WRITE_STRING(str) write(STDOUT_FILENO, str "\n", sizeof(str))

sem_t *food_sem; // Imagine this protects access to a gourmet buffet

void *
hungry_thread(void *arg)
{
    (void) arg;
    for (;;) {
        sem_wait(food_sem);
        WRITE_STRING("I am the hungry thread, eating...");
        sem_post(food_sem);
        usleep(10000); // wait for 10ms
    }
}

void *
greedy_thread(void *arg)
{
    (void) arg;
    for (;;) {
        sem_wait(food_sem);
        WRITE_STRING("I am the greedy thread, eating...");
        usleep(10000); // wait for 10ms
        sem_post(food_sem);
    }
}

static const char *FOOD_SEM_NAME = "buffet-table";

int
main()
{
    pthread_t greedy_id, hungry_id;
    int err;

    sem_unlink(FOOD_SEM_NAME); // remove any stale semaphore from a previous run
    food_sem = sem_open(FOOD_SEM_NAME, O_CREAT, 0600, 1);
    if (food_sem == (sem_t *) SEM_FAILED) {
        perror("sem_open failed");
        exit(1);
    }
    sem_unlink(FOOD_SEM_NAME); // we have it open; the name can go away now

    err = pthread_create(&greedy_id, NULL, greedy_thread, NULL);
    assert(err == 0);
    err = pthread_create(&hungry_id, NULL, hungry_thread, NULL);
    assert(err == 0);

    WRITE_STRING("Threads created, parent sleeping...");
    usleep(2000000); // wait for two seconds for experiment to run
    WRITE_STRING("Parent waking up, stopping experiment...");

    pthread_cancel(hungry_id);
    pthread_cancel(greedy_id);
    pthread_join(hungry_id, NULL);
    pthread_join(greedy_id, NULL);
    sem_close(food_sem);
    WRITE_STRING("Threads joined, parent exiting...");
    return 0;
}
(We're not going to try to run this code yet, just look at it.)
Hay! Why did you use sem_open and sem_close instead of sem_init and sem_destroy?
Two reasons. First, POSIX provides two slightly different ways to make a semaphore. The sem_open and sem_close functions are used when you want to share a named semaphore between two (or more) separate programs. (You can use sem_unlink to remove that semaphore name when you're done using it; kind of like deleting a file when you no longer need it.) The name lets the two programs find the same semaphore.
The sem_init and sem_destroy functions are used when you only need the semaphore within a single process.
Now I'm even more confused! We're not using multiple programs or processes, so why not just use sem_init and sem_destroy?
That leads us to the second reason. The sem_init and sem_destroy functions aren't available on all systems, so using sem_open and sem_close is more portable. In particular, sem_init and sem_destroy don't work on Macs.
I'm still a bit lost. Back to named semaphores. Why have them? When would two programs want to share the same semaphore? And where would it live?
Gah.
The real semaphore itself would probably live inside the kernel so it could be shared across processes. But in reality, almost no one wants or uses this off-beat feature.
I think Prof. Melissa so wishes the Mac supported sem_init and sem_destroy.
Now, let's try running it...
The output ought to look something like
I am the greedy thread, eating...
Threads created, parent sleeping...
I am the greedy thread, eating...
... repeats 195 times ...
I am the greedy thread, eating...
Parent waking up, stopping experiment...
I am the greedy thread, eating...
Threads joined, parent exiting...
So we can see that Linux is failing to live up to the bounded waiting rule. The hungry thread is being starved—it never gets to eat.
I noticed that the code for the greedy thread is different from the code for the hungry thread: the usleep is in a different place. I tried moving the usleep after the sem_post in the greedy thread and reran the code, and the results seemed a lot fairer.
That experiment is interesting! It might help us realize how Linux goes wrong here.
Try it yourself!
Fork the code in OnlineGDB so you can edit it, then change the greedy thread's code so that the usleep call comes after the sem_post and see if that changes the behavior.
The problem is that the greedy thread is holding the semaphore almost all the time. It releases it and then immediately reacquires it, so the hungry thread has almost no chance to get the semaphore.
That's part of it, but it's worse! The greedy thread's usleep is kinda sneaky. Because the thread sleeps just before releasing the semaphore, the scheduler has only just woken it up when it reaches the sem_post, so it isn't going to switch to another thread any time soon. That really nails shut the hungry thread's chance of ever getting in.
That's right.
Meh. Life isn't fair, get used to it.
Actually…
If we run our original code on a Mac, we see
Threads created, parent sleeping...
I am the hungry thread, eating...
I am the greedy thread, eating...
I am the greedy thread, eating...
I am the hungry thread, eating...
I am the greedy thread, eating...
I am the greedy thread, eating...
I am the hungry thread, eating...
I am the greedy thread, eating...
... more lines ...
I am the hungry thread, eating...
I am the greedy thread, eating...
I am the hungry thread, eating...
I am the greedy thread, eating...
Parent waking up, stopping experiment...
Threads joined, parent exiting...
Here, the hungry thread gets to eat about 150 times, and the greedy thread gets to eat about 160 times, or roughly 50/50. So it is possible to write a fairer semaphore implementation that adheres fully to the bounded-waiting rule.
So why does Linux get it wrong?
Meh. Linux is worse because Worse Is Better, didn't you know that?
Actually, Goat is kinda right here. To produce the unfair behavior, we wrote slightly contrived code that isn't very common in practice. Usually people don't release locks only to immediately reacquire them, and they don't put sleeps or thread_yield calls right before releasing the lock just to get an unfair advantage.
So Linux is optimized for the common case, where threads are trying to do useful work, not for the case where they're trying to cheat.
By ignoring this edge case, Linux is simpler, and possibly faster.
I still think it's wrong.
Wait, so how does the Mac get it right?
We'll leave that as something for you to think about.
When you write your code for implementing locks in OS/161, you'll have to decide whether or not you want to address this issue. And if you do, you'll have to figure out how…
But there are other ways processes can end up starving…