CS 134

Other Bugs

Deadlock is, in some sense, a nicer kind of bug. When it happens, it's pretty obvious (as all the processes are stuck), and we can write algorithms to detect and even avoid deadlocks in the first place. But there are other bugs that can happen in concurrent programs that are much harder to detect and fix. We'll look at two of them here: atomicity violations and order violations.

Atomicity Violations

An atomicity violation is a bug that occurs when a program assumes that an operation is atomic (i.e., it happens all at once) when it's not. We talked about atomicity when we discussed the critical-section problem and introduced locks (a.k.a. mutexes) and semaphores as solutions, as well as atomic operations such as test-and-set.

Unfortunately, programmers often make mistakes in their assumptions about atomicity, such as thinking they can “get away with” not using a lock because they believe they understand all the cases that could occur. In fact, figuring out what can happen is extraordinarily difficult for two reasons:

  • Humans are terrible at reasoning about concurrency.
  • Modern computers don't work the way humans think they do.
  • Duck speaking

    What do you mean?

  • PinkRobot speaking

    Let's look at an example…

Dekker's Example

Let's start with the following code:

int x = 0, y = 0;

Then we have two threads,

Alex

A1 | x = 1;
A2 | int r1 = y;

Brin

B1 | y = 1;
B2 | int r2 = x;

Carefully work out the possible final values for r1 and r2.

What are possible final values for r1 and r2?

Write your answer like

r1=NN, r2=NN — A1, A2, B1, B2
r1=MM, r2=MM — A1, B1, A2, B2

After answering the first part, answer this question: Could r1 and r2 both be zero?

Here's a program you can run to see the possible valid interleavings:

You'll see that there are six possible interleavings.

Possible outcomes:

r1 = 0, r2 = 1, x = 1, y = 1: 1 interleaving
  - [A1] x = 1; [A2] r1 = y; [B1] y = 1; [B2] r2 = x

r1 = 1, r2 = 1, x = 1, y = 1: 4 interleavings
  - [A1] x = 1; [B1] y = 1; [A2] r1 = y; [B2] r2 = x
  - [A1] x = 1; [B1] y = 1; [B2] r2 = x; [A2] r1 = y
  - [B1] y = 1; [A1] x = 1; [A2] r1 = y; [B2] r2 = x
  - [B1] y = 1; [A1] x = 1; [B2] r2 = x; [A2] r1 = y

r1 = 1, r2 = 0, x = 1, y = 1: 1 interleaving
  - [B1] y = 1; [B2] r2 = x; [A1] x = 1; [A2] r1 = y

Total interleavings: 6
  • Duck speaking

    Oh, yeah! I was totally right! I'm awesome!

  • PinkRobot speaking

    Well, before you celebrate too much, let's think about some of the interleavings we've rejected as invalid.

In OnlineGDB, in the bottom pane, press enter to exit the console, and then add the command-line argument --allow-order-violation to see all 24 conceivable interleavings. Now you can see the six interleavings that would produce r1=0, r2=0.

r1 = 0, r2 = 0, x = 1, y = 1: 6 interleavings
  - [A2] r1 = y; [B1] y = 1; [B2] r2 = x; [A1] x = 1
  - [A2] r1 = y; [B2] r2 = x; [A1] x = 1; [B1] y = 1
  - [A2] r1 = y; [B2] r2 = x; [B1] y = 1; [A1] x = 1
  - [B2] r2 = x; [A1] x = 1; [A2] r1 = y; [B1] y = 1
  - [B2] r2 = x; [A2] r1 = y; [A1] x = 1; [B1] y = 1
  - [B2] r2 = x; [A2] r1 = y; [B1] y = 1; [A1] x = 1
  • Horse speaking

    Hay! You can't run A2 before A1! Or B2 before B1! That's not how the code is written! Computers don't work that way!

  • PinkRobot speaking

    Alas, it is.

Issue 1: Compiler Reordering

When a compiler sees code like

x = 1;
r1 = y;

it considers the two statements to be independent of each other—each accesses completely different, unrelated things. So the compiler can reorder them as it sees fit, and might generate code like

    movl    y, %eax         ; Read memory location y into register %eax
    movl    $1, x           ; Write 1 to memory location x
    movl    %eax, r1        ; Write the value in %eax to memory location r1

From the compiler's perspective, this code is fine. Hoisting the load of y above the store to x gives the slow memory read a head start: the store executes while the load is still completing. And for most programs, this kind of optimization is a good thing, because in most programs, accesses that look independent are independent.

But that's not the case for concurrent code where other threads can change the values of x and y. In that case, the reordering can lead to bugs.

  • Hedgehog speaking

    Gah! I had no idea the compiler could do that! If it doesn't think I'll be able to notice, it does things totally differently from what I wrote!

  • Rabbit speaking

    Fun fact! In C, if you write printf("Hello World\n"); the compiler is actually allowed to rewrite it to puts("Hello World"); because it's a more efficient way to do the same thing. But you don't care because the output is the same.

  • Duck speaking

    Unless I set a breakpoint on printf in gdb and I'm totally puzzled why it never gets hit!!

  • Goat speaking

    Meh. The moral is don't try to debug your code. Just write it right the first time.

  • PinkRobot speaking

    I don't think that's the moral here. The moral is that you need to be aware of what the compiler is allowed to do, and that you need to be careful when writing concurrent code.

  • Duck speaking

    But doesn't the volatile keyword fix all this?

  • PinkRobot speaking

    The volatile keyword can indeed prevent the compiler from reordering memory accesses, but it doesn't prevent the hardware from doing so. Let's see how that can happen…

Issue 2: Store Buffers

Hardware designers try to deal with the slowness of computer memory by using store buffers. When a store instruction is executed, the data is placed in a store buffer, and the store buffer is responsible for writing the data to memory. This method allows the CPU to continue executing instructions while the store buffer is writing to memory.

On a single core, special logic is put in place so that if the code tries to read something from memory that is actually in the store buffer, the CPU will read the value from the store buffer instead of memory. That's store-to-load forwarding.

And that all works great and improves performance—so long as we're not executing concurrent code. In that case, using the store buffer can lead to surprising results: because other cores can't see our store buffer, they'll just read the (stale) value from memory.

The Big Lesson

Programs don't execute the way you think they do. The optimizations made are designed for the common case—sequential programs—and they can lead to surprising results in concurrent programs. Unless you want to take a deep dive into the hardware and compiler, attempting to reason out why it's okay not to use a lock is almost always a bad idea.

  • L-Chippy speaking

    You might remember that long comment in lock_do_hold from the Prof. Melissa patch. It gave a detailed explanation as to why it was okay to not use a lock in that case. Some of you felt it was overkill, but actually Prof. Melissa still worried that she might be missing something. And she was right to worry. It's really hard to reason about concurrency.

  • PinkRobot speaking

    In other words, it wasn't that the comment was too much—it was actually barely enough.

Sequential Consistency

Modern systems make as few promises as possible about the order of operations. What programmers desire is sequential consistency, where the final outcome is as if the operations of all processors were executed in some sequential order, with the operations of each individual processor appearing in that sequence in the order specified by its program. Sequential consistency still allows many possible interleavings, but at least it prohibits some of the more surprising ones.

You only get sequential consistency if you carefully follow the rules of synchronization. And the short version is that, in practice, you're only following the rules if you're using locks.

  • Duck speaking

    What about atomic operations like test-and-set?

  • PinkRobot speaking

    Those have rules too, but for any atomic instruction, you still need to read up on what it promises.

  • BlueRobot speaking

We don't want to get too far into the weeds here, so we'll leave it at this: if you want to use atomic operations, you may need to take a deeper dive into the world of memory ordering.

Order Violations

The second kind of bug we'll look at is order violations. These are bugs that occur when the order of operations is not what you expect. We've already seen an example of this issue in Dekker's example, where the compiler and hardware can reorder instructions in ways that are surprising.

Let's look at an example of a programmer-generated order violation in the cats and mice problem we saw in Homework 3. Let's suppose that in the main thread we had

lock_acquire(animal_lock);
while (active_animals > 0) {
    cv_wait(animal_cv, animal_lock);
}
lock_release(animal_lock);

and in each animal thread, at startup, we had

lock_acquire(animal_lock);
active_animals++;
cv_signal(animal_cv, animal_lock);  // Unnecessary, but it changed, so signal
lock_release(animal_lock);

and, on exit,

lock_acquire(animal_lock);
active_animals--;
cv_signal(animal_cv, animal_lock);  // We need this to signal the main thread
lock_release(animal_lock);

We could run this code and have it appear to work as we'd expect. But there is a bug here.

What is the bug in this code?

  • Cat speaking

    I think I see what's going on. The main thread is assuming that by the time it reaches the while loop, at least one animal will have started. But there's nothing in the code that guarantees that, so if no animal has started yet, the main thread will exit the loop immediately, and the program will exit before any of the animals even get started.

  • Hedgehog speaking

    So the main thread is assuming an order of operations that isn't guaranteed by the code?

  • PinkRobot speaking

    Exactly!

These kinds of bugs are subtle and hard to detect. They can be especially hard to detect because they can be timing dependent. On a multicore machine, it's far more likely that a child thread will have started before the main thread hits the while loop. But on a single-core machine, it's quite possible that the main thread will exit the loop before any of the child threads have been started.

Again, the lesson here is to use your synchronization primitives carefully. If something needs to wait for something else to have happened, you need to make that wait explicit in your code.
