CS 134

Load-Linked and Store-Conditional

We've seen how compare-and-swap (CAS) can be a powerful tool for implementing lock-free data structures. But it's not the only game in town. Some architectures, including MIPS (which your OS/161 system uses), provide an alternative pair of instructions: load-linked (LL) and store-conditional (SC).

How LL/SC Works

The load-linked and store-conditional instructions work together:

  1. load-linked loads a value from memory and marks the memory location as “monitored”.
    • Think of it like there is a spy-camera on the memory location, watching to see if anyone else messes with it.
  2. store-conditional attempts to store a value to the monitored location. It succeeds only if the location hasn't been modified since load-linked ran.

Here's some pseudocode to illustrate:

int
load_linked(int *ptr) {
        MEMORY_WATCHER_TRACK(*ptr);
        return *ptr;
}

bool store_conditional(int *ptr, int new_value) {
        switch (MEMORY_WATCHER_STATUS(*ptr)) {
            case MW_MODIFIED:
                return false;
            case MW_SORRY_I_LOST_TRACK:
                return false;
            case MW_UNCHANGED:
                *ptr = new_value;
                return true;
        }
}

(Remember, each of these functions is pseudocode for a single hardware instruction.)

  • Cat speaking

    This seems a lot like compare-and-swap. What's the difference?

  • PinkRobot speaking

    Good question! While they can often be used to solve similar problems, there are some key differences.

  • Dog speaking

    Yeah, I don't see a “compare” part here. Where's the comparison?

  • PinkRobot speaking

    You're right—there is no explicit comparison. Let's dig into this a bit more…

Implementing Compare-and-Swap with LL/SC

You can implement CAS using LL/SC. Here's how you might do it:

bool compare_and_swap(int *ptr, int expected, int new_value) {
    int current = load_linked(ptr);
    if (current == expected) {
        return store_conditional(ptr, new_value);
    }
    return false;
}
  • Cat speaking

    So, LL/SC can do everything CAS can do?

  • PinkRobot speaking

    Yes, and also implement other things like atomic increments and decrements.

  • Duck speaking

    It feels more powerful than CAS.

  • PinkRobot speaking

    It is. LL/SC is more general than CAS, but, perhaps more importantly, it can avoid some of the pitfalls of CAS…

The A–B–A Problem

Let's go back and look at the code we used for our lock-free stack. Although it seems to work, it actually has an insidious bug, due to the highlighted line:

// Helper function to pop an item from the stack
int
pop_item(void)
{
        while (1) {
                struct stack_node *popped_node = atomic_load(&stack_head);
                if (popped_node == NULL) {
                        return NO_DATA; // Stack is empty
                }

                struct stack_node *new_head = popped_node->next;
                if (atomic_compare_exchange_weak(&stack_head, &popped_node,
                                                 new_head)) {
                        int value = popped_node->data;
                        free(popped_node);
                        return value;
                }
                // If CAS failed, another thread modified the stack, so we try
                // again
        }
}

In that line, we think we have the right value for new_head. But let's consider this sequence of events:

  • We read popped_node from stack_head; let's suppose it's at memory address 0x1234.
  • We read popped_node->next, and it's at memory address 0x5678.
  • Then, another thread runs. It
    • Pops a node from the stack, throwing away the node at 0x1234.
    • Pops another node from the stack, throwing away the node at 0x5678, leaving a new stack head of 0x9abc.
    • Pushes a new node onto the stack, where its next pointer is 0x9abc as it should be, but alas, the memory allocator chose to put our new node at location 0x1234 as that location isn't being used.
  • Our thread continues. It does the compare-and-swap, looking to see if 0x1234 is still the head of the stack. It succeeds, replacing the stack head with 0x5678, a deleted node.
  • Dog speaking

    So we looked away, looked back, and everything seemed fine, but it totally wasn't.

  • PinkRobot speaking

    Exactly. This is known as the A–B–A problem. It's a thorny issue with compare-and-swap operations that can lead to subtle bugs like this one.

  • Hedgehog speaking

    Subtle??! I'd never have found that bug! Gah! I'm stressing out just thinking about it! Please don't put bugs like this on the midterm!

  • PinkRobot speaking

    It's good to know these issues exist, but we understand that they can be really hard to see. We're not going to put super tricky bugs like this on the midterm. But we do want you to understand the landscape.

  • Goat speaking

    Meh. Not on the test. I'm going back to sleep.

Why would LL/SC solve the A–B–A problem?

  • Cat speaking

    So it just can't happen with LL/SC, because we'd only succeed if the stack head hadn't been modified, and even setting it to the same value again counts as a modification.

  • PinkRobot speaking

    That's right. The LL/SC pair is designed to avoid the A–B–A problem by ensuring that the store-conditional only succeeds if the memory location hasn't been modified since the load-linked.

  • Horse speaking

    Hay! If it's so great, why don't we use it all the time?

  • PinkRobot speaking

    Good question. Let's look at some of the practical considerations.

Practical Considerations

While LL/SC is theoretically more powerful than CAS, there are some practical limitations.

  1. Hardware Limitations: Many implementations of LL/SC have restrictions. For example, some prohibit any function calls or interrupts between the LL and SC instructions. But that's not really a huge limitation in practice, given that the alternative is a single CAS instruction.
  2. Limited Availability: Not all processors support LL/SC. x86 processors, which are very common, use CAS instead. (On the other hand, ARM, RISC-V, and PowerPC do support LL/SC, as does MIPS, which is what OS/161 is based on.)
  3. C/C++ Support: Standard C and C++ don't provide direct support for LL/SC operations, which makes it less convenient to use LL/SC in portable code.

That said, LL/SC is superior to CAS, and it's good to use when practical.

  • Horse speaking

    Hay! Wait a moment, aren't you going to tell us how to solve A–B–A when all we have is CAS?

  • PinkRobot speaking

    We could, but it's another rabbit hole. We'd have to introduce the concept of “hazard pointers” or deferred reclamation, and that's a bit much for now. We just wanted to show you that there is a problem, and that LL/SC can help avoid it.

LL/SC in OS/161

OS/161 runs on a simulated MIPS32 processor which does support LL/SC, and if you go spelunking in the OS/161 source code, you'll find them used. For example, in kern/arch/mips/include/spinlock.h, you'll see

/*
 * Test-and-set a spinlock_data_t. Use the LL/SC instructions to
 * make it atomic.
 *
 * LL (load linked) loads a machine word from memory, and marks the
 * address. SC (store conditional) stores a machine word to memory,
 * but succeeds only if the address is marked from a previous LL on
 * the same processor. Stores from other processors clear that mark,
 * as do traps on the current processor. Note that there may be no
 * other memory accesses (besides instruction fetches) between the LL
 * and the SC or the behavior is *undefined*. You can only use LL/SC
 * to atomically update one machine word.
 */
SPINLOCK_INLINE
spinlock_data_t
spinlock_data_testandset(volatile spinlock_data_t *sd)
{
        spinlock_data_t x;
        spinlock_data_t y;

        /*
         * Test-and-set using LL/SC.
         *
         * Load the existing value into X, and use Y to store 1.
         * After the SC, Y contains 1 if the store succeeded,
         * 0 if it failed.
         *
         * On failure, return 1 to pretend that the spinlock
         * was already held.
         */

        y = 1;
        __asm volatile(
                ".set push;"            /* save assembler mode */
                ".set mips32;"          /* allow MIPS32 instructions */
                ".set volatile;"        /* avoid unwanted optimization */
                "ll %0, 0(%2);"         /*   x = *sd */
                "sc %1, 0(%2);"         /*   *sd = y; y = success? */
                ".set pop"              /* restore assembler mode */
                : "=&r" (x), "+r" (y) : "r" (sd));
        if (y == 0) {
                return 1;
        }
        return x;
}
  • Horse speaking

    Hay! I just saw that comment. It says you can't access any other memory between the LL and SC. That's a big limitation!

  • PinkRobot speaking

    It is, and it's not one found on ARM, PowerPC, or RISC-V processors. It makes LL/SC weaker on MIPS than on those other architectures, and makes it harder to solve the A–B–A problem.

  • Hedgehog speaking

    Do we need to write assembly code like this in OS/161?

  • PinkRobot speaking

    No, you don't need to write assembly in OS/161.

  • Cat speaking

    So we can use LL/SC in our C code for OS/161?

  • PinkRobot speaking

    We actually recommend you stick to just using locks and condition variables in your code for OS/161. As we've seen, as cool as atomic operations are, they can be tricky to get right. Writing kernel code for OS/161 is hard enough without adding that extra complexity.

  • Rabbit speaking

    There is an atomics.h header that Prof. Melissa could have included in the OS/161 sources, but she decided to leave it out in the spirit of keeping things simple.

  • Duck speaking

    Aww, she made it worse…

  • Goat speaking

    Meh. Worse is better—we all know that.

Takeaways

  • Load-Linked and Store-Conditional (LL/SC) is an alternative to Compare-and-Swap (CAS) for implementing atomic operations.
  • LL/SC can avoid some issues that CAS can run into, such as the A–B–A problem.
  • While theoretically more powerful, LL/SC has practical limitations that make it less commonly used than CAS in general-purpose code.
  • When writing C code, the differences between LL/SC and CAS are often abstracted away, allowing us to focus on the higher-level algorithm rather than the specific hardware instructions.
  • In OS/161, where sys161 simulates a MIPS32 processor, LL/SC is the native atomic operation.

(When logged in, completion status appears here.)