CS 134

Parallelism vs. Concurrency

  • Cat speaking

    We've spent a lot of time talking about concurrency, but I've also heard people use the term “parallelism”. What's the difference?

  • PinkRobot speaking

    Good question! Let's define some terms…

Definitions

  • Hedgehog speaking

    This'll help. I know being precise about terms is important in computer science, but people don't always lay out the definitions.

Asynchrony

Literally, asynchrony means “not at the same time”. In programming, it means that we can set an action in motion, then move on to the next action before the first one completes. We don't have to wait for the first action to finish before starting the next one.

In asynchronous programming, we usually need some mechanism to deal with what happens once an asynchronous action finally does complete. As we've seen, in JavaScript, for example, we can use a Promise to represent the eventual completion of an asynchronous operation, and the then method to attach an action to be taken when the Promise is resolved.

In other asynchronous systems, we might have some kind of notification receiver that is triggered when the asynchronous action completes.

In operating systems like OS/161, we can also consider interrupts as a form of asynchrony. They tell us about something that happened while we were doing something else.
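The same pattern—set an action in motion, keep working, handle the result later—can be sketched in Go (the language of the examples below), with a channel standing in for the role a JavaScript Promise plays. The function name `fetchLater` and the delay are made up for illustration:

```go
package main

import (
	"fmt"
	"time"
)

// fetchLater starts work in the background and immediately returns a
// channel that will eventually carry the result.
func fetchLater() <-chan string {
	result := make(chan string, 1)
	go func() {
		time.Sleep(50 * time.Millisecond) // pretend this is a slow operation
		result <- "data is ready"
	}()
	return result
}

func main() {
	future := fetchLater() // set the action in motion...
	fmt.Println("doing other work while we wait")
	fmt.Println(<-future) // ...and handle the result when it arrives
}
```

Receiving from the channel is the moment we "attach" to the eventual result, much like calling then on a Promise.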

Concurrency

Concurrency is about overlapping durations. Two actions are concurrent if their durations overlap in time, but they don't have to literally happen at once.

  • Dog speaking

    I'm taking CS 134 and CS 140 concurrently.

  • Cat speaking

    That doesn't mean you're working on them simultaneously.

  • PinkRobot speaking

    Exactly.

Concurrency problems often involve arbitrating access to shared resources among actions that are happening concurrently.

Synchronization mechanisms are about controlling concurrency, especially access to shared data. In other words, saying, “No, two things can't happen at the same time”, or, “Yes, two things can happen at the same time, but only if they follow these rules”.
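For instance, here's a minimal Go sketch of the "no, two things can't happen at the same time" rule: a mutex serializes updates to a shared counter. The helper name `countWithMutex` is made up for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex has n goroutines increment a shared counter, using a
// mutex to rule out simultaneous updates.
func countWithMutex(n int) int {
	var mu sync.Mutex
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // "no, two increments can't happen at the same time"
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	// Always prints 1000; without the mutex, concurrent updates could be lost.
	fmt.Println(countWithMutex(1000))
}
```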

Simultaneity

Simultaneity is about literally doing multiple actions at the same time. For simultaneity to occur, there must be multiple execution engines (e.g., processor cores) running code at the same time.

Some people call running multiple things simultaneously “running in parallel” or even call it “parallelism,” but in this class we'll use a more precise definition of parallelism.

Parallelism

Parallelism means taking advantage of multiple execution engines (e.g., CPU cores) to achieve speedup by doing some work simultaneously.

Note that simultaneity implies concurrency, but not the reverse: a system with N actions to perform might run only M of them simultaneously at a time (where M < N). In that case we know some of the actions will run concurrently, but we may not know which ones, and some actions may not run concurrently at all.

Contention over accessing shared resources is the enemy of speedup.

Summary

Here's a video that explains the difference between concurrency and parallelism:

Techniques for Parallelism

Isolated Tasks

Consider the task of compiling OS/161 from scratch. When we run bmake, it's going to run lots of gcc commands, one to compile each source file.

  • Each compilation command is an isolated task that can be run in parallel with the others.
  • Because each compilation command is working on a different source file, they don't need to share any data.
  • It's not required for any two specific compilation commands to run simultaneously, but it's okay if they do.
  • No compilation command needs to happen before or after any other gcc compilation command.
  • We cannot link the final executable (which also uses gcc) until all the object files are created by the compilation.

When compiling OS/161, bmake has hundreds of compilation commands to run. It's not efficient to

  • Run just one compilation command at a time, or
  • Try to set them all running at once.

Why not? What should bmake do instead?

The Idea of a Work Queue

One way to manage parallelism is to have a work queue that holds tasks that need to be done. We can have multiple worker threads that each take tasks from the work queue and do them.

Example: bmake

A program such as bmake can use the standard Unix facilities like fork, exec and waitpid along with an internal queue to manage parallelism. The pseudocode might look something like

const int MAX_WORKERS = 4;
int num_workers = 0;

while (more work to do) {
    if (num_workers < MAX_WORKERS) {
        fork a compiler process;
        num_workers++;
    } else {
        wait for a compilation task to finish;
        num_workers--;
    }
}

while (num_workers > 0) {
    wait for a compilation task to finish;
    num_workers--;
}

Example: Go

In Go, when we say go functionName(), we're creating a new goroutine to run the function, and it starts immediately.

If we need to do a thousand independent tasks, we can start a thousand goroutines to do them, with code like

for i := 0; i < 1000; i++ {
    go doTask(i)
}

But is that approach going to be efficient?

It's fairly straightforward to create a work queue in Go. Here's an example:

package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "sync"
    "time"
)

// Task represents a generic task to be executed
type Task func() Result

// Result represents the result of processing a task
type Result struct {
    filename string
    duration time.Duration
}

// generateImage creates a random image
func generateImage(filename string, width, height int) Task {
    return func() Result {
        // We'll just fake this...
        delay := time.Duration(rand.Intn(750)+250) * time.Millisecond
        time.Sleep(delay)
        return Result{filename, delay}
    }
}

// processImage applies a simple filter to the image
func processImage(filename string) Task {
    return func() Result {
        // We'll just fake this...
        delay := time.Duration(rand.Intn(750)+250) * time.Millisecond
        time.Sleep(delay)
        return Result{filename, delay}
    }
}

func worker(id int, tasks <-chan Task, results chan<- Result, wg *sync.WaitGroup) {
    defer wg.Done()
    for task := range tasks {
        fmt.Printf("Worker %d processing new task...\n", id)
        results <- task()
    }
}

const MAX_WORKERS = 4

func executeTasks(tasks []Task) time.Duration {
    numWorkers := runtime.NumCPU()
    if numWorkers > MAX_WORKERS {
        numWorkers = MAX_WORKERS
    }
    tasksChan := make(chan Task, len(tasks))
    resultsChan := make(chan Result, len(tasks))

    // Fill task channel
    for _, task := range tasks {
        tasksChan <- task
    }
    close(tasksChan)

    // Start worker pool
    var wg sync.WaitGroup
    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go worker(i, tasksChan, resultsChan, &wg)
    }

    // Wait for all workers to finish
    go func() {
        wg.Wait()
        close(resultsChan)
    }()

    // Collect results
    var totalDuration time.Duration
    for result := range resultsChan {
        fmt.Printf("Processed %s in %v\n", result.filename, result.duration)
        totalDuration += result.duration
    }

    return totalDuration
}

func main() {
    numImages := 20
    var generateTasks []Task
    var processTasks []Task

    // Create image generation tasks
    for i := 0; i < numImages; i++ {
        filename := fmt.Sprintf("image_%d.png", i)
        generateTasks = append(generateTasks, generateImage(filename, 1000, 1000))
    }

    // Execute image generation tasks
    fmt.Println("Generating images...")
    genDuration := executeTasks(generateTasks)
    fmt.Printf("All images generated in %v\n", genDuration)
    fmt.Printf("Average time per image generation: %v\n", genDuration/time.Duration(numImages))

    // Create image processing tasks
    for i := 0; i < numImages; i++ {
        filename := fmt.Sprintf("image_%d.png", i)
        processTasks = append(processTasks, processImage(filename))
    }

    // Execute image processing tasks
    fmt.Println("\nProcessing images...")
    procDuration := executeTasks(processTasks)
    fmt.Printf("All images processed in %v\n", procDuration)
    fmt.Printf("Average time per image processing: %v\n", procDuration/time.Duration(numImages))

    fmt.Printf("\nTotal time: %v\n", genDuration+procDuration)
}

For brevity, we've removed the actual image generation and processing code from the example, but if you'd like to check that out, you can download the full code and run it on the CS 134 server. (Note that this full version can't run in Online GDB because the image files generated would use too much space and can't be viewed anyway.)

  • Goat speaking

    Meh. I don't want to read all that, can you break it down for me?

  • PinkRobot speaking

    Sure…

Things to notice about the code:

  • We've made a Task type that represents a generic task to be executed; it's a function that takes no arguments and returns a Result.
    • Result is a struct that holds the filename of the image and the duration of the task, but we could have made it anything we wanted.
  • We have two functions, generateImage and processImage, that create tasks to generate and process images, respectively.
    • The syntax looks a little strange because these are functions that return functions.
  • We have a worker function that takes tasks from a channel, executes them, and sends the results to another channel.
    • We create a fixed number of worker goroutines to process the tasks.
  • We have an executeTasks function that manages the work queue.
    • It creates channels for tasks and results, fills the task channel, starts the worker pool, and collects the results.
  • In main, we create tasks to generate and process images, execute the tasks, and print out the results.

The code is a bit more complex than the previous examples, but it's a good example of how you can manage parallelism in Go using goroutines and channels.

  • Cat speaking

    It's another example of the power of the CSP model. We're using channels to communicate between goroutines, and we're able to manage parallelism in a way that's relatively easy to understand.

  • Dog speaking

    For you, maybe…

  • Hedgehog speaking

    We won't have to code something like this, will we?

  • PinkRobot speaking

    We won't expect you to write Go code in CS 134, but other problems you've worked on are at least as complex as this one. As far as principles go, it's not so out of reach.

  • Hedgehog speaking

    Okay, I guess you're right.

Example: Swift

Here's very similar code in the language Swift, which has a TaskGroup concept that makes it easier to manage tasks and their results. We create a TaskGroup with withTaskGroup and add tasks to it with addTask. Calling waitForAll waits for all the image-creation tasks to complete before moving on to the image-processing tasks.

So Swift allows us to run tasks in parallel without having to build the infrastructure ourselves.

import Foundation

// Image creation simulation
func createImage(index: Int) async {
    print("Creating image \(index)")
    // Simulate I/O-bound image creation
    let startTime = Date()
    let waitTime = Double.random(in: 0.75...1.5)
    while Date().timeIntervalSince(startTime) < waitTime {
        // Busy-wait to simulate I/O-bound work
    }
    print("Finished creating image \(index)")
}

// Image processing simulation
func processImage(index: Int) async {
    print("Processing image \(index)")
    // Simulate CPU-bound image processing
    let startTime = Date()
    let waitTime = Double.random(in: 0.75...2.0)
    while Date().timeIntervalSince(startTime) < waitTime {
        // Busy-wait to simulate CPU-bound work
    }
    print("Finished processing image \(index)")
}

func processImages(totalCount: Int) async {
    await withTaskGroup(of: Void.self) { group in
        for i in 0..<totalCount {
            group.addTask {
                await createImage(index: i)
            }
        }
        // Wait for all images to be created
        await group.waitForAll()
        for i in 0..<totalCount {
            group.addTask {
                await processImage(index: i)
            }
        }
    }
    print("All images processed")
}

// Entry point
print("Running Image Processing Example:")

// Create a semaphore to wait for the async task to complete
let completionSemaphore = DispatchSemaphore(value: 0)

Task {
    await processImages(totalCount: 20)
    completionSemaphore.signal()
}

// Wait for the async task to complete
completionSemaphore.wait()

print("Done")
  • Hedgehog speaking

    Why do you keep showing us all these different languages?

  • PinkRobot speaking

    Well, you don't need to deeply understand the syntax and operation of each language, but seeing the same concept implemented in different languages is useful: it helps us understand the principles behind the code rather than just the syntax. For example, you can see that Swift blends the idea of tasks with asynchrony via async/await, whereas the Go code doesn't have asynchronous tasks.

  • BlueRobot speaking

    It also helps show that these concepts aren't isolated to one language. They're increasingly prevalent in modern programming languages.

  • Goat speaking

    Meh. Only took, like, 50 years.

  • Horse speaking

    Hay! I noticed that the Swift code doesn't have any explicit limit on how many things run at once.

  • PinkRobot speaking

    That happens automagically in Swift. The runtime decides how many tasks to run at once based on the system's capabilities. That's probably what you would want, but Swift does the work of profiling the system and figuring out the best balance for you.

  • Rabbit speaking

    If you wanted to have more control, I've written a version of the Swift code that limits the number of tasks running at once. All we needed to do was add a semaphore. I wrote my own using Swift's “actor” feature—actors are a lot like the Monitor objects we saw earlier. It even uses continuations!

  • PinkRobot speaking

    Uh, thanks, Rabbit. That's… great. But perhaps we don't need to go down that rabbit hole right now.

Rules for Isolated Tasks

  • Cat speaking

    You said the tasks should be isolated so they could run in parallel. Do they really need to share no data at all?

  • PinkRobot speaking

    Well, technically, what we actually want is to have no data dependencies between the tasks.

If we want to make sure that two tasks won't interfere with each other, we can use Bernstein's conditions:

Given two tasks, \(T_1\) and \(T_2\), with

  • Read sets \(R_1\) and \(R_2\) (the data they read), and
  • Write sets \(W_1\) and \(W_2\) (the data they write),

we can run the tasks in parallel if

  • \(R_1 \cap W_2 = \emptyset\) and \(W_1 \cap R_2 = \emptyset\) — they don't read each other's writes; and
  • \(W_1 \cap W_2 = \emptyset\) — they don't write to the same data.
  • Duck speaking

    These seem like the obvious things to check, but it's good to have a name for them.

  • PinkRobot speaking

    They're named after Arthur J. Bernstein, who wrote a paper in 1966 that laid out these conditions for the first time. And, yes, they are pretty straightforward.

  • Goat speaking

    Meh. I guess in 1966, it was easy to get your name on something.
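To make the conditions concrete, here's a small Go sketch that checks them for two hypothetical tasks. The helper names (`canRunInParallel`, `set`) and the example variables are made up for illustration:

```go
package main

import "fmt"

// intersects reports whether two string sets share any element.
func intersects(a, b map[string]bool) bool {
	for x := range a {
		if b[x] {
			return true
		}
	}
	return false
}

// canRunInParallel applies Bernstein's conditions to the read and
// write sets of two tasks.
func canRunInParallel(r1, w1, r2, w2 map[string]bool) bool {
	return !intersects(r1, w2) && // T2 doesn't write what T1 reads
		!intersects(w1, r2) && // T1 doesn't write what T2 reads
		!intersects(w1, w2) // they don't write the same data
}

// set builds a string set from its arguments.
func set(xs ...string) map[string]bool {
	s := map[string]bool{}
	for _, x := range xs {
		s[x] = true
	}
	return s
}

func main() {
	// T1: x = a + b   reads {a, b}, writes {x}
	// T2: y = a * 2   reads {a},    writes {y}
	fmt.Println(canRunInParallel(set("a", "b"), set("x"), set("a"), set("y"))) // true

	// T3: a = a + 1   reads {a}, writes {a} — its write of a conflicts with T1's read
	fmt.Println(canRunInParallel(set("a", "b"), set("x"), set("a"), set("a"))) // false
}
```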

Limitations of Parallelism

Oftentimes, some of the work can be split up and done in parallel, but some of it can't. For example, in the case of compiling OS/161, we can compile each source file in parallel, but we can't link the final executable until all the object files are created.

Amdahl's Law

Amdahl's Law is a formula that tells us how much speedup we can get from parallelizing a program. It's named after Gene Amdahl, who worked on the IBM 360 mainframe.

The formula is $$ \text{Speedup} = \frac{1}{(1 - f) + \frac{f}{n}}, $$

where

  • \(f\) is the fraction of the task that can be parallelized.
  • \(n\) is the number of cores.
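As a quick sanity check on the formula, here's a small Go function that evaluates it (the sample numbers are chosen for illustration):

```go
package main

import "fmt"

// speedup evaluates Amdahl's Law: 1 / ((1 - f) + f/n),
// where f is the parallelizable fraction and n is the number of cores.
func speedup(f, n float64) float64 {
	return 1.0 / ((1.0 - f) + f/n)
}

func main() {
	// 90% parallelizable on 8 cores
	fmt.Printf("f=0.90, n=8   -> speedup %.2f\n", speedup(0.90, 8)) // 4.71
	// Even with (effectively) unlimited cores, the serial 10% caps us near 10x.
	fmt.Printf("f=0.90, n=1e9 -> speedup %.2f\n", speedup(0.90, 1e9)) // 10.00
}
```

Notice how the serial fraction \(1 - f\) dominates as \(n\) grows: the denominator can never drop below \(1 - f\).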

Speedup Calculator




Play with the sliders to see how the speedup changes as you vary the fraction of the task that can be parallelized and the number of cores.

If we have a problem where 95% of the work can be parallelized, how much speedup can we get if we have 64 cores?

Amdahl's Law. The speedup we can get from parallelizing a program is limited by the fraction of the task that can be parallelized.
Image credit: Daniels220 via Wikimedia Commons (CC BY-SA 3.0).
  • Goat speaking

    Meh. Amdahl's Law means you shouldn't even try to parallelize things.

  • PinkRobot speaking

    Well, not quite. It's a reminder that we can't just throw more cores at a problem and expect it to get faster. We need to think about how much of the work can be parallelized.

  • Rabbit speaking

    Actually, Amdahl's Law is pessimistic because it assumes that the amount of work is fixed. In reality, we can often increase the amount of work to take advantage of more cores and the serial portion remains about the same, so it becomes less significant. There's another law, Gustafson's Law, that takes this into account.

  • PinkRobot speaking

    But let's not go too far down that rabbit hole right now.

But the big lesson is that when we want to parallelize things, we want as little coordination between the parallel tasks as possible. The more coordination we need, the less speedup we'll get. That's why the golden rule of parallelism is

  • Make your tasks as independent as possible.

When to Use Which?

  • Use concurrency when
    • You need precise control over the execution of multiple threads or processes.
    • You're dealing with shared resources that require careful synchronization.
    • You need multiple things to be happening over the same time period.
  • Use asynchrony when
    • You're dealing with I/O-bound operations (e.g., network requests or file operations).
    • You want to keep a user interface responsive while waiting for a long-running operation to complete.
  • Use parallelism when
    • You have multiple cores available and want to speed up your program by running parts of it simultaneously (but not necessarily all parts at once).
    • You can break your work into independent tasks that can be run in parallel.

Takeaways

  • Concurrency is about overlapping durations.
  • Asynchrony is about not waiting for an action to complete before starting the next one.
  • Parallelism is about running multiple things simultaneously to achieve speedup.
  • Simultaneity is about literally doing multiple actions at the same time.
  • Isolated tasks can be run in parallel, but don't need to be.
    • They could be separate processes, or
    • Chunks of work handled in a single process.
    • Should obey Bernstein's conditions.
  • Work queues can be used to manage parallelism.
    • They're built into some languages, like Swift.
    • In other languages, like Go, you might need to build your own.
