Quicksort Explained
The essence of Quicksort is to
- Find the value in the array that we think will be in the middle once the array is sorted (we call this value the pivot).
-
Rearrange the array so that all elements less than the pivot come before it, and all elements greater than the pivot come after it (which we call “partitioning”).
After rearranging, a couple of important things are true:
- The pivot is now in its final sorted position.
- All elements before the pivot are less than or equal to it, and all elements after the pivot are greater than or equal to it, but those sets aren't necessarily sorted themselves.
-
Recursively apply the same process to the subarray of elements before the pivot and the subarray of elements after the pivot.
That's it? How do we know it works?
That's the power of recursion! At each step, we're breaking the problem down into smaller pieces that are easier to solve, and at each step we make progress by putting one element (the pivot) in its final place.
This is a divide-and-conquer algorithm, right? Each time the problems become half as big, so that's going to be good, I think….
I think I get it now. Wait… hold your horses! How do we pick the pivot? It's supposed to be the element that will be in the middle when the array is sorted, but we don't know what the sorted array looks like yet!
You're right, it would be quite expensive (but not impossible) to figure out the median element each time. So instead we use a heuristic (a “rule of thumb”) to pick an element that is likely to be near the middle. One option is to just pick an arbitrary element, but we'll explore some additional options using the simulator below.
Quicksort Visualization
Let's explore how Quicksort works with an interactive visualization. You can just dive in and play with it, but we've suggested some particular things to try in the discussion below the applet.
Observing the Basics
Leave the settings at their defaults (size 15, seed 12345, median-of-3 pivot, reverse data) and click to ensure we're starting from the same place. In fact, you can just click the button below when you're ready to start and it'll set everything up for you.
Then click repeatedly to see how the algorithm progresses. Some things to notice:
- We've begin with a reverse-sorted array because everything has to move, which makes the divide-and-conquer nature of the algorithm clearer.
- We've picked a good heuristic for choosing the pivot: median-of-three. By looking at three carefully chosen elements, we make a guess as to what the median is likely to be. In this particular case, we guess right every time.
- The pivot element (yellow) raises up to indicate that it's being used to partition the array.
- The other elements slide left or right to make room for the pivot to drop into its final position.
- The pivot turns green and locks when it's in its final position.
- The subarray currently being processed is indicated by the dashed line and label below the array.
You can change the random seed (or click to pick a new random seed) to see different initial arrays, but because the array begins exactly reverse-sorted, the algorithm will behave identically each time.
Rather than click repeatedly, you can click to have it run automatically. You can pause the simulator at any time, and control the pace with the speed slider. You can also adjust the size of the array (31 is a good size if you want to try a bigger example).
Trying a Randomly Sorted Array
For a first experiment, switch the data to “random” and click , keeping the pivot selection at “median-of-3”. Then click to see how it does. You should see that it finishes quite quickly, because the median-of-three heuristic is doing a good job of picking pivots that divide the array roughly in half each time.
It's probably a good idea to click a few times to see how it does on different random arrays.
It doesn't always hit exactly the middle—it feels like it's often about a third of the way in.
But it's still doing divide-and-conquer, even if a bit unevenly. It's still going to perform well.
Trying a Reverse-Sorted Array with a Random Pivot
Now let's try the same experiment with a reverse-sorted array and a random pivot-selection strategy. Switch the data to “reverse” and the pivot selection to “random”, then click .
You should see that the performance with these settings is also okay. Sometimes it won't make the best choice, but on average it does a decent job of dividing the array, and so it finishes in a reasonable time.
This is really cool! Instead of going to the effort of finding the median, we can just pick randomly and still get good performance on average.
I love low-effort strategies!
Exactly. Making an arbitrary choice is often good enough, and it's much cheaper than trying to find the perfect choice.
Well, if we're going to just choose arbitrarily, why not always choose the first element?
Let's try that and see!
Trying a Random Array with First-Element Pivot
Now switch the data back to “random” and the pivot selection to “first”, then click .
It works—I rock! We don't even need randomness! does a little dance
Hold your horses. It worked this time, but I bet there are cases where it won't work so well. Let's try our reverse-sorted array again.
Good idea.
Trying a Reverse-Sorted Array with First-Element Pivot
Now switch the data back to “reverse” and keep the pivot selection at “first”, then click .
Quicksort's Worst Case
If we pick the pivot poorly, Quicksort can degrade to quadratic time. This worst-case performance happens when the pivot is always the largest or smallest element in the subarray, which leads to unbalanced partitions. For example, if we always pick the first element as the pivot and the array is already sorted (or reverse-sorted), each partitioning step will only reduce the problem size by one element.
OMG! Quicksort is terrible! Why would anyone use an algorithm that can be this bad?!??!
Hay! Wait a moment. We got into this mess because we used Duck's strategy of always picking the first element as the pivot. Maybe that just wasn't a good idea.
To hit the worst case, we need to consistently pick a terrible pivot. Let's look at how likely picking a bad pivot is with different pivot selection strategies.
- First Element: If the array is already sorted or reverse-sorted, this strategy will always pick the worst pivot. So the worst case is quite easy to trigger (sometimes data is already sorted, or nearly so).
- Arguably, no one should ever use this strategy. Yet in practice, some implementations do. 😖
- Median-of-Three: This strategy is more robust, but if it uses fixed positions (such as the first, middle, and last elements), an adversary could force it into the worst case by carefully constructing the array so that the median-of-three pick is always a very poor choice.
- Such an input is very unlikely to occur naturally, but bad actors could deliberately create inputs that exploit this weakness in what is known as an algorithmic complexity attack, where the goal is denial of service by causing the algorithm to run slowly.
- Random: This strategy is the most robust against worst-case scenarios. Because the pivot is chosen randomly, the likelihood of consistently picking a terrible pivot (or, to be fair, a really good one) is extremely low. While it's theoretically possible to hit the worst case with random selection, the probability decreases exponentially with the number of elements.
So for the random case, it's a bit like saying, “Here's a coin: when you flip heads you can go home”. Some people might worry that it's theoretically possible to flip tails forever and never get to leave, but in practice it's not going to happen.
Exactly, there's what exists in the mathematical space of possibility and there is what we could ever expect to see in practice. They're not really the same.
That said, this section of the course is taking a mathematical perspective, so even if the chances are so vanishingly small that we would never see it in practice, from a theory standpoint the worst case is still quadratic time no matter which pivot strategy we use.
So with random pivot selection, Quicksort is totally fine in practice, but on paper, from a theory perspective, it's going to look bad.
Pretty much.
(When logged in, completion status appears here.)