IntVector push_back()
In previous lessons and homeworks, we've seen code for CoordVector and IntVector classes that implement an array-backed list. One of the operations this list type provides is push_back(). Our list has both a size (the number of ints it holds) and a capacity (the number of spaces in its array that could hold an int). When all the capacity is used up, it has to allocate a larger array and move the data over, which requires \( \Theta(n) \) time.
So the complexity of a single push_back() operation is \( \Theta(1) \) time in the best case, and \( \Theta(n) \) time in the worst case.
When you wrote the CoordVector class, we told you to use a doubling strategy for resizing the array when it ran out of space. Thus when the array is full, we allocate a new array with double the capacity, copy all the data over, and then add the new item. In the bonus section of the homework, there was an opportunity to think about why this doubling strategy gives us good average performance.
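As a refresher, the core of a doubling push_back() looks something like the sketch below. (This is illustrative only; the real CoordVector/IntVector code has more to it, so treat it as a sketch rather than the definitive implementation.)

```cpp
#include <cstddef>

// Minimal sketch of an array-backed list with a doubling push_back().
// (Illustrative only; the real IntVector code has additional features.)
class IntVector {
public:
    ~IntVector() { delete[] data_; }

    void push_back(int value) {
        if (size_ == capacity_) {
            // Full: allocate a new array with double the capacity...
            size_t newCapacity = (capacity_ == 0) ? 1 : 2 * capacity_;
            int* newData = new int[newCapacity];
            // ...copy all the data over (the Theta(n) part)...
            for (size_t i = 0; i < size_; ++i) {
                newData[i] = data_[i];
            }
            delete[] data_;
            data_ = newData;
            capacity_ = newCapacity;
        }
        // ...and then add the new item (the Theta(1) part).
        data_[size_] = value;
        ++size_;
    }

private:
    int* data_ = nullptr;
    size_t size_ = 0;
    size_t capacity_ = 0;
};
```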
Meh. I remember not doing that bonus question.
Of all the options for what we could do for a bonus, I felt least equipped to do that one.
I could, like, totally see it in my head, but I couldn't quite put it into words.
We're going to show that it's amortized constant time, right?
That's right.
First, let's get some practice running the code. Open the example program in a new browser window or tab. It's a small program that performs \( m \) push_back() operations on an empty IntVector, using a version of our IntVector code that has some extra debugging print statements and statistics-gathering code.
The main part of the code is this loop:
```cpp
// Push the values 1 through numItems onto an empty list.
IntVector vec;
for (int i = 1; i <= numItems; ++i) {
    vec.push_back(i);
}
```
Click the green button at the top of the window, and you should see a line in the bottom input pane saying:
How many items to push onto our array-backed list?
Click the arrows to expand the input pane so that it fills more of the screen, and then type 10 and look at the output.
The results should look like the following (but without the colors you see here):
How many items to push onto our array-backed list? 10
Okay, pushing 10 items
push_back(1):
+ Added 1 in an empty slot (0 empty slots remaining)
push_back(2):
> Moved 1 to new array.
+ Added 2 in an empty slot (0 empty slots remaining)
push_back(3):
> Moved 1 to new array.
> Moved 2 to new array.
+ Added 3 in an empty slot (1 empty slots remaining)
push_back(4):
+ Added 4 in an empty slot (0 empty slots remaining)
push_back(5):
> Moved 1 to new array.
> Moved 2 to new array.
> Moved 3 to new array.
> Moved 4 to new array.
+ Added 5 in an empty slot (3 empty slots remaining)
push_back(6):
+ Added 6 in an empty slot (2 empty slots remaining)
push_back(7):
+ Added 7 in an empty slot (1 empty slots remaining)
push_back(8):
+ Added 8 in an empty slot (0 empty slots remaining)
push_back(9):
> Moved 1 to new array.
> Moved 2 to new array.
> Moved 3 to new array.
> Moved 4 to new array.
> Moved 5 to new array.
> Moved 6 to new array.
> Moved 7 to new array.
> Moved 8 to new array.
+ Added 9 in an empty slot (7 empty slots remaining)
push_back(10):
+ Added 10 in an empty slot (6 empty slots remaining)
10 pushes, 0 pops, 15 moves.
Click the button again to try running the program with a different number of items.
Because we double the capacity of the array each time, the capacities are \( 1, 2, 4, 8, 16, 32, 64, \ldots \), and resizes happen when we push items \( 2, 3, 5, 9, 17, 33, 65, \ldots \). So the worst case for the amount of work done on resizing comes just after we've resized.
Let's look at a table of the number of moves for different values of \( m \):
| \( m \) | Worst-Case Moves |
|---|---|
| 5 | 7 |
| 9 | 15 |
| 17 | 31 |
| 33 | 63 |
| 65 | 127 |
We can guess that the formula for the worst-case number of moves is \( 2(m - 1) - 1 \), which is \( 2m - 3 \). So, after \( m \) pushes we will have done \( m \) steps actually pushing new data (the orange steps in the trace above), and at most \( 2m - 3 \) steps copying data around (the blue and green steps in the trace).
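If you'd like to sanity-check that guess, here's a quick simulation (our own throwaway code, not part of the lesson's program) that counts moves the same way the trace does and compares the total against the formula:

```cpp
#include <initializer_list>
#include <iostream>

int main() {
    // Simulate m pushes with capacity doubling and count the moves,
    // comparing the total against the guessed formula 2m - 3.
    for (long m : {5L, 9L, 17L, 33L, 65L}) {
        long capacity = 0;
        long size = 0;
        long moves = 0;
        for (long i = 1; i <= m; ++i) {
            if (size == capacity) {
                moves += size;  // copy every existing item to the new array
                capacity = (capacity == 0) ? 1 : 2 * capacity;
            }
            ++size;
        }
        std::cout << m << " pushes -> " << moves
                  << " moves; 2m - 3 = " << (2 * m - 3) << "\n";
    }
    return 0;
}
```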
Thus after \( m \) pushes, we will have done \( \Theta(m) + \Theta(m) \) work, which is \( \Theta(m) \).
Thus, a single push is \( \Theta(1) \) amortized time. That's it!
Wait, what? I wasn't expecting to be done so quickly.
Hey, enjoy it!
We didn't exactly prove our \( 2m - 3 \) formula, we just guessed it.
Sure, we could be more rigorous there, but the main amortization argument is that we worked out what it would cost to do \( m \) operations from our starting point of an empty list, and then divided by \( m \) to get the amortized cost per operation.
So it really is just a kind of average cost?
Yes! It's an average with an ironclad worst-case guarantee, rather than a probabilistic sort of average.
But let's see if we can be a little more rigorous about the \( 2m - 3 \) formula for moves.
There are multiple ways of making the argument, but here's one: When push_back() requires us to double the capacity, the number of items we want to have in our list is just one past a power of two, so \( n = 2^x + 1 \) for some \( x \). (I'm using \( n \) for the number of items in our list and \( m \) for the number of operations; in this case, because we start from an empty list and each push adds one item, \( n = m \).) We'll have just copied \( 2^x \) array elements, but there were previous copies before that. So the total copying cost is
\[ \sum_{i=0}^x 2^i = 2^{x + 1} - 1 = 2(2^x) - 1 = 2(n - 1) - 1 . \]
Was I supposed to know that \( \sum_{i=0}^x 2^i = 2^{x + 1} - 1 \)?
It is a useful summation identity, and you probably did already know it.
I did?
Imagine a concrete example, like showing that \( 8 + 4 + 2 + 1 = 16 - 1 \). Think of counting in binary. If we have 16, that's `10000` in binary, and if we subtract 1, we get `01111`.
That's cool!
Okay, I see. I sort of wanted to use the colors in the trace rather than summations. I could see a pattern and wanted to use that.
What's the pattern you see? Actually, wait, let's see what the students think…
In the trace above, moves are blue and green, whereas the actual pushes of the data are orange (you might also notice that blue moves are moves of recently added data, whereas green moves are moves of data we've moved before). This coloring helps us see patterns and relationships.
We can match up each blue step with a distinct (earlier) orange step. Similarly, we can match each green step with a distinct blue step (and thus an orange one, too). Thus there can be no more blue steps than orange steps, and no more green steps than orange steps, so the total number of moves is at most twice the number of pushes.
Yes, that's basically what I saw. After a bit of muddle at the beginning, the orange steps were always followed by the same number of green steps and then the same number of blue steps.
Another way to look at things is to use money as a proxy for time and imagine that each step (regardless of color) costs $1, and we need to collect enough money at each push_back() to pay for all the work. A constant fee of $3 per push_back() has us covered: $1 for the orange step that's guaranteed to happen this time, and $2 to cover both a blue and green step that can happen later on.
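If you find the money metaphor appealing, here's a small simulation (again our own sketch, not part of the lesson's code) that deposits $3 at every push, pays $1 per step, and checks that the balance never goes negative:

```cpp
#include <cassert>
#include <iostream>

int main() {
    // Deposit $3 at every push_back(); pay $1 for each step (an
    // insertion or a move). The balance should never go negative.
    long capacity = 0;
    long size = 0;
    long bank = 0;
    for (long i = 1; i <= 1000000; ++i) {
        bank += 3;  // flat fee collected at this push_back()
        if (size == capacity) {
            bank -= size;  // $1 per element moved to the new array
            capacity = (capacity == 0) ? 1 : 2 * capacity;
        }
        bank -= 1;  // $1 for placing the new item in its slot
        ++size;
        assert(bank >= 0);  // fees collected so far cover all work done
    }
    std::cout << "After " << size << " pushes, $" << bank << " left over\n";
    return 0;
}
```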
Either of these arguments is sufficient to show that push_back() is an amortized-constant-time operation.
We can use approaches similar to any of the ones we've seen on this page to show that pop_back() is also an amortized-constant-time operation. If you'd like to have a go yourself, that's cool, but in the interests of time, we'll move on.