CS 70

IntVector push_back()

In previous lessons and homeworks, we've seen code for CoordVector and IntVector classes that create an array-backed list. One of the operations this list type provides is push_back(). Our list has both a size (the number of ints it holds) and a capacity (the number of spaces it has in its array that could hold an int). When all the capacity is used up, it has to reallocate the array to a larger sized one, which requires \( \Theta(n) \) time.

So the complexity of a single push_back() operation is \( \Theta(1) \) time in the best case, and \( \Theta(n) \) time in the worst case.

When you wrote the CoordVector class, we told you to use a doubling strategy for resizing the array when it ran out of space. Thus when the array is full, we allocate a new array with double the capacity, copy all the data over, and then add the new item. In the bonus section of the homework, there was an opportunity to think about why this doubling strategy gives us good average performance.

  • Goat speaking

    Meh. I remember not doing that bonus question.

  • Hedgehog speaking

    Of all the options for what we could do for a bonus, I felt least equipped to do that one.

  • Sheep speaking

    I could, like, totally see it in my head, but I couldn't quite put it into words.

  • Duck speaking

    We're going to show that it's amortized constant time, right?

  • LHS Cow speaking

    That's right.


First, let's get some practice running the code:

in a new browser window or tab to get a small program that performs \( m \) push_back() operations on an empty IntVector, using a version of our IntVector code that has some extra debugging print statements and statistics-gathering code.

The main part of the code is this loop:

IntVector vec;
for (int i = 1; i <= numItems; ++i) {
    vec.push_back(i);
}

Click the green Run button at the top of the window, and you should see a line in the bottom input pane saying:

How many items to push onto our array-backed list?

Click the arrows to expand the input pane so that it fills more of the screen, and then type 10 and look at the output.

Copy the output starting at the line that says “Okay, pushing 10 items” and ending at the line that says the number of pushes, pops, and moves. Paste it into the box below.

The results should look like the following (but without the colors you see here):

How many items to push onto our array-backed list? 10
Okay, pushing 10 items
push_back(1):
        + Added 1 in an empty slot (0 empty slots remaining)
push_back(2):
        > Moved 1 to new array.
        + Added 2 in an empty slot (0 empty slots remaining)
push_back(3):
        > Moved 1 to new array.
        > Moved 2 to new array.
        + Added 3 in an empty slot (1 empty slots remaining)
push_back(4):
        + Added 4 in an empty slot (0 empty slots remaining)
push_back(5):
        > Moved 1 to new array.
        > Moved 2 to new array.
        > Moved 3 to new array.
        > Moved 4 to new array.
        + Added 5 in an empty slot (3 empty slots remaining)
push_back(6):
        + Added 6 in an empty slot (2 empty slots remaining)
push_back(7):
        + Added 7 in an empty slot (1 empty slots remaining)
push_back(8):
        + Added 8 in an empty slot (0 empty slots remaining)
push_back(9):
        > Moved 1 to new array.
        > Moved 2 to new array.
        > Moved 3 to new array.
        > Moved 4 to new array.
        > Moved 5 to new array.
        > Moved 6 to new array.
        > Moved 7 to new array.
        > Moved 8 to new array.
        + Added 9 in an empty slot (7 empty slots remaining)
push_back(10):
        + Added 10 in an empty slot (6 empty slots remaining)

10 pushes, 0 pops, 15 moves.

Click the Run button again to try running the program with a different number of moves.

Because we double the capacity of the array each time, and the capacities are \( 1, 2, 4, 8, 16, 32, 64, \ldots \), and resizes happen when we push \( 3, 5, 9. 17, 33, 65, \ldots \), the worst case for the amount of work done for resizing will be just after we've resized.

By trying the sizes 17, 33, and 65, guess a formula for the worst-case number of moves after \( m \) pushes. (You only need to find a formula that gives the right answer for these values, you don't need to figure out why, or make it work for non–worst-case sizes.)

Let's look at a table of the number of moves for different values of \( m \):

\( m \) Worst-Case Moves
5 7
9 15
17 31
33 63
65 127

We can guess that the formula for worst-case number of moves is \( 2(m - 1) - 1 \), which is \( 2m - 3 \). So, after \( m \) pushes we will have done \( m \) steps actually pushing new data (the orange steps in the trace above), and \( 2m - 3 \) steps copying data around (the blue and green steps in the trace).

Thus after \( m \) pushes, we will have done \( \Theta(m) \) + \( \Theta(m) \) work, which is \( \Theta(m) \).

Thus, a single push is \( \Theta(1) \) amortized time. That's it!

  • Hedgehog speaking

    Wait, what? I wasn't expecting to be done so quickly.

  • Goat speaking

    Hey, enjoy it!

  • Cat speaking

    We didn't exactly prove our \( 2m - 3 \) formula, we just guessed it.

  • LHS Cow speaking

    Sure, we could be more rigorous there, but the main amortization argument is that we worked out what it would cost to do \( m \) operations from our starting point of an empty list, and then divided by \( m \) to get the amortized cost per operation.

  • Hedgehog speaking

    So it really is just a kind of average cost?

  • LHS Cow speaking

    Yes! It's an average with an ironclad worst-case guarantee, rather than a probabilistic sort of average.

But let's see if we can be a little more rigorous about the \( 2m - 3 \) formula for moves.

Ignoring the \( -3 \) part if you like, can you come up with a good argument for why the worst-case cost of moving things is around \( 2m \)?

There are multiple ways of making the argument, but here's one: When push_back() requires us to double the capacity, the number of items we want to have in our list is just one past a power of two, so \( n = 2^x + 1 \) for some \( x \) (I'm using \(n\) when thinking about the number of items in our list and \( m \) when thinking about the number of operations—in this case, because pushing increases the number of items, \( n = m \)). We'll have just copied \( 2^x \) array elements, but there were previous copies before that. So the copying cost is

\[ \sum_{i=0}^x 2^i = 2^{x + 1} - 1 = 2(2^x) - 1 = 2(n - 1) - 1 . \]

  • Hedgehog speaking

    Was I supposed to know that \( \sum_{i=0}^x 2^i = 2^{x + 1} - 1 \)?

  • LHS Cow speaking

    It is a useful summation identity, and you probably did already know it.

  • Hedgehog speaking

    I did?

  • LHS Cow speaking

    Imagine a concrete example, like showing that \( 8 + 4 + 2 + 1 = 16 - 1 \). Think of counting in binary. If we have 16, that's 10000 in binary, and if we subtract 1, we get 01111.

  • Dog speaking

    That's cool!

  • Hedgehog speaking

    Okay, I see. I sort of wanted to use the colors in the trace rather than summations. I could see a pattern and wanted to use that.

  • LHS Cow speaking

    What's the pattern you see? Actually, wait, let's see what the students think…

In the trace above, moves are blue and green, whereas the actual pushes of the data are orange (you might also notice that blue moves are moves of recently added data, whereas green moves are data we've moved before). The way we've colored the operations in the trace above helps us see patterns and relationships.

Looking at the debugging trace above, what pattern do you observe considering the blue (and green) steps in relation to the orange ones?

We can match up each blue step with an (earlier) orange step. Similarly, we can match each green step with a blue step (and thus an orange one, too). Thus the number of blue and green steps cannot be larger than the number of orange steps.

  • Hedgehog speaking

    Yes, that's basically what I saw. After a bit of muddle at the beginning, the orange steps were always followed by the same number of green steps and then the same number of blue steps.

Another way to look at things is to use money as a proxy for time and imagine that each step (regardless of color) costs $1, and we need to collect enough money at each push_back() to pay for all the work. A constant fee of $3 per push_back() has us covered: $1 for the orange step that's guaranteed to happen this time, and $2 to cover both a blue and green step that can happen later on.

Either of these arguments is sufficient to show that push_back is a constant–amortized-time operation.

We can also use similar approaches to any of the ones we've seen on this page to show that pop_back is a constant–amortized-time operation. If you'd like to have a go yourself, that's cool, but in the interests of time, we'll move on.

(When logged in, completion status appears here.)