CS 70

C++ Implementation: insert()

Here's our encoding:

class IntTree {
 public: ...
 private:
    // Encoding for a tree node
    struct Node {
        int key_;
        Node* left_  = nullptr;     // if not specified, default to nullptr
        Node* right_ = nullptr;     // if not specified, default to nullptr
    };

    // The tree object owns all the nodes, and has a pointer to the root node.
    Node* root_;
};
  • Duck speaking

    Wait. I don't see why you needed a Node struct at all. Couldn't you just have put all the Node data members directly in the IntTree class?

  • LHS Cow speaking

    No. For example, if we said every IntTree has a key_, we wouldn't be able to deal with empty IntTrees.

  • RHS Cow speaking

    Overall, everything would be worse and more confusing. Don't try to avoid having a distinct type for your tree nodes.

insert()

We'll have a public member function that inserts a given key into the structure:

void insert(int k);

How will we implement this operation using our encoding?

Remember, here is our BST-INSERT pseudocode:

BST-INSERT(treenode, x):
    if there isn't a treenode (i.e., this subtree is empty):    // base case
        replace empty subtree with new tree node containing x as its value (and empty subtrees)
    else if x < treenode.value:
        BST-INSERT(treenode.leftSubtree, x)       // recurse on smaller case (left subtree)
    else:                                                                // x >= treenode.value
        BST-INSERT(treenode.rightSubtree, x)     // recurse on smaller case (right subtree)
  • Cat speaking

    I'm confused. I don't think I can use this pseudocode to write the insert() member function.

  • LHS Cow speaking

    Ooh. Good point!

Why can't we simply adapt this pseudocode to define the insert() member function for the IntTree class?

Our BST-INSERT pseudocode was written for our conceptual encoding of the tree data structure, where we were only thinking about nodes. Our C++ encoding is a little different. Let's look back at the diagram we when we introduced the encoding:

Tree Encoding

In one sentence, the issue is

BST-INSERT is all about the stuff in blue circles in our diagram, not the brown square, but the brown square is the IntTree object on which the insert() member function operates.

The interface given by our BST-INSERT pseudocode was written in a language-neutral way, but it was also an interface that was well-suited to recursion. It took a treenode argument, and recursive calls could pass in smaller subtrees for that argument until we hit the base case.

In contrast, in our C++ encoding, our tree's (private) Nodes are wrapped up inside an IntTree class.

Our class needs an insert() function, but that function can't itself be recursive: there's only one IntTree object to work with, and we can't break it into smaller IntTrees we could recurse on. So our insert() member function can't be BST-INSERT as we've defined it.

We will still implement a version of BST-INSERT, but it will be a separate (private) helper function—one that works on Nodes. Its interface will look just like the interface of BST-INSERT, but instead of being called directly by the user, this function will be called by our insert() member function, passing in the root_ of the tree as a starting point.

So, whereas our BST-INSERT pseudocode was a recursive function and was also an appropriate interface for inserting into the tree, in our C++ version, these aspects have to be separate and different—the interface that users call is insert(); insert() hands off the work to a separate private helper function (insertHelper()) whose interface is well suited to recursing over our Nodes, but only the insert() function can see that interface, users can't.

We can declare insert() and insertHelper() in our class definition as follows:

class IntTree {
  public:
    void insert(int k);
    ...

  private:
    struct Node {
      ...
    };
    Node* root_;

    void insertHelper(Node*& tree, int k);
};
  • Horse speaking

    Whoa! Whoa! What is that Node*& thing?

  • LHS Cow speaking

    You can use our inside-out rule for type reading to find out! It's a reference to a pointer to a Node.

  • Hedgehog speaking

    Why would you do that to meeeee?!

  • LHS Cow speaking

    It's all because of the part of the pseudocode where we said “replace empty subtree with new tree node”—we need to be able to update trees and pointers. Let's go over it…

For us, a Node* represents a subtree (by pointing to its root node).

In the base case, where the subtree is empty, we need to replace an empty subtree with a new tree consisting of a single node. So we need to change a pointer that was previously nullptr to one that points to our new node.

But we want the original pointer (wherever that is) to be updated. If we passed a Node* and then assigned that pointer to a new value, we'd just be changing a pointer variable in our function and nowhere else. So we need to pass a reference, a Node*&, so we can reassign the pointer to a new node.

Some code will probably help make everything make sense…

void IntTree::insert(int k) {
    if (!exists(k)) { // Don't insert keys already in the tree
        insertHelper(root_, k);
    }
}

void IntTree::insertHelper(Node*& tree, int k) {
    if (tree == nullptr) {
        tree = new Node{k, nullptr, nullptr};
        // ^-- because tree is a reference to a pointer, this will
        //     change the value of the pointer that was passed in!
    }
    else if (k < tree->key_) {
        insertHelper(tree->left_, k);
    }
    else {
        insertHelper(tree->right_, k);
    }
}

Because we pass tree by reference, when we run tree = new Node{k, nullptr, nullptr}, we will actually be changing tree->left_, tree->right_, or root_, depending on where insertHelper was called. If we'd declared tree as just a Node*, that assignment wouldn't have any lasting effect.

Here's a video walking through this code and using our memory model. (For simplicity, we'll leave out that call to exists.)

A Couple of Tips

  • RHS Cow speaking

    First, if, by chance, you happen to find yourself implementing a BST in C++ in the near future, we recommend that you draw lots of pictures!

  • LHS Cow speaking

    Visualization is the key to keeping your pointers straight. Use the tools we've given you!

  • RHS Cow speaking

    Second, try not to get too thrown by that Node*& in our example.

  • LHS Cow speaking

    For insert() it makes sense, because we are passing in a pointer to the tree root and we might need to change the value of the pointer!

  • RHS Cow speaking

    For other helper functions, we don't have to do that. They just take a Node* for their subtree.

(When logged in, completion status appears here.)