CS 70

Weight-Balanced Trees

In the main part of the assignment, we implemented several different insertion schemes for our search tree, including the RANDOMIZED scheme where it is vanishingly unlikely that the tree will be pathologically unbalanced. While a certain kind of obsessive theoretical computer scientist might be concerned by the fact that there is still some tiny chance of unbalance, no matter how tiny, in our universe we can consider this chance effectively zero for all practical purposes. RANDOMIZED won't produce pathologically unbalanced trees, and, in fact, will typically produce trees where the search or insertion cost is only 38% worse than an optimally balanced tree on average, but it's still reasonable to ask whether a tidier tree structure would perform better.

In the lesson materials on balanced search trees, you saw red–black trees as one way to guarantee that the tree remains balanced. But there is another approach—weight-balanced trees—that is somewhat simpler to implement (especially in the context of our existing TreeSet<T> implementation) but that still provides guaranteed balance and better tree statistics than RANDOMIZED insertion.

The Fundamental Idea

The key idea behind weight-balanced trees is that if there are many more nodes in the left subtree than in the right subtree (or vice versa), the tree is unbalanced and needs to be restructured. Because we already keep track of the size of each subtree in each node (added in Phase 6), we can use this information to determine when a subtree is unbalanced.

There is a trade-off in choosing how unbalanced a subtree is allowed to be before we decide to restructure it. If we required nearly perfect balance (i.e., the left and right subtrees must have almost exactly the same number of nodes), we would end up doing a lot of restructuring, which would slow down insertions. On the other hand, if we allowed a lot of imbalance, the tree's performance would be no better than our RANDOMIZED insertion scheme, and when we finally did restructure, it would be a big job.

The original paper (from 1973) on weight-balanced trees by Jürg Nievergelt and Edward M. Reingold (and the Wikipedia article on the topic) gives some rather scary looking irrational-number thresholds for deciding when to restructure a subtree. However, Yoichi Hirai and Kazuhiko Yamamoto showed in their 2011 paper that simple integer thresholds work just as well in practice, although the exact values still need to be carefully chosen. They consider a tree to be balanced if no more than \( 3/4 \) of the children of a node are in one subtree. In other words, the left child can't have more than \( \Delta \) times as many nodes as the right child, and vice versa, where \( \Delta = 3 \). If this condition is violated, we must perform one or more rotations to restore balance.

The balancing algorithm needs a second constant, \( \Gamma \), that determines when to do single versus double rotations. Nievergelt and Reingold give the following formula1 for \( \Gamma \):

\[ \Gamma \le \frac{\Delta - 1}{\Delta - 2}. \]

Pleasingly, when \( \Delta = 3 \), \( \Gamma \le 2 \), so we can simply set \( \Gamma = 2 \) and use integer arithmetic throughout.

  • Rabbit speaking

    In fact, 3 and 2 are the only integer values that satisfy the necessary constraints to guarantee logarithmic height, so this choice is essentially forced on us if we want to use integers!

Implementation Code

Rather than require you to write an implementation of weight-balanced trees from scratch, we'll provide code that you can adapt into your existing TreeSet<T> implementation. This code will implement insertion using weight-balanced trees, which you can then compare against your existing insertion schemes.

Here's the main insertion helper function:

template <typename T>
void TreeSet<T>::nodeInsertWB(Node*& here, const T& key) {
    // Base case, if the tree is empty, add the new node
    if (here == nullptr) {
        here = new Node{key};
        return;
    }

    ++here->size_;

    if (key < here->value_) {
        nodeInsertWB(here->left_, key);
        balanceRight(here);   // Right side might be too light, so balance it
    } else {
        nodeInsertWB(here->right_, key);
        balanceLeft(here);    // Left side might be too light, so balance it
    }
}

If you contrast this code with the code you wrote for ROOT insertion in Phase 5, you'll see that the basic structure is similar. While that code always performed a rotation after inserting into a subtree, this code only sometimes performs a rotation if the subtree is unbalanced.

  • Goat speaking

    Meh. You've just pushed all the hard work into those balanceLeft and balanceRight functions, haven't you?

  • LHS Cow speaking

    True. But it's not that bad…

Balancing Functions

Here's the rest of the code:

/*
 * Support for weight-balanced tree insertions, based on
 * Hirai, Yoichi, and Yamamoto, Kazuhiko, “Balancing Weight-Balanced Trees.”
 * Journal of Functional Programming 21, no. 3 (2011): 287–307.
 * https://doi.org/10.1017/S0956796811000104.
 *
 * Implemented by Melissa O'Neill, 2025.
 *
 * The code in the original paper is for Haskell in a functional programming
 * style.  This code is mostly a direct translation to C++ with mutation.
 * The paper describes the rules governing the constants DELTA and GAMMA,
 * which must be carefully chosen to guarantee logarithmic height.
 *
 * The main differences from the original paper are:
 *  - isBalanced is renamed to isDeltaBalanced
 *  - isSingle   is renamed to isGammaBalanced (and the arguments flipped)
 *  - rotateR and rotateL are replaced by balanceRight and balanceLeft,
 *    which add a balance check before performing rotations.
 *
 * The code only supports integer values for DELTA and GAMMA, whereas the
 * paper allows rational numbers.  This would be easy enough to add, but
 * the integer values perform well in practice and are widely used in real-
 * world weight-balanced tree implementations.
 *
 * Note, that there are other weight-balanced tree schemes in the literature,
 * but they often make the scheme seem more complicated than it needs to be,
 * with floating-point balance factors, or multiple cases for balancing. The
 * WikiPedia page on weight-balanced trees shows some interesting algorithms
 * that can be used to merge and split weight-balanced trees, but fails to
 * provide simple and elegant insertion code (see
 * https://en.wikipedia.org/wiki/Weight-balanced_tree).
 */

template <typename T>
bool TreeSet<T>::isDeltaBalanced(const Node* a, const Node* b) {
    size_t weightA = nodeSize(a) + 1;
    size_t weightB = nodeSize(b) + 1;

    return (DELTA * weightA) >= weightB;
}

template <typename T>
bool TreeSet<T>::isGammaBalanced(const Node* a, const Node* b) {
    size_t weightA = nodeSize(a) + 1;
    size_t weightB = nodeSize(b) + 1;

    return (GAMMA * weightA) > weightB;
}

template <typename T>
void TreeSet<T>::balanceRight(Node*& node) {
    if (isDeltaBalanced(node->right_, node->left_)) {
        // Tree is still balanced
        return;
    }
    // like rotateR from Hirai & Yamamoto
    if (!isGammaBalanced(node->left_->left_, node->left_->right_)) {
        // first part of double rotation
        rotateLeft(node->left_);
    }
    // finish rotation
    rotateRight(node);
}

template <typename T>
void TreeSet<T>::balanceLeft(Node*& node) {
    if (isDeltaBalanced(node->left_, node->right_)) {
        // Tree is still balanced
        return;
    }
    // like rotateL from Hirai & Yamamoto
    if (!isGammaBalanced(node->right_->right_, node->right_->left_)) {
        // first part of double rotation
        rotateRight(node->right_);
    }
    // finish rotation
    rotateLeft(node);
}

To use this code, you'll need to

  • Change the name of the main insertion helper function to match the other insertion helper functions in your code (i.e., possibly rename nodeInsertWB).
  • Adjust the names of data members (e.g., value_, left_, right_, size_) to match your existing code.
  • Add the constants DELTA and GAMMA to your class definition; for example,

    static constexpr size_t DELTA = 3;
    static constexpr size_t GAMMA = 2;
    
  • Add all these functions as private member functions of your TreeSet<T> class template.

  • Extend the bst_style enum to include a WEIGHT_BALANCED value.
  • Add a case to your public insert function to call nodeInsertWB when the insertion style is WEIGHT_BALANCED.
  • Ensure that attribution for the code is preserved in your code comments.

Testing Your Weight-Balanced Insertion

Given this known working implementation, instead of extending your test cases you can extend minispell.cpp to allow testing weight-balanced insertion. Add these lines to the parseCommandLine function's options:

        } else if (option == "-w" || option == "--weight-balanced") {
            options.insertScheme = WEIGHT_BALANCED;

(For completeness, you can also add the option to the help message if you like.)

Now you can run minispell with the -w option to use weight-balanced insertion. You can compare the resulting tree statistics (height, average depth, etc.) against those produced by your other insertion schemes to see how well weight-balanced insertion performs.

Say What You Did…

Don't forget to include a description of what you did in your Fun.md file and any observations you made about the performance of weight-balanced insertion compared to your other insertion schemes!


  1. In fact, Nievergelt and Reingold define things in terms of \( \alpha \), where \( \alpha = 1 / \Delta \), and their second constant is defined as \( (1 - \alpha) / (1 - 2\alpha) \) and never given an explicit name. We've used the names used by Hirai and Yamamoto for clarity. 

(When logged in, completion status appears here.)