Phase 5: insert, count, contains, and printToStream
In this phase you will implement the core functionality of your HashMultiset—insert, count, and contains. When this phase is complete, you'll actually be able to use your HashMultiset. You'll also write the printToStream function that can help you see what's going on inside your hash table.
For now, we won't be rehashing the table. Instead, users (and your tests) can specify a desired number of buckets to use for the hash table.
For reference:
User-Provided myhash Function
Recall from the specifications that your HashMultiset should assume that no matter what the key type T is, there is a function size_t myhash(const T&) that returns the hash value of a key.
So when you need to get a hash value, you should simply call myhash on the key.
Note that myhash can (and, for a good hash function, should) use the full range of size_t, so the hash values are very likely to be larger than the number of buckets.
Let's Go!
Also check out the Helpful Hints before you get going!
For this part, you will plan, write tests for, implement, and debug
insertcountcontainsprintToStream
Planning Image
Within the Planning directory, add your planning image, named Phase_5.jpg.
Core Functionality: insert, count, and contains
See the lesson materials from Week 12, Lesson 1 for how separate chaining works. But remember that unlike the hash tables we saw in the lessons, HashMultiset doesn't just store keys, it also stores a count for each key.
In particular, note the implications this has for insert: unlike a regular set, calling insert multiple times with the same key meaningfully changes the HashMultiset! For example, suppose mySet is an empty HashMultiset. If you call mySet.insert("milk");, then mySet.count("milk"); should return 1 (that is, "milk" got inserted with an initial count of 1). But then if you again call mySet.insert("milk");, then mySet.count("milk"); should return 2 (that is, the count for "milk" got incremented).
Displaying the Hash Table with printToStream
Here's an example of the output from printing a 10-bucket hash table with 14 elements:
Is it a bug that bucket 7 is empty when other buckets have three items in them?
No, nothing's wrong—that's just random distribution across buckets in action. If they'd all ended up in one bucket, we might be suspicious about the hash function.
[0] bacon:1
[1] sausage:12, avocado:9
[2] waffles:3
[3] toast:17
[4] pancakes:20
[5] cheese:2
[6] juice:5, coffee:1, eggs:2
[7]
[8] tea:6, steak:9, tofu:12
[9] beans:20
The algorithm for displaying the hash table is as follows:
foreach bucket: // i.e., loop over all buckets in array-index order PRINT “[” PRINT bucket index PRINT “] ” // n.b., the trailing space is always present, even for empty buckets. PRINT each of the key-count pairs in the bucket, formatted as key:count, separated by “, ” // i.e, a comma and a space. // Notice that there is no comma and space after the last item, only between items. PRINT a newline // i.e., “\n”.
Can my list of items end with a comma, as that would be easier to code?
No. Commas separate items, so there can't be a comma after the last item.
There are many ways to write this code with some degree of elegance. One option is to have a flag of some kind to know if you're printing the first item, a subsequent item, or the last item, but there are plenty of other ways to do it.
Once printToStream is written, you can print hash tables using the << operator as it will call printToStream.
Helpful Hints
insert's Behavior Changes Depending on Whether the Key is Already Present
The first time you insert a specific key (like "milk") into the hash table, you will need to initialize a new KeyCount for that key with an initial count of 1. But the next time(s) you insert that same key, you should not create a new KeyCount, because there is already a KeyCount for that key. Instead, you should fetch the previously-created KeyCount for that key, and increment its count.
Strong Suggestion: Helper Function for Inserting KeyCounts
You could handle the logic of inserting new KeyCounts (when needed) directly in the body of insert. But subsequent phases will be much easier if you instead write a separate private helper function that takes a KeyCount as a parameter and inserts it into the table. With this approach, insert still does the checking to see whether the key already exists in the table. But if the key doesn't exist, insert will initialize a new KeyCount and then call the private helper function to insert that KeyCount. Subsequently, the private helper function does not need to check again whether the key already exists (because insert already checked). This private helper function will come in useful later!
Also, when this helper function inserts a KeyCount, it should do so using push_front.
Avoid Repeated Logic in count and contains
In your planning, you should find that count and contains work very similarly (they really only differ in their return type). You should avoid copy-pasting code, or otherwise duplicating logic, when implementing these two functions. One approach to avoid repeated code is to factor out shared logic into a private helper function that gets called by both count and contains (but there are perhaps other approaches as well).
Pick a Simple Hash Function for Testing
For testing purposes, you get to pick the hash function! In hashset-test.cpp, you can define myhash for whatever type(s) you want to test with in the place marked by the comment.
Note that a good hash function gives random-looking values that appear to have no obvious connection to the hashed value. But testing is all about easy-to-follow behavior! So you may not want to use a good hash function when testing.
Instead, you may want to use simple hash functions that cause predictable behavior. We'll discuss this more when you implement the functions that measure the performance of the table.
For now, just know that you don't need to find a fancy, well-behaved hash function. Something very simple that would be a terrible choice for real usage is enough to test the relationships between insert, exists, and the other functions you've already implemented.
Define myhash Before You Include HashMultiset
If you test with any of the built-in classes of C++ that live in the std:: namespace (such as std::string), your myhash function must be defined before you include hashset.hpp. This requirement is a technical limitation: because myhash(string) is in the top-level namespace, and std::string is in the std namespace, the compiler will not find myhash if you define it after you define HashMultiset.
We've left a comment in hashset-test.cpp that shows where you should define your myhash function(s).
Testing Using printToStream
If you know the hash function, the number of buckets, and ensure that the final load factor is less than the maximum load factor, all hash-table implementations must print out a hash table that looks the same. So, for example, if you knew that your hash table should produce the output shown above, you could check that output against the string
"[0] bacon:1\n[1] sausage:12, avocado:9\n[2] waffles:3\n[3] toast:17\n[4] pancakes:20\n[5] cheese:2\n[6] juice:5, coffee:1, eggs:2\n[7] \n[8] tea:6, steak:9, tofu:12\n[9] beans:20"
That's a long line!
You can break a string in the middle by closing the quotes and opening them again on the next line.
clang-formatdoes this for you automatically!
Additional Sanity Check: hashmultiset-cow-test.cpp
With these functions implemented, you should be able to successfully compile and run hashmultiset-cow-test.
It's a very simple test (it basically just inserts something and checks to see if it exists in the table). But the key is a Cow class rather than a built-in type, so running this test checks to make sure that your table treats its keys properly. If your table performs operations on keys that it shouldn't, you'll probably get compiler errors.
(When logged in, completion status appears here.)