CS 70

Using Your Search Tree (and Written Questions)

To round out our assignment, we'll use our TreeStringSet as an actual dictionary of English-language words.

We've provided a source file, minispell.cpp, which uses TreeStringSet to provide a proof-of-concept for a rudimentary spelling checker. It reads in a dictionary file containing English-language words, and then another file of words, and looks up all the words in the second file against the dictionary and reports how many matches it found. This functionality is very rudimentary (it doesn't ignore punctuation, offer corrections, or ignore repeated misspellings), but it's sufficient to exercise much of the functionality of our TreeStringSet-based dictionary.

You aren't required to read over the code for this program, but you can if you like.

Let's Go!

Make Sure that Your Makefile Builds minispell

At the start of the assignment, you had to set up your Makefile to build the minispell program even though it didn't work yet. In this part of the assignment we're going to get it working.

It will help if your Makefile has an OPTFLAGS= variable, as you can then set that variable from the command-line to easily have make build everything with optimization, as, for example,

make clean
make OPTFLAGS="-O3 -flto" LDFLAGS="-flto" minispell

Run minispell

If you run the program with the --help argument, it will produce a usage message:

> ./minispell --help
Usage: ./minispell [options] [file-to-check ...]
Options:
  -h, --help             Print this message and exit.
  -f, --file-order       Insert words in the order they appear (default).
  -s, --shuffled-order   Insert words in a random order.
  -b, --balanced-order   Insert words in a balanced order
  -n, --num-dict-words   Number of words to read from the dictionary.
  -m, --num-check-words  Number of words to check for spelling.
  -d, --dict-file        Use a different dictionary file.

Default dictionary file: /cs70/data/smalldict.words
Default file to check:   /cs70/data/ispell.words
> ./minispell
Reading words from /cs70/data/smalldict.words... done!
Inserting into dictionary (in order read)... done!
 - insertion took 0.00075025 seconds
 - 341 nodes, height 340, average depth 170
 - median word in dictionary: 'long'

Reading words from /cs70/data/ispell.words... done!
Looking up these words in the dictionary... done!
 - looking up took 0.0177939 seconds
 - 34831 words read, 325 in dictionary

Exploration

The file smalldict.words contains the dictionary words in alphabetical order, which might seem convenient, but that ordering turns out to be problematic for our tree class.

Look at what changes when you run

  • minispell -s, which reads the same words, but shuffles them into a random order before inserting them into the tree.
  • minispell -b, which reads the same words, but figures out an insertion order that will create a very well-balanced tree.

Next, rerun minispell -s several times. See what changes between runs—pay particular attention to the height of the tree and the average depth.

Do the same with minispell -b.

The ./minispell executable has some additional options. If you give it a filename, it will check that file against the dictionary. You can also use a -d option to load a different dictionary file.

Now try

  • minispell -s -d ../../data/ispell.words ../../data/smalldict.words
  • minispell -b -d ../../data/ispell.words ../../data/smalldict.words
  • minispell -d ../../data/ispell.words ../../data/smalldict.words (you may need to be patient with this one!)
  • Rabbit speaking

    You can press Control-C to quit if you decide to give up waiting for the results on the last example.

Questions

On Gradescope, you'll find some questions asking you to run this program and think about the results.

(When logged in, completion status appears here.)