CS 70

Phase 8: Using Your Search Tree

To round out our assignment, we'll use our TreeSet class template as an actual dictionary of English words.

We've provided a source file, minispell.cpp, which uses TreeSet<std::string> to provide a proof-of-concept for a rudimentary spelling checker. It's basically the same program you saw in Homework 6, but with some additional options to control insertion style.

Let's Go!

Adapt Your Makefile to Build minispell and Build It

Adjust your Makefile so that you build the program minispell (the executable will link just minispell.o since TreeSet is now a class template), and create this executable.

It will help if your Makefile has an OPTFLAGS= variable, as you then can set that variable from the command-line to easily have make build everything with optimization, as, for example,

make clean
make OPTFLAGS="-O3 -flto" LDFLAGS="-flto" minispell

or, if your LDFLAGS variable in your Makefile includes other libraries, you should instead run

make clean
make OPTFLAGS="-O3 -flto" LDFLAGS="-lsomelibraryineed -flto" minispell

where you replace -lsomelibraryineed with whatever else you need to link (because specifying LDFLAGS= on the command line overrides whatever is in your Makefile).

Run minispell

If you run the program with the --help argument, it will produce a usage message:

> ./minispell --help
Usage: ./minispell [options] [file-to-check ...]
Options:
  -h, --help             Print this message and exit.
  -f, --file-order       Insert words in the order they appear (default).
  -s, --shuffled-order   Insert words in random order, by shuffling the input.
  -b, --balanced-order   Insert words in a balanced order
  -r, --random-insert    Use a tree with RANDOMIZED insertion style (default)
  -t, --root-insert      Use a tree with ROOT insertion style
  -l, --leaf-insert      Use a tree with LEAF insertion style
  -S, --use-std-set      Use std::set instead of TreeSet
  -n, --num-dict-words   Number of words to read from the dictionary.
  -m, --num-check-words  Number of words to check for spelling.
  -d, --dict-file        Use a different dictionary file.

Default dictionary file: /cs70/data/smalldict.words
Default file to check:   /cs70/data/ispell.words

If you run it without arguments, you should see output similar to

> ./minispell
Reading words from /cs70/data/smalldict.words... done!
Inserting into TreeSet, using randomized insertion (in order read)... done!
 - insertion took 0.000517011 seconds
 - 341 nodes, height 20, average depth 8.78299
 - median word in dictionary: 'long'

Reading words from /cs70/data/ispell.words... done!
Looking up these words in the dictionary... done!
 - looking up took 0.00717264 seconds
 - 34831 words read, 325 in dictionary

Exploration

There are five dictionary files available in /cs70/data/ that you can use with this program:

smalldict.words
A small dictionary of 341 common English words.
ispell.words
The main dictionary used by the ispell spelling checker, containing 34,831 common words.
largedict.words
A larger dictionary of 281,478 English words, mostly taken from scrabble word lists.
web2
A dictionary of 234,937 words, originally based on some version of “Webster's Dictionary” and provided on most Unix systems.
all.words
The above dictionaries combined, for a total of 410,239 words.

Explore some of the other options to the program by doing some other runs. Also, feel free to run the program multiple times to see if/when the program output varies.

Interesting things to try include running

./minispell -r -d /cs70/data/all.words /cs70/data/web2

and contrasting the results with running

./minispell -S -d /cs70/data/all.words /cs70/data/web2

Questions

On Gradescope, you'll find some questions asking you to run this program and think about the results.

(When logged in, completion status appears here.)