Harvey Mudd College
Computer Science 42
Assignment 3, Due Monday, September 20, by 11:59pm

Unicalc with Error Bars and Graphs and Trees

This assignment has two problems and one extra credit option.  In problem 1 you will continue your development of the Unicalc application by adding error-propagation to your back-end functionality.  You're ready to start this problem right away. In problem 2 you will work with a graph to find the shortest path between two nodes.  You'll want to wait until after Thursdays lecture to start this problem.  The extra credit problem is to implement a game of 20 questions.  Again, you're ready to go on this one.  You don't have to do the extra credit of course, but it's a lot of fun and it'll give you practice with trees, which you will be responsible for understanding on the midterm and final exams.

Design, commenting, testing and formatting

The same guidelines apply as for homework 1 and homework 2.

Testing

A few tests are provided below, but as these problems become larger, it becomes increasingly important to create and use additional tests of your own -- not only for top-level functions, but for helper functions, as well.  For problem 1 and the extra credit, you do not have to include additional tests, but for problem 2 you should provide at least one additional test.  Again make sure these tests are clearly identified for the graders with comments in your code.

The Problems

Problem 1 [40 points]: Unicalc with Error Bars [Individual or Partner], submitted in a file named ucErrorProp.rkt

Supplied files

First, download the support files for this part of the assignment:

Background

One of the important things you will learn in some of your other classes (if you haven't already) is that unit calculations are rarely exact.  For example, consider the problem of calculating the average velocity of a unicyclist as she races across campus.  One way you might do this is to measure the distance between West Dorm and Hoch Shanahan, and you might even put tape marks on the ground.  Say this distance is 22.15m. Now, as soon as you see her cross the first tape mark, you start your watch.  When you see her cross the second mark, you stop your watch.  9.95 seconds it says.  How fast was she going?  Simple, right?  v = x / t = 22.1m/9.95s = 2.32 m / s.  A perfect problem for Unicalc!

But wait, not so fast... your friend was also timing her, and he says that she took 10.32 seconds.  Furthermore, your other friend re-measured the distance between the tape marks, and she says that the distance is actually 22.07m. Hmmm, things are getting a little more complicated...

As you may have already learned, whenever you take a measurement, there is some room for error in that measurement.  As scientists, we quantify this error, so that we can  reason rationally about it, rather than simply guessing about how it might affect our calculations.  

In the above problem, each measurement has its own error.  When you put them together to calculate the velocity, you need to somehow account for, or propogate, the error through the equation.  Fortunately there is a pricipled way to do this, and this is what you will build into Unicalc this week.  

The details of error propagation are beyond the scope of this class.  You'll learn about the theory behind error propagation in Physics and Chemistry.  (Though if you can't wait, you can take a look at this site.)   For now, all you need to know are the equations for error propagation for standard arithmetic operations.

Let's assume that you have two quantities, x and y, with a standard (i.e. normally distributed, random) error Ex and Ey.  For example, instead of knowing that x is exactly 22.1m, you estimate that x is 22.1m plus or minus 0.05m.  That is, x=22.1m and Ex=0.05m.  Again, how exactly you would calculate this standard error we won't deal with here.  We'll assume the errors are given.  

Now you wish to find the standard error of the result of performing some artimetic operation on x and y.  Call the result z, and the resulting error Ez.

Here are the rules for error propagation:

Sum/difference:
z = x + y OR  z = x - y
Ez = sqrt( Ex^2 + Ey^2 )

Product/quotient
z = x*y  OR z = x / y
Ez / z = sqrt( (Ex / x)^2 + (Ey / y)^2 )

Power
z = x^m
Ez / z = sqrt( (m*Ex/x)^2 )

Note that error propagates the same way for addition and subtraction and also for multiplication and division.   Also, note that the error associted with x^2 CANNOT be calculated by applying the multiplication rule on x*x.  This is because the multiplication rule assumes that the two quantities (and their errors) are independent.  This is obviously not the case for x compared to itself.

Your Task

Extend your unicalc functions from assignment 2 to include error propagation.  This will involve the following:

1. Extend your data representation to represent Uncertain Quantities, that is, to include a value and a a standard error term in addition to the units in the numerator and denominator.   It's up to you how you represent the data, but you should support the following functions:
Here are some tests for the above functions.  Note that no tests are provided for make-QL or make-UQL because it doesn't matter how you represent your Quantities, as long as the accessor functions return the proper values.  You do not have to provide any additional tests for these functions.

(define Q1 (make-QL 3.3 '(second) '(meter meter)))
(define Q2 (make-UQL 4.23 0.23 '() '(kilometer watt)))
(test (get-quant Q1) 3.3)
(test (get-quant Q2) 4.23)
(test (get-error Q1) 0)
(test (get-error Q2) 0.23)
(test (get-num Q2) '())
(test (get-den Q1) '(meter meter))

2. Modify all of your unicalc functions from Assignment 2 (slightly modified, see below) so that they handle error propagation correctly.  As a reminder, here are the functions from assignment 2.

Function Call Form Meaning
(normalize-unit Unit) Returns a normalized Uncertain Quantity for a single unit.
(normalize Quantity) Converts any Uncertain Quantity to a normalized Uncertain Quantity.
(multiply Quantity1 Quantity2) Multiplies two Uncertain Quantities, returning an Uncertain Quantity.  Note: This week you do not need to assume the Quantities are normalized, and you should return a normalized quantity only if the inputs were already normalized.   However, your units should be simplified (canceled). 
(divide Quantity1 Quantity2) Divides Uncertain Quantity1 by 
Uncertain  Quantity2, returning an Uncertain Quantity.  Note: This week you do not need to assume the Quantities are normalized, and you should return a normalized quantity only if the inputs were already normalized.  However, your units should be simplified (canceled). 
(add Quantity1 Quantity2) Adds Uncertain Quantity1 to Uncertain Quantity2, returning a normalized Uncertain Quantity, provided the quantities are interconvertible. Returns (error "illegal add" Quantity1 Quantity2) value if not.   Note: This week you should not assume the inputs will be normalized; however, you can (and should) feel free to normalized the inputs within the add function.  The output should always be normalized.  
(subtract Quantity1 Quantity2) Subtracts normalized Uncertain Quantity2 from normalized
Uncertain  Quantity1, returning a normalized Uncertain Quantity, provided the quantities are interconvertible. Returns (error "illegal subtract" Quantity1 Quantity2) value if not.  Note: This week you should not assume the inputs will be normalized; however, you can (and should) feel free to normalized the inputs within the subtract function.  The output should always be normalized.  
(power Quantity1 p) Raises Uncertain Quantity1 to the integer power p returning a normalized Uncertain Quantity.  You may assume that p will be an integer (though it may be positive, negative or 0).  Note: This week you do not need to assume the Quantities are normalized, and you should return a normalized quantity only if the inputs were already normalized.   However, your units should be simplified (canceled). 

You may extend your own functions from last time, or you may use our solutions, which are provided at the top of this problem.

Test files are also provided at the start of this problem.  Be sure your functions work on all of the tests provided last week as well as all new tests for this week.  You do not have to provide additional tests for this problem.

Problem 2 [60 points]: Graphs and Direction Finding [Individual], submitted in a file named shortestPath.rkt

Graphs are an extremely important data structure in computer science.  As we discussed in class, they can represent any number of things, but a popular use of graphs is to represent cities on a map.  Used in this way, an important ability is to find the shortest path from one node in the graph (city) to another.

Write a function (min-path a b G), which takes in two nodes a and b and a directed, positively-weighted graph G. Graph G will be represented as a list of edges: (src dst weight). See Gt, below, for an example.

Your min-path should return a list whose first element is the minimum distance from a to b in G and whose remaining elements list the nodes, in order, on the minimum path from a to b. The minimum path from a to itself is simply a. If no minimum path exists, omit the nodes. If more than one equal-cost minimum path exists, your function may return any one of them. (We will only test your code on goals and graphs offering a single minimum path.) As examples:

(define Gt '((e b 100) (a b 25) (e a 42) (a c 7) (a d 13) (a e 15)
(b c 10) (b d 5) (d e 2) (b e 100) (c d 1) (c e 7)) )
(test (min-path 'a 'e Gt) '(10 a c d e))
(test (min-path 'e 'b Gt) '(67 e a b))
(test (min-path 'd 'd Gt) '(0 d))
(test (min-path 'a 'z Gt) '(+inf.0)) ;; z's not there!

To help sanity-check your tests, here is a picture of Gt:

Feel free, also, to use a subgraph for debugging.

For this problem, you should also test on at least 1 other pair of nodes from Gt.  Include this test in your code with a comment that clearly identifies it as your additional test.

Hints

  1. Break this problem down into many subfunctions.  Try to do as little as possible with each subfunction
  2. As suggested in class, maintain a list of "paths so far" complete with their costs.  Keep this list in order from smallest cost to largest cost.  Always continue exploring from the first path in the list.  You might find it helpful to actually store nodes in reverse as you compute the path, and simply reverse them at the end once you find the complete path.

Extra Credit [up to +20 points]: Trees [Individual], submitted in a file named twentyQ.rkt

Although this question is optional, you will be responsible for understanding and working with trees, so at least looking at it is strongly recommended.

20 Questions!

This problem asks you to implement the game of 20 questions using trees.

First, to give you a sense of the game of 20 questions, here is an example of a run in Racket. The user's input is in blue
> (twenty-questions)

Is it bigger than a breadbox? (y/n) y
Is it a computer scientist? (y/n) n
What was it? a chemist
What's a question distinguishing a computer scientist and a chemist? Do you titrate?
And what's the answer for a chemist? (y/n) y
Play again? (y/n) y
Is it bigger than a breadbox? (y/n) y
Does you titrate? (y/n) y
Is it a chemist? (y/n) n
What was it? Wally Wart
What's a question distinguishing a chemist and Wally Wart? Is it concrete?
And what's the answer for Wally Wart? (y/n) y
Play again? (y/n) y
Is it bigger than a breadbox? (y/n) y
Do you titrate? (y/n) y
Is it concrete? (y/n) y
Is it Wally Wart? (y/n) y
I got it!
Play again? (y/n) n

Features

Your twenty questions should allow users to

Many of the questions can be yes/no questions. You are welcome to use the askyn function below to handle those. In particular, your game should always allow y to represent a "yes" answer. Also, n should always be allowed to represent a "no" answer. This will make the submissions simpler for the graders to try out!

Representation

For consistency, your game should always start with the question Is it bigger than a breadbox? and should have the objects spam and a computer scientists available as guesses if the user answers the initial question with a n or a y, respectively.

You are welcome to represent the game's data however you like. However as a guide to one possible representation (the one used for the example program demoed in class), you might consider a tree in which internal nodes are questions and leaf nodes are objects, e.g.,

(define initialTree '("Is it bigger than a breadbox?" "spam" "a computer scientist"))
Strings are likely the simplest data structure for questions and objects. The function string-append, along with the other string functions noted on the Scheme reference card may be of use.

Possible decomposition

As with the data structure, the choice here is yours. As one possible suggestion to get you started (if you'd like), consider the decomposition of the demo version, which included

  1. A function named (tq-once tree) that plays the game of 20 questions, as described above, and then returns the augmented tree after one complete round of the game. If the computer guesses the object correctly, it would simply return the same tree it was originally passed.
  2. A function named (tq-continuer tree) that calls tq-once in order to run one round of the game. When finished, this function should ask if the user wants to play again and take the appropriate action.
  3. A function named (twenty-questions) that calls tq-container using the default tree named initialTree.

Input:

Here are two functions demonstrating how to use (read-line) to grab a line of user input. The first tests the resulting value (which will be a string). The second is a more general-purpose question-asking function:

(define (askyn Q) ;; asks a yes/no question Q and returns #t or #f
(begin
(display Q)
(display " (y/n) ")
(if (equal? "y" (read-line))
#t
#f)
))


(define (ask Q)
(begin
(display Q)
(read-line)))

Hints:    (feel free to disregard!)

The binary tree used in this game is slightly different than the numeric ones used in last week's HW. In this case, the leaves are not empty, but contain possible objects; internal nodes contain questions. After the first round in the example above, the tree's structure would be

("Is it bigger than a breadbox?" "spam" ("Do you titrate?" "a computer scientist" "a chemist"))
Note that, by our convention, "no" traverses to the left and "yes" to the right.

A crucial facet of the game is that (tq-once tree) must return a valid twenty-questions tree from all of the different conditional branches it handles. That returned tree will be augmented by an additional question and an additional object if the previous run did not guess the object correctly. If it did guess the object correctly, the original tree will be returned.

The tq-continuer function will want to give a name to the value of that new tree (using let is one way to do this). It will then ask the user whether they would like to continue and use that new tree as appropriate.

As with any large functional program, the key is to break up the tasks into manageable chunks, write functions that handle those pieces, and then compose them into a solution. Write a number of helper functions that will keep your code succinct and straightforward.