CS 60, Assignment 13

Computer Science 60
Principles of Computer Science
Fall 2007

Assignment 13: Language reflection and algorithm efficiency...

This assignment is due by Wednesday, December 12 at 11:59 pm.

IMPORTANT: Revised intructions for HW13 submission:

Do not submit HW13 through your dropbox on Sakai. Instead, go to the Assignments section of the Sakai site. There you will find one assignmnet: Homwork 13. Click on that assignmnet and follow the instructions for submission. You must attach a single file, which is a zip file called hw13.zip containing all of the files listed below.

In the first question of this assignment there is no programming - for those you should write up your answers (proofs or analyses) in a plain-text, pdf or Microsoft Word file and submit it as part1.txt, part1.pdf or part1.doc. For the second problem, you will submit 4 files: 3 code files (align.scm, Align.java and align.pl) and 1 text file (part2.txt). For the third problem, you should submit Wally.java. Again, zip all of these files into a single file called hw13.zip.

Watch out because Sakai only allows you to submit your assignment ONCE. So, IF YOU ARE TAKING A LATE DAY, you must let me know so that I can enable resubmission after you submit the Wally problem on 12/12.

Late days for this assignment...

You may use your remaining late days on this assignment, except for problem #3, the Wally problem. Since we will talk about that program on Thursday, December 13 in class, there are no late days available for that one.

Fast Multiplication! [40 Points]

In this problem we consider the problem of multiplying two n-bit numbers, x and y. For simplicity, we will assume that n is a power of 2. This is generally the case in any real-world computer architecture, but it's also true that relaxing this assumption does not change the results of the analysis.
- Explain briefly why the running time of the standard multiplication algorithm -- the one learned in grade school, but in base 2 instead of base 10, has a running time which is O(n^2).
- Professor Dee Videnconker has proposed the following alternative approach to multiplying two n-bit numbers x and y.
  
  Notice that x has n/2 "upper bits" which we'll call "a" and n/2 "lower bits" which we'll call "b". Similarly, dividing y in two parts gives us the upper n/2 bits, which we'll call "c", and the lower n/2 bits, which we'll call "d". So, x = a 2^{n/2} + b and y = c 2^{n/2} + d. What's going on here? Notice that multiplying "a" by 2^{n/2} "shifts" it by n/2 positions. The number "a" by itself is a n/2 bit number. After the multiplication by 2^{n/2}, the bits of "a" have been "promoted" so that we now have a n-bit number in which the first n/2 bits are the bits of "a" and the last n/2 bits are all 0's. Now, adding "b" to this gives us a n-bit number in which the first n/2 bits are "a" and the second n/2 bits are "b". That is, we now have written x in terms of "a" and "b". We do the analogous thing for expressing y in terms of "c" and "d".
  
  So, back to Professor Videanconker's algorithm! The algorithm works like this: Divide x into two n/2-bit numbers "a" and "b" and similarly divide y into two n/2 bit-numbers "c" and "d". Now, xy = (2^{n/2}a + b)(2^{n/2}c + d) = 2^{n}ac + 2^{n/2}(ad + bc) + bd. We compute "ab" using a recursive call (now on numbers that are n/2 bits long!) and similarly we compute "ad", "bc", and "bd". After the recursive calls are complete we do the multiplications by 2^{n} and 2^{n/2} followed by some addition.
  
  Notice that multiplying by 2^{k} is just the same as adding k 0's to the end of the number. This can be done in k time steps. Similarly, adding two j-bit numbers can be done in j time steps.
  1. Write down a recurrence relation for this algorithm.
  2. Solve the recurrence relation using the recurrence-unwinding method. Show your work in complete detail. Give your running time in big-O notation. Your final answer should be O(n^k) where k is some real number.
  3. How does the running time compare to the standard algorithm for multiplying two n-bit numbers?
  4. That wasn't great! To do better, some fast multiplication circuits use the following more clever divide-and-conquer algorithm. Again, divide x and y in half. Thus, x = a 2^{n/2} + b and y = c 2^{n/2} + d where a, b, c, d are n/2-bit numbers. The product of x and y, call it z, can now be computed by the following steps:
```
 temp1 = (a+b)(c+d)
 temp2 = ac
 temp3 = bd
 z = temp2 2^{n} + (temp1 - temp2 - temp3) 2^{n/2} + temp3
 
```
    Nothing for you to do here! This is just the description of the algorithm.
  5. Explain why z is in fact the product of x and y.
  6. Observe that the terms (a+b) and (c+d) may have either n/2 or n/2 + 1 bits each. For simplicity, assume for now that they each have exactly n/2 bits (the case of n/2 + 1 bits isn't hard to deal with, but let's not worry about it here). Notice that the above algorithm performs 3 multiplications of n/2 bit numbers plus some additions and shifts. (Multiplying a binary number by a number of the form 2^k can be accomplished by simply performing k bit shifts to the left! This takes just k units of time.) Now, just write down the recurrence relation (an expression for T(n), including a base case) for this algorithm.
  7. Find the big-O running time of the divide-and-conquer algorithm by unwinding this recurrence relation and then obtaining a solution. For full credit, be sure to express your final answer as O(n^k) where k is some real number.
  8. How does this algorithm's running time compare to the running time of the standard O(n^2) algorithm?
Language Comparison [35 points]

The problem you will address is the problem we looked at on the first day of class: the problem of DNA sequence alignment (taken from a homework assignment by Sedgewick and Wayne: http://www.cs.princeton.edu/courses/archive/fall07/cos126/assignments/sequence.html). Follow the link and read the assignment up through the part where it talks about how to measure the distance between to sequences. Don't worry about the dynamic programming part (although you might want to read it as you might find it helpful in general... with other computational problems). We will not implement an algorithm for finding the minimum edit-distance alignment here. We will only implement the edit-distance function for two given lists of DNA sequences.

Part 2.1: Implementation (15 points)

Implement the following:

In Scheme, in a file called align.scm, implement:
(edit-dist L1 L2)
which is a function that returns the edit distance between the sequences defined by L1 and L2, two lists of genetic information. Blanks will be represented with the symbol 'blank, and genes will be represented by alphabetic symbols. For example:
>>(edit-dist '(A A C A G T T A C C) '(T A blank A G G T blank C A))
7
In Java implement a class called Align (in a file with the same name .java) with one method:
public int editDist(char[] seq1, char[] seq2);
This function should again return the edit distance between seq1 and seq2. Blanks will be represented with a #. E.g.,
editDist( {'A', 'A', 'C', 'A', 'G', 'T', 'T', 'A', 'C', 'C'}
{'T', 'A', '#', 'A', 'G', 'G', 'T', '#', 'C', 'A'} );
Should return 7.
In Prolog, in a file called align.pl, write a predicate editDist(L1, L2, X) that is true when X represents the edit distance between L1 and L2. X should be able to be a variable, but L1 and L2 will always be bound. Blanks will again be represented with the literal symbol blank. E.g.,
editDist([a, a, c, a, g, t, t, a, c, c], [t, a, blank, a, g, g, t, blank, c, a], X).
X = 7 ;
No.

L1 will never contain blanks
L2 and L1 will always be the same length
Avoid "magic constants" and maintain good programming syle.

Part 2.2: Relection (20 points)

After you have written your code, imagine that you wanted to implement a complete solution to the sequence alignment problem (i.e., find the minimum edit-distance alignment). You don't actually have to implement anything, just think about how you would implement it. Also, you don't have to implement a fast algorithm--brute force is fine.

Now, choose the language that you feel would be best suited for this implementation, and write a short essay (2-3 paragraphs) explaining and defending your choice of language. You should address your argument to your classmates--that is, students currently in CS60. Try to convince them that your choice of language is the best choice based on your experience implementing the functions above and your thoughts about how you will implement the rest of the program. Be clear about your evidence. We will grade you on how convincing your argument is, taking into account how clear your examples are, and how clean your writing is.

In your argument, you may wish to consider any or all of the following:

How will you store the data?
How will you generate possible alignments?
How readable is your code?
How simple is your code to write?
How easy is your code to debug and how likely is it to have errors

You specifically should not worry about running time in this problem. Turn in this writeup in a file called part2.txt.

Extra Credit on this part: If you actually do implement the full optimal alignment program you will receive 5 points extra credit. Place the extra credit code in the same file as the code for the required section, and include a comment at the top of the file and in your writeup file.

Wally Wends Westward... [50 Points]

This problem provides a chance to exercise your own algorithm-design skills, to estimate the big-O complexity of your algorithm, and then to try out your code to see how quickly it runs in practice on problems of increasing size.
The problem

Wally is unicycling back to Mudd (@ the southwest corner of the map) after spending the summer in New England (@ the northeast corner). A square grid has been overlaid on the map, with each cell containing a positive number -- namely, the number of strawberry-filled DonutMan donuts that Wall will need to consume in order to have enough energy to pass through that cell.

Your task is to maximize donut conservation. And for this problem specifically, you should write
- a class named Wally in a file named Wally.java
- a method whose signature is
```
public static int minDonuts(int[][] map)
```
  This method should return the minimum number of donuts needed to get from the NE corner of the input map to the SW corner.
The rest is up to you... the algorithm, the software structure (other classes, other meethods), and the testing you do.

Example

For example, here is an image of a 5x5 donut-consumption grid and the correct minimum answer. Note that the start cell will be in the final spot in the first row, or map[0][N-1], and that the destination cell is in the final spot of the first column, or map[N-1][0], where N is the length of a side of the square map.

Below it is the path that produced that answer:

The write-up

You should comment your code in order to explain how your algorithm works, as usual. However, you should also include a comment at the top of your file that includes
- a paragraph on the overall approach you took and the algorithm that resulted
- a sentence or two on how you tested your algorithm
- a sentence or two on the big-O running time of your algorithm, with justification.
- [optional] any modifications you made to speed the code up beyond algorithm design (or otherwise creative modifications you might have made to the code...)
You may cite any results from class to justify your claim to a particular big-O running time, but you may also need to supplement those results with additional analysis of your own.

For uniformity, we ask that everyone define the problem size, N, to be the length of ONE SIDE of the square donut-consumption grid.

Submission

Submit Wally.java as instructed above, under assignment #13.

Scoring

Your solution will receive full correctness credit if it runs in "reasonable" time (under ten seconds on a 6x6 grid) and it works correctly, i.e., it computes the true minimum donut cost. Slow correct solutions will receive almost all of the points (unless they are bogosort-slow). Incorrect solutions, fast or slow, will receive almost no points.

The other half of the credit will be for your algorithm deisgn (not speed, just thoughtfulness), explanation, program design, and commenting.

Extra Credit Options
- The need for speed We will measure the fastest algorithms to determine which has the best real running time for large input sizes. Fame, fortune, or glory (as well as bonus points) are all possibilities for exceptional entries...
- Elegance or eccentricity Unusual and creative extentions in terms of code functionality, formatting, or any other facet you might want to pursue are welcome. Please explain in your top-of-file comment what you've done! Prizes are identical as in the previous option.

ANOTHER OPTIONAL BONUS PROBLEM (20 Points)

Consider two regular expressions. Each of these RE's defines a language of strings. Alternatively, one might say that the REs "generate" or "accept" their respective languages. In every case, such a language must be regular.

For this problem, show how a "Regular-expression autograder" CAN be built. This regular-expression autograder should be a program that takes as input two RE's and determines whether or not the two RE's define the same language of strings.

We showed that such an autograder is not possible for Turing Machines, but amazingly, it is possible for RE's! You may cite any results from class to help with your construction.

Computer Science 60 Principles of Computer Science Fall 2007

Assignment 13: Language reflection and algorithm efficiency... This assignment is due by Wednesday, December 12 at 11:59 pm.

IMPORTANT: Revised intructions for HW13 submission:

Late days for this assignment...

Computer Science 60
Principles of Computer Science
Fall 2007

Assignment 13: Language reflection and algorithm efficiency...

This assignment is due by Wednesday, December 12 at 11:59 pm.