Computer Science 60
Principles of Computer Science
Spring 2012


Assignment 9: Computability and complexity    [70 Points]
Due Monday, April 9 at 11:59pm

This week we have a light assignment (only 70 points) and no programming! Instead, you will be analyzing the running time of algorithms. We are interested in worst-case analysis. The best case is often uninteresting and the average case can be quite difficult to quantify, because you have to know how likely each input is to occur. (For example, inputs to sorting algorithms are not always sorted or near-sorted, but these cases occur more often than random chance would suggest; not all input permutations are equally likely in practice.) You will need to express the worst-case running time using big-O notation. Be sure to show all of your reasoning in each written problem.


A note about partners: This week Part 1 should be done individually, though as always you can discuss your general approach with anyone else, especially the grutors and profs.  Problem 2 may be done individually or cowritten with a partner (with one submission, written up jointly using a single computer).

Part 1: Big-O running times...

30 points; to be done individually

Submitting these answers    Include your answers to these problems in a plain-text part1.txt file.


  1. What is the big-O running time of this portion of code, in terms of n? Explain your analysis clearly and carefully. (Notice how i is increasing!)
            for ( int i=1; i<=n; i *= 2 )
               for ( int j=0; j<i; ++j )
                  // constant time work here
            
  2. What is the big-O running time of this portion of code, in terms of n? Explain your analysis clearly and carefully. (Notice how j is increasing!)
            for ( int i=1; i<=n; ++i )
               for ( int j=1; j<i; j *= 2 )
                  // constant time work here
            
  3. Consider a directed graph represented as a prolog list of edges, with each edge a [source, destination] pair. For example, G = [ [a,b], [b,a] ] is a two-node graph, with each node connected to the other.

    The reachability problem asks whether it is possible to get from one node to another particular node within a given graph. The following is one possible Prolog solution, named reach, based on the use-it-or-lose-it pattern:

    reach(A,A,_).     % any node is reachable from itself
                      % below we ask you to explain what this next line is "saying"
    reach(A,B,[[X,Y] | R]) :- reach(A,B,R) ; (reach(A,X,R), reach(Y,B,R)).
            
    For example, here are two representative runs:
    ?- reach( b, a, [[a,b],[b,c],[c,d],[d,e],[e,f],[f,g],[g,a]]).
    Yes
    
    ?- reach( b, h, [[a,b],[b,c],[c,d],[d,e],[e,f],[f,g],[g,a]]).
    No
            
    Note that we are not concerned with generating any variable/value bindings here, we are only concerned with checking reachability for a fully-bound set of inputs.

    Briefly explain in English what the second reach rule is "saying". Then, write a recurrence relation for T(n), the running time of the above reach function in terms of n, the number of edges in the graph G. Then, use the recurrence-unwinding strategy to "unroll" this recurrence relation in order to find a big-O running time of this version of reachability. Count each logical operation ; (or) and , (and) as one step. Show all of your work!

Part 2: More Big-O running times...

40 points; may be done in a pair or individually

Suppose we have two files on disk. Each file contains n words (one per line), each file may or may not be in alphabetical order, and each file may or may not contain duplicates. We want to write a single file containing the "union" of the words in the input files. That is, the output file should contain the words that were in one or both input files, but it does not have to be in order. In the following, you may assume that reading or writing a single word in a text file takes constant time. All the collections discussed support iterating through the elements in O(1) time per element.
  1. Suppose we start with an empty HashSet<String>, read the two files and insert each word in turn. Once all the words have been read in, we iterate through the elements of the hashset and write each to disk. What is the asymptotic worst-case running time of the entire process? Carefully justify your answer.

    [Typically we assume that insert, lookup, and remove in a HashSet take O(1) time. If you take CS 70, you'll discuss the exceptions, although none of them are important here.]

  2. Suppose we do the same thing, but use a TreeSet<String> instead of a HashSet<String>. What is the asymptotic worst-case running time? Carefully justify your answer.

    [Balanced trees guarantee that any single insert, lookup, or remove in a tree with k elements takes worst-case time O(log k). Note that the time to insert each element changes as the tree grows.]

  3. Suppose we instead use an unsorted ArrayList<String>. For each word, we check to see if the word is in the (unsorted) arraylist already, and if not, add it at the end with .add(). Then we iterate through the arraylist and print each item. What is the asymptotic worst-case running time? Carefully justify your answer.

    [Recall that ArrayLists are growable arrays. They provide constant-time read and write access to the i-th element, and (for most purposes, as discussed in CS 70) the add() method also takes constant time.]

  4. Suppose we instead use a sorted ArrayList<String>. For each word, we check to see if the word is in the arraylist already (which, since the elements are sorted, we can do using binary search). If not, we insert it into the proper position (which requires shifting (copying) all alphabetically-later words up one position, similar to one step of insertion sort.) What is the asymptotic worst-case running time? Carefully justify your answer.

    [Recall that binary search starts in the middle of a sorted sequence; if we don't find what we're looking for, we know (by the sorting) whether the value we want should be in the first half or the second half, so we recurse. We can do binary search on k elements in worst-case time O(log k).]
Be sure to simplify your big-O running times as much as possible. Submit your work in a text file part2.txt