Harvey Mudd College
Computer Science 60
Assignment 7, Due Friday, March 15, by midnight

Parsing

Here is an applet that will parse logical expressions and construct the resulting syntax tree in both text and graphical form (thanks to Chip Bradford!):

Parse-tree-building applet (works best under Netscape)

The purpose of this assignment is to gain experience in constructing a parser. The parser will parse a user-typed expression. (Bonus credit is available for creating an applet like the one above.)

Like writers of natural-language parsers, your goal is to parse input from a user in order to determine its meaning -- and, from there, to judge whether or not that expression is true. Unlike writers of natural-language parsers, your task is simplified in that you will only have to parse input expressions that are syntactically correct logical statements. In this assignment you will construct a program that converts logical expressions into syntax trees. In Assignment 8 you'll analyze the syntax tree to determine the validity of the original logical expression.

Submitting your code

For this assignment, you will need to create one file, named Parser.java. A skeleton file is available in /cs/cs60/as/a7 under the name of ParserMW.java (for MW's section) or ParserTTh.java (for TTh's section). Be sure to rename whichever file you copy to be Parser.java!.

To submit your Parser.java file, you will need to run

cs60submit Parser.java
from the directory in which that file is located.

Reading

This assignment is covered in Chapter 8 of the book. Since inheritance is relied on (as well as recursion), lots of the previous assignments' ideas will come into play. Most (hopefully all) of the material needed to do this assignment has or will be presented in class.

Testing your code

Test cases for your code for this assignment appear in the directory /cs/cs60/as/a7 . Also, the file to start from is there named /cs/cs60/as/a7/Parser*.java (* is either MW or TTh). You should be able to copy that file to your directory and compile and run it. It will parse OR expressions. (Note: we are using a common, if slightly counterintuitive, notation in which the plus sign '+' indicates OR and the multiplication sign '*' indicates AND. If you consider false/true values to be 0/1, the use of '+' as OR and '*' as AND will seem somewhat justified... .)

To test your java code, you can simply run it on the sample test files named /cs/cs60/as/a7/TestN.in, where N is replaced by the number of the test. You can compare the output to the answers in /cs/cs60/as/a7/TestN.out. For example,

% java Parser < /cs/cs60/as/a7/Test1.in
will show you the result of running the first test. The expected result is in the file /cs/cs60/as/a7/Test1.out. As usual, the diff command can be used -- if it finds no differences, your output has matched the expected output. That is,
% java Parser < /cs/cs60/as/a7/Test1.in | diff - /cs/cs60/as/a7/Test1.out
will return nothing (no differences) if your parser produces the expected output. Of course, copying the test files to your directory will save some typing above (you can get rid of the pathnames).

You may also copy /cs/cs60/as/a7/testall into your directory. Executing it will run all the tests on your parser.

Constructing the Parser

On execution your parser will present the user with a prompt

Type an expression > 
at which the user can input arbitrary logical expressions. Logical expressions consist of the symbols described in the grammar below (in the section The grammar to parse).

For example, the user might input

Type an expression > a + 1 * (bc' + a')
As a logical statement, this input would mean "a OR ( true AND ( NOT bc OR NOT a ) )". Note that variables can consist of more than one character though only lower-case, alphabetic characters with no whitespace are allowed -- things like bc in the previous expression. Constants must be 0 or 1 with 0 meaning false and 1 meaning true.

The parser should then create a syntax tree, which is an Object of type Expr, and then print out that tree with each operator in prefix form. Thus, the correct output in this case is

(OR (var a) (AND (const true) (OR (NOT (var bc)) (NOT (var a)))))
which represents the syntax tree

Functions and classes to include in your code

In a file named Parser.java, you will likely want to have at least the following methods (some are started for you):

See the grammar below for additional explanation of these different types of expressions. You may add other functions to your code as you see fit, but you need to include the ones listed above. All of the above functions (except main) return an object of type Expr. These objects are described next.

In addition to the above Parser class, you will need to create a hierarchy of classes that represent different kinds of expressions. All of these should derive from a common Abstract Base Class, Expr, provided in Parser*.java. (A class is abstract when it is not intended that objects in the class be constructed directly, but rather only by constructing objects in classes that inherit from the abstract class.) The derived classes (subclasses) will parallel the parsing functions listed above -- in particular, you will need to have VarExpr, ErrorExpr, and OrExpr are already written in the Parser*.java files. For each of these classes, you need to write a toString() method which provides a string representation of objects of the class, as shown in the list above. The signature of toString is
public String toString();
The strings on the left of the above list are meant as guides for what toString() should output. In addition, each of these derived classes from Expr must have a constructor and necessary data members. The examples OrExpr and VarExpr in the file /cs/cs60/a7/Parser*.java should guide you.

Tokenizing vs. Parsing

This assignment is primarily intended to familiarize you with implementing a "recursive-descent" parser, so called because the syntax tree is built through mutually recursive method calls -- one method per nonterminal symbol in the grammar. The raw material for parsing is a list of tokens taken from the user's input. Two different methods for tokenizing are provided, because of slight differences in each section's presentations. You may use either to help construct your parser.

MW section (Prof. Keller)

You will need to copy the files ParserMW.java and ParseFromString.class to your directory. Be sure to change the name of ParserMW.java to Parser.java. You will then be able to use the utilities skipWhitespace, nextCharIs, and peek, as discussed in class.

TTh section (Prof. Dodds)

You will need to copy the file ParserTTh.java. to your directory. Be sure to change the name of ParserTTh.java to Parser.java. The tokenizing code is provided for you, as discussed in class. If (at some point in the future) you need to construct a Java program that handles input from the terminal, the Tokenizer class is an example of one way to do this.

Both files will compile and run without any changes. (Try it!) However, they only parse a small subset of the full grammar you need to implement for this assignment.

The grammar to parse

The grammar specified below uses the following meta-symbols:


|   the vertical bar separates alternatives

{ } curly braces denote "0 or more repetitions of" what they contain

[ ] square brackets denote "0 or 1 repetitions of" what they contain,
       that is, an "option"

Table of symbols used in forming logical sentences:

Symbol Meaning
0 FALSE
1 TRUE
any group of letters a proposition variable
postfix ' NOT
infix * AND
infix + OR
infix > IMPLIES
infix = IFF


Order of precedence (from strongest- to weakest-binding) is: ' * + > =. Parentheses are used to enforce grouping.

The full grammar is the following:

    E -> I [ = I ]            Expression (or Equality expression)
    I -> S [ > S ]            Implication
    S -> P { + P }            Sum ("OR")
    P -> L { * L }            Product ("AND")
    L -> U { ' }              Literal (the ' is a single quote)
    U -> V | C | ( E )        Unit
    V -> A { A }              Variable name
    C -> 0|1                  Constant
    A -> a|b|c|d|e|f|g|h|i|j  Alphabetic character
        |k|l|m|n|o|p|q|r|s|t
        |u|v|w|x|y|z

Whitespace is allowed between any tokens, but not between letters in a variable name.

Never enough Java?

For optional bonus credit (up to 20%), turn your parser into an applet with functionality similar to the one linked at the top of this page. (Another parse-tree-creating example is available here. This example handles arithmetic expressions, rather than logical ones, but it has the attractive feature of updating the tree as it's typed, rather than waiting for the user to hit return. To submit the extra credit, simply use cs60submit as usual with the files needed and the html page that loads your applet.