Harvey Mudd College
Computer Science 60
Assignment 7, Due Thursday, March 9, by midnight

Parsing

Here is an applet that will parse logical expressions and construct the resulting parse tree in both text and graphical form (thanks to Chip Bradford!):

( Parse-tree-building applet)

The purpose of this assignment is to gain experience in constructing a parser. The parser will parse a user-typed expression. (Bonus credit is available for creating an applet like the one above.)

Like writers of natural-language parsers, your goal is to parse input from a user in order to determine its meaning -- and, from there, to judge whether or not that expression is true. Unlike writers of natural-language parsers, your task is simplified in that you will only have to parse input expressions that are syntactically correct logical statements. In this assignment you will write a program that converts logical expressions into parse trees. In Assignment 8 you'll analyze the parse tree to determine the validity of the original logical expression.

Submitting your code

For this assignment, you will need to create (actually, only alter) one file, named Parser.java.

To submit your Parser.java file, you will need to run

cs60submit Parser.java

Be sure your file is named as indicated above, because the scripts that sort files into appropriate places depend on that name.

Be sure to input that the assignment number is 7.

Reading

This assignment is covered in Chapter 8 of the book. Since inheritance is relied on (as well as recursion), lots of the previous assignments' ideas will come into play. Most (hopefully all) of the material needed to do this assignment will be presented in class.

Testing your code

Test cases for your code for this assignment appear in the directory /cs/cs60/as/a7 . Also, the file to start from is there named /cs/cs60/as/a7/Parser.java. You should be able to copy that file to your directory and compile and run it. It will parse OR expressions. (Note: we are using a common notation (but not Java's) in which the plus sign '+' indicates OR and the multiplication sign '*' indicates AND. If you consider false/true values as 0/1, the use of '+' as OR and '*' as and seems somewhat justified.)

To test your java code, you can simply run it on the sample test files named /cs/cs60/as/a7/TestN.in, where N is an integer from 0 to 6. Then you can compare the output to the answers in /cs/cs60/as/a7/TestN.out.

% java Parser < /cs/cs60/as/a7/Test1.in | diff - /cs/cs60/as/a7/Test1.out
If you see nothing (that is, no "differences" occur), you know your code is producing the correct output. If you do see lines beginning with "<", they are output lines that your program printed , but do not appear in the answers file. Lines beginning with ">" appear in that file, but not your program's output.

Writing the Parser

On execution your parser will present the user with a prompt

Type an expression > 
at which the user can input arbitrary logical expressions. Logical expressions consist of the symbols described in the grammar below (in the section The grammar to parse).

For example, the user might input

Type an expression > a + 1 * (bc' + a')
As a logical statement, this input would mean "a OR ( true AND ( NOT bc OR NOT a ) )". Note that variables can consist of more than one character (though only lower-case, alphabetic characters with no whitespace are allowed). Constants must be 0 or 1 with 0 meaning false and 1 meaning true. The parser must create a parse tree (an Expr) and then print out that tree with each operator in prefix form. The correct output in this case is
(OR (var a) (AND (const true) (OR (NOT (var bc)) (NOT (var a)))))
which represents the parse tree

Functions and classes your code must include

In a file named Parser.java, include the following classes and methods:

Parser should have methods named

See the grammar, below, for an explanation of these different types of expressions. You may add other functions to your code as you see fit, but you need to include the ones listed above. All of the above functions (except main) return an object of type Expr. These objects are described next.

In addition to the above Parser class, you will need to create a hierarchy of classes that represent different kinds of expressions. All of these should derive from a common Abstract Base Class, Expr, provided in Parser.java. The derived classes (subclasses) parallel the parsing functions listed above -- in particular, you will need to create For each of these classes, you need to write a toString() method which provides a string representation of objects of the class. Its signature is
public String toString();
The strings on the left of the above list are meant as guides for what toString() should output. In addition, each of these derived classes from Expr must have a constructor and necessary data members. The examples OrExpr and VarExpr in the file /cs/cs60/a7/Parser.java should guide you.

Tokenizing vs. Parsing

This assignment is primarily intended to familiarize you with implementing a "recursive-descent" parser, so called because the parse tree is built through mutually recursive method calls -- one method per nonterminal symbol in the grammar. A Tokenizer class is provided. The example Parser.java file shows how it works in detail. The overall idea is the following.

Once you have constructed a Tokenizer, call the object t, the call

t.tokenizeNextLine()
returns a Stack of Strings that are the tokens. Your methods will be able to use that stack of strings, named tokens with the following Stack methods: The tokenizing code is provided for you. If (at some point) you need to write a Java program that handles input from the terminal, an example of one way to do so is in the Tokenizer class.

The grammar to parse

The grammar specified below uses the following meta-symbols:


|   the vertical bar separates alternatives

{ } curly braces denote "0 or more repetitions of" what they contain

[ ] square brackets denote "0 or 1 repetitions of" what they contain,
       that is, an "option"

Table of symbols used in forming logical sentences:

Symbol Meaning
0 FALSE
1 TRUE
any group of letters a proposition variable
postfix ' NOT
infix * AND
infix + OR
infix -> IMPLIES
infix = IFF


Order of precedence (from strongest- to weakest-binding) is: ' * + -> =. Parentheses are used to enforce grouping.

The full grammar is the following:

    E -> I [ = I ]            Expression (or Equality expression)
    I -> S [ -> S ]            Implication
    S -> P { + P }            Sum ("OR")
    P -> L { * L }            Product ("AND")
    L -> U { ' }              Literal (the ' is a single quote)
    U -> V | C | ( E )        Unit
    V -> A { A }              Variable name
    C -> 0|1                  Constant
    A -> a|b|c|d|e|f|g|h|i|j  Alphabetic character
        |k|l|m|n|o|p|q|r|s|t
        |u|v|w|x|y|z

Whitespace is allowed between any tokens, but not between letters in a variable name.

Never enough Java?

For optional bonus credit (up to 20%), turn your parser into an applet with functionality similar to the one linked at the top of this page. (Another parse-tree-creating example is available here. It handles arithmetic expressions, rather than the logical ones this assignment is concerned with.)