CS60 final fall 99

6. (Grammars, Parsing, and Regular Expressions) [40 points total]

Consider the following grammar for a subset of the regular expressions over {0, 1}, where S is the starting nonterminal symbol:

  S -> R { '|' R }

  R -> T { T }

  T -> U { '*' }

  U -> 0 | 1 | '(' S ')'

As usual, { } means 0 or more repetitions of what is inside.

(A) (10 points) Write the regular expression, producible from the grammar above, that matches exactly those strings accepted by the NeuroToast DFA in Problem 5(A). In other words, your regular expression should match only those strings yielding unsuccessful toasting runs.

Solution: (one of many)

  (0|1)* (00|01*01*01*0) (0|1)*

(B) (10 points) Create the derivation tree generated by the grammar above in producing the regular expression you wrote in part (A).

Solution:

  The full tree is quite large. As a sample, the
  derivation tree of the subexpression (0|1)* is the following:
  
                              S
			      |
			      |
			      R
			      |
			      |
			      T
			     / \
			    /   \
			   U     *
			  /|\
			 / | \
		       '(' S ')'
		          /|\
		         / | \
		        R '|' R
		        |     |
		        |     |
		        T     T
		        |     |
		        |     |
		        U     U
		        |     |
		        |     |
		        0     1

(C) (20 points) Sketch how a recursive-descent parser would be constructed for this grammar, giving as much detail as needed to be convincing. Your sketch may include reasonable functions, such as those used in the parsing assignment. For example, peek() returns the next character in the input string without moving the internal pointer. (Thus, sequential calls to peek() return the same character.) Also, nextChar() returns the next char in the input string and moves the internal pointer past that char; nextCharIs(char value) checks if the next character is the same as "value" and, if so, removes it from the input and returns true. Don't worry about handling whitespace.


Expr S()
{
  Expr e = R();
  while (nextCharIs('|')) {
    e = new SExpr(e,R());
  }
  return e;  
}

Expr S()
{
  Expr e = T();
  while (nextCharIsLegal()) {
    e = new RExpr(e,T());
  }
  return e;  
}

Expr T()
{
  Expr e = U();
  while (nextCharIs('*')) {
    e = new TExpr(e);
  }
  return e;  
}

Expr U()
{
  if (peek('0') || peek('1')) {
    Expr e = new UExpr(nextChar());
  }
  else if (nextCharIs('(')) {
    Expr e = S();
    if (peek(')') { nextChar(); }
    else { e = new ErrorExpr("parenthesis missing"); }
  }
  else {
    Expr e = new ErrorExpr("unrecognized character");
  }
  return e;  
}

(D) (15 points) Extra credit Show that the expressions that the above grammar produces is not, itself, a regular language. That is, show that no DFA accepts the language of strings producible by the above grammar.

Solution:

  Because arbitrarily deep nested parentheses are
  legal (parts of) strings according to this grammar,
  any DFA would have to accept strings of the form
  (((...(1)...)))
  with equal numbers of parens on each side.
  This would require (at least) one state for
  each possible number of matching parens. But this
  is impossible if the number of states must be finite,
  as in a DFA.