6. (Grammars, Parsing, and Regular Expressions) [40 points total]
Consider the following grammar for a subset of the regular expressions over {0, 1}, where S is the starting nonterminal symbol:
S -> R { '|' R }
R -> T { T }
T -> U { '*' }
U -> 0 | 1 | '(' S ')'
As usual, { } means 0 or more repetitions of what is inside.
(A) (10 points) Write the regular expression, producible from the grammar above,
that matches exactly those strings accepted by the
NeuroToast DFA in Problem 5(A). In other words, your regular expression
should match only those strings yielding unsuccessful toasting runs.
Solution: (one of many) (0|1)* (00|01*01*01*0) (0|1)*
Solution:
The full tree is quite large. As a sample, the
derivation tree of the subexpression (0|1)* is the following:
S
|
|
R
|
|
T
/ \
/ \
U *
/|\
/ | \
'(' S ')'
/|\
/ | \
R '|' R
| |
| |
T T
| |
| |
U U
| |
| |
0 1
Expr S()
{
Expr e = R();
while (nextCharIs('|')) {
e = new SExpr(e,R());
}
return e;
}
Expr S()
{
Expr e = T();
while (nextCharIsLegal()) {
e = new RExpr(e,T());
}
return e;
}
Expr T()
{
Expr e = U();
while (nextCharIs('*')) {
e = new TExpr(e);
}
return e;
}
Expr U()
{
if (peek('0') || peek('1')) {
Expr e = new UExpr(nextChar());
}
else if (nextCharIs('(')) {
Expr e = S();
if (peek(')') { nextChar(); }
else { e = new ErrorExpr("parenthesis missing"); }
}
else {
Expr e = new ErrorExpr("unrecognized character");
}
return e;
}
Solution: Because arbitrarily deep nested parentheses are legal (parts of) strings according to this grammar, any DFA would have to accept strings of the form (((...(1)...))) with equal numbers of parens on each side. This would require (at least) one state for each possible number of matching parens. But this is impossible if the number of states must be finite, as in a DFA.