Optimization
The regexMatch code should work for all inputs, but for large inputs, the computed derivative may be an extremely large and unnecessarily complex regular expression. Let’s fix that.
Our approach will be to implement a function optimize that takes in a Regex and uses equivalence rules to transform the regular expression into an equivalent, but hopefully smaller, expression.
To Do
To take full advantage of optimization, our implementation of
derivshould produce optimized expressions. We will need to modify the code insrc/Evaluation.hs:- Rename the
derivfunction toderiv', but do not rewrite any calls toderiv. - Now, define
derivto do optimization (you will need to importoptimize):
deriv :: Char -> Regex -> Regex deriv c r = optimize (deriv' c (optimize r))- You should now be able to run the matching tests and get the same results as before.
- Rename the
In
src/Optimization.hs, add additional cases/code to theoptimizefunction so that it returns a “more optimized” regular expression, by taking advantage of the regular-expression equalities at the bottom of this page. More specifically,- the
optimizefunction should use pattern-matching to detect whether the input to the function has the form of the left-hand side of an equality at the bottom of this page, and instead return the corresponding right-hand side. (The provided code should be kept as a final “catch-all” case, so that theoptimizefunction behaves as before when the inputs do not match any of the left-hand sides.) - You will want to use recursion to optimize sub-expressions.
- Use the test suite to verify that you haven’t broken anything. It is easy to make small mistakes or typos; you will avoid a lot of headaches if you re-run the tests after every small change!
- the
Regular-expression equalities
∅r = ∅
r∅ = ∅
εr = r
rε = r
¬∅ | r = ¬∅
r | ¬∅ = ¬∅
∅ | r = r
r | ∅ = r
∅ & r = ∅
r & ∅ = ∅
¬∅ & r = r
r & ¬∅ = r
(r*)* = r*
ε* = ε
∅* = ε
¬(¬r) = r
r | r = r
r & r = r
r | s = s | r
r & s = s & r
(r s ... ) (t u ...) = (r s ... t u ...)
r (s t u ...) = (r s t u ...)
(r s t ...) u = (r s t ... u)
(r|s|...) | (t|u|...) = (r|s|...|t|u|...)
r | (s|t|u|...) = (r|s|t|u|...)
(r|s|t|...) | u = (r|s|t|...|u)
(r&s&...) & (t&u&...) = (r&s&...&t&u&...)
r & (s&t&u&...) = (r&s&t&u&...)
(r&s&t&...) & u = (r&s&t&...&u)