We learned how to write an intepreter — Getting functions-as-values right was a bit tricky
We made a tool to make a parsers — Based on composing functions
Haskell Tip — Point-Free Style
Vocabulary tip: In some parts of mathematics, a “point” means a variable. (It would be easier to understand what we were talking about if we called it “variable-free” style, but that's not the standard terminology).
Contrast
inc x = x + 1
mult3add1 x = 3 * x + 1
with
inc = (+ 1)
mult3add1 = (+ 1) . (* 3)
where
(.) :: (b -> c) -> (a -> b) -> (a -> c)
f . g = \x -> f (g x)
*PicParser> let f = (+1) . (*3)
*PicParser> f 5
16
In point free style, we compose functions rather than use variables. We can use the dot operator to do this.
The dot operator essentially means the following:
f.g = apply_f_to_g_on_arg
where apply_f_to_g_on_arg x = f(g x)
Parsing Continued
Higher-Order Functions
Higher-order functions are functions that take functions and return functions.
They give you the ability to say 'what' instead of 'how'. Higher-order functions use functions as data.
High-Level/Absract view
What is a Parser Cow ?
Is it a Cow ? No, it's a parser that has the ability to look for cows
Has it done anything yet? Not yet, but it has the potential to look for cows.
Parser Cow embodies the knowledge of how try to examine “the input” (whatever that is behind the scenes) and potentially consume some of it to yield a Cow. More succincgly, we could call it a Cow parser.
It isn't a Cow
It hasn't done anything yet, just like a function hasn't done anything until you run it.
Behind the scenes, it's storing Haskell function, but we don't need to bring that up. The fact that the function uses String to represent the input and the Maybe type to handle failure is an implementation detail.
Combining Parsers
Our Parser Combinators let us take some simple primitive parsers and join them together to make more sophisticated ones.
Primitives
pfail :: Parser a
return :: a -> Parser a
get :: Parser Char
<|> :: Parser a -> Parser a -> Parser a
>>= :: Parser a -> (a -> Parser b) -> Parser b
Derived from the Primitives
<+> :: Parser a -> Parser b -> Parser (a,b)
<:> :: Parser a -> Parser [a] -> Parser [a]
>>=: :: Parser a -> (a -> b) -> Parser b
>>: :: Parser a -> b -> Parser b
many :: Parser a -> Parser [a]
some :: Parser a -> Parser [a]
optional :: Parser a -> Parser (Maybe a)
Concrete/Implementation/Operational details
Let's label the parts of this:
newtype Parser a = ParsingFunction (String -> Maybe (a, String))
newtype tells us that Parser a is a datatype. ParsingFunction is a tag.
The associated value (String -> Maybe (a, String)) is a function that we pass an input string to, which gives us the parse result, Maybe (a, String), where a is the value that was parsed and the string is what is left over after the parse.
Some Examples
What are these?
digit = get <=> isDigit
alpha = get <=> isAlpha
alphanum = get <=> isAlphaNum
identifier = alpha <:> many alphanum
ident = many whitespace <-+> identifier
identlen1 = ident >>= \i -> return (length i)
identlen2 = ident >>=: \i -> length i
identlen3 = ident >>=: length
hello = text "hello"
What are their types, and what would they do if run on a suitable input?
digit :: Paser Char — Grabs a numeric char if one is given
alpha :: Parser Char — Grabs an alphabetic char if one is given
alphanum :: Parser Char — Grabs an alphanumeric character if one is given
identifier :: Parser [Char] — Grabs an alphanumeric string starting with an alphabetic char
ident :: Parser [Char] — Removes the whitespace from an input, then gets an identifier
identlen1 :: Parser Int — Returns the length of the input identifier, ignoring whitespace
identlen2 :: Parser Int — Returns the length of the input identifier, ignoring whitespace
identlen3 :: Parser Int — Returns the length of the input identifier, ignoring whitespace, also an eta reduction from identlen2
hello :: Parser [Char] — Expects to see "hello" and grabs it
A Grammar for pic
picture :: Parser Picture
picture = some element
element :: Parser Element
element = (direction <+-> text ";" >>=: \d -> Turn d)
<|> (shape <+> attrs <+-> text ";" >>=: \(s,a) -> Draw s a)
attrs = many attr
attr = (litstring >>=: \s -> Label s)
<|> (direction >>=: \d -> Dir d)
direction :: Parser Direction
direction = (rword "left" >>: Lt)
<|> (rword "right" >>: Rt)
<|> (rword "up" >>: Up)
<|> (rword "down" >>: Dn)
shape :: Parser Shape
shape = (rword "box" >>: Box)
<|> (rword "circle" >>: Circle)
<|> (rword "line" >>: Line)
<|> (rword "move" >>: Move)
<|> (rword "arrow" >>: Arrow)
litstring = skipws $
char '"' <-+> stringcontent <+-> char '"'
stringcontent = many stringchar
stringchar = get <=> (/= '"')
<|> (string "\\\"" >>: '\"')
*PicParser> parseFile "test1-bad.pic"
*** Exception: No parse at all
*PicParser> parseFile "test0-bad.pic"
*** Exception: No parse at all
Reading these ... test1-bad.pic
bix "A";
line;
box "B";
and test0-bad.pic which is just a file with four blank lines.
Not very helpful!
The Power of Abstraction
These are primitive parsers and combinators:
pfail
get
return
<|>
>>=
The other parsers and combinators can be implemented in terms of these.
A simple change can allow us to have an error message when our parser fails.
Instead of using the Maybe type, we've replaced it with a type that has an error string instead of just Nothing when the parsing fails.
newtype Parser a = ParsingFunction (String -> ParseResult a)
data ParseResult a = Success (a, String)
| Failure String
To support this change, we just have to change our base primitives. Everything else stays the same.
fail :: String -> Parser a
fail message = ParsingFunction alwaysFail
where alwaysFail _ = Failure message
pfail :: Parser a
pfail = fail "no parse"
The Power of the >>= Operator
Here are some extended examples of the power of the >>= operator.
Arbitrary repeated word
What does this do?
ident >>= text
The parser ident >>= text will remove whitespace from a string and pass the first parsed portion of the string to text where text will look specifically for that string in the remainder of the input and succeed (and produce the string) only if it shows up again.
Example session
*PicParser> let mystery = ident >>= text
*PicParser> let mystery = ident >>= \i -> text i
*PicParser> :t mystery
mystery :: Parser String
*PicParser> parse mystery " melissa jincheng"
"*** Exception: <input>:1:12 -- Expected 'melissa'
*PicParser> parse mystery " melissa melissa"
"melissa"
*PicParser> parse (some mystery) " melissa melissa jincheng jincheng reyna reyna"
["melissa","jincheng","reyna"]
anbncndn
Here's another cool example. Here we parse anbncndn.
Given this code (not shown in class):
countChar :: Char -> Parser Int
countChar c = (char c <-+> countChar c >>=: (+1))
<|> return 0
nChar :: Char -> Int -> Parser Int
nChar c 0 = return 0
nChar c n = char c <-+> nChar c (n-1) >>=: (+1)
Because the right-hand argument to >>= can execute arbitrary code to choose/make the parser to use next, we can do arbitrary parsing.
This expressive power lets us write unusual parsers, such as again demonstrated below:
*PicParser> parse again "world world"
["world"]
*PicParser> parse again "hello world hello world"
["hello","world"]
*PicParser> parse again "hello world hello"
*** Exception: <input>:1:18 -- Expected 'world'
*PicParser> parse again "hello world a b c d hello"
*** Exception: <input>:1:26 -- Expected 'world'
*PicParser> parse again "hello world a b c d hello world"
*** Exception: <input>:1:32 -- Expected 'a'
*PicParser> parse again "hello world a b c d hello world a"
*** Exception: <input>:1:34 -- Expected 'b'
*PicParser> parse again "hello world a b c d hello world a b"
*** Exception: <input>:1:36 -- Expected 'c'
*PicParser> parse again "hello world a b c d hello world a b c"
*** Exception: <input>:1:38 -- Expected 'd'
*PicParser> parse again "hello world a b c d hello world a b c d"
["hello","world","a","b","c","d"]
*PicParser> parse again "a b a b a b"
*** Exception: <input>:1:12 -- Expected 'a'
*PicParser> parse again "a b a b a b a"
*** Exception: <input>:1:14 -- Expected 'b'
*PicParser> parse again "a b a b a b a b"
["a","b","a","b"]
*PicParser> parse again "a a b b"
*** Exception: <input>:1:8 -- Expected 'a'
*PicParser> parse again "a a b b a a"
*** Exception: <input>:1:12 -- Expected 'b'
*PicParser> parse again "a a b b a a b"
*** Exception: <input>:1:14 -- Expected 'b'
*PicParser> parse again "a a b b a a b b"
["a","a","b","b"]
As you can see again reads a sequence of identifiers which must be repeated twice. It can even handle being repeated (e.g., with some or many) as shown below:
*PicParser> parse (some again) "a a a a b b b b"
[["a","a"],["b","b"]]
*PicParser> parse (some again) "a a b b c c d d"
[["a"],["b"],["c"],["d"]]
*PicParser> parse (some again) "a a b b b b c c"
[["a"],["b","b"],["c"]]
*PicParser> parse (some again) "a a a a b b b b"
[["a","a"],["b","b"]]
*PicParser> parse (some again) "a b c d a b c d"
[["a","b","c","d"]]
Wiki Extra: The Code
The code for again is a bit stranger than anything you'll ever need to write. The trick is that we build a parser on the fly (in the argument to fwd) which we later use to read the repeated text.
-- again: accept a sequence of identifiers provided the sequence repeats
-- itself
again :: Parser [String]
again = fwd (return ())
where fwd trailing = ident >>= \i -> return i <:>
let trailing' = (trailing <-+-> text i)
in fwd trailing'
<|> trailing' <-+> return []
Similarly, we can write forwardback which is like again but the repeat is reversed. For example:
*PicParser> parse forwardback "a b c d"
*** Exception: <input>:1:8 -- Expected 'd'
*PicParser> parse forwardback "a b c d d"
*** Exception: <input>:1:10 -- Expected 'c'
*PicParser> parse forwardback "a b c d d c"
*** Exception: <input>:1:12 -- Expected 'b'
*PicParser> parse forwardback "a b c d d c b"
*** Exception: <input>:1:14 -- Expected 'a'
*PicParser> parse forwardback "a b c d d c b a"
["a","b","c","d"]
Wiki Extra: The Code
Although the code above for again is actually easy to adapt to make forwardback we can use slightly less advanced techniques to make forwardback — there is no need to form a Parser and pass it down; simple recursion is enough to get the job done.
-- forwardback: accept a sequence of identifiers provided the sequence
-- repeats itself backwards
forwardback :: Parser [String]
forwardback = ident >>= (\ident -> return ident <:>
( forwardback <+-> text ident
<|> text ident <-+> return []))
If you want to print this page, you can click
to show all the answers.