r1 - 28 Feb 2020 - 22:31:57 - Main.BenWiedermannYou are here: TWiki >  CS131Spring2020 Web  > Week6Class2

Building a Parsing Library (Part 2)

Story So Far...

  • We learned the lambda calculus — Everything can be made of functions
  • We learned Haskell — Lambda Calculus + Types + Lazy Evaluation
  • We learned how to write an intepreter — Getting functions-as-values right was a bit tricky
  • We made a tool to make a parsers — Based on composing functions

Haskell Tip — Point-Free Style

Vocabulary tip: In some parts of mathematics, a “point” means a variable. (It would be easier to understand what we were talking about if we called it “variable-free” style, but that's not the standard terminology).

Contrast

   inc x       = x + 1
   mult3add1 x = 3 * x + 1
with
   inc         = (+ 1)
   mult3add1   = (+ 1) . (* 3)
where
   (.) :: (b -> c) -> (a -> b) -> (a -> c)
   f . g = \x -> f (g x)

*PicParser> let f = (+1) . (*3)
*PicParser> f 5
16
In point free style, we compose functions rather than use variables. We can use the dot operator to do this. The dot operator essentially means the following:
f.g = apply_f_to_g_on_arg
        where apply_f_to_g_on_arg x = f(g x)

Parsing Continued

Higher-Order Functions

Higher-order functions are functions that take functions and return functions. They give you the ability to say 'what' instead of 'how'. Higher-order functions use functions as data.

High-Level/Absract view

What is a Parser Cow ?

  • Is it a Cow ? No, it's a parser that has the ability to look for cows
  • Has it done anything yet? Not yet, but it has the potential to look for cows.

Parser Cow embodies the knowledge of how try to examine “the input” (whatever that is behind the scenes) and potentially consume some of it to yield a Cow. More succincgly, we could call it a Cow parser.

  • It isn't a Cow
  • It hasn't done anything yet, just like a function hasn't done anything until you run it.
  • Behind the scenes, it's storing Haskell function, but we don't need to bring that up. The fact that the function uses String to represent the input and the Maybe type to handle failure is an implementation detail.

Combining Parsers

Our Parser Combinators let us take some simple primitive parsers and join them together to make more sophisticated ones.

Primitives

pfail  :: Parser a
return :: a -> Parser a
get    :: Parser Char
<|>    :: Parser a -> Parser a -> Parser a
>>=    :: Parser a -> (a -> Parser b) -> Parser b

Derived from the Primitives

<+>      :: Parser a -> Parser b -> Parser (a,b)
<:>      :: Parser a -> Parser [a] -> Parser [a]
>>=:     :: Parser a -> (a -> b) -> Parser b
>>:      :: Parser a -> b -> Parser b
many     :: Parser a -> Parser [a]
some     :: Parser a -> Parser [a]
optional :: Parser a -> Parser (Maybe a)

Concrete/Implementation/Operational details

Let's label the parts of this:

newtype Parser a = ParsingFunction (String -> Maybe (a, String))

newtype tells us that Parser a is a datatype. ParsingFunction is a tag.

The associated value (String -> Maybe (a, String)) is a function that we pass an input string to, which gives us the parse result, Maybe (a, String), where a is the value that was parsed and the string is what is left over after the parse.

Some Examples

What are these?

  digit      = get <=> isDigit 
  alpha      = get <=> isAlpha
  alphanum   = get <=> isAlphaNum  
  identifier = alpha <:> many alphanum 
  ident      = many whitespace <-+> identifier
  identlen1  = ident >>=  \i -> return (length i) 
  identlen2  = ident >>=: \i -> length i 
  identlen3  = ident >>=: length
  hello      = text "hello"

What are their types, and what would they do if run on a suitable input?

  • digit      :: Paser Char — Grabs a numeric char if one is given
  • alpha      :: Parser Char — Grabs an alphabetic char if one is given
  • alphanum   :: Parser Char — Grabs an alphanumeric character if one is given
  • identifier :: Parser [Char] — Grabs an alphanumeric string starting with an alphabetic char
  • ident      :: Parser [Char] — Removes the whitespace from an input, then gets an identifier
  • identlen1  :: Parser Int — Returns the length of the input identifier, ignoring whitespace
  • identlen2  :: Parser Int — Returns the length of the input identifier, ignoring whitespace
  • identlen3  :: Parser Int — Returns the length of the input identifier, ignoring whitespace, also an eta reduction from identlen2
  • hello      :: Parser [Char] — Expects to see "hello" and grabs it

A Grammar for pic

picture :: Parser Picture
picture = some element

element :: Parser Element
element =     (direction <+-> text ";"                  >>=: \d -> Turn d)
          <|> (shape <+> attrs <+-> text ";"            >>=: \(s,a) -> Draw s a)
          
attrs = many attr

attr =        (litstring                                >>=: \s -> Label s)
          <|> (direction                                >>=: \d -> Dir d)

direction :: Parser Direction
direction =    (rword "left"                            >>: Lt)
           <|> (rword "right"                           >>: Rt)
           <|> (rword "up"                              >>: Up)
           <|> (rword "down"                            >>: Dn)


shape :: Parser Shape
shape =        (rword "box"                             >>: Box)
           <|> (rword "circle"                          >>: Circle)
           <|> (rword "line"                            >>: Line)
           <|> (rword "move"                            >>: Move)
           <|> (rword "arrow"                           >>: Arrow)

litstring = skipws $
                char '"' <-+> stringcontent <+-> char '"'

stringcontent = many stringchar

stringchar =   get <=> (/= '"')
           <|> (string "\\\""                           >>: '\"')

Demo

And it actually works!

Prelude> :load PicParser.hs 
[1 of 4] Compiling ParserBase       ( ParserBase.hs, interpreted )
[2 of 4] Compiling ParserCombinators ( ParserCombinators.hs, interpreted )
[3 of 4] Compiling Absyn            ( Absyn.hs, interpreted )
[4 of 4] Compiling PicParser        ( PicParser.hs, interpreted )
Ok, modules loaded: PicParser, Absyn, ParserBase, ParserCombinators.
*PicParser> parseFile "test1.pic"
[Draw Box [Label "A"],Draw Line [],Draw Box [Label "B"]]
*PicParser> parseFile "test2.pic"
[Draw Circle [Label "User",Label "input"],Draw Arrow [Label "steps 1",Dir Rt,Label "and 2"],Draw Box [Label "Magic",Label "happens"],Draw Arrow [Label "step 3",Dir Rt,Dir Dn],Draw Box [Label "Output"]]
*PicParser> parseFile "test3.pic"
[Draw Box [Label "Abc",Label "Def",Label "Ghi"],Draw Arrow [Dir Rt,Dir Dn],Draw Box [Label "Jkl",Label "Mno"],Draw Move [Dir Dn,Dir Lt],Draw Box [Label "pqr",Label "stu"],Draw Arrow [Dir Lt,Dir Up],Draw Circle [Label "Vwx",Label "Yz"],Draw Line [Dir Up,Dir Rt]]

So this is all pretty neat.

But...

Actually, it's not so great...

*PicParser> parseFile "test1-bad.pic" 
*** Exception: No parse at all
*PicParser> parseFile "test0-bad.pic" 
*** Exception: No parse at all

Reading these ... test1-bad.pic

bix "A";
line;
box "B";

and test0-bad.pic which is just a file with four blank lines.

Not very helpful!

The Power of Abstraction

These are primitive parsers and combinators:

  • pfail
  • get
  • return
  • <|>
  • >>=

The other parsers and combinators can be implemented in terms of these.

A simple change can allow us to have an error message when our parser fails.

Instead of using the Maybe type, we've replaced it with a type that has an error string instead of just Nothing when the parsing fails.

newtype Parser a = ParsingFunction (String -> ParseResult a)

data    ParseResult a = Success (a, String)
                      | Failure String

To support this change, we just have to change our base primitives. Everything else stays the same.

fail :: String -> Parser a
fail message = ParsingFunction alwaysFail
    where alwaysFail _ = Failure message

pfail :: Parser a
pfail = fail "no parse"

The Power of the >>= Operator

Here are some extended examples of the power of the >>= operator.

Arbitrary repeated word

What does this do?

  ident >>= text

The parser ident >>= text will remove whitespace from a string and pass the first parsed portion of the string to text where text will look specifically for that string in the remainder of the input and succeed (and produce the string) only if it shows up again.

Example session

*PicParser> let mystery = ident >>= text
*PicParser> let mystery = ident >>= \i -> text i
*PicParser> :t mystery 
mystery :: Parser String
*PicParser> parse mystery "   melissa jincheng"
"*** Exception: <input>:1:12 -- Expected 'melissa'
*PicParser> parse mystery "   melissa melissa"
"melissa"
*PicParser> parse (some mystery) "   melissa melissa jincheng jincheng reyna reyna"
["melissa","jincheng","reyna"]

anbncndn

Here's another cool example. Here we parse anbncndn.

Given this code (not shown in class):

 
countChar :: Char -> Parser Int
countChar c = (char c <-+> countChar c >>=: (+1))
            <|> return 0

nChar :: Char -> Int -> Parser Int
nChar c 0 = return 0
nChar c n = char c <-+> nChar c (n-1) >>=: (+1)

*PicParser> parse (countChar 'a') "aaaaa"
5
*PicParser> parse (countChar 'a') "b"
*** Exception: <input>:1:1 -- Expected 'a'
*PicParser> :t nChar 
nChar :: Char -> Int -> Parser Int
*PicParser> parse (nChar 'a' 6) "aaaaaa"
6
*PicParser> parse (nChar 'a' 6) "aaa"
*** Exception: <input>:1:4 -- Expected 'a'
*PicParser> parse (nChar 'a' 6) "aaaaaaa"
*** Exception: <input>:1:7 -- EOF expected

*PicParser> parse (countChar 'a' >>= \n -> nChar 'b' n) "aaaabbbb"
4
*PicParser> parse (countChar 'a' >>= \n -> nChar 'b' n) "aaaabbb"
*** Exception: <input>:1:8 -- Expected 'b'
*PicParser> parse (countChar 'a' >>= \n -> nChar 'b' n) "aaaabbbbb"
*** Exception: <input>:1:9 -- EOF expected
*PicParser> parse (countChar 'a' >>= \n -> nChar 'b' n) ""
0
*PicParser> parse (nChar 'a' 6) "aaaaaaa"
*** Exception: <input>:1:7 -- EOF expected
*PicParser> parse (nChar 'a' 6) "aaaaaa"
6
*PicParser> parse (countChar 'a' >>= \n -> nChar 'b' n >>= \m -> nChar 'c' m) "aaabbbccc"
3
*PicParser> parse (countChar 'a' >>= (\n -> nChar 'b' n >>= (\m -> nChar 'c' m))) "aaabbbccc"
3
*PicParser> parse (countChar 'a' >>= (\n -> nChar 'b' n >>= (\m -> nChar 'c' (n+m)))) "aaabbbcccccc"
6
*PicParser> parse (countChar 'a' >>= (\n -> nChar 'b' n >>= (\m -> nChar 'c' (n+m)))) "aaabbbccccc"
*** Exception: <input>:1:12 -- Expected 'c'
*PicParser> parse (countChar 'a' >>= (\n -> nChar 'b' n >>= (\m -> nChar 'c' m >>= nChar 'd'))) "aaabbbcccddd"
3

 
abcd :: Parser Int
abcd = countChar 'a' >>= nChar 'b' >>= nChar 'c' >>= nChar 'd'

Even Cooler Parsers

Because the right-hand argument to >>= can execute arbitrary code to choose/make the parser to use next, we can do arbitrary parsing.

This expressive power lets us write unusual parsers, such as again demonstrated below:

*PicParser> parse again "world world"
["world"]
*PicParser> parse again "hello world hello world"
["hello","world"]
*PicParser> parse again "hello world hello"
*** Exception: <input>:1:18 -- Expected 'world'
*PicParser> parse again "hello world a b c d hello"
*** Exception: <input>:1:26 -- Expected 'world'
*PicParser> parse again "hello world a b c d hello world"
*** Exception: <input>:1:32 -- Expected 'a'
*PicParser> parse again "hello world a b c d hello world a"
*** Exception: <input>:1:34 -- Expected 'b'
*PicParser> parse again "hello world a b c d hello world a b"
*** Exception: <input>:1:36 -- Expected 'c'
*PicParser> parse again "hello world a b c d hello world a b c"
*** Exception: <input>:1:38 -- Expected 'd'
*PicParser> parse again "hello world a b c d hello world a b c d"
["hello","world","a","b","c","d"]
*PicParser> parse again "a b a b a b"
*** Exception: <input>:1:12 -- Expected 'a'
*PicParser> parse again "a b a b a b a"
*** Exception: <input>:1:14 -- Expected 'b'
*PicParser> parse again "a b a b a b a b"
["a","b","a","b"]
*PicParser> parse again "a a b b"
*** Exception: <input>:1:8 -- Expected 'a'
*PicParser> parse again "a a b b a a"
*** Exception: <input>:1:12 -- Expected 'b'
*PicParser> parse again "a a b b a a b"
*** Exception: <input>:1:14 -- Expected 'b'
*PicParser> parse again "a a b b a a b b"
["a","a","b","b"]

As you can see again reads a sequence of identifiers which must be repeated twice. It can even handle being repeated (e.g., with some or many) as shown below:

*PicParser> parse (some again) "a a a a b b b b"
[["a","a"],["b","b"]]
*PicParser> parse (some again) "a a b b c c d d"
[["a"],["b"],["c"],["d"]]
*PicParser> parse (some again) "a a b b b b c c"
[["a"],["b","b"],["c"]]
*PicParser> parse (some again) "a a a a b b b b"
[["a","a"],["b","b"]]
*PicParser> parse (some again) "a b c d a b c d"
[["a","b","c","d"]]

Wiki Extra: The Code

The code for again is a bit stranger than anything you'll ever need to write. The trick is that we build a parser on the fly (in the argument to fwd) which we later use to read the repeated text.

   -- again: accept a sequence of identifiers provided the sequence repeats
   --        itself
   again :: Parser [String]
   again = fwd (return ())
     where fwd trailing = ident >>= \i -> return i <:> 
                                         let trailing' = (trailing <-+-> text i)
                                         in      fwd trailing' 
                                             <|> trailing' <-+> return []

Similarly, we can write forwardback which is like again but the repeat is reversed. For example:

*PicParser> parse forwardback "a b c d"
*** Exception: <input>:1:8 -- Expected 'd'
*PicParser> parse forwardback "a b c d d"
*** Exception: <input>:1:10 -- Expected 'c'
*PicParser> parse forwardback "a b c d d c"
*** Exception: <input>:1:12 -- Expected 'b'
*PicParser> parse forwardback "a b c d d c b"
*** Exception: <input>:1:14 -- Expected 'a'
*PicParser> parse forwardback "a b c d d c b a"
["a","b","c","d"]

Wiki Extra: The Code

Although the code above for again is actually easy to adapt to make forwardback we can use slightly less advanced techniques to make forwardback — there is no need to form a Parser and pass it down; simple recursion is enough to get the job done.

   -- forwardback: accept a sequence of identifiers provided the sequence 
   --              repeats itself backwards
   forwardback :: Parser [String]
   forwardback = ident >>= (\ident -> return ident <:>
                                         (    forwardback <+-> text ident
                                          <|> text ident  <-+> return []))

If you want to print this page, you can click to show all the answers.

-- Main.BenWiedermann - 28 Feb 2020

Edit | Attach | Printable | Raw View | Backlinks: Web, All Webs | History: r1 | More topic actions
 
Harvey Mudd College computer science
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback