r3 - 25 Feb 2020 - 08:49:45 - Main.MelissaONeillYou are here: TWiki > CS131Spring2020 Web > Week6Class1
Preliminaries
Haskell Quiz
Haskell data Declarations
In the code below, What is the difference between the data declaration and the two type declarations? Also, how do you think values of type AnimalMaker are actually stored in memory?
data AnimalMaker = PigMakingFunction (Int -> [Pig])
| CowMakingFunction (Int -> [Cow])
| BadWeatherOnTheFarm String
type CatMaker = Int -> [Cat]
type DogMaker = Int -> [Dog]
We seem to be working in all these exercises with functions that make animals, perhaps taking an Int specifying how many animals to make, and returning a list of that many animals.
The AnimalMaker type has three cases, two of the cases are for dealing with holding a function, and the third case is for when we don't have a function to store. It is a new type with three constructors/tags (PigMakingFunction, CowMakingFunction, and BadWeatherOnTheFarm) that can be used to make new values of this type, and in patterns.
As for how things work behind the scenes, Haskell must store some extra data besides the associated values to know which case we actually have — probably a small integer.
In contrast, CatMaker and DogMaker are just synonyms for existing types (the type we'd get if we made a function). There aren't multiple cases, nor are there any constructors or any need for pattern matching. Haskell's type inference will never give anything these types, it would use the actual underlying types (Int -> [Cat] and Int -> [Dog]) although we could force it to use our synonyms by giving a type signature.
What's unusual about this declaration? And why is it perhaps better than DogMaker?
data BatMaker = BatMakingFunction (Int -> [Bat])
It only has one case. Just like AnimalMaker, we still need to use the constructor to make a value of this type, and pattern matching to get the function we're holding/wrapping. Unlike DogMaker (a type synonym), this is a brand new type. Because we need to wrap the function using the constructor and unwrap it using pattern matching, we don't need a type signature to ensure Haskell gets the types right.
What's going on in this declaration?
data CreatureMaker a = CreatureMakingFunction (Int -> [a])
It is more generic than BatMaker, as CreatureMaker is a parametric type. When we want to wrap Bat making functions, we can use CreatureMaker Bat.
Behaves like our single-case data declaration, and so can be used in pattern matching like data declarations.
It thus has all the advantages of a data declaration and avoids the annoyances of type aliases described above. For instance, when we initialize a variable using the BatMakerFunction constructor, Haskell knows it is of type BatMaker, as seen below:
A big idea from this class is that code and data overlap, so we should not be surprised that this specification can either be seen as a not-directly-executable description (data) or as executable code saying how to do parsing (recursively).
Last time we fairly mechanically converted this grammar into the following recursive function:
parse :: [LambdaToken] -> (LambdaExpr, [LambdaToken])
parse (IdentTok vname : rest) = -- variable
(Var vname, rest)
parse (LParenTok : LambdaTok : IdentTok vname : DotTok : rest) = -- anon func
let (body, rest2) = parse rest
in case rest2 of
RParenTok : rest3 -> (Lambda vname body, rest3)
_ -> error "')' expected"
parse (LParenTok : rest) = -- application
let (func, rest2) = parse rest
(arg, rest3) = parse rest2
in case rest3 of
RParenTok : rest4 -> (Apply func arg, rest4)
_ -> error "')' expected"
parse [] = error "Expected lambda expr, ran out of tokens!"
parse (t:_) = error ("Unexpected token, " ++ show t)
Making Parentheses Optional, Attempt 1
We could attempt to fix the grammar to make parentheses optional this way:
What do we notice about this attempt (in particular what problems do we see?
The grammar is ambiguous. It is not clear how to handle x y z, should we interpret it as if it were (x y) z or x (y z)
The grammar is left recursive which means that if we tried to implement it using recursive descent, we'd go into an infinite loop.
We also notice that it not necessary for the structure and specification concrete syntax to exactly match the abstract syntax (e.g., our abstract syntax doesn't have a fourth case for parentheses because that would be unnecessary.
Consider the expressions 1+2+3 and 1+2*3.
There is ambiguity in parsing for both expressions. More importantly, the second expression could be parsed as (BinOp Times (BinOp Add 1 2) 3), which yields 9 instead of the desired outcome 7. This ambiguity is bad.
Note that from the rule
| ‹expr› ‹op› ‹expr›
we can see that the grammar is left-recursive.
Fixing the Precedence Issues and Using Shorthands for Repetition
We can hand-implement this grammar using the following recursive code:
parseExpr :: [LambdaToken] -> Maybe (LambdaExpr, [LambdaToken])
parseExpr toks =
case parsePrims toks of
Just (prims, rest) -> Just (foldl1 Apply prims, rest)
Nothing -> Nothing
parsePrims :: [LambdaToken] -> Maybe ([LambdaExpr], [LambdaToken])
parsePrims toks =
case parsePrim toks of
Just (prim, rest1) ->
case parsePrims rest1 of
Just (prims, rest2) -> Just (prim:prims, rest2)
Nothing -> Just ([prim], rest1)
Nothing ->
Nothing
parsePrim :: [LambdaToken] -> Maybe (LambdaExpr, [LambdaToken])
parsePrim (IdentTok vname : rest) = Just (Var vname, rest)
parsePrim (LambdaTok : IdentTok vname : DotTok : rest2) =
case parseExpr rest2 of
Just (body, rest3) -> Just (Lambda vname body, rest3)
Nothing -> Nothing
parsePrim (LParenTok : rest) =
case parseExpr rest of
Just (result, RParenTok : rest) -> Just (result, rest)
_ -> Nothing
parsePrim [] = Nothing -- error "Expected lambda expr, ran out of tokens!"
parsePrim (t:_) = Nothing -- error ("Unexpected token, " ++ show t)
Sample session:
*LambdaParser> :t parsePrim
parsePrim :: [LambdaToken] -> Maybe (LambdaExpr, [LambdaToken])
*LambdaParser> :t parsePrims
parsePrims :: [LambdaToken] -> Maybe ([LambdaExpr], [LambdaToken])
*LambdaParser> :t parseExpr
parseExpr :: [LambdaToken] -> Maybe (LambdaExpr, [LambdaToken])
*LambdaParser> parseExpr (tokenize "(λx.x x x) (λx.x x x)")
Just (Apply (Lambda "x" (Apply (Apply (Var "x") (Var "x")) (Var "x"))) (Lambda "x" (Apply (Apply (Var "x") (Var "x")) (Var "x"))),[])
*LambdaParser> parseExpr (tokenize "...")
Nothing
*LambdaParser> parseExpr (tokenize "λx.x)")
Just (Lambda "x" (Var "x"),[RParenTok])
*LambdaParser> parsePrim (tokenize "w x y z")
Just (Var "w",[IdentTok "x",IdentTok "y",IdentTok "z"])
*LambdaParser> parsePrims (tokenize "w x y z")
Just ([Var "w",Var "x",Var "y",Var "z"],[])
*LambdaParser> parseExpr (tokenize "w x y z")
Just (Apply (Apply (Apply (Var "w") (Var "x")) (Var "y")) (Var "z"),[])
*LambdaParser> foldl Apply (Var "w") [Var "x",Var "y",Var "z"]
Apply (Apply (Apply (Var "w") (Var "x")) (Var "y")) (Var "z")
*LambdaParser> foldl1 Apply [Var "w",Var "x",Var "y",Var "z"]
Apply (Apply (Apply (Var "w") (Var "x")) (Var "y")) (Var "z")
*LambdaParser>
Aside: The Power of Higher Order Functions
Haskell has many higher-order functions (i.e., functions that operate on functions, taking functions as arguments and often returning functions as results).
Here are some we've seen:
curry :: ((a, b) -> c) -> a -> b -> c
uncurry :: (a -> b -> c) -> (a, b) -> c
. :: (b -> c) -> (a -> b) -> a -> c
flip :: (a -> b -> c) -> b -> a -> c
span :: (a -> Bool) -> [a] -> ([a], [a])
map :: (a -> b) -> [a] -> [b]
filter :: (a -> Bool) -> [a] -> [a]
mapMaybe :: (a -> Maybe b) -> [a] -> [b]
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl1 :: (a -> a -> a) -> [a] -> a
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr1 :: (a -> a -> a) -> [a] -> a
It's cool to use these functions (they become part of our vocabulary) and allow us to express ourselves more succinctly and wield bigger concepts when thinking, but it's also neat to make our own new higher order functions.
Generalizing Repetition
We can take the basic algorithm of parsePrims and write a general function:
parseRepeated :: (b -> Maybe (a, b)) -> b -> Maybe ([a], b)
parseRepeated parser = repeater
where
repeater toks =
case parser toks of
Just (one, rest1) ->
case repeater rest1 of
Just (lots, rest2) -> Just (one:lots, rest2)
Nothing -> Just ([one], rest1)
Nothing ->
Nothing
parsePrims :: [LambdaToken] -> Maybe ([LambdaExpr], [LambdaToken])
parsePrims = parseRepeated parsePrim
In fact, a function like parseRepeated a standard name in Haskell, some.
Building a Parsing Library
Story So Far...
We can take a grammar and (at least for an easy one) hand write recursive code to do the parsing! Yay!
Our Parser Type
Deciding on a Type for Parsers
So, what are parsers, and how do they fit into this picture?
The abstract syntax we're trying to parse is utterly trivial:
data MyThing = TheX
and the concrete syntax is just this:
x
A first guess might have a type such as String -> MyThing.
But this ignores the idea that what we want might not be there, or might not be there properly.
Let's consider three possible inputs:
x — this is correct
x and stuff — we have the x we want, but there is extra stuff afterwards
y — we don't see what we want at all
Thus, we need a representation that captures
The type for thing we're trying to parse. In this case, MyThing.
A string representing what (if anything) that was left over afterwards (possibly that might be handed over to later parsing). That implies we need to return (MyThing, String).
The idea that parsing can fail entirely, so we use the Maybe type, yielding Maybe (MyThing, String).
Thus the type of our parser seems like it should be
String -> Maybe (MyThing, String)
It would be nice to name this type, so we could write:
but why limit it to just MyThing — let's generalize it to parsing arbitrary things.
The Final Type
newtype Parser a = ParsingFunction (String -> Maybe (a, String))
Parser a is a parameterized type that can take the form of a function that turns a String into a Maybe tuple of the parsed result type a (what has been parsed) and the leftover string.
Note that, since there is only one case in this type, we have used the newtype declaration style (rather than data) as described earlier.
Using and Writing Parsing Functions
We're going to write some very simple primitives that we can combine to build arbitrarily complex parsers.
parse
The function parse :: Parser a -> String -> a that uses a parsing function someone has (somehow) made. It insists that it always produces a result and consumes all the input.
Note: We only use parse as a top-level function to run a parser we've made (and throw an error if it doesn't parse successfully). We'll never call it from inside a parser (we don't want to throw an error and crash out the parser, we handle “that didn't parse successfully” by having our parsing function return nothing.
Write it...
parse :: Parser a -> String -> a
parse (ParsingFunction f) inputString =
case f inputString of
Just (result, "") -> result
Just (_, rest) -> error ("After parsing, " ++ show rest
++ " remained")
Nothing -> error ("No parse at all")
Example (assuming expr holds a suitable parser for Expr, which we don't yet know how to make):
*Parsing> parse expr "1 + 2"
BinOp Plus (Number 1) (Number 2)
*Parsing> parse expr "1 + 2 sausages"
*** Exception: After parsing, "sausages" remained
*Parsing> parse expr "!!!!"
*** Exception: No parse at all
pfail
Write pfail :: Parser a representing a parser that promises to be able to parse any kind of thing, but when used always fails to parse anything!
Some possible answers we might think of (some of which are wrong!)
pfail = ParseFn (\input -> error "No!!!")
pfail = ParseFn (\input -> (Nothing, ""))
pfail = ParseFn (\input -> Nothing)
pfail = \input -> error "No!!!"
pfail = \input -> Nothing
pfail :: Parser a
pfail = ParsingFunction alwaysFail
where alwaysFail _ = Nothing
Instead of crashing the program right away, we should throw an error at the top level (inside parse).
Remember also to match the types! The second answer is a function which takes a String and returns a tuple (Maybe a, String), whereas we want one which takes a string and returns a Maybe (a, String) .
For similar reasons, the last answer won't compile. -- Main.AbramSanderson - 21 Feb 2016
This answer written in Haskell corresponds to the third possible option. Remember Maybe a is either Just...something... or Nothing !
Example:
Parsing> data Elephant = AnElephant Int Int String deriving Show
Parsing> let parseElephant = pfail :: Parser Elephant
Parsing> let parseMeaningOfLife = pfail :: Parser String
Parsing> parse parseElephant "E"
*** Exception: No parse at all
Parsing> parse parseMeaningOfLife "42"
*** Exception: No parse at all
Parsing> parse parseMeaningOfLife "Have I figured it out yet?"
*** Exception: No parse at all
get
Write get :: Parser Char that reads one character from the input and returns it as its parsing result (failing if there is no character to read).
get :: Parser Char
get = ParsingFunction readChar
where readChar (h:t) = Just(h, t)
readChar _ = Nothing
Example:
*Parsing> parse get "a"
'a'
*Parsing> parse get "b"
'b'
*Parsing> parse get "ab"
*** Exception: After parsing, "b" remained
*Parsing> parse get ""
*** Exception: No parse at all
At this point, it may seem like we're quite far from writing a useful parser, but, actually, we're almost there in having the tools we need to make a usable parser (we just need three more things and we're done!).
return
Write return :: a -> Parser a that consumes no input and always gives its argument as its parse result.
Note: Haskell doesn't have a keyword return, so we are free to use this word as a function name, which we've done. We could have given it another name like here_is_a. It has nothing to do with returning from functions!
return :: a -> Parser a
return x = ParsingFunction (\str -> Just (x,str))
parserA <|> parserB is a composition operator, which yields a new parser, that when run:
First tries to parse with parserA, and
If it fails, tries parserB instead.
If it succeeds, parserB is never used.
<|> :: Parser a -> Parser a -> Parser a
Writing <|> directly
How would we write <|> ?
(<|>) :: Parser a -> Parser a -> Parser a
(ParsingFunction f) <|> (ParsingFunction g) = ParsingFunction f_or_g
where f_or_g str =
case f str of
Nothing -> g str
result -> result
Laws:
p <|> pfail ≡ p
pfail <|> p ≡ p
(p <|> q) <|> r ≡ p <|> (q <|> r)
Examples
*Parsing> :t get
get :: Parser Char
*Parsing> parse get "x"
'x'
*Parsing> :t pfail
pfail :: Parser a
*Parsing> :t parse pfail "xxxx"
parse pfail "xxxx" :: a
*Parsing> parse pfail "xxxx"
*** Exception: No parse at all
*Parsing> parse (return 17) ""
17
*Parsing> parse (get <|> return '.' ) "x"
'x'
*Parsing> parse (get <|> return '.' ) ""
'.'
parserA <+> parserB
Produces a parser that uses parserAand thenparserB and yields a pair of the results of each.
<+> :: Parser a -> Parser b -> Parser (a,b)
(Caution: Do not confuse its type with Parser a -> Parser b -> (Parser a, Parser b) .)
Writing <+> directly
How would we write <+> ?
(<+>) :: Parser a -> Parser b -> Parser (a,b)
(ParsingFunction f) <+> (ParsingFunction g) = ParsingFunction f_then_g
where f_then_g input =
case f input of Nothing -> Nothing
Just (v, rest) -> case g rest of
Nothing -> Nothing
Just (v', rest') -> Just ((v, v'), rest')
Laws (wiki extra):
p <+> pfail ≡ pfail
pfail <+> p ≡ pfail
Examples:
*Parsing> parse (get <+> get) "xy"
('x','y')
*Parsing> parse (get <+> return 17) "x"
('x',17)
*Parsing> parse (get <+> pfail) "x"
*** Exception: No parse at all
*Parsing> parse ((get <+> pfail) <|> return ('x', 17) ) ""
('x',17)
*Parsing> parse ((get <+> pfail) <|> return ('x', 17) ) "x"
*** Exception: After parsing, "x" remained
*Parsing> parse (((get <+> pfail) <|> return ('x', 17)) <+> get ) "x"
(('x',17),'x')
Wiki Extra: parserOne <:> parserLots
<:> :: Parser a -> Parser [a] -> Parser [a]
This is like our previous parser, but is specifically designed for parsing that involves generating lists (rather than pairs). It combines two parsers, one that makes a thing and one that makes a list of those same things.
Examples:
We can actually write many and some using this operator:
many :: Parser a -> Parser [a]
many p = p <:> many p
<|> return []
some :: Parser a -> Parser [a]
some p = p <:> many p
Wiki Extra: parser <=> resultCheckPredicate
<=> :: Parser a -> (a -> Bool) -> Parser a
Writing <=> directly
How would we write <=> ?
(<=>) :: Parser a -> (a -> Bool) -> Parser a
ParsingFunction f <=> cond = ParsingFunction f_if_cond
where f_if_cond input =
case f input of Nothing -> Nothing
Just (v, rest) -> if cond v then Just (v, rest)
else Nothing
What do <=> and <+> have in common? How are they different?
Both take the first argument as a parser.
Both return a Parser.
If p fails, then the whole thing fails, and they don't bother checking q or cond.
Both can fail after successfully parsing p.
<+> does something with the result of p (it shows up in the output).
<=> does something with the result of p, it uses the predicate to check whether it's okay.
Wiki Extra: A Generalized “andThen” Operation
parserA >>= makeParserB is a composition operator, which yields a new parser, that when run:
First tries to parse with parserA, and
If it fails, the combined parser fails too, and makeParserB is never called.
If it succeeds, the combined parser then passes parserA's result to makeParserB which generates a new parser which is then run. The result of the second parser is the result of the final combined parser.
>>= :: Parser a -> (a -> Parser b) -> Parser b
With this operator, we can write <=> in terms of >>=:
(<=>) :: Parser a -> (a -> Bool) -> Parser b
p <=> isOkay = p >>= (\r -> if isOkay r then return r else pfail)
and we can write <+> in terms of >>=:
(<+>) :: Parser a -> Parser b -> Parser (a,b)
p <+> q = p >>= (\pr -> (q >>= \qr -> return (pr,qr)))
How would we write >>=
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
ParsingFunction f >>= makeG = ParsingFunction f_then_g
where f_then_g input =
case f input of
Nothing -> Nothing
Just (v,rest) -> let ParsingFunction g = makeG v
in g rest
If you want to print this page, you can click
to show all the answers.