Monads

Our current design has a lot in common with monads. In fact, if we remove everything but error-handling, the type of a compiler phase is:

a -> Either CompilerError b

So, for example, the type of tokenize would be:

String -> Either CompilerError [Token]

We might describe this signature as "a function that takes a String as input and performs a monadic computation whose result (if successful) will be of type [Token]."
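To make this concrete, here is a small sketch of two Either-returning phases chained with >>=. The Token and CompilerError definitions, the phase bodies, and the extra checkNumbers phase are hypothetical stand-ins, not the course's definitions. The point is only that Either's Monad instance passes a successful result along to the next phase and stops at the first Left.

```haskell
-- A minimal sketch: Token, CompilerError, tokenize's body, and checkNumbers
-- are hypothetical stand-ins chosen only for illustration.
import Data.Char (isDigit, isSpace)

newtype Token = Token String deriving (Show, Eq)
data CompilerError = EmptyInput | BadToken String deriving (Show, Eq)

-- The shape a -> Either CompilerError b, with a = String and b = [Token].
tokenize :: String -> Either CompilerError [Token]
tokenize src
  | all isSpace src = Left EmptyInput
  | otherwise       = Right (map Token (words src))

-- A second hypothetical phase: every token must be a non-negative integer.
checkNumbers :: [Token] -> Either CompilerError [Integer]
checkNumbers = mapM check
  where
    check (Token t)
      | not (null t) && all isDigit t = Right (read t)
      | otherwise                     = Left (BadToken t)

-- Either's >>= feeds a Right value to the next phase and short-circuits on Left.
numbers :: String -> Either CompilerError [Integer]
numbers src = tokenize src >>= checkNumbers

main :: IO ()
main = do
  print (numbers "1 2 3")
  print (numbers "1 x 3")
  print (numbers "   ")
```

In the last call, tokenize already returns a Left, so checkNumbers never runs.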

The only difference between this type and the type of our compiler phase is that there is a lot more information threaded through the monadic computation of our compiler. It sure would be nice if we could rely on Haskell’s existing monads to do all this threading work for us!

1 Redefining the interface

It turns out that the type of our compiler phase can be captured by the following (admittedly complicated!) type:

a -> ExceptT CompilerError (RWST CompilerConfiguration CompilerLog CompilerState IO) b

This type uses the mtl library to define a monadic computation that has all the properties we want in a compiler phase, plus the ability to do IO (helpful for reading and writing files!).

mtl stands for “Monad Transformer Library”.

Let us take a closer look at each part of the type.

 1  a ->
 2    ExceptT
 3      CompilerError
 4      (
 5        RWST
 6          CompilerConfiguration
 7          CompilerLog
 8          CompilerState
 9          IO
        )
10      b

1: a is the type of the input to the compiler phase. The arrow indicates that the entire type is a function, whose result type is described by everything that follows.
2: ExceptT is a type constructor from mtl that builds the type for our monadic compiler phase. ExceptT takes three type arguments, which make up the rest of the type.
3: CompilerError is the first argument to ExceptT. It provides the type of an unsuccessful result.
4: Lines 4–9 provide the second argument to ExceptT: the monad in which the rest of the computation runs, and to which ExceptT adds the ability to fail with a CompilerError. We call this the inner monad of ExceptT.
5: The inner monad is built with the mtl type constructor RWST. It builds a monad whose computations also have access to a Reader (i.e., a configuration), a Writer (i.e., a log), and a State. Here RWST receives four arguments, which appear on the next four lines.
6: CompilerConfiguration is the first argument to RWST. It provides the data structure that the entire monad can read from.
7: CompilerLog is the second argument to RWST. It provides the data structure that the entire monad can write to. RWST requires that this type support concatenation (formally, that it be an instance of Monoid), so we will need to adjust our placeholder definition for CompilerLog (see full code, below).
8: CompilerState is the third argument to RWST. It provides the data structure that can keep track of state for the entire monad.
9: The fourth and last argument to RWST must be an inner monad: the monadic computation into which RWST will mix the ability to read, write, and keep track of state. We choose IO for this inner monad, so that our compiler phases can read from and write to files, if needed.
10: b is the third and last argument to ExceptT. It is the type of the result computed by the monad, i.e., the output of the compiler phase.
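The sketch below fills the stack with hypothetical placeholder types (a maxTokens field, a phasesRun counter, a TooManyTokens error) so that one phase can use every layer, and shows how runExceptT and runRWST peel the layers off from the outside in. It assumes the mtl and transformers packages; ExceptT is imported from transformers' Control.Monad.Trans.Except, the package mtl builds on.

```haskell
-- A minimal sketch, assuming mtl and transformers. The record fields, the
-- error constructor, and the phase countTokens are hypothetical.
import Control.Monad.Except       (throwError)
import Control.Monad.RWS          (RWST, runRWST, asks, tell, modify)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

newtype CompilerConfiguration = CompilerConfiguration { maxTokens :: Int }
type CompilerLog = [String]
newtype CompilerState = CompilerState { phasesRun :: Int } deriving Show
newtype CompilerError = TooManyTokens Int deriving (Show, Eq)

type CompilerPhase a b =
  a -> ExceptT CompilerError (RWST CompilerConfiguration CompilerLog CompilerState IO) b

-- One phase that touches every layer of the stack.
countTokens :: CompilerPhase String Int
countTokens src = do
  limit <- asks maxTokens                               -- Reader: read the configuration
  modify (\st -> st { phasesRun = phasesRun st + 1 })  -- State: update the state
  let k = length (words src)
  tell ["countTokens saw " ++ show k ++ " tokens"]     -- Writer: append to the log
  if k > limit
    then throwError (TooManyTokens k)                   -- ExceptT: fail with an error
    else pure k

-- Peel the layers from the outside in:
--   runExceptT turns ExceptT e m b into m (Either e b)
--   runRWST takes the configuration and initial state and yields IO (result, state, log)
runPhase :: CompilerPhase a b -> a -> IO (Either CompilerError b, CompilerState, CompilerLog)
runPhase phase input =
  runRWST (runExceptT (phase input)) (CompilerConfiguration 3) (CompilerState 0)

main :: IO ()
main = do
  runPhase countTokens "a b"     >>= print
  runPhase countTokens "a b c d" >>= print
```

Because ExceptT sits outside RWST, a phase that fails still hands back the state and log it built up before throwError. Nesting the transformers the other way round (RWST over ExceptT) would discard them on failure.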

Warning

There is another big difference that comes with using the mtl library in this way: All our compiler phases are truly monadic, which means we often need to use do notation when implementing them.
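As an illustration of why do notation helps, here is a hypothetical phase shout written twice: once with do notation and once desugared by hand into >>= and >>. The verbose field and the placeholder types are assumptions made only so the sketch type-checks (assuming mtl and transformers).

```haskell
-- A minimal sketch, assuming mtl and transformers. The verbose flag and the
-- phase shout are hypothetical; the placeholder types exist only to type-check.
import Control.Monad              (when)
import Control.Monad.IO.Class     (liftIO)
import Control.Monad.RWS          (RWST, runRWST, asks, tell)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

newtype CompilerConfiguration = CompilerConfiguration { verbose :: Bool }
type CompilerLog = [String]
data CompilerState = CompilerState deriving Show
data CompilerError = CompilerError deriving (Show, Eq)

type CompilerPhase a b =
  a -> ExceptT CompilerError (RWST CompilerConfiguration CompilerLog CompilerState IO) b

-- With do notation, each line is one step in the phase's monad.
shout :: CompilerPhase String String
shout s = do
  v <- asks verbose
  let out = s ++ "!"
  tell ["shout produced " ++ out]
  when v (liftIO (putStrLn out))   -- IO lifted through both transformers
  pure out

-- The same phase with the do block desugared by hand into >>= and >>.
shout' :: CompilerPhase String String
shout' s =
  asks verbose >>= \v ->
    let out = s ++ "!"
    in  tell ["shout produced " ++ out]
          >> when v (liftIO (putStrLn out))
          >> pure out

main :: IO ()
main = do
  let run p = runRWST (runExceptT (p "hello")) (CompilerConfiguration True) CompilerState
  run shout  >>= print
  run shout' >>= print
```

Both versions behave identically; the do version simply hides the lambdas and the explicit sequencing.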

But how does it work?

If you would like a longer discussion about why and how the mtl provides us with this functionality, I recommend this article. We can also chat about it in office hours!

2 Composing phases

Given compiler phases built in the way described above, we can dispense with the >.> operator that we defined by hand. Instead, we can compose the phases with the Haskell operator >=> (Kleisli composition), exported by Control.Monad.

Some people call >=> the “fish” operator!
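A minimal sketch of composing two hypothetical phases with >=>, using placeholder types chosen only for illustration (assuming mtl and transformers). The operator's type, (>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c, is exactly the shape of two CompilerPhase values chained together; the sketch also writes the same pipeline out with >>= to show what >=> does.

```haskell
-- A minimal sketch, assuming mtl and transformers. The phases and the
-- EmptyProgram error are hypothetical.
import Control.Monad              ((>=>))
import Control.Monad.Except       (throwError)
import Control.Monad.RWS          (RWST, runRWST, tell)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

data CompilerConfiguration = CompilerConfiguration
type CompilerLog = [String]
data CompilerState = CompilerState deriving Show
data CompilerError = EmptyProgram deriving (Show, Eq)

type CompilerPhase a b =
  a -> ExceptT CompilerError (RWST CompilerConfiguration CompilerLog CompilerState IO) b

tokenize :: CompilerPhase String [String]
tokenize src = do
  tell ["tokenize"]
  pure (words src)

parse :: CompilerPhase [String] Int
parse []   = throwError EmptyProgram
parse toks = do
  tell ["parse"]
  pure (length toks)

-- Each phase's output feeds the next; the first throwError stops the pipeline.
pipeline :: CompilerPhase String Int
pipeline = tokenize >=> parse

-- The same composition written out by hand with >>=.
pipeline' :: CompilerPhase String Int
pipeline' src = tokenize src >>= parse

runPhase :: CompilerPhase a b -> a -> IO (Either CompilerError b, CompilerState, CompilerLog)
runPhase phase input = runRWST (runExceptT (phase input)) CompilerConfiguration CompilerState

main :: IO ()
main = do
  runPhase pipeline "x y z" >>= print
  runPhase pipeline ""      >>= print
```

The second run fails in parse, so the pipeline stops there, but the log entry written by tokenize survives.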

3 Full code

Here is the code for a compiler whose phases are monadic computations that have all the functionality we have described so far. Note that we have removed the definition of the >.> operator. All other changes are emphasized.

Compiler.hs

module Compiler where

import Control.Monad        ((>=>))
import Control.Monad.Except (ExceptT)
import Control.Monad.RWS    (RWST)

{- Compiler pipeline -}
compiler :: CompilerPhase String String
compiler = tokenize >=> parse >=> optimize >=> emit

{- Compiler phases -}
tokenize :: CompilerPhase String [Token]
tokenize = undefined

parse :: CompilerPhase [Token] AST
parse = undefined

optimize :: CompilerPhase AST AST
optimize = undefined

emit :: CompilerPhase AST String
emit = undefined

{- Compiler data types -}
type CompilerPhase a b = a -> ExceptT CompilerError (RWST CompilerConfiguration CompilerLog CompilerState IO) b

data Token = Token
data AST = AST
data CompilerConfiguration = CompilerConfiguration
type CompilerLog = [CompilerLogEntry]  -- (1)
data CompilerLogEntry = CompilerLogEntry
data CompilerState = CompilerState
data CompilerError = CompilerError

1: Because RWST requires the log to support concatenation (a Monoid instance), we have slightly altered our placeholder type. We have added a new data type CompilerLogEntry and changed the type of the log to [CompilerLogEntry]. Lists support concatenation, so RWST is happy.
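The Monoid requirement in annotation 1 is what lets RWST merge separate tell calls into a single log. In this sketch (the String payload on CompilerLogEntry is hypothetical), two calls to tell are combined with the Monoid operation, which for lists is (++), so the entries appear in the order they were written.

```haskell
-- A minimal sketch, assuming mtl. The String payload on CompilerLogEntry
-- is hypothetical; the course's placeholder has no fields.
import Control.Monad.RWS (RWST, runRWST, tell)

data CompilerLogEntry = CompilerLogEntry String deriving (Show, Eq)
type CompilerLog = [CompilerLogEntry]

-- RWST's Monad instance needs Monoid w: the logs of consecutive steps are
-- combined with (<>), which for lists is (++).
twoEntries :: RWST () CompilerLog () IO ()
twoEntries = do
  tell [CompilerLogEntry "tokenize"]
  tell [CompilerLogEntry "parse"]

main :: IO ()
main = do
  ((), (), logged) <- runRWST twoEntries () ()
  print logged
  print (logged == [CompilerLogEntry "tokenize"] <> [CompilerLogEntry "parse"])
```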