Configuration

Configuration lets the person running the compiler control its behavior, for example by setting flags on the command line. We could store the flag information in our CompilerConfiguration data structure. To use that information, a phase needs access to a value of type CompilerConfiguration, and the simplest way to provide one is as an extra argument to the phase's function. Let us redesign our compiler phases so that each one takes a CompilerConfiguration as input.
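To make the idea concrete, here is a minimal sketch of what such a configuration might look like and how a phase could consult it. The field names, the flag spellings, and the helpers `defaultConfiguration`, `configurationFromFlags`, and `optimizeToy` are all assumptions made up for illustration; the text above only says that a CompilerConfiguration exists.

```haskell
-- Hypothetical configuration: the fields below are illustrative
-- assumptions, not the real contents of CompilerConfiguration.
data CompilerConfiguration = CompilerConfiguration
  { optimizationLevel :: Int   -- 0 means "do not optimize"
  , warningsAsErrors  :: Bool  -- treat warnings as failures
  } deriving (Show, Eq)

-- | The configuration used when no flags are given.
defaultConfiguration :: CompilerConfiguration
defaultConfiguration = CompilerConfiguration
  { optimizationLevel = 0
  , warningsAsErrors  = False
  }

-- | Build a configuration from raw command-line arguments, ignoring
-- anything unrecognized. The flag spellings are invented for this sketch.
configurationFromFlags :: [String] -> CompilerConfiguration
configurationFromFlags = foldl applyFlag defaultConfiguration
  where
    applyFlag configuration "--optimize" =
      configuration { optimizationLevel = 1 }
    applyFlag configuration "--warnings-as-errors" =
      configuration { warningsAsErrors = True }
    applyFlag configuration _ = configuration

-- | A toy "phase" that reads the configuration it receives as an argument.
-- The "program" is just a list of numbers, and "optimizing" drops the zeros.
optimizeToy :: [Int] -> CompilerConfiguration -> [Int]
optimizeToy program configuration
  | optimizationLevel configuration > 0 = filter (/= 0) program
  | otherwise                           = program
```

In a real program the argument list would typically come from `getArgs` in `System.Environment`, for example `configurationFromFlags <$> getArgs`.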

1 Redesign the interface

The new signatures for our compiler and phases will now be:

compiler :: String -> CompilerConfiguration -> String

tokenize :: String  -> CompilerConfiguration -> [Token]
parse    :: [Token] -> CompilerConfiguration -> AST
optimize :: AST     -> CompilerConfiguration -> AST
emit     :: AST     -> CompilerConfiguration -> String

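Unfortunately, every phase now takes two arguments. The output of one phase is a single value, but the next phase expects both that value and the configuration, so we cannot compose the phases as easily as we did before.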

2 Composing phases

We will have to change how we compose our phases. Here is an implementation that works:

compiler :: String -> CompilerConfiguration -> String
compiler input configuration =
  let tokens = tokenize input  configuration
      ast    = parse    tokens configuration
      ast'   = optimize ast    configuration
      result = emit     ast'   configuration
  in result

In this implementation, we have to feed the configuration to each phase. It is still somewhat readable, but there is a neat trick we can do to recover the readability of our original pipeline design.

Threading

Let us create a new phase-composition operator that “threads” (i.e., composes) the phases together, along with the needed configuration. This operator just generalizes what we did above: take the output of one phase and feed it as the input to the next, along with the configuration.

-- | Compose two compiler phases into a single phase
(>.>) :: (a -> CompilerConfiguration -> b) -> (b -> CompilerConfiguration -> c) -> (a -> CompilerConfiguration -> c)
(>.>) phase1 phase2 input configuration =
  let phase1Result = phase1 input configuration
      phase2Result = phase2 phase1Result configuration
  in phase2Result

What’s with the weird notation?!

I made up a name for our operator: >.>. We could have chosen just about any name. I chose this one because the . conveys function composition, and the > on either side resembles our previous design.
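As a quick check that the operator does what the explicit version did, the sketch below runs both styles on toy phases and compares them. The phases `splitWords`, `normalize`, and `render`, and the `upperCase` field, are placeholders invented for this example; only `>.>` comes from the text above.

```haskell
import Data.Char (toUpper)

-- Toy configuration with a single made-up setting, for this example only.
newtype CompilerConfiguration = CompilerConfiguration { upperCase :: Bool }

-- | Compose two phases, passing the same configuration to both.
-- No fixity is declared, so the Haskell default (infixl 9) applies,
-- and a chain a >.> b >.> c groups as (a >.> b) >.> c.
(>.>) :: (a -> CompilerConfiguration -> b)
      -> (b -> CompilerConfiguration -> c)
      -> (a -> CompilerConfiguration -> c)
(>.>) phase1 phase2 input configuration =
  phase2 (phase1 input configuration) configuration

-- Three placeholder phases. Only the middle one reads the configuration.
splitWords :: String -> CompilerConfiguration -> [String]
splitWords input _ = words input

normalize :: [String] -> CompilerConfiguration -> [String]
normalize ws configuration
  | upperCase configuration = map (map toUpper) ws
  | otherwise               = ws

render :: [String] -> CompilerConfiguration -> String
render ws _ = unwords ws

-- | The explicit style: the configuration is handed to every phase by hand.
explicit :: String -> CompilerConfiguration -> String
explicit input configuration =
  let ws  = splitWords input configuration
      ws' = normalize ws configuration
  in render ws' configuration

-- | The threaded style: the operator hands the configuration along.
threaded :: String -> CompilerConfiguration -> String
threaded = splitWords >.> normalize >.> render
```

For any input and configuration, `threaded` and `explicit` should return the same string, because each use of `>.>` expands to exactly one step of the `let` chain.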

Now we define our pipeline as before, but with our new composition operator:

{- Compiler pipeline -}
compiler :: String -> CompilerConfiguration -> String
compiler = tokenize >.> parse >.> optimize >.> emit

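With this design, adding a new phase is much easier: write the phase so that it takes its input and a CompilerConfiguration, then insert it into the pipeline with >.>.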
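Here is a rough sketch of what adding a phase might look like. The extra phase `removeNoOps` is hypothetical, and the phase bodies and the `Token` and `AST` type synonyms are placeholders chosen only so that the example runs; the real phases are left undefined in this text.

```haskell
-- Placeholder types, assumed for this example only.
data CompilerConfiguration = CompilerConfiguration
type Token = String
type AST   = [Token]

-- | Compose two compiler phases into a single phase.
(>.>) :: (a -> CompilerConfiguration -> b)
      -> (b -> CompilerConfiguration -> c)
      -> (a -> CompilerConfiguration -> c)
(>.>) phase1 phase2 input configuration =
  phase2 (phase1 input configuration) configuration

-- Stand-in implementations of the four existing phases.
tokenize :: String -> CompilerConfiguration -> [Token]
tokenize input _ = words input

parse :: [Token] -> CompilerConfiguration -> AST
parse tokens _ = tokens

optimize :: AST -> CompilerConfiguration -> AST
optimize ast _ = ast

emit :: AST -> CompilerConfiguration -> String
emit ast _ = unwords ast

-- | Hypothetical new phase: drop every "nop" from the program.
-- It has the same two-argument shape as the other phases.
removeNoOps :: AST -> CompilerConfiguration -> AST
removeNoOps ast _ = filter (/= "nop") ast

-- | The pipeline with the new phase inserted between parse and optimize.
-- Nothing else in the pipeline has to change.
compiler :: String -> CompilerConfiguration -> String
compiler = tokenize >.> parse >.> removeNoOps >.> optimize >.> emit
```

The only requirement on the new phase is that its input type matches the output of the phase before it and its output type matches the input of the phase after it.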

3 Full code

Here is all the code for our compiler, whose phases can read from a configuration.

The changes from our previous version are the CompilerConfiguration argument on the compiler and on each phase, and the new >.> operator.

Compiler.hs

module Compiler where

{- Compiler pipeline -}
compiler :: String -> CompilerConfiguration -> String
compiler = tokenize >.> parse >.> optimize >.> emit

-- | Compose two compiler phases into a single phase
(>.>) :: (a -> CompilerConfiguration -> b) -> (b -> CompilerConfiguration -> c) -> (a -> CompilerConfiguration -> c)
(>.>) phase1 phase2 input configuration =
  let phase1Result = phase1 input configuration
      phase2Result = phase2 phase1Result configuration
  in phase2Result

{- Compiler phases -}
tokenize :: String -> CompilerConfiguration -> [Token]
tokenize = undefined

parse :: [Token] -> CompilerConfiguration -> AST
parse = undefined

optimize :: AST -> CompilerConfiguration -> AST
optimize = undefined

emit :: AST -> CompilerConfiguration -> String
emit = undefined

{- Compiler data types -}
data Token = Token
data AST = AST
data CompilerConfiguration = CompilerConfiguration
data CompilerLog = CompilerLog
data CompilerState = CompilerState
data CompilerError = CompilerError