State

Whereas logging is intended for external use (for example, by the person running the compiler), state is intended for the compiler's own internal use. Soon, we will implement a compiler for a source language that has variables. That compiler will need to keep track of information about those variables and to update that information as it translates the source language into the target language.

From a design perspective, there is really no difference between logging and state: they both need to be threaded from the output of one phase to the input of another. To redesign the interface and compose phases, we can essentially do the same thing we did for logging.
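To make "information about the variables" more concrete, here is one possible shape for a state: a symbol table mapping each variable name to what the compiler knows about it. This is a minimal sketch under stated assumptions: `VarInfo`, its fields, `emptyState`, `declareVariable`, and `lookupVariable` are hypothetical names that are not part of our compiler, and the table uses `Data.Map` from the `containers` package.

```haskell
import qualified Data.Map as Map

-- Hypothetical: what the compiler might record about one variable.
data VarInfo = VarInfo
  { varType     :: String  -- e.g. "int"; a real compiler would likely use a dedicated type
  , varAssigned :: Bool    -- has the variable been assigned a value yet?
  } deriving (Show, Eq)

-- Hypothetical state: a symbol table keyed by variable name.
newtype CompilerState = CompilerState (Map.Map String VarInfo)
  deriving (Show, Eq)

-- The state before any variables have been seen.
emptyState :: CompilerState
emptyState = CompilerState Map.empty

-- Record (or overwrite) information about a variable, returning a new state.
declareVariable :: String -> VarInfo -> CompilerState -> CompilerState
declareVariable name info (CompilerState table) =
  CompilerState (Map.insert name info table)

-- Look up what the compiler currently knows about a variable, if anything.
lookupVariable :: String -> CompilerState -> Maybe VarInfo
lookupVariable name (CompilerState table) = Map.lookup name table
```

Because Haskell values are immutable, `declareVariable` returns a new state rather than modifying the old one, which is why the updated state has to be handed on to the next phase.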

1 Redesign the interface

The new signatures for our compiler and phases are:

compiler :: String  -> CompilerConfiguration -> CompilerLog -> CompilerState -> (String,  CompilerLog, CompilerState)

tokenize :: String  -> CompilerConfiguration -> CompilerLog -> CompilerState -> ([Token], CompilerLog, CompilerState)
parse    :: [Token] -> CompilerConfiguration -> CompilerLog -> CompilerState -> (AST,     CompilerLog, CompilerState)
optimize :: AST     -> CompilerConfiguration -> CompilerLog -> CompilerState -> (AST,     CompilerLog, CompilerState)
emit     :: AST     -> CompilerConfiguration -> CompilerLog -> CompilerState -> (String,  CompilerLog, CompilerState)
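To see what a phase with this signature might actually do, here is a toy `tokenize` that uses all three extra inputs: it reads a flag from the configuration, appends a message to the log, and records candidate variable names in the state. The concrete definitions of `CompilerConfiguration` (with a hypothetical `verbose` field), `CompilerLog`, `CompilerState`, and `Token` are assumptions for illustration only; in our compiler these types are still placeholders.

```haskell
-- Hypothetical concrete stand-ins for the placeholder types.
data CompilerConfiguration = CompilerConfiguration { verbose :: Bool }
type CompilerLog = [String]
type CompilerState = [String]  -- hypothetical: variable names seen so far

newtype Token = Token String deriving (Show, Eq)

-- A toy tokenize phase: splits on whitespace, logs a message when the
-- configuration is verbose, and records every all-lowercase word in the state.
tokenize :: String -> CompilerConfiguration -> CompilerLog -> CompilerState
         -> ([Token], CompilerLog, CompilerState)
tokenize source config log state =
  let wordsInSource = words source
      tokens = map Token wordsInSource
      log' | verbose config = log ++ ["tokenize: " ++ show (length tokens) ++ " tokens"]
           | otherwise      = log
      state' = state ++ filter isVariable wordsInSource
  in (tokens, log', state')
  where
    -- Toy rule: treat any word made only of lowercase letters as a variable.
    isVariable = all (`elem` ['a' .. 'z'])
```

For example, `tokenize "x = 1" (CompilerConfiguration True) [] []` yields the three tokens for `x`, `=`, and `1`, the log `["tokenize: 3 tokens"]`, and the state `["x"]`.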

2 Composing phases

To compose phases, we need to thread the output result, the output log, and the output state of one phase to the corresponding inputs of the next. And, of course, we still need to feed the configuration to each phase.

We again update the definition of our >.> phase-composition operator.

-- | Compose two compiler phases into a single phase
(>.>) :: (a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (b, CompilerLog, CompilerState))
      -> (b -> CompilerConfiguration -> CompilerLog -> CompilerState -> (c, CompilerLog, CompilerState))
      -> (a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (c, CompilerLog, CompilerState))
(>.>) phase1 phase2 input configuration log state =
  let (phase1Result, log', state')   = phase1 input configuration log state
      (phase2Result, log'', state'') = phase2 phase1Result configuration log' state'
  in (phase2Result, log'', state'')
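As a small check that `>.>` threads everything in order, the sketch below composes two toy phases, `splitWords` and `countWords`. Both phases are hypothetical, as are the concrete types: a unit configuration, a list-of-strings log, and an `Int` state that counts how many phases have run.

```haskell
-- Hypothetical concrete types, just for this demonstration.
type CompilerConfiguration = ()
type CompilerLog = [String]
type CompilerState = Int  -- hypothetical: number of phases run so far

-- The composition operator, as defined above.
(>.>) :: (a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (b, CompilerLog, CompilerState))
      -> (b -> CompilerConfiguration -> CompilerLog -> CompilerState -> (c, CompilerLog, CompilerState))
      -> (a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (c, CompilerLog, CompilerState))
(>.>) phase1 phase2 input configuration log state =
  let (phase1Result, log', state')   = phase1 input configuration log state
      (phase2Result, log'', state'') = phase2 phase1Result configuration log' state'
  in (phase2Result, log'', state'')

-- Toy phase 1: split the source into words, log the phase name, bump the counter.
splitWords :: String -> CompilerConfiguration -> CompilerLog -> CompilerState
           -> ([String], CompilerLog, CompilerState)
splitWords source _ log state = (words source, log ++ ["splitWords"], state + 1)

-- Toy phase 2: count the words, log the phase name, bump the counter.
countWords :: [String] -> CompilerConfiguration -> CompilerLog -> CompilerState
           -> (Int, CompilerLog, CompilerState)
countWords ws _ log state = (length ws, log ++ ["countWords"], state + 1)

-- The composed pipeline: the output, log, and state of splitWords feed countWords.
pipeline :: String -> CompilerConfiguration -> CompilerLog -> CompilerState
         -> (Int, CompilerLog, CompilerState)
pipeline = splitWords >.> countWords
```

Evaluating `pipeline "let x = 1" () [] 0` gives `(4,["splitWords","countWords"],2)`: the log records the phases in the order they ran, and each phase incremented the state once.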

3 Wrangling types

The type signatures for our phases are getting really long! Not only that, but there isn’t much difference between them. They differ only in the kind of data they consume and the kind of data they produce. Everything else (configuration, logging, and state) is the same.

To make the similarities and differences more apparent, we will factor out all the commonalities into a type alias:

type CompilerPhase a b = a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (b, CompilerLog, CompilerState)

The type variables a and b correspond to the input and output of a compiler phase, as we designed it way back at the beginning of this process.

Now, the signatures for our compiler and phases can be simplified:

compiler :: CompilerPhase String  String

tokenize :: CompilerPhase String  [Token]
parse    :: CompilerPhase [Token] AST
optimize :: CompilerPhase AST     AST
emit     :: CompilerPhase AST     String

Whew—that’s better!
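A type alias introduces no new type; it is only an abbreviation. The sketch below, again with hypothetical concrete types, defines a phase using the `CompilerPhase` alias and passes it to a function whose parameter type is written out in full. The hypothetical names `measure` and `runOnAbc` are for illustration only. This type-checks because the alias expands to exactly the long form.

```haskell
-- Hypothetical concrete stand-ins, just for this illustration.
type CompilerConfiguration = ()
type CompilerLog = [String]
type CompilerState = Int

type CompilerPhase a b = a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (b, CompilerLog, CompilerState)

-- A toy phase whose signature uses the alias.
measure :: CompilerPhase String Int
measure source _ log state = (length source, log, state)

-- A function that expects the fully expanded type, with no alias in sight.
runOnAbc :: (String -> () -> [String] -> Int -> (Int, [String], Int)) -> Int
runOnAbc phase = let (result, _, _) = phase "abc" () [] 0 in result
```

Here `runOnAbc measure` evaluates to `3`.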

4 Full code

Here is all the code for our compiler, whose phases can read a configuration, read/write a log, and read/update a state.

The code that has changed from our previous version is emphasized.

Compiler.hs
module Compiler where

{- Compiler pipeline -}
compiler :: CompilerPhase String String
compiler = tokenize >.> parse >.> optimize >.> emit

-- | Compose two compiler phases into a single phase
(>.>) :: CompilerPhase a b -> CompilerPhase b c -> CompilerPhase a c
(>.>) phase1 phase2 input configuration log state =
  let (phase1Result, log', state')   = phase1 input configuration log state
      (phase2Result, log'', state'') = phase2 phase1Result configuration log' state'
  in (phase2Result, log'', state'')

{- Compiler phases -}
tokenize :: CompilerPhase String [Token]
tokenize = undefined

parse :: CompilerPhase [Token] AST
parse = undefined

optimize :: CompilerPhase AST AST
optimize = undefined

emit :: CompilerPhase AST String
emit = undefined

{- Compiler data types -}
type CompilerPhase a b = a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (b, CompilerLog, CompilerState)

data Token = Token
data AST = AST
data CompilerConfiguration = CompilerConfiguration
data CompilerLog = CompilerLog
data CompilerState = CompilerState
data CompilerError = CompilerError
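Because every phase is still `undefined`, calling `compiler` would crash. One way to confirm that the pipeline type-checks and runs end to end is to temporarily swap in trivial pass-through phases, as in this sketch. The phase bodies and the added `deriving` clauses are assumptions for illustration, not the eventual implementation.

```haskell
-- Same alias and composition operator as in Compiler.hs.
type CompilerPhase a b = a -> CompilerConfiguration -> CompilerLog -> CompilerState -> (b, CompilerLog, CompilerState)

(>.>) :: CompilerPhase a b -> CompilerPhase b c -> CompilerPhase a c
(>.>) phase1 phase2 input configuration log state =
  let (phase1Result, log', state')   = phase1 input configuration log state
      (phase2Result, log'', state'') = phase2 phase1Result configuration log' state'
  in (phase2Result, log'', state'')

-- Placeholder data types as in Compiler.hs, with deriving added so results can be inspected.
data Token = Token deriving (Show, Eq)
data AST = AST deriving (Show, Eq)
data CompilerConfiguration = CompilerConfiguration deriving (Show, Eq)
data CompilerLog = CompilerLog deriving (Show, Eq)
data CompilerState = CompilerState deriving (Show, Eq)

-- Hypothetical pass-through phases standing in for the undefined stubs:
-- each skips its real work and hands the log and state on unchanged.
tokenize :: CompilerPhase String [Token]
tokenize _ _ log state = ([], log, state)

parse :: CompilerPhase [Token] AST
parse _ _ log state = (AST, log, state)

optimize :: CompilerPhase AST AST
optimize ast _ log state = (ast, log, state)

emit :: CompilerPhase AST String
emit _ _ log state = ("", log, state)

compiler :: CompilerPhase String String
compiler = tokenize >.> parse >.> optimize >.> emit
```

With these stubs, `compiler "x = 1" CompilerConfiguration CompilerLog CompilerState` evaluates to `("",CompilerLog,CompilerState)`.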