1 \documentclass[a4paper,11pt]{article}
3 \usepackage[margin=2.5cm]{geometry}
7 %format alpha = "\alpha"
10 \title{\textbf{Slightly Larger Harpy Tutorial}}
11 \author{Martin Grabmüller \and Dirk Kleeblatt}
15 \begin{abstract}\noindent
16 This tutorial is an introduction to some of the more advanced features
17 of Harpy. In the course of the tutorial, we will build a small
18 run-time compiler for a call-by-value lambda-calculus.
19 This document is written as a literate Haskell program, so you can
20 compile and run it without any modifications.
23 \section{Introduction}
25 In this tutorial, we develop a small run-time compiler for a simple
26 call-by-value lambda calculus extended with natural number constants.
28 Before reading, you should have read the Harpy tutorial, which
29 introduces the most basic concepts. We also recommend to have a
30 running \verb|ghci| session running for loading the literate Haskell
31 script of this tutorial and to try it out. Maybe you want to have the
32 Harpy API documentation available, too.
36 To make use of Harpy and (a subset of) the x86 assembler instructions, we need
37 to import some modules.
40 import Harpy.CodeGenMonad
41 import Harpy.X86Assembler
42 import Harpy.X86Disassembler
45 import Control.Monad.Trans
49 The following module system hackery is necessary, because the
50 |Harpy.X86Assembler| module exports a method called
51 |div|, which clashes with the |div| function
52 exported by the Haskell Prelude. The prelude |div| will be
53 hidden, but is still available as a qualified name.
56 import Prelude hiding (div)
57 import qualified Prelude
60 \subsection{Data Types}
62 In the following, we define several data types for representing lambda
65 A |Name| represents a variable name. In a real
66 implementation, it would probably be more complicated data type for
67 handling qualified or compiler-generated names. For now, a string
74 In our implementation, a variable may refer to (a) a global variable,
75 which is represented by its address, (b) a parameter, which is
76 represented by the number of enclosing lambda expression, or (c) a
77 free variable, which is represented by its index into the current code
81 data VarLoc = Global Word32
87 A compile-time environment represents a mapping from variable names to
91 data Env = Env [(Name, VarLoc)]
94 An expression is represented by the following data type and may be a
95 variable, an immediate value (integer constant), a lambda abstraction
96 or an application expression.
105 The |State| type represents the state of a running program.
106 It consists of a pointer to the next available word of heap memory,
107 and a pointer to the end of the heap. These two values are necessary
108 for dynamically allocating memory and for performing heap-overflow
112 data State = State { heap :: Ptr Word32,
113 heapEnd :: Ptr Word32}
116 The following test expressions are constructed using the constructors of
117 type |Exp| and are to be evaluated by the main program.
120 testExp0 = App (Abs "x" (Var "x")) (Imm 12)
121 testExp1 = App (App (Abs "x" (Abs "y" (Var "x"))) (Imm 12)) (Imm 23)
124 \subsection{Main Program}
126 Before presenting the compiler, we will have a look at how it is used.
127 The entry point to the dynamic compiler is the function
128 |interpret|, which takes an expression and returns the value
129 calculated by the expression. The main program simply prints the
133 $(callDecl "callFunc" [t|Word32|])
138 main = do res <- interpret testExp1
139 putStrLn $ "Result: " ++ show res
142 The |interpret| function allocates 16 kbytes of heap and
143 compiles the expression. The |State| of the compiler is
144 passed to the |runCodeGen| function as a user state.
147 interpret :: Exp -> IO Word32
149 do let heapSize = 1024 * 16
150 heap <- mallocBytes heapSize
151 (_, Right res) <- runCodeGen (compileTop exp) (Env [])
152 (State heap (heap `plusPtr`
153 (heapSize `Prelude.div` 4)))
157 \subsection{Compilation}
159 The compilation starts in the function |compileTop|, which emits code
160 for creating a stack frame, saving all general-purpose registers
161 (except \verb|eax|, which will contain the result of the execution),
162 and finally compiles the expression. Afterwards, all registers are
163 restored and the function is called. In addition we call the
164 disassembler on the generated code and print it out. This is a useful
165 debuggin aid when generating assembler code.
168 compileTop :: Exp -> CodeGen Env State Word32
177 mov ecx ((fromIntegral (ptrToWordPtr (heap st))) :: Word32)
188 liftIO $ mapM_ (putStrLn . showIntel) dis
193 \subsubsection{Handling Variables}
195 The |findVar| function looks up a variable in the
196 environment and returns its binding.
203 Nothing -> error ("unbound variable: " ++ s)
206 For the implementation of closures, we need to calculate free
207 variables of expressions, so that we know what to store into closures
208 and how to access free variables from the body of a lambda expression.
211 freeVariables (Var s) = return [s]
212 freeVariables (Imm _) = return []
213 freeVariables (Abs x b) = liftM (delete x) (freeVariables b)
214 freeVariables (App e1 e2) = liftM2 union (freeVariables e1) (freeVariables e2)
217 \subsubsection{Heap Allocation}
219 Closures are the only dynamically allocated structures in our little
220 compiler. They are allocated from a contiguous block of malloc'ed
221 memory, which is passed in the code generator monad's state. The
222 following function generates code for allocating $w$ words of memory
223 in the heap, and returning a pointer to the memory in register
224 \verb|eax|. On heap overflow, a breakpoint instruction is executed.
230 cmp ecx (fromIntegral (ptrToWordPtr (heapEnd st)) :: Word32)
235 add ecx ((4 * w) :: Word32)
238 \subsubsection{The Compilation Proper}
240 Finally, the |compile| function compiles an expression.
241 Constants are simply loaded into the result register, values of
242 variables are retrieved from their respective addresses (global
243 variables from absolute addresses, parameters relative to the stack
244 base pointer, and free variables relative to the closure pointer.
247 compile :: Exp -> CodeGen Env State ()
249 do mov eax (fromIntegral i :: Word32)
253 Global addr -> mov eax (Addr addr)
254 Param ofs -> mov eax (Disp (fromIntegral ofs), ebp)
255 Free ofs -> mov eax (Disp (fromIntegral ofs), esi)
258 Lambda expressions are more complicated. We first need to calculate
259 the set of free variables of the body expression. Then the body of
260 the expression is compiled (with a jump around its code, so that it is
261 not actually executed at this point). Then a closure record is
262 allocated on the heap, a code pointer to the body is stored in the
263 first word, and the values of all free variables of the body are
264 calculated and moved into the remaining fields of the closure.
266 The result is a pointer to the new closure, returned in \verb|eax|, as
270 compile exp@(Abs x body) =
271 do fv <- freeVariables exp
272 let frees = zip fv (map Free [4,8..])
278 mov esi (Disp 8, ebp)
279 withEnv (Env ((x, Param 12) : frees)) (compile body)
284 emitAlloc (fromIntegral (length frees + 1))
289 mapM_ (\ (x, Free ofs) ->
291 mov (Disp (fromIntegral ofs), edi) eax)
295 compile (App e1 e2) =
301 add esp (8 :: Word32)
306 This completes our little just-in-time compiled lambda calculus
307 evaluator. We hope that this tutorial makes clear many of the fine
308 points in using the library.
310 Happy Harpy Hacking in Haskell!