An interpreter in Haskell
I haven’t touched Haskell for a couple of years. To get back into it, I made an interpreter for a small imperative language. Haskell is great for making interpreters!
First I define the syntactic structure.
My language has integers (its only native datatype),
pure operators on integers,
a set of global variables,
some control flow constructs,
and two I/O primitives.
Below the syntax is encoded as the type
E (for “expression”).
data E = -- literals EInt Int -- pure operators | EBinOp Op E E | ENot E -- global variables | EGet String | ESet String E -- control flow | EIf E E E | ESeq E E | EWhile E E | EDoWhile E E | ESkip -- I/O | EWriteByte E | EReadByte data Op = Add | Sub | Eq | Lt | Lte | And -- and many more
Here’s an example of a program written in this language,
writeXsForever :: E writeXsForever = EWhile (EInt 1) (EWriteByte (EInt 120))
You might be able to eyeball this expression and guess its intended behavior.
My intended behavior for this program is that it writes the byte
x to stdout repeatedly forever!
But to give this program meaning, I must define the interpreter.
In Haskell, the interpreter could be a function with the type:
eval :: E -> IO ()
My interpreter is more complex for two reasons.
First, since the interpreter is evaluating an expression,
it needs to return the value that expression evaluated to;
in my language this is always an
Second, my language has mutable global variables (
Map String Int)
which must be “threaded” through each evaluation.
eval :: Map.Map String Int -> E -> IO (Int, Map.Map String Int)
main function begins the interpreter by calling
eval on the root expression of the program:
main :: IO () main = do eval Map.empty writeXsForever return ()
$ ./jimscript xxxxxxxxxxxxxxxx^C $
Now I define
eval case-by-case on each syntactic form in
I start with perhaps the simplest,
a literal which evaluates to itself
and does not modify any variables:
eval vars (EInt i) = return (i, vars)
Next, I interpret
BinOp, the syntax form for all binary operators on integers.
Notice how we evaluate the left-hand side expression before the right-hand side,
and that this matters because of to the side-effects that expressions can have
on global variables and on input/output.
Notice the “threading” of
vars through each evaluation gets quite verbose
(I’ve chosen not to abstract this,
because I plan to implement more sophisticated variable “scoping” in future).
Also notice that
evalOp is rather tedious,
Op values and Haskell functions which implement them.
Much of the work in writing an interpreter is handed off to the host language!
eval vars (EBinOp op e1 e2) = do (val1, vars') <- eval vars e1 (val2, vars'') <- eval vars' e2 return (evalOp op val1 val2, vars'') evalOp :: Op -> Int -> Int -> Int evalOp Add a b = a + b evalOp Sub a b = a - b evalOp Eq a b = if a == b then 1 else 0 evalOp Lt a b = if a < b then 1 else 0 evalOp Lte a b = if a <= b then 1 else 0 evalOp And a b = if a == 0 || b == 0 then 0 else 1
The global variable map has primitive “get” and “set” expressions,
which are evaluated as follows.
Notice the call to
error if the variable isn’t set
(I’m not a Haskell purist).
eval vars (EGet var) = case Map.lookup var vars of Nothing -> error $ "no such variable: " ++ var Just x -> return (x, vars) eval vars (ESet var e) = do (val, vars') <- eval vars e return (val, Map.insert var val vars)
On to control flow,
an interesting one is
Its “looping” behavior is implemented using Haskell recursion;
notice the subcall evaluating a new
EWhile with the new global variable set:
eval vars (EWhile c e) = do (cond, vars') <- eval vars c case cond of 0 -> return (0, vars') _ -> do (_, vars'') <- eval vars' e eval vars'' (EWhile c e)
On to I/O, here’s the interpreter for
My language can only write to stdout,
but it could be extended to write to files, sockets and so on
(but this would want a native string datatype, not just integers).
eval vars (EWriteByte byteE) = do (byte, vars') <- eval vars byteE if byte < 0 then error $ "Tried to print byte < 0: " ++ show byte else if 255 < byte then error $ "Tried to print byte > 255: " ++ show byte else PosixIO.fdWrite PosixIO.stdOutput [Char.chr byte] return (byte, vars')
Now here are some more interesting JimScript programs:
writeTheAlphabet :: E writeTheAlphabet = ESeq (ESet "x" (EInt 1)) (EWhile (ENot (EBinOp Eq (EGet "x") (EInt 27))) (ESeq (EWriteByte (EBinOp Add (EInt 64) (EGet "x"))) (ESet "x" (EBinOp Add (EGet "x") (EInt 1)))))
$ ./jimscript ABCDEFGHIJKLMNOPQRSTUVWXYZ
uppercase :: E uppercase = (EDoWhile (ESeq (ESet "c" EReadByte) (EIf (EBinOp Eq (EGet "c") (EInt (-1))) ESkip (EIf (EBinOp And (EBinOp Lte (EInt 97) (EGet "c")) (EBinOp Lte (EGet "c") (EInt 122))) (EWriteByte (EBinOp Sub (EGet "c") (EInt 32))) (EWriteByte (EGet "c"))))) (ENot (EBinOp Eq (EGet "c") (EInt (-1)))))
$ ./jimscript hello HELLO
Programs in this language are Haskell expressions of type
there is no defined syntax.
I might define a syntax and write a parser next.
some of the
eval definitions were long-winded so I omitted them.
Here’s are the rest.
eval vars (ENot e) = do (v, vars') <- eval vars e case v of 0 -> return (1, vars) _ -> return (0, vars) eval vars (EIf c t e) = do (cond, vars') <- eval vars c case cond of 0 -> eval vars' e _ -> eval vars' t eval vars (EDoWhile e c) = do (_, vars') <- eval vars e (cond, vars'') <- eval vars' c case cond of 0 -> return (0, vars'') _ -> eval vars'' (EDoWhile e c) eval vars (ESeq e1 e2) = do (_, vars') <- eval vars e1 eval vars' e2 eval vars ESkip = return (0, vars) eval vars EReadByte = do exp :: Either Exception.SomeException (String,Foreign.C.Types.CSize) <- Exception.try (PosixIO.fdRead PosixIO.stdInput 1) case exp of Left _ -> return (-1, vars) Right (str,count) -> do if count == 0 then return (-1, vars) else do let [c] = str return (Char.ord c, vars)
More by Jim
- Your syntax highlighter is wrong
- Granddad died today
- The Three Ts of Time, Thought and Typing: measuring cost on the web
- I hate telephones
- The sorry state of OpenSSL usability
- The dots do matter: how to scam a Gmail user
- My parents are Flat-Earthers
- How Hacker News stays interesting
- Project C-43: the lost origins of asymmetric crypto
- The hacker hype cycle
- The inception bar: a new phishing method
- Time is running out to catch COVID-19
- A probabilistic pub quiz for nerds
- Smear phishing: a new Android vulnerability