An interpreter in Haskell

I haven’t touched Haskell for a couple of years. To get back into it, I made an interpreter for a small imperative language. Haskell is great for making interpreters!

First I define the syntactic structure. My language has integers (its only native datatype), pure operators on integers, a set of global variables, some control flow constructs, and two I/O primitives. Below, the syntax is encoded as the type E (for “expression”).

data E =
  -- literals
    EInt Int
  -- pure operators
  | EBinOp Op E E
  | ENot E
  -- global variables
  | EGet String
  | ESet String E
  -- control flow
  | EIf E E E
  | ESeq E E
  | EWhile E E
  | EDoWhile E E
  | ESkip
  -- I/O
  | EWriteByte E
  | EReadByte

data Op = Add | Sub | Eq | Lt | Lte | And  -- and many more

Here’s an example of a program written in this language, called writeXsForever.

writeXsForever :: E
writeXsForever = EWhile (EInt 1) (EWriteByte (EInt 120))

You might be able to eyeball this expression and guess its intended behavior. My intended behavior for this program is that it writes the byte x to stdout repeatedly forever! But to give this program meaning, I must define the interpreter. In Haskell, the interpreter could be a function with the type:

eval :: E -> IO ()

My interpreter is more complex for two reasons. First, since the interpreter is evaluating an expression, it needs to return the value that expression evaluated to; in my language this is always an Int. Second, my language has mutable global variables (Map String Int) which must be “threaded” through each evaluation.

eval :: Map.Map String Int -> E -> IO (Int, Map.Map String Int)

The Haskell main function begins the interpreter by calling eval on the root expression of the program:

main :: IO ()
main = do
  eval Map.empty writeXsForever
  return ()

$ ./jimscript
xxxxxxxxxxxxxxxx^C
$

Now I define eval case-by-case on each syntactic form in E. I start with perhaps the simplest, EInt, a literal which evaluates to itself and does not modify any variables:

eval vars (EInt i) = return (i, vars)

Next, I interpret EBinOp, the syntax form for all binary operators on integers. Notice how we evaluate the left-hand side expression before the right-hand side, and that this matters because of the side effects that expressions can have on global variables and on input/output. Notice that the “threading” of vars through each evaluation gets quite verbose (I’ve chosen not to abstract this, because I plan to implement more sophisticated variable “scoping” in future). Also notice that evalOp is rather tedious, translating between Op values and the Haskell functions which implement them. Much of the work in writing an interpreter is handed off to the host language!

eval vars (EBinOp op e1 e2) = do
  (val1, vars') <- eval vars e1
  (val2, vars'') <- eval vars' e2
  return (evalOp op val1 val2, vars'')

evalOp :: Op -> Int -> Int -> Int
evalOp Add a b = a + b
evalOp Sub a b = a - b
evalOp Eq a b = if a == b then 1 else 0
evalOp Lt a b = if a < b then 1 else 0
evalOp Lte a b = if a <= b then 1 else 0
evalOp And a b = if a == 0 || b == 0 then 0 else 1
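
For comparison, here is a rough sketch of what abstracting the threading might look like. It is not the interpreter from this post: it assumes StateT from the transformers package, and the cut-down type Expr and the function evalS are names made up for this example. The base monad is left polymorphic; the real interpreter would use IO there.

```haskell
import qualified Data.Map as Map
import Control.Monad.Trans.State (StateT, get, modify, runStateT)

-- Hypothetical cut-down expression type, just for this sketch.
data Expr
  = Lit Int
  | Get String
  | Set String Expr
  | Add Expr Expr

type Vars = Map.Map String Int

-- StateT carries the variable map, so vars, vars', vars'' disappear.
-- Polymorphic in the base monad m; an interpreter doing I/O would pick m = IO.
evalS :: Monad m => Expr -> StateT Vars m Int
evalS (Lit i) = return i
evalS (Get var) = do
  vars <- get
  case Map.lookup var vars of
    Nothing -> error $ "no such variable: " ++ var
    Just x  -> return x
evalS (Set var e) = do
  val <- evalS e
  modify (Map.insert var val)
  return val
evalS (Add e1 e2) = do
  val1 <- evalS e1   -- left-hand side first, as in eval
  val2 <- evalS e2
  return (val1 + val2)

main :: IO ()
main = do
  -- (x = 2) + x
  (val, vars) <- runStateT (evalS (Add (Set "x" (Lit 2)) (Get "x"))) Map.empty
  print val                 -- prints 4
  print (Map.toList vars)   -- prints [("x",2)]
```

The trade-off is that the order of evaluation is now implied by the order of the statements in the do block rather than spelled out by which map is passed where.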

The global variable map has primitive “get” and “set” expressions, which are evaluated as follows. Notice the call to error if the variable isn’t set (I’m not a Haskell purist).

eval vars (EGet var) = case Map.lookup var vars of
  Nothing -> error $ "no such variable: " ++ var
  Just x -> return (x, vars)
eval vars (ESet var e) = do
  (val, vars') <- eval vars e
  return (val, Map.insert var val vars')
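
One case worth checking by hand is an assignment nested inside another assignment, because the right-hand side of a “set” can itself update variables. The sketch below is a pure, cut-down evaluator written only for this check (Expr and evalPure are hypothetical names, and there is no I/O). It shows that x = (y = 5) should leave both variables set, which only happens if the insert goes into the map returned by evaluating the right-hand side.

```haskell
import qualified Data.Map as Map

-- Hypothetical cut-down expression type, just for this example.
data Expr
  = Lit Int
  | Get String
  | Set String Expr

type Vars = Map.Map String Int

-- Pure version of the get/set cases, threading the map by hand.
evalPure :: Vars -> Expr -> (Int, Vars)
evalPure vars (Lit i) = (i, vars)
evalPure vars (Get var) = case Map.lookup var vars of
  Nothing -> error $ "no such variable: " ++ var
  Just x  -> (x, vars)
evalPure vars (Set var e) =
  let (val, vars') = evalPure vars e
  in (val, Map.insert var val vars')  -- vars', so the effects of e survive

main :: IO ()
main = do
  -- x = (y = 5)
  let (val, vars) = evalPure Map.empty (Set "x" (Set "y" (Lit 5)))
  print val                 -- prints 5
  print (Map.toList vars)   -- prints [("x",5),("y",5)]
```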

On to control flow. An interesting case is EWhile. Its “looping” behavior is implemented using Haskell recursion; notice the subcall evaluating a new EWhile with the new global variable set:

eval vars (EWhile c e) = do
  (cond, vars') <- eval vars c
  case cond of
    0 -> return (0, vars')
    _ -> do
      (_, vars'') <- eval vars' e
      eval vars'' (EWhile c e)

On to I/O: here’s the interpreter for EWriteByte. My language can only write to stdout, but it could be extended to write to files, sockets and so on (though this would want a native string datatype, not just integers).

eval vars (EWriteByte byteE) = do
  (byte, vars') <- eval vars byteE
  if byte < 0 then error $ "Tried to print byte < 0: " ++ show byte
  else if 255 < byte then error $ "Tried to print byte > 255: " ++ show byte
  else PosixIO.fdWrite PosixIO.stdOutput [Char.chr byte]
  return (byte, vars')
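
The range check could also be pulled out into a small pure function. The following is a sketch under assumptions, not the code above: toByteChar and writeByte are hypothetical helpers, and the output goes through putChar from the Prelude rather than PosixIO.fdWrite, with stdout switched to binary mode so that values from 128 to 255 are written as single bytes.

```haskell
import qualified Data.Char as Char
import System.IO (hSetBinaryMode, stdout)

-- Hypothetical helper: check that an Int fits in a byte.
toByteChar :: Int -> Either String Char
toByteChar byte
  | byte < 0   = Left ("Tried to print byte < 0: " ++ show byte)
  | byte > 255 = Left ("Tried to print byte > 255: " ++ show byte)
  | otherwise  = Right (Char.chr byte)

-- Write one byte to stdout, failing loudly on out-of-range values.
writeByte :: Int -> IO ()
writeByte byte = case toByteChar byte of
  Left msg -> error msg
  Right c  -> putChar c

main :: IO ()
main = do
  hSetBinaryMode stdout True      -- assumption: raw bytes, no text encoding
  mapM_ writeByte [72, 105, 10]   -- writes "Hi" and a newline
```

Keeping the check pure means the two error cases can be tried out without performing any output.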

Now here are some more interesting JimScript programs:

writeTheAlphabet :: E
writeTheAlphabet =
  ESeq
    (ESet "x" (EInt 1))
    (EWhile (ENot (EBinOp Eq (EGet "x") (EInt 27))) (ESeq
      (EWriteByte (EBinOp Add (EInt 64) (EGet "x")))
      (ESet "x" (EBinOp Add (EGet "x") (EInt 1)))))

$ ./jimscript
ABCDEFGHIJKLMNOPQRSTUVWXYZ

uppercase :: E
uppercase =
  (EDoWhile (ESeq
      (ESet "c" EReadByte)
      (EIf (EBinOp Eq (EGet "c") (EInt (-1)))
        ESkip
        (EIf (EBinOp And
                (EBinOp Lte (EInt 97) (EGet "c"))
                (EBinOp Lte (EGet "c") (EInt 122)))
          (EWriteByte (EBinOp Sub (EGet "c") (EInt 32)))
          (EWriteByte (EGet "c")))))
    (ENot (EBinOp Eq (EGet "c") (EInt (-1)))))

$ ./jimscript
hello
HELLO

Programs in this language are Haskell expressions of type E; there is no defined syntax. I might define a syntax and write a parser next.

Addendum: some of the eval definitions were long-winded so I omitted them. Here are the rest.

eval vars (ENot e) = do
  (v, vars') <- eval vars e
  case v of
    0 -> return (1, vars')
    _ -> return (0, vars')
eval vars (EIf c t e) = do
  (cond, vars') <- eval vars c
  case cond of
    0 -> eval vars' e
    _ -> eval vars' t
eval vars (EDoWhile e c) = do
  (_, vars') <- eval vars e
  (cond, vars'') <- eval vars' c
  case cond of
    0 -> return (0, vars'')
    _ -> eval vars'' (EDoWhile e c)
eval vars (ESeq e1 e2) = do
  (_, vars') <- eval vars e1
  eval vars' e2
eval vars ESkip = return (0, vars)
eval vars EReadByte = do
  exp :: Either Exception.SomeException (String,Foreign.C.Types.CSize) <- Exception.try (PosixIO.fdRead PosixIO.stdInput 1)
  case exp of
    Left _ -> return (-1, vars)
    Right (str,count) -> do
      if count == 0 then
        return (-1, vars)
      else do
        let [c] = str
        return (Char.ord c, vars)
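
The same “-1 at end of input” convention can be sketched with the portable System.IO functions instead of PosixIO.fdRead. This is an illustrative variant, not the code above: readChar, toResult and readByte are hypothetical names, and stdin is put into binary mode on the assumption that input should be treated as raw bytes.

```haskell
import qualified Data.Char as Char
import System.IO (hSetBinaryMode, isEOF, stdin)

-- Hypothetical helper: map a read result onto the language's Int convention.
toResult :: Maybe Char -> Int
toResult Nothing  = -1
toResult (Just c) = Char.ord c

-- Read one character from stdin, or Nothing at end of input.
readChar :: IO (Maybe Char)
readChar = do
  eof <- isEOF
  if eof then return Nothing else Just <$> getChar

-- Next byte from stdin as an Int, or -1 at end of input.
readByte :: IO Int
readByte = toResult <$> readChar

main :: IO ()
main = do
  hSetBinaryMode stdin True   -- assumption: treat input as raw bytes
  b <- readByte
  print b
```

Unlike the Exception.try version, this sketch only maps end of input to -1; other read errors would still surface as exceptions.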

Tagged #programming, #haskell.


This page copyright James Fisher 2018. Content is not associated with my employer.
