Saturday, April 28, 2012

Scheme without special forms: a metacircular adventure

A good programming language will have many libraries building on a small set of core features. Writing and distributing libraries is much easier than dealing with changes to a language implementation. Of course, the choice of core features affects the scope of things we can build as libraries. We want a very small core that still allows us to build anything.

The lambda calculus can implement any computable function, and encode arbitrary data types. Technically, it's all we need to instruct a computer. But programs also need to be written and understood by humans. We fleshy meatbags will soon get lost in a sea of unadorned lambdas. Our languages need to have more structure.

As an example, the Scheme programming language is explicitly based on the lambda calculus. But it adds syntactic special forms for definitions, variable binding, conditionals, etc. Scheme also lets the programmer define new syntactic forms as macros translating to existing syntax. Indeed, lambda and the macro system are enough to implement some of the standard special forms.

But we can do better. There's a simple abstraction which lets us define lambda, Lisp or Scheme macros, and all the other special forms as mere library code. This idea was known as "fexprs" in old Lisps, and more recently as "operatives" in John Shutt's programming language Kernel. Shutt's PhD thesis [PDF] has been a vital resource for learning about this stuff; I'm slowly making my way through its 416 pages.

What I understand so far can be summarized by something self-contained and kind of cool. Here's the agenda:

  • I'll describe a tiny programming language named Qoppa. Its S-expression syntax and basic data types are borrowed from Scheme. Qoppa has no special forms, and a small set of built-in operatives.

  • We'll write a Qoppa interpreter in Scheme.

  • We'll write a library for Qoppa which implements enough Scheme features to run the Qoppa interpreter.

  • We'll use this nested interpreter to very slowly compute the factorial of 5.

All of the code is on GitHub, if you'd like to see it in one place.

Operatives in Qoppa

An operative is a first-class value: it can be passed to and from functions, stored in data structures, and so forth. To use an operative, you apply it to some arguments, much like a function. The difference is that

  1. The operative receives its arguments as unevaluated syntax trees, and

  2. The operative also gets an argument representing the variable-binding environment at the call site.

Just as Scheme's functions are constructed by the lambda syntax, Qoppa's operatives are constructed by vau. Here's a simple example:

(define quote
    (vau (x) env
        x))

We bind a single argument as x, and bind the caller's environment as env. (Since we don't use env, we could replace it with _, which means to ignore the argument in that position, like Haskell's _ or Kernel's #ignore.) The body of the vau says to return the argument x, unevaluated.

So this implements Scheme's quote special form. If we evaluate the expression (quote x) we'll get the symbol x. As it happens, quote is used sparingly in Qoppa. There is usually a cleaner alternative, as we'll see.

Here's another operative:

(define list (vau xs env
    (if (null? xs)
        (quote ())
        (cons
            (eval env (car xs))
            (eval env (cons list (cdr xs)))))))

This list operative does the same thing as Scheme's list function: it evaluates any number of arguments and returns them in a list. So (list (+ 2 2) 3) evaluates to the list (4 3).

In Scheme, list is just (lambda xs xs). In Qoppa it's more involved, because we must explicitly evaluate each argument. This is the hallmark of (meta)programming with operatives: we selectively evaluate using eval, rather than selectively suppressing evaluation using quote.

The last part of this code deserves closer scrutiny:

(eval env (cons list (cdr xs)))

What if the caller's environment env contains a local binding for the name list? Not to worry, because we aren't quoting the name list. We're building a cons pair whose car is the value of list... an operative! Supposing xs is (1 2 3), the expression

(cons list (cdr xs))

evaluates to the list

(<some-value-representing-an-operative> 2 3)

and that's what eval sees. Just like lambda, evaluating a vau expression captures the current environment. When the resulting operative is used, the vau body gets values from this captured static environment, not the dynamic argument of the caller. So we have lexical scoping by default, with the option of dynamic scoping thanks to that env parameter.

Compare this situation with Lisp or Scheme macros. Lisp macros build code which refers to external stuff by name. Maintaining macro hygiene requires constant attention by the programmer. Scheme's macros are hygienic by default, but the macro system is far more complex. Rather than writing ordinary functions, we have to use one of several special-purpose sublanguages. Operatives provide the safety of Scheme macros, but (like Lisp macros) they use only the core computational features of the language.

Implementing Qoppa

Now that you have a taste of what the language is like, let's write a Qoppa interpreter in Scheme.

We will represent an environment as a list of frames, where a frame is simply an association list. Within the vau body in

( (vau (x) _ x) 3 )

the current environment would be something like

( ;; local frame
  ((x 3))

  ;; global frame
  ((cons <operative>)
   (car  <operative>)
   ...) )

Here's a Scheme function to build a frame from some names and the corresponding values.

(define (bind param val) (cond
    ((and (null? param) (null? val))
        '())
    ((eq? param '_)
        '())
    ((symbol? param)
        (list (list param val)))
    ((and (pair? param) (pair? val))
        (append
            (bind (car param) (car val))
            (bind (cdr param) (cdr val))))
    (else
        (error "can't bind" param val))))

We allow names and values to be arbitrary trees, so for example

(bind
    '((a b) . c)
    '((1 2) 3 4))

evaluates to

((a 1)
 (b 2)
 (c (3 4)))

(If you'll recall, (x . y) is the pair formed by (cons 'x 'y), an improper list.) The generality of bind means our argument-binding syntax — in vau, lambda, let, etc. — will be richer than Scheme's.

Next, a function to find a (name value) entry, given the name and an environment. This just invokes assq on each frame until we find a match.

(define (m-lookup name env)
    (if (null? env)
        (error "could not find" name)
        (let ((binding (assq name (car env))))
            (if binding
                binding
                (m-lookup name (cdr env))))))

We also need a representation for operatives. A simple choice is that a Qoppa operative is represented by a Scheme procedure that takes the operands and current environment as arguments. Now we can write the Qoppa evaluator itself.

(define (m-eval env exp) (cond
    ((symbol? exp)
        (cadr (m-lookup exp env)))
    ((pair? exp)
        (m-operate env (m-eval env (car exp)) (cdr exp)))
    (else
        exp)))

(define (m-operate env operative operands)
    (operative env operands))

The evaluator has only three cases. If exp is a symbol, it refers to a value in the current environment. If it's a cons pair, the car must evaluate to an operative and the cdr holds operands. Anything else evaluates to itself: numbers, strings, Booleans, and Qoppa operatives (represented by Scheme procedures).

Instead of the traditional eval and apply we have "eval" and "operate". Thanks to our uniform representation of operatives, the latter is very simple.

Qoppa builtins

Now we need to populate the global environment with useful built-in operatives. vau is the most significant of these. Here is its corresponding Scheme procedure.

(define (m-vau static-env vau-operands)
    (let ((params    (car   vau-operands))
          (env-param (cadr  vau-operands))
          (body      (caddr vau-operands)))

        (lambda (dynamic-env operands)
            (m-eval
                (cons
                    (bind
                        (cons env-param   params)
                        (cons dynamic-env operands))
                    static-env)
                body))))

When applying vau, you provide a parameter tree, a name for the caller's environment, and a body. The result of applying vau is an operative which, when applied, evaluates that body. It does so in the environment captured by vau, extended with arguments.

Here's the global environment:

(define (make-global-frame)
    (define (wrap-primitive fun)
        (lambda (env operands)
            (apply fun (map (lambda (exp) (m-eval env exp)) operands))))
    (list
        (list 'vau m-vau)
        (list 'eval    (wrap-primitive m-eval))
        (list 'operate (wrap-primitive m-operate))
        (list 'lookup  (wrap-primitive m-lookup))
        (list 'bool    (wrap-primitive (lambda (b t f) (if b t f))))
        (list 'eq?     (wrap-primitive eq?))
        ; more like these
        ))

(define global-env (list (make-global-frame)))

Other than vau, each built-in operative evaluates all of its arguments. That's what wrap-primitive accomplishes. We can think of these as functions, whereas vau is something more exotic.

We expose the interpreter's m-eval and m-operate, which are essential for building new features as library code. We could implement lookup as library code; providing it here just prevents some code duplication.

The other functions inherited from Scheme are:

  • Type predicates: null? symbol? pair?

  • Pairs: cons car cdr set-car! set-cdr!

  • Arithmetic: + * - / <= =

  • I/O: error display open-input-file read eof-object

Scheme as a Qoppa library

The Qoppa interpreter uses Scheme syntax like lambda, define, let, if, etc. Qoppa itself supports none of this; all we get is vau and some basic data types. But this is enough to build a Qoppa library which provides all the Scheme features we used in the interpreter. This code starts out very cryptic, and becomes easier to read as we have more high-level features available. You can read through the full library if you like. This section will go over some of the more interesting parts.

Our first task is a bit of a puzzle: how do you define define? It's only possible because we expose the interpreter's representation of environments. We can push a new binding onto the top frame of env, like so:

(set-car! env
    (cons
        (cons <name> (cons <value> null))
        (car env)))

We use this idea twice, once inside the vau body for define, and once to define define itself.

((vau (name-of-define null) env
    (set-car! env (cons
        (cons name-of-define
            (cons (vau (name exp) defn-env
                    (set-car! defn-env (cons
                        (cons name (cons (eval defn-env exp) null))
                        (car defn-env))))
                null))
        (car env))))
    define ())

Next we'll define Scheme's if, which evaluates one branch or the other. We do this in terms of the Qoppa builtin bool, which always evaluates both branches.

(define if (vau (b t f) env
    (eval env
        (bool (eval env b) t f))))

We already saw the code for list, which evaluates each of its arguments. Many other operatives have this behavior, so we should abstract out the idea of "evaluate all arguments". The operative wrap takes an operative and returns a transformed version of that operative, which evaluates all of its arguments.

(define wrap (vau (operative) oper-env
    (vau args args-env
        (operate args-env
            (eval    oper-env operative)
            (operate args-env list args)))))

Now we can implement lambda as an operative that builds a vau term, evals it, and then wraps the resulting operative.

(define lambda (vau (params body) static-env
    (wrap
        (eval static-env
            (list vau params '_ body)))))

This works just like Scheme's lambda:

(define fact (lambda (n)
    (if (<= n 1)
        1
        (* n (fact (- n 1))))))

Actually, it's incomplete, because Scheme's lambda allows an arbitrary number of expressions in the body. In other words Scheme's

(lambda (x) a b c)

is syntactic sugar for

(lambda (x) (begin a b c))

begin evaluates its arguments in order left to right, and returns the value of the last one. In Scheme it's a special form, because normal argument evaluation happens in an undefined order. By contrast, the Qoppa interpreter implements a left-to-right order, so we'll define begin as a function.

(define last (lambda (xs)
    (if (null? (cdr xs))
        (car xs)
        (last (cdr xs)))))

(define begin (lambda xs (last xs)))

Now we can mutate the binding for lambda to support multiple expressions.

(define set! (vau (name exp) env
    (set-cdr!
        (lookup name env)
        (list (eval env exp)))))

(set! lambda
    ((lambda (base-lambda)
        (vau (param . body) env
            (eval env (list base-lambda param (cons begin body)))))
    lambda))

Note the structure

((lambda (base-lambda) ...) lambda)

which holds on to the original lambda operative, in a private frame. That's right, we're using lambda to save lambda so we can overwrite lambda. We use the same approach when defining other sugar, such as the implicit lambda in define.

There are some more bits of Scheme we need to implement: cond, let, map, append, and so forth. These are mostly straightforward; read the code if you want the full story. By far the most troublesome was Scheme's apply function, which takes a function and a list of arguments, and is supposed to apply the function to those arguments. The problem is that our functions are really operatives, and expect to call eval on each of their arguments. If we already have the values in a list, how do we pass them on?

Qoppa and Kernel have very different solutions to this problem. In Kernel, "applicatives" (things that evaluate all their arguments) are a distinct type from operatives. wrap is the primitive constructor of applicatives, and its inverse unwrap is used to implement apply. This design choice simplifies apply but complicates the core evaluator, which needs to distinguish applicatives from operatives.

For Qoppa I implemented wrap as a library function, which we saw before. But then we don't have unwrap. So apply takes the uglier approach of quoting each argument to prevent double-evaluation.

(define apply (wrap (vau (operative args) env
    (eval env (cons
        operative
        (map (lambda (x) (list quote x)) args))))))

In either Kernel or Qoppa, you're not allowed to apply apply to something that doesn't evaluate all of its arguments.

Testing

The code we saw above is split into two files:

  • qoppa.scm is the Qoppa interpreter, written in Scheme

  • prelude.qop is the Qoppa code which defines wrap, lambda, etc.

I defined a procedure execute-file which reads a file from disk and runs each expression through m-eval. The last line of qoppa.scm is

(execute-file "prelude.qop")

so the definitions in prelude.qop are available immediately.

We start by loading qoppa.scm into a Scheme interpreter. I'm using Guile here, but I've actually tested this with a variety of R5RS implementations.

$ guile -l qoppa.scm
guile> (m-eval global-env '(fact 5))
$1 = 120

This establishes that we've implemented the features used by fact, such as define and lambda. But did we actually implement enough to run the Qoppa interpreter? To test this, we need to go deeper.

guile> (execute-file "qoppa.scm")
$2 = done
guile> (m-eval global-env '(m-eval global-env '(fact 5)))
$3 = 120

This is factorial implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme (implemented in C). Of course it's outrageously slow; on my machine this (fact 5) takes about 5 minutes. But it demonstrates that a tiny language of operatives, augmented with an appropriate library, can provide enough syntactic features to run a non-trivial Scheme program. As for how to do this efficiently, well, I haven't got far enough into the literature to have any idea.

5 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. `apply' could still be made beautiful if you define it as an operative.

      Delete
  3. Look for Black on http://library.readscheme.org/page11.html and related work on that page and elsewhere by Kenichi Asai.

    ReplyDelete
  4. I'll have to read this in detail some other time, but I already like it for using two of the most obscure Greek letters ever. (One of those oddball facts that's clogging up my brain: many metrical anomalies in Homer are due to unwritten vaus.)

    ReplyDelete