A good programming language will have many libraries building on a small set of core features. Writing and distributing libraries is much easier than dealing with changes to a language implementation. Of course, the choice of core features affects the scope of things we can build as libraries. We want a very small core that still allows us to build anything.
The lambda calculus can implement any computable function, and encode arbitrary data types. Technically, it's all we need to instruct a computer. But programs also need to be written and understood by humans. We fleshy meatbags will soon get lost in a sea of unadorned lambdas. Our languages need to have more structure.
As an example, the Scheme programming language is explicitly based on the lambda calculus. But it adds syntactic special forms for definitions, variable binding, conditionals, etc. Scheme also lets the programmer define new syntactic forms as macros translating to existing syntax. Indeed, lambda and the macro system are enough to implement some of the standard special forms.
But we can do better. There's a simple abstraction which lets us define lambda, Lisp or Scheme macros, and all the other special forms as mere library code. This idea was known as "fexprs" in old Lisps, and more recently as "operatives" in John Shutt's programming language Kernel. Shutt's PhD thesis [PDF] has been a vital resource for learning about this stuff; I'm slowly making my way through its 416 pages.
What I understand so far can be summarized by something self-contained and kind of cool. Here's the agenda:
I'll describe a tiny programming language named Qoppa. Its S-expression syntax and basic data types are borrowed from Scheme. Qoppa has no special forms, and a small set of built-in operatives.
We'll write a Qoppa interpreter in Scheme.
We'll write a library for Qoppa which implements enough Scheme features to run the Qoppa interpreter.
We'll use this nested interpreter to very slowly compute the factorial of 5.
All of the code is on GitHub, if you'd like to see it in one place.
Operatives in Qoppa
An operative is a first-class value: it can be passed to and from functions, stored in data structures, and so forth. To use an operative, you apply it to some arguments, much like a function. The difference is that
The operative receives its arguments as unevaluated syntax trees, and
The operative also gets an argument representing the variable-binding environment at the call site.
Just as Scheme's functions are constructed by the lambda syntax, Qoppa's operatives are constructed by vau. Here's a simple example:
(define quote (vau (x) env x))
We bind a single argument as x, and bind the caller's environment as env. (Since we don't use env, we could replace it with _, which means to ignore the argument in that position, like Haskell's _ or Kernel's #ignore.) The body of the vau says to return the argument x, unevaluated. So this implements Scheme's quote special form. If we evaluate the expression (quote x) we'll get the symbol x. As it happens, quote is used sparingly in Qoppa. There is usually a cleaner alternative, as we'll see.
Here's another operative:
(define list
  (vau xs env
    (if (null? xs)
        (quote ())
        (cons (eval env (car xs))
              (eval env (cons list (cdr xs)))))))
This list operative does the same thing as Scheme's list function: it evaluates any number of arguments and returns them in a list. So (list (+ 2 2) 3) evaluates to the list (4 3). In Scheme, list is just (lambda xs xs). In Qoppa it's more involved, because we must explicitly evaluate each argument. This is the hallmark of (meta)programming with operatives: we selectively evaluate using eval, rather than selectively suppressing evaluation using quote.
The last part of this code deserves closer scrutiny:
(eval env (cons list (cdr xs)))
What if the caller's environment env contains a local binding for the name list? Not to worry, because we aren't quoting the name list. We're building a cons pair whose car is the value of list... an operative! Supposing xs is (1 2 3), the expression
(cons list (cdr xs))
evaluates to the list
(<some-value-representing-an-operative> 2 3)
and that's what eval sees. Just like lambda, evaluating a vau expression captures the current environment. When the resulting operative is used, the vau body gets values from this captured static environment, not from the caller's dynamic environment. So we have lexical scoping by default, with the option of dynamic scoping thanks to that env parameter.
Compare this situation with Lisp or Scheme macros. Lisp macros build code which refers to external stuff by name. Maintaining macro hygiene requires constant attention by the programmer. Scheme's macros are hygienic by default, but the macro system is far more complex. Rather than writing ordinary functions, we have to use one of several special-purpose sublanguages. Operatives provide the safety of Scheme macros, but (like Lisp macros) they use only the core computational features of the language.
Now that you have a taste of what the language is like, let's write a Qoppa interpreter in Scheme.
We will represent an environment as a list of frames, where a frame is simply an association list. Within the vau body in
( (vau (x) _ x) 3 )
the current environment would be something like
(
  ;; local frame
  ((x 3))
  ;; global frame
  ((cons <operative>)
   (car <operative>)
   ...)
)
Here's a Scheme function to build a frame from some names and the corresponding values.
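(define (bind param val)
  (cond
    ;; both trees exhausted: nothing left to bind
    ((and (null? param) (null? val)) '())
    ;; _ ignores whatever value sits in this position
    ((eq? param '_) '())
    ;; a symbol binds to the whole value
    ((symbol? param) (list (list param val)))
    ;; matching pairs: bind both halves and join the results
    ((and (pair? param) (pair? val))
     (append (bind (car param) (car val))
             (bind (cdr param) (cdr val))))
    (else (error "can't bind" param val))))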
We allow names and values to be arbitrary trees, so for example
(bind '((a b) . c) '((1 2) 3 4))
evaluates to
((a 1) (b 2) (c (3 4)))
(If you'll recall, (x . y) is the pair formed by (cons 'x 'y), an improper list.) The generality of bind means our argument-binding syntax (in vau, lambda, let, etc.) will be richer than Scheme's.
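A few more calls may make bind's behavior easier to see. This is only a sketch: bind is copied from above so the snippet runs on its own, and the sample parameter trees are made up for illustration.

```scheme
;; bind, repeated from the post so this snippet is self-contained
(define (bind param val)
  (cond ((and (null? param) (null? val)) '())
        ((eq? param '_) '())
        ((symbol? param) (list (list param val)))
        ((and (pair? param) (pair? val))
         (append (bind (car param) (car val))
                 (bind (cdr param) (cdr val))))
        (else (error "can't bind" param val))))

;; A flat parameter list, as in ordinary Scheme
(write (bind '(x y) '(1 2))) (newline)            ; ((x 1) (y 2))

;; A bare symbol captures the whole argument list
(write (bind 'xs '(1 2 3))) (newline)             ; ((xs (1 2 3)))

;; _ skips a position; a dotted tail collects the rest
(write (bind '(_ . rest) '(1 2 3))) (newline)     ; ((rest (2 3)))

;; Nested trees destructure nested arguments
(write (bind '(x (y z)) '(1 (2 3)))) (newline)    ; ((x 1) (y 2) (z 3))
```

The last case is the one plain Scheme parameter lists can't express: the parameter tree reaches inside an argument.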
Next, a function to find a (name value) entry, given the name and an environment. This just invokes assq on each frame until we find a match.
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding
            binding
            (m-lookup name (cdr env))))))
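Note that m-lookup returns the whole (name value) entry rather than just the value. Here's a small sketch of why that is handy, with m-lookup copied from above and a made-up two-frame environment (sample-env is my own name, not part of the interpreter).

```scheme
;; m-lookup, repeated from the post so this snippet is self-contained
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding
            binding
            (m-lookup name (cdr env))))))

;; Built with list rather than a quoted literal, so the entries are mutable.
(define sample-env
  (list (list (list 'x 3))                    ; local frame
        (list (list 'x 10) (list 'y 20))))    ; global frame

;; The local frame is searched first, so it shadows the global x.
(write (m-lookup 'x sample-env)) (newline)    ; (x 3)
(write (m-lookup 'y sample-env)) (newline)    ; (y 20)

;; Because we get the entry itself, we can update the binding in place.
(set-cdr! (m-lookup 'y sample-env) (list 99))
(write (cadr (m-lookup 'y sample-env))) (newline)   ; 99
```

Returning the entry costs the evaluator one extra cadr, but it gives library code a handle for mutating bindings.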
We also need a representation for operatives. A simple choice is that a Qoppa operative is represented by a Scheme procedure that takes the current environment and the operands as arguments, in that order. Now we can write the Qoppa evaluator itself.
(define (m-eval env exp)
  (cond
    ((symbol? exp)
     (cadr (m-lookup exp env)))
    ((pair? exp)
     (m-operate env (m-eval env (car exp)) (cdr exp)))
    (else exp)))
(define (m-operate env operative operands)
  (operative env operands))
The evaluator has only three cases. If exp is a symbol, it refers to a value in the current environment. If it's a cons pair, the car must evaluate to an operative and the cdr holds operands. Anything else evaluates to itself: numbers, strings, Booleans, and Qoppa operatives (represented by Scheme procedures).
Instead of the traditional eval and apply we have "eval" and "operate". Thanks to our uniform representation of operatives, the latter is very simple.
Now we need to populate the global environment with useful built-in operatives. vau is the most significant of these. Here is its corresponding Scheme procedure.
(define (m-vau static-env vau-operands)
  (let ((params (car vau-operands))
        (env-param (cadr vau-operands))
        (body (caddr vau-operands)))
    (lambda (dynamic-env operands)
      (m-eval (cons (bind (cons env-param params)
                          (cons dynamic-env operands))
                    static-env)
              body))))
To use vau, you provide a parameter tree, a name for the caller's environment, and a body. The result of applying vau is an operative which, when applied, evaluates that body. It does so in the environment captured by vau, extended with a new frame holding the arguments and the caller's environment.
Here's the global environment:
(define (make-global-frame)
  (define (wrap-primitive fun)
    (lambda (env operands)
      (apply fun (map (lambda (exp) (m-eval env exp)) operands))))
  (list
    (list 'vau m-vau)
    (list 'eval (wrap-primitive m-eval))
    (list 'operate (wrap-primitive m-operate))
    (list 'lookup (wrap-primitive m-lookup))
    (list 'bool (wrap-primitive (lambda (b t f) (if b t f))))
    (list 'eq? (wrap-primitive eq?))
    ; more like these
    ))
(define global-env (list (make-global-frame)))
Other than vau, each built-in operative evaluates all of its arguments. That's what wrap-primitive accomplishes. We can think of these as functions, whereas vau is something more exotic.
We expose the interpreter's m-eval and m-operate (as eval and operate), which are essential for building new features as library code. We could implement lookup as library code; providing it here just prevents some code duplication.
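To see the pieces working together, here's a condensed, self-contained assembly of the interpreter so far, followed by two evaluations. Treat it as a sketch: the definitions are copied from above, wrap-primitive is lifted to the top level for convenience, and demo-env is a deliberately tiny global frame of my own (just vau, eval, and +), not the real global environment.

```scheme
(define (bind param val)
  (cond ((and (null? param) (null? val)) '())
        ((eq? param '_) '())
        ((symbol? param) (list (list param val)))
        ((and (pair? param) (pair? val))
         (append (bind (car param) (car val))
                 (bind (cdr param) (cdr val))))
        (else (error "can't bind" param val))))

(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

(define (m-eval env exp)
  (cond ((symbol? exp) (cadr (m-lookup exp env)))
        ((pair? exp) (m-operate env (m-eval env (car exp)) (cdr exp)))
        (else exp)))

(define (m-operate env operative operands)
  (operative env operands))

(define (m-vau static-env vau-operands)
  (let ((params (car vau-operands))
        (env-param (cadr vau-operands))
        (body (caddr vau-operands)))
    (lambda (dynamic-env operands)
      (m-eval (cons (bind (cons env-param params)
                          (cons dynamic-env operands))
                    static-env)
              body))))

(define (wrap-primitive fun)
  (lambda (env operands)
    (apply fun (map (lambda (exp) (m-eval env exp)) operands))))

;; Hypothetical minimal environment: one frame with three built-ins.
(define demo-env
  (list (list (list 'vau m-vau)
              (list 'eval (wrap-primitive m-eval))
              (list '+ (wrap-primitive +)))))

;; An operative that returns its operand as an unevaluated syntax tree
(write (m-eval demo-env '((vau (x) _ x) (+ 1 2))))
(newline)                                             ; (+ 1 2)

;; An operative that evaluates its operand in the caller's environment
(write (m-eval demo-env '((vau (x) env (eval env x)) (+ 1 2))))
(newline)                                             ; 3
```

The two calls differ only in the vau body, which is the whole point: evaluation of operands is something the operative opts into.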
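The other functions inherited from Scheme, such as the cons, car, cdr, and null? used in the Qoppa code in this post, are wrapped in the same way.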
Scheme as a Qoppa library
The Qoppa interpreter uses Scheme syntax like define, lambda, let, cond, if, etc. Qoppa itself supports none of this; all we get is vau and some basic data types. But this is enough to build a Qoppa library which provides all the Scheme features we used in the interpreter. This code starts out very cryptic, and becomes easier to read as we have more high-level features available. You can read through the full library if you like. This section will go over some of the more interesting parts.
Our first task is a bit of a puzzle: how do you define define? It's only possible because we expose the interpreter's representation of environments. We can push a new binding onto the top frame of env, like so:
(set-car! env (cons (cons <name> (cons <value> null)) (car env)))
We use this idea twice, once inside the vau body for define, and once to define define itself.
((vau (name-of-define null) env
   (set-car! env
     (cons (cons name-of-define
                 (cons (vau (name exp) defn-env
                         (set-car! defn-env
                           (cons (cons name (cons (eval defn-env exp) null))
                                 (car defn-env))))
                       null))
           (car env))))
 define ())
Next we'll define Scheme's if, which evaluates one branch or the other. We do this in terms of the Qoppa builtin bool, which always evaluates both branches.
(define if (vau (b t f) env (eval env (bool (eval env b) t f))))
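The same trick can be tried in plain Scheme, which may help if the Qoppa version looks too compact. This is a rough analogue, not the Qoppa code: bool here is an ordinary Scheme function, tick! and ticks are names I made up to count evaluations, and I'm assuming a Scheme such as Guile where (interaction-environment) is available for eval.

```scheme
;; A conditional written as an ordinary function: every argument is
;; evaluated before bool is even called.
(define (bool b t f) (if b t f))

;; Count how many branch expressions actually get evaluated.
(define ticks 0)
(define (tick! v) (set! ticks (+ ticks 1)) v)

(bool #t (tick! 'yes) (tick! 'no))      ; both branches run, so ticks is now 2

;; The trick from the definition of if: hand bool the branches as data,
;; then evaluate only the one it picks. (car '()) is never evaluated.
(define chosen (bool #t '(+ 1 2) '(car '())))
(write chosen) (newline)                                    ; (+ 1 2)
(write (eval chosen (interaction-environment))) (newline)   ; 3
(write ticks) (newline)                                     ; 2
```

In Qoppa the quoting is free, because t and f are already bound to unevaluated syntax trees inside the vau body.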
We already saw the code for list, which evaluates each of its arguments. Many other operatives have this behavior, so we should abstract out the idea of "evaluate all arguments". The operative wrap takes an operative and returns a transformed version of that operative, which evaluates all of its arguments.
(define wrap
  (vau (operative) oper-env
    (vau args args-env
      (operate args-env
               (eval oper-env operative)
               (operate args-env list args)))))
Now we can implement lambda as an operative that builds a vau expression, evals it, and then wraps the resulting operative.
(define lambda
  (vau (params body) static-env
    (wrap (eval static-env (list vau params '_ body)))))
This works just like Scheme's lambda:
(define fact
  (lambda (n)
    (if (<= n 1)
        1
        (* n (fact (- n 1))))))
Actually, it's incomplete, because Scheme's lambda allows an arbitrary number of expressions in the body. In other words Scheme's
(lambda (x) a b c)
is syntactic sugar for
(lambda (x) (begin a b c))
begin evaluates its arguments in order left to right, and returns the value of the last one. In Scheme it's a special form, because normal argument evaluation happens in an unspecified order. By contrast, the Qoppa interpreter implements a left-to-right order, so we'll define begin as a function.
(define last
  (lambda (xs)
    (if (null? (cdr xs))
        (car xs)
        (last (cdr xs)))))
(define begin
  (lambda xs (last xs)))
Now we can mutate the binding for lambda to support multiple expressions.
(define set!
  (vau (name exp) env
    (set-cdr! (lookup name env)
              (list (eval env exp)))))
(set! lambda
  ((lambda (base-lambda)
     (vau (param . body) env
       (eval env (list base-lambda param (cons begin body)))))
   lambda))
Note the structure
((lambda (base-lambda) ...) lambda)
which holds on to the original lambda operative, in a private frame. That's right, we're using lambda to save lambda so we can overwrite lambda. We use the same approach when defining other sugar, such as the implicit lambda in define.
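The "save the old value in a private frame, then overwrite the name" pattern isn't specific to Qoppa. Here's a small plain-Scheme sketch of the same shape, using a made-up greet function instead of lambda (greet and base-greet are my own illustrative names).

```scheme
;; The original, single-argument version.
(define (greet name)
  (string-append "hello, " name))

;; Overwrite greet with a version that accepts any number of names.
;; The old procedure survives as base-greet, visible only inside
;; the frame created by this immediately applied lambda.
(set! greet
      ((lambda (base-greet)
         (lambda names
           (map base-greet names)))
       greet))

(write (greet "alice" "bob")) (newline)   ; ("hello, alice" "hello, bob")
```

The argument greet is evaluated before set! takes effect, so base-greet holds the original and the new definition never calls itself by accident.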
There are some more bits of Scheme we need to implement: cond, let, map, append, and so forth. These are mostly straightforward; read the code if you want the full story. By far the most troublesome was Scheme's apply function, which takes a function and a list of arguments, and is supposed to apply the function to those arguments. The problem is that our functions are really operatives, and expect to call eval on each of their arguments. If we already have the values in a list, how do we pass them on?
Qoppa and Kernel have very different solutions to this problem. In Kernel, "applicatives" (things that evaluate all their arguments) are a distinct type from operatives. wrap is the primitive constructor of applicatives, and its inverse unwrap is used to implement apply. This design choice simplifies apply but complicates the core evaluator, which needs to distinguish applicatives from operatives.
For Qoppa I implemented wrap as a library function, which we saw before. But then we don't have unwrap, so apply takes the uglier approach of quoting each argument to prevent double-evaluation.
(define apply
  (wrap (vau (operative args) env
    (eval env (cons operative
                    (map (lambda (x) (list quote x)) args))))))
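To see what "double-evaluation" means here, it may help to try the same idea against the host Scheme's eval. This is a sketch under assumptions: naive-apply and quoting-apply are hypothetical names of mine, the function is passed as a symbol (unlike the Qoppa code, which conses in the operative value itself), and I'm assuming a Scheme such as Guile where (interaction-environment) is available.

```scheme
;; Build a call expression straight from the argument values.
;; Each value then gets evaluated a second time, as if it were code.
(define (naive-apply f args)
  (eval (cons f args) (interaction-environment)))

;; Wrap each value in (quote ...) first, so eval hands it over untouched.
(define (quoting-apply f args)
  (eval (cons f (map (lambda (x) (list 'quote x)) args))
        (interaction-environment)))

;; An argument list whose first element happens to look like code.
(define data '((+ 1 2) 10))

(write (naive-apply 'list data)) (newline)     ; (3 10)
(write (quoting-apply 'list data)) (newline)   ; ((+ 1 2) 10)
```

Only the second result matches what Scheme's real apply would give for (apply list data).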
In either Kernel or Qoppa, you're not allowed to use apply on something that doesn't evaluate all of its arguments.
The code we saw above is split into two files:
qoppa.scm is the Qoppa interpreter, written in Scheme
prelude.qop is the Qoppa code which defines wrap, lambda, etc.
I defined a procedure execute-file which reads a file from disk and runs each expression through m-eval. The last line of qoppa.scm uses it to load prelude.qop, so the definitions in prelude.qop are available immediately.
$ guile -l qoppa.scm
guile> (m-eval global-env '(fact 5))
$1 = 120
This establishes that we've implemented the features used by fact, such as lambda. But did we actually implement enough to run the Qoppa interpreter? To test this, we need to go deeper.
guile> (execute-file "qoppa.scm")
$2 = done
guile> (m-eval global-env '(m-eval global-env '(fact 5)))
$3 = 120
This is factorial implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme (implemented in C). Of course it's outrageously slow; on my machine this (fact 5) takes about 5 minutes. But it demonstrates that a tiny language of operatives, augmented with an appropriate library, can provide enough syntactic features to run a non-trivial Scheme program. As for how to do this efficiently, well, I haven't got far enough into the literature to have any idea.