A good programming language will have many libraries building on a small set of core features. Writing and distributing libraries is much easier than dealing with changes to a language implementation. Of course, the choice of core features affects the scope of things we can build as libraries. We want a very small core that still allows us to build anything.
The lambda calculus can implement any computable function, and encode arbitrary data types. Technically, it's all we need to instruct a computer. But programs also need to be written and understood by humans. We fleshy meatbags will soon get lost in a sea of unadorned lambdas. Our languages need to have more structure.
As an example, the Scheme programming language is explicitly based on the lambda calculus. But it adds syntactic special forms for definitions, variable binding, conditionals, etc. Scheme also lets the programmer define new syntactic forms as macros translating to existing syntax. Indeed, lambda and the macro system are enough to implement some of the standard special forms.
But we can do better. There's a simple abstraction which lets us define lambda, Lisp or Scheme macros, and all the other special forms as mere library code. This idea was known as "fexprs" in old Lisps, and more recently as "operatives" in John Shutt's programming language Kernel. Shutt's PhD thesis has been a vital resource for learning about this stuff; I'm slowly making my way through its 416 pages.
What I understand so far can be summarized by something self-contained and kind of cool. Here's the agenda:
I'll describe a tiny programming language named Qoppa. Its S-expression syntax and basic data types are borrowed from Scheme. Qoppa has no special forms, and a small set of built-in operatives.
We'll write a Qoppa interpreter in Scheme.
We'll write a library for Qoppa which implements enough Scheme features to run the Qoppa interpreter.
We'll use this nested interpreter to very slowly compute the factorial of 5.
All of the code is on GitHub, if you'd like to see it in one place.
Operatives in Qoppa
An operative is a first-class value: it can be passed to and from functions, stored in data structures, and so forth. To use an operative, you apply it to some arguments, much like a function. The difference is that
The operative receives its arguments as unevaluated syntax trees, and
The operative also gets an argument representing the variable-binding environment at the call site.
Just as Scheme's functions are constructed by the lambda syntax, Qoppa's operatives are constructed by vau. Here's a simple example:
(define quote
  (vau (x) env
    x))
We bind a single argument as x, and bind the caller's environment as env. (Since we don't use env, we could replace it with _, which means to ignore the argument in that position, like Haskell's _ or Kernel's #ignore.) The body of the vau says to return the argument x, unevaluated.
So this implements Scheme's quote special form. If we evaluate the expression (quote x) we'll get the symbol x. As it happens, quote is used sparingly in Qoppa. There is usually a cleaner alternative, as we'll see.
Here's another operative:
(define list (vau xs env
  (if (null? xs)
      (quote ())
      (cons
        (eval env (car xs))                   ; evaluate the first operand
        (eval env (cons list (cdr xs)))))))   ; recurse on the remaining operands
This list operative does the same thing as Scheme's list function: it evaluates any number of arguments and returns them in a list. So (list (+ 2 2) 3) evaluates to the list (4 3).
In Scheme, list is just (lambda xs xs). In Qoppa it's more involved, because we must explicitly evaluate each argument. This is the hallmark of (meta)programming with operatives: we selectively evaluate using eval, rather than selectively suppressing evaluation using quote.
The last part of this code deserves closer scrutiny:
(eval env (cons list (cdr xs)))
What if the caller's environment env contains a local binding for the name list? Not to worry, because we aren't quoting the name list. We're building a cons pair whose car is the value of list... an operative! Supposing xs is (1 2 3), the expression
, the expression
(cons list (cdr xs))
evaluates to the list
(<some-value-representing-an-operative> 2 3)
and that's what eval sees. Just like lambda, evaluating a vau expression captures the current environment. When the resulting operative is used, the vau body gets values from this captured static environment, not from the dynamic environment passed in by the caller. So we have lexical scoping by default, with the option of dynamic scoping thanks to that env parameter.
Compare this situation with Lisp or Scheme macros. Lisp macros build code which refers to external stuff by name. Maintaining macro hygiene requires constant attention by the programmer. Scheme's macros are hygienic by default, but the macro system is far more complex. Rather than writing ordinary functions, we have to use one of several special-purpose sublanguages. Operatives provide the safety of Scheme macros, but (like Lisp macros) they use only the core computational features of the language.
Implementing Qoppa
Now that you have a taste of what the language is like, let's write a Qoppa interpreter in Scheme.
We will represent an environment as a list of frames, where a frame is simply an association list. Within the vau body in
((vau (x) _ x) 3)
the current environment would be something like
( ;; local frame
  ((x 3))
  ;; global frame
  ((cons <operative>)
   (car <operative>)
   ...) )
Here's a Scheme function to build a frame from some names and the corresponding values.
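(define (bind param val)
  (cond
    ((and (null? param) (null? val))   ; both trees exhausted together
     '())
    ((eq? param '_)                     ; _ ignores the value in this position
     '())
    ((symbol? param)                    ; a name binds the whole value subtree
     (list (list param val)))
    ((and (pair? param) (pair? val))    ; walk both trees in parallel
     (append
       (bind (car param) (car val))
       (bind (cdr param) (cdr val))))
    (else
     (error "can't bind" param val))))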
We allow names and values to be arbitrary trees, so for example
(bind
  '((a b) . c)
  '((1 2) 3 4))
evaluates to
((a 1)
 (b 2)
 (c (3 4)))
(If you'll recall, (x . y) is the pair formed by (cons 'x 'y), an improper list.) The generality of bind means our argument-binding syntax, in vau, lambda, let, etc., will be richer than Scheme's.
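Here is a small runnable Scheme sketch of bind in action. It repeats bind from above (with comments) so the snippet stands on its own; the parameter names first, rest, a, b, and c are only illustrative. It shows _ discarding a position, a dotted tail collecting the remaining values, and nested destructuring, which Scheme's own lambda lists do not support.

```scheme
;; bind, as defined above, with comments added.
(define (bind param val)
  (cond
    ((and (null? param) (null? val)) '())   ; both trees exhausted together
    ((eq? param '_) '())                     ; ignore this position
    ((symbol? param) (list (list param val))); a name captures the whole subtree
    ((and (pair? param) (pair? val))         ; walk both trees in parallel
     (append (bind (car param) (car val))
             (bind (cdr param) (cdr val))))
    (else (error "can't bind" param val))))

;; An ignored slot plus a dotted "rest" parameter:
(display (bind '(first _ . rest) '(1 2 3 4)))   ; prints ((first 1) (rest (3 4)))
(newline)

;; Nested destructuring:
(display (bind '((a b) c) '((1 2) 3)))          ; prints ((a 1) (b 2) (c 3))
(newline)
```

A shape mismatch, such as binding (a b) against (1), falls through to the else clause and signals an error.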
Next, a function to find a (name value) entry, given the name and an environment. This just invokes assq on each frame until we find a match.
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding
            binding
            (m-lookup name (cdr env))))))
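A quick check of the search order, as a self-contained Scheme sketch with made-up frame contents. Note that m-lookup returns the whole (name value) entry rather than just the value: m-eval takes its cadr, while set! (defined later in Qoppa) needs the entry itself so it can mutate it in place.

```scheme
;; m-lookup, as defined above.
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

;; A two-frame environment: the local frame shadows the global x.
(define env
  (list (list (list 'x 3))                 ; local frame
        (list (list 'x 1) (list 'y 2))))   ; global frame

(display (m-lookup 'x env))         ; prints (x 3): found in the local frame
(newline)
(display (m-lookup 'y env))         ; prints (y 2): falls through to the global frame
(newline)
(display (cadr (m-lookup 'y env)))  ; prints 2: the value, as m-eval extracts it
(newline)
```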
We also need a representation for operatives. A simple choice is that a Qoppa operative is represented by a Scheme procedure that takes the current environment and the operands as arguments. Now we can write the Qoppa evaluator itself.
(define (m-eval env exp)
  (cond
    ((symbol? exp)                 ; variable reference
     (cadr (m-lookup exp env)))
    ((pair? exp)                   ; evaluate the car, pass the cdr unevaluated
     (m-operate env (m-eval env (car exp)) (cdr exp)))
    (else                          ; everything else evaluates to itself
     exp)))
(define (m-operate env operative operands)
  (operative env operands))
The evaluator has only three cases. If exp is a symbol, it refers to a value in the current environment. If it's a cons pair, the car must evaluate to an operative and the cdr holds operands. Anything else evaluates to itself: numbers, strings, Booleans, and Qoppa operatives (represented by Scheme procedures).
Instead of the traditional eval and apply we have "eval" and "operate". Thanks to our uniform representation of operatives, the latter is very simple.
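Since operatives are plain Scheme procedures of two arguments, we can exercise m-eval with a hand-written one before any builtins exist. This self-contained sketch repeats m-lookup, m-eval, and m-operate; quote-operative is a hypothetical name. It shows that operands arrive as unevaluated syntax, and revisits the list example from earlier: a procedure value in the car of a form is used directly, so shadowing the name quote has no effect on it.

```scheme
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

(define (m-eval env exp)
  (cond ((symbol? exp) (cadr (m-lookup exp env)))
        ((pair? exp) (m-operate env (m-eval env (car exp)) (cdr exp)))
        (else exp)))

(define (m-operate env operative operands)
  (operative env operands))

;; Hypothetical hand-written operative: return the first operand as raw syntax.
(define (quote-operative env operands)
  (car operands))

(define env
  (list (list (list 'quote quote-operative)
              (list 'x 42))))

(display (m-eval env 'x))          ; prints 42
(newline)
(display (m-eval env '(quote x)))  ; prints x: the operand was never evaluated
(newline)

;; Rebind the name quote locally. A form whose car is the operative value
;; itself is unaffected, because procedures evaluate to themselves.
(define shadowed-env
  (cons (list (list 'quote 'not-an-operative)) env))
(display (m-eval shadowed-env (list quote-operative 'x)))  ; prints x
(newline)
```

Evaluating '(quote x) in shadowed-env, by contrast, would try to call the symbol not-an-operative and fail.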
Qoppa builtins
Now we need to populate the global environment with useful built-in operatives. vau is the most significant of these. Here is its corresponding Scheme procedure.
(define (m-vau static-env vau-operands)
  (let ((params    (car vau-operands))
        (env-param (cadr vau-operands))
        (body      (caddr vau-operands)))
    (lambda (dynamic-env operands)
      (m-eval
        (cons
          ;; new frame: the caller's environment and the operands, by name
          (bind
            (cons env-param params)
            (cons dynamic-env operands))
          ;; pushed onto the environment where vau was evaluated
          static-env)
        body))))
When applying vau, you provide a parameter tree, a name for the caller's environment, and a body. The result of applying vau is an operative which, when applied, evaluates that body. It does so in the environment captured by vau, extended with a new frame that binds the parameter tree to the operands and the environment name to the caller's environment.
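Putting bind, m-lookup, m-eval, m-operate, and m-vau together gives a runnable core. This self-contained sketch installs only vau in the global frame (a deliberately minimal environment, not the full global frame) and checks three behaviors: the body returns its operand, operands are not evaluated, and the environment parameter receives the caller's environment.

```scheme
;; Condensed copies of the definitions above.
(define (bind param val)
  (cond ((and (null? param) (null? val)) '())
        ((eq? param '_) '())
        ((symbol? param) (list (list param val)))
        ((and (pair? param) (pair? val))
         (append (bind (car param) (car val))
                 (bind (cdr param) (cdr val))))
        (else (error "can't bind" param val))))

(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

(define (m-eval env exp)
  (cond ((symbol? exp) (cadr (m-lookup exp env)))
        ((pair? exp) (m-operate env (m-eval env (car exp)) (cdr exp)))
        (else exp)))

(define (m-operate env operative operands)
  (operative env operands))

(define (m-vau static-env vau-operands)
  (let ((params (car vau-operands))
        (env-param (cadr vau-operands))
        (body (caddr vau-operands)))
    (lambda (dynamic-env operands)
      (m-eval (cons (bind (cons env-param params)
                          (cons dynamic-env operands))
                    static-env)
              body))))

;; A minimal global environment containing only vau.
(define global-env (list (list (list 'vau m-vau))))

(display (m-eval global-env '((vau (x) _ x) 3)))   ; prints 3
(newline)
(display (m-eval global-env '((vau (x) _ x) y)))   ; prints y: never looked up
(newline)
(display (eq? (m-eval global-env '((vau () e e))) global-env))  ; prints #t
(newline)
```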
Here's the global environment:
(define (make-global-frame)
  (define (wrap-primitive fun)
    (lambda (env operands)
      (apply fun (map (lambda (exp) (m-eval env exp)) operands))))
  (list
    (list 'vau     m-vau)
    (list 'eval    (wrap-primitive m-eval))
    (list 'operate (wrap-primitive m-operate))
    (list 'lookup  (wrap-primitive m-lookup))
    (list 'bool    (wrap-primitive (lambda (b t f) (if b t f))))
    (list 'eq?     (wrap-primitive eq?))
    ; more like these
    ))
(define global-env (list (make-global-frame)))
Other than vau, each built-in operative evaluates all of its arguments. That's what wrap-primitive accomplishes. We can think of these as functions, whereas vau is something more exotic.
We expose the interpreter's m-eval and m-operate, which are essential for building new features as library code. We could implement lookup as library code; providing it here just prevents some code duplication.
The other functions inherited from Scheme are:
Type predicates: null?, symbol?, pair?
Pairs: cons, car, cdr, set-car!, set-cdr!
Arithmetic: +, *, -, /, <=, =
I/O: error, display, open-input-file, read, eof-object
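Extending the core with wrap-primitive gives a small usable language. The sketch below is self-contained; its global frame is a hypothetical subset of the real one (just vau, eval, bool, +, *, and <=). The last example is the characteristic operative idiom: a vau that receives raw syntax and chooses to eval it in the caller's environment.

```scheme
(define (bind param val)
  (cond ((and (null? param) (null? val)) '())
        ((eq? param '_) '())
        ((symbol? param) (list (list param val)))
        ((and (pair? param) (pair? val))
         (append (bind (car param) (car val))
                 (bind (cdr param) (cdr val))))
        (else (error "can't bind" param val))))

(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

(define (m-eval env exp)
  (cond ((symbol? exp) (cadr (m-lookup exp env)))
        ((pair? exp) (m-operate env (m-eval env (car exp)) (cdr exp)))
        (else exp)))

(define (m-operate env operative operands)
  (operative env operands))

(define (m-vau static-env vau-operands)
  (let ((params (car vau-operands))
        (env-param (cadr vau-operands))
        (body (caddr vau-operands)))
    (lambda (dynamic-env operands)
      (m-eval (cons (bind (cons env-param params)
                          (cons dynamic-env operands))
                    static-env)
              body))))

(define (make-global-frame)
  (define (wrap-primitive fun)
    ;; Evaluate every operand in the caller's environment, then apply fun.
    (lambda (env operands)
      (apply fun (map (lambda (exp) (m-eval env exp)) operands))))
  (list
    (list 'vau m-vau)
    (list 'eval (wrap-primitive m-eval))
    (list 'bool (wrap-primitive (lambda (b t f) (if b t f))))
    (list '+ (wrap-primitive +))
    (list '* (wrap-primitive *))
    (list '<= (wrap-primitive <=))))

(define global-env (list (make-global-frame)))

(display (m-eval global-env '(+ 1 (* 2 3))))                     ; prints 7
(newline)
(display (m-eval global-env '(bool (<= 1 2) 10 20)))             ; prints 10
(newline)
(display (m-eval global-env '((vau (x) _ x) (+ 1 2))))           ; prints (+ 1 2)
(newline)
(display (m-eval global-env '((vau (x) e (eval e x)) (+ 1 2))))  ; prints 3
(newline)
```

Because bool is built with wrap-primitive, something like (bool #t 1 undefined-name) signals "could not find" even though the false branch is never chosen.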
Scheme as a Qoppa library
The Qoppa interpreter uses Scheme syntax like lambda, define, let, if, etc. Qoppa itself supports none of this; all we get is vau and some basic data types. But this is enough to build a Qoppa library which provides all the Scheme features we used in the interpreter. This code starts out very cryptic, and becomes easier to read as we have more high-level features available. You can read through the full library if you like. This section will go over some of the more interesting parts.
Our first task is a bit of a puzzle: how do you define define? It's only possible because we expose the interpreter's representation of environments. We can push a new binding onto the top frame of env, like so:
(set-car! env
  (cons
    (cons <name> (cons <value> null))
    (car env)))
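The same mutation can be written directly in Scheme. In this self-contained sketch, push-binding! is a hypothetical helper, not part of Qoppa. Because environments are shared lists, the new entry is visible through every reference to the same environment, and pushing onto a local environment (a fresh frame consed onto an outer one, as m-vau builds) touches only the local frame.

```scheme
;; Mirrors (set-car! env (cons (cons name (cons value null)) (car env))).
(define (push-binding! env name value)
  (set-car! env (cons (list name value) (car env))))

;; Built with list, not a quoted literal, so the structure is mutable.
(define global-env (list (list (list 'x 1))))
(define alias global-env)              ; another reference to the same list

(push-binding! global-env 'y 2)
(display (car alias))                  ; prints ((y 2) (x 1))
(newline)

;; A local environment: a new frame on top of the global one.
(define local-env (cons (list (list 'z 3)) global-env))
(push-binding! local-env 'w 4)
(display (car local-env))              ; prints ((w 4) (z 3))
(newline)
(display (assq 'w (car global-env)))   ; prints #f: the global frame is untouched
(newline)
```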
We use this idea twice, once inside the vau body for define, and once to define define itself.
((vau (name-of-define null) env
   (set-car! env (cons
     (cons name-of-define
       (cons (vau (name exp) defn-env
               (set-car! defn-env (cons
                 (cons name (cons (eval defn-env exp) null))
                 (car defn-env))))
             null))
     (car env))))
 define ())
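The outer vau receives its operands unevaluated, so name-of-define is bound to the symbol define and null to the empty list.
Next we'll define Scheme's if, which evaluates one branch or the other. We do this in terms of the Qoppa builtin bool, which always evaluates both branches.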
(define if (vau (b t f) env
  (eval env
    (bool (eval env b) t f))))
We already saw the code for list, which evaluates each of its arguments. Many other operatives have this behavior, so we should abstract out the idea of "evaluate all arguments". The operative wrap takes an operative and returns a transformed version of that operative, which evaluates all of its arguments.
(define wrap (vau (operative) oper-env
  (vau args args-env
    (operate args-env
      (eval oper-env operative)
      (operate args-env list args)))))
Now we can implement lambda as an operative that builds a vau term, evals it, and then wraps the resulting operative.
(define lambda (vau (params body) static-env
  (wrap
    (eval static-env
      (list vau params '_ body)))))
This works just like Scheme's lambda:
(define fact (lambda (n)
  (if (<= n 1)
      1
      (* n (fact (- n 1))))))
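At this point there is enough to run fact. The following self-contained Scheme sketch bundles a condensed copy of the interpreter with a reduced global frame (only the primitives these definitions use) and then feeds it the Qoppa definitions from this article, including the define bootstrap, as quoted data. It is a sketch for experimenting, not the actual qoppa.scm and prelude.qop.

```scheme
(define (bind param val)
  (cond ((and (null? param) (null? val)) '())
        ((eq? param '_) '())
        ((symbol? param) (list (list param val)))
        ((and (pair? param) (pair? val))
         (append (bind (car param) (car val)) (bind (cdr param) (cdr val))))
        (else (error "can't bind" param val))))
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))
(define (m-eval env exp)
  (cond ((symbol? exp) (cadr (m-lookup exp env)))
        ((pair? exp) (m-operate env (m-eval env (car exp)) (cdr exp)))
        (else exp)))
(define (m-operate env operative operands) (operative env operands))
(define (m-vau static-env vau-operands)
  (let ((params (car vau-operands))
        (env-param (cadr vau-operands))
        (body (caddr vau-operands)))
    (lambda (dynamic-env operands)
      (m-eval (cons (bind (cons env-param params) (cons dynamic-env operands))
                    static-env)
              body))))
(define (wrap-primitive fun)
  (lambda (env operands)
    (apply fun (map (lambda (exp) (m-eval env exp)) operands))))

;; Reduced global frame: only what the Qoppa code below needs.
(define global-env
  (list (list (list 'vau m-vau)
              (list 'eval (wrap-primitive m-eval))
              (list 'operate (wrap-primitive m-operate))
              (list 'bool (wrap-primitive (lambda (b t f) (if b t f))))
              (list 'null? (wrap-primitive null?))
              (list 'cons (wrap-primitive cons))
              (list 'car (wrap-primitive car))
              (list 'cdr (wrap-primitive cdr))
              (list 'set-car! (wrap-primitive set-car!))
              (list '+ (wrap-primitive +))
              (list '- (wrap-primitive -))
              (list '* (wrap-primitive *))
              (list '<= (wrap-primitive <=)))))

;; Qoppa library code from this article, as data.
(define prelude
  '(((vau (name-of-define null) env
       (set-car! env (cons
         (cons name-of-define
           (cons (vau (name exp) defn-env
                   (set-car! defn-env (cons
                     (cons name (cons (eval defn-env exp) null))
                     (car defn-env))))
                 null))
         (car env))))
     define ())
    (define quote (vau (x) _ x))
    (define if (vau (b t f) env
      (eval env (bool (eval env b) t f))))
    (define list (vau xs env
      (if (null? xs)
          (quote ())
          (cons (eval env (car xs))
                (eval env (cons list (cdr xs)))))))
    (define wrap (vau (operative) oper-env
      (vau args args-env
        (operate args-env
                 (eval oper-env operative)
                 (operate args-env list args)))))
    (define lambda (vau (params body) static-env
      (wrap (eval static-env (list vau params '_ body)))))
    (define fact (lambda (n)
      (if (<= n 1)
          1
          (* n (fact (- n 1))))))))

;; Run each library form in the global environment, in order.
(for-each (lambda (form) (m-eval global-env form)) prelude)

(display (m-eval global-env '(list (+ 2 2) 3)))  ; prints (4 3)
(newline)
(display (m-eval global-env '(fact 5)))          ; prints 120
(newline)
```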
Actually, this lambda is incomplete, because Scheme's lambda allows an arbitrary number of expressions in the body. In other words, Scheme's
(lambda (x) a b c)
is syntactic sugar for
(lambda (x) (begin a b c))
begin evaluates its arguments in order from left to right, and returns the value of the last one. In Scheme it's a special form, because normal argument evaluation happens in an unspecified order. By contrast, the Qoppa interpreter implements a left-to-right order, so we'll define begin as a function.
(define last (lambda (xs)
  (if (null? (cdr xs))
      (car xs)
      (last (cdr xs)))))
(define begin (lambda xs (last xs)))
Now we can mutate the binding for lambda to support multiple expressions.
(define set! (vau (name exp) env
  (set-cdr!
    (lookup name env)
    (list (eval env exp)))))
(set! lambda
  ((lambda (base-lambda)
     (vau (param . body) env
       (eval env (list base-lambda param (cons begin body)))))
   lambda))
Note the structure
((lambda (base-lambda) ...) lambda)
which holds on to the original lambda operative in a private frame. That's right, we're using lambda to save lambda so we can overwrite lambda. We use the same approach when defining other sugar, such as the implicit lambda in define.
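The mechanics of set! and of the save-then-overwrite trick can be tried directly in Scheme. In this self-contained sketch, set-binding! is a hypothetical Scheme transcription of the Qoppa set! body, and the let plays the role of the private base-lambda frame; the doubling function stands in for the original lambda.

```scheme
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

;; Transcription of Qoppa's set! body: replace the value slot of the
;; (name value) entry that lookup returns.
(define (set-binding! name value env)
  (set-cdr! (m-lookup name env) (list value)))

(define env (list (list (list 'f (lambda (x) (* x 2))))))

;; Save the old value in a private binding, then overwrite the public
;; one with a wrapper that still calls the saved original.
(set-binding! 'f
              (let ((base-f (cadr (m-lookup 'f env))))
                (lambda (x) (+ 1 (base-f x))))
              env)

(display ((cadr (m-lookup 'f env)) 5))   ; prints 11, i.e. (+ 1 (* 5 2))
(newline)
```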
There are some more bits of Scheme we need to implement: cond, let, map, append, and so forth. These are mostly straightforward; read the code if you want the full story. By far the most troublesome was Scheme's apply function, which takes a function and a list of arguments, and is supposed to apply the function to those arguments. The problem is that our functions are really operatives, and expect to call eval on each of their arguments. If we already have the values in a list, how do we pass them on?
Qoppa and Kernel have very different solutions to this problem. In Kernel, "applicatives" (things that evaluate all their arguments) are a distinct type from operatives. wrap is the primitive constructor of applicatives, and its inverse unwrap is used to implement apply. This design choice simplifies apply but complicates the core evaluator, which needs to distinguish applicatives from operatives.
For Qoppa I implemented wrap as a library function, which we saw before. But then we don't have unwrap. So apply takes the uglier approach of quoting each argument to prevent double evaluation.
(define apply (wrap (vau (operative args) env
  (eval env (cons
    operative
    (map (lambda (x) (list quote x)) args))))))
In either Kernel or Qoppa, you're not allowed to apply apply to something that doesn't evaluate all of its arguments.
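The double-evaluation problem can be shown with the bare evaluator. In this self-contained Scheme sketch (naive-apply, quoting-apply, quote-operative, and list-operative are hypothetical names), the operative evaluates all of its operands, so splicing already-computed values straight into a form evaluates them a second time; wrapping each value in a quote form makes that second evaluation hand the value back unchanged.

```scheme
(define (m-lookup name env)
  (if (null? env)
      (error "could not find" name)
      (let ((binding (assq name (car env))))
        (if binding binding (m-lookup name (cdr env))))))

(define (m-eval env exp)
  (cond ((symbol? exp) (cadr (m-lookup exp env)))
        ((pair? exp) (m-operate env (m-eval env (car exp)) (cdr exp)))
        (else exp)))

(define (m-operate env operative operands)
  (operative env operands))

(define (wrap-primitive fun)
  (lambda (env operands)
    (apply fun (map (lambda (exp) (m-eval env exp)) operands))))

(define (quote-operative env operands) (car operands))
(define list-operative (wrap-primitive list))   ; evaluates all its operands

(define env (list (list (list 'x 'value-of-x))))

;; Splice the values in directly: each one is evaluated again.
(define (naive-apply env operative args)
  (m-eval env (cons operative args)))

;; Quote each value first, as Qoppa's apply does.
(define (quoting-apply env operative args)
  (m-eval env (cons operative
                    (map (lambda (v) (list quote-operative v)) args))))

(display (naive-apply env list-operative '(x 1)))    ; prints (value-of-x 1)
(newline)
(display (quoting-apply env list-operative '(x 1)))  ; prints (x 1)
(newline)
```

Passing quote-operative itself to quoting-apply would hand back the unevaluated (quote-operative x) form instead of x, which is the Scheme-level face of the rule above.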
Testing
The code we saw above is split into two files:
qoppa.scm is the Qoppa interpreter, written in Scheme.
prelude.qop is the Qoppa code which defines wrap, lambda, etc.
I defined a procedure execute-file which reads a file from disk and runs each expression through m-eval. The last line of qoppa.scm is
(execute-file "prelude.qop")
so the definitions in prelude.qop are available immediately.
We start by loading qoppa.scm into a Scheme interpreter. I'm using Guile here, but I've actually tested this with a variety of R5RS implementations.
$ guile -l qoppa.scm
guile> (m-eval global-env '(fact 5))
$1 = 120
This establishes that we've implemented the features used by fact, such as define and lambda. But did we actually implement enough to run the Qoppa interpreter? To test this, we need to go deeper.
guile> (execute-file "qoppa.scm")
$2 = done
guile> (m-eval global-env '(m-eval global-env '(fact 5)))
$3 = 120
This is factorial implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme (implemented in C). Of course it's outrageously slow; on my machine this (fact 5)
takes about 5 minutes. But it demonstrates that a tiny language of operatives, augmented with an appropriate library, can provide enough syntactic features to run a non-trivial Scheme program. As for how to do this efficiently, well, I haven't got far enough into the literature to have any idea.