Saturday, April 28, 2012

Scheme without special forms: a metacircular adventure

A good programming language will have many libraries building on a small set of core features. Writing and distributing libraries is much easier than dealing with changes to a language implementation. Of course, the choice of core features affects the scope of things we can build as libraries. We want a very small core that still allows us to build anything.

The lambda calculus can implement any computable function, and encode arbitrary data types. Technically, it's all we need to instruct a computer. But programs also need to be written and understood by humans. We fleshy meatbags will soon get lost in a sea of unadorned lambdas. Our languages need to have more structure.

As an example, the Scheme programming language is explicitly based on the lambda calculus. But it adds syntactic special forms for definitions, variable binding, conditionals, etc. Scheme also lets the programmer define new syntactic forms as macros translating to existing syntax. Indeed, lambda and the macro system are enough to implement some of the standard special forms.

But we can do better. There's a simple abstraction which lets us define lambda, Lisp or Scheme macros, and all the other special forms as mere library code. This idea was known as "fexprs" in old Lisps, and more recently as "operatives" in John Shutt's programming language Kernel. Shutt's PhD thesis [PDF] has been a vital resource for learning about this stuff; I'm slowly making my way through its 416 pages.

What I understand so far can be summarized by something self-contained and kind of cool. Here's the agenda:

  • I'll describe a tiny programming language named Qoppa. Its S-expression syntax and basic data types are borrowed from Scheme. Qoppa has no special forms, and a small set of built-in operatives.

  • We'll write a Qoppa interpreter in Scheme.

  • We'll write a library for Qoppa which implements enough Scheme features to run the Qoppa interpreter.

  • We'll use this nested interpreter to very slowly compute the factorial of 5.

All of the code is on GitHub, if you'd like to see it in one place.

Operatives in Qoppa

An operative is a first-class value: it can be passed to and from functions, stored in data structures, and so forth. To use an operative, you apply it to some arguments, much like a function. The difference is that

  1. The operative receives its arguments as unevaluated syntax trees, and

  2. The operative also gets an argument representing the variable-binding environment at the call site.

Just as Scheme's functions are constructed by the lambda syntax, Qoppa's operatives are constructed by vau. Here's a simple example:

(define quote
    (vau (x) env
        x))

We bind a single argument as x, and bind the caller's environment as env. (Since we don't use env, we could replace it with _, which means to ignore the argument in that position, like Haskell's _ or Kernel's #ignore.) The body of the vau says to return the argument x, unevaluated.

So this implements Scheme's quote special form. If we evaluate the expression (quote x) we'll get the symbol x. As it happens, quote is used sparingly in Qoppa. There is usually a cleaner alternative, as we'll see.

Here's another operative:

(define list (vau xs env
    (if (null? xs)
        (quote ())
        (cons
            (eval env (car xs))
            (eval env (cons list (cdr xs)))))))

This list operative does the same thing as Scheme's list function: it evaluates any number of arguments and returns them in a list. So (list (+ 2 2) 3) evaluates to the list (4 3).

In Scheme, list is just (lambda xs xs). In Qoppa it's more involved, because we must explicitly evaluate each argument. This is the hallmark of (meta)programming with operatives: we selectively evaluate using eval, rather than selectively suppressing evaluation using quote.

The last part of this code deserves closer scrutiny:

(eval env (cons list (cdr xs)))

What if the caller's environment env contains a local binding for the name list? Not to worry, because we aren't quoting the name list. We're building a cons pair whose car is the value of list... an operative! Supposing xs is (1 2 3), the expression

(cons list (cdr xs))

evaluates to the list

(<some-value-representing-an-operative> 2 3)

and that's what eval sees. Just like lambda, evaluating a vau expression captures the current environment. When the resulting operative is used, the vau body gets values from this captured static environment, not the dynamic environment of the caller. So we have lexical scoping by default, with the option of dynamic scoping thanks to that env parameter.

Compare this situation with Lisp or Scheme macros. Lisp macros build code which refers to external stuff by name. Maintaining macro hygiene requires constant attention by the programmer. Scheme's macros are hygienic by default, but the macro system is far more complex. Rather than writing ordinary functions, we have to use one of several special-purpose sublanguages. Operatives provide the safety of Scheme macros, but (like Lisp macros) they use only the core computational features of the language.

Implementing Qoppa

Now that you have a taste of what the language is like, let's write a Qoppa interpreter in Scheme.

We will represent an environment as a list of frames, where a frame is simply an association list. Within the vau body in

( (vau (x) _ x) 3 )

the current environment would be something like

( ;; local frame
  ((x 3))

  ;; global frame
  ((cons <operative>)
   (car  <operative>)
   ...) )

Here's a Scheme function to build a frame from some names and the corresponding values.

(define (bind param val) (cond
    ((and (null? param) (null? val))
        '())
    ((eq? param '_)
        '())
    ((symbol? param)
        (list (list param val)))
    ((and (pair? param) (pair? val))
        (append
            (bind (car param) (car val))
            (bind (cdr param) (cdr val))))
    (else
        (error "can't bind" param val))))

We allow names and values to be arbitrary trees, so for example

(bind
    '((a b) . c)
    '((1 2) 3 4))

evaluates to

((a 1)
 (b 2)
 (c (3 4)))

(If you'll recall, (x . y) is the pair formed by (cons 'x 'y), an improper list.) The generality of bind means our argument-binding syntax — in vau, lambda, let, etc. — will be richer than Scheme's.
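As a rough cross-check of the tree matching, here is a sketch of the same idea in Python 3. It is not part of the Qoppa code: it assumes pairs are modelled as 2-tuples, the empty list as None, and symbols as Python strings, and the helper cons_list is made up for this illustration.

```python
NIL = None  # stands in for Scheme's empty list (an assumption of this sketch)

def cons_list(*items, tail=NIL):
    """Build nested (car, cdr) tuples; a non-NIL tail gives an improper list."""
    result = tail
    for item in reversed(items):
        result = (item, result)
    return result

def bind(param, val):
    """Match a parameter tree against a value tree, returning (name, value) entries."""
    if param is NIL and val is NIL:
        return []
    if param == '_':                      # ignore whatever is in this position
        return []
    if isinstance(param, str):            # a symbol binds the whole value
        return [(param, val)]
    if isinstance(param, tuple) and isinstance(val, tuple):
        return bind(param[0], val[0]) + bind(param[1], val[1])
    raise ValueError("can't bind %r to %r" % (param, val))

params = cons_list(cons_list('a', 'b'), tail='c')   # ((a b) . c)
values = cons_list(cons_list(1, 2), 3, 4)           # ((1 2) 3 4)
frame = bind(params, values)
print(frame)   # [('a', 1), ('b', 2), ('c', (3, (4, None)))]
```

The last entry shows c bound to the remaining list (3 4), written here in the tuple encoding.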

Next, a function to find a (name value) entry, given the name and an environment. This just invokes assq on each frame until we find a match.

(define (m-lookup name env)
    (if (null? env)
        (error "could not find" name)
        (let ((binding (assq name (car env))))
            (if binding
                binding
                (m-lookup name (cdr env))))))

We also need a representation for operatives. A simple choice is that a Qoppa operative is represented by a Scheme procedure that takes the operands and current environment as arguments. Now we can write the Qoppa evaluator itself.

(define (m-eval env exp) (cond
    ((symbol? exp)
        (cadr (m-lookup exp env)))
    ((pair? exp)
        (m-operate env (m-eval env (car exp)) (cdr exp)))
    (else
        exp)))

(define (m-operate env operative operands)
    (operative env operands))

The evaluator has only three cases. If exp is a symbol, it refers to a value in the current environment. If it's a cons pair, the car must evaluate to an operative and the cdr holds operands. Anything else evaluates to itself: numbers, strings, Booleans, and Qoppa operatives (represented by Scheme procedures).

Instead of the traditional eval and apply we have "eval" and "operate". Thanks to our uniform representation of operatives, the latter is very simple.

Qoppa builtins

Now we need to populate the global environment with useful built-in operatives. vau is the most significant of these. Here is its corresponding Scheme procedure.

(define (m-vau static-env vau-operands)
    (let ((params    (car   vau-operands))
          (env-param (cadr  vau-operands))
          (body      (caddr vau-operands)))

        (lambda (dynamic-env operands)
            (m-eval
                (cons
                    (bind
                        (cons env-param   params)
                        (cons dynamic-env operands))
                    static-env)
                body))))

When applying vau, you provide a parameter tree, a name for the caller's environment, and a body. The result of applying vau is an operative which, when applied, evaluates that body. It does so in the environment captured by vau, extended with arguments.

Here's the global environment:

(define (make-global-frame)
    (define (wrap-primitive fun)
        (lambda (env operands)
            (apply fun (map (lambda (exp) (m-eval env exp)) operands))))
    (list
        (list 'vau m-vau)
        (list 'eval    (wrap-primitive m-eval))
        (list 'operate (wrap-primitive m-operate))
        (list 'lookup  (wrap-primitive m-lookup))
        (list 'bool    (wrap-primitive (lambda (b t f) (if b t f))))
        (list 'eq?     (wrap-primitive eq?))
        ; more like these
        ))

(define global-env (list (make-global-frame)))

Other than vau, each built-in operative evaluates all of its arguments. That's what wrap-primitive accomplishes. We can think of these as functions, whereas vau is something more exotic.
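To make the split between "functions" and raw operatives concrete, here is a small Python 3 sketch of the same evaluator shape. It is an illustration under several assumptions rather than a port of qoppa.scm: symbols are Python strings (so there are no string literals), expressions are Python lists, frames are dicts instead of association lists, and op_quote is a hypothetical host-language operative standing in for something like vau.

```python
import operator

def lookup(name, env):
    """Search each frame, innermost first, for a binding of name."""
    for frame in env:
        if name in frame:
            return frame[name]
    raise NameError("could not find %s" % name)

def m_eval(env, exp):
    if isinstance(exp, str):               # symbol: look up its value
        return lookup(exp, env)
    if isinstance(exp, list) and exp:      # combination: head must be an operative
        operative = m_eval(env, exp[0])
        return operative(env, exp[1:])     # operands are passed unevaluated
    return exp                             # anything else evaluates to itself

def wrap_primitive(fun):
    """Turn a host function into an operative that evaluates every operand."""
    return lambda env, operands: fun(*[m_eval(env, e) for e in operands])

def op_quote(env, operands):
    """An operative that ignores env and returns its operand as syntax."""
    return operands[0]

global_env = [{
    'quote': op_quote,
    '+': wrap_primitive(operator.add),
    '*': wrap_primitive(operator.mul),
}]

print(m_eval(global_env, ['+', 2, ['*', 3, 4]]))     # 14
print(m_eval(global_env, ['quote', ['+', 2, 2]]))    # ['+', 2, 2]
```

The only difference between + and quote here is whether the operative chooses to call m_eval on its operands.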

We expose the interpreter's m-eval and m-operate, which are essential for building new features as library code. We could implement lookup as library code; providing it here just prevents some code duplication.

The other functions inherited from Scheme are:

  • Type predicates: null? symbol? pair?

  • Pairs: cons car cdr set-car! set-cdr!

  • Arithmetic: + * - / <= =

  • I/O: error display open-input-file read eof-object

Scheme as a Qoppa library

The Qoppa interpreter uses Scheme syntax like lambda, define, let, if, etc. Qoppa itself supports none of this; all we get is vau and some basic data types. But this is enough to build a Qoppa library which provides all the Scheme features we used in the interpreter. This code starts out very cryptic, and becomes easier to read as we have more high-level features available. You can read through the full library if you like. This section will go over some of the more interesting parts.

Our first task is a bit of a puzzle: how do you define define? It's only possible because we expose the interpreter's representation of environments. We can push a new binding onto the top frame of env, like so:

(set-car! env
    (cons
        (cons <name> (cons <value> null))
        (car env)))

We use this idea twice, once inside the vau body for define, and once to define define itself.

((vau (name-of-define null) env
    (set-car! env (cons
        (cons name-of-define
            (cons (vau (name exp) defn-env
                    (set-car! defn-env (cons
                        (cons name (cons (eval defn-env exp) null))
                        (car defn-env))))
                null))
        (car env))))
    define ())

Next we'll define Scheme's if, which evaluates one branch or the other. We do this in terms of the Qoppa builtin bool, which always evaluates both branches.

(define if (vau (b t f) env
    (eval env
        (bool (eval env b) t f))))

We already saw the code for list, which evaluates each of its arguments. Many other operatives have this behavior, so we should abstract out the idea of "evaluate all arguments". The operative wrap takes an operative and returns a transformed version of that operative, which evaluates all of its arguments.

(define wrap (vau (operative) oper-env
    (vau args args-env
        (operate args-env
            (eval    oper-env operative)
            (operate args-env list args)))))

Now we can implement lambda as an operative that builds a vau term, evals it, and then wraps the resulting operative.

(define lambda (vau (params body) static-env
    (wrap
        (eval static-env
            (list vau params '_ body)))))

This works just like Scheme's lambda:

(define fact (lambda (n)
    (if (<= n 1)
        1
        (* n (fact (- n 1))))))

Actually, it's incomplete, because Scheme's lambda allows an arbitrary number of expressions in the body. In other words Scheme's

(lambda (x) a b c)

is syntactic sugar for

(lambda (x) (begin a b c))

begin evaluates its arguments in order left to right, and returns the value of the last one. In Scheme it's a special form, because normal argument evaluation happens in an undefined order. By contrast, the Qoppa interpreter implements a left-to-right order, so we'll define begin as a function.

(define last (lambda (xs)
    (if (null? (cdr xs))
        (car xs)
        (last (cdr xs)))))

(define begin (lambda xs (last xs)))

Now we can mutate the binding for lambda to support multiple expressions.

(define set! (vau (name exp) env
    (set-cdr!
        (lookup name env)
        (list (eval env exp)))))

(set! lambda
    ((lambda (base-lambda)
        (vau (param . body) env
            (eval env (list base-lambda param (cons begin body)))))
    lambda))

Note the structure

((lambda (base-lambda) ...) lambda)

which holds on to the original lambda operative, in a private frame. That's right, we're using lambda to save lambda so we can overwrite lambda. We use the same approach when defining other sugar, such as the implicit lambda in define.

There are some more bits of Scheme we need to implement: cond, let, map, append, and so forth. These are mostly straightforward; read the code if you want the full story. By far the most troublesome was Scheme's apply function, which takes a function and a list of arguments, and is supposed to apply the function to those arguments. The problem is that our functions are really operatives, and expect to call eval on each of their arguments. If we already have the values in a list, how do we pass them on?

Qoppa and Kernel have very different solutions to this problem. In Kernel, "applicatives" (things that evaluate all their arguments) are a distinct type from operatives. wrap is the primitive constructor of applicatives, and its inverse unwrap is used to implement apply. This design choice simplifies apply but complicates the core evaluator, which needs to distinguish applicatives from operatives.

For Qoppa I implemented wrap as a library function, which we saw before. But then we don't have unwrap. So apply takes the uglier approach of quoting each argument to prevent double-evaluation.

(define apply (wrap (vau (operative args) env
    (eval env (cons
        operative
        (map (lambda (x) (list quote x)) args))))))

In either Kernel or Qoppa, you're not allowed to apply apply to something that doesn't evaluate all of its arguments.


The code we saw above is split into two files:

  • qoppa.scm is the Qoppa interpreter, written in Scheme

  • prelude.qop is the Qoppa code which defines wrap, lambda, etc.

I defined a procedure execute-file which reads a file from disk and runs each expression through m-eval. The last line of qoppa.scm is

(execute-file "prelude.qop")

so the definitions in prelude.qop are available immediately.

We start by loading qoppa.scm into a Scheme interpreter. I'm using Guile here, but I've actually tested this with a variety of R5RS implementations.

$ guile -l qoppa.scm
guile> (m-eval global-env '(fact 5))
$1 = 120

This establishes that we've implemented the features used by fact, such as define and lambda. But did we actually implement enough to run the Qoppa interpreter? To test this, we need to go deeper.

guile> (execute-file "qoppa.scm")
$2 = done
guile> (m-eval global-env '(m-eval global-env '(fact 5)))
$3 = 120

This is factorial implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme, implemented as a library for Qoppa, implemented in Scheme (implemented in C). Of course it's outrageously slow; on my machine this (fact 5) takes about 5 minutes. But it demonstrates that a tiny language of operatives, augmented with an appropriate library, can provide enough syntactic features to run a non-trivial Scheme program. As for how to do this efficiently, well, I haven't got far enough into the literature to have any idea.

Thursday, April 5, 2012

A minimal encoder for uncompressed PNGs

I've often wondered how hard it is to output a PNG file directly, without using a library or a standard tool like pnmtopng. (I'm not sure when you'd actually want to do this; maybe for a tiny embedded system with a web interface.)

I found that constructing a simple, uncompressed PNG does not require a whole lot of code, but there are some odd details I got wrong on the first try. Here's a crash course in writing a minimal PNG encoder. We'll use only a small subset of the PNG specification, but I'll link to the full spec so you can read more.

The example code is not too fast; it's written in Python and has tons of string copying everywhere. My goal was to express the idea clearly, and let you worry about coding it up in C for your embedded system or whatever. If you're careful, you can avoid ever copying the image data.

We will assume the raw image data is a Python byte string (non-Unicode), consisting of one byte each for red, green, and blue, for each pixel in English reading order. For reference, here is how we'd "encode" this data in the much simpler PPM format.

def to_ppm(width, height, data):
    return 'P6\n%d %d\n255\n%s' % (width, height, data)

I lied when I said we'd use no libraries at all. I will import Python's standard struct module. I figured an exercise in converting integers to 4-byte big endian format would be excessively boring. Here's how we do it with struct.

import struct

def be32(n):
    return struct.pack('>I', n)
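If struct really isn't available on your target, the conversion is only a few shifts and masks. Here is a sketch, written for Python 3 (where byte strings are bytes objects, unlike the Python 2 code in this post); the name be32_manual is made up for this example.

```python
import struct

def be32_manual(n):
    """Encode an unsigned 32-bit integer as 4 bytes, most significant byte first."""
    if not 0 <= n <= 0xffffffff:
        raise ValueError("value does not fit in 32 bits")
    return bytes([(n >> 24) & 0xff, (n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff])

print(be32_manual(13).hex())                        # 0000000d
print(be32_manual(500) == struct.pack('>I', 500))   # True
```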

A PNG file contains a sequence of data chunks, each with an associated length, type, and CRC checksum. The type is a 4-byte quantity which can be interpreted as four ASCII letters. We'll implement crc later.

def png_chunk(ty, data):
    return be32(len(data)) + ty + data + be32(crc(ty + data))

The IHDR chunk, always the first chunk in a file, contains basic header information such as width and height. We will hardcode a color depth of 8 bits, color type 2 (RGB truecolor), and standard 0 values for the other fields.

def png_header(width, height):
    return png_chunk('IHDR',
        struct.pack('>IIBBBBB', width, height, 8, 2, 0, 0, 0))

The actual image data is stored in DEFLATE format, the same compression used by gzip and friends. Fortunately for our minimalist project, DEFLATE allows uncompressed blocks. Each one has a 5-byte header: the byte 0 (or 1 for the last block), followed by a 16-bit data length, and then the same length value with all of the bits flipped. Note that these are little-endian numbers, unlike the rest of PNG. Never assume a format is internally consistent!

MAX_DEFLATE = 0xffff
def deflate_block(data, last=False):
    n = len(data)
    assert n <= MAX_DEFLATE
    return struct.pack('<BHH', bool(last), n, 0xffff ^ n) + data
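To see the header layout byte by byte, here is a small worked example. It is a Python 3 restatement of the function above (the name stored_block is just for this sketch), applied to a three-byte payload.

```python
import struct

def stored_block(data, last=False):
    """Wrap data in one uncompressed DEFLATE block (at most 0xffff bytes)."""
    n = len(data)
    assert n <= 0xffff
    # '<' means little-endian with no padding: 1 flag byte, length, ~length
    return struct.pack('<BHH', int(last), n, 0xffff ^ n) + data

block = stored_block(b'abc', last=True)
print(block.hex())   # 010300fcff616263
```

Reading the output: 01 marks the last block, 03 00 is the length 3 in little-endian order, fc ff is that length with every bit flipped, and 61 62 63 is the payload.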

Since a DEFLATE block can only hold 64 kB, we'll need to split our image data into multiple blocks. We will actually want a more general function to split a sequence into chunks of size n (allowing the last chunk to be smaller than n).

def pieces(seq, n):
    return [seq[i:i+n] for i in xrange(0, len(seq), n)]

PNG wants the DEFLATE blocks to be encapsulated as a zlib data stream. For our purposes, this means we prefix a header of 78 01 hex, and suffix an Adler-32 checksum of the "decompressed" data. That's right, a self-contained PNG encoder needs to implement two different checksum algorithms.

def zlib_stream(data):
    segments = pieces(data, MAX_DEFLATE)

    blocks = ''.join(deflate_block(p) for p in segments[:-1])
    blocks += deflate_block(segments[-1], last=True)

    return '\x78\x01' + blocks + be32(adler32(data))
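One way to gain confidence in this framing is to hand the result to a real zlib decoder. The sketch below is a Python 3 variant that borrows the standard zlib module purely as a test oracle (including its adler32, to keep the example short); stored_zlib_stream is a name invented here, not part of the code above.

```python
import struct
import zlib

MAX_STORED = 0xffff

def stored_zlib_stream(data):
    """Frame data as a zlib stream made only of uncompressed DEFLATE blocks."""
    segments = [data[i:i + MAX_STORED]
                for i in range(0, len(data), MAX_STORED)] or [b'']
    out = [b'\x78\x01']                                   # zlib header
    for i, seg in enumerate(segments):
        last = (i == len(segments) - 1)
        out.append(struct.pack('<BHH', int(last), len(seg), 0xffff ^ len(seg)))
        out.append(seg)
    out.append(struct.pack('>I', zlib.adler32(data) & 0xffffffff))  # big-endian
    return b''.join(out)

payload = bytes(range(256)) * 600        # 153600 bytes, so three stored blocks
stream = stored_zlib_stream(payload)
print(zlib.decompress(stream) == payload)   # True
```

The stream is larger than the input by 2 header bytes, 5 bytes per block, and 4 checksum bytes.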

We're almost done, but there's one more wrinkle. PNG has a pre-compression filter step, which transforms a scanline of data at a time. A filter doesn't change the size of the image data, but is supposed to expose redundancies, leading to better compression. We aren't compressing anyway, so we choose the no-op filter. This means we prefix a zero byte to each scanline.

At last we can build the PNG file. It consists of the magic PNG signature, a header chunk, our zlib stream inside an IDAT chunk, and an empty IEND chunk to mark the end of the file.

def to_png(width, height, data):
    lines = ''.join('\0'+p for p in pieces(data, 3*width))

    return ('\x89PNG\r\n\x1a\n'
        + png_header(width, height)
        + png_chunk('IDAT', zlib_stream(lines))
        + png_chunk('IEND', ''))

Actually, a PNG file may contain any number of IDAT chunks. The zlib stream is given by the concatenation of their contents. It might be convenient to emit one IDAT chunk per DEFLATE block. But the IDAT boundaries really can occur anywhere, even halfway through the zlib checksum. This flexibility is convenient for encoders, and a hassle for decoders. For example, one of many historical PNG bugs in Internet Explorer is triggered by empty IDAT chunks.

Here are those checksum algorithms we need. My CRC function follows the approach of code fragment 5 from Wikipedia. For better performance you would want to precompute a lookup table, as suggested by the PNG spec.

def crc(data):
    c = 0xffffffff
    for x in data:
        c ^= ord(x)
        for k in xrange(8):
            v = 0xedb88320 if c & 1 else 0
            c = v ^ (c >> 1)
    return c ^ 0xffffffff

def adler32(data):
    s1, s2 = 1, 0
    for x in data:
        s1 = (s1 + ord(x)) % 65521
        s2 = (s2 + s1) % 65521
    return (s2 << 16) + s1
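For the lookup-table variant mentioned above, here is a sketch in Python 3. It precomputes the 256 possible results of the inner bit loop, then processes one byte per step; the names make_crc_table and crc_table_driven are made up for this example, and zlib.crc32 is used only to check the answer.

```python
import zlib

def make_crc_table():
    """Precompute the CRC-32 remainder for every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (0xedb88320 ^ (c >> 1)) if c & 1 else (c >> 1)
        table.append(c)
    return table

CRC_TABLE = make_crc_table()

def crc_table_driven(data):
    """Same result as the bit-by-bit crc, using one table lookup per byte."""
    c = 0xffffffff
    for byte in data:
        c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >> 8)
    return c ^ 0xffffffff

sample = b'123456789'
print(hex(crc_table_driven(sample)))                                 # 0xcbf43926
print(crc_table_driven(sample) == (zlib.crc32(sample) & 0xffffffff)) # True
```

The table costs 256 words of memory, which may or may not be acceptable on a very small device.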

Now we can test this code. We'll generate a grid of red-green-yellow gradients, and write it in both PPM and PNG formats.

w, h = 500, 300
img = ''
for y in xrange(h):
    for x in xrange(w):
        img += chr(x % 256) + chr(y % 256) + '\0'

open('out.ppm', 'wb').write(to_ppm(w, h, img))
open('out.png', 'wb').write(to_png(w, h, img))

Then we can verify that the two files contain identical image data.

$ pngtopnm out.png | sha1sum - out.ppm
e19c1229221c608b2a45a4488f9959403b8630a0  -
e19c1229221c608b2a45a4488f9959403b8630a0  out.ppm

That's it! As usual, the code is on GitHub. You can also read what others have written on similar subjects here, here, here, or here.