Tuesday, September 13, 2011

debug-diff: Colorized diffs between Haskell values

Today I uploaded a small library for comparing Haskell values using a textual diff. This has served me well when diagnosing failing tests, or comparing a function with a proposed replacement.

Let's try it on some disassembled x86 code:

$ ghci
λ> :m + Hdis86 Debug.Diff Data.ByteString 
λ> let f = disassemble amd64 . pack 
λ> diff (f [0x31,0xff,0x41,0x5e,0x48,0x89,0xc7]) 
        (f [0x31,0xff,0x41,0x5c,0x48,0x89,0xc7])
--- /tmp/ddiff_x15061   2011-09-13 22:52:13.911842969 -0400 
+++ /tmp/ddiff_y15061   2011-09-13 22:52:13.911842969 -0400 
@@ -1,3 +1,3 @@
 [Inst [] Ixor [Reg (Reg32 RDI), Reg (Reg32 RDI)],
- Inst [Rex] Ipop [Reg (Reg64 R14)], 
+ Inst [Rex] Ipop [Reg (Reg64 R12)], 
  Inst [Rex] Imov [Reg (Reg64 RDI), Reg (Reg64 RAX)]]

This uses groom as a pretty-printer, and shells out to colordiff by default.

Sadly, a textual diff does not always produce usable results. I'm also interested in generic diffs for algebraic data types. There are some packages for this (1, 2) but I haven't yet learned how to use them. This could also be a good application for the new generics support in GHC 7.2.

Saturday, September 3, 2011

Lambda to pi

I gave a talk a while back which included an interpreter for the pi-calculus, and a compiler from the lambda-calculus to it. I didn't really do justice to the material in a few slides, so here's a proper blog post.

This article is available as a Literate Haskell file, which you can load into GHCi directly.

The π-calculus

If the λ-calculus is a minimal functional language, then the π-calculus is a minimal concurrent language. There's basically only three things we can do:

  • Fork a concurrent thread of execution

  • Create message channels and send them through other message channels

  • Loop forever

There's a conventional syntax for the π-calculus, which I don't much care for. Since we're writing an interpreter in Haskell, we'll use Haskell datatypes as our only syntax for the π-calculus.

import Control.Concurrent
(forkIO, Chan, newChan, readChan, writeChan)

import Control.Applicative
import Control.Monad
import Control.Monad.State
import qualified Data.Map as M

Here's the abstract syntax of the π-calculus:

type Name = String

data Pi
= Pi :|: Pi
| Rep Pi
| Nil
| New Name Pi
| Out Name Name Pi
| Inp Name Name Pi
| Embed (IO ()) Pi

Names are bound to message channels, and the only thing we can send through a channel is another channel.

newtype MuChan = Wrap (Chan MuChan)

type Env = M.Map Name MuChan

Repetition and forking are straightforward:

run :: Env -> Pi -> IO ()

run env (Rep p) = forever (run env p)

run env (a :|: b) =
let f = forkIO . run env
in f a >> f b >> return ()

We also have a base-case program that does nothing:

run _ Nil = return ()

And we have three operations on channels. (New bind p) creates a new channel, binds it to the local name bind, then does p:

run env (New bind p) = do
c <- Wrap <$> newChan
run (M.insert bind c env) p

(Out dest msg p) sends the channel msg to the channel dest, then does p:

run env (Out dest msg p) = do
let Wrap c = env M.! dest
writeChan c (env M.! msg)
run env p

(Inp src bind p) reads a channel from src, binds it to the local name bind, then does p:

run env (Inp src bind p) = do
let Wrap c = env M.! src
recv <- readChan c
_ <- forkIO $ run (M.insert bind recv env) p
return ()

Why does Inp invoke forkIO? It's so that Inp under Rep is always available to receive a message, even if a subprogram is blocking on something else. You can think of this as an identity (Rep x)(x :|: Rep x). So we fork a new thread for each "connection", like the UNIX daemons of yore. If we're not under Rep then this is unnecessary but harmless.

Embed is something I threw in so that we can observe our program's execution from its side-effects:

run env (Embed x a) = x >> run env a

Compiling from the λ-calculus

Sending channels through channels is an interesting model, but how many algorithms can we really implement this way? It turns out that the π-calculus can implement any computable function. We'll demonstrate this by writing a compiler from λ-terms to π-terms.

Here's a simple, untyped lambda calculus with effects:

data Lam
= Lam :@: Lam
| Var Name
| Abs Name Lam
| Eff (IO ()) Lam

We'll use a state monad as a source of fresh names. For simplicity, we assume that the input λ-terms do not use any names beginning with '_'.

type M a = State [Name] a

runM :: M a -> a
runM = flip evalState ['_' : show (x :: Integer) | x <- [1..]]

fresh :: M Name
fresh = state (\(x:xs) -> (x,xs))
-- With mtl version 1, use 'State' not 'state'

withFresh :: (Name -> r) -> M r
withFresh f = f <$> fresh

Our compiler will produce π-calculus code which needs to send and receive pairs of values. We can encode a pair as a channel with two values enqueued. mkInp2 and mkOut2 construct functions for receiving and sending pairs, respectively.

mkInp2 :: M (Name -> (Name, Name) -> Pi -> Pi)
mkInp2 = withFresh $ \pair from (bind1, bind2) p ->
Inp from pair $
Inp pair bind1 $
Inp pair bind2 $

mkOut2 :: M (Name -> (Name, Name) -> Pi -> Pi)
mkOut2 = withFresh $ \pair dest (from1, from2) p ->
New pair $
Out pair from1 $
Out pair from2 $
Out dest pair $

Now we come to the compiler itself. The π-term produced by compile k e should evaluate the λ-term e and send the result to the "continuation channel" named by k.

compile :: Name -> Lam -> M Pi
compile k (Var x) = return (Out k x Nil)
compile k (Eff eff a) = Embed eff <$> compile k a

We implement a function as a concurrent process which receives arguments through a channel, and sends back results. ("Lambda as a Service"?) The channel elements are pairs of (function argument, channel where result should be sent).

compile k (Abs x b) = do
inp2 <- mkInp2
f <- fresh
ret <- fresh
body <- compile ret b
return $
New f $
Out k f $
Rep $
inp2 f (x, ret) body

Note that the compiler allocates a fresh name for ret just once, but this name is bound to a different channel by each request.

A function application compiles according to the usual call-by-value rule: evaluate the function, evaluate the argument, then apply. Actually, these steps happen in three concurrent processes, but the "apply" process demands the argument value before sending it to the function, so we end up implementing strict semantics.

compile k (f :@: x) = do
out2 <- mkOut2
[f_cont, f_val, x_cont, x_val] <- replicateM 4 fresh
f_proc <- compile f_cont f
x_proc <- compile x_cont x
let app_proc
= Inp f_cont f_val $
Inp x_cont x_val $
out2 f_val (x_val, k) Nil
return $
New f_cont $
New x_cont $
(f_proc :|: x_proc :|: app_proc)

At the top level of a program, we still need a continuation channel. But we'll never read from this channel, because we're only running the program for its effects.

lambdaToPi :: Lam -> Pi
lambdaToPi b = runM $ do
k <- fresh
New k <$> compile k b

Testing the compiler

We'll test the compiler by doing some arithmetic on Church numerals.

-- \m n f -> n (m f)
e_mult :: Lam
e_mult = Abs "m" . Abs "n" . Abs "f" $
(Var "n" :@: (Var "m" :@: Var "f"))

-- \f x -> f (f (f (... x)))
e_church :: Int -> Lam
e_church n = Abs "f" . Abs "x" $
foldr (:@:) (Var "x") (replicate n $ Var "f")

-- \n -> n (\x -> trace "S" x) (trace "0" n)
e_shownum :: Lam
e_shownum = Abs "n" $
Var "n" :@: (Abs "x" (Eff (putChar 'S') (Var "x")))
:@: (Eff (putChar '0') (Var "n"))

-- compute 2 times 3
e_test :: Lam
e_test = e_shownum :@: (e_mult :@: e_church 2 :@: e_church 3)

And we run it like so:

GHCi> run M.empty (lambdaToPi e_test)

The answer of 6 is printed to the terminal, asynchronously, in unary.


I thought up this particular scheme for compiling to the π-calculus, but it's probably been documented already.

I'm not sure my π-calculus interpreter is correct. In particular, the term (Rep (x :|: y)) is a forkbomb, when it should be operationally equivalent to (Rep x :|: Rep y). I don't know if fixing this would be easy or hard.

Edward Kmett wrote an interpreter and pretty-printer for the π-calculus, in the style of "Finally Tagless, Partially Evaluated" [PDF] by Carette, Kiselyov, and Shan.