Tuesday, September 28, 2010

yices-easy: Simple SAT / SMT solving for Haskell

NP is the new P

In school I learned that NP-complete problems are intractably hard, chief among them Boolean satisfiability (SAT). My mind was thoroughly blown when I heard about SAT solvers and their success at a wide variety of real-world problems (1, 2, 3, 4). Apparently NP is the new P.

Setting aside the theory for a moment, what this implies is the feasibility (for some problems) of a new style of programming based on solving logical equations. We can describe a problem in terms of requirements for the solution, and leave the task of actually finding that solution to a highly-tuned general-purpose library.

There are quite a few SAT-solver libraries on Hackage. I tried some of them on some toy projects. I ran into a number of frustrations, as well as some code that just plain didn't work.

So I decided to write my own binding to Yices. Yices is an efficient and flexible solver for "SAT modulo theories" (SMT). This means that it understands not only Boolean logic but also types like integers, bit vectors, functions, etc. Unfortunately it's not open-source, but it does support a variety of interfaces to external code. The raw C API in bindings-yices was a great starting point, and today I'm releasing yices-easy 0.1.

yices-easy is not the first Yices binding on Hackage (1, 2, 3), nor will it be the last. It's not feature-complete, nor heavily optimized. The goal is simply to lower the entry barrier for Haskell developers (in particular, me) to learn about and use this powerful alien technology.

Latin squares

Let's see an example! A Latin square is an n × n matrix where each row and column is a permutation of [1..n]. We can express these constraints directly using Yices's integer variables:

import Yices.Easy
import Yices.Easy.Sugar
import Yices.Easy.Build

import Control.Monad ( forM_, liftM2 )
import qualified Data.Map as M

cell :: (Int,Int) -> String
cell (x,y) = concat ["c", show x, "_", show y]

query :: Int -> Query
query n = execBuild $ do
let cells = liftM2 (,) [1..n] [1..n]
forM_ cells $ \c -> do
x <- declInt $ cell c
assert ((x >=. 1) &&. (x <=. fromIntegral n))
forM_ cells $ \c@(i0,j0) -> do
let notEq c1 = assert (Var (cell c) /=. Var (cell c1))
forM_ [i0+1..n] $ \i -> notEq (i, j0)
forM_ [j0+1..n] $ \j -> notEq (i0,j )

run :: Int -> IO ()
run n = do
Sat model <- solve $ query n
let soln c = case M.lookup (cell c) model of Just (ValInt k) -> k
line i = forM_ [1..n] $ \j -> putStr (show (soln (i,j)) ++ " ")
forM_ [1..n] $ \i -> line i >> putChar '\n'

Testing it:

GHCi> run 9
4 2 6 8 5 1 7 9 3 
1 9 7 3 8 6 4 2 5 
5 3 9 1 4 7 6 8 2 
3 4 5 2 7 9 8 1 6 
7 5 8 6 9 2 3 4 1 
2 8 1 5 6 4 9 3 7 
6 7 4 9 1 3 2 5 8 
8 6 2 4 3 5 1 7 9 
9 1 3 7 2 8 5 6 4 

This takes about 2 to 4 seconds on my laptop. Yices is nondeterministic, so you can get several Latin squares by running the solver repeatedly.

The function query builds a Yices Query. This is a totally first-order value that you can build and inspect in pure Haskell. We're using some infix syntactic sugar and some monadic bookkeeping, but these are not essential parts of the library. At the core, we just build a Yices syntax tree and hand it off to solve, which translates it to Yices's internal data structures and invokes the solver.

Our query consists of two parts. First we declare an integer-valued variable for each cell in the Latin square, and constrain them to take on values from [1..n]. Then we constrain each cell to differ from those directly below or to the right. run invokes the solver, then produces formatted output.

Graph coloring

A classic NP-complete problem is graph coloring. Given is a graph of nodes and edges, and a set of k colors. We must pick a color for each node, such that nodes connected by an edge never have the same color. Again, we can express this directly in Yices:

import Yices.Easy
import Yices.Easy.Sugar
import Yices.Easy.Build

import Control.Monad
import System
import Data.List
import qualified Data.Map as M
import qualified Data.GraphViz as G

type Edge = (Ident, Ident)
data Graph = Graph [Ident] [Edge]

parse :: String -> Graph
parse g = Graph vs es where
es = map ((\[x,y] -> (x,y)) . words) $ lines g
vs = nub $ concat [ [x,y] | (x,y) <- es ]

query :: Int -> Graph -> Query
query n (Graph vs es) = execBuild $ do
forM_ vs $ \v -> do
x <- declInt v
assert ((x >=. 0) &&. (x <. fromIntegral n))
forM_ es $ \(x,y) -> assert (Var x /=. Var y)

render :: Graph -> Model -> String
render (Graph vs es) m = G.printDotGraph g where
g = G.DotGraph False False Nothing $ G.DotStmts gbl [] vss ess
gbl = [G.NodeAttrs [G.Style [G.SItem G.Filled []]]]
vss = [G.DotNode v [G.Color [G.X11Color $ color v]] | v <- vs]
ess = [G.DotEdge x y False [] | (x,y) <- es]
colors = [G.Red, G.Green, G.Blue, G.Cyan, G.Magenta, G.Yellow, G.White]
color v = case M.lookup v m of Just (ValInt i) -> colors !! fromIntegral i

main :: IO ()
main = do
[nx,file] <- getArgs
numColors <- readIO nx
graph <- parse `fmap` readFile file
result <- solve $ query numColors graph
case result of
Sat model -> writeFile "out.dot" $ render graph model
_ -> putStrLn "No solution." >> exitFailure

Unsurprisingly, most of this code is input / output glue. In parse we parse a very simple edge-list format (see below). query builds the Yices query and is quite similar to the previous example. (In fact, Latin squares are the colorings of a rook's graph.) render uses the graphviz library to build a colorful graph, which we can render with dot.

Let's color the Petersen graph:

$ cat petersen.txt
Ao Bo
Bo Co
Co Do
Do Eo
Eo Ao
Ao Ai
Bo Bi
Co Ci
Do Di
Eo Ei
Ai Ci
Bi Di
Ci Ei
Di Ai
Ei Bi
$ ghc --make -O graph-color.hs
$ export LD_LIBRARY_PATH=~/local/lib
$ ./graph-color 2 petersen.txt
No solution.
$ ./graph-color 3 petersen.txt
$ dot -Tpng out.dot > out.png
$ xview out.png

The coloring works! Too bad Graphviz can't recognize the striking symmetry in the Petersen graph.

Note that I had to specify LD_LIBRARY_PATH because libyices.so is not installed globally on my system. If that's the case for you, you will also need to pass --extra-include-dirs=PATH and --extra-lib-dirs=PATH options to cabal when you install bindings-yices, which is used by my library.

What next?

So that's yices-easy. It's a young library; I need people to bang on it and find bugs. What can you do with an SMT solver?

Sunday, September 26, 2010

View patterns for validation

GHC's ViewPatterns extension has a lot of unexpected uses. One that I've found recently is input validation.

{-# LANGUAGE ViewPatterns #-}

import Control.Monad
import Text.Printf

months :: [String]
months = words "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"

Using this function:

range :: (Ord a) => a -> a -> a -> a
range lb ub x
| (x < lb) || (x > ub) = error "argument out of range"
| otherwise = x

we can put bounds on arguments:

month :: Int -> String
month (range 1 12 -> m) = months !! (m-1)

like so:

GHCi> month 3
GHCi> month 15
"*** Exception: argument out of range

This version provides better error messages:

-- V for 'verbose'
rangeV :: (Ord a, Show a) => String -> a -> a -> a -> a
rangeV fun lb ub x
| (x < lb) || (x > ub) = error (printf msg fun (show x) (show lb) (show ub))
| otherwise = x
where msg = "%s: argument %s is outside of range [%s,%s]"

like so:

monthV :: Int -> String
monthV (rangeV "monthV" 1 12 -> m) = months !! (m-1)

GHCi> monthV 3 "Mar" GHCi> monthV 15 "*** Exception: monthV: argument 15 is outside of range [1,12]

Or handle failure monadically:

-- Mp for MonadPlus
rangeMp :: (MonadPlus m, Ord a) => a -> a -> a -> m a
rangeMp lb ub x = guard ((x >= lb) && (x <= ub)) >> return x

like so:

monthMaybe :: Int -> Maybe String
monthMaybe (rangeMp 1 12 -> Just m) = Just (months !! (m-1))
monthMaybe _ = Nothing

GHCi> monthMaybe 3 Just "Mar" GHCi> monthMaybe 15 Nothing

Saturday, September 25, 2010

GHCi and Cabal

Maybe I'm the only one who didn't already know this, but I'd like to describe how to use GHCi within a Cabal project under development.

Say I'm working on a library foobar:

~/foobar $ find -type f

I want to write a test program, which won't go in the Cabal distribution:

~/foobar $ cat > test.hs
import Data.FooBar
main = print fooBar

~/foobar $ cabal configure && cabal build
Resolving dependencies...
Configuring foobar-0.1...
Preprocessing library foobar-0.1...
Building foobar-0.1...
Registering foobar-0.1...

~/foobar $ ghci test.hs
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 2] Compiling Data.FooBar      ( Data/FooBar.hs, interpreted )
[2 of 2] Compiling Main             ( test.hs, interpreted )
Ok, modules loaded: Data.FooBar, Main.

GHCi is loading Data.FooBar into the interpreter, but I wanted to test the compiled, optimized version that was just produced by Cabal. Let's try telling it about the dist/ directory:

~/foobar $ ghci test.hs -package-conf dist/package.conf.inplace
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 2] Compiling Data.FooBar      ( Data/FooBar.hs, interpreted )
[2 of 2] Compiling Main             ( test.hs, interpreted )
Ok, modules loaded: Data.FooBar, Main.

Still no good. Seems that the existence of the file Data/FooBar.hs overrides anything in the package database. Move to another directory:

~/foobar      $ mkdir test && cd test
~/foobar/test $ mv ../test.hs .
~/foobar/test $ ghci test.hs -package-conf ../dist/package.conf.inplace
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 1] Compiling Main             ( test.hs, interpreted )
Ok, modules loaded: Main.
*Main> fooBar
Loading package foobar-0.1 ... linking ... done.
"testing 123"

It works. :) For compiled code as well:

~/foobar/test $ ghc --make -O test.hs -package-conf ../dist/package.conf.inplace
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
~/foobar/test $ ./test
"testing 123"

Friday, September 24, 2010

clogparse: parsing #haskell IRC logs

The #haskell IRC channel on Freenode has archived logs going back to late 2001. That's a lot of data: over 10 million lines, almost 700 MB uncompressed. There's all kinds of clever analyses we could do, and I think I know what language we'd like to use.

So I wrote clogparse, a library for parsing these log files. Let's see who lambdabot has attacked this year:

import Data.IRC.CLog.Parse
import System.Environment ( getArgs )
import qualified Data.Text as Text
import qualified Data.Text.IO as Text

main :: IO ()
main = getArgs >>= mapM_ (\c -> do
es <- parseLog haskellConfig c
let m = Nick $ Text.pack "lambdabot"
mapM_ Text.putStrLn [ t | EventAt _ (Act n t) <- es, n == m ])

Running it:

$ ghc --make -O -Wall test.hs
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...

$ ./test logs/10.* +RTS -A400M
hits lunabot with an assortment of kitchen utensils
will count to five...
would never hurt Cale!
pulls copumpkin through the Evil Mangler
throws some pointy lambdas at medfly
slaps Jafet
puts on her slapping gloves, and slaps cads
hits Twey with an assortment of kitchen utensils
pats Gracenotes on the head
pushes c_wraith for using fromJust from his chair

This is allocation-heavy code. I increased the allocation area size with +RTS -A400M, which significantly improves performance.

Of course it's hard to beat grep for something so simple. Let's next look at when people join the channel throughout the day:

import Data.IRC.CLog.Parse
import System.Environment
import Data.List
import Data.Time
import Text.Printf
import Control.Concurrent.Spawn
import qualified Data.IntMap as M

-- Map times to 10-minute buckets.
bucket :: UTCTime -> Int
bucket (UTCTime _ dt) = floor (toRational dt / 600)

unbucket :: Int -> String
unbucket n = printf "%02d:%02d" d (m*10) where
(d,m) = divMod n 6

-- For use with 'alter'; increments an IntMap key strictly.
inc :: (Num a) => a -> Maybe a -> Maybe a
inc x Nothing = Just x
inc x (Just n) = Just $! (n+x)

-- Get a count of 'Join' events per time bucket for one file.
get :: FilePath -> IO (M.IntMap Int)
get p = do
es <- parseLog haskellConfig p
let bs = [ bucket t | EventAt t (Join _ _) <- es ]
return $! foldl' (flip . M.alter $ inc 1) M.empty bs

main :: IO ()
main = do
p <- pool 8 -- to avoid exhausting memory
ms <- getArgs >>= mapM (spawn . p . get) >>= sequence
-- IntMap lacks a strict 'unionWith', so fold over assocs.
let acc = foldl' (\h (k,v) -> M.alter (inc v) k h) M.empty
m = acc $ concatMap M.assocs ms
-- print (time, count) in gnuplot-friendly format
mapM_ (uncurry $ printf "%s %8d\n")
[ (unbucket k, v) | (k,v) <- M.assocs m ]

We count how many join events occur in each ten-minute interval of the day. We parse in a separate thread for each file, then consolidate the results. Spawning more than 3,000 threads is normally not a problem, but we don't want to load every log file into memory at once. We'd like to have the simplicity of one thread per file, but keep some of them blocked waiting for others to finish. That's exactly what the new pool function in spawn 0.2 does.

Let's see the results:

$ ghc --make -O -Wall -threaded histogram.hs
$ ./histogram logs/* +RTS -N2 > hist.dat
$ gnuplot
gnuplot> set xdata time
gnuplot> set timefmt "%H:%M"
gnuplot> set format x "%H:%M"
gnuplot> plot 'hist.dat' using 1:2 with steps

And we get this graph:

(Modulo some extra gnuplot tweaking on my part. Click for big.)

Times are labeled in UTC. You can sort of make sense of this data: there's a big peak around 09:10, at the beginning of the day in Europe, and some others later on, maybe from the USA. There's another big peak at 23:00; why? How much of this is due to particular institutions, netsplits, etc?

Anyway, that's a couple of small examples of what this library might be good for. I'm looking forward to seeing some clever applications in the future.

Thursday, September 23, 2010

Executing a ByteString

Why not!

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign ( FunPtr, withForeignPtr, castPtrToFunPtr )
import System ( getArgs )
import qualified Data.ByteString as B ( pack, ByteString )
import qualified Data.ByteString.Internal as B ( toForeignPtr )

payload :: B.ByteString
payload = B.pack [
0x55, -- push %rbp
0x48, 0x89, 0xe5, -- mov %rsp,%rbp
0x89, 0x7d, 0xec, -- mov %edi,-0x14(%rbp)
0xc7, 0x45, 0xfc,
1, 0, 0, 0, -- movl $0x1,-0x4(%rbp)
0xeb, 0x0e, -- jmp 1e <fact+0x1e>
0x8b, 0x45, 0xfc, -- mov -0x4(%rbp),%eax
0x0f, 0xaf, 0x45, 0xec, -- imul -0x14(%rbp),%eax
0x89, 0x45, 0xfc, -- mov %eax,-0x4(%rbp)
0x83, 0x6d, 0xec, 1, -- subl $0x1,-0x14(%rbp)
0x83, 0x7d, 0xec, 1, -- cmpl $0x1,-0x14(%rbp)
0x7f, 0xec, -- jg 10 <fact+0x10>
0x8b, 0x45, 0xfc, -- mov -0x4(%rbp),%eax
0xc9, -- leaveq
0xc3 ] -- retq

type F = Int -> Int

foreign import ccall "dynamic"
getF :: FunPtr F -> F

main :: IO ()
main = do
[s] <- getArgs
n <- readIO s
let (fptr, _, _) = B.toForeignPtr payload
withForeignPtr fptr $ \ptr -> do
let f = getF $ castPtrToFunPtr ptr
print $ f n

Testing it:

$ runhaskell foo.hs 6

This code will only work on x86-64.

Tuesday, September 21, 2010

crystalfontz 0.1

I've uploaded a Haskell library for talking to Crystalfontz LCD displays -- specifically, the one model that I own. If you have a different LCD and would like to poke it from Haskell, or you're dying to use one of the many unimplemented commands, let me know.

jspath 0.1

I've uploaded the first version of jspath, a small library which traverses JSON structures to extract substructures matching a path.

Higher-rank type constraints

I recently encountered a Haskell function which apparently requires "higher-rank" type constraints. I worked around the issue with some type trickery. I'd like to share the problem and the workaround, and especially get feedback on whether there's a better solution.

I did say type trickery:

, FlexibleContexts
, RankNTypes
, ScopedTypeVariables #-}

Suppose we're implementing an in-place sorting algorithm. We'll use the ST monad, so that we can expose a pure interface. Assuming that we only want to sort primitive types, we will use the unboxed STUArray type, which should improve space and time usage. Actually, it's totally unwarranted premature optimization for the sake of the problem. :)

import Control.Monad.ST
import Data.Array.ST

The monadic part of this code is simple enough. I've left out the part where we actually, you know, sort the list. Consult your bedside copy of CLRS.

:: forall a s.
(Ord a, MArray (STUArray s) a (ST s))
=> [a]
-> ST s [a]
sortM xs = do
arr <- newListArray (1, length xs) xs
:: ST s (STUArray s Int a)
-- do some in-place sorting here
getElems arr

Note that sortM has a constraint MArray (STUArray s) a (ST s). All we're really saying is that a is a type primitive enough that STUArray can unbox it. The rest of the constraint just connects the concrete world of STUArray with the generic MArray interface.

Note also that we give an explicit type on our call to newListArray. Otherwise, there's nothing to constrain what sort of array we get, and GHC will complain of ambiguity. Furthermore, we need to unify this a and s with the a and s from the signature on sortM. We do that using the ScopedTypeVariables extension and the explicit forall. (If you turn on UnicodeSyntax you can write it as ; how cool is that?)

Good enough so far. Now, the point of using ST was to expose a pure interface, right?

sortP_1 :: (Ord a) => [a] -> [a]
sortP_1 xs = runST (sortM xs)

Suddenly GHC is very unhappy:

    Could not deduce (MArray (STUArray s) a (ST s)) from the context ()
      arising from a use of `sortM' at code.hs:21:20-27
    Possible fix:
      add (MArray (STUArray s) a (ST s)) to the context of
        the polymorphic type `forall s. ST s a'
      or add an instance declaration for (MArray (STUArray s) a (ST s))
    In the first argument of `runST', namely `(sortM xs)'
    In the expression: runST (sortM xs)
    In the definition of `sortP_1': sortP_1 xs = runST (sortM xs)

Okay, fair enough. We've given no reason for GHC to believe that a is an STUArray-compatible type. Let's copy over the constraint from sortM:

:: (Ord a, MArray (STUArray s) a (ST s))
=> [a] -> [a]
sortP_2 xs = runST (sortM xs)

Still no good:

    Could not deduce (MArray (STUArray s1) a (ST s1))
      from the context ()
      arising from a use of `sortM' at code.hs:27:20-27
    Possible fix:
      add (MArray (STUArray s1) a (ST s1)) to the context of
        the polymorphic type `forall s. ST s a'
      or add an instance declaration for (MArray (STUArray s1) a (ST s1))
    In the first argument of `runST', namely `(sortM xs)'
    In the expression: runST (sortM xs)
    In the definition of `sortP_2': sortP_2 xs = runST (sortM xs)

Note that GHC gives an error about the variable s1, not s. This is a valuable hint. Our caller (notionally) chooses one s when she provides evidence to satisfy the constraint MArray (STUArray s) a (ST s). Then runST (notionally) gets to choose a different s because of its higher-rank type.

What to do? It's useless for our caller to pick some particular s and provide evidence. We need that evidence for all s:

:: (Ord a, forall s. MArray (STUArray s) a (ST s))
=> [a] -> [a]
sortP_3 xs = runST (sortM xs)

Unfortunately, GHC does not support this "higher-rank constraint":

code.hs:32:13: malformed class assertion

Well... the -> of functions and the => of constraints are not as different as they may appear. A Haskell compiler can implement type classes by translating constraints to explicit "dictionary" arguments. Let's do the same thing ourselves:

data Evidence s e where
:: (MArray (STUArray s) e (ST s))
=> Evidence s e

We don't quite need to reproduce the MArray methods in our "dictionary". (That's good, because some of them are hidden. A strange class indeed.) It's enough to write a GADT constructor with this constrained type. The constructor will implicitly capture the constraint evidence (such as a dictionary) from its call-site.

Now we need to express the universal quantification on s:

data ElemType e = ElemType (forall s. Evidence s e)

A (non-⊥) value of type ElemType e is evidence that e is a suitable element type for STUArray. That's all we wanted the caller to say in the first place!

Now we can pass the appropriate evidence:

:: forall a.
(Ord a)
=> ElemType a
-> [a] -> [a]
sortP_4 (ElemType Evidence) xs = runST (sortM xs)

Not quite:

    Couldn't match expected type `forall s. Evidence s a'
           against inferred type `Evidence s e'
    In the pattern: Evidence
    In the pattern: ElemType Evidence
    In the definition of `sortP_4':
        sortP_4 (ElemType Evidence) xs = runST (sortM xs)

GHC still wants some coaxing to match up this s with the other s. We push the pattern-matching of Evidence inside runST:

:: forall a.
(Ord a)
=> ElemType a
-> [a] -> [a]
sortP_5 (ElemType e) xs = runST (f e) where
f :: Evidence s a -> ST s [a]
f Evidence = sortM xs


[1 of 1] Compiling Main             ( code.hs, interpreted )
Ok, modules loaded: Main.

A glorious sight! Then, a moment of concern: is sortP_5 actually usable? Does ElemType have non-⊥ values? Let's check:

*Main> sortP_5 (ElemType Evidence) ([3,4] :: [Int])

Nice. It's sorted and everything. :) As expected, this works only for unboxable types:

*Main> sortP_5 (ElemType Evidence) ["foo","bar"]

    Could not deduce (MArray (STUArray s) [Char] (ST s))
      from the context ()
      arising from a use of `Evidence' at <interactive>:1:18-25

As usual, this interesting exercise came from a question on the #haskell IRC channel. That's why it's one of my favorite places, even if they do specialize in making GHC cry.