Saturday, December 4, 2010

Type-level Fix and generic folds

This article describes a fixpoint combinator for Haskell types, with some justification of why such a thing would be useful. It's meant to be an accessible introduction to material which is already covered elsewhere.

You'll need some basic Haskell knowledge, such as how to define and use polymorphic data types. And you might want to first study the value-level fix function. An excellent introduction can be found here.

This code is standard Haskell 98, except in the "GHC tricks" section.

Recursive types

Many useful data types are recursive:

data Natural = Zero | Succ Natural     -- natural numbers
data List a = Nil | Cons a (List a) -- lists
data Tree a = Node a [Tree a] -- rose trees

In each case, the type declared on the left-hand side of "=" is mentioned on the right-hand side. This is fine, but just for fun let's see if there's another way.

We can imagine a language with a keyword "self", which refers to the type being defined:

data Natural = Zero | Succ self
data List a = Nil | Cons a self
data Tree a = Node a [self]

Haskell doesn't have this keyword, so we'll make self into a type parameter:

data NaturalF self = Zero | Succ self
data ListF a self = Nil | Cons a self
data TreeF a self = Node a [self]

We've changed the names because these aren't the types we wanted. They're non-recursive and each has an extra parameter.

To build the recursive type Natural using the non-recursive NaturalF, we need to "tie the knot":

type Natural = NaturalF (NaturalF (NaturalF (NaturalF ...)))

i.e.

type Natural = NaturalF Natural

But this is disallowed:

foo.hs:3:0:
    Cycle in type synonym declarations:
      foo.hs:3:0-30: type Natural = NaturalF Natural

Since type defines a transparent synonym, this would generate an infinitely-large type-expression. Haskell disallows those. We need to hide the recursion behind a data constructor:

newtype Natural = MkNat (NaturalF Natural)

When we use this type, the "wrapper" constructor MkNat alternates with the constructors of NaturalF:

toNatural :: Integer -> Natural
toNatural 0 = MkNat Zero
toNatural n = MkNat . Succ $ toNatural (n-1)

Type-level Fix

So far this is just pointless complexity. We've replaced one type:

data Natural = Zero | Succ Natural

with two types:

data    NaturalF self = Zero | Succ self
newtype Natural = MkNat (NaturalF Natural)

Notice that the definition of Natural doesn't depend on the structure of natural numbers. Let's rename Natural to Fix and MkNat to In:

newtype Fix   = In (NaturalF Fix)

Then we'll abstract out NaturalF as a type parameter f:

newtype Fix f = In (f (Fix f))

Now Natural is just a synonym:

type Natural = Fix NaturalF

When we create values of type Natural, we still have "wrapper" constructors:

toNatural :: Integer -> Natural
toNatural 0 = In Zero
toNatural n = In . Succ $ toNatural (n-1)

Properties of Fix

Recall that the value-level fix function can be defined as

fix f = f (fix f)

This looks a lot like our type

newtype Fix f = In (f (Fix f))

The similarity is reflected at the type and kind levels:

GHCi> :t fix
fix :: (a -> a) -> a
GHCi> :k Fix
Fix :: (* -> *) -> *

fix is a function which takes another function. Fix is a type constructor which takes another type constructor.

The types Fix f and f (Fix f) are isomorphic, and we'll want to convert between them. In one direction we'll use the data constructor

In :: f (Fix f) -> Fix f

and the other direction is easily defined as

out :: Fix f -> f (Fix f)
out (In x) = x

Folding lists

So why bother writing types this way? Well, we usually don't. But factoring the recursion out of our data type enables us to factor the recursion out of our functions.

Consider taking the sum of a list:

sumF :: (Num a) => List a -> a
sumF (In Nil) = 0
sumF (In (Cons x xs)) = x + sumF xs

We traverse the structure with the combining function (+), making a recursive call at each recursive position of the data type. How can we capture this pattern?

If we're reducing a value of type List T to get a result of type R, then a value of type ListF T R represents a list node with the recursive call's result already in place. So we might look for a function like

cata :: (ListF a b -> b) -> List a -> b

If we had this, we could write

sumF :: (Num a) => List a -> a
sumF = cata f where
f Nil = 0
f (Cons x xs) = x + xs

The function f specifies one step of reduction, and the magical function cata handles the rest.

The recursion pattern captured by cata is very similar to foldr on the built-in list type. In fact, we can implement foldr with cata:

foldrF :: (a -> b -> b) -> b -> List a -> b
foldrF f z = cata g where
g Nil = z
g (Cons x xs) = f x xs

If foldr is good enough, then what's so special about cata? The answer is that cata actually has this very general type:

cata :: (Functor f) => (f a -> a) -> Fix f -> a

I'll give the one-line definition of cata later. Writing it yourself is a fun puzzle.

We can see that we'll need this instance:

instance Functor (ListF a) where
fmap _ Nil = Nil
fmap f (Cons x xs) = Cons x (f xs)

This fmap just takes a function and applies it to the self-typed field, if any. It's not recursive and has no relation to the [] instance of Functor, for which fmap = map.

Folding natural numbers

Let's put our generic cata to good use on another type. Every natural number is either zero, or the successor of another natural number:

data NaturalF self = Zero | Succ self
type Natural = Fix NaturalF

Again, fmap applies a function to each self-typed field:

instance Functor NaturalF where
fmap _ Zero = Zero
fmap f (Succ v) = Succ (f v)

We can then use cata to define conversion to Integer:

fromNatural :: Natural -> Integer
fromNatural = cata f where
f Zero = 0
f (Succ n) = 1 + n

as well as addition and multiplication:

add :: Natural -> Natural -> Natural
add n = cata f where
f Zero = n
f (Succ m) = In $ Succ m

mul :: Natural -> Natural -> Natural
mul n = cata f where
f Zero = In Zero
f (Succ m) = add n m

Testing it:

GHCi> fromNatural (add (toNatural 2) (mul (toNatural 3) (toNatural 5)))
17

Folding rose trees

Let's try one more structure, namely rose trees:

data TreeF a self = Node a [self]
type Tree a = Fix (TreeF a)

instance Functor (TreeF a) where
fmap f (Node v xs) = Node v (map f xs)

As before, fmap applies a function to each self-typed field. Since we have a list of these, we use map.

We define some traversals:

valuesF :: Tree a -> [a]
valuesF = cata (\(Node x xs) -> x : concat xs)

depthF :: Tree a -> Integer
depthF = cata (\(Node _ xs) -> 1 + maximum (0:xs))

and test them:

GHCi> let n x = In . Node x
GHCi> let t1 = n 'a' []
GHCi> valuesF t1
"a"
GHCi> depthF t1
1
GHCi> let t2 = n 'a' [n 'b' [], n 'c' [n 'd' []], n 'e' []]
GHCi> valuesF t2
"abcde"
GHCi> depthF t2
3

The definition of cata

As promised:

cata :: (Functor f) => (f a -> a) -> Fix f -> a
cata g = g . fmap (cata g) . out

Generic programming

We usually manipulate data because we care about its meaning. But some operations only depend on the data's structure. If we have some complex nested data value, we might want to pretty-print it, or serialize it with binary, or collect all the Strings buried within. These are concepts which apply to any data type you like. Shouldn't our code be similarly adaptable?

This idea is called generic programming. Implementations in Haskell include syb, uniplate, multiplate, instant-generics, GHC's built-in support, and many other efforts.

Each of these libraries demands certain information about each data type, and in return provides some generic operations. In this view, our above development of Fix constitutes a really tiny generics library. We require

  • that you write recursive types as Fix F, where F is non-recursive, and
  • that you provide instance Functor F.

In return, you get the single generic operation cata, which reduces a recursive structure to a "summary" value, by replacing each constructor with an arbitrary computation.

Our toy generics library is simple, but it doesn't actually save much work. Using Fix adds noise to the code, and our Functor instances aren't much shorter than a custom fold function for each type. Conversely, our uses of cata aren't much simpler than explicit recursive traversals.

Practicality aside, I think it's cool to define folding "once and for all". I'm interested to learn (from you!) what else can be done with Fix'd data or similar techniques.

For a non-trivial generics library built around similar ideas, see multirec and the accompanying paper.

Odds and ends

Nomenclature

Fix is sometimes named Mu, because the Greek letter μ is used to represent least fixed points.

cata stands for catamorphism.

Some of this stuff shows up in "Evolution of a Haskell Programmer". They build a factorial function using not only catamorphisms but also zygomorphisms and paramorphisms. Tell your friends!

In TaPL there's a discussion of equirecursive versus isorecursive type systems. Haskell is in the latter category. Fix f is isomorphic to, but not equivalent to, f (Fix f). The functions In and out provide the isomorphism. If we defined Fix using data instead of newtype, the isomorphism would fail because of the extra value In ⊥.

Isomorphisms

NaturalF is isomorphic to Maybe:

data NaturalF self = Zero    | Succ self
data Maybe a = Nothing | Just a

The Functor instances are also the same.

NaturalF is also isomorphic to ListF (), ignoring undefined values.

GHC tricks

While the above is standard Haskell, we can pull a few more tricks by using a couple of GHC extensions.

GHC could help us reduce boilerplate by deriving Functor instances.

With some GHC extensions, we can write Show for Fix:

instance (Show (f (Fix f))) => Show (Fix f) where
showsPrec p (In x) = showParen (p >= 11) (showString "In " . showsPrec 11 x)

Testing it:

GHCi> toNatural 3
In (Succ (In (Succ (In (Succ (In Zero))))))

This requires -XFlexibleContexts -XUndecidableInstances. The latter is required because f (Fix f) is a syntactically larger type than Fix f, so GHC can't convince itself that instance resolution will definitely terminate. For well-behaved arguments to Fix, it works out. We'll have something like

-- from above
instance (Show (f (Fix f)))
=> Show (Fix f)

-- from 'deriving (Show)' on NaturalF
instance (Show self)
=> Show (NaturalF self)

which instantiates to

instance (Show (NaturalF (Fix NaturalF)))
=> Show (Fix NaturalF)

instance (Show (Fix NaturalF))
=> Show (NaturalF (Fix NaturalF))

and GHC is clever enough to resolve the mutual recursion.

Remarkably, GHC can derive this instance with -XStandaloneDeriving:

deriving instance (Show (f (Fix f))) => Show (Fix f)

That's how I got the precedence-correct code above, via -ddump-deriv.