This article describes a fixpoint combinator for Haskell types, with some justification of why such a thing would be useful. It's meant to be an accessible introduction to material which is already covered elsewhere.
You'll need some basic Haskell knowledge, such as how to define and use polymorphic data types. And you might want to first study the value-level fix
function. An excellent introduction can be found here.
This code is standard Haskell 98, except in the "GHC tricks" section.
Recursive types
Many useful data types are recursive:
data Natural = Zero | Succ Natural -- natural numbers
data List a = Nil | Cons a (List a) -- lists
data Tree a = Node a [Tree a] -- rose trees
In each case, the type declared on the left-hand side of "=
" is mentioned on the right-hand side. This is fine, but just for fun let's see if there's another way.
We can imagine a language with a keyword "self
", which refers to the type being defined:
data Natural = Zero | Succ self
data List a = Nil | Cons a self
data Tree a = Node a [self]
Haskell doesn't have this keyword, so we'll make self
into a type parameter:
data NaturalF self = Zero | Succ self
data ListF a self = Nil | Cons a self
data TreeF a self = Node a [self]
We've changed the names because these aren't the types we wanted. They're non-recursive and each has an extra parameter.
To build the recursive type Natural
using the non-recursive NaturalF
, we need to "tie the knot":
type Natural = NaturalF (NaturalF (NaturalF (NaturalF ...)))
i.e.
type Natural = NaturalF Natural
But this is disallowed:
foo.hs:3:0:
Cycle in type synonym declarations:
foo.hs:3:0-30: type Natural = NaturalF Natural
Since type
defines a transparent synonym, this would generate an infinitely-large type-expression. Haskell disallows those. We need to hide the recursion behind a data constructor:
newtype Natural = MkNat (NaturalF Natural)
When we use this type, the "wrapper" constructor MkNat
alternates with the constructors of NaturalF
:
toNatural :: Integer -> Natural
toNatural 0 = MkNat Zero
toNatural n = MkNat . Succ $ toNatural (n-1)
Type-level Fix
So far this is just pointless complexity. We've replaced one type:
data Natural = Zero | Succ Natural
with two types:
data NaturalF self = Zero | Succ self
newtype Natural = MkNat (NaturalF Natural)
Notice that the definition of Natural
doesn't depend on the structure of natural numbers. Let's rename Natural
to Fix
and MkNat
to In
:
newtype Fix = In (NaturalF Fix)
Then we'll abstract out NaturalF
as a type parameter f
:
newtype Fix f = In (f (Fix f))
Now Natural
is just a synonym:
type Natural = Fix NaturalF
When we create values of type Natural
, we still have "wrapper" constructors:
toNatural :: Integer -> Natural
toNatural 0 = In Zero
toNatural n = In . Succ $ toNatural (n-1)
Properties of Fix
Recall that the value-level fix
function can be defined as
fix f = f (fix f)
This looks a lot like our type
newtype Fix f = In (f (Fix f))
The similarity is reflected at the type and kind levels:
GHCi> :t fix
fix :: (a -> a) -> a
GHCi> :k Fix
Fix :: (* -> *) -> *
fix
is a function which takes another function. Fix
is a type constructor which takes another type constructor.
The types Fix f
and f (Fix f)
are isomorphic, and we'll want to convert between them. In one direction we'll use the data constructor
In :: f (Fix f) -> Fix f
and the other direction is easily defined as
out :: Fix f -> f (Fix f)
out (In x) = x
Folding lists
So why bother writing types this way? Well, we usually don't. But factoring the recursion out of our data type enables us to factor the recursion out of our functions.
Consider taking the sum of a list:
sumF :: (Num a) => List a -> a
sumF (In Nil) = 0
sumF (In (Cons x xs)) = x + sumF xs
We traverse the structure with the combining function (+)
, making a recursive call at each recursive position of the data type. How can we capture this pattern?
If we're reducing a value of type List T
to get a result of type R
, then a value of type ListF T R
represents a list node with the recursive call's result already in place. So we might look for a function like
cata :: (ListF a b -> b) -> List a -> b
If we had this, we could write
sumF :: (Num a) => List a -> a
sumF = cata f where
f Nil = 0
f (Cons x xs) = x + xs
The function f
specifies one step of reduction, and the magical function cata
handles the rest.
The recursion pattern captured by cata
is very similar to foldr
on the built-in list type. In fact, we can implement foldr
with cata
:
foldrF :: (a -> b -> b) -> b -> List a -> b
foldrF f z = cata g where
g Nil = z
g (Cons x xs) = f x xs
If foldr
is good enough, then what's so special about cata
? The answer is that cata
actually has this very general type:
cata :: (Functor f) => (f a -> a) -> Fix f -> a
I'll give the one-line definition of cata
later. Writing it yourself is a fun puzzle.
We can see that we'll need this instance:
instance Functor (ListF a) where
fmap _ Nil = Nil
fmap f (Cons x xs) = Cons x (f xs)
This fmap
just takes a function and applies it to the self
-typed field, if any. It's not recursive and has no relation to the []
instance of Functor
, for which fmap = map
.
Folding natural numbers
Let's put our generic cata
to good use on another type. Every natural number is either zero, or the successor of another natural number:
data NaturalF self = Zero | Succ self
type Natural = Fix NaturalF
Again, fmap
applies a function to each self
-typed field:
instance Functor NaturalF where
fmap _ Zero = Zero
fmap f (Succ v) = Succ (f v)
We can then use cata
to define conversion to Integer
:
fromNatural :: Natural -> Integer
fromNatural = cata f where
f Zero = 0
f (Succ n) = 1 + n
as well as addition and multiplication:
add :: Natural -> Natural -> Natural
add n = cata f where
f Zero = n
f (Succ m) = In $ Succ m
mul :: Natural -> Natural -> Natural
mul n = cata f where
f Zero = In Zero
f (Succ m) = add n m
Testing it:
GHCi> fromNatural (add (toNatural 2) (mul (toNatural 3) (toNatural 5)))
17
Folding rose trees
Let's try one more structure, namely rose trees:
data TreeF a self = Node a [self]
type Tree a = Fix (TreeF a)
instance Functor (TreeF a) where
fmap f (Node v xs) = Node v (map f xs)
As before, fmap
applies a function to each self
-typed field. Since we have a list of these, we use map
.
We define some traversals:
valuesF :: Tree a -> [a]
valuesF = cata (\(Node x xs) -> x : concat xs)
depthF :: Tree a -> Integer
depthF = cata (\(Node _ xs) -> 1 + maximum (0:xs))
and test them:
GHCi> let n x = In . Node x
GHCi> let t1 = n 'a' []
GHCi> valuesF t1
"a"
GHCi> depthF t1
1
GHCi> let t2 = n 'a' [n 'b' [], n 'c' [n 'd' []], n 'e' []]
GHCi> valuesF t2
"abcde"
GHCi> depthF t2
3
The definition of cata
As promised:
cata :: (Functor f) => (f a -> a) -> Fix f -> a
cata g = g . fmap (cata g) . out
Generic programming
We usually manipulate data because we care about its meaning. But some operations only depend on the data's structure. If we have some complex nested data value, we might want to pretty-print it, or serialize it with binary, or collect all the String
s buried within. These are concepts which apply to any data type you like. Shouldn't our code be similarly adaptable?
This idea is called generic programming. Implementations in Haskell include syb, uniplate, multiplate, instant-generics, GHC's built-in support, and many other efforts.
Each of these libraries demands certain information about each data type, and in return provides some generic operations. In this view, our above development of Fix
constitutes a really tiny generics library. We require
- that you write recursive types as
Fix F
, whereF
is non-recursive, and - that you provide
instance Functor F
.
In return, you get the single generic operation cata
, which reduces a recursive structure to a "summary" value, by replacing each constructor with an arbitrary computation.
Our toy generics library is simple, but it doesn't actually save much work. Using Fix
adds noise to the code, and our Functor
instances aren't much shorter than a custom fold function for each type. Conversely, our uses of cata
aren't much simpler than explicit recursive traversals.
Practicality aside, I think it's cool to define folding "once and for all". I'm interested to learn (from you!) what else can be done with Fix
'd data or similar techniques.
For a non-trivial generics library built around similar ideas, see multirec and the accompanying paper.
Odds and ends
Nomenclature
Fix
is sometimes named Mu
, because the Greek letter μ is used to represent least fixed points.
cata
stands for catamorphism.
Some of this stuff shows up in "Evolution of a Haskell Programmer". They build a factorial function using not only catamorphisms but also zygomorphisms and paramorphisms. Tell your friends!
In TaPL there's a discussion of equirecursive versus isorecursive type systems. Haskell is in the latter category. Fix f
is isomorphic to, but not equivalent to, f (Fix f)
. The functions In
and out
provide the isomorphism. If we defined Fix
using data
instead of newtype
, the isomorphism would fail because of the extra value In ⊥
.
Isomorphisms
NaturalF
is isomorphic to Maybe
:
data NaturalF self = Zero | Succ self
data Maybe a = Nothing | Just a
The Functor
instances are also the same.
NaturalF
is also isomorphic to ListF ()
, ignoring undefined values.
GHC tricks
While the above is standard Haskell, we can pull a few more tricks by using a couple of GHC extensions.
GHC could help us reduce boilerplate by deriving Functor
instances.
With some GHC extensions, we can write Show
for Fix
:
instance (Show (f (Fix f))) => Show (Fix f) where
showsPrec p (In x) = showParen (p >= 11) (showString "In " . showsPrec 11 x)
Testing it:
GHCi> toNatural 3
In (Succ (In (Succ (In (Succ (In Zero))))))
This requires -XFlexibleContexts
-XUndecidableInstances
. The latter is required because f (Fix f)
is a syntactically larger type than Fix f
, so GHC can't convince itself that instance resolution will definitely terminate. For well-behaved arguments to Fix
, it works out. We'll have something like
-- from above
instance (Show (f (Fix f)))
=> Show (Fix f)
-- from 'deriving (Show)' on NaturalF
instance (Show self)
=> Show (NaturalF self)
which instantiates to
instance (Show (NaturalF (Fix NaturalF)))
=> Show (Fix NaturalF)
instance (Show (Fix NaturalF))
=> Show (NaturalF (Fix NaturalF))
and GHC is clever enough to resolve the mutual recursion.
Remarkably, GHC can derive this instance with -XStandaloneDeriving
:
deriving instance (Show (f (Fix f))) => Show (Fix f)
That's how I got the precedence-correct code above, via -ddump-deriv
.