Showing posts with label types. Show all posts
Showing posts with label types. Show all posts

Saturday, December 4, 2010

Type-level Fix and generic folds

This article describes a fixpoint combinator for Haskell types, with some justification of why such a thing would be useful. It's meant to be an accessible introduction to material which is already covered elsewhere.

You'll need some basic Haskell knowledge, such as how to define and use polymorphic data types. And you might want to first study the value-level fix function. An excellent introduction can be found here.

This code is standard Haskell 98, except in the "GHC tricks" section.

Recursive types

Many useful data types are recursive:

data Natural = Zero | Succ Natural     -- natural numbers
data List a = Nil | Cons a (List a) -- lists
data Tree a = Node a [Tree a] -- rose trees

In each case, the type declared on the left-hand side of "=" is mentioned on the right-hand side. This is fine, but just for fun let's see if there's another way.

We can imagine a language with a keyword "self", which refers to the type being defined:

data Natural = Zero | Succ self
data List a = Nil | Cons a self
data Tree a = Node a [self]

Haskell doesn't have this keyword, so we'll make self into a type parameter:

data NaturalF self = Zero | Succ self
data ListF a self = Nil | Cons a self
data TreeF a self = Node a [self]

We've changed the names because these aren't the types we wanted. They're non-recursive and each has an extra parameter.

To build the recursive type Natural using the non-recursive NaturalF, we need to "tie the knot":

type Natural = NaturalF (NaturalF (NaturalF (NaturalF ...)))

i.e.

type Natural = NaturalF Natural

But this is disallowed:

foo.hs:3:0:
    Cycle in type synonym declarations:
      foo.hs:3:0-30: type Natural = NaturalF Natural

Since type defines a transparent synonym, this would generate an infinitely-large type-expression. Haskell disallows those. We need to hide the recursion behind a data constructor:

newtype Natural = MkNat (NaturalF Natural)

When we use this type, the "wrapper" constructor MkNat alternates with the constructors of NaturalF:

toNatural :: Integer -> Natural
toNatural 0 = MkNat Zero
toNatural n = MkNat . Succ $ toNatural (n-1)

Type-level Fix

So far this is just pointless complexity. We've replaced one type:

data Natural = Zero | Succ Natural

with two types:

data    NaturalF self = Zero | Succ self
newtype Natural = MkNat (NaturalF Natural)

Notice that the definition of Natural doesn't depend on the structure of natural numbers. Let's rename Natural to Fix and MkNat to In:

newtype Fix   = In (NaturalF Fix)

Then we'll abstract out NaturalF as a type parameter f:

newtype Fix f = In (f (Fix f))

Now Natural is just a synonym:

type Natural = Fix NaturalF

When we create values of type Natural, we still have "wrapper" constructors:

toNatural :: Integer -> Natural
toNatural 0 = In Zero
toNatural n = In . Succ $ toNatural (n-1)

Properties of Fix

Recall that the value-level fix function can be defined as

fix f = f (fix f)

This looks a lot like our type

newtype Fix f = In (f (Fix f))

The similarity is reflected at the type and kind levels:

GHCi> :t fix
fix :: (a -> a) -> a
GHCi> :k Fix
Fix :: (* -> *) -> *

fix is a function which takes another function. Fix is a type constructor which takes another type constructor.

The types Fix f and f (Fix f) are isomorphic, and we'll want to convert between them. In one direction we'll use the data constructor

In :: f (Fix f) -> Fix f

and the other direction is easily defined as

out :: Fix f -> f (Fix f)
out (In x) = x

Folding lists

So why bother writing types this way? Well, we usually don't. But factoring the recursion out of our data type enables us to factor the recursion out of our functions.

Consider taking the sum of a list:

sumF :: (Num a) => List a -> a
sumF (In Nil) = 0
sumF (In (Cons x xs)) = x + sumF xs

We traverse the structure with the combining function (+), making a recursive call at each recursive position of the data type. How can we capture this pattern?

If we're reducing a value of type List T to get a result of type R, then a value of type ListF T R represents a list node with the recursive call's result already in place. So we might look for a function like

cata :: (ListF a b -> b) -> List a -> b

If we had this, we could write

sumF :: (Num a) => List a -> a
sumF = cata f where
f Nil = 0
f (Cons x xs) = x + xs

The function f specifies one step of reduction, and the magical function cata handles the rest.

The recursion pattern captured by cata is very similar to foldr on the built-in list type. In fact, we can implement foldr with cata:

foldrF :: (a -> b -> b) -> b -> List a -> b
foldrF f z = cata g where
g Nil = z
g (Cons x xs) = f x xs

If foldr is good enough, then what's so special about cata? The answer is that cata actually has this very general type:

cata :: (Functor f) => (f a -> a) -> Fix f -> a

I'll give the one-line definition of cata later. Writing it yourself is a fun puzzle.

We can see that we'll need this instance:

instance Functor (ListF a) where
fmap _ Nil = Nil
fmap f (Cons x xs) = Cons x (f xs)

This fmap just takes a function and applies it to the self-typed field, if any. It's not recursive and has no relation to the [] instance of Functor, for which fmap = map.

Folding natural numbers

Let's put our generic cata to good use on another type. Every natural number is either zero, or the successor of another natural number:

data NaturalF self = Zero | Succ self
type Natural = Fix NaturalF

Again, fmap applies a function to each self-typed field:

instance Functor NaturalF where
fmap _ Zero = Zero
fmap f (Succ v) = Succ (f v)

We can then use cata to define conversion to Integer:

fromNatural :: Natural -> Integer
fromNatural = cata f where
f Zero = 0
f (Succ n) = 1 + n

as well as addition and multiplication:

add :: Natural -> Natural -> Natural
add n = cata f where
f Zero = n
f (Succ m) = In $ Succ m

mul :: Natural -> Natural -> Natural
mul n = cata f where
f Zero = In Zero
f (Succ m) = add n m

Testing it:

GHCi> fromNatural (add (toNatural 2) (mul (toNatural 3) (toNatural 5)))
17

Folding rose trees

Let's try one more structure, namely rose trees:

data TreeF a self = Node a [self]
type Tree a = Fix (TreeF a)

instance Functor (TreeF a) where
fmap f (Node v xs) = Node v (map f xs)

As before, fmap applies a function to each self-typed field. Since we have a list of these, we use map.

We define some traversals:

valuesF :: Tree a -> [a]
valuesF = cata (\(Node x xs) -> x : concat xs)

depthF :: Tree a -> Integer
depthF = cata (\(Node _ xs) -> 1 + maximum (0:xs))

and test them:

GHCi> let n x = In . Node x
GHCi> let t1 = n 'a' []
GHCi> valuesF t1
"a"
GHCi> depthF t1
1
GHCi> let t2 = n 'a' [n 'b' [], n 'c' [n 'd' []], n 'e' []]
GHCi> valuesF t2
"abcde"
GHCi> depthF t2
3

The definition of cata

As promised:

cata :: (Functor f) => (f a -> a) -> Fix f -> a
cata g = g . fmap (cata g) . out

Generic programming

We usually manipulate data because we care about its meaning. But some operations only depend on the data's structure. If we have some complex nested data value, we might want to pretty-print it, or serialize it with binary, or collect all the Strings buried within. These are concepts which apply to any data type you like. Shouldn't our code be similarly adaptable?

This idea is called generic programming. Implementations in Haskell include syb, uniplate, multiplate, instant-generics, GHC's built-in support, and many other efforts.

Each of these libraries demands certain information about each data type, and in return provides some generic operations. In this view, our above development of Fix constitutes a really tiny generics library. We require

  • that you write recursive types as Fix F, where F is non-recursive, and
  • that you provide instance Functor F.

In return, you get the single generic operation cata, which reduces a recursive structure to a "summary" value, by replacing each constructor with an arbitrary computation.

Our toy generics library is simple, but it doesn't actually save much work. Using Fix adds noise to the code, and our Functor instances aren't much shorter than a custom fold function for each type. Conversely, our uses of cata aren't much simpler than explicit recursive traversals.

Practicality aside, I think it's cool to define folding "once and for all". I'm interested to learn (from you!) what else can be done with Fix'd data or similar techniques.

For a non-trivial generics library built around similar ideas, see multirec and the accompanying paper.

Odds and ends

Nomenclature

Fix is sometimes named Mu, because the Greek letter μ is used to represent least fixed points.

cata stands for catamorphism.

Some of this stuff shows up in "Evolution of a Haskell Programmer". They build a factorial function using not only catamorphisms but also zygomorphisms and paramorphisms. Tell your friends!

In TaPL there's a discussion of equirecursive versus isorecursive type systems. Haskell is in the latter category. Fix f is isomorphic to, but not equivalent to, f (Fix f). The functions In and out provide the isomorphism. If we defined Fix using data instead of newtype, the isomorphism would fail because of the extra value In ⊥.

Isomorphisms

NaturalF is isomorphic to Maybe:

data NaturalF self = Zero    | Succ self
data Maybe a = Nothing | Just a

The Functor instances are also the same.

NaturalF is also isomorphic to ListF (), ignoring undefined values.

GHC tricks

While the above is standard Haskell, we can pull a few more tricks by using a couple of GHC extensions.

GHC could help us reduce boilerplate by deriving Functor instances.

With some GHC extensions, we can write Show for Fix:

instance (Show (f (Fix f))) => Show (Fix f) where
showsPrec p (In x) = showParen (p >= 11) (showString "In " . showsPrec 11 x)

Testing it:

GHCi> toNatural 3
In (Succ (In (Succ (In (Succ (In Zero))))))

This requires -XFlexibleContexts -XUndecidableInstances. The latter is required because f (Fix f) is a syntactically larger type than Fix f, so GHC can't convince itself that instance resolution will definitely terminate. For well-behaved arguments to Fix, it works out. We'll have something like

-- from above
instance (Show (f (Fix f)))
=> Show (Fix f)

-- from 'deriving (Show)' on NaturalF
instance (Show self)
=> Show (NaturalF self)

which instantiates to

instance (Show (NaturalF (Fix NaturalF)))
=> Show (Fix NaturalF)

instance (Show (Fix NaturalF))
=> Show (NaturalF (Fix NaturalF))

and GHC is clever enough to resolve the mutual recursion.

Remarkably, GHC can derive this instance with -XStandaloneDeriving:

deriving instance (Show (f (Fix f))) => Show (Fix f)

That's how I got the precedence-correct code above, via -ddump-deriv.

Tuesday, September 21, 2010

Higher-rank type constraints

I recently encountered a Haskell function which apparently requires "higher-rank" type constraints. I worked around the issue with some type trickery. I'd like to share the problem and the workaround, and especially get feedback on whether there's a better solution.

I did say type trickery:

{-# LANGUAGE
GADTs
, FlexibleContexts
, RankNTypes
, ScopedTypeVariables #-}

Suppose we're implementing an in-place sorting algorithm. We'll use the ST monad, so that we can expose a pure interface. Assuming that we only want to sort primitive types, we will use the unboxed STUArray type, which should improve space and time usage. Actually, it's totally unwarranted premature optimization for the sake of the problem. :)

import Control.Monad.ST
import Data.Array.ST

The monadic part of this code is simple enough. I've left out the part where we actually, you know, sort the list. Consult your bedside copy of CLRS.

sortM
:: forall a s.
(Ord a, MArray (STUArray s) a (ST s))
=> [a]
-> ST s [a]
sortM xs = do
arr <- newListArray (1, length xs) xs
:: ST s (STUArray s Int a)
-- do some in-place sorting here
getElems arr

Note that sortM has a constraint MArray (STUArray s) a (ST s). All we're really saying is that a is a type primitive enough that STUArray can unbox it. The rest of the constraint just connects the concrete world of STUArray with the generic MArray interface.

Note also that we give an explicit type on our call to newListArray. Otherwise, there's nothing to constrain what sort of array we get, and GHC will complain of ambiguity. Furthermore, we need to unify this a and s with the a and s from the signature on sortM. We do that using the ScopedTypeVariables extension and the explicit forall. (If you turn on UnicodeSyntax you can write it as ; how cool is that?)

Good enough so far. Now, the point of using ST was to expose a pure interface, right?

sortP_1 :: (Ord a) => [a] -> [a]
sortP_1 xs = runST (sortM xs)

Suddenly GHC is very unhappy:

    Could not deduce (MArray (STUArray s) a (ST s)) from the context ()
      arising from a use of `sortM' at code.hs:21:20-27
    Possible fix:
      add (MArray (STUArray s) a (ST s)) to the context of
        the polymorphic type `forall s. ST s a'
      or add an instance declaration for (MArray (STUArray s) a (ST s))
    In the first argument of `runST', namely `(sortM xs)'
    In the expression: runST (sortM xs)
    In the definition of `sortP_1': sortP_1 xs = runST (sortM xs)

Okay, fair enough. We've given no reason for GHC to believe that a is an STUArray-compatible type. Let's copy over the constraint from sortM:

sortP_2
:: (Ord a, MArray (STUArray s) a (ST s))
=> [a] -> [a]
sortP_2 xs = runST (sortM xs)

Still no good:

    Could not deduce (MArray (STUArray s1) a (ST s1))
      from the context ()
      arising from a use of `sortM' at code.hs:27:20-27
    Possible fix:
      add (MArray (STUArray s1) a (ST s1)) to the context of
        the polymorphic type `forall s. ST s a'
      or add an instance declaration for (MArray (STUArray s1) a (ST s1))
    In the first argument of `runST', namely `(sortM xs)'
    In the expression: runST (sortM xs)
    In the definition of `sortP_2': sortP_2 xs = runST (sortM xs)

Note that GHC gives an error about the variable s1, not s. This is a valuable hint. Our caller (notionally) chooses one s when she provides evidence to satisfy the constraint MArray (STUArray s) a (ST s). Then runST (notionally) gets to choose a different s because of its higher-rank type.

What to do? It's useless for our caller to pick some particular s and provide evidence. We need that evidence for all s:

sortP_3
:: (Ord a, forall s. MArray (STUArray s) a (ST s))
=> [a] -> [a]
sortP_3 xs = runST (sortM xs)

Unfortunately, GHC does not support this "higher-rank constraint":

code.hs:32:13: malformed class assertion

Well... the -> of functions and the => of constraints are not as different as they may appear. A Haskell compiler can implement type classes by translating constraints to explicit "dictionary" arguments. Let's do the same thing ourselves:

data Evidence s e where
Evidence
:: (MArray (STUArray s) e (ST s))
=> Evidence s e

We don't quite need to reproduce the MArray methods in our "dictionary". (That's good, because some of them are hidden. A strange class indeed.) It's enough to write a GADT constructor with this constrained type. The constructor will implicitly capture the constraint evidence (such as a dictionary) from its call-site.

Now we need to express the universal quantification on s:

data ElemType e = ElemType (forall s. Evidence s e)

A (non-⊥) value of type ElemType e is evidence that e is a suitable element type for STUArray. That's all we wanted the caller to say in the first place!

Now we can pass the appropriate evidence:

sortP_4
:: forall a.
(Ord a)
=> ElemType a
-> [a] -> [a]
sortP_4 (ElemType Evidence) xs = runST (sortM xs)

Not quite:

    Couldn't match expected type `forall s. Evidence s a'
           against inferred type `Evidence s e'
    In the pattern: Evidence
    In the pattern: ElemType Evidence
    In the definition of `sortP_4':
        sortP_4 (ElemType Evidence) xs = runST (sortM xs)

GHC still wants some coaxing to match up this s with the other s. We push the pattern-matching of Evidence inside runST:

sortP_5
:: forall a.
(Ord a)
=> ElemType a
-> [a] -> [a]
sortP_5 (ElemType e) xs = runST (f e) where
f :: Evidence s a -> ST s [a]
f Evidence = sortM xs

And...

[1 of 1] Compiling Main             ( code.hs, interpreted )
Ok, modules loaded: Main.
*Main>

A glorious sight! Then, a moment of concern: is sortP_5 actually usable? Does ElemType have non-⊥ values? Let's check:

*Main> sortP_5 (ElemType Evidence) ([3,4] :: [Int])
[3,4]

Nice. It's sorted and everything. :) As expected, this works only for unboxable types:

*Main> sortP_5 (ElemType Evidence) ["foo","bar"]

<interactive>:1:18:
    Could not deduce (MArray (STUArray s) [Char] (ST s))
      from the context ()
      arising from a use of `Evidence' at <interactive>:1:18-25

As usual, this interesting exercise came from a question on the #haskell IRC channel. That's why it's one of my favorite places, even if they do specialize in making GHC cry.