Tuesday, October 18, 2011

Safe top-level mutable variables for Haskell

I uploaded the safe-globals package for Haskell, which lets you declare top-level mutable variables like so:

{-# LANGUAGE TemplateHaskell #-}
import Data.Global
import Data.IORef

declareIORef "ref"
[t| Int |]
[e| 3 |]

main = do
readIORef ref >>= print -- prints 3
writeIORef ref 5
readIORef ref >>= print -- prints 5

This creates a module-level binding

ref :: IORef Int

which can be managed through the usual module import/export mechanism.

Why global state?

Global state is a sign of bad software design, especially in Haskell. Why would we ever need it? Suppose you're wrapping a C library which is not thread-safe. Using a (hidden!) global lock, you can expose an interface which is simple and safe. In other words, you're using global state to compensate for others using global state.1 Another use case is generating unique identifiers to speed up comparison of values. This can be done without breaking referential transparency, but you need a source of IDs which is really and truly global.

In these situations it's typical to create global variables using a hack such as

ref :: IORef Int
ref = unsafePerformIO (newIORef 3)
{-# NOINLINE ref #-}

My library is just a set of Template Haskell macros for the same hack. If global variables are seldom needed, then what good are these macros?

  • Writing out the hack each time is unsafe. I might forget the NOINLINE pragma, or subvert the type system with a polymorphic reference. The safe-globals library prevents these mistakes. I'm of the opinion — and I know it's not shared by all — that even questionable techniques should be made as safe as possible. Call it "harm reduction" if you like.

  • In ten years, if GHC 9 requires an extra pragma for safety, then safe-globals can be updated, without changing every package that uses it. If JHC's ACIO feature is ported to GHC, then safe-globals can take advantage and get rid of the hacks entirely.

But the direct impetus to write safe-globals was the appearance of the global-variables library, which drew some attention in the Haskell community. global-variables aims to solve the same problem, using a different approach with a number of drawbacks. The rest of this article outlines some of these drawbacks.

Spooky action at a distance

Among the stated features of global-variables are

Avoid having to pass references explicitly throughout the program in order to let distant parts communicate. Enable a communication by convention scheme, where e.g. different libraries may communicate without code dependencies.

This refers to the fact that two global refs with the same name string will become entangled, no matter where in a program they were declared. This is certainly a bug, not a feature. Untracked interactions between different components are the archetypal defect in software engineering.

Neither is there a clear way for a user of global-variables to opt out of this misfeature. The best you can do is augment your names with some prefix which you hope is unique — the same non-solution used by C libraries. Haskell solves namespace problems with a module system and a package system. global-variables circumvents both.

Still, suppose that you choose "communication by convention" for your library. You'll need to manually document the name and type of every ref used by this communication, since they aren't tracked by the type system. A mismatch (as from a library upgrade) will cause silent breakage. Worse, you need to tell every library user how to initialize your library's own variables, and hope that they do it correctly. When a ref is given different initializers in different declarations, the result is indeterminate.

Type clashes

A polymorphic reference, with a type like ∀ t. IORef t, breaks the type system. You can write a value of one type and then read it with another type. So it's important for global-variables to disallow polymorphic refs. The mechanism it uses is that each declaration is implicitly a family of refs, one for each monomorphic type (via Typeable).

Now consider this reasonable-looking program:

{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.Global -- from global-variables
import Data.IORef

-- Define the famous factorial function
fact :: Integer -> Integer
fact 0 = 1
fact n = n * fact (n-1)

-- Now use it through a global ref
ref = declareIORef "ref" 0
main = do
writeIORef ref (length "hello")
r <- readIORef ref
print (fact r)

This will print 1, not 120. The ref is written at type Int (the return type of length) and an implicitly different ref is read at type Integer (because of the subsequent call to fact).

You can certainly argue that top-level refs should always be declared with a monomorphic type signature. Indeed, my library enforces this. But global-variables doesn't, and can't. Making type clashes a run-time error would be a step in the right direction.


  1. A common response is that locking should be added in C code; however, concurrent programming in C is cumbersome and dangerous. It's much easier, if a bit ugly, to implement locking on the Haskell side. You could however store an MVar lock in a C global variable via StablePtr. Has anyone done this?

No comments:

Post a Comment