Sunday, May 3, 2015

Modeling garbage collectors with Alloy: part 1

Formal methods for software verification are usually seen as a high-cost tool that you would only use on the most critical systems, and only after extensive informal verification. The Alloy project aims to be something completely different: a lightweight tool you can use at any stage of everyday software development. With just a few lines of code, you can build a simple model to explore design issues and corner cases, even before you've started writing the implementation. You can gradually make the model more detailed as your requirements and implementation get more complex. After a system is deployed, you can keep the model around to evaluate future changes at low cost.

Sounds great, doesn't it? I have only a tiny bit of prior experience with Alloy and I wanted to try it out on something more substantial. In this article we'll build a simple model of a garbage collector, visualize its behavior, and fix some problems. This is a warm-up for exploring more complex GC algorithms, which will be the subject of future articles.

I won't describe the Alloy syntax in full detail, but you should be able to follow along if you have some background in programming and logic. See also the Alloy documentation and especially the book Software Abstractions: Logic, Language, and Analysis by Daniel Jackson, which is a very practical and accessible introduction to Alloy. It's a highly recommended read for any software developer.

You can download Alloy as a self-contained Java executable, which can do analysis and visualization and includes an editor for Alloy code.

The model

We will start like so:

open util/ordering [State]

sig Object { }
one sig Root extends Object { }

sig State {
    pointers: Object -> set Object,
    collected: set Object,

The garbage-collected heap consists of Objects, each of which can point to any number of other Objects (including itself). There is a distinguished object Root which represents everything that's accessible without going through the heap, such as global variables and the function call stack. We also track which objects have already been garbage-collected. In a real implementation these would be candidates for re-use; in our model they stick around so that we can detect use-after-free.

The open statement invokes a library module to provide a total ordering on States, which we will interpret as the progression of time. More on this later.


In the code that follows, it may look like Alloy has lots of different data types, overloading operators with total abandon. In fact, all these behaviors arise from an exceptionally simple data model:

Every value is a relation; that is, a set of tuples of the same non-zero length.

When each tuple has length 1, we can view the relation as a set. When each tuple has length 2, we can view it as a binary relation and possibly as a function. And a singleton set is viewed as a single atom or tuple.

Since everything in Alloy is a relation, each operator has a single definition in terms of relations. For example, the operators . and [] are syntax for a flavor of relational join. If you think of the underlying relations as a database, then Alloy's clever syntax amounts to an object-relational mapping that is at once very simple and very powerful. Depending on context, these joins can look like field access, function calls, or data structure lookups, but they are all described by the same underlying framework.

The elements of the tuples in a relation are atoms, which are indivisible and have no meaning individually. Their meaning comes entirely from the relations and properties we define. Ultimately, atoms all live in the same universe, but Alloy gives "warnings" when the type system implied by the sig declarations can prove that an expression is always the empty relation.

Here are the relations implied by our GC model, as tuple sets along with their types:

Object: {Object} = {O1, O2, ..., Om}
Root:   {Root}   = {Root}
State:  {State}  = {S1, S2, ..., Sn}

pointers:  {(State, Object, Object)}
collected: {(State, Object)}

first: {State} = {S1}
last:  {State} = {Sn}
next:  {(State, State)} = {(S1, S2), (S2, S3), ..., (S(n-1), Sn)}

The last three relations come from the util/ordering library. Note that a sig implicitly creates some atoms.


The live objects are everything reachable from the root:

fun live(s: State): set Object {

*(s.pointers) constructs the reflexive, transitive closure of the binary relation s.pointers; that is, the set of objects reachable from each object.

Of course the GC is only part of a system; there's also the code that actually uses these objects, which in GC terminology is called the mutator. We can describe the action of each part as a predicate relating "before" and "after" states.

pred mutate(s, t: State) {
    t.collected = s.collected
    t.pointers != s.pointers
    all a: Object - |
        t.pointers[a] = s.pointers[a]

pred gc(s, t: State) {
    t.pointers = s.pointers
    t.collected = s.collected + (Object -
    some t.collected - s.collected

The mutator cannot collect garbage, but it can change the pointers of any live object. The GC doesn't touch the pointers, but it collects any dead object. In both cases we require that something changes in the heap.

It's time to state the overall facts of our model:

fact {
    no first.collected
    first.pointers = Root -> (Object - Root)
    all s: State - last |
        let t = |
        mutate[s, t] or gc[s, t]

This says that in the initial state, no object has been collected, and every object is in the root set except Root itself. This means we don't have to model allocation as well. Each state except the last must be followed by a mutator step or a GC step.

The syntax all x: e | P says that the property P must hold for every tuple x in e. Alloy supports a variety of quantifiers like this.

Interacting with Alloy

The development above looks nice and tidy — I hope — but in reality, it took a fair bit of messing around to get to this point. Alloy provides a highly interactive development experience. At any time, you can visualize your model as a collection of concrete examples. Let's do that now by adding these commands:

pred Show {}
run Show for 5

Now we select this predicate from the "Execute" menu, then click "Show". The visualizer provides many options to customise the display of each atom and relation. The config that I made for this project is "projected over State", which means you see a graph of the heap at one moment in time, with forward/back buttons to reach the other States.

After clicking around a bit, you may notice some oddities:

Diagram of a heap with an object pointing to the root

The root isn't a heap object; it represents all of the pointers that are reachable without accessing the heap. So it's meaningless for an object to point to the root. We can exclude these cases from the model easily enough:

fact {
    all s: State | no s.pointers.Root

(This can also be done more concisely as part of the original sig.)

Now we're ready to check the essential safety property of a garbage collector:

assert no_dangling {
    all s: State | no (s.collected &

check no_dangling for 5 Object, 10 State

And Alloy says:

Executing "Check no_dangling for 5 Object, 10 State"
   8338 vars. 314 primary vars. 17198 clauses. 40ms.
   Counterexample found. Assertion is invalid. 14ms.

Clicking "Counterexample" brings up the visualization:

Diagram of four states. A single heap object is unrooted, then collected, but then the root grows a new pointer to it!

Whoops, we forgot to say that only pointers to live objects can be stored! We can fix this by modifying the mutate predicate:

pred mutate(s, t: State) {
    t.collected = s.collected
    t.pointers != s.pointers
    all a: Object - |
        t.pointers[a] = s.pointers[a]

    // new requirement!
    all a: |
        t.pointers[a] in

With the result:

Executing "Check no_dangling for 5 Object, 10 State"
   8617 vars. 314 primary vars. 18207 clauses. 57ms.
   No counterexample found. Assertion may be valid. 343ms.

SAT solvers and bounded model checking

"May be" valid? Fortunately this has a specific meaning. We asked Alloy to look for counterexamples involving at most 5 objects and 10 time steps. This bounds the search for counterexamples, but it's still vastly more than we could ever check by exhaustive brute force search. (See where it says "8617 vars"? Try raising 2 to that power.) Rather, Alloy turns the bounded model into a Boolean formula, and feeds it to a SAT solver.

This all hinges on one of the weirdest things about computing in the 21st century. In complexity theory, SAT (along with many equivalents) is the prototypical "hardest problem" in NP. Why do we intentionally convert our problem into an instance of this "hardest problem"? I guess for me it illustrates a few things:

  • The huge gulf between worst-case complexity (the subject of classes like NP) and average or "typical" cases that we encounter in the real world. For more on this, check out Impagliazzo's "Five Worlds" paper.

  • The fact that real-world difficulty involves a coordination game. SAT solvers got so powerful because everyone agrees SAT is the problem to solve. Standard input formats and public competitions were a key part of the amazing progress over the past decade or two.

Of course SAT solvers aren't quite omnipotent, and Alloy can quickly get overwhelmed when you scale up the size of your model. Applicability to the real world depends on the small scope hypothesis:

If an assertion is invalid, it probably has a small counterexample.

Or equivalently:

Systems that fail on large instances almost always fail on small instances with similar properties.

This is far from a sure thing, but it already underlies a lot of approaches to software testing. With Alloy we have the certainty of proof within the size bounds, so we don't have to resort to massive scale to find rare bugs. It's difficult (but not impossible!) to imagine a GC algorithm that absolutely cannot fail on fewer than 6 nodes, but is buggy for larger heaps. Implementations will often fall over at some arbitrary resource limit, but algorithms and models are more abstract.


It's not surprising that our correctness property

all s: State | no (s.collected &

holds, since it's practically a restatement of the garbage collection "algorithm":

t.collected = s.collected + (Object -

Because reachability is built into Alloy, via transitive closure, the simplest model of a garbage collector does not really describe an implementation. In the next article we'll look at incremental garbage collection, which breaks the reachability search into small units and allows the mutator to run in-between. This is highly desirable for interactive or real-time apps; it also complicates the algorithm quite a bit. We'll use Alloy to uncover some of these complications.

In the meantime, you can play around with the simple GC model and ask Alloy to visualize any scenario you like. For example, we can look at runs where the final state includes at least 5 pointers, and at least one collected object:

pred Show {
    #(last.pointers) >= 5
    some last.collected

run Show for 5

Thanks for reading! You can find the code in a GitHub repository which I'll update if/when we get around to modeling more complex GCs.


  1. I really love your article, I also love travel China, in my past 5 years I traveled China a lot, see my favorite site: China vacation packages | China city tours | Beijing Travel Service

  2. I really enjoyed the article. I am a researcher working in formal methods, mostly model based testing and model checking. It is very nice to see someone really trying to take this kind of methodology into actual software development. Did you know that there are some model checkers that would take the source code of the implementation as the model? In particular, there are some that work at LLVM IR level, so you could feed it with the real GC implementation and check for properties or bugs.