Introduction

This document summarises the findings from a research sprint in August 2018. The source material is an email thread.

This is not a design document; it does not describe a program that exists, nor does it express an intention to write a particular program. Instead, it explores the space of possible designs, identifying things that might be desirable and/or feasible, and naming things.

Having said that, minimising differences from the MPS is an important consideration.

Goals

The overall goal was to identify unforced design choices in the MPS, and in particular to explore those we would like to change. The motivating examples were:

Unprotect the parts of the heap that are invisible to the mutator anyway.
Do scanning work in a background thread, to take the burden off the mutator thread.
Exploit sources of reachability information other than scanning, especially time-since-mutation, the mutator's working set, and overlapping collections.
Explore alternative definitions of "generation". Be clear what user-meaningful metrics we're optimising for. Find user-meaningful tuning parameters.
Explore the "mutator is grey" design.
Mitigate some of the pathological cases of the "zone set" approximation, and reduce our reliance on placing objects in particular zones.
Identify any other cases where the implementation has diverged from the ideal design.

Prerequisites

We agreed that the following were prerequisites:

Separate concepts that are currently conflated in the MPS, e.g. colour, protection state, visibility to the mutator.
Separate concepts that are specific to the collection algorithm (e.g. work queue, generations, condemning, zone preference) from concepts that are forced on us by the problem (e.g. reachability, object age, protection, efficient representation of sets of objects).
Express data structure invariants in a more mathematical way, without reference to the collection algorithm.

Actions

We feel that concrete benefits will follow from the following actions:

Identify simplifying and generalising abstractions, and engineer them post-hoc into the code (example).
Measure behaviour in realistic cases.
Experiment with different collection algorithms. This includes the motivating examples above.

This document first defines mathematical notation.

Then it defines visibility, novelty and colour. These are completely separable ideas, each with its own supporting theorems, and their sections do not refer to each other.

Then it elaborates what the MPS might look like if it used all three ideas.

Finally, it explores some alternative generational collection heuristics, and evaluates them against some user-meaningful metrics. This part is largely orthogonal to everything else.

Notation and theory

Here are some useful unicode characters, in case it is easier to copy and paste them then to type them in another way:

Equality: ≠ ≝
Inequality: ≤ ≥ ≾
Logic: ∀ ∃ ∄ ¬ ∧ ∨ ∎
Implication: ⇐ ⇒ ⇔
Membership: ∈ ∉ ∋ ∌
Inclusion: ⊂ ⊃ ⊄ ⊅ ⊆ ⊇ ⊈ ⊉
Arithmetic: ∅ − ∩ ∪ ⋂ ⋃ ×
Arrows: ← → ↚ ↛ ⤳
Functions: λ ⁻¹ ∘
Subscripts: ₀₁₂₃₄₅₆₇₈₉ ₐₑᵢⱼₒᵣᵤᵥₓ ₊₋₌

Graph

We model the heap as a directed graph (G, →), in which G is a set and → is a binary relation on G. Elements of G are called objects and elements of → are called references. If x→y we say "x has a reference to y".

To avoid pedantic special cases in the following definitions, it is convenient to suppose that every object has a reference to itself, i.e. that → is reflexive. This might be unnecessarily conservative, but the excluded cases are not relevant.

We adopt the obvious definitions of the inverse relation ←, the negated relations ↛ and ↚, and the transitive closures →* and ←*.

Mutation

The mutator of a graph (G, →) is represented by a distinguished object m∈G. The mutator object represents all the user threads that have access to the heap. It is conceivable that one might model each thread by a separate mutator object, but we have not so far found it useful to do so.

The mutator is able to mutate the graph in its immediate vicinity. Formally we should say that this results in a new graph (G', →'), but we should also understand that the old graph (G, →) ceases to exist. Informally it is often more convenient to talk about the graph changing over time.

The collector is also able to mutate the graph, in order to reclaim the space occupied by objects that are unreachable from the mutator. If we wish to prove invariant properties of the graph, our case analysis must include reclaim operations.

We write (G, →) ⤳ (G', →') iff there is a mutation step that transforms the graph (G, →) into the graph (G', →'). The allowed mutation steps follow.

Create

Given x ∉ G, the operation create(x) models the mutator allocating memory for a new object.

G' ≝ G ∪ {x}
→' ≝ → ∪ {(m, x), (x, x)}.

Read

Given m → x → y, the operation read(x, y) models the mutator loading from x a reference to y.

G' ≝ G
→' ≝ → ∪ {(m, y)}.

Write

Given m → x and m → y, the operation write(x, y) models the mutator storing to x a reference to y.

G' ≝ G
→' ≝ → ∪ {(x, y)}.

Null

Given m → x → y, with x ≠ y, the operation null(x, y) models the mutator deleting from x a reference to y.

G' ≝ G
→' ≝ → − {(x, y)}.

Drop

In our discussion we defined drop(x) to model the mutator forgetting about the object. This definition is superfluous because it is equivalent to null(m, x).

Reclaim

Given a set W ⊆ G − {m}, let B ≝ G − W. Given also that ∄b∈B, w∈W. b→w, the operation reclaim(W) models the collector deleting the objects in W from the graph to form the graph (G', →'), where:

G' ≝ B
→' ≝ → ∩ (B×B)

Reachability

An object x ∈ G is said to be reachable if m→*x.

The purpose of a garbage collector is to find objects that are not reachable, and to reclaim them, i.e. to free up the memory that they occupy. It is not necessary to find all the unreachable objects at once. It is necessary to avoid dangling references from surviving objects to reclaimed objects, even if the surviving objects are unreachable.

Theorem: The mutation step reclaim(W) cannot be applied if W overlaps the reachable set. The operation preserves the reachable set and all references between reachable objects.

Theorem: The mutation step create(x) adds x to the reachable set. The mutation step null(x, y) may remove zero or more objects from the reachable set. Other mutation steps preserve the reachable set.

History

A history is a sequence h₀, h₁, ... of mutation steps.

Executing the history gives rise to a sequence of graphs. The initial graph (G₀, →₀) may be taken to be ({m}, {(m, m)}) where m is the mutator. Let (Gᵢ₊₁, →ᵢ₊₁) be the graph formed by applying hᵢ to (Gᵢ, →ᵢ). A history is well-formed iff all these graphs are defined. We will sometimes write (G, →) for the last graph in the sequence.

Let time(i) be a non-decreasing function interpreted as the time at which step hᵢ occurred.

Time can be any data type supporting the operations of a lattice, i.e. ≤, min and max. For simplicity, we may suppose that time is an integer. A good candidate for time is the number of times we've scanned the roots. Almost certainly we will choose a type with a total order.

Let now be a global variable indicating the current time; at least as large as any time(i).

Reference partition

A reference partition, written A↛B, of a graph (G, →), consists of two sets A, B ⊆ G such that ∄x∈A, y∈B. x→y.

Note that the sets A and B must not overlap, because → is assumed to be reflexive. Sometimes they are called the "black" and "white" sets respectively, but this is not always appropriate. Sometimes they are called the "source" and "target" sets. We abbreviate "reference partition" to RP.

RPs are our primary tool for finding a set W such that we can reclaim(W). Specifically, we aim to construct a final RP B↛W being one that satisfies G = B ∪ W, i.e. that includes all the objects. Non-final RPs are useful to represent the state of the algorithm.

Theorem: If A↛B is an RP for (G, →), then B↚A is an RP for (G, ←).

Historical note

The notation given here is new, and the definition is a slight simplification of the original. We found that the "grey" set was not useful; it can always be taken to be the rest of G.

Extended reference partition

An extended reference partition is a sequence of non-overlapping sets A₀, ..., Aₐ such that for all i A₀ ∪ ... ∪ Aᵢ₋₁ ↛ Aᵢ₊₁ ∪ ... ∪ Aₐ is an RP.

We used this definition a lot during our discussion (example). The theory of ERPs is a straightforward generalisation of that for RPs.

Continuous reference partition

A continuous reference partition a↛b, of a graph (G, →), consists of two functions a, b from G to time, such that ∄x∈G, y∈G. b(x)<a(y) ∧ x→y.

This definition is adapted from this email.

Theorem: If a↛b is a CRP for (G, →), then −b↚−a is a CRP for (G, ←).

Note that b↚a is usually not a CRP; it is necessary to negate the functions.

Query

Given a CRP a↛b and a time t, define query(a↛b, t) to be the RP A↛B where:

A ≝ {x∈G | t≤a(x)}
B ≝ {x∈G | b(x)<t}

Interpretation

In our discussion, a common wrong interpretation of a CRP is that it represents all the intermediate states during the mutation of a single RP. It does not.

The right interpretation is that a CRP represents a time-parameterised family of nested RPs. The interpretation of the time parameter can vary, but it typically identifies a root-scanning event that the RP is defined with respect to. Pertinent examples are visibility, novelty and colour, defined later in this document; there may be others.

We can recover any one RP of the family using the query function. However, we typically want to do this as late as possible, because we will see below that most operations which we might perform on the RP can more profitably be performed on the CRP instead. For example, by mutating the CRP, we can mutate all the RPs at once.

Fringe

Given a graph (G, →) and a set of objects P⊆G, define:

fringe(P, →) ≝ {y | ∃x∈P. x→y}
verge(P, →) ≝ {x | ∃y∈P. x→y}

Sugar: fringe(P) means fringe(P, →) and verge(P) means verge(P, →) where the → relation is unambiguous.

The definition of fringe differs slightly from the one in this email, and the definition of verge differs slightly from the one in this email.

Theorem: P↛(G−fringe(P)) and (G−verge(P))↛P are RPs.

Informally, we can construct an RP with any source set A or any target set B.

Theorem: ∀A, B. If A↛B is an RP then fringe(A) ∩ B = A ∩ verge(B) = ∅.

Informally, these constructions are optimal.

Of CRPs

Given a graph (G, →) and a function p from G to time, define:

fringe(p, →)(y) ≝ max {p(x) | x→y}
verge(p, →)(x) ≝ min {p(y) | x→y}

Sugar: fringe(p) means fringe(p, →) and verge(p) means verge(p, →) where the → relation is unambiguous.

Theorem: p↛fringe(p) and verge(p)↛p are CRPs.

Informally, we can construct a CRP with any source function a or any target function b.

Theorem: ∀a, b. If a↛b is a CRP then fringe(a)≤b and a≤verge(b).

Informally, these constructions are as tight as possible.

Theorem: Let A↛B ≝ query(p↛fringe(p), t). Then B = G−fringe(A).

Theorem: Let A↛B ≝ query(verge(p)↛p, t). Then A = G−verge(A).

Informally, taking the fringe or verge of a function constructs many tight RPs at once.

Unions and intersections

Given two RPs A₁↛B₁ and A₂↛B₂, define their union and intersection as follows:

A₁↛B₁ ∪ A₂↛B₂ ≝ (A₁ ∩ A₂) ↛ (B₁ ∪ B₂)
A₁↛B₁ ∩ A₂↛B₂ ≝ (A₁ ∪ A₂) ↛ (B₁ ∩ B₂)

Note that the terminology mirrors the treatment of the target set.

Theorem: the union and intersection are RPs.

Of CRPs

Given two CRPs a₁↛b₁ and a₂↛b₂, define their union and intersection as follows:

a₁↛b₁ ∪ a₂↛b₂ ≝ min(a₁, a₂) ↛ min(b₁, b₂)
a₁↛b₁ ∩ a₂↛b₂ ≝ max(a₁, a₂) ↛ max(b₁, b₂)

Here min(a₁, a₂) means the pointwise minimum, i.e. the function λx. min(a₁(x), a₂(x)).

Theorems:

The union and intersection are CRPs.
∀t. query(a₁↛b₁, t) ∪ query(a₂↛b₂, t) = query(a₁↛b₁ ∪ a₂↛b₂, t)
∀t. query(a₁↛b₁, t) ∩ query(a₂↛b₂, t) = query(a₁↛b₁ ∩ a₂↛b₂, t)

Informally, by computing unions of CRPs, we can compute unions of all the RPs at once, and similarly for intersections.

Induced graph

Let (G, →) be a graph, and let f be a function from G to some smaller set G'. Then f induces a graph (G', →') where →' ≝ {(f(x), f(y)) | x∈G, y∈G, x→y}.

There are two pertinent examples in the MPS: segments and zones. We say one segment refers to another if any object in the first segment refers to any object in the second.

There is an asymmetric version of the definition in which we have two different functions, one applied to the source objects and the other to the target objects. For example, we could make a relation that models references from segments to zones. Much of the theory generalises to this case, but for brevity I will not consider it futher.

The function f may be extended to mutation steps as follows:

f(create(x)) ≝ create(f(x))
f(read(x, y)) ≝ read(f(x), f(y))
f(write(x, y)) ≝ write(f(x), f(y))
f(null(x, y)) ≝ null(f(x), f(y))
f(reclaim(W)) ≝ reclaim(f(W))

Theorem: If a mutation step h modifies (G, →) then either there is no effect on (G', →'), or the effect is f(h), or h=reclaim(W) and f⁻¹(f(W))≠W.

Informally, we may think of the mutator as acting on segments (as opposed to objects) whenever it is convenient to do so.

There are two main motivations for using induced graphs: fast searching, and efficient representation of RPs and CRPs.

Fast searching

Suppose you want to find all the objects that point to objects in some target set. The simple way to do this is to loop over all the objects, scanning them for references into the target set. A faster way is available if you have already computed the induced segment graph. Loop over all the segments, testing for references into the target set, and if you find one, loop over the objects in the segment.

Efficient representation of RPs and CRPs.

Theorem: Let A'↛'B' be an RP on the graph (G', →') induced by f. Then A↛B is an RP on (G, →) where:

A ≝ f⁻¹(A')
B ≝ f⁻¹(B')

Theorem: Let a'↛'b' be a CRP on the graph (G', →') induced by f. Then a↛b is a CRP on (G, →) where:

a = a'∘f
b = b'∘f

Here, f⁻¹(A') means {x∈G | f(x)∈A'} and a'∘f means λx. a'(f(x)).

Informally, representing an RP or a CRP on the induced graph is sufficient to represent an RP or a CRP on the graph. Of course, not all RPs and CRPs can be exactly represented in this way.

Conservative approximation

An RP A'↛B' conservatively approximates an RP A↛B, written A'↛B' ≾ A↛B, iff A'⊆A and B'⊆B.

The motivation for this definition is that A'↛B' might be more efficiently representable than A↛B. In particular, it might be representable as an RP of an induced graph. The information lost in the approximation (if any) is traded for computational efficiency.

Theorem: If A↛B is an RP, A'⊆A and B'⊆B, then A'↛B' is also an RP (and therefore a conservative approximation).

Separation

A separation of a set P is a superset of fringe(P).

An isolation of a set P is a superset of verge(P).

These define RPs that approximate the ones defined by fringe and verge. This concept was defined in this email and used in this email.

Of CRPs

A CRP a'↛b' conservatively approximates a CRP a↛b, written a'↛b' ≾ a↛b, iff ∀x. a'(x)≤a(x) ∧ b'(x)≥b(x).

Theorem: if a↛b is a CRP and ∀x. a'(x)≤a(x) ∧ b'(x)≥b(x) then a'↛b' is also a CRP (and therefore a conservative approximation).

Theorem: a'↛b' ≾ a↛b iff ∀t. query(a'↛b', t) ≾ query(a↛b, t).

Informally, by computing approximations of CRPs, we can compute approximations of many RPs at once, with exactly the same precision.

Summary

Given an RP A↛B on a graph (G, →) and a function f that induces a graph (G', →'), define summary(A↛B, f) to be A'↛'B' where:

A' ≝ {x'∈G' | f⁻¹(x')⊆A}
B' ≝ {x'∈G' | f⁻¹(x')⊆B}

Theorem: A'↛'B' is an RP on (G', →').

Theorem: f⁻¹(A)↛f⁻¹(B) ≾ A↛B.

Theorem: For all RPs A'↛'B' such that f⁻¹(A')↛f⁻¹(B') ≾ A↛B we find A'↛'B' ≾ summary(A↛B, f).

Informally, summary(A↛B, f) is the best way of approximating A↛B by an RP of the induced graph.

Of CRPs

Given a CRP a↛b on a graph (G, →) and a function f that induces a graph (G', →'), define summary(a↛b, f) to be a'↛b' where:

a'(x') ≝ min {a(x) | x∈f⁻¹(x')}
b'(x') ≝ max {b(x) | x∈f⁻¹(x')}

Theorem: a'↛'b' is a CRP on (G', →').

Theorem: a'∘f↛b'∘f ≾ a↛b.

Theorem: For all CRPs a'↛'b' such that a'∘f↛b'∘f ≾ a↛b we find a'↛'b' ≾ summary(a↛b, f).

Informally, summary(a↛b, f) is the best way of approximating a↛b by a CRP of the induced graph.

Theorem: ∀t. query(summary(a↛b, f), t) = summary(query(a↛b, t), f)

Informally, by computing summaries of CRPs, we can compute sumaries of many RPs at once, with exactly the same precision.

Opportunistic scanning

TODO: Move this section. It does not belong here.

Scanning an object x means finding all all objects y such that x→y. Scanning a segment means scanning all the objects in the segment.

All segments get scanned occasionally, for a variety of reasons, including read- and write-barrier hits and to make progress on collection. Scanning a segment is a moderately expensive operation, because it touches quite a bit of memory.

It makes sense to try to avoid scanning the same segment for different reasons at different times. If ever we endure the expense of scanning a segment for any reason, we should make as much use of the information as we can. At least, we should refresh the summary for the segment of every RP and CRP that we are maintaining.

In MPS

Currently the only concept in the MPS that is similar to a summary is a "zone set" or "ref set". Some work has been done to generalise the concept.

Novelty

The restriction to the allowed mutation steps naturally gives rise to a CRP on the reversed [graph] (G, ←), which we call "novelty". Intuitively it models the creation and modification times of the objects.

Abstract definition

Let (G₀, →₀), (G₁, →₁), ..., (G, →) be a sequence of graphs with mutator m, arising from executing a history h₀, h₁, ... .

Creation time

Define:

newtime(x) ≝ max {time(i) | hᵢ=create(x)}

Except newtime(m) ≝ now

Theorem: If x→ᵢy then newtime(y)≤now.

Define novelty to be newtime↚fringe(newtime, ←).

Theorem: novelty is a CRP.

FIXME: I would like to redefine novelty ≝ newtime↚oldtime, as that is the meaning requried by the rest of the document.

Modification time

The time of the most recent write to an object is an approximation of fringe(newtime, ←).

Define:

writetime(x) ≝ max {time(i) | x=m or hᵢ=create(x) or hᵢ=write(x, y) or hᵢ=null(x, y)}

Theorem: ∀x. fringe(newtime, ←)(x)≤writetime(x).

Theorem: newtime↚writetime ≾ novelty

In our discussion, we discovered the newtime↚writetime CRP first, but novelty is simpler and more discriminating.

Practical approximation

Note that as the mutator runs, and the history lengthens, and (G, →) is mutated, the fringe(newtime, ←) function is also implicitly mutated. In particular, any write(x, y) or null(x, y) step can change fringe(newtime, ←)(x). In contrast, newtime is constant for each object as long as the object exists, and so requires no maintenance.

A decent approximation to fringe(newtime, ←) can be maintained using a combination of a write barrier and scanning. Formally, we will maintain (by incremental mutation) a function oldtime which is an upper bound on fringe(newtime, ←).

Invariant: ∀x. fringe(newtime, ←)(x)≤oldtime(x)≤writetime(x).

Write barrier

Invariant: Write-protect every object x∈G for which oldtime(x)<now.

Note: read-protection is unnecessary for tracking novelty.

Each time we increase now, we establish the invariant by taking the following steps:

Set newtime(m) to now.
Write-protect all objects.
Simulate a write-barrier hit on m itself.

Each time we get a write-barrier hit on an object x, we can preserve the invariant by taking the following actions:

Set oldtime(x) to now.
Unprotect x.

Theorem: newtime↛oldtime ≾ novelty

The principal source of approximation in this algorithm is that when the mutator executes write(x, y) we pay no attention to y. We just use the conservative bound newtime(y)≤now. We even do this for null(x, y) operations, because we cannot distinguish them from write operations. As long as now remains unchanged, subsequent write(x, ?) and null(x, ?) operations would not change oldtime(x), so it is sound to unprotect x.

Scanning

Whenever we feel like it, we can select any object x, scan it, look up newtime(y) for all y such that x→y, compute the actual fringe(newtime, ←)(x), and set oldtime(x) to the result. This might reduce oldtime(x). If oldtime(x)<now, we must write-protect x.

This is an example of opportunistic scanning; whenever we scan x for any reason we might as well do this too. All objects get scanned occasionally, so oldtime will usually be tight for objects that have not been modified recently.

Theorem: If we never scan, then oldtime=writetime.

Segments

On real CPUs, the granularity of hardware memory protection is larger than individual objects. In practice, we have to write-protect any object that shares a segment with an object we wish to protect. In effect, this means that we define the novelty CRP on the induced segment graph. This is another form of approximation. On the plus side, it yields a CRP that can be efficiently represented.

Note that computing novelty on the segment graph means we have to approximate newtime as well as oldtime. The newtime of a segment should be set to the earliest newtime of any object in the segment, as for a summary.

Use

The most important application of novelty is as a heuristic for guessing whether objects are reachable. Traditional generational garbage collection may be seen as the use of newtime thresholds to partition the heap into generations which are heuristically expected to have different mortality rates. oldtime gives additional information which might also prove useful.

Novelty, like any CRP, can be used to disambiguate ambiguous references. Intuitively, the argument is that if an object x contains an ambiguous reference that appears to point to an object y such that oldtime(x)<newtime(y), then it is not a reference.

In the MPS

The closest concept in the MPS to newtime is the generation that a segment belongs to.

The closest concept in the MPS to oldtime is the zone set of a segment. This is an approximation which relies on maintaining a correlation between the generation of an object and its zone.

Both concepts have proved extremely useful.

job003764 is to implement something similar to oldtime. DL has made at least three attempts to do so. He says the attempts were not abandoned for any technical reason, and that he still wants to do this work.

Visibility

The restriction to the allowed mutation steps naturally gives rise to a CRP, which we call "visibility". Intuitively it models the working set of the mutator, which may be roughly defined as the set of objects that the mutator has accessed recently.

Abstract definition.

Let (G₀, →₀), (G₁, →₁), ..., (G, →) be a sequence of graphs with mutator m, arising from executing a history h₀, h₁, ... .

Visible set

Given a time t, let i ≝ min {j | time(j)=t}, and define:

roots(t) ≝ {x | m→ᵢx}
visible(t) ≝ {x | ∃j≥i. m→ⱼx}.
created(t) ≝ {y | ∃j≥i. hⱼ=create(y)}
readaddr(t) ≝ {x | ∃j≥i, y. hⱼ=read(x, y)}
readdata(t) ≝ {y | ∃j≥i, x. hⱼ=read(x, y)}

Informally:

roots(t) is the set of objects that the mutator refered to at time t.
visible(t) is the set of objects that the mutator refered to since time t.
created(t) is the set of objects created since time t.
readaddr(t) is the set of objects read from since time t.
readdata(t) is the set of references acquired by reading since time t.

Theorem: visible(t) = roots(t) ∪ created(t) ∪ readdata(t)

Informally, the only way the mutator could have refered to an object since time t is if it already had it, it created it, or it read it out of another object.

Hidden set

Intuitively, the mutator can only learn about objects that are referred to by objects it reads from. However, perhaps counterintuitively, readdata(t) is not necessarily a subset of fringe(readaddr(t)).

For example, suppose at some earlier time m→x→y, and that the mutator executes read(x, y) then null(x, y), so that now m→x and m→y but not x→y. Now the readaddr set for the earlier time is {x} and the current fringe of that set is also {x}, but the readdata set for the earlier time is {y}.

Also perhaps counterintuitively, fringe(readaddr(t)) does not increase monotonically with mutator activity, even though both readaddr(t) and readdata(t) do, and even though fringe(readaddr(t)) is monotonic in t.

The problem is that the → relation has changed since the read operation. When the mutator executes null operations, the fringe shrinks. This motivates the following definitions:

revealed(t) ≝ {z | ∃j≥i, x, y. hⱼ=read(x, y) and x→ⱼz}
hidden(t) ≝ G − roots(t) − created(t) − revealed(t)

Theorem: readdata(t) ⊆ revealed(t)

Theorem: hidden(t) ∩ visible(t) = ∅.

Informally, (the complement of) the hidden set is a bound on the visible set.

As an RP

Define:

read(t) ≝ {m} ∪ created(t) ∪ readaddr(t)

Theorem: read(t) ⊆ visible(t).

Theorem: read(t)↛hidden(t) is an RP.

Informally, if an object x in read(t) refers to some object y, then at least one of the following must hold:

x is m and y is in root(t).
x is in created(t) and y=x.
x in in readaddr(t) and y is in revealed(t)
The mutator executed write(x, y) since time t, and y is in visible(t).

As a CRP

Define:

readtime(x) ≝ max {time(i) | x=m or hᵢ=create(x) or ∃y. hᵢ=read(x, y)}
visibletime(x) ≝ max {time(i) | m→ᵢx}
hidetime(x) ≝ max {time(i) | m→ᵢx or ∃y, z. hᵢ=read(y, z) and y→ᵢx}

Theorem: ∀x. readtime(x) ≤ visibletime(x) ≤ hidetime(x)

Theorems:

readtime(x) = max {t | x∈read(t)}
visibletime(x) = max {t | x∈visible(t)}
hidetime(x) = max {t | x∉hidden(t)}

Define visibility to be readtime↛hidetime.

Theorem: visibility is a CRP.

Practical approximation

Note that as the mutator runs, and the history lengthens, and (G, →) is mutated, the functions readtime and hidetime are also implicitly mutated. In particular, any read(x, y) step can change readtime(x) and hidetime(z) for any z such that x→z, and a drop(x) step can change hidetime(x).

Short of emulating the CPU, there is no practical way to act on every read step, and noticing drop steps is even harder. Accurate readtime and hidetime functions can nonetheless be maintained with no additional approximation, using a combination of a read barrier and scanning.

Read barrier

Invariant: read-protect every object x∈G for which readtime(x)<hidetime(x)=now.

Note: Write-protection is not strictly necessary for tracking visibility, but it is necessary for pre-scanning, described below. In any case, it might not be possible to read-protect memory without write-protecting it.

Each time we increase "now" we can establish the invariant by taking the following actions:

Set hidetime(m) to now.
Simulate a read barrier hit on m itself.

Each time the mutator reads from a read-protected object x, the hardware informs us of a read barrier hit, and we have a chance to run some code before the read step occurs. We can preserve the invariant by taking the following actions:

Set hidetime(y) to now for all objects y such that x→y.
Of those objects, read-protect those for which readtime(y)<now.
Set readtime(x) to now and unprotect it.

In any period in which now remains unchanged, only the first read(x, ?) can affect readtime(x) or hidetime(z) for x→z. We do not care if the mutator reads from the same object more than once, so it is sound to unprotect x.

Segments

On real CPUs, the granularity of hardware memory protection is larger than individual objects. In the MPS, the granularity is segments. Therefore, to maintain the read-barrier invariant, we must protect all objects that share a segment with any object that we wish to protect.

In effect, this means that we define the visibility CRP on the induced segment graph. This introduces further approximation. On the plus side, it yields a CRP that can be efficiently represented.

Pre-scanning

Note that it is sound and preferable to find all segments y such that x→y before read-protecting a segment x, and to cache the result until the read barrier hit on x, if any. The email which introduced this idea has diagrams. The extra state is stored only for the segments coloured red.

This is an example of opportunistic scanning; whenever we scan x for any reason we might as well do this too.

Write barrier

To protect the cached information from pre-scanning, we must write-protect every segment that we read-protect.

Whenever we get a write-barrier hit, we pretend we had a read-barrier hit first, to make use of the cached information before it goes stale.

Use

The most important application of visibility is to separate the parts of the heap that the mutator can access from the parts that the collector can access. For this application, an RP is sufficient. The practical benefit is that no memory protection of any kind is needed for objects x such that hidetime(x)<now. This was succinctly stated in this email.

A second possible application of visibility, and indeed any CRP, is to disambiguate ambiguous references. Intuitively, the argument is that if an object x contains an ambiguous reference that appears to point to an object y such that hidetime(x)<readtime(y), then it is not a reference.

A third possible application of visibility is as a heuristic for guessing whether objects are reachable. Objects with recent readtime are probably reachable.

In MPS

The concept of visibility does not currently exist in MPS, but most of the infrastructure to support it does, including root scanning, hardware memory protection, and segment summaries.

Hardware memory protection is indeed used to separate the parts of the heap that the mutator can access from the parts that the collector can access. This is currently expressed using the "shield" API.

The concept of a mutation step is not currently used in the MPS; it currently makes the more conservative approximation that the mutator can acquire a reference to any object it chooses. This results in more segments than necessary being protected, and it makes it easier than necessary for the mutator to conjure up an ambiguous reference.

We discussed the relationship of visibility to the existing shield API, but it requires further thought.

Colour

Colour as an RP was introduced in 1978 by Dijkstra et al. Colour as a CRP follows from imagining overlapping several collection cycles. The RP for collection t can be recovered as query(colour, t), where t is the time at which the roots were scanned at the start of the collection.

What reason is there to believe that the black and white sets of different collections are nested as necessary to form a CRP? This monotonicity property does not follow logically from any definition. Instead, it follows from a strategy-stealing argument.

Abstract definition

Bearing in mind the goals, we want the definition of colour to be as algorithm-independent as possible. We would like the MPS to be correct for any collection algorithm that fits the abstraction, as this gives us the maximum flexibility to experiment.

Having said that, we do allow strategy-stealing arguments to exclude algorithms that are not worth trying. For example, we exclude the possibility of maintaining two separate colour CRPs; it is always better to maintain their union.

Although we will talk in terms of CRPs, we do not neglect the possibility of using only RPs. Every argument that applies to CRPs also applies to RPs, and an RP-based algorithm can be constructed from a CRP-based algorithm either by using the query function, or by defining time to be a boolean {earlier, now}.

White objects

Let whitetime be any function from G to time such that whitetime(m)=now.

Say an object is white for collection t iff whitetime(x)<t.

Informally, whitetime(x) is the time of the start of the collection at the end of which we plan on reclaiming x, with values now or more indicating that we suspect that x is still reachable. We may make any plan we choose, and we can change our mind later by increasing whitetime (which is called "greyening" in the MPS).

Note that we allow whitetime(x)>now. This can be used to record a heuristic guess that x will remain reachable for a while. We might call this pardoning x: the opposite of what the MPS calls condemning x.

Completion.

Given the start time t of a collection, define the collection to be complete iff ∀x∈G. whitetime(x)≥t.

In order to complete collection t, we have to do one of two things for each x∈G:

Reclaim some set containing x.
Increase its whitetime(x) to at least t.

The easiest but least helpful way to complete collection t≤now is by pardoning every object. At the other end of the spectrum, the ideal way to complete the collection is to reclaim all objects that were unreachable at time t, and set the whitetime of all other objects to t. Practical algorithms are somewhere in between.

Tracing

Implicit in the concept of colour is the concept of tracing. Intuitively, the plan represented by whitetime is unrealistic unless it satisfies the condition that whitetime(x)≤whitetime(y) whenever x→y. Tracing is a process that incrementally establishes this condition. Some of us feel that colour terminology should not be used for RPs and CRPs that we do not trace.

Definition: Tracing an object x∈G means finding all objects y such that x→y, and for each one increasing whitetime(y) to at least whitetime(x).

Theorem: Tracing an object x∈G establishes verge(whitetime)(x)=whitetime(x).

Tracing is an example of opportunistic scanning; whenever we scan x for any reason, and assuming that tracing x is something we will eventually need to do, we might as well do it then.

It is necessary to trace m occasionally, otherwise the value of now never propagates into the rest of the graph. Let us trace m each time now increases.

Note that it is sound to increase whitetime(y) to a larger value than whitetime(x). We might wish to do this heuristically; for example we might round whitetime(y) up to a multiple of the largest power of two less than the age of y. This ensures that the number of times we trace y, and the objects reachable from it, grows at most logarithmically with the age of y. The effect is similar to generational collection.

Black objects

To construct proofs that objects can be reclaimed, we must pair whitetime with another function blacktime to form a CRP:

Invariant 1: ∀x. blacktime(x) ≤ verge(whitetime)(x)

Informally, verge(whitetime)(x) is the time of the start of the earliest collection which we can't complete without tracing x, and blacktime(x) is the time we plan to trace x, which must be earlier.

By a strategy-stealing argument, each time we trace an object x, and thereby increase verge(whitetime)(x) to whitetime(x), we may and therefore must increase blacktime(x) to whitetime(x).

The argument does not apply to m; increasing blacktime(m) is a significant commitment, and different collection algorithms may choose to do it at different times. The choice distinguishes the mutator is black from the mutator is grey designs.

Using more strategy-stealing arguments, we will argue below that there is otherwise very little choice in defining blacktime. The best choice seems to be determined by whitetime, by which objects you trace, by blacktime(m) and by what the mutator does.

TODO: It would be useful to be sure that we only ever need to scan an object x if blacktime(x)<blacktime(m). Can we assume that?

Define colour to be blacktime↛whitetime.

Theorem: Invariant 1 implies that colour is a CRP.

Say an object is black for collection t iff t≤blacktime(x).

Theorem: Suppose t is a time such that B↛W ≝ query(colour, t) is a final RP. Then we can reclaim(W) and complete collection t.

Simple collection

To understand abstractions, it is helpful to have an example. A simple example of a collection algorithm is to repeat the following steps while any object has a blacktime earlier than now:

Find the objects with the smallest blacktime.
Of those, find the objects with the largest whitetime.
If the blacktime and whitetime are equal, reclaim the objects.
Otherwise, trace at least one of the objects.

Theorem: This algorithm eventually collects all unreachable objects.

This algorithm is proposed as a point of comparison. There is probably a better one.

Practical approximation

Note that as the mutator runs, and (G, →) is mutated, verge(whitetime) is also implicitly mutated, in ways which we cannot hope to track. This endangers invariant 1; we cannot just define blacktime=verge(whitetime) as we would like to.

We can maintain invariant 1 by incrementally mutating the blacktime function using read and write barriers, and some tracing.

Constraints

To prove that we maintain invariant 1 (and to determine the best blacktime function consistent with the proof) we need to consider all ways in which verge(whitetime)(x) can decrease:

We must never decrease whitetime(y) for any object y without finding every object x such that x→y and fixing blacktime(x). In practice, this means we must never decrease whitetime(y). The main case in which we change whitetime(y) is tracing, which only increases it.
Whenever we create(x) we are free to choose whitetime(x). We must choose whitetime(x)≥blacktime(m), to preserve the invariant for m.
A read barrier is needed to prevent the mutator learning a reference that would break the invariant for m.
A write barrier is needed to prevent the mutator writing a reference into an object x that would break the invariant for x.

Read barrier

Invariant 2: Read-protect every object x such that blacktime(x)<blacktime(m).

Each time we increase blacktime(m), i.e. when we trace m, we can establish invariant 2 by read-protecting every object x such that blacktime(x)<blacktime(m).

Each time we get a read-barrier hit on an object x, we must have m→x and therefore whitetime(x)≥blacktime(m). We can preserve invariant 2 by taking the following actions:

Trace x, so as to increase blacktime(x) to whitetime(x).
Unprotect x.

Theorem: verge(whitetime)(m) never decreases below blacktime(m).

Write barrier

Invariant 3: Write-protect every object x such that blacktime(x)>blacktime(m).

Each time we increase blacktime(x), i.e. when we trace x, we can preserve the invariant by write-protecting x if its new blacktime exceeds that of m.

Each time we get a write-barrier hit on an object x, we can preserve the invariant by taking the following actions:

Decrease blacktime(x) to blacktime(m).
Unprotect x.

Theorem: verge(whitetime)(x) never decreases below blacktime(x).

Segments

For efficiency, we would like to represent colour as a CRP of the induced segment graph. However, if were to do this naively, we would never be able to reclaim objects that share segments with surviving objects. Worse still, we would never be able to reclaim objects that are reachable from those objects. The problem could get arbitrarily bad.

We really need control of the whitetimes of individual objects, because that's what determines which objects will be reclaimed. We also need control of the blacktimes of individual objects, because that's how we prove that reclamation is possible. The absolute minimum requirement is that if the whitetime of an object is minimal for its segment, we need to be able to leave it unchanged while we increase the whitetimes of other objects in the same segment, and similarly for blacktimes.

There are several possible ways to do this efficiently.

Marking

We can store two whitetime values and two blacktime values for each segment, and a boolean for each object indicating which one to use. The booleans are called mark bits. We say an object is marked if it is using the later whitetime (and the corresponding blacktime).

To increase the whitetime of an object y in the segment to at least t, we first mark y, then increase the later whitetime of the segment to at least t. This might increase the whitetimes of other marked objects.

To set the blacktime of a marked object, we can only decrease the segment-level blacktime. This may decrease the blacktimes of other marked objects.

Whenever we reclaim the unmarked objects in the segment (or if ever there are no unmarked objects) we can set the earlier whitetime and blacktime equal to the later whitetime and blacktime, and unmark all objects. We can then set the blacktime for marked objects to infinity.

Forwarding

We can achieve the effect of increasing the whitetime of an object by moving it to a segment with a larger whitetime.

Moving an object y is non-trivial, because there is no efficient way of finding all objects x such that x→y. Instead, we copy y and all of its references into a new object y' in the destination segment, and overwrite y with a forwarding object that refers only to y'.

Later, as we discover references to y, we replace them with references to y'. In particular, we do this during tracing. We also have to prevent the mutator from acquiring a reference to y; the existing colour read barrier is sufficient provided whitetime(y)<blacktime(m).

TODO: Can we assume whitetime(y)<blacktime(m)?

We can choose whitetime(y') freely by placing it in an appropriate segment. Meanwhile, whitetime(y) remains unchanged. Eventually, the whole segment containing y gets reclaimed.

Note that to support overlapping collections we may sometimes forward an object more than once. In general, there can be a chain of forwarding objects pointing to the true object.

Nailed objects

If there is an ambiguous reference to an object y, we cannot forward it. We say it is nailed. In this case we can use a hybrid data structure. We store two whitetimes for each segment, and a boolean for each nailed object, indicating which one to use. To increase the whitetime of an object, we forward it if possible, otherwise we mark it.

In MPS

If you define time to be a boolean {earlier, now}, then this section describes (aspects of) the current design of the MPS fairly accurately. Understanding the existing MPS as an instance of a more general abstraction is a goal; it tells us how to modularise the code.

Tri-colour collection

In the boolean case, there are only three states an object can be in, being black, grey and white, because the fourth possibility (blacktime=now, whitetime=earlier) is impossible.

There is no choice as to which objects to trace (the grey objects) so it could be said that the MPS is currently using the simple collection algorithm.

Condemning

The MPS has a concept of "condemning" a set of objects, which means making the objects white for the next collection. This concept does not obviously fit the abstraction defined here, in which the "current" colour of objects is changed by incrementing the global variable "now".

I think condemning can be seen as an instance of the abstraction, as follows. First, think of condemning a set as condemning the whole world and then pardoning the complementary set. Second, think of condemning the whole world as incrementing now. Third, think of pardoning a set as increasing the whitetimes of objects in the set.

Because the MPS only has a boolean notion of time, it cannot overlap collections. At the end of a collection cycle, when the "earlier" time is unused, the MPS must first replaces all occurrences of "now" with "earlier" before it can start the next collection cycle. Although it's a bit forced, this does fit the abstraction; we don't care how time is represented, and this is just a way of incrementing "now".

Representation

The MPS uses marking for its non-moving pools. The two whitetimes are not stored, because they are always earlier and now respectively.

The MPS uses forwarding for its moving pools, with a nailboard for ambiguously referenced objects. There is some ongoing work to improve the abstraction of nailed objects; this is a goal.

The MPS currently uses a "mutator is black" design, which corresponds to fixing blacktime(m)=now. Note that this guarantees invariant 3, making the write barrier unnecessary. Escaping this special case is a goal.

7-state model

The previous sections give independent definitions of three [CRP]s: novelty, visibility and colour. Any subset of these could be used together. This section elaborates the consequences of combining all three.

We will choose to use pre-scanning for visibility, to avoid the need to scan read-protected memory.

We might as well make the mutator black for the current collection, since the work needed to support that decision is a subset of the work needed to support pre-scanning for visibility. Therefore, in this section, let us maintain the extra invariant blacktime(m)=now. This allows several simplifications.

Let us make the addition assumption that the collection algorithm doesn't bother to trace an object x unless blacktime(x)<blacktime(m). This helps with the implementation of forwarding.

The material in this section is largely a repeat of material from earlier sections (which are cross-referenced), grouped in a different way, and specialised. In a real implementation, we would hope to avoid "hand compiling" the algorithm like this. However, it is a useful exercise, to help understand what the algorithm does when, and what data structures support it. It could also help us to design the necessary abstractions.

State

As usual, represent the heap by a graph (G, →) with mutator m∈G.

To represent the CRPs we maintain the following functions from G to time:

newtime
oldtime
readtime
hidetime
blacktime
whitetime

These can mostly be stored at the segment level, i.e. objects in the same segment share the same values, except that colour is represented at a finer granularity.

Also stored at the segment level is a cache of the segments y that each segment x refers to. The cached information remains valid as long as readtime(x)<hidetime(x). However, the cached information is only needed if hidetime(x)=now; for other segments it can be discarded and later recomputed.

Mutator

The mutator notionally has a state too:

newtime(m)=now
oldtime(m)=now
readtime(m)=now
hidetime(m)=now
blacktime(m)=now
whitetime(m)=now

When the mutator executes a create(x) step, we initialise the state of x to match the state of the mutator.

Inequalities

We have the following invariants by design:

∀x. newtime(x)≤oldtime(x) because novelty is a CRP on (G, ←).
∀x. readtime(x)≤hidetime(x) because visibility is a CRP on (G, →).
∀x. blacktime(x)≤whitetime(x) because colour is a CRP on (G, →).

We have the following inequalities "accidentally" due to the emergent behaviour of the combined algorithm:

∀x. hidetime(x)≤blacktime(x).
∀x. oldtime(x)≤readtime(x).

Seven states

Pleasingly, the inequalities mean that the six times form an increasing sequence. Therefore, for each collection t, every object x must be in one of seven states:

New: t≤newtime(x)
Changed: newtime(x)<t≤oldtime(x)
Read: oldtime(x)<t≤readtime(x)
Dim: readtime(x)<t≤hidetime(x)
Finished: hidetime(x)<t≤blacktime(x)
Grey: backtime(x)<t≤whitetime(x)
White: whitetime(x)<t

Protection

Hardware memory protection is unnecessary for hidden segments, because the mutator cannot access such segments anyway. We add this side-condition to the abstract memory protection requirements, and simplify, to obtain the following invariants:

Novelty: Write-protect every object x∈G for which oldtime(x)<hidetime(x)=now.
Visibility: Read- and write-protect every object x∈G for which readtime(x)<hidetime(x)=now.
Colour:
- invariant 2: Read-protect every object x such that blacktime(x)<hidetime(x)=now. There are no such objects.
- invariant 3: Write-protect every object x such that blacktime(x)>hidetime(x)=now.

Transitions

The state is updated in response to mutator activity and in the background by the collector, always maintaining the invariants.

Scanning

Background collector threads can scan hidden segments at any time, according to the priorities of the collection algorithm. Scanning is also performed when the mutator hits a read- or a write-barrier. Only hidden segments are scanned.

The unit of scanning is a segment, i.e. an object of an induced graph. Scanning a segment means scanning every object x in the segment. Scanning x means finding all objects y such that x→y. Whenever we do this, we take the opportunity to do all of the following work, before applying the appropriate protection to x:

Colour: If blacktime(x)≥whitetime(x) or blacktime(x)≥blacktime(m), do nothing. Otherwise, for each y, increase whitetime(y) to at least whitetime(x), e.g. by forwarding y to a new segment, or by marking it, and increase blacktime(x) to whitetime(x). This is the tracing that supports [colour].
Novelty: Set oldtime(x) to max {newtime(y) | x→y}. This supports novelty.
Visibility: If necessary, recompute and cache the set of the segments containing the objects y. This is the pre-scanning that supports visibility.

Increasing now

Each time we increase the global variable "now", we pause the mutator and find all objects x such that m→x. Whenever we do this, we take the opportunity to do all of the following work:

Novelty: Set newtime(m) and oldtime(m) to now.
Colour: Set whitetime(m) and blacktime(m) to now. For each x such that m→x, increase whitetime(x) to at least now.
Visibility: Set hidetime(m) and readtime(m) to now. For each x such that m→x, set hidetime(x) to now, and scan x.

Note that increasing now allows us to remove hardware protection from almost everything, apart from the objects directly referenced by m.

Read barrier

Each time the mutator hits a read barrier on a segment x, we do all of the following work, then remove the read protection from x:

Visibility: For each segment y in the stored set of segments referred to by x, set hidetime(y) to now, and scan y. Then set readtime(x) to now.
Colour: Naively, we would also scan x to maintain the colour read barrier, but pre-scanning makes this unnecessary. We've already done the work, and doing it again would have no effect.

Write barrier

Each time the mutator hits a write barrier on a segment x, we do all of the following work, then remove the write protection from x:

Visibility: Run the read barrier handler.
Novelty: Increase oldtime(x) to now.
Colour: Decrease blacktime(x) to now.

TODO

Redefine novelty ≝ newtime↚oldtime in novelty.

Move opportunistic scanning section to somewhere more appropriate.

Introduction

Goals

Prerequisites

Actions

Contents

Notation and theory

Graph

Mutation

Create

Read

Write

Null

Drop

Reclaim

Reachability

History

Reference partition

Historical note

Extended reference partition

Continuous reference partition

Query

Interpretation

Fringe

Of CRPs

Unions and intersections

Of CRPs

Induced graph

Fast searching

Efficient representation of RPs and CRPs.

Conservative approximation

Separation

Of CRPs

Summary

Of CRPs

Opportunistic scanning

In MPS

Novelty

Abstract definition

Creation time

Modification time

Practical approximation

Write barrier

Scanning

Segments

Use

In the MPS

Visibility

Abstract definition.

Visible set

Hidden set

As an RP

As a CRP

Practical approximation

Read barrier

Segments

Pre-scanning

Write barrier

Use

In MPS

Colour

Abstract definition

White objects

Completion.

Tracing

Black objects

Simple collection

Practical approximation

Constraints

Read barrier

Write barrier

Segments

Marking

Forwarding

Nailed objects

In MPS

Tri-colour collection

Condemning

Representation

7-state model

State