Lightly Soiled Lambda Calculus

A pure functional language is one in which everything is immutable. Such a language has the property that every expression is referentially transparent, i.e. it can be replaced with its value without changing the meaning of the program.

Functional purity is not necessary for referential transparency, however, and it forbids useful idioms. A weaker rule that nonetheless guarantees referential tansparency is that no expression is both shared and mutable. Mark Wainwright suggests that a language with this property should be called “lightly soiled”.

As I write this, Rust is the main example of a lightly soiled language. It has a static type system (“borrow checker”) that checks the “shared NAND mutable” property. Its static type system has additional goals too, such as tracking responsibility for freeing memory. A common criticism of Rust’s borrow checker is that it is difficult to learn how to use it.

Here I attempt to make a “lightly soiled lambda calculus” as a prototypical example of a lightly soiled language. It simplifies Rust’s design by lowering its ambitions:

Heap-allocated memory is freed using reference counting.
Values with short lifetimes cannot be returned from function calls.
It’s a toy language; the only constructor is “λ”.

Concepts

In an ordinary type system, each expression that stands for a value is assigned a type, which describes and restricts the possible values. In addition, the lightly soiled lambda calculus assigns each expression a set of duties, which restrict how the value can be used.

Duties

There are two possible duties:

constant - the value must not be mutated.
doomed - the value must not outlive the current function call.

Each of these is a trade-off. Taking on a duty permits behaviour or assumptions which would not otherwise be allowed:

constant - the value can be shared.
doomed - the value can be assumed dead after the current function call.

Other duties are conceivable, but not needed here.

Function duties

Function types are also annotated with duties which restrict how the function uses its free variables. Specifically, a function whose type is marked as constant cannot mutate its free variables.

If variable has certain duties, and stands for a function of a type that does not respect those duties, then the function cannot be called. Specifically, a function that mutates its free variables cannot be called via a variable marked as constant.

Subtyping

It is always safe to treat an expression as having additional duties, but duties can never be removed or otherwise ignored.

It is always safe to treat a function as respecting fewer duties, but never more.

Grammar

Assume a “ground language”, consisting of things like integers, strings, and operations on them. The ground language also includes all the primitive mutation and I/O operations. The details of the ground language are abstract here, so I will make use of non-terminals <ground-expr> and <ground-type>, being respectively the expressions and types that come from the ground language.

Assume a non-terminal <name> that represents valid variable names.

Keywords

The grammar uses the following keywords:

λ to define a function.
Superscript ᶜ to indicate constant values.
Superscript ᵈ to indicate doomed values.
, to separate list items, and brackets to enclose them.
: to introduce a type annotation.
brackets also for disambiguation.

Notation

Please read <foo> , ... as a comma-separated list of zero or more <foo>s.

Please read <foo>? as an optional <foo>.

Expressions

As in the lambda calculus, an expression (other than a <ground-expr> or a bracketed expression) must be a variable name, a function application, or a function definition:

<expr> ::=
    <ground-expr>
    ( <expr> )
    <name>
    <expr> ( <expr> , ... )
    λ ( <param> , ... ) <expr>

Note that functions in this language take zero or more arguments, in contrast to the simply-typed lambda calculus in which functions always take one argument. This is to obviate Currying, which is not always possible.

As in the simply-typed lambda calculus, function parameters require explicit type annotations:

<param> ::=
    <name> : <duty-type>

In the function definition λ(x₁: u₁, ...) e the names x₁, ... must all be distinct.

Types

As in the simply-typed lambda calculus, a type (other than a <ground-type> or a bracketed type) must be a function type:

<type> ::=
    <ground-type>
    ( <type> )
    λ ᶜ? ( <duty-type> , ... ) <duty-type>

Note that two kinds of function type are distinguished, according to the ᶜ flag.

<duty-type> adds duties to <type>:

<duty-type> ::= ᶜ? ᵈ? <type>

Environments

As in the simply-typed lambda calculus, the type-checking rules make use of the concept of an <environment>. The purpose of the <environment> is to track the free variables of an <expr> and the way in which they are used. <environment>s are a concept internal to the type-checker, and never appear in a program.

Define an <environment> to be a finite partial map from <name> to <duty-type>. I will use the syntax { <name> : <duty-type> , ... } to write literal <environment>s.

Meta-syntactic variables

Boolean variables c, d stand for duty flags. e stands for an <expr>, t for a <type>, u for a <duty-type>, and x for a <name>. The greek letter Γ stands for an <environment>.

Utility functions

Define predicates is_constant and is_doomed on <duty-type> in the obvious way.

Semantics

Erasing all type and duty annotations from a lightly-soiled lambda calculus expression gives an untyped lambda calculus expression, which has roughly its usual semantics. However, the values manipulated by the lambda calculus expression are references into a heap, and can be used to mutate the contents of the heap.

Heaps and values

Let <ref> be a set equipped with an allocate procedure that produces elements that have never been used before.

A <scope> is a finite map from <name> to <ref>.

A <heap> is a finite map from <ref> to <value>.

A <value> (other than a <ground-value>) is a closure:

<value> ::=
    <ground-value>
    λ <scope> ( <name> , ... ) <expr>

The <value> λS(x₁, ...) e is well-formed iff the domain of the <scope> S excludes the set of bindings {x₁, ...}.

Big-step reduction

Write (e, S, H⁻) → (r, H⁺) to mean the <expr> e in <scope> S reduces to the <ref> r while mutating the <heap> H⁻ into H⁺. The relation → is defined inductively by the following rules:

Ground: (x, S, H⁻) → (r, H⁺) if the ground language says so.

Variable: (x, S, H) → (S(x), H)

Function application: (e₀(e₁, ...), S, H⁻) → (r, H⁺) if:

n is the number of arguments
(e₀, S, H⁻) → (r₀, H₀)
For each argument i from 1 to n:
- (eᵢ, S, Hᵢ₋₁) → (rᵢ, Hᵢ)
Hₙ(r₀) = λ S₀ (x₁, ...) e
S = S₀ ∪ {x₁: r₁, ...}
(e, S, Hₙ) → (r, H⁺)

Function definition: (λc(x₁: u₁, ...) e, S, H⁻) → (r, H⁺) if:

r is not in H, i.e. we generate it using allocate.
H⁺ = H⁻ ∪ {r: λS'(x₁, ...) e}
S' is S restricted to the free names of λc(x₁: u₁, ...) e.

Note that the type and duty annotations are ignored. They may be erased from the program before executing it, to save time and memory.

Note that the heap, which is carefully threaded through the evaluation sequence, is not mutated by any rule except “Ground”. The rule “Function definition” extends the heap but does not touch anything that was already in it. If not for the rule “Ground”, we could replace every <ref> by its corresponding <value>, to obtain the familiar untyped lambda calculus.

For simplicity, this implementation leaks memory. Frequently, we end up with a <heap> that records the value of a <ref> that can never again be reached. Garbage collecting such heap items is harmless, and can be achieved using e.g. reference counting.

Subtypes

The duties induce a subtype relation. Since we are dealing with subtypes anyway, we further allow the ground language to contribute to the subtype relation. Subtyping of functions is defined in the usual way.

The formal definition follows:

A <ground-type> t⁻ is a subtype of another t⁺ if the ground language says it is.

A function type λc⁺(u₁⁺, ...) u⁻ is a subtype of another λc⁻(u₁⁻, ...) u⁺ iff:

c⁺ ⇐ c⁻
uᵢ⁺ is a supertype of uᵢ⁻ for each i
u⁻ is a subtype of u⁺

A <duty-type> c⁻d⁻t⁻ is a subtype of another c⁺d⁺t⁺ iff:

c⁻ ⇒ c⁺
d⁻ ⇒ d⁺
t⁻ is a subtype of t⁺

An <environment> Γ⁻ is a subtype of another Γ⁺ iff:

for all x such that Γ⁺(x) is defined:
- Γ⁻(x) is a subtype of Γ⁺(x)

Type checking

We define an <expr> e to be well-typed with <duty-type> u in <environment> Γ iff we can derive a “typing judgement” Γ ⊢ e : u using the following rules:

Subtype: Γ⁻ ⊢ e : u⁺ if:

Γ⁺ ⊢ e : u⁻
u⁻ is a subtype of u⁺
Γ⁻ is a subtype of Γ⁺

Ground: {} ⊢ e : u if the ground language says so.

Variable: {x: p} ⊢ x : p

Function application: Γ ⊢ e₀(e₁, ...) : u if:

For all i:
- Γ is a subtype of Γᵢ
- Γᵢ ⊢ eᵢ : uᵢ
u₀ = cdλc(u₁, ...) u
For all i < j, x such that Γᵢ(x) and Γⱼ(x) are both defined:
- Unless is_constant(Γᵢ(x)) and is_constant(Γⱼ(x)):
  - is_doomed(Γᵢ(x))
  - ¬is_doomed(uᵢ)

Function definition: Γ ⊢ λc(x₁: u₁, ...) e : dλc (u₁, ...) u if:

None of Γ(x₁), … is defined
Γ ∪ {x₁: u₁, …} ⊢ e : u
For all x such that Γ(x) is defined:
- c ⇒ is_constant(Γ(x))
- is_doomed(Γ(x)) ⇒ d