This document explores a programming discipline in which memory locations can be shared and can be mutable but cannot be both. I will abbreviate "shared NAND mutable" to "shnam".
Most programming languages that allow mutation do not enforce the shnam discipline ("are not shnam"). In Java, C and Python, for example, you can write a value to x.f
and read the same value back from y.f
, if x
and y
are names for the same object. Pure functional languages such as Haskell and a large subset of OCaml are trivially shnam; they forbid mutation completely. However, when I talk about shnam languages I'm more interested in those that allow mutation while still being shnam. Currently, the only language I'm aware of that achieves that is Rust.
The fundamental advantage of a shnam language is that if you write a value to y.f
and later read a value from y.f
, you can be sure you'll get back the same value, even if there is an intervening write to x.f
. This is not true in a non-shnam language, because of the possibility that x
and y
are names for the same object.
The fundamental advantage has real world benefits:
These are significant advantages. Thanks to these advantages, pure functional programming languages are moderately popular despite forbidding mutation. So why do more shnam languages not exist? I think basically because it's a new idea, and people are either unaware of the benefits or unaware of the techniques that can be used to make it work. It particular, Rust's borrow checker (which is the relevant new technology) is difficult to understand.
There are also downsides to a shnam language. There are things you are not allowed to do in such a language. The restrictions are not as severe as in a pure functional language, but they exist. In addition to obeying the rules, in a safe shnam language there is the burden of proving you have obeyed the rules, which, in Rust, involves writing some arcane static types. People who like to cut corners to get their work done quickly find that frustrating.
Much of the recent history of programming language research has in one way or another been striving to shift responsibility off humans and onto machines, so as to free up the humans' time and attention and allow them to write ever more complicated programs. Among the best inventions, we can retrospectively identify several which are poor-man's-shnam:
Each of these inventions may be seen as instances of the same pattern:
Each of these inventions then persists in our engineering culture because it has value as a special-case solution, in spite of the obvious cost in complexity of having so many special cases.
Imagine if we could solve the shnam problem once and for all, by writing a large fraction of our code in a shnam language, and by teaching our entire tool chain to take advantage of that, including compilers, linkers, virtual machines and real machines. Then a large fraction of the best inventions of the past half century would be obsolete. Moreover, no similar inventions would be needed in the future. Aged programming language researchers could be replaced by a generation who never needed to think about such things, and whose time and attention could be spent solving other problems.
I feel the advantages of shnam languages have barely been explored. In particular, the only non-trivial shnam language we have (Rust) is a high-level language designed for humans. However, it is not humans that have the most difficulty with non-shnam languages; it is machines.
The days of humans writing code that is executed directly are receding into the past. There are ever more layers of processing between the code humans write and the machines that execute it. These extra layers in the software stack have been enabled by massive improvements in the performance of computers (over 1,000,000 times the FLOPS, and 1,000,000 times the memory, in about fifty years).
In many cases, the extra layers of software consume the improvements in performance completely, resulting in programs that are no more responsive than fifty years ago. Of course, some of that lost performance is explained by new features and eye candy, but even so, imagine how inefficient that code must be! Yes, I intend to claim that shnam languages could recover much of that wasted compute power.
For a program written in a non-shnam language, each layer of software micro-manages the layer beneath it, instructing every memory access, while adding more memory accesses of its own. And the layer underneath has no choice but to obey blindly, because these accesses are to memory that is both shared and mutable, and no optmimisations are possible. A single instruction in a high-level language can balloon into hundreds or thousands of instructions executed on the CPU.
A program written in a shnam language better communicates the intent of the program. When one layer of software stores a value in memory and later loads a value from the same location, the software layer beneath it can understand that it is the same value. And if that value is used to compute the address of a memory location, it can in turn reveal another value that is stored and then loaded from that location, and so on. Revealing all these dataflow paths enables other kinds of optimisation too. And this reduction in work is compounded at each layer of the software stack, resulting in a large overall reduction.
I concede that I have not explained how to engineer the tool chain to take advantage of shnam languages. My point is not that using shnam languages at multiple levels is sufficient to recover the lost performance, but that it is necessary. There are problems that just can't be solved in any other way.
Nor do I mean to focus exclusively on CPU performance. Diagnostics, documentation, safety and security could all be improved by expressing programs in shnam languages at multiple levels of the software stack.
Rust is the holotype of safe shnam languages, and it is likely that other safe shnam languages will share some features with Rust. Which features? Which aspects of Rust are necessarily preserved in another safe shnam language, and which can differ?
A necessary feature of a safe shnam language is that a program provably obeys the shnam discipline. The proof probably has to consist of some sort of type checking. However, it doesn't have to be static type checking; dynamically checking that mutated objects are not shared is good enough. And if the type checking is done statically, it still doesn't have to be done the same way as in Rust.
There is a large design space that has hardly been explored. See my Lightly-soiled Lambda Calculus for one idea.
The "restrict" keyword in C (which is poorly designed and not widely understood or used) may be viewed as an unsafe shnam language feature. If you declare function parameters with "restrict", the compiler can in principle optimise your code better. However, the better-optimised code only works if you call it correctly; if the callee expects two pointers to different data but the caller passes two pointers to the same data, the behaviour is undefined.
Other unsafe shnam languages probably have a similar flavour. A program in such a language somehow associates read and write permissions with each pointer, such that memory writable via one pointer is not readable via any other. The program includes hints which specify when the permissions change. The compiler optimises the code on the (unverified) basis that all information provided by the program is true. If it is not, the behaviour is undefined.
Again, there is a large design space that has hardly been explored.