To Share, or to Mutate?

Apr 23, 2021

You can share data, or you can mutate data, but you cannot safely do both. Much has already been written on the topic.

[Figure: A two-by-two matrix of whether you’re sharing vs. whether you’re mutating. Three quadrants are happy, but one is very sad.]

The definitions here are broad. Sharing may mean two functions or threads accessing the same variable, or two people looking at the same Git branch, perhaps so one can review the other’s work. Mutation may mean assigning a new value to the variable, or rebasing the branch. The consequences are similarly broad, but they usually boil down to data loss or corruption.

In computer programming, there are two well-known ways to stay out of shared mutable hell: Stop sharing, or stop mutating. The first option tends inevitably toward Object-Oriented Programming (OOP), and gives rise to keywords like “private” and “synchronized.” Orthogonally, if you stop mutating, your code naturally tends toward Functional Programming (FP), which uses keywords like “const” and “final,” and language features like recursion.
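To make the two escape routes concrete, here is a minimal sketch in Rust, the language discussed later in this post. The names `Counter`, `sum`, and `with_item` are made up for illustration. The first half stops sharing: the count is private and reachable only through a `Mutex`, which plays roughly the role of Java’s `synchronized`. The second half stops mutating: threads share a vector they only read, a “change” builds a new vector, and the sum uses recursion instead of a mutable accumulator.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// All names below (Counter, sum, with_item) are illustrative, not from a library.

/// Option 1, stop sharing: the count is private, and every access goes
/// through a lock, so only one thread can touch it at a time.
pub struct Counter {
    count: Mutex<u64>,
}

impl Counter {
    pub fn new() -> Self {
        Counter { count: Mutex::new(0) }
    }

    pub fn increment(&self) {
        // The guard grants exclusive access until it goes out of scope.
        let mut guard = self.count.lock().unwrap();
        *guard += 1;
    }

    pub fn get(&self) -> u64 {
        *self.count.lock().unwrap()
    }
}

/// Option 2, stop mutating: recursion instead of a mutable accumulator.
pub fn sum(items: &[u64]) -> u64 {
    match items {
        [] => 0,
        [first, rest @ ..] => first + sum(rest),
    }
}

/// "Updating" immutable data means building a new value; `items` is untouched.
pub fn with_item(items: &[u64], item: u64) -> Vec<u64> {
    items.iter().copied().chain(std::iter::once(item)).collect()
}

fn main() {
    // Stop sharing: four threads, one private counter behind a lock.
    let counter = Arc::new(Counter::new());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    c.increment();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("count = {}", counter.get()); // count = 4000

    // Stop mutating: a thread reads the shared vector while main derives a new one.
    let items: Arc<Vec<u64>> = Arc::new(vec![1, 2, 3]);
    let reader = {
        let items = Arc::clone(&items);
        thread::spawn(move || sum(&items))
    };
    let bigger = with_item(&items, 4);
    println!("sum = {}", reader.join().unwrap()); // sum = 6
    println!("{:?} -> {:?}", items, bigger); // [1, 2, 3] -> [1, 2, 3, 4]
}
```

The recursive `sum` is fine for short inputs, but Rust does not guarantee tail-call optimization, so a very long slice could overflow the stack; an iterator’s `sum()` would be the practical choice there.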

OOP was once considered best practice, but a funny thing happened when small computers became cheap and powerful. Instead of running on a few big machines, companies like Google chose to run on lots of little ones, all networked together. That shift had a weird corollary for computer programmers. “Not sharing” was no longer an option, because all those little machines had to share data with each other. The only other apparent option was to stop mutating. Suddenly, Functional Programming was all the rage.

Recently, two novel approaches have gained popularity. The first is to write code that is both object-oriented and functional, making data both private and immutable. The Scala programming language was created largely as an existence proof that this can be done at all. In practice, it works brilliantly, especially if you’re working with truly huge amounts of data. Twitter (a social media company that still has over 150,000,000 daily active users) and Spark (a popular distributed processing framework) are both written mostly in Scala. You can write OOP or FP in Scala, but it works best when you combine them.
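In Scala, “private and immutable” is typically expressed with a case class of `val` fields and its generated `copy` method. Sketched in Rust instead, with a hypothetical `Balance` type standing in for your data, the same idea looks like this: the fields are private to the `ledger` module, no method takes `&mut self`, and every “update” returns a new value. Because nothing can change after construction, several threads can read one instance without any lock.

```rust
use std::sync::Arc;
use std::thread;

mod ledger {
    /// Hypothetical example type: private fields, no `&mut self` methods.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Balance {
        owner: String,
        cents: i64,
    }

    impl Balance {
        pub fn new(owner: &str, cents: i64) -> Self {
            Balance { owner: owner.to_string(), cents }
        }

        pub fn owner(&self) -> &str {
            &self.owner
        }

        pub fn cents(&self) -> i64 {
            self.cents
        }

        /// Returns a new Balance; `self` is left unchanged.
        pub fn deposit(&self, amount: i64) -> Balance {
            Balance {
                owner: self.owner.clone(),
                cents: self.cents + amount,
            }
        }
    }
}

use ledger::Balance;

fn main() {
    let original = Arc::new(Balance::new("alice", 100));

    // No lock needed: nobody can mutate `original`, so sharing it is safe.
    let handles: Vec<_> = (0..3)
        .map(|i| {
            let shared = Arc::clone(&original);
            thread::spawn(move || shared.deposit(i * 10).cents())
        })
        .collect();
    let results: Vec<i64> = handles.into_iter().map(|h| h.join().unwrap()).collect();

    println!("{:?}", results); // [100, 110, 120]
    println!("{} still has {}", original.owner(), original.cents()); // alice still has 100
}
```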

The other, less obvious approach is to make a compiler so smart that it knows which variables are shared or mutated, and lets them be both, only not at the same time. This is the magic trick that makes Rust fundamentally different from other programming languages. It proves code is safe (in a particular sense) without sacrificing performance or direct access to hardware. In other words, it keeps the traditional advantages of C++ without the pitfalls. We can share or mutate data at will, and let the Rust compiler formally prove that our code is free of a whole menagerie of bugs, such as data races (one particular kind of race condition).
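Here is a small sketch of the rule the Rust compiler enforces: any number of shared references (`&T`) or exactly one mutable reference (`&mut T`), but never both at once. The helpers `push_twice` and `total` are hypothetical. The commented-out lines show the combination the compiler rejects; uncommenting them should produce borrow error E0502.

```rust
// `push_twice` and `total` are hypothetical helpers for illustration.

/// Needs exclusive access: takes `&mut`.
fn push_twice(v: &mut Vec<i32>, n: i32) {
    v.push(n);
    v.push(n);
}

/// Only reads: takes a shared reference.
fn total(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    let mut data = vec![1, 2, 3];

    // Sharing: any number of shared borrows may coexist.
    let a = &data;
    let b = &data;
    println!("{} {}", total(a), total(b)); // 6 6

    // Mutating: allowed once the shared borrows are no longer used.
    push_twice(&mut data, 4);

    // Sharing *and* mutating at the same time is rejected at compile time:
    // let first = &data[0];
    // push_twice(&mut data, 5); // error[E0502]: cannot borrow `data` as mutable
    //                           // because it is also borrowed as immutable
    // println!("{}", first);

    println!("{:?}", data); // [1, 2, 3, 4, 4]
}
```

The rejected version isn’t just pedantry: `push` may reallocate the vector’s buffer, which would leave `first` pointing at freed memory. The same shared-or-mutable rule, extended across threads through the `Send` and `Sync` traits, is what rules out data races.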

In an ideal world, the Rust model would spread to other, complementary languages that lack Rust’s steep learning curve and slow compile times. Perhaps Rust will influence new languages that rise to prominence over the next decade. Rust is nearly ideal for systems programming, but it will never really compete with Python, R, SQL, or other REPL-driven languages for interactive data exploration or rapid prototyping. The most popular (Java-based) implementation of Scala does have a REPL, but it suffers from slow startup, complex package management, and other nastiness endemic to the JVM as a platform.

Of course, there is yet another possibility: Somebody might think up a solution even better than OOP, FP, or the multiparadigm approaches of Scala and Rust. If you’ve got some ideas, do share them in the comments.

Deeply Nested
