Safety

Jul 29, 2022

Memory safety is a hot topic these days. (For example, it’s one of Carbon’s primary goals.) While everyone seems to agree that memory safety is really important, what actually constitutes safety remains subjective. This week, we’ll peek at semantically equivalent Rust, C++, and Go code to see how their approaches to safety differ.

First, consider a simple append() function that concatenates two lists of integers:

    pub fn append(mut items: Vec<i32>, suffix: &[i32]) -> Vec<i32> {
        items.extend_from_slice(suffix);
        items
    }

    fn test_append() {
        assert_eq!(vec![1, 2, 3, 4], append(vec![1, 2], &[3, 4]));
    }

— Rust append

    std::vector<int> append(
        std::vector<int>&& items,
        std::vector<int> const& suffix)
    {
        items.insert(items.end(), suffix.begin(), suffix.end());
        return items;
    }

    void test_append() {
        assert((std::vector<int>{1, 2, 3, 4} == append({1, 2}, {3, 4})));
    }

— C++ append

    func Append(items, suffix []int) []int {
        return append(items, suffix...)
    }

    func TestAppend(t *testing.T) {
        want := []int{1, 2, 3, 4}
        if got := Append([]int{1, 2}, []int{3, 4}); !reflect.DeepEqual(got, want) {
            t.Errorf("Append([]int{1, 2}, []int{3, 4}) = %v; want %v", got, want)
        }
    }

— Go Append

This is pretty straightforward, and doesn’t pose a challenge for any of our three languages. Things get more interesting if we define a function that takes only the suffix parameter, but returns a brand new function (a closure) that can append the suffix to any given list of items:

    pub fn make_appender(suffix: &[i32]) -> impl Fn(Vec<i32>) -> Vec<i32> + '_ {
        |items| append(items, suffix)
    }

    fn test_make_appender() {
        let append34 = make_appender(&[3, 4]);
        assert_eq!(vec![1, 2, 3, 4], append34(vec![1, 2]));
    }

— Rust make_appender

    auto make_appender(std::vector<int> const& suffix) {
        // Captures suffix by reference; the caller must keep it alive.
        return [&](std::vector<int>&& items) {
            return append(std::move(items), suffix);
        };
    }

    void test_make_appender() {
        std::vector<int> suffix{3, 4};
        auto append34 = make_appender(suffix);
        assert((std::vector<int>{1, 2, 3, 4} == append34({1, 2})));
    }

— C++ make_appender

    func MakeAppender(suffix []int) func([]int) []int {
        return func(items []int) []int {
            return Append(items, suffix)
        }
    }

    func TestMakeAppender(t *testing.T) {
        append34 := MakeAppender([]int{3, 4})
        want := []int{1, 2, 3, 4}
        if got := append34([]int{1, 2}); !reflect.DeepEqual(got, want) {
            t.Errorf("append34([]int{1, 2}) = %v; want %v", got, want)
        }
    }

— Go MakeAppender

The Rust make_appender’s return type includes the lifetime '_, expressing an important property of the function. It means, basically: “You passed me a pointer to something. The value I return is only good as long as that object is still alive.” It’s like an expiration date for in-memory objects: If the original argument is destroyed, then the function’s return value expires, and you can’t use it anymore. For example, let’s try calling the Rust closure after the original suffix object goes out of scope:

    fn test_make_appender_dangle() {
        let append34 = {
            let suffix = vec![3, 4];
            make_appender(&suffix) // Won't compile.
        };
        assert_eq!(vec![1, 2, 3, 4], append34(vec![1, 2]));
    }

— Rust test_make_appender_dangle

    $ cargo test test_make_appender_dangle
       Compiling capture v0.1.0 (/home/jeff/git/nested/safety)
    error[E0597]: `suffix` does not live long enough
      --> src/lib.rs:44:27
       |
    42 |         let append34 = {
       |             -------- borrow later stored here
    43 |             let suffix = vec![3, 4];
    44 |             make_appender(&suffix) // Won't compile.
       |                           ^^^^^^^ borrowed value does not live long enough
    45 |         };
       |         - `suffix` dropped here while still borrowed

    For more information about this error, try `rustc --explain E0597`.
    error: could not compile `capture` due to previous error

— Rust refuses to compile, rather than let a pointer dangle.

Rust won’t let pointers dangle as C++ would, nor will it have a Garbage Collector (GC) keep the original object’s memory on life support. (It’s common for GCs to keep memory alive even after you destroy/close/dispose an object, even though the object is useless, just so extant pointers to it don’t dangle.)
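To make the "expiration date" idea concrete, here is a minimal sketch of the same function with the elided lifetime written out by name. The names make_appender_explicit and appender_in_scope are hypothetical, and the closure is marked `move` (an assumption of this sketch) so that it plainly copies the reference itself rather than borrowing the local variable that holds it.

```rust
/// Same `append` as in the post: takes ownership of `items`, borrows `suffix`.
pub fn append(mut items: Vec<i32>, suffix: &[i32]) -> Vec<i32> {
    items.extend_from_slice(suffix);
    items
}

/// Hypothetical variant of `make_appender` with the lifetime spelled out.
/// `'a` ties the returned closure to the borrow of `suffix`: the closure
/// may not be used after the data behind `suffix` is gone.
pub fn make_appender_explicit<'a>(suffix: &'a [i32]) -> impl Fn(Vec<i32>) -> Vec<i32> + 'a {
    // `move` copies the reference (not the data) into the closure.
    move |items| append(items, suffix)
}

/// The borrow is accepted as long as `suffix` outlives the closure.
pub fn appender_in_scope() -> Vec<i32> {
    let suffix = vec![3, 4];
    let append34 = make_appender_explicit(&suffix);
    let result = append34(vec![1, 2]);
    // Locals are dropped in reverse order: `append34` first, then `suffix`,
    // so the closure never outlives the data it points to.
    result
}
```

Calling appender_in_scope() yields [1, 2, 3, 4]. In a return type like this one, `'_` is shorthand for the single input lifetime, so the two signatures should mean the same thing.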

C++ does not handle this situation well:

    void test_make_appender_dangle() {
        auto append34 = make_appender({3, 4});
        assert((std::vector<int>{1, 2, 3, 4} == append34({1, 2}))); // FAIL: UB
    }

— C++ test_make_appender_dangle

    $ make test
    c++ -std=c++20 -pedantic -Wall -Wextra src/safety.cpp -o target/cpp/safety
    target/cpp/safety
    safety: src/safety.cpp:53: void test_make_appender_dangle(): Assertion `(std::vector<int>{1, 2, 3, 4} == append34({1, 2}))' failed.
    make: *** [Makefile:12: test_safety] Aborted

— C++ has undefined behavior.

The suffix object {3, 4} in our C++ test is implicitly destroyed before the assertion even executes. Why doesn’t it survive to the end of the function? Because it is a temporary: C++ destroys temporaries at the end of the full expression that creates them, which here is the statement that calls make_appender. Once the suffix object is destroyed, the closure (append34) is broken. It holds a reference to the memory where the suffix object used to live, but the object doesn’t live there anymore. The compiler doesn’t catch this, even with the warnings cranked up. Instead, we get undefined behavior (UB) at run time. We’re lucky the test happened to fail; sometimes, UB manages to corrupt memory (making a program do bad things) without failing any tests.

To make C++ do the right thing, we have to move the suffix into the closure:

    auto make_appender_move(std::vector<int>&& suffix) {
        // The closure now owns its own copy of the suffix.
        return [suffix = std::move(suffix)](std::vector<int>&& items) {
            return append(std::move(items), suffix);
        };
    }

    void test_make_appender_move() {
        auto append34 = make_appender_move({3, 4});
        assert((std::vector<int>{1, 2, 3, 4} == append34({1, 2}))); // OK
    }

— C++ make_appender_move
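For comparison, the same ownership transfer can be sketched in Rust. The names make_appender_owned and appender_outlives_scope are hypothetical. Because the closure owns the suffix, the return type needs no lifetime annotation.

```rust
/// Same `append` as in the post.
pub fn append(mut items: Vec<i32>, suffix: &[i32]) -> Vec<i32> {
    items.extend_from_slice(suffix);
    items
}

/// Hypothetical owning variant: the closure takes ownership of `suffix`,
/// so it carries no borrowed lifetime and can outlive the caller's scope.
pub fn make_appender_owned(suffix: Vec<i32>) -> impl Fn(Vec<i32>) -> Vec<i32> {
    // `move` transfers the Vec into the closure; each call only borrows it.
    move |items| append(items, &suffix)
}

/// Mirrors the shape of the dangling test, but with a moved suffix.
pub fn appender_outlives_scope() -> Vec<i32> {
    let append34 = {
        let suffix = vec![3, 4];
        make_appender_owned(suffix) // `suffix` is moved, not borrowed
    };
    append34(vec![1, 2])
}
```

Calling appender_outlives_scope() yields [1, 2, 3, 4]. Touching suffix after the call to make_appender_owned would be rejected at compile time as a use of a moved value, rather than left as a run-time hazard.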

Go fares much better. Not only does it compile, it actually works as intended:

    func TestMakeAppenderDangle(t *testing.T) {
        append34 := func() func([]int) []int {
            suffix := []int{3, 4}
            return MakeAppender(suffix)
        }()
        want := []int{1, 2, 3, 4}
        if got := append34([]int{1, 2}); !reflect.DeepEqual(got, want) {
            t.Errorf("append34([]int{1, 2}) = %v; want %v", got, want)
        }
    }

— Go TestMakeAppenderDangle
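Rust has no garbage collector, but a roughly similar keep-alive effect is available on request through reference counting. The sketch below is an illustration rather than a translation of the Go code; make_appender_shared and shared_suffix_demo are hypothetical names.

```rust
use std::rc::Rc;

/// Same `append` as in the post.
pub fn append(mut items: Vec<i32>, suffix: &[i32]) -> Vec<i32> {
    items.extend_from_slice(suffix);
    items
}

/// Hypothetical shared-ownership variant: the closure holds one
/// reference-counted handle to the suffix data.
pub fn make_appender_shared(suffix: Rc<[i32]>) -> impl Fn(Vec<i32>) -> Vec<i32> {
    // `&suffix` is a `&Rc<[i32]>`, which coerces to `&[i32]` at the call.
    move |items| append(items, &suffix)
}

/// The caller's handle goes out of scope, but the closure's handle
/// keeps the data alive until the closure itself is dropped.
pub fn shared_suffix_demo() -> Vec<i32> {
    let append34 = {
        let suffix: Rc<[i32]> = Rc::from(vec![3, 4]);
        make_appender_shared(Rc::clone(&suffix))
        // The local `suffix` handle is dropped here; the count falls to 1.
    };
    append34(vec![1, 2])
}
```

Calling shared_suffix_demo() yields [1, 2, 3, 4]. Unlike a tracing GC, the memory is freed as soon as the last handle is dropped, and the sharing is visible in the type.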

Preventing dangling pointers and undefined behavior is what most people mean when they talk about memory safety, but there’s more to the story. As you may have heard, shared mutable state is evil. Rust’s major innovation is to guarantee that mutable state is never shared, and shared state is never mutated.

For example, suppose that after calling make_appender, we mutate the suffix object. Should subsequent calls to the returned closure use the original value, or the new one?

    fn test_make_appender_mutate() {
        let mut suffix = [3, 4];
        let append34 = make_appender(&suffix);
        assert_eq!(vec![1, 2, 3, 4], append34(vec![1, 2]));
        suffix[0] = 5; // Won't compile.
        assert_eq!(vec![1, 2, 3, 4], append34(vec![1, 2]));
    }

— Rust test_make_appender_mutate

    $ cargo test
       Compiling capture v0.1.0 (/home/jeff/git/nested/safety)
    error[E0506]: cannot assign to `suffix[_]` because it is borrowed
      --> src/lib.rs:55:9
       |
    53 |         let append34 = make_appender(&suffix);
       |                                      ------- borrow of `suffix[_]` occurs here
    54 |         assert_eq!(vec![1, 2, 3, 4], append34(vec![1, 2]));
    55 |         suffix[0] = 5; // Won't compile.
       |         ^^^^^^^^^^^^^ assignment to borrowed `suffix[_]` occurs here
    56 |         assert_eq!(vec![1, 2, 3, 4], append34(vec![1, 2]));
       |                                      -------- borrow later used here

    For more information about this error, try `rustc --explain E0506`.

— Rust won’t compile ambiguous code that tries to mutate shared state.

Rust does the only sane thing it can in this situation: It refuses to compile the code. Guess what C++ does?

    void test_make_appender_mutate() {
        std::vector<int> suffix{3, 4};
        auto append34 = make_appender_move(std::move(suffix));
        assert((std::vector<int>{1, 2, 3, 4} == append34({1, 2}))); // OK
        suffix[0] = 5; // Undefined behavior, because suffix was moved
        assert((std::vector<int>{1, 2, 3, 4} == append34({1, 2}))); // Maybe!
    }

— C++ test_make_appender_mutate

    $ make test
    c++ -std=c++20 -pedantic -Wall -Wextra src/safety.cpp -o target/cpp/safety
    target/cpp/safety
    make: *** [Makefile:12: test_safety] Segmentation fault

— C++ undefined behavior can cause segfaults.

You guessed it: undefined behavior! Again, the C++ compiler is of no help. Instead of a failed test, this time we managed a segfault.
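Back on the Rust side, one way to read the borrow checker's complaint is that the mutation has to wait until the borrowing closure is gone. A minimal sketch follows, assuming a hypothetical helper mutate_between_borrows and a make_appender with `move` added so the closure clearly captures the reference by value.

```rust
/// Same `append` as in the post.
pub fn append(mut items: Vec<i32>, suffix: &[i32]) -> Vec<i32> {
    items.extend_from_slice(suffix);
    items
}

/// `make_appender` as in the post, with `move` added (an assumption of
/// this sketch) so the closure copies the reference into itself.
pub fn make_appender(suffix: &[i32]) -> impl Fn(Vec<i32>) -> Vec<i32> + '_ {
    move |items| append(items, suffix)
}

/// Hypothetical helper: the first closure is confined to an inner block,
/// so its borrow of `suffix` has ended by the time we mutate.
pub fn mutate_between_borrows() -> (Vec<i32>, Vec<i32>) {
    let mut suffix = [3, 4];
    let before = {
        let append34 = make_appender(&suffix);
        let result = append34(vec![1, 2]);
        result
    }; // `append34` is dropped here, releasing the shared borrow.

    suffix[0] = 5; // No live borrow, so the assignment is accepted.

    // A fresh closure sees the new value; nothing changed behind anyone's back.
    let append54 = make_appender(&suffix);
    let after = append54(vec![1, 2]);
    (before, after)
}
```

Calling mutate_between_borrows() returns ([1, 2, 3, 4], [1, 2, 5, 4]). The question of "old value or new value?" never arises, because no closure exists across the mutation.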

Go compiles and runs, and lets the mutation proceed, such that modifying the suffix object becomes a backdoor way to change the behavior of the closure:

    func TestMakeAppenderMutate(t *testing.T) {
        t.Skip()
        suffix := []int{3, 4}
        append34 := MakeAppender(suffix)
        want := []int{1, 2, 3, 4}
        check := func() {
            if got := append34([]int{1, 2}); !reflect.DeepEqual(got, want) {
                t.Errorf("append34([]int{1, 2}) = %v; want %v", got, want)
            }
        }
        check()       // OK
        suffix[0] = 5 // Backdoor mutation of shared state
        check()       // FAIL
    }

— Go silently allows a subtle gotcha.

We could debate whether Go’s behavior makes sense, but allowing this kind of spooky action at a distance opens the door to a host of heinous bugs. This problem isn’t specific to Go in particular; in fact, Go is a great example of a modern garbage-collected language. Ownership is rarely clear in GC languages, because really, everything is sort of part-owned by the garbage collector. You may think your object owns its constituent parts, but the GC has a lien against them. GC languages could, in principle, have move semantics and other niceties; but in practice, they do not.
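If Go-style live sharing really is what you want, Rust lets you ask for it explicitly. The sketch below uses Rc<RefCell<...>> from the standard library; make_appender_live and live_suffix_demo are hypothetical names. It is meant to show that the shared mutation becomes visible in the types, not to recommend the pattern.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Same `append` as in the post.
pub fn append(mut items: Vec<i32>, suffix: &[i32]) -> Vec<i32> {
    items.extend_from_slice(suffix);
    items
}

/// Hypothetical variant whose closure reads the suffix at call time.
/// The `Rc<RefCell<...>>` in the signature announces shared mutable state.
pub fn make_appender_live(suffix: Rc<RefCell<Vec<i32>>>) -> impl Fn(Vec<i32>) -> Vec<i32> {
    move |items| {
        let current = suffix.borrow(); // shared borrow, checked at run time
        append(items, &current)
    }
}

/// Reproduces the Go behavior: mutating the suffix changes later calls.
pub fn live_suffix_demo() -> (Vec<i32>, Vec<i32>) {
    let suffix = Rc::new(RefCell::new(vec![3, 4]));
    let append_suffix = make_appender_live(Rc::clone(&suffix));
    let before = append_suffix(vec![1, 2]);
    suffix.borrow_mut()[0] = 5; // explicit, opt-in mutation of shared state
    let after = append_suffix(vec![1, 2]);
    (before, after)
}
```

Calling live_suffix_demo() returns ([1, 2, 3, 4], [1, 2, 5, 4]), the same outcome as the Go test. The difference is that a reader of make_appender_live's signature can tell the suffix may change between calls.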

"All the things" meme: ALL THE OBJECTS... ARE PARTIALLY OWNED BY THE GARBAGE COLLECTOR

We’ve covered a lot of ground in this post, but there’s a great deal more to discuss, such as how object lifetimes, shared state, and mutation interact with concurrency and parallelism. Please leave a comment if you feel we should (or should not!) dig deeper into this topic, or if Rust, C++, and Go aren’t the languages you’d most like to see in future posts.

Dan

Aug 1, 2022

"This post presents a lot of code as screenshots."

Does Substack not provide a way to include fixed-width code blocks at all? If the only limitation is not supporting syntax highlighting, I'd probably give it up to be more accessible.

1 reply by Jeff Schwab