Deeply Nested

Share this post

Type Systems and Alignment

nested.substack.com

Discover more from Deeply Nested

On the craft and industry of Software Engineering
Continue reading
Sign in

Type Systems and Alignment

Jeff Schwab
Jul 26, 2021
1
Share this post

Type Systems and Alignment

nested.substack.com
Share

Computer programming languages use type systems for two distinct purposes:

  1. Defining the physical representation of values in a computer

  2. Helping human programmers think abstractly, without regard to that representation

In a perfect world, each type would serve both purposes equally well, in a harmonious union of form and function. For example, a value of the Rust type i32 may be viewed as both:

  1. A sequence of exactly thirty-two bits, in two's complement

  2. An integer between -2,147,483,648 and 2,147,483,647, inclusive

Sadly, the purposes are sometimes at odds. A classic example is the ordering of an aggregate type's member variables. To serve the first purpose—specifying physical representation—the language should lay out data in exactly the order specified by the programmer. To serve the second, though, the language should allow the programmer to declare variables in whatever order makes logical sense, without regard to the physical implementation of the type.

The C programming language heavily favors the first purpose. For example, consider a C struct having two 16-bit integers, interleaved with two 32-bit integers:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
struct CStruct {
int16_t a; // 2 bytes
int32_t b; // 4 bytes
int16_t c; // 2 bytes
int32_t d; // 4 bytes
};
int main() {
puts("C struct:\n");
printf(" offset of a: %2ld bytes\n", offsetof(struct CStruct, a));
printf(" offset of b: %2ld bytes\n", offsetof(struct CStruct, b));
printf(" offset of c: %2ld bytes\n", offsetof(struct CStruct, c));
printf(" offset of d: %2ld bytes\n", offsetof(struct CStruct, d));
printf(" total size: %2ld bytes\n", sizeof(struct CStruct));
return 0;
}
view raw main.c hosted with ❤ by GitHub

You might think the total size of the struct would be the sum of the sizes of the component values: 2 + 4 + 2 + 4 = 12 bytes. However, each two-byte field must be followed by two more bytes of “padding” to meet the alignment requirements of the subsequent four-byte fields. (“Alignment requirements” mean that some objects can only live at certain addresses in the computer’s memory.) The struct thus has four bytes of wasted space, making its total size 16 bytes:

$ cc -std=c11 -pedantic -Wall -Wextra main.c && ./a.out
C struct:
offset of a: 0 bytes
offset of b: 4 bytes
offset of c: 8 bytes
offset of d: 12 bytes
total size: 16 bytes
view raw nested-type-alignment-c-run.sh hosted with ❤ by GitHub

The compiler could avoid the wasted space by ordering the fields differently, but it won’t, because C makes a point of doing exactly what you tell it, for better or worse. C (and C++) programmers must thus either accept suboptimal run-time resource requirements, or else explicitly order field declarations to avoid padding (usually in order of decreasing size), regardless of whether that order makes any logical sense to someone reading the code.

An arguably better approach is to let programmers declare fields in whatever order makes the code clearest, then let the compiler reorder them for maximal run-time efficiency. This is Rust’s (default) approach:

struct RustStruct {
a: u16, // 2 bytes
b: u32, // 4 bytes
c: u16, // 2 bytes
d: u32, // 4 bytes
}
fn address<T>(r: &T) -> isize {
r as *const _ as isize
}
fn main() {
let object = RustStruct { a: 0, b: 0, c: 0, d: 0 };
let offset = |p| p - address(&object);
println!("Rust struct:\n");
println!(" offset of a: {:2} bytes", offset(address(&object.a)));
println!(" offset of b: {:2} bytes", offset(address(&object.b)));
println!(" offset of c: {:2} bytes", offset(address(&object.c)));
println!(" offset of d: {:2} bytes", offset(address(&object.d)));
println!(" total size: {:2} bytes", std::mem::size_of::<RustStruct>());
}
view raw main.rs hosted with ❤ by GitHub

Even though the source code lists the fields in the logical order a, b, c, d, the Rust compiler produces machine code that orders them b, d, a, c, obviating the wasted padding and minimizing the structure’s size:

$ cargo run
Rust struct:
offset of a: 8 bytes
offset of b: 0 bytes
offset of c: 10 bytes
offset of d: 4 bytes
total size: 12 bytes
view raw nested-type-alignment-rs-run.sh hosted with ❤ by GitHub

In a systems programming language, there should also be a way for programmers to force a specific layout, even if it’s suboptimal. Rust lets you do that by specifying ‘repr(C),’ meaning the type’s physical representation should be C-compatible:

#[repr(C)]
struct CRepr {
a: u16, // 2 bytes
b: u32, // 4 bytes
c: u16, // 2 bytes
d: u32, // 4 bytes
}
fn address<T>(r: &T) -> isize {
r as *const _ as isize
}
fn main() {
let object = CRepr { a: 0, b: 0, c: 0, d: 0 };
let offset = |p| p - address(&object);
println!("Rust struct with C-like layout:\n");
println!(" offset of a: {:2} bytes", offset(address(&object.a)));
println!(" offset of b: {:2} bytes", offset(address(&object.b)));
println!(" offset of c: {:2} bytes", offset(address(&object.c)));
println!(" offset of d: {:2} bytes", offset(address(&object.d)));
println!(" total size: {:2} bytes", std::mem::size_of::<CRepr>());
}
view raw main.rs hosted with ❤ by GitHub

Of course, C compatibility comes with a size (and potentially performance) penalty:

$ cargo run
Rust struct with C-like layout:
offset of a: 0 bytes
offset of b: 4 bytes
offset of c: 8 bytes
offset of d: 12 bytes
total size: 16 bytes
view raw nested-type-alignment-rs-crepr-run.sh hosted with ❤ by GitHub

The moral of the story here is not that language X is better than language Y, but that type systems must juggle the demands of wildly different use cases. Being aware of a particular type system’s faults and virtue can be key to using it effectively.

1
Share this post

Type Systems and Alignment

nested.substack.com
Share
Comments
Top
New
Community

No posts

Ready for more?

© 2023 Jeff Schwab
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing