Meta Memory

Let’s take a tree structure

struct Expr {
	Int(u32),
	Add { lhs: Box<Expr>, rhs: Box<Expr> },
}

and flatten it into a Vec<T>, or contiguous region of memory.

struct Arena(Vec<Expr>);

struct Expr {
	Int(u32),
	Add { lhs: usize, rhs: usize },
}

We can compare this representation to a traditional memory model:

memorymeta-memory
address spacearena
pointerusize
malloc()Vec::push
free()Vec::clear
dereferenceindexing
segfaultout-of-bounds panic

This simple abstraction over memory has some desirable performance and safety characteristics while also being more flexible than pointers, malloc() and free().

Whereas typically dereferencing an invalid pointer would give us a segfault, with an arena bounds-checked indexing causes a runtime panic instead.

Rather than being stuck with 64 bits per pointer, we can now decide how many bits to use for an index depending on how many objects we expect to be allocated. On modern CPU architectures the bottleneck for performance tends to be memory accesses [citation needed], so anything we can do to fit more objects into the CPU cache is important.

Arenas also gain us the ability to deallocate all objects at once by calling .clear() on the vector, which just sets the vector’s len field to 0. (Note that if the element type of the vector had a destructor, Vec::clear would end up calling each element’s destructor.)

We can detect use-after-frees through a generation count. The arena begins at generation number zero, and includes it in the indexes it hands out.

struct Arena<T> {
	data: Vec<T>,
	generation: u32,
}

struct Idx {
	idx: u32, // u32 instead of usize to save memory
	generation: u32,
}

impl<T> Arena<T> {
	fn alloc(&mut self, value: T) -> Idx {
		let idx = self.data.len() as u32;
		self.data.push(value);
		Idx { idx, generation: self.generation }
	}
}

When the arena is cleared, the generation is incremented. Indexing into the arena ensures that the index’s generation and the arena’s generation match.

impl<T> Arena<T> {
	// snip

	fn clear(&mut self) {
		self.generation += 1;
		self.data.clear();
	}

	fn get(&self, idx: Idx) -> &T {
		assert_eq!(idx.generation, self.generation);
		self.data[idx.idx]
	}
}

This way, trying to access an object that has been allocated on the arena after it’s been freed will panic. You could use #[cfg(debug_assertions)] to make these changes apply only in debug builds for extra performance in release builds.

It’s important to note that every one of the arena analogues for typical memory access and management constructs is actually translated to the typical one underneath. For instance, indexes are turned into pointers during indexing, Vec::push can internally call malloc() and a Vec<T> really is just a chunk of memory.

Thanks to RDambrosio016 and lovelymono on the Rust community Discord for reviewing this post. :)

Luna Razzaghipour
8 August 2022