First-Class Types, Syntactically

Zig has popularized the idea of treating types as first-class values that can be passed around and manipulated like any other. Generic data structures are just functions that take a type and return a type. This approach requires compile-time code execution, or “comptime”.

Personally I’m not a fan of comptime because it creates an entire separate “mode of execution” for code. You can’t use your usual debugger to debug comptime code, you can’t use your usual profiler to profile comptime code, and so on. Moreover, the process of compilation becomes less predictable, making real-time compiler-based tooling (like editor integrations) more difficult to implement. How are you supposed to provide an autocomplete list of a type’s methods when those methods could be generated by arbitrary comptime code that might need to be re-executed on every keystroke?

After studying the Odin programming language for some time, I recently realized that you can get part of the way to first-class types without compile-time code execution. Now, why would you want first-class types in the first place? Take this C code:

#include <stdalign.h> // alignof (C11)
#include <stddef.h>   // size_t

typedef struct Arena Arena;
Arena *arena_alloc(void);
void *arena_push(Arena *arena, size_t size, size_t align);

void
demo(void)
{
	Arena *arena = arena_alloc();
	size_t count = 10;
	int *stuff = arena_push(arena, count * sizeof(int), alignof(int));
	// do stuff with `stuff`
}

Those sizeof and alignof expressions would really start grating on me in a codebase full of calls to arena_push. A common solution to this is to define a macro:

#define push_array(arena, T, count) ((T *)arena_push((arena), (count) * sizeof(T), alignof(T)))

void
demo(void)
{
	Arena *arena = arena_alloc();
	int *stuff = push_array(arena, int, 10);
}
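
For anyone who wants to compile the snippet above, here is a minimal sketch that fills in the missing definitions. The Arena layout, the fixed 64 KiB capacity, the NULL-on-exhaustion behaviour, and arena_release are all assumptions made for illustration; the original only declares arena_alloc and arena_push.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical layout: a fixed-capacity bump allocator.
typedef struct Arena Arena;
struct Arena {
	unsigned char *base;
	size_t used;
	size_t capacity;
};

Arena *
arena_alloc(void)
{
	size_t capacity = 64 * 1024; // arbitrary size chosen for this sketch
	Arena *arena = malloc(sizeof(Arena));
	if (arena == NULL) return NULL;
	arena->base = malloc(capacity);
	if (arena->base == NULL) {
		free(arena);
		return NULL;
	}
	arena->used = 0;
	arena->capacity = capacity;
	return arena;
}

// Hypothetical helper, not part of the original interface.
void
arena_release(Arena *arena)
{
	free(arena->base);
	free(arena);
}

void *
arena_push(Arena *arena, size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0); // power of two
	// Round the current address up to the next multiple of align.
	uintptr_t current = (uintptr_t)arena->base + arena->used;
	uintptr_t aligned = (current + (align - 1)) & ~(uintptr_t)(align - 1);
	size_t start = (size_t)(aligned - (uintptr_t)arena->base);
	// Return NULL instead of growing when the block would not fit.
	if (start > arena->capacity || size > arena->capacity - start) return NULL;
	arena->used = start + size;
	return arena->base + start;
}

// The cast to T * is what makes a mismatched destination pointer visible.
#define push_array(arena, T, count) ((T *)arena_push((arena), (count) * sizeof(T), alignof(T)))

// Fills ten ints with 0..9 and returns their sum, or -1 on allocation failure.
int
demo(void)
{
	Arena *arena = arena_alloc();
	if (arena == NULL) return -1;
	int *stuff = push_array(arena, int, 10);
	if (stuff == NULL) {
		arena_release(arena);
		return -1;
	}
	int sum = 0;
	for (int i = 0; i < 10; i++) {
		stuff[i] = i;
		sum += stuff[i];
	}
	arena_release(arena);
	return sum;
}
```

A real arena would more likely grow by chaining blocks or reserving virtual memory; the fixed buffer here just keeps the sketch short.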

Note how the macro casts arena_push’s return value from void * to T *. In C, void * converts implicitly to any object pointer type, so without the cast a mistake like int *xs = push_array(arena, short, 5); would compile silently; with the cast, the compiler flags the mismatched pointer types. We now have something similar to generic functions in languages like C++ and Rust, but the type parameter is intermingled with the value parameters. Odin provides support for this kind of construct directly in the language:

Arena :: struct {}
arena_alloc :: proc() -> ^Arena {}
arena_push :: proc(arena: ^Arena, size, align: int) -> rawptr {}

// The $ indicates T is a compile-time parameter.
push_array :: proc(arena: ^Arena, $T: typeid, count: int) -> []T {
	ptr := cast([^]T) arena_push(arena, count * size_of(T), align_of(T))
	return ptr[:count]
}

demo :: proc() {
	arena := arena_alloc()
	stuff := push_array(arena, int, 10)
}

This is possible only because the language doesn’t distinguish between types and values on a syntactic level. The alternative is to syntactically separate type parameters from value parameters:

// This syntax is hypothetical.

push_array :: proc<T>(arena: ^Arena, count: int) -> []T {
	ptr := cast([^]T) arena_push(arena, count * size_of(T), align_of(T))
	return ptr[:count]
}

demo :: proc() {
	arena := arena_alloc()
	stuff := push_array<int>(arena, 10)
}

In my opinion this code is harder to read because it doesn’t let the programmer choose what order the parameters should go in, and is more cluttered to boot. Moreover, unifying the syntaxes of type expressions and value expressions doesn’t make a language any more complex, while adding separate “sections” for function type parameters and value parameters increases language complexity.

There’s another, less obvious benefit from unifying the type and value expression syntaxes: you can change the meaning of operators depending on whether they’re being applied to values or types. The most obvious example is unary *: when it’s applied to a value (*value) it dereferences the value, while when it’s applied to a type (*T) it creates a pointer to that type. A different, more interesting example is the call operator, “(”. Usually, the left-hand side is a function while the right-hand side is a comma-separated list of arguments. What if we made the call operator perform casting when the left-hand side is a type?

// In C syntax:
(int)ptr
(void *)num

// With call-cast:
int(ptr)
(*void)(num)

Go uses this syntax; I really like it!

Luna Razzaghipour
27 August 2024