Notes
2026/02/28

Flat tuples and arrays without pointer soup

Here's an idea I've been talking about with Joseph: Let's say we want to avoid the pointer soup of Lisp-like linked lists as the fundamental building block of a dynamically typed functional programming language. Instead, we want to keep data structures on the stack without requiring manual memory management or sacrificing memory safety. What are our options?

Flat bytecode tuples

In languages with static types, we can store data in structs and store these structs on the stack. So let's pretend that we have a struct-like thing in our language and let's call it a tuple (to make clear that it doesn't have a static type like structs). If, for example, we want to manipulate a point (3, 4) of two ints on the stack, we can first push the two ints onto the stack, then store the two topmost elements on the stack in a tuple:

PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE(2)

The above bytecode tells us what we want to happen (create a tuple on the stack with two ints in it), but not how to represent this data on the stack. Let's keep it simple and use the same representation as our value representation. In other words, the runtime value on the stack is just a tiny enum variant MAKE_TUPLE that tracks how many elements preceding it on the stack conceptually belong to this tuple. It acts like a bracket for the preceding elements on the stack:

3 |
4 |
__|

The two tuple elements in the above example have the same size, but that doesn't have to be the case. Tuples can be nested and can contain other tuples. Let's store our previous tuple and two other ints (that were previously pushed onto the stack) in another tuple:

PUSH_INT(1)
PUSH_INT(2)
PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE(2)
MAKE_TUPLE(3)

Conceptually:

1   |
2   |
3 | |
4 | |
__| |
____|

Using such a runtime representation has a big disadvantage: We don't know the exact size of the tuple (on the stack) without looking at its elements, because we only know how many elements the tuple stores, not how big each element is. We need to traverse all elements to know how big a tuple is, which makes accessing the stack quite inefficient. So let's change our representation and store both the number of elements and the sum total of all element sizes:

PUSH_INT(1)
PUSH_INT(2)
PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE { elems: 2, total_size: 2 }
MAKE_TUPLE { elems: 3, total_size: 5 }

If we're interested in the first element (at the top of the stack) of our 3-element tuple, we can now skip the entire sub-tuple without having to traverse it directly.

Flat bytecode arrays

However, we still don't have a way to access individual elements in O(1) time, because elements can still have different sizes. What if we want efficient array-like access? That's easy if we just require each element to be padded so that all elements have the same size. Similarly to tuples, we want to be able to compute the size of the entire array without having to walk the stack, so let's store the size per element in the array:

PUSH_INT(1)
PUSH_INT(2)
PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE { elems: 2, total_size: 2 }
MAKE_ARRAY { elems: 3, elem_size: 3 }

Conceptually:

    |
    |
1   +
    |
    |
2   +
3 | |
4 | |
__| +
____|

Optional boxing

What if we want to store a large tuple in an array that otherwise holds only smaller elements? Padding every element would be wasteful. Other languages solve that by boxing each array element, so that the size of an element is just the size of a pointer. Let's do the same thing, by introducing boxing as an optional concept:

PUSH_INT(1)
PUSH_INT(2)
PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE { elems: 2, total_size: 2 }
BOX
MAKE_ARRAY { elems: 3, elem_size: 1 }

Conceptually, now with separate stack and heap:

HEAP:
3 |
4 |
__| // <- tuple stored at heap location `x`

STACK:
1  +
2  +
&x +
___|

How do we guarantee memory safety? The usual candidates are an option: We could use GC, or just do reference counting if we want to start with something simple. Crucially, most of our values (even nested data structures) can still live on the stack and boxing becomes an explicit feature instead of being the implicit default.

Stack copies and references

But what if we want to store the same tuple more than once, for example the same point three times? So far we don't have a way to copy data and use it multiple times. We can simply copy atomic values like ints, but what about large tuples? In a language that boxes all values, we'd just have to copy a pointer (potentially incrementing its ref count), but our tuples/arrays live on the stack and can be large, so we want to avoid copying. So instead of copying the whole collection, let's just create a reference that points to the location of the data on the stack:

PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE { elems: 2, total_size: 2 }
COPY
COPY
MAKE_ARRAY { elems: 3, elem_size: 3 }

Conceptually:

      3 | |
      4 | |
+-+-> __| +
| |       |
| |       |
| +---@   +
|         |
|         |
+-----@   +
      ____|

Notice how we have copied the (3, 4) tuple twice (so that the outer array directly contains it and two copy), but we could also copy it three times so that the array ends up more compact (whether this makes sense is for the compiler to decide):

PUSH_INT(3)
PUSH_INT(4)
MAKE_TUPLE { elems: 2, total_size: 2 }
COPY
COPY
COPY
MAKE_ARRAY { elems: 3, elem_size: 1 }

Conceptually:

        3 |
        4 |
+-+-+-> __|
| | +---@   +
| +-----@   +
+-------@   +
        ____|

Returning from functions

Function arguments create copies in the same way: Atomic values are copied directly, pointers to boxed values are copied by increasing their ref count, tuples/arrays on the stack are copied by creating stack references. But this raises an important question: What happens when a reference to a value on the stack outlives the function call frame it was declared in, by being returned (either directly or as part of a data structure) from the function that created it?

Each function return effectively creates a threatened area of values that will be deallocated from the stack, but which could be referenced by the return value of the function (since stack references can only point to older values on the stack, the values that existed before the current function was called cannot contain references to the function's call frame).

Going back to our last example, let's assume that both the tuple (3, 4) and the array containing the tuple's three copies were created in the current function. We cannot directly return the array with the three copies of (3, 4) from the function, because the tuple is in the threatened area:

        3 |     |
        4 |     |-- threatened area
+-+-+-> __|     |
| | +---@   +      |
| +-----@   +      |-- return value
+-------@   +      |
        ____|

However, we can scan the return value for the minimum and maximum stack reference into the threatened area, which gives us the smallest slice that we need to keep alive to preserve all stack references. In our exampe above, there's only a single stack reference, so the slice is a single element. We then need to take that slice and move it onto the heap, while changing all the affected stack references to point to a location on the heap (conceptually to a heap-allocated stack fragment on the heap + offset into this fragment).

While this sounds simple, there are a few subtleties and potential knobs to tweak:

If a function returns a data structure that references both values that were created very early and very late on the function's stack, the stack fragment that needs to be moved onto the heap can include the entire call frame. A smarter collection algorithm (e.g. compacting the stack fragment before moving it) might make sense in such a situation. There are a lot of similarities with garbage collection strategies here.
How should boxed values be treated? If stack references can be stored in boxed values, checking the return value of a function might require recursively walking boxed data structures, which could be expensive. An alternative might be to let boxed data structure trigger the move from stack to heap for all of its affected stack references, but this would potentially copy also data that never outlives the enclosing function. Recursively walking boxed data structure might thus be the better option.
Determining the min/max stack references of the threatened area requires scanning the return value, but then the return value's stack references also need to be updated so that they point at the newly allocated stack fragment on the heap, which requires walking the return value again and fixing all the pointers. Is there a better way to do this?

Cleaning up stack references in such a way is effectively a weird mix between Rust's ownership model and standard garbage collection: Just like in Rust, references are allowed as long as the borrowed data is alive. But in contrast to Rust, this guarantee is enforced dynamically, which requires GC-like clean up operations for the threatened part of the stack.

Lastly, all of the above sounds pretty similar to what happens for closures, because both cases are about keeping referenced data alive longer than the call frame the data was created in. Is the above idea just an extension of closures to all data?

I'm not aware of languages with this memory management model, so I'm hoping to build a small prototype to see how such a language performs in practice.