Notes
2026/03/07
(Related to flat tuples and arrays, explicit boxing.)
Here's a problem with keeping data flat on the stack: when a function returns, any data it created gets thrown away. But the return value might reference some of that data via references to the stack. We need to keep the referenced data alive somehow.
The obvious answer is to move it to the heap. But here's another option: compact the referenced data in place, throw away the unreferenced garbage, and put the surviving data in front of the return value as invisible “ghost data”.
Here's the main idea: stack references are only produced for tuples and arrays (all other values are atomic), and they can only appear inside tuples and arrays. If the return value itself is a reference, we can treat it as returning the location it points to. So we're always dealing with structured data on both sides: the return value contains references, and they point to tuples/arrays in the threatened area (the region of the returning function's frame that is about to be popped).
Instead of moving the threatened data onto the heap, we can keep the referenced data on the stack, right before the return value, as “ghost data”. The return value declares a larger size than it needs, wrapping around whatever ghost data it carries:
3 |                      3 |       |
4 |                      4 |       |
__| |-- threat.          __|       |
                                   |-- return val. (enlarged)
               =>                  |
@ |                      @ |       |
  |-- return val.          |       |
@ |                      @ |       |
____|                    ____|_____|
The left side is the situation before the return: the return value (an array of two references) points into the threatened area. The right side is what we want: the threatened data has been compacted and put in front of the return value, and the whole thing is wrapped in a tuple that declares a total_size covering both the ghost data and the original return value.
This is straightforward for tuples, which already declare a total_size. Arrays need to be wrapped in a 1-element tuple whose total_size is larger than the single element, to accommodate the ghost data.
In some cases the entire threatened area is still referenced by the return value, so nothing needs to move at all; we just inflate the return value's size to wrap everything. But in most cases the threatened area contains data that isn't referenced by the return value. We need to compact and keep only the reachable parts and throw the rest away.
We can do this in four linear passes over the threatened area and return value, using the total_size / elem_size fields of tuples/arrays as scratch space.
Here's a concrete example. A function creates three tuples and returns references to the first and third. Tuple B is garbage:
0: 1 |
1: 2 |
2: Tuple { 2, 2 } | A
3: 5 |
4: 6 |
5: Tuple { 2, 2 } | B (garbage)
6: 3 |
7: 4 |
8: Tuple { 2, 2 } | C
---- threatened area ----
9: @2 |
10: @8 |
11: Tuple { 2, 2 } | return value
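For concreteness, here is one way to model this slot layout in Rust. The `Slot` enum and its field names are illustrative assumptions, not part of the design; a tuple marker sits above its elements and records arity (`len`) and extent in slots (`size`):

```rust
// A minimal model of the stack: each slot is an atomic value, a
// reference to a lower stack position, or a tuple/array marker.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Val(i64),                        // atomic value
    Ref(usize),                      // reference to a lower stack position
    Tup { len: usize, size: usize }, // marker above its `size` element slots
}
use Slot::*;

fn main() {
    // Slots 0..9 are the threatened area, slots 9..12 the return value.
    let stack = vec![
        Val(1), Val(2), Tup { len: 2, size: 2 }, // A at 2
        Val(5), Val(6), Tup { len: 2, size: 2 }, // B at 5 (garbage)
        Val(3), Val(4), Tup { len: 2, size: 2 }, // C at 8
        Ref(2), Ref(8), Tup { len: 2, size: 2 }, // return value at 11
    ];
    assert_eq!(stack.len(), 12);
    assert_eq!(stack[11], Tup { len: 2, size: 2 });
}
```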
Pass 1: Mark (top to bottom). We walk from the top of the stack (end of the return value) down to the start of the threatened area. Whenever we encounter a reference pointing into the threatened area, we know its target is a tuple or array. We set the high bit of the target's total_size / elem_size (the “mark bit”), preserving the original size in the lower bits.
Slot 10 is @8, pointing to tuple C. Mark it. Slot 9 is @2, pointing to tuple A. Mark it.
Marking is transitive in two ways. First, through references: if a marked tuple's elements contain a reference to another tuple in the threatened area, that target must also be marked. Second, structurally: if a marked tuple contains a child tuple as an inline element (not via a reference, but physically within its extent), that child must also be marked, otherwise later passes would mistake it for garbage. To handle structural transitivity without allocation, we maintain an inside_floor variable: the lowest stack position that falls within any marked tuple's extent. When we mark a tuple at position i with size s, we set inside_floor = min(inside_floor, i - s). Any tuple encountered at a position ≥ inside_floor is structurally inside a marked parent and gets marked too. An unmarked tuple below inside_floor is garbage; we skip past its elements.
Since refs always point to older (lower) positions on the stack, a top-to-bottom walk encounters referencing tuples before their targets, so both kinds of transitivity resolve in a single linear pass.
2: Tuple { 2, 2|M } <- marked (reachable)
5: Tuple { 2, 2 } <- unmarked (garbage)
8: Tuple { 2, 2|M } <- marked (reachable)
(M means the mark bit is set. The original size is preserved in the lower bits.)
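A sketch of the mark pass under a simple slot model (the `Slot` enum, `MARK` constant, and field names are illustrative assumptions; a marker sits above its elements):

```rust
const MARK: usize = 1 << (usize::BITS - 1); // mark bit in the size field

#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Val(i64),
    Ref(usize),
    Tup { len: usize, size: usize }, // marker above its `size` element slots
}
use Slot::*;

/// Pass 1: walk from the top of the stack down to the start of the
/// threatened area, setting the mark bit on every reachable marker.
fn mark_pass(stack: &mut [Slot], threat_start: usize, threat_end: usize) {
    let mut inside_floor = usize::MAX; // nothing is inside a marked extent yet
    let mut i = stack.len();
    while i > threat_start {
        i -= 1;
        match stack[i] {
            // A reference into the threatened area: mark its target.
            Ref(t) if t >= threat_start && t < threat_end => {
                if let Tup { len, size } = stack[t] {
                    stack[t] = Tup { len, size: size | MARK };
                }
            }
            // A marker inside the threatened area.
            Tup { len, size } if i < threat_end => {
                if size & MARK != 0 || i >= inside_floor {
                    // Reachable: directly marked, or physically inside a
                    // marked parent's extent. Mark it and lower the floor.
                    stack[i] = Tup { len, size: size | MARK };
                    inside_floor = inside_floor.min(i - (size & !MARK));
                } else {
                    // Garbage: skip past its elements.
                    i -= size;
                }
            }
            _ => {}
        }
    }
}

fn main() {
    let mut stack = vec![
        Val(1), Val(2), Tup { len: 2, size: 2 }, // A
        Val(5), Val(6), Tup { len: 2, size: 2 }, // B (garbage)
        Val(3), Val(4), Tup { len: 2, size: 2 }, // C
        Ref(2), Ref(8), Tup { len: 2, size: 2 }, // return value
    ];
    mark_pass(&mut stack, 0, 9);
    assert_eq!(stack[2], Tup { len: 2, size: 2 | MARK }); // A: reachable
    assert_eq!(stack[5], Tup { len: 2, size: 2 });        // B: garbage
    assert_eq!(stack[8], Tup { len: 2, size: 2 | MARK }); // C: reachable
}
```

Note that inside_floor is only lowered when the walk actually reaches a marked marker, not when the marking reference is seen; otherwise unrelated tuples between the reference and its target could be mistaken for structural children.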
Pass 2: Compute gaps (bottom to top). We walk from the start of the threatened area back toward the return value, maintaining a running gap counter. When we encounter a tuple/array with the mark bit set (reachable), we clear the mark bit and store the current gap in its size field, so it now records how far this block needs to shift during compaction. When we encounter a tuple/array without the mark bit (garbage), we add its full extent to the gap and set its size to usize::MAX.
Slot 2 has mark bit (reachable), gap is 0, store 0. Slot 5 has no mark bit (garbage), add 3 to the gap (2 element slots plus the marker slot), set size to usize::MAX. Gap is now 3. Slot 8 has mark bit (reachable), store 3.
2: Tuple { 2, 0 } <- gap = 0, no shift
5: Tuple { 2, usize::MAX } <- garbage
8: Tuple { 2, 3 } <- gap = 3, shift down by 3
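The gap pass, sketched under the same illustrative slot model, starting from the marked state shown after pass 1. This sketch handles the flat tuples of the worked example; nested extents would need the structural bookkeeping described earlier:

```rust
const MARK: usize = 1 << (usize::BITS - 1); // mark bit in the size field

#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Val(i64),
    Ref(usize),
    Tup { len: usize, size: usize }, // marker above its `size` element slots
}
use Slot::*;

/// Pass 2: walk the threatened area bottom to top, turning each marker's
/// size field into either its shift distance (reachable) or usize::MAX
/// (garbage).
fn gap_pass(stack: &mut [Slot], threat_start: usize, threat_end: usize) -> usize {
    let mut gap = 0;
    for i in threat_start..threat_end {
        if let Tup { len, size } = stack[i] {
            if size & MARK != 0 {
                stack[i] = Tup { len, size: gap }; // how far this block shifts
            } else {
                gap += size + 1; // its elements plus the marker slot itself
                stack[i] = Tup { len, size: usize::MAX };
            }
        }
    }
    gap
}

fn main() {
    // State after the mark pass: A and C carry the mark bit, B does not.
    let mut stack = vec![
        Val(1), Val(2), Tup { len: 2, size: 2 | MARK },
        Val(5), Val(6), Tup { len: 2, size: 2 },
        Val(3), Val(4), Tup { len: 2, size: 2 | MARK },
        Ref(2), Ref(8), Tup { len: 2, size: 2 },
    ];
    let gap = gap_pass(&mut stack, 0, 9);
    assert_eq!(gap, 3);
    assert_eq!(stack[2], Tup { len: 2, size: 0 });          // A: no shift
    assert_eq!(stack[5], Tup { len: 2, size: usize::MAX }); // B: garbage
    assert_eq!(stack[8], Tup { len: 2, size: 3 });          // C: shift by 3
}
```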
Pass 3: Fix references (top to bottom). We walk the return value again from top to bottom. Each reference points to a tuple/array whose size field now records the gap. We adjust the reference by that gap amount. After this pass, all references point to where the data will be after compaction.
Slot 10 is @8, target's gap is 3, becomes @5. Slot 9 is @2, target's gap is 0, stays @2.
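The reference-fixing pass, sketched under the same illustrative slot model, starting from the state pass 2 leaves behind:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Val(i64),
    Ref(usize),
    Tup { len: usize, size: usize }, // marker above its `size` element slots
}
use Slot::*;

/// Pass 3: walk the return value top to bottom; every reference's target
/// now stores its gap in the size field, so shift the reference by it.
fn fix_refs(stack: &mut [Slot], threat_start: usize, threat_end: usize) {
    for i in (threat_end..stack.len()).rev() {
        if let Ref(t) = stack[i] {
            if (threat_start..threat_end).contains(&t) {
                if let Tup { size, .. } = stack[t] {
                    stack[i] = Ref(t - size); // size holds the gap after pass 2
                }
            }
        }
    }
}

fn main() {
    // State after pass 2: sizes hold gaps (A: 0, C: 3), B is tagged garbage.
    let mut stack = vec![
        Val(1), Val(2), Tup { len: 2, size: 0 },
        Val(5), Val(6), Tup { len: 2, size: usize::MAX },
        Val(3), Val(4), Tup { len: 2, size: 3 },
        Ref(2), Ref(8), Tup { len: 2, size: 2 },
    ];
    fix_refs(&mut stack, 0, 9);
    assert_eq!(stack[9], Ref(2));  // gap 0: stays @2
    assert_eq!(stack[10], Ref(5)); // gap 3: @8 becomes @5
}
```

References never point at garbage (their targets were marked in pass 1), so the usize::MAX tag is never read here.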
Pass 4: Compact (bottom to top). We walk the threatened area one last time from bottom to top, maintaining a write position. Non-marker slots are eagerly copied to the write position. When we encounter a tuple/array marker with size == usize::MAX, it's garbage, so we roll the write position back by the element extent (undoing the eager copies of this tuple's elements) and skip the marker. For all other markers (reachable), the elements were already eagerly copied. We write the marker at the write position with its total_size / elem_size recomputed from the elements below it.
Slots 0-2 (tuple A): gap is 0, already in place. Restore size to 2. Slots 3-5 (tuple B): garbage, skip. Slots 6-8 (tuple C): gap is 3, shift down by 3 to slots 3-5. Restore size to 2. Finally, move the return value down and inflate its total_size:
0: 1 |
1: 2 |
2: Tuple { 2, 2 } | A (ghost)
3: 3 |
4: 4 |
5: Tuple { 2, 2 } | C (ghost, shifted)
6: @2 |
7: @5 |
8: Tuple { 2, 8 } | return value (inflated total_size)
The stack shrank by 3 slots. From the caller's perspective, this is just a tuple of size 8.
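The compaction pass, sketched under the same illustrative slot model for the flat tuples of the worked example (extent == arity; nested extents would need extra bookkeeping to recompute sizes). It starts from the state passes 1-3 leave behind and produces the 9-slot layout above:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Val(i64),
    Ref(usize),
    Tup { len: usize, size: usize }, // marker above its `size` element slots
}
use Slot::*;

/// Pass 4: compact the threatened area bottom to top, then move the
/// return value down and inflate its total_size to cover the ghost data.
fn compact(stack: &mut Vec<Slot>, threat_start: usize, threat_end: usize) {
    let mut w = threat_start; // write position
    for r in threat_start..threat_end {
        match stack[r] {
            // Garbage marker: undo the eager copies of its elements.
            Tup { len, size } if size == usize::MAX => w -= len,
            // Reachable marker: restore its size (flat: extent == arity).
            Tup { len, .. } => {
                stack[w] = Tup { len, size: len };
                w += 1;
            }
            // Element slot: eagerly copy to the write position.
            s => {
                stack[w] = s;
                w += 1;
            }
        }
    }
    // Shift the return value down over the freed slots and drop the rest.
    let rv_len = stack.len() - threat_end;
    for k in 0..rv_len {
        stack[w + k] = stack[threat_end + k];
    }
    stack.truncate(w + rv_len);
    // Inflate the return value's marker to wrap the ghost data below it.
    let top = stack.len() - 1;
    if let Tup { len, .. } = stack[top] {
        stack[top] = Tup { len, size: top - threat_start };
    }
}

fn main() {
    // State after passes 1-3: gaps stored, references already fixed.
    let mut stack = vec![
        Val(1), Val(2), Tup { len: 2, size: 0 },
        Val(5), Val(6), Tup { len: 2, size: usize::MAX },
        Val(3), Val(4), Tup { len: 2, size: 3 },
        Ref(2), Ref(5), Tup { len: 2, size: 2 },
    ];
    compact(&mut stack, 0, 9);
    assert_eq!(stack, vec![
        Val(1), Val(2), Tup { len: 2, size: 2 }, // A (ghost)
        Val(3), Val(4), Tup { len: 2, size: 2 }, // C (ghost, shifted)
        Ref(2), Ref(5), Tup { len: 2, size: 8 }, // inflated return value
    ]);
}
```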
A child tuple/array might be shifted before its parent (since we walk bottom to top). But because children always precede their parents on the stack and we're shifting everything toward the bottom (closing gaps), children will already be in their final position by the time we reach the parent.
Boxed values never contain stack references. When a value is boxed, all of its contents are deep-copied onto the heap. If the value contains references to stack data, those targets are recursively copied as well:
STACK:          HEAP:
3 |             3 |
4 |             4 |
__|             __| <-- heap location `x`
@ +      =>     @x +
@ +             @x +
____|           ____| <-- heap location `y`
&y // stack now holds a pointer to `y` on the heap
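A sketch of this deep copy, under the same illustrative slot model, with the heap modeled as a single flat Vec (a real implementation would allocate fragments). A `seen` map is added here so a target referenced twice, as in the diagram, is copied only once:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Slot {
    Val(i64),
    Ref(usize),
    Tup { len: usize, size: usize }, // marker above its `size` element slots
}
use Slot::*;

/// Box the tuple whose marker is at stack position `pos`: deep-copy it
/// onto the heap, recursively copying referenced stack targets first so
/// internal refs can be rewritten to heap positions. `seen` maps stack
/// positions to heap positions, so shared targets are copied only once.
fn box_tuple(
    stack: &[Slot],
    pos: usize,
    heap: &mut Vec<Slot>,
    seen: &mut HashMap<usize, usize>,
) -> usize {
    if let Some(&h) = seen.get(&pos) {
        return h;
    }
    let Tup { len, size } = stack[pos] else {
        panic!("ref targets are always tuples/arrays")
    };
    // Copy the elements, boxing any referenced targets along the way;
    // targets land on the heap *before* the structure that refers to them.
    let elems: Vec<Slot> = stack[pos - size..pos]
        .iter()
        .map(|&s| match s {
            Ref(t) => Ref(box_tuple(stack, t, heap, seen)),
            other => other,
        })
        .collect();
    heap.extend(elems);
    heap.push(Tup { len, size });
    seen.insert(pos, heap.len() - 1);
    heap.len() - 1
}

fn main() {
    // The stack from the diagram: a tuple of two refs to the same target.
    let stack = vec![
        Val(3), Val(4), Tup { len: 2, size: 2 }, // target tuple at 2
        Ref(2), Ref(2), Tup { len: 2, size: 2 }, // tuple being boxed at 5
    ];
    let mut heap = Vec::new();
    let y = box_tuple(&stack, 5, &mut heap, &mut HashMap::new());
    assert_eq!(y, 5); // `y` lands right after its target `x`
    assert_eq!(heap, vec![
        Val(3), Val(4), Tup { len: 2, size: 2 }, // x at heap position 2
        Ref(2), Ref(2), Tup { len: 2, size: 2 }, // y at heap position 5
    ]);
}
```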
The internal references inside a boxed structure (pointing to a heap fragment + offset) use exactly the same ref representation as stack references that point into a heap fragment. This means the stack can also hold a reference into a boxed structure, for example from an index lookup into a boxed tuple.
Ideally, the same compaction algorithm should work for both returning from a function (compacting the threatened area on the stack) and boxing (deep-copying referenced data onto the heap). The core logic is the same: find reachable data via references, fix the references, compact it.
But on the stack, the return value sits at the top and the threatened area is below it; compaction shrinks the stack downward. For boxing, the referenced data needs to be placed before the boxed structure on the heap. One option is to grow boxed heap allocations from the end toward the start, so that the compacted data ends up in front of the boxed structure, mirroring the stack layout.
In practice, boxing collects scattered data from across the stack rather than compacting a contiguous range, so the two operations may end up with different implementations despite their conceptual similarity.
What's appealing about using ghost data for function returns is that it keeps the stack as the default home for data. Heap allocation only happens when the programmer explicitly asks for it via boxing. The four-pass compaction is more complex than a simple heap move, but it preserves the property that most data never leaves the stack (which was the whole point of flat tuples and arrays to begin with).