"Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array."
Zig compiler pipeline (AST, Zir, Air, Sema) does exactly this on all layers. Not only contiguous, but instead of array-of-structs it is struct-of-arrays, so walking the tree is even more cache friendly. For AST see: https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...
Makes me wonder if people in APL/J/K community have not been influenced or influencing this kind of technique. IIRC Aaron Hsu does tree processing through arrays (but i'm not skilled enough to analyze his code)
Amazing article, very good advice to keep your data structures flat.
Adding to that, it also makes editing the AST vastly more efficient.
I have discovered that principle on my own when I worked on an editor that directly operated on the AST instead of text. I found manipulating the tree-style AST so painful, constantly traversing the tree and all. Once I made it flat, my life was a hell lot easier. You can just directly index any part of AST in linear time.
As the article mentions, this makes it quite similar to a bytecode vm. I think the traditional wisdom is that an AST walker is easy to write, but for speed you'd want a bytecode interpreter. It'd be interesting to see how close the performance gets with this flattened AST.
In practice I think there are more differences. E.g. AST interpreters tend to pass environments around while bytecode interpreters often store these on a stack (though I guess there's nothing stopping you from doing this with an AST either). I wonder if there's some goldilocks zone for ease of implementation with decent performance.
If you instead flatten the expression tree into RPN, then you can execute it like that, with a stack machine.
I seem to recall that the Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman [1988]) describes a technique whereby intermediate code is represented in RPN, and transformations are performed by pattern matches on it.
> Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array.
This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during sweep phases of GC which run through heaps in order of address.
Don't make me bring out the L word for the billionth time.
> A flat array of Exprs can make it fun and easy to implement hash consing
OK, it's not a case of L-ignorance, just willful neglect.
This is a fantastic idea. AST works well in an array based allocation block since it has no need for freeing individual nodes. It’s an add-only allocation.
Cool! Carbon is doing exactly this. I had asked leads if there was a paper on this approach, but they didn't have anything for me. I'll send them this post!
Adding to that, it also makes editing the AST vastly more efficient.
I have discovered that principle on my own when I worked on an editor that directly operated on the AST instead of text. I found manipulating the tree-style AST so painful, constantly traversing the tree and all. Once I made it flat, my life was a hell lot easier. You can just directly index any part of AST in linear time.
In practice I think there are more differences. E.g. AST interpreters tend to pass environments around while bytecode interpreters often store these on a stack (though I guess there's nothing stopping you from doing this with an AST either). I wonder if there's some goldilocks zone for ease of implementation with decent performance.
I seem to recall that the Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman [1988]) describes a technique whereby intermediate code is represented in RPN, and transformations are performed by pattern matches on it.
This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during sweep phases of GC which run through heaps in order of address.
Don't make me bring out the L word for the billionth time.
> A flat array of Exprs can make it fun and easy to implement hash consing
OK, it's not a case of L-ignorance, just willful neglect.
> A sufficiently smart memory allocator might achieve the same thing, especially if you allocate the whole AST up front and never add to it
> Again, a really fast malloc might be hard to compete with—but you basically can’t beat bump allocation on sheer simplicity.