Congratulations to Michael Knyszek and Austin Clements for writing an absolutely top tier blog post that is as clear as it gets. I wish my writing was this good. I don't even use Go and it was still 100% a great use of my time to read this.
Wow… this is an excellent article. I’ve always been fascinated by GCs (well, as long as I’ve known what they are), and I just love seeing this kind of technical but accessible explanation of how they work, their bottlenecks, and a great new idea about solving those bottlenecks. This is exactly the kind of article that I hope to see every time I load up hacker news
If you haven't read about the CHICKEN Scheme GC, ask your fave AI "How does the CHICKEN Scheme Cheney on the M.T.A. garbage collector work?" It allocates heap objects on the C stack and, when the stack runs out, does a minor collection that copies the live objects out to the heap before resetting the stack. Everything runs as continuations and no function ever returns, which is the reference to the song with similar lyrics. Since both the stack and the recently copied heap objects are contiguous, it makes great use of the CPU cache.
Search doesn't work as well as it used to--I couldn't find the original article I had read. AI explanations are often better than random search results, but I don't really want to copy/paste.
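If it helps, here's a toy Go sketch of the Cheney-style evacuation step the M.T.A. trick relies on (not the real thing; it skips the continuation-passing and stack-reset machinery, and all the names are made up). Live objects are copied breadth-first into a fresh space, and the copies themselves double as the scan queue, which is why survivors end up contiguous.

```go
// Toy Cheney-style copying step: the evacuation idea that Cheney on the
// M.T.A. performs when the C stack fills up, minus the continuation
// machinery. All names here are invented for the sketch.
package main

import "fmt"

type obj struct {
	val  int
	refs []*obj // outgoing pointers
}

// copyLive evacuates everything reachable from roots into a fresh space,
// breadth-first, keeping a forwarding map so shared objects are copied once.
func copyLive(roots []*obj) []*obj {
	forward := map[*obj]*obj{}
	var toSpace []*obj // the new, contiguous "heap"

	evac := func(o *obj) *obj {
		if o == nil {
			return nil
		}
		if c, ok := forward[o]; ok {
			return c
		}
		c := &obj{val: o.val, refs: append([]*obj(nil), o.refs...)}
		forward[o] = c
		toSpace = append(toSpace, c)
		return c
	}

	newRoots := make([]*obj, len(roots))
	for i, r := range roots {
		newRoots[i] = evac(r)
	}
	// Cheney's trick: the copied objects themselves are the work queue.
	for scan := 0; scan < len(toSpace); scan++ {
		o := toSpace[scan]
		for i, r := range o.refs {
			o.refs[i] = evac(r)
		}
	}
	return newRoots
}

func main() {
	shared := &obj{val: 3}
	a := &obj{val: 1, refs: []*obj{shared}}
	b := &obj{val: 2, refs: []*obj{shared}}
	garbage := &obj{val: 99}
	_ = garbage // unreachable from the roots, so it is never copied
	roots := copyLive([]*obj{a, b})
	fmt.Println(roots[0].refs[0] == roots[1].refs[0]) // true: shared object copied once
}
```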
Acceleration using the x86 AVX-512 extensions is especially compelling. Since ARM64 processors are becoming pervasive in server-side systems, is there (or will there be) any optimization using the ARM64 NEON vector instructions in current or future Go versions? (NEON registers are 128 bits wide, instead of the 512 bits of AVX-512, but may still be useful.)
the two little slide decks showing each garbage collector in action are simply wonderful, and really help communicate how this improves go's GC situation
It's also a great CS primer on garbage collection; Go has made me interested in that aspect of software engineering again. It feels important again, unlike with higher-level languages like Java/JS.
A (usually) small amount of memory that is the standard unit all the memory-management hardware and software work with, often 4 KB or 16 KB. Whenever physical memory gets mapped into the logical address space, address space is marked read-only, data is swapped in or out, or logical address space gets mapped to other hardware (say, GPU memory or a network card's buffers), it's usually done a page at a time.
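If you want to see what your OS uses, the standard library will tell you (note that the GC "pages" in the article are a runtime-defined unit, which need not match the OS page size):

```go
// Print the hardware/OS page size on the current machine.
// os.Getpagesize is in the standard library.
package main

import (
	"fmt"
	"os"
)

func main() {
	fmt.Println(os.Getpagesize(), "bytes") // typically 4096 or 16384
}
```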
FTA: “The implementation of Green Tea has a special case for pages that have only a single object to scan. This helps reduce regressions, but doesn’t completely eliminate them.”
Also FTA: “One surprise result of this work was that scanning a mere 2% of a page at a time can yield improvements over the graph flood.”
⇒ I think you’d have to try to get exactly two objects on each page, and they’d have to be small (you’d need to fit over 100 objects in a page for 2 live objects to be <2% of all the objects in the page)
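To put rough numbers on that (assuming 8 KiB pages, which is my recollection of the span size the article uses for small objects):

```go
// Back-of-the-envelope: how many objects fit in a page, and what 2% of that is.
// The 8 KiB page size is an assumption based on my reading of the article.
package main

import "fmt"

func main() {
	const pageSize = 8 << 10 // 8 KiB, assumed
	for _, objSize := range []int{16, 32, 64, 128, 512} {
		n := pageSize / objSize
		fmt.Printf("%4d-byte objects: %4d per page; 2%% of a page ≈ %d objects\n",
			objSize, n, n*2/100)
	}
}
```

So only objects somewhere under ~80 bytes give you 100+ objects per page, where 2 live objects stay under the 2% figure.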
Google Cloud products including GKE (Kubernetes), Cloud Run/Functions, the gcloud CLI, and a number of other utilities and control-plane components sit in direct revenue paths. In the case of Cloud Run/Functions (Go support) and GKE, those products generate direct revenue, and the amount is much higher than you would think.
Kubernetes as a whole is the best example I can think of, given that it's deployed in most modern tech companies and every cloud provider offers a managed service.
That's an application (as is Docker, also built in Go), but the question was about internal Google services, and we don't know because company secrets. It's likely on the rise, though, since Go was created as a replacement for C++, which had been their main backend language alongside Java/Kotlin. One source with the charming name "assbuttass" [0] says all new services are written in Go, with a follow-up by "deathmaster99" saying only 10% of the code is Go, but that was a year ago, and even 10% at Google's scale probably represents tens of millions of LOC.
https://www.more-magic.net/posts/internals-gc.html
I've already been using bitvector SIMD for the sweep portion of mark/sweep. It's neat to see that tracing can be done this way.
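For anyone curious, here's a minimal scalar Go sketch of that sweep idea (invented names, nothing from the actual runtime): walk the allocation and mark bitmaps a word at a time and reclaim whatever is allocated but unmarked. A SIMD version just does the same AND-NOT over wider vectors.

```go
// Toy bitmap sweep: alloc and mark bitmaps cover a span of equally sized slots.
// Anything allocated but not marked is garbage; a real implementation would
// also feed the reclaimed slots back into a free list. Purely illustrative.
package main

import (
	"fmt"
	"math/bits"
)

// sweep returns the indices of dead slots and clears their alloc bits.
func sweep(alloc, mark []uint64) []int {
	var dead []int
	for w := range alloc {
		dying := alloc[w] &^ mark[w] // allocated AND NOT marked
		for dying != 0 {
			bit := bits.TrailingZeros64(dying)
			dead = append(dead, w*64+bit)
			dying &^= 1 << bit
		}
		alloc[w] &= mark[w] // only marked slots stay allocated
		mark[w] = 0         // reset mark bits for the next cycle
	}
	return dead
}

func main() {
	alloc := []uint64{0b1111} // slots 0-3 allocated
	mark := []uint64{0b0101}  // slots 0 and 2 survived marking
	fmt.Println(sweep(alloc, mark)) // [1 3]
}
```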
VGF2P8AFFINEQB FTW
there's also the inverse: https://www.felixcloutier.com/x86/gf2p8affineinvqb
00: white
10: gray
11: black
then we can describe it as a very cool variation of the tri-color GC algorithm (sketch below).
https://en.wikipedia.org/wiki/Tracing_garbage_collection#Tri...
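Here's a tiny Go sketch of that encoding (invented names, not the runtime's actual bitmaps): two parallel bit sets, "seen" and "scanned"; 00 is white, 10 is gray (seen but pointers not yet scanned), and 11 is black.

```go
// Two parallel bitmaps give the classic tri-color abstraction:
//   seen=0, scanned=0 -> white (not reached yet)
//   seen=1, scanned=0 -> gray  (reached, pointers not yet scanned)
//   seen=1, scanned=1 -> black (reached and fully scanned)
// Names and layout are invented for illustration.
package main

import "fmt"

type span struct {
	seen    uint64 // one bit per object slot
	scanned uint64
}

func (s *span) color(i int) string {
	switch {
	case s.seen&(1<<i) == 0:
		return "white"
	case s.scanned&(1<<i) == 0:
		return "gray"
	default:
		return "black"
	}
}

// shade moves a white object to gray; blacken records that it has been scanned.
func (s *span) shade(i int)   { s.seen |= 1 << i }
func (s *span) blacken(i int) { s.scanned |= 1 << i }

// grays returns the bitmap of objects still waiting to be scanned:
// the "seen AND NOT scanned" word a page scan would consume.
func (s *span) grays() uint64 { return s.seen &^ s.scanned }

func main() {
	var s span
	s.shade(3)
	fmt.Println(s.color(3), s.color(4)) // gray white
	s.blacken(3)
	fmt.Println(s.color(3), s.grays()) // black 0
}
```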
https://en.wikipedia.org/wiki/Page_%28computer_memory%29
[0] https://www.reddit.com/r/golang/comments/1c9fhet/how_much_go...