So nice to see this get picked up, and honestly surprised to see the interest in what I think of as an extremely esoteric area. A few things:
- Just released an Edition 1.1 that fixed some small errors, amended a few chapters' content, and removed some general bluster. I'm going to try and, well, version these.
- New things are coming to Git, and I suspect I'll be talking about Git Futures or A Post-Git World soon enough.
- There's now a free PDF, https://gitperf.com/pdf.html
- I'll have a couple more highly practical chapters coming soon, focused on pragmatic organizational adoption, e.g., wrapping the git CLI around best practices
I never faced git performance issues when working with code. Guess my repos weren't big. But when I tried to use git as a versioned database of changes in my pet project, I learned a lot about indexes, compacting, etc. The article covers a lot and is very helpful!
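For anyone curious what that housekeeping looks like in practice, these are the kinds of knobs involved (a sketch; `git maintenance` needs a fairly recent Git):

    git count-objects -vH                # loose objects and pack sizes
    git gc                               # repack loose objects, prune unreachable ones
    git commit-graph write --reachable   # precompute the commit-graph index
    git maintenance start                # register background maintenance for this repo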
Surprise, surprise, another piece of LLM-generated slop on the front page of HN.
From chapter 1:
> When Git slows down, engineers adapt in bad ways. They stop asking questions the history could answer. They batch work to avoid sync cost. They keep messy branches alive longer, postpone cleanup, and treat the repository like something slightly dangerous.
From https://gitperf.com/epilogue.html
> Once machines start producing code at machine cadence, the model from this book does not break. What changes is the pace: more branches, more commits, more automation, and more surrounding metadata. The traffic gets louder, and the features that keep Git legible under pressure move from "nice to have" to "essential."
> These stop looking like side optimizations. They are what keep machine-scale Git traffic usable.
I had the same thought. TBH there is nothing in those individual sentences that reads like AI, but when you read them all together I could see it too. I dunno what it is; the only way I can describe it is that it does not sound like a normal human, but rather a monologue from a character trying to sound impressive with each successive sentence.
I think it’s likely there will soon be methods to fix this, some de-slop algorithms. Or is there a deep reason it will always be detectable? Perhaps there are some PhD linguists who have figured out how to quantify the “slop” effect and are writing their theses on it. Once that is done, it will be possible to smooth it away.
The book is definitely LLM-assisted authoring, yet it also has great content, so I'm not sure we can immediately jump to shaming it entirely as slop.
Thanks for the kind words, and for checking out the book here.
I'd written this piecemeal over the last year or so (originally a series of blog posts), and was happy to release it all for free in a single edition, and under CC.
I'll release an Edition 1.1 soon with some errata and adjustments. There's already a free PDF for reading on the go -> https://gitperf.com/pdf.html
Regarding the cherry-picking of LLM fragments: of course an LLM (in fact several!) was used to stitch together those disparate blog posts into a more coherent whole. And they certainly left an imprint in places. Otherwise, as a solo writer with a full-time job putting together a 200-page book, I'd have had to pay an editor, or work with O'Reilly (did this in 2010 on a Redis book; never again!), and perhaps the book wouldn't be free!
LLMs will continue to leave imprints in our work. Some words will, over time, be edited and whittled away. Other words, when the LLM writes well enough to convey a useful point, will be kept.
I think it’s great and you should be doing it. I have no problem at all with LLM assistance in authoring; I think it’s a good thing because, like you said, it enables solo writers with good ideas to produce valuable work that they otherwise wouldn’t!
What I’m interested in is how to address the “grating” quality, or whatever characteristics readers detect that make them focus on the LLM aspect. I feel it’s probably already removable, or soon will be, with some methods.
Ignore the haters; they are just wrong to blanket-criticize. However, their observations are helpful for trying to improve the process. We want LLMs to assist in creating useful and effective content for humans.
Slop is content not written by a human. By definition, there can be no de-slop algorithms. There can only be algorithms that remove certain telltale signs, fraudulently attempting to present non-human-generated content as human-generated.
It's fairly easy to quite thoroughly "de-slop" writing: just feed it chunk by chunk to an agent that compares the writing to a good piece of human writing and adjusts it to match. It won't address structural/content issues, but all the major models are perfectly capable of copying the tone and style of a particular style of writing, and in doing so this tends to remove most of the rough edges.
(The corollary is that the LLM writing you notice is mostly going to be from people who aren't actively trying to hide it from you)
> The book is definitely LLM-assisted authoring, yet it also has great content, so I'm not sure we can immediately jump to shaming it entirely as slop.
Personally I have an extremely hard time reading text like this and it makes me lose trust in the author. Publishing potentially useful Git knowledge this way is a shame.
"Shame" is a strong word to describe a free ebook written for the general good. Happy to have a live conversation with you anytime to discuss Git and its internals to ensure your trust; I have some experience with it.
You probably have a great deal of understanding and knowledge about Git, and this book might be a good resource.
I'm not asking you to do anything differently, and yet I think it's important to realize that people have a deep aversion to text that appears to be LLM-generated.
By "shame", I meant that, just from a skim of the contents of this book, it can be hard to distinguish it from any other LLM-generated text by any other author who has no idea what they're talking about.
That makes people (like me) inclined to discount what it has to say, potentially losing out on good technical content.
Yep, signals are signals, but I think it's quite complicated now. (In any case, this is still the embryonic era of LLMs).
An interesting point to consider: an author that goes out of their way to hide any LLM influence may actually be degrading the signal. Because in that case, you'll not see the LLM's etchings, and misattribute skill to the author under the belief an LLM was not involved. Complicated times.
They wouldn’t be able to publish this useful knowledge easily without it, though. And it’s the author’s guidance and vision that the LLM just helps materialize, so I think we should be studying how to generate content with fewer “slop” features and make it more natural and satisfying for human readers, not discouraging it.
Although these LLMisms still stand out to me, I find them bearable as the glue part of this kind of technical, white-paper-like content.
Maybe I'm already lost in the AI psychosis; maybe some of us are in a transition phase, trying to separate pure synthetic "unmanned slop" from "acceptable slop"; maybe someone could derive the same or more value by taking the prompts that hold the industry experience the author seems to have and pointing them at the git codebase/docs themselves...
In my case (not seriously engaged in git performance since my git game is trivial) I find the explanations from the sections I have limited knowledge of to be very informative.
Similarly, if not performance-focused, I can wholeheartedly recommend Building Git[0], which walks you through building your own git clone in Ruby (although the language is immaterial).
[0]: https://shop.jcoglan.com/building-git/
Git is the industry standard because, for what it gives you, it's a remarkably robust and simple program to use. We're all vaguely aware that the internals are complex, but the UX is clean and usable enough that the complexity usually doesn't leak out.
But the day this breaks down and I have to deal with bloom filters, packfiles, maintaining the git garbage collector, or rerere cleanup is the day I switch our codebase to a centralized VCS.
This stuff is cool to learn about, but it's 5 layers removed from anything I want to be thinking about in my day-to-day work.
I think it is the other way around. Git is pretty simple internally, and its UI is just knobs and levers to reach into that simple, reliable internal structure. This is why for some people it seems like a mess - they want a "do what I want" button (and all people and their needs are different) - and for other people it's clean: open the throttle and the engine will rev.
Agree, the insides are fairly simple and cleanly designed; you could explain exactly how almost everything works in a one-hour presentation, and most people will grok the main ideas fairly easily.
The tooling on top is inconsistent and kind of messy, though, and harder to explain than the internals. I recall hearing somewhere that the tooling we see today as the user tooling was really supposed to just be the tooling for messing with git directly, with the expectation that something would sit above it and make it actually user-friendly. I don't remember where I heard this, though, so it could just be a post-justification from my own brain to explain the situation :)
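That split does still exist in Git as "plumbing" vs "porcelain" commands, and you can build a commit from plumbing alone to get a feel for it (a sketch; run inside a fresh `git init` repo, the branch name is arbitrary):

    echo hello > file.txt
    git hash-object -w file.txt               # store the blob, prints its object id
    git update-index --add file.txt           # stage it in the index
    tree=$(git write-tree)                    # write the index out as a tree object
    commit=$(echo "first commit" | git commit-tree "$tree")
    git update-ref refs/heads/main "$commit"  # point a branch label at the new commit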
> Git is pretty simple internally, and its UI is just knobs and levers to reach into that simple, reliable internal structure.
That's not true either. Originally it was simple internally - it was mostly shell scripts! writing text files! - but now it has all sorts of complicated optimisations.
The "middle" is somewhat simple for CS people, though - a graph of commits, you can put labels on them, and you can send and receive strict appends to the graph from another repository. Both the stuff under and above that is quite complicated in practice, but the UI does continue to improve - e.g. editing a past commit message until the release last week was ... complicated.
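All three pieces of that model are easy to poke at directly (the last line assumes a remote named origin):

    git log --graph --oneline   # the commit graph
    git cat-file -p HEAD        # one node: tree, parent(s), author, message
    git show-ref                # the labels: branches and tags are just pointers
    git fetch origin            # receive new appends to the graph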
> editing a past commit message until the release last week was ... complicated
Was it? `git log --oneline` to find the commit id if it's not really recent, `git rebase -i <commit-id>^`, and then apply the reword action to your commit.
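Spelled out, assuming the commit hasn't been pushed anywhere shared (this rewrites history):

    git log --oneline            # find the short id of the commit to fix
    git rebase -i <commit-id>^   # todo list opens, starting at that commit
    # change 'pick' to 'reword' on its line, save and quit,
    # and Git reopens the editor with the old message for editing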
I'm pretty sure git is the industry standard almost entirely because GitHub exists. And I very much disagree that the UX is clean. The cli is more than a bit of a mess.
> I'm pretty sure git is the industry standard almost entirely because GitHub exists.
Nah, I remember that time vividly; GitHub became a thing about a year or two after git was already very much taking the lead.
GitHub became GitHub because git was the winner. There were alternative hubs that supported Bazaar and Mercurial and whatnot, but git won because, for most people, Linus and the kernel team being behind it was reason enough to trust it.
(and I say this as someone who liked hg more than git)
I mean, I don't think anyone can say for sure whether "GitHub became GitHub because git was the winner" or "git became mainstream because GitHub won the developer mindshare". Pretty much everyone I knew used GitHub for everything besides the actual VCS protocol, although a lot of us early users were on GitHub precisely because of git.
Most people just wanted to collaborate on the platform other people were on, and where the popular projects were; that it used git was just an implementation detail at that point for most, I think.
Git was blazingly fast when it came out, faster than hg (C vs Python) and of course a different order of complexity to svn, which was the actual existing alternative it supplanted.
> Anyone who has ever used Mercurial knows very well what a good versioning tool UX looks like...
So true. I used Mercurial back in the day and also used Darcs before it, and it helped me realize that the best versioning tool UX that exists is still the one Git provides.
PS: Also used CVS, SVN, Perforce, and ClearCase professionally, and gave Fossil a try. None of them even comes close to Git usability-wise.
What is worse is that for about half a year or so, I now have to authenticate my ed25519-sk key with my Yubikey thrice (!) when using LFS. On every push.
Seemingly seconds on every remote-touching command, even on a very small repo.
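If that's over SSH, one mitigation worth trying (an assumption about the setup; the host is an example) is OpenSSH connection multiplexing, so follow-up LFS calls reuse the first authenticated connection instead of touching the key again:

    # ~/.ssh/config
    Host github.com
        ControlMaster auto               # share one connection per host
        ControlPath ~/.ssh/cm-%r@%h-%p   # socket file for the shared connection
        ControlPersist 10m               # keep it open 10 minutes after last use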
Why isn't `git clone --depth 1` the default? I would guess that for at least 90% of the repos I clone, I just want to install something. Even for the rest, I might hack on the code but seldom look into the history. If I do, then I could do a `git fetch` at that point and save the bandwidth and disk space the rest of the time.
https://github.blog/open-source/git/get-up-to-speed-with-par...
https://gitperf.com/chapter-11.html
Thanks. That's great! I especially like that it then lazy loads the blobs as you need them.
I was going to ask if there's a way to set that as the default but I guess I'll just set up an alias like I have for most of the subcommands I use daily.
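Something like this works for the alias (the name `bclone` and the URL are just examples):

    git config --global alias.bclone "clone --filter=blob:none"
    git bclone https://example.com/some/repo.git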
This! The default was to have a link to download a tarball of the source. And if the user wanted to contribute (or check the devel version), you would add a link to the vcs.
Grabbing git repos instead of just tarballs is useful.
A) You can update them, because you can git pull to fetch changes.
B) If you want to apply patches on top, it's better to have version control so you can keep track of what you changed; especially useful if you want to rebase (see the sketch below).
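A minimal version of that workflow (URL and branch names are placeholders):

    git clone https://example.com/upstream.git
    cd upstream
    git switch -c my-patches   # keep local changes on their own branch
    # ...edit, git add, git commit...
    git fetch origin
    git rebase origin/main     # replay your patches onto the new upstream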
A) only valid if you want to stay with the devel version
B) See A
I use OpenBSD, and before that I was on Alpine, Debian, and Arch. If it was software I wanted to try, I downloaded the tarball. If it was something I wanted to keep for longer, I created a port or a custom package.
I think gitignore solves a problem that is hard to solve with the traditional tarball approach.
Downloading a tarball, running ./configure or make, editing a config file here or there, then running `make install` is the most common flow. Nowadays I find myself frequently editing the Dockerfile to make it to my liking. With a git repo, the owners have excluded all the local files, build caches, etc., and you can keep pulling to get updates, stashing and reapplying your local changes. With tarballs, you have to figure it all out again every time: lose your build cache (language-dependent, maybe), lose a change you made here or there, etc.
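The update loop being described is essentially this (assuming the local edits are uncommitted; `--autostash` needs a newer Git):

    git stash       # shelve local changes
    git pull        # fetch and integrate upstream updates
    git stash pop   # reapply local changes
    # or in one step:
    git pull --rebase --autostash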
Fair enough. I also work with a monorepo at work, but that one I cloned like 5 years ago.
If I think about what I've cloned over the last week or so (LazyVim, gstack, my dotfiles), most of the time I just want the current state and be able to pull updates. Even for my dotfiles or projects that I fork and hack on, most of the time I'm just adding commits and it's seldom that I want to go back to historical ones.
Given how often I see `git clone ...` instructions in Github README.md files, I was just wondering how many other people felt the same?
So my contention is that most of the time, `git clone --depth 1` or `git clone --filter=blob:none` is what you actually want, and in the case that you want the full history then you could do `git clone --depth 0` (or `git clone -full` for even better UX, not that the git cli is known for its UX).
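For reference, the spellings that exist today (URL is a placeholder); full history is simply the current default, and a shallow clone can be upgraded later:

    git clone --depth 1 https://example.com/repo.git            # tip commit only
    git clone --filter=blob:none https://example.com/repo.git   # all commits, blobs fetched on demand
    git fetch --unshallow   # convert a shallow clone to a full one afterwards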
It's not the default because that'd be counter-productive for developers who use git with larger repositories, which is how git started life in the first place. Your clone depth would be entirely useless for Linux kernel developers, for example, if it were the default.
I've always wanted to see a book that describes git for the common man and gives them tons of examples for how to use it to do productive things.
Even for a small office, git can be immensely useful. Entire production line workflows can be implemented with git .. if only folks would learn to use it productively.
It's not just for development. Writers can use it productively. Accountants too.
It always kind of irks me that Git hasn't just been folded into the OS front-end UI by any of the OS vendors .. it'd be so revolutionary to give common folks an easy way to manage the timeline/history of their computer use using git.
So? Doesn't matter. Git in that case still provides valuable historical archiving and versioning, which is more useful than the alternative of having none.
Plus, it's chicken and egg. If the OS had a great interface to Git as part of its responsibilities in the Explorer/Finder interface, folks would be more inclined to use text-based file format standards that are coherent with the Git methodology.
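Nothing stops anyone from approximating it by hand today; an OS could simply run the equivalent of this sketch on a schedule (the folder path is assumed):

    cd ~/Documents
    git init                              # once
    git add -A
    git commit -m "snapshot $(date +%F)"  # one commit per snapshot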