I work on a revision control system project, except the merge is CRDT-based. On Feb 22 there was a server break-in (I did not keep unencrypted sources on the client, and server login was YubiKey-only, but that is not a 100% guarantee). I reported the break-in on my Telegram channel that day.
I used tree-sitter for a coarse AST. Some key parts were missing from the server as well, because I expected problems (I have had lots of adventures in East Asia: evil maids and various other incidents on a regular basis).
When I saw "tree-sitter in go" title, I was very glad initially. Solves some problems for me. Then I saw the full picture.
AI often produces nonsense that a human wouldn't. If a project was written using AI, the chances that it is a useless mess are significantly higher than if it was written by a human.
> Pure-Go tree-sitter runtime — no CGo, no C toolchain, WASM-ready.
No you didn't. The readme is obvious LLM slop. Em-dash, rule of three, "not x, y". Why should anyone spend effort reading something you couldn't be bothered to write? Why did you post it to HN from a burner account?
Oh, this is really neat for the Bazel community: building a Gazelle language extension that depends on tree-sitter currently requires CGO, because Gazelle is written in Go.
Now perhaps we can get rid of the CGO dependency and make it pure Go instead.
I have pinged some folks to take a look at it.
Do you have an equivalent of TreeCursors or tree-sitter-generate?
There are at least some use cases where neither queries nor walks are suitable. And I have run into cases where being able to regenerate and compile grammars on the fly is immeasurably helpful.
At least for my use cases, this would be unusable.
Also, what the hell is this:
> partial [..] missing external scanner
Why do you have a parsing mode that guarantees incorrect outputs on some grammars (html comes to mind) and then use it as your “90x faster” benchmark figure?
the 90x figure is measured on Go source, as an apples-to-apples comparison against CGO-bound tree-sitter.
your use case is not one i designed for, although yeah, maybe the readme has some sections that read too close to it. the only external scanner missing atm is norg. now that i know your use case i can probably think of a way to close the gap
206 binary blobs = 15MB, so not crazy, but i built for the use case where you declare the registry of languages you want to load and don't have to own all the grammar binaries by default
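To make the "declare the registry of languages you want" idea concrete, here is a minimal sketch of an opt-in registry with lazy loading. It is not gotreesitter's API: `Registry`, `Grammar`, `Loader`, `Register` and `Load` are hypothetical names made up for illustration, and the grammar blobs are fake byte slices.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Grammar is a stand-in for a loaded grammar (hypothetical type).
type Grammar struct {
	Name string
	Blob []byte
}

// Loader returns the raw grammar bytes; it runs only when the language is requested.
type Loader func() ([]byte, error)

// Registry holds only the languages the caller chose to declare.
type Registry struct {
	mu      sync.Mutex
	loaders map[string]Loader
	loaded  map[string]*Grammar
}

func NewRegistry() *Registry {
	return &Registry{loaders: map[string]Loader{}, loaded: map[string]*Grammar{}}
}

// Register declares a language without loading it.
func (r *Registry) Register(name string, l Loader) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.loaders[name] = l
}

// Load returns the grammar, running its loader on first use and caching the result.
func (r *Registry) Load(name string) (*Grammar, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if g, ok := r.loaded[name]; ok {
		return g, nil
	}
	l, ok := r.loaders[name]
	if !ok {
		return nil, fmt.Errorf("language %q is not registered", name)
	}
	blob, err := l()
	if err != nil {
		return nil, fmt.Errorf("loading %q: %w", name, err)
	}
	g := &Grammar{Name: name, Blob: blob}
	r.loaded[name] = g
	return g, nil
}

// Names lists the declared languages in sorted order.
func (r *Registry) Names() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.loaders))
	for n := range r.loaders {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := NewRegistry()
	// Fake blobs; a real program would read or embed actual grammar data here.
	reg.Register("go", func() ([]byte, error) { return []byte("go-grammar-blob"), nil })
	reg.Register("json", func() ([]byte, error) { return []byte("json-grammar-blob"), nil })

	g, err := reg.Load("go")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reg.Names(), g.Name, len(g.Blob))
}
```

A language that was never registered fails with an error at `Load` time instead of adding its blob to the binary, which is the trade-off described above.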
primarily, got is a structural VCS intended for concurrent edits of the same file.
it does this via gotreesitter and gts-suite abstractions that enable it to:
- have entity-aware diffs
  - not line by line but function by function
- structural blame
  - attribution resolution for the lifetime of the entity
- semver from structure
  - it can recommend bumps because it knows what is a breaking change vs minor vs patch
- entity history
  - because entities are tracked independently, file renames or moves don't affect the entity's history
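A rough sketch of what "function by function" diffing and "semver from structure" could look like. This is not how got or gts-suite work internally; it swaps in Go's standard `go/parser` for tree-sitter and only handles Go source. The names `entities` and `recommend`, and the bump rules (removing or re-signing an exported function is major, adding one is minor, anything else is patch), are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"sort"
)

// entity is one top-level function: its signature and body as printed source.
type entity struct {
	sig, body string
	exported  bool
}

// render prints an AST node back to source text.
func render(fset *token.FileSet, n ast.Node) string {
	var buf bytes.Buffer
	_ = printer.Fprint(&buf, fset, n)
	return buf.String()
}

// entities parses Go source and indexes its functions by name
// (methods are keyed as "ReceiverType.Name").
func entities(src string) (map[string]entity, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := map[string]entity{}
	for _, d := range f.Decls {
		fn, ok := d.(*ast.FuncDecl)
		if !ok {
			continue
		}
		name := fn.Name.Name
		if fn.Recv != nil && len(fn.Recv.List) > 0 {
			name = render(fset, fn.Recv.List[0].Type) + "." + name
		}
		e := entity{sig: render(fset, fn.Type), exported: ast.IsExported(fn.Name.Name)}
		if fn.Body != nil {
			e.body = render(fset, fn.Body)
		}
		out[name] = e
	}
	return out, nil
}

// recommend compares two versions entity by entity and suggests a bump.
func recommend(oldSrc, newSrc string) (string, []string, error) {
	before, err := entities(oldSrc)
	if err != nil {
		return "", nil, err
	}
	after, err := entities(newSrc)
	if err != nil {
		return "", nil, err
	}
	rank := 0 // 0 none, 1 patch, 2 minor, 3 major
	bump := func(r int, exported bool) {
		if !exported {
			r = 1 // changes to unexported functions are never more than a patch
		}
		if r > rank {
			rank = r
		}
	}
	var notes []string
	for name, b := range before {
		a, ok := after[name]
		switch {
		case !ok:
			notes = append(notes, "removed "+name)
			bump(3, b.exported)
		case a.sig != b.sig:
			notes = append(notes, "signature changed "+name)
			bump(3, b.exported)
		case a.body != b.body:
			notes = append(notes, "body changed "+name)
			bump(1, b.exported)
		}
	}
	for name, a := range after {
		if _, ok := before[name]; !ok {
			notes = append(notes, "added "+name)
			bump(2, a.exported)
		}
	}
	sort.Strings(notes)
	return [...]string{"none", "patch", "minor", "major"}[rank], notes, nil
}

func main() {
	v1 := "package p\nfunc Add(a, b int) int { return a + b }\n"
	v2 := "package p\nfunc Add(a, b, c int) int { return a + b + c }\n"
	level, notes, err := recommend(v1, v2)
	fmt.Println(level, notes, err)
}
```

Because entities are keyed by name rather than by file and line, the same index could in principle back entity history across file renames, though that part is not sketched here.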
when gotreesitter can't parse a language, a 3-way text merge happens as a fallback. what the structural merge enables is no conflicts unless the same entity has conflicting changes
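The "no conflicts unless the same entity has conflicting changes" rule can be sketched as a three-way merge over entities instead of lines. This is only an illustration under simplifying assumptions, not got's actual merge: each version of a file is reduced to a hypothetical `map[string]string` from entity name to entity text, and `merge3` is a made-up helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// ver is one side's view of an entity: its text, and whether it exists at all.
type ver struct {
	body string
	ok   bool
}

func at(m map[string]string, k string) ver {
	b, ok := m[k]
	return ver{b, ok}
}

// merge3 merges ours and theirs against their common base, entity by entity.
// An entity conflicts only if both sides changed it, and changed it differently.
func merge3(base, ours, theirs map[string]string) (map[string]string, []string) {
	keys := map[string]bool{}
	for _, m := range []map[string]string{base, ours, theirs} {
		for k := range m {
			keys[k] = true
		}
	}
	merged := map[string]string{}
	var conflicts []string
	for k := range keys {
		b, o, t := at(base, k), at(ours, k), at(theirs, k)
		var pick ver
		switch {
		case o == t: // both sides agree (including both deleting it)
			pick = o
		case o == b: // only theirs changed it
			pick = t
		case t == b: // only ours changed it
			pick = o
		default: // both changed the same entity in different ways
			conflicts = append(conflicts, k)
			continue
		}
		if pick.ok {
			merged[k] = pick.body
		}
	}
	sort.Strings(conflicts)
	return merged, conflicts
}

func main() {
	base := map[string]string{"Parse": "v1", "Print": "v1"}
	ours := map[string]string{"Parse": "v2", "Print": "v1"}   // we edited Parse
	theirs := map[string]string{"Parse": "v1", "Print": "v2"} // they edited Print
	merged, conflicts := merge3(base, ours, theirs)
	fmt.Println(merged, conflicts)
}
```

In the usage above both sides edit the same file but different functions, so the merge takes each side's change and reports no conflicts; a line-based merge could still conflict if the edits happened to be adjacent.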
it is interoperable with git. we like git when it's good, but attempted to ease the UX pains somewhat. you can take advantage of got locally but still push to git remote forges just the same. when you pull stuff in this way, got will load the entity history into the git repo, ensuring that you can still do got stuff locally (inspect entity histories, etc)
yeah, the tests always live with the implementation code (Go thing), and the repo root thing is more of a preference: main is an acceptable package to put stuff in (Go thing). i see this a lot with smaller projects or library-type projects
Better title
My design docs https://replicated.wiki/blog/partII.html
i needed this project so i made it for my use case and had to build on top of it. the only way to ensure quality is to read it all line by line.
if you give me code that you yourself have not reviewed i will not review it for you.
I imagine this can be very useful for Go-based forges that need syntax highlighting (e.g. Gitea, Forgejo).
I have a strict no-cgo requirement, so I might use it in my project, which is a Git+JJ forge: https://gitncoffee.com.
Are these pretty up-to-date grammars? I'm awfully tempted to switch to your project
How large are your binaries getting? I was concerned about the size of some of the grammars
It means the CLI I am working on can ship support for many languages whilst still being a smallish (sub 50mb) download
I shall definitely check it out!
I use CRDT merge though, cause 3-way metadata-less merges only provide very incremental improvements over e.g. git+mergiraf.
How do you see got's main improvement over git?