Improving the performance of WAT parser

(blog.gplane.win)

109 points | by gplane 22 days ago

9 comments

  • thramp 17 days ago
    We (the rust-analyzer team) have been aware of the slowness in Rowan for a while, but other things always took priority. Beyond allocation, Rowan is structured internally as a doubly-linked list to support mutating trees, but: 1. Mutation isn’t really worth it; the API isn’t user-friendly. 2. In most cases, it’s straight up faster to create a new parse tree and replace the existing one. Cache effects of a linked list vs. an arena!

    In fairness, I don’t think we predicted just how large L1/L2 caches would get over the coming years.
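
    (To make the cache argument concrete, here is a rough sketch of the two layouts; these are not Rowan's actual definitions, just an illustration.)

      // Pointer-per-node layout: every node is its own heap allocation and
      // traversal chases pointers scattered across the heap. (Rowan's real
      // structure is a doubly-linked list with reference counting.)
      struct LinkedNode {
          kind: u16,
          first_child: Option<Box<LinkedNode>>,
          next_sibling: Option<Box<LinkedNode>>,
      }

      // Arena layout: all nodes live in one Vec and children are index
      // ranges, so traversal is a mostly linear walk over contiguous memory.
      struct ArenaNode {
          kind: u16,
          children: std::ops::Range<u32>, // indices into `Arena::nodes`
      }

      struct Arena {
          nodes: Vec<ArenaNode>,
      }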

    • testdelacc1 17 days ago
      Is a rewrite of Rowan on the horizon?
      • thramp 15 days ago
        One of the rust-analyzer co-maintainers (Chayim Friedman) already rewrote it, but we can’t integrate it yet, as about 40 assists (the little lightbulbs?) still rely on mutation. If you want something Rowan-like, I think syntree and Biome’s rowan fork are good options to look into.
        • gplane 8 days ago
          How about cstree? https://crates.io/crates/cstree Recently I found that traversal in a rowan tree can take a lot of time; I'm not sure if there's a cheaper way to do that. Or could rowan be integrated with a bump allocator (bumpalo or oxc_allocator)?
  • vjerancrnjak 17 days ago
    It’s funny how there is continuous reinvention of parsing approaches.

    Why isn’t there already some parser generator with vector instructions, PGO, and low stack usage? Instead there are endless rewrites of recursive descent, with caching optimizations sprinkled in when needed.

    • embedding-shape 17 days ago
      Hardware also changes over time, so something that was initially fast gets tried by people with new hardware, who find it's not so fast for them and create their own "fast X". Fast forward 10 more years, and someone with newer hardware finds that one, asks "huh, why isn't it using extension Y", and now we have three libraries all called "Fast X".
    • zahlman 17 days ago
      Because you have to learn how to use any given parser generator, naive code is easy to write, and there are tons of applications for parsing that aren't really performance critical.
    • high_na_euv 17 days ago
      I'd say it's because parsing is a very specific kind of work, heavily dependent on the grammar you're dealing with.
    • munificent 17 days ago
      A parser spends time:

      1. Consuming tokens.

      2. Recognizing the grammar.

      3. Producing AST nodes.

      Steps 1 and 3 are heavily dependent on the data types that make the most sense for the previous (lexing) and next (semantic analysis) phases of the compiler. There is no one Token type that works for every language, nor one AST type.

      Recognizing the grammar (step 2) is the relatively easy part, but since so much of the code is consuming and producing data types that are unique to a given implementation, it's hard to have very high-performance reusable libraries.
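
      For example (hypothetical types, purely illustrative): the token and AST definitions bake in decisions made for the lexer and for semantic analysis, so a reusable "step 2" has nothing stable to plug into.

        // One compiler's choices; another language or implementation would
        // shape these differently (spans, interning, trivia, error nodes...).
        enum TokenKind { Ident, Number, LParen, RParen, Keyword }

        struct Token<'src> {
            kind: TokenKind,
            text: &'src str, // borrowed from the lexer's source buffer
            offset: u32,     // whatever the diagnostics machinery wants
        }

        enum Expr<'src> {
            Number(f64),
            Call { callee: &'src str, args: Vec<Expr<'src>> }, // shaped for the next phase
        }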

    • mgaunard 17 days ago
      There are good parser generators, but potentially not as Rust libraries.
  • writebetterc 17 days ago
    So it went from parsing at 25 MiB/s to 115 MiB/s. I feel like 115 MiB/s is still very slow for a Rust program; I wonder what it's doing that keeps it that slow. No diss to the author: it's a good speedup, and it might be good enough for them.
    • mananaysiempre 17 days ago
      115 MiB/s is something like 20 to 30 cycles per byte on a laptop, 50 on a desktop. That’s definitely quite slow as far as a CPU’s capacity to ingest bytes, but unfortunately about as fast as it gets for scalar (machine) code that does meaningful work per byte. There may be another factor of 2 or 3 to be had somewhere, or there may not be. If you want to go meaningfully faster, as in at least at the speed of your disk[1], you need to stop doing work per byte and start vectorizing. For parsers, that is possible but hard.

      [1] https://www.youtube.com/watch?v=p6X8BGSrR9w
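
      (As a rough check on the cycles-per-byte figure, assuming a ~3 GHz laptop core and a ~5 GHz desktop core: 115 MiB/s ≈ 120.6 MB/s, so 3.0e9 / 120.6e6 ≈ 25 cycles per byte and 5.0e9 / 120.6e6 ≈ 41 cycles per byte, roughly the numbers above.)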

      • zozbot234 17 days ago
        A quick rule of thumb is that one or two bytes per peak clock cycle per core or so (not unlike an old 8 bit or 16 bit machine!) is the worst case for memory bandwidth when running highly multithreaded workloads that heavily access main RAM outside cache. So there's a lot of gain to be had before memory bandwidth is truly saturated, and even then one can plausibly move to GPU-based compute and speed things up further. (Unified memory+HBM may potentially add a 2x or 3x multiplier to this basic figure, but either way it's in the ballpark.)
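
        (For a rough sense of scale, assuming an 8-core desktop at 4 GHz with ~50 GB/s of DRAM bandwidth: 50e9 / (8 × 4e9) ≈ 1.6 bytes per cycle per core, right in that one-to-two-byte range.)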
    • high_na_euv 17 days ago
      "for Rust program"?

      Isn't it more about the grammar than the programming language?

      • writebetterc 17 days ago
        The grammar matters also, of course. A pure Python program is going to be much slower than the equivalent Rust program, just because CPython is so slow.

        I don't know if this does semantic analysis of the program as well.

  • dfajgljsldkjag 17 days ago
    The performance gain from using a single shared vector for the nodes is pretty crazy. It just goes to show how much allocation overhead can slow things down if you are not careful.
  • epage 17 days ago
    > Use hand-written parser
    >
    > The old parser was written with winnow which is a parser combinator library. While it’s easy to create a parser with parser combinators, it’s generally slower than a hand-written parser, so the first step is to write the parser by hands. Hand-written parser is not only faster but also allows to do more optimizations in the future.

    Maintainer of Winnow here. I wish there were more details on this. I switched `toml` / `toml_edit` to being hand-written and got some performance boost, but I feel like the other things listed would have dwarfed the gains that I got. I wonder if there were suboptimal patterns they employed that we could find ways to better guide people around.

    For anyone going on about "hand written is always better", I disagree. Parser combinators offer a great way to map things back to grammar definitions, which makes them much easier to maintain. Only in extreme circumstances of features and/or performance does it seem worth going hand-written to me.
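
    As a tiny illustration of that grammar-to-combinator mapping (a sketch assuming winnow's 0.6-era API; names and signatures vary between versions):

      // Grammar:  list   := "(" number ("," number)* ")"
      //           number := [0-9]+
      use winnow::ascii::digit1;
      use winnow::combinator::{delimited, separated};
      use winnow::prelude::*;

      fn number(input: &mut &str) -> PResult<u32> {
          // `digit1` matches the digits, `parse_to` converts via FromStr.
          digit1.parse_to().parse_next(input)
      }

      fn list(input: &mut &str) -> PResult<Vec<u32>> {
          // Reads almost exactly like the grammar rule above.
          delimited('(', separated(1.., number, ','), ')').parse_next(input)
      }

    The hand-written equivalent is a loop over a cursor; it can be faster, but the correspondence to the grammar is less direct.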

    • gplane 17 days ago
      > "hand written is always better", I disagree. - Yep. As far as I know, winnow provides SIMD in some cases, while for hand written parsers, writing SIMD can be very hard.
  • taylorallred 17 days ago
    It seems to me like parser combinators are always more trouble than they're worth. People often have the impression that parsing is difficult and should be outsourced to another library, but often it's pretty simple to hand-roll and usually it makes faster code.
  • shevy-java 17 days ago
    Anyone using WebAssembly yet? HTML, CSS, JavaScript - all there.

    Just about nobody uses WebAssembly. It first appeared almost ten years ago. This is snail-speed evolution at best.

    • anonymous908213 17 days ago
      People use wasm for things that need wasm. My use case is my cross-platform game engine, because running both natively and in the browser was a priority for me. It is a wonderful tool and it is a truly magical feeling to see my native games running in the browser. But 99% of web developers are developing ordinary websites, so they don't need it. That's not an indictment of wasm.
    • circuit10 17 days ago
      This is like saying "HTML, CSS and JavaScript are all widely used, but the webcam capture API is used way less, so obviously it's a failure"

      In its current scope, WASM is a way to port existing code or accelerate certain computations, which only some applications need. Most websites don't need it, like how most sites don't need to use webcam capture; that doesn't mean it's not useful for those that do

    • miki_oomiri 17 days ago
      You have a mistaken understanding of wasm. It's absolutely not supposed to replace HTML, CSS, or JS.

      And yes, wasm is used widely: on the web for expensive computation (Google Earth, Figma, AutoCAD, Unity games) and server-side for portability and sandboxing (Cloudflare Workers, Fastly, …)

      • IshKebab 17 days ago
        It is definitely meant to replace JS in some applications. It isn't quite there yet for normal web pages but it will be eventually. There are a few front-end web frameworks written in Rust that use WASM.

        The whole "it's not meant to replace JS" thing was just to reduce pushback from JS devs.

        • miki_oomiri 17 days ago
          > The whole "it's not meant to replace JS" thing was just to reduce pushback from JS devs.

          It was born at the same time as WebGL, in the era of JIT optimization for JS engines: first as a subset of JS, then as the wasm we know today. It was originally for games and performance on the web.

          At no point was there a conversation about "replacing JS"; it was more like "JS can't do this stuff, let's have something else".

          • IshKebab 17 days ago
            > At no point there was a conversation about "replacing js"

            There absolutely was. This famous talk was made back when it was still asm.js:

            https://www.destroyallsoftware.com/talks/the-birth-and-death...

            • miki_oomiri 17 days ago
              Not gonna argue with "who said what".

              All I can tell you with certainty is that the people who designed and funded webgl/asmjs/llvm efforts at Mozilla (Alon, Vlad, Brendan, …) clearly understood that wasm was a needed companion alongside JS (and its DOM&co bindings). Not a replacement. I was part of these conversations.

              I understand why people would think it was a JS killer, but that's a naive way of looking at it.

            • Dylan16807 17 days ago
              That talk is intentionally silly and at the end is talking about replacing all binaries, not javascript in particular, and yes that does strongly change the meaning.
    • embedding-shape 17 days ago
      > Anyone using WebAssembly yet?

      Yes, tons. Obviously not all, but large parts of these are WASM: https://itch.io/games/platform-web

      Tools like Figma are only performant because of WASM.

    • mickael-kerjean 17 days ago
      I use it extensively in my OSS work (https://github.com/mickael-kerjean/filestash) mostly around:

      1. creating plugins that get executed in the browser to render files like Parquet, PSD, TIFF, SQLite, EPS, ZIP, TGZ, GIS related files and many more, where C libraries are almost always the reference implementations. There are almost a hundred supported file formats, most of which are supported through WASM

      2. creating plugins that get executed in the server to generate your own endpoint or middleware while being sure you can't start exfiltrating data (which can be other people's files, and other sensitive stuff)

      3. in the workflow engine to enable people to run their own sandboxed scripts without giving those a blank check to go crazy

      • ChadNauseam 17 days ago
        It also simply lets you use Rust on the web. That's why I use wasm. It's actually an extremely nice experience. I write all my business logic in Rust and all my UI logic in JavaScript. There is Rust tooling that automatically generates TypeScript types and APIs for you, which makes interoperating between the two languages basically effortless. And by using Rust/wasm, I can reach a level of performance that would be hilariously impossible in JavaScript.
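
        (The comment doesn't name the tooling; wasm-bindgen is one widely used option, and its CLI or wasm-pack emits the JS glue plus a matching .d.ts. A minimal sketch:)

          // Minimal sketch, assuming the wasm-bindgen crate; `wasm-pack build`
          // generates both the JS bindings and TypeScript type definitions.
          use wasm_bindgen::prelude::*;

          #[wasm_bindgen]
          pub struct Scores {
              values: Vec<f64>,
          }

          #[wasm_bindgen]
          impl Scores {
              #[wasm_bindgen(constructor)]
              pub fn new() -> Scores {
                  Scores { values: Vec::new() }
              }

              // Business logic stays in Rust; JS/TS just calls the generated bindings.
              pub fn push(&mut self, v: f64) {
                  self.values.push(v);
              }

              pub fn mean(&self) -> f64 {
                  if self.values.is_empty() {
                      return 0.0;
                  }
                  self.values.iter().sum::<f64>() / self.values.len() as f64
              }
          }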
    • adzm 17 days ago
      I use a wasm xxhash implementation that is 40x faster than the fastest JavaScript version I can find. Drop-in replacement. Call overhead is minimal, and it could be better with stringref if that ever becomes available. Some other audio analysis stuff in wasm I've been using is 400x faster than the JavaScript implementation, but admittedly I went straight to wasm in that case rather than trying to optimize the JS.
    • onion2k 17 days ago
      I'm writing a point-and-click adventure game, and for that I've built a dialogue editor that uses a local text-to-speech model, running in WASM (or WebGPU if it's available), to turn dialogue text into audio.

      From what I can tell WASM is mostly being used to run big libraries from other languages in web apps. That's not a particularly common thing to need, so it's not commonly used. That doesn't mean it's moving too slowly.

    • TkTech 17 days ago
      Sure, here's a Rust/WASM procedural skybox generator I threw together the other day, and it's much, much faster at 16k renders than JavaScript. https://tkte.ch/night-sky/
    • demaga 17 days ago
      I saw a few web apps that use Rust crates for physics. I guess they must be using wasm?
    • taminka 17 days ago
      wasm isn't meant to supersede HTML/CSS/JS (unfortunately), and it's regularly used for high-performance applications in the browser: web-based CAD software, Figma, YouTube (I think they use wasm for codec fallback when support is spotty), etc.

      There are also games, video work (ffmpeg built for wasm), and ML applications (MLC); in fact it's currently impossible to use wasm without JS to load the wasm binary.

      As a result, the web stack is a bit upside down now, with the seemingly "low level" and "high performance" parts layered on top of the slow bits (JavaScript).

    • flohofwoe 17 days ago
      WebAssembly is a virtual ISA, not a replacement for HTML and CSS. It was also never meant to kill JavaScript (which is actually a pretty nice language if you stick to the 'good parts' via TypeScript and linting), but at most to be an alternative or complement to JS, and in that role WASM works really well.
    • dagi3d 17 days ago
      Figma