Vector graphics on GPU

(gasiulis.name)

164 points | by gsf_emergency_6 5 days ago

12 comments

  • Lichtso 1 day ago
    > but [analytic anti-aliasing (aaa)] also has much better quality than what can be practically achieved with supersampling

    What this statement is missing is that aaa coverage is immediately resolved, while msaa coverage is resolved later in a separate step with extra data being buffered in between. This is important because msaa is unbiased while aaa is biased towards too much coverage once two paths partially cover the same pixel. In other words aaa becomes incorrect once you draw overlapping or self-intersecting paths.

    Think about drawing the same path over and over at the same place: aaa will become darker with every iteration, msaa is idempotent and will not change further after the first iteration.
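
    A minimal numeric sketch of this difference (illustrative only; a white background and a black path at 50% coverage are assumed):

    ```cpp
    #include <cstdio>

    int main() {
        // Hypothetical pixel: white background, black path covering half of it.
        double bg = 1.0, fg = 0.0, coverage = 0.5;

        // Analytic AA: coverage becomes the alpha for "over" blending, so drawing
        // the same path again blends onto the previous result and darkens it.
        double aaa = bg;
        for (int i = 1; i <= 3; ++i) {
            aaa = fg * coverage + aaa * (1.0 - coverage);
            std::printf("aaa pass %d: %.3f\n", i, aaa);   // 0.500, 0.250, 0.125
        }

        // MSAA-style resolve: a per-sample inside/outside mask. Drawing the same
        // path again sets exactly the same samples, so the result is idempotent.
        bool samples[4] = { true, true, false, false };   // 2 of 4 samples covered
        double msaa = 0.0;
        for (bool s : samples) msaa += s ? fg : bg;
        std::printf("msaa resolve: %.3f\n", msaa / 4.0);  // 0.500, every time
    }
    ```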

    Unfortunately, this is a little-known fact even in the exquisite circles of 2D vector graphics people, who often present aaa as a silver bullet, which it is not.

  • reallynattu 17 hours ago
    For anyone looking at this space: ThorVG is worth checking out.

    Open-source vector engine with GPU backends (WebGPU, OpenGL) that runs on everything from microcontrollers to browsers. Now a Linux Foundation project.

    https://github.com/thorvg/thorvg

    (Disclosure: CTO at LottieFiles, we build and maintain ThorVG in-house, with community contributions from individuals and companies like Canva)

    • erichocean 3 hours ago
      How does ThorVG's GPU implementation compare to Impeller (Flutter's new-ish GPU rendering backend)?
  • virtualritz 1 day ago
    Unless I'm missing something, I think this describes box filtering.

    It should probably mention that this is only sufficient for some use cases but not for high-quality ones.

    E.g. if you were to use this for rendering font glyphs into something like a static image (or slow rolling titles/credits), you probably want a higher-quality filter.

    • jstimpfle 1 day ago
      What type of filter do you mean? Unless I'm misunderstanding/missing something, the approach described doesn't go into the details of how coverage is computed. If the input image is only simple lines whose coverage can be correctly computed (don't know how to do this for curves?) then what's missing?

      I'd be interested to know how feasible complete 2D UIs using dynamically GPU-rendered vector graphics are. I've played with vector rendering in the past, using a pixel shader that more or less implemented the method described in the OP. It could render the Ghostscript tiger at good speeds (single-digit milliseconds at 4K, IIRC), but there is always an overhead to generating vector paths, sampling them into line segments, dispatching them, etc. Building a 2D UI from optimized primitives instead, like axis-aligned rects and rounded rects, will almost always be faster, obviously.

      Text rendering typically adds pixel snapping, possibly using a byte code interpreter, and often adds sub-pixel rendering.

      • dahart 1 day ago
        > What type of filter do you mean? […] the approach described doesn’t go into the details of how coverage is computed

        This article does clip against a square pixel’s edges, and sums the area of what’s inside without weighting, which is equivalent to a box filter. (A box filter is also what you get if you super-sample the pixel with an infinite number of samples and then use the average value of all the samples.) The problem is that there are cases where this approach can result in visible aliasing, even though it’s an analytic method.

        When you want high quality anti-aliasing, you need to model pixels as soft leaky overlapping blobs, not little squares. Instead of clipping at the pixel edges, you need to clip further away, and weight the middle of the region more than the outer edges. There’s no analytic method and no perfect filter, there are just tradeoffs that you have to balance. Often people use filters like Triangle, Lanczos, Mitchell, Gaussian, etc.. These all provide better anti-aliasing properties than clipping against a square.
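
        A rough 1D sketch of that difference (illustrative only; the tent filter stands in for any weighted kernel that is wider than the pixel):

        ```cpp
        #include <algorithm>
        #include <cmath>
        #include <cstdio>

        // Coverage of a vertical edge at position e (the path fills x < e) for a
        // pixel centered at 0, reduced to 1D for illustration.

        // Box filter: clip against the pixel's own [-0.5, 0.5] extent, take the area.
        double box_coverage(double e) {
            return std::clamp(e, -0.5, 0.5) + 0.5;
        }

        // Tent (triangle) filter of radius 1: overlaps neighbouring pixels and
        // weights the pixel centre more than its edges (integrated numerically).
        double tent_coverage(double e) {
            double acc = 0.0, norm = 0.0;
            const int N = 1024;
            for (int i = 0; i < N; ++i) {
                double x = -1.0 + 2.0 * (i + 0.5) / N;   // sample position
                double w = 1.0 - std::fabs(x);           // tent weight
                norm += w;
                if (x < e) acc += w;                     // sample lies inside the path
            }
            return acc / norm;
        }

        int main() {
            for (double e : { -0.75, -0.25, 0.0, 0.25, 0.75 })
                std::printf("edge at %+.2f  box %.3f  tent %.3f\n",
                            e, box_coverage(e), tent_coverage(e));
        }
        ```

        Note how the tent column responds to the edge before it even enters the pixel square, which is exactly the "clip further away, weight the middle more" behaviour described above.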

      • jlokier 1 day ago
        > If the input image is only simple lines whose coverage can be correctly computed (don't know how to do this for curves?) then what's missing?

        Computing pixel coverage accurately isn't enough for the best results. Using it as the alpha channel for blending foreground over background colour is the same thing as sampling a box filter applied to the underlying continuous vector image.

        But often a box filter isn't ideal.

        Pixels on the physical screen have a shape and non-uniform intensity across their surface.

        RGB sub-pixels (or other colour basis) are often at different positions, and the perceptual luminance differs between sub-pixels in addition to the non-uniform intensity.

        If you don't want to tune rendering for a particular display, there are sometimes still improvements from using a non-box filter.

        An alternative is to compute the 2D integral of a filter kernel over the coverage area for each pixel. If the kernel has separate R, G, B components, to account for sub-pixel geometry, then you may require another function to optimise perceptual luminance while minimising colour fringing on detailed geometries.

        Gamma correction helps, and fortunately that's easily combined with coverage. For example, slow rolling titles/credits will shimmer less at the edges if gamma is applied correctly.
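
        A small sketch of what gamma-correct coverage blending changes (a plain 2.2 power curve is used here as an approximation of sRGB):

        ```cpp
        #include <cmath>
        #include <cstdio>

        double to_linear(double s) { return std::pow(s, 2.2); }       // decode
        double to_srgb(double l)   { return std::pow(l, 1.0 / 2.2); } // encode

        int main() {
            double cov = 0.5;            // pixel half covered by black text
            double fg = 0.0, bg = 1.0;   // sRGB-encoded foreground/background

            // Naive: blend the sRGB-encoded values directly with the coverage.
            double naive = fg * cov + bg * (1.0 - cov);

            // Gamma-correct: decode to linear light, blend, re-encode.
            double lin = to_linear(fg) * cov + to_linear(bg) * (1.0 - cov);
            double correct = to_srgb(lin);

            // ~0.500 vs ~0.730: the naive blend makes edge pixels too dark, which
            // reads as weight changes / shimmer on slowly moving text.
            std::printf("naive %.3f  gamma-correct %.3f\n", naive, correct);
        }
        ```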

        However, these days with Retina/HiDPI-style displays, these issues are reduced.

        For example, MacOS removed sub-pixel anti-aliasing from text rendering in recent years, because they expect you to use a Retina display, and they've decided regular whole-pixel coverage anti-aliasing is good enough on those.

  • jesse__ 23 hours ago
    Interestingly they do not cite calculating a signed distance to the surface of the shape as an approach to doing AA, as described in the Valve paper [1]. I suppose this is more targeted at offline baking, but given they're suggesting iterating every curve at every pixel, I'm not sure why you wouldn't.

    [1] https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007...

  • jayd16 1 day ago
    So without blowing up the traditional shader pipeline, why is it not trivial to add a path stage as an alternative to the vertex stage? It seems like GPUs and shader language could implement a standard way to turn vector paths into fragments and keep the rest of the pipeline.

    In fact, you could likely use the geometry stage to create arbitrarily dense vertices based on path data passed to the shader without needing any new GPU features.

    Why is this not done? Is the CPU render still faster than these options?

    • exDM69 1 day ago
      > why is it not trivial to add a path stage as an alternative to the vertex stage?

      Because paths, unlike triangles, are not fixed size and don't have screen-space locality. Paths consist of multiple contours of segments, typically cubic bezier curves, plus a winding rule.

      You can't draw one segment out of a contour on the screen and continue to the next one, let alone do them in parallel. A vertical line segment on the left-hand side of your screen, going bottom to top, will make every pixel to the right of it "inside" the path, but if there's another line segment going top to bottom, the pixel is outside again.

      You need to evaluate the winding rule for every curve segment on every pixel and sum it up.

      By contrast, all the pixels inside the triangle are also inside the bounding box of the triangle and the inside/outside test for a pixel is trivially simple.
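
      A minimal sketch of a non-zero winding test over flattened line segments (illustrative only, not any particular renderer's code); it shows why every segment has to be visited for every pixel:

      ```cpp
      #include <cstdio>

      struct Pt { double x, y; };

      // Cast a horizontal ray from p towards +x. Each edge crossing it adds +1
      // (upward) or -1 (downward); p is inside if the sum is non-zero. Curves
      // would be flattened into such segments first.
      bool inside_nonzero(const Pt* poly, int n, Pt p) {
          int winding = 0;
          for (int i = 0; i < n; ++i) {
              Pt a = poly[i], b = poly[(i + 1) % n];
              if (a.y <= p.y && b.y > p.y) {          // upward crossing
                  double t = (p.y - a.y) / (b.y - a.y);
                  if (a.x + t * (b.x - a.x) > p.x) ++winding;
              } else if (a.y > p.y && b.y <= p.y) {   // downward crossing
                  double t = (p.y - a.y) / (b.y - a.y);
                  if (a.x + t * (b.x - a.x) > p.x) --winding;
              }
          }
          return winding != 0;
      }

      int main() {
          Pt square[4] = { {0, 0}, {10, 0}, {10, 10}, {0, 10} };
          std::printf("(5,5): %d, (15,5): %d\n",
                      inside_nonzero(square, 4, {5, 5}),
                      inside_nonzero(square, 4, {15, 5}));   // 1, 0
      }
      ```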

      There are at least four popular approaches to GPU vector graphics:

      1) Loop-Blinn: Use CPU to tessellate the path to triangles on the inside and on the edges of the paths. Use a special shader with some tricks to evaluate a bezier curve for the triangles on the edges.

      2) Stencil then cover: For each line segment in a tessellated curve, draw a rectangle that extends to the left edge of the contour and use a two-sided stencil function to add +1 or -1 to the stencil buffer. Draw another rectangle on top of the whole path and set the stencil test to draw only where the stencil buffer is non-zero (or even/odd) according to the winding rule.

      3) Draw a rectangle with a special shader that evaluates all the curves in a path, and use a spatial data structure to skip some. Useful for fonts and quadratic bezier curves, not full vector graphics. Much faster than the other methods for simple and small (pixel size) filled paths. Example: Lengyel's method / Slug library.

      4) Compute based methods such as the one in this article or Raph Levien's work: use a grid based system with tessellated line segments to limit the number of curves that have to be evaluated per pixel.

      Now this is only filling paths, which is the easy part. Stroking paths is much more difficult. Full SVG support has both and much more.

      > In fact, you could likely use the geometry stage to create arbitrarily dense vertices based on path data passed to the shader without needing any new GPU features.

      Geometry shaders are commonly used with stencil-then-cover to avoid a CPU preprocessing step.

      But none of the GPU geometry stages (geometry, tessellation or mesh shaders) are powerful enough to deal with all the corner cases of tessellating vector graphics paths, self intersections, cusps, holes, degenerate curves etc. It's not a very parallel friendly problem.

      > Why is this not done?

      As I've described here: all of these ideas have been done with varying degrees of success.

      > Is the CPU render still faster than these options?

      No, the fastest methods are a combination of CPU preprocessing for the difficult geometry problems and GPU for blasting out the pixels.

  • Dwedit 1 day ago
    According to the page here: https://www.humus.name/index.php?page=News&ID=228

    The best way to draw a circle on a GPU is to start with a large triangle and keep adding additional triangles on the edges until you reach the point where you do not need to add any more (they become smaller than a pixel).
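
    A back-of-the-envelope sketch (illustrative numbers only) of how many subdivisions that takes before the polygonal approximation deviates from the true circle by less than half a pixel:

    ```cpp
    #include <cmath>
    #include <cstdio>

    // Maximum deviation of an n-gon edge from its circle is the sagitta
    // r * (1 - cos(pi / n)); keep doubling the edge count until it is sub-pixel.
    int segments_needed(double radius_px, double max_error_px = 0.5) {
        const double pi = 3.14159265358979323846;
        int n = 3;
        while (radius_px * (1.0 - std::cos(pi / n)) > max_error_px)
            n *= 2;   // each refinement pass roughly doubles the edge count
        return n;
    }

    int main() {
        for (double r : { 16.0, 100.0, 1000.0 })
            std::printf("radius %6.0f px -> ~%d segments\n", r, segments_needed(r));
    }
    ```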

    • jesse__ 1 day ago
      I'd put money on the best way actually being to draw a quad, or a single triangle, and render the circle as an SDF in the fragment shader.
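
      A minimal sketch of that idea (illustrative only, not any particular engine's shader): compute the signed distance to the circle per fragment and turn it into coverage over a roughly one-pixel band:

      ```cpp
      #include <algorithm>
      #include <cmath>
      #include <cstdio>

      // Fragment-shader-style circle coverage, evaluated on the CPU here.
      // d < 0 is inside; smoothing the step over ~1 pixel anti-aliases the rim.
      double circle_coverage(double px, double py, double cx, double cy, double r) {
          double d = std::hypot(px - cx, py - cy) - r;   // signed distance to rim
          return std::clamp(0.5 - d, 0.0, 1.0);          // ~1px transition band
      }

      int main() {
          // Scan across the rim of a radius-9 circle centered at the origin.
          for (double x = 7.0; x <= 11.0; x += 0.5)
              std::printf("x=%4.1f coverage=%.2f\n", x,
                          circle_coverage(x, 0.0, 0.0, 0.0, 9.0));
      }
      ```
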
  • masswerk 1 day ago
    May require "(2022)" in the title.
  • xattt 1 day ago
    Tangential, but was this not the goal of Quartz 2D? The idea of everyday things running on the GPU seemed very attractive.

    There is some context in this 13-year-old discussion: https://news.ycombinator.com/item?id=5345905#5346541

    I am curious whether the equation of CPU-rendered graphics being faster than GPU rendering has changed in the last decade.

    Did Quartz 2D ever become enabled on macOS?

    • kllrnohj 1 day ago
      When things like this (or Vello or piet-gpu or etc...) talk about "vector graphics on GPU" they are almost exclusively talking about a fully general solution: one that handles fonts and SVGs and arbitrarily complex paths with strokes and fills and the whole shebang.

      These are great goals, but also largely inconsequential for nearly all UI designs. The majority of systems today (like Skia) are hybrids. Simple shapes (e.g., round rects) get analytical shaders on the GPU, and complex paths (like fonts) are just rasterized on the CPU once and cached on the GPU in a texture. It's a very robust, fast approach to the holistic problem, at the cost of not being as "clean" a solution as a pure GPU renderer would be.
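
      A rough sketch of that caching idea (hypothetical names, not Skia's actual API): rasterize a complex path once, upload it, and reuse the texture on later frames:

      ```cpp
      #include <cstdint>
      #include <functional>
      #include <string>
      #include <unordered_map>

      // Opaque handle to a GPU texture; how it is created is backend-specific.
      struct Texture { uint32_t id; };

      // Cache keyed by a description of the path and its size. 'rasterize' stands
      // in for the expensive CPU rasterization + upload and only runs on a miss.
      struct PathTextureCache {
          std::unordered_map<std::string, Texture> entries;

          Texture get(const std::string& key,
                      const std::function<Texture(const std::string&)>& rasterize) {
              auto it = entries.find(key);
              if (it != entries.end())
                  return it->second;            // hit: reuse the cached texture
              Texture tex = rasterize(key);     // miss: rasterize + upload once
              entries.emplace(key, tex);
              return tex;
          }
      };

      int main() {
          PathTextureCache cache;
          auto fakeRasterize = [](const std::string&) { return Texture{1}; };
          Texture a = cache.get("tiger@128px", fakeRasterize);   // rasterizes once
          Texture b = cache.get("tiger@128px", fakeRasterize);   // cache hit
          return a.id == b.id ? 0 : 1;
      }
      ```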

    • jacobp100 1 day ago
      > I am curious if the equation of CPU-determined graphics being faster than being done on the GPU has changed in the last decade

      If you look at Blend2D (a CPU rasterizer), they seem to outperform every other rasterizer including GPU-based ones - according to their own benchmarks at least

      • miguel_martin 1 day ago
        Blaze outperforms Blend2D - by the same author as the article: https://gasiulis.name/parallel-rasterization-on-cpu/ - but to be fair, Blend2D is really fast.
        • Asm2D 22 hours ago
          You need to rerun the benchmarks if you want fresh numbers. The post was written when Blend2D didn't have JIT for AArch64, which penalized it a bit. Also on X86_64 the numbers are really good for Blend2D, which beats Blaze in some tests. So it's not black&white.

          And please keep in mind that Blend2D is not really in development anymore - it has no funding so the project is basically done.

          • miguel_martin 2 hours ago
            That is fair - sorry for spreading misinformation! That's unfortunate to hear about Blend2D.
          • coffeeaddict1 22 hours ago
            > And please keep in mind that Blend2D is not really in development anymore - it has no funding so the project is basically done.

            That's such a shame. Thanks a lot for Blend2D! I wish companies were less greedy and would fund amazing projects like yours. Unfortunately, I do think that everyone is a bit obsessed with GPUs nowadays. For 2D rendering the CPU is great, especially if you want predictable results and avoid having to deal with the countless driver bugs that plague every GPU vendor.

      • Asm2D 1 day ago
        Blend2D doesn't benchmark against GPU renderers - the benchmarking page compares CPU renderers. I have seen comparisons in the past, but it's pretty difficult to do a good CPU vs GPU benchmarking.
    • pjmlp 1 day ago
      Not sure what you mean, it can make use of accelerated graphics,

      https://developer.apple.com/library/archive/documentation/Gr...

      • xattt 9 hours ago
        I’ve explored it for a few years, but all I could tell is that it was never actually fully enabled. You can enable it through debugging tools, but it was never on by default for all software.
    • willtemperley 1 day ago
      Quartz 2D is now CoreGraphics. It's hard to find information about the backend, presumably for commercial reasons. I do know it uses the GPU for some operations like magnifyEffect.

      Today I was smoothly panning and zooming 30K vertex polygons with SwiftUI Canvas and it was barely touching the CPU so I suspect it uses the GPU heavily. Either way it's getting very good. There's barely any need to use render caches.

    • samiv 1 day ago
      The issue is not performance; the issue is that pixel-precise operations are difficult on the GPU using graphics features such as shaders.

      You don't normally work with pixels but you work with polygonal geometry (triangles) and the GPU does the pixel (fragment) rasterization.

      • zozbot234 5 hours ago
        Surely you could at least draw arbitrary rectilinear polygons and expect that they're going to be pixel perfect? After all the GPU is routinely used for compositing rectangular surfaces (desktop windows) with pixel-perfect results.
  • nubskr 1 day ago
    Turns out the best GPU optimization is just being too scared of graphics drivers to do the fancy stuff: 10-15x faster, and you can actually debug it.
  • larodi 1 day ago
    Really, isn't there anything that comes with Slug-level capabilities and is not super expensive?
    • coffeeaddict1 1 day ago
      Vello [0] might suit you although it's not production grade yet.

      [0] https://github.com/linebender/vello

    • miguel_martin 1 day ago
      Just use blend2d - it is CPU only but it is plenty fast enough. Cache the rasterization to a texture if needed. Alternatively, see blaze by the same author as this article: https://gasiulis.name/parallel-rasterization-on-cpu/
    • reallynattu 17 hours ago
      ThorVG might be worth a look - open source (MIT), ~150KB core, GPU backends (WebGPU, OpenGL).

      We use it in the official dotLottie runtimes, and it's now a Linux Foundation project. Handles SVG, Lottie, fonts, effects.

      https://github.com/thorvg/thorvg/

      • coffeeaddict1 7 hours ago
        In terms of performance, it's quite far from something like Blend2D or Vello though.
        • hermet 1 hour ago
          Blend2D is a CPU-only rendering engine, so I don't think it's a fair comparison to ThorVG. If we're talking about CPU rendering, ThorVG is faster than Skia (no idea about Blend2D). But at high resolutions, CPU rendering has serious limitations anyway. Blend2D is still more of an experimental project whose JIT hurts compatibility, and Vello is not yet production-ready and WebGPU-only. There's no point arguing about speed today if it's not usable in real-world scenarios.
  • badlibrarian 1 day ago
    Author uses a lot of odd, confusing terminology and brings CPU baggage to the GPU, creating the worst of both worlds. Shader hacks and CPU-bound partitioning and choosing the Greek letter alpha to be your accumulator in a graphics article? Oh my.

    NV_path_rendering solved this in 2011. https://developer.nvidia.com/nv-path-rendering

    It never became a standard but was a compile-time option in Skia for a long time. Skia of course solved this the right way.

    https://skia.org/

    • exDM69 1 day ago
      > NV_path_rendering solved this in 2011.

      By no means is this a solved problem.

      NV_path_rendering is an implementation of "stencil then cover" method with a lot of CPU preprocessing.

      It's also only available on OpenGL, not on any other graphics API.

      The STC method scales very badly with increasing resolutions as it is using a lot of fill rate and memory bandwidth.

      It's mostly using GPU fixed function units (rasterizer and stencil test), leaving the "shader cores" practically idle.

      There's a lot of room for improvement to get more performance and better GPU utilization.

    • Asm2D 1 day ago
      You know nothing.

      Skia is definitely not a good example at all. Skia started as a CPU renderer, and added GPU rendering later, which heavily relies on caching. Vello, for example, takes a completely different approach compared to Skia.

      NV path rendering is a joke. NVIDIA thought that ALL graphics would be rendered on the GPU within 2 years of making that presentation; two decades later, 2D CPU renderers still shine.

      • nicoburns 1 day ago
        I believe Skia's new Graphite architecture is much more similar to Vello
        • badlibrarian 1 day ago
          Right. The question is: does Skia grow its broad and useful toolkit with an eye toward further GPU optimization? Or does Vello (broadened and perhaps burdened by Rust and the shader-obsessive crowd) grow a broad and useful API?

          There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution, but I'll leave those discussions to dark-gray Discord forums rendered by Skia in a browser.

          • coffeeaddict1 1 day ago
            > There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution

            IMO, one of the biggest benefits of a high-performance renderer would be power savings (very important for laptops and phones). If I can run the same work but use half the power, then by all means I'd be happy to deal with the complications that the GPU brings. AFAIK though, no one really cares about that, and even efforts like Vello are just targeting fps gains, which do correlate with reduced power consumption but only indirectly.

            • Asm2D 22 hours ago
              Adding power draw into the mix is pretty interesting. Just because a GPU can render something 2x faster in a particular test doesn't mean you have consumed 50% less power, especially when we talk about dedicated GPUs that can draw hundreds of watts.

              Historically 2D rendering on CPU was pretty much single-threaded. Skia is single-threaded, Cairo too, Qt mostly (they offload gradient rendering to threads, but it's painfully slow for small gradients, worse than single-threaded), AGG is single-threaded, etc...

              In the end only Blend2D, Blaze, and now Vello can use multiple threads on the CPU, so finally CPU vs GPU comparisons can be made more fairly - and power draw is definitely a nice property for a benchmark. BTW Blend2D was probably the first library to offer multi-threaded rendering on CPU (just an option to pass to the rendering context, same API).

              As far as I know, nobody has done good benchmarking between CPU and GPU 2D renderers - it's very hard to do a completely unbiased comparison, and you would be surprised how good the CPU is in this mix. Modern CPU cores consume maybe a few watts, and you can render to a 4K framebuffer with a single CPU core. Put text rendering into the mix and the numbers start to get very interesting. GPU memory allocation should also be included, because rendering fonts on the GPU means pre-processing them as well, etc...

              2D is just very hard. On the CPU and the GPU you are solving slightly different problems, but doing it right is an insane amount of work, research, and experimentation.

              • nicoburns 22 hours ago
                It's not a formal benchmark, but my Browser Engine / Webview (https://github.com/DioxusLabs/blitz/) has pluggable rendering backends (via https://github.com/DioxusLabs/anyrender) with Vello (GPU), Vello CPU, Skia (various backends incl. Vulkan, Metal, OpenGL, and CPU) currently implemented

                On my Apple M1 Pro, the Vello CPU renderer is competitive with the GPU renderers on simple scenes, but falls behind on more complex ones, and it especially seems to struggle with large raster images. This is also without a glyph cache (so re-rasterizing every glyph every time, although there is a hinting cache), which isn't implemented yet. This is dependent on multi-threading being enabled and can consume largish portions of all-core CPU while it runs. Skia raster (CPU) gets similarish numbers, which is quite impressive if that is single-threaded.

                • Asm2D 12 hours ago
                  I think Vello CPU would always struggle with raster images, because it does a bounds check for every pixel fetched from a source image. They have at least described this behavior somewhere in Vello PRs.

                  The obsession with memory safety just doesn't pay off in some cases - if you can batch 64 pixels at once with SIMD, it just cannot be compared to a per-pixel processor that has a branch in its path.
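
                  A generic illustration of that point (not Vello's actual code): a bounds check and branch on every pixel fetch versus clamping the run once so the inner copy can be batched:

                  ```cpp
                  #include <algorithm>
                  #include <cstdint>
                  #include <vector>

                  // Copy 'count' source pixels starting at srcX into dst, per pixel:
                  // a bounds check (and branch) on every single fetch.
                  void copy_per_pixel(const std::vector<uint32_t>& src, uint32_t* dst,
                                      int srcX, int count) {
                      for (int i = 0; i < count; ++i) {
                          int x = srcX + i;
                          dst[i] = (x >= 0 && x < (int)src.size()) ? src[x] : 0;
                      }
                  }

                  // Batched: clamp the run once up front, then copy the in-bounds part
                  // with a branch-free inner loop the compiler can vectorize.
                  void copy_batched(const std::vector<uint32_t>& src, uint32_t* dst,
                                    int srcX, int count) {
                      int begin = std::max(0, -srcX);
                      int end = std::min(count, (int)src.size() - srcX);
                      std::fill(dst, dst + count, 0u);
                      if (end > begin)
                          std::copy(src.begin() + (srcX + begin),
                                    src.begin() + (srcX + end), dst + begin);
                  }

                  int main() {
                      std::vector<uint32_t> src(64, 0xffffffffu);
                      std::vector<uint32_t> a(80), b(80);
                      copy_per_pixel(src, a.data(), -8, 80);  // run partly off the image
                      copy_batched(src, b.data(), -8, 80);
                      return a == b ? 0 : 1;                  // both produce the same span
                  }
                  ```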

            • badlibrarian 1 day ago
              It's an argument you can make in any performance effort. But I think the "let's save power using GPUs" ship sailed even before Microsoft started buying nuclear reactors to power them.
    • sirwhinesalot 1 day ago
      So what is the right way that Skia uses? Why is there still discussion on how to do vector graphics on the GPU right if Skia's approach is good enough?

      Not being sarcastic, genuinely curious.

      • cyberax 1 day ago
        The major unsolved problem is real-time high-quality text rendering on GPU. Skia just renders fonts on the CPU with all kinds of hacks ( https://skia.org/docs/dev/design/raster_tragedy/ ). It then renders them as textures.

        Ideally, we want to have as much stuff rendered on the GPU as possible. Ideally with support for glyph layout. This is not at all trivial, especially for complex languages like Devanagari.

        In a perfect world, we'd be able to create a 3D cube and just have the renderer put the text on one of its faces, and have it rendered perfectly as you rotate the cube.

    • bsder 1 day ago
      While the author doesn't seem to be aware of the state of the art in the field, vector rendering is absolutely NOT a solved problem, whether on CPU or GPU.

      Vello by Raph Levien seems to be a nice combination of what is required to pull this off on GPUs. https://www.youtube.com/watch?v=_sv8K190Zps

      • lukan 1 day ago
        Yeah, I have high hopes for Vello taking off. I could throw away lots of hacks and caching and whatnot if I could do fast vector rendering reliably on the GPU.

        I think Rive also does vector rendering on the GPU

        https://rive.app/renderer

        But it is not really meant (yet?) as a general graphics library, just a renderer for the Rive design tools.

      • bean469 1 day ago
        > While the author doesn't seem to be aware of state of the art in the field

        The blog post is from 2022, though