How We Made IPFS Content Publishing 10x Faster

(probelab.io)

138 points | by dennis-tra 8 hours ago

7 comments

  • embedding-shape 7 hours ago
    > Return control back to the user after most (not all) of the PUT RPCs have succeeded and continue with the remaining ones in the background.

    Making things faster by doing less (and not the same) has been speeding up computing since forever! Can't help but feel like it's slightly misleading to call the providing ("publishing") faster when it's not actually doing the same thing; it's just that most parts turned async instead of waiting for confirmation.

    Wouldn't this lead to the problem where the user thinks everything has been provided properly, but once others try to find it, the records haven't yet been published? As far as I understand, it'd still take mostly the same amount of time until all the records (not just some of them) are available to others; the only thing that got "faster" is the end-user UX of the one providing?

    • Groxx 6 hours ago
      The "Early Return" sections describe it more, I don't think it's as bad as it sounds in that first bullet. They're returning after 15 out of 20 complete, and it sounds like even if only those 15 end up succeeding it'll still generally be fine. (Exactly how fine / is that violating some common expectations and will cause problems: I dunno. Not familiar enough with IPFS's internals)

      That said:

      >In practice, at least one of the 20 follow-up requests fails in the vast majority of operations, and a single unresponsive peer can stall the entire phase waiting for a timeout.

      It continually surprises me how often systems lack a Fast Fallback-like strategy¹. Or at least sound like it. Just an absolute flood of apps and websites and systems that try to do something once and then never try an alternate route until that finishes, something like a minute or two later... for a process that usually takes less than a second. It's maddening. By the time you're considering one to be "stalled" and delaying everything unnecessarily, you probably should've already started trying two or three alternate routes!

      ¹ https://wikipedia.org/wiki/Happy_Eyeballs
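
      To make the idea concrete, here is a rough sketch of a Happy Eyeballs-style staggered race in Python. It is not taken from the article or from any IPFS code; `staggered_race`, `stalled`, `healthy` and the 0.05 s stagger are all made-up names and values for illustration. The point is only that the alternate route starts while the first one is still hanging, instead of after its timeout.

```python
import asyncio

async def staggered_race(attempts, stagger):
    """Run zero-argument coroutine functions as fallbacks for each other.

    The next attempt starts when an earlier one fails, or when nothing has
    finished within `stagger` seconds. The first success wins and the
    attempts still running are cancelled.
    """
    remaining = list(attempts)
    running = set()
    try:
        while remaining or running:
            if remaining:
                running.add(asyncio.ensure_future(remaining.pop(0)()))
            # Only apply the stagger timeout while there is another route left to try.
            timeout = stagger if remaining else None
            done, running = await asyncio.wait(
                running, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if not task.cancelled() and task.exception() is None:
                    return task.result()
        raise ConnectionError("all attempts failed")
    finally:
        for task in running:
            task.cancel()

async def demo():
    started = []

    async def stalled():
        # Hypothetical unresponsive peer: this event is never set.
        started.append("primary")
        await asyncio.Event().wait()

    async def healthy():
        # Hypothetical alternate route that answers right away.
        started.append("alternate")
        return "alternate route answered"

    result = await staggered_race([stalled, healthy], stagger=0.05)
    return result, started

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

      In this toy run the caller gets an answer roughly one stagger interval after the first attempt stalls, rather than waiting for that attempt to time out.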

      • WorldMaker 4 hours ago
        > (Exactly how fine / is that violating some common expectations and will cause problems: I dunno. Not familiar enough with IPFS's internals)

        I felt the article addressed that a bit further down. 20 copies is a somewhat arbitrary knob in the Kademlia DHT design IPFS is based on, and this lab's research suggested that 15 was probably closer to good enough for GET requests to succeed at about the same time cost. Rather than dropping the knob for the entire DHT (redundancy is always useful in the long run), they went with the Early Return and a secondary process called the Reprovide Sweep that still tries to push the network towards the minimum of 20 live copies it desires.

        I'm assuming the Reprovide Sweep was work previously done/documented because it seems like something that might have been more interesting to discuss at longer length in relevant parts of the article.
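
        For anyone who wants to see the shape of it, here is a minimal sketch of the early-return idea as I read it from this thread. It is an illustration under assumptions, not the actual IPFS implementation: `fake_put`, `provide` and `demo` are hypothetical names, the PUT RPCs are simulated, and the only numbers taken from the discussion are 15 and 20.

```python
import asyncio

QUORUM = 15       # early-return threshold discussed in the thread
REPLICATION = 20  # replication target discussed in the thread

async def fake_put(ok=True, gate=None):
    """Hypothetical stand-in for one PUT RPC to one peer."""
    if gate is not None:
        await gate.wait()       # slow peer: blocks until released
    else:
        await asyncio.sleep(0)  # fast peer: answers almost immediately
    return ok

async def provide(puts, quorum=QUORUM):
    """Start every PUT, return once `quorum` have succeeded.

    Returns the success count so far and the set of tasks that are still
    running, so the caller can let them finish in the background.
    """
    pending = {asyncio.ensure_future(p) for p in puts}
    succeeded = 0
    while pending and succeeded < quorum:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        succeeded += sum(1 for t in done if t.exception() is None and t.result())
    return succeeded, pending

async def demo():
    gate = asyncio.Event()
    puts = [fake_put(ok=False)]                       # 1 peer that fails fast
    puts += [fake_put() for _ in range(15)]           # 15 peers that succeed fast
    puts += [fake_put(gate=gate) for _ in range(4)]   # 4 slow peers
    assert len(puts) == REPLICATION

    early, pending = await provide(puts)
    still_running = len(pending)   # control is back with the caller here

    gate.set()                     # the slow peers eventually respond
    late = await asyncio.gather(*pending)
    return early, still_running, early + sum(late)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

        The caller regains control after 15 successes while 4 PUTs are still in flight. In this toy run the one failed PUT is never retried, which is the gap a later background pass such as the Reprovide Sweep would have to cover.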

    • pocksuppet 6 hours ago
      As far as I understand, the producer is publishing to the 20 nearest nodes it finds, but the consumer is also searching the 20 nearest nodes it finds, and there is quite a big safety margin built into that number 20. Almost all consumers should still be able to find your object once it has only published to 10 or 15.

      This is a probabilistic system anyway. Even if publication finishes to 20 nodes, why is that enough to return to the caller? Shouldn't it be 30, or 50, just in case?

      I'd say it makes sense to return control once zero PUTs have been made and do the whole thing in the background, to avoid serializing operations that usually don't need to be serialized, such as publishing multiple objects.

  • boramalper 5 hours ago
    Is anyone still using (or has anyone ever used) IPFS in production?

    I’m not talking about technology demos such as Wikipedia-on-IPFS (which indeed worked and was impressive) but where IPFS is actually being relied on for some functionality.

    • OneDeuxTriSeiGo 10 minutes ago
      It's not literally IPFS but atproto/bluesky is using most of the bones of IPFS (IPLD) to do their entire data propagation and event broadcasting.

      And tbh it shouldn't be terribly difficult to extend the existing infra to supporting a full IPFS based system but I don't think anybody has considered it worthwhile yet.

      ATproto uses just the bits it immediately needs even if it could probably benefit from the other parts long term (ex for archival relay stream preservation).

    • ValdikSS 1 hour ago
      I've used it for about 5 years to publish a JavaScript file (proxy auto-configuration) and serve the contents over different gateways.

      It is a huge server traffic saver.

      Used to use it to host different static websites on custom domains while Cloudflare's gateway was working, stopped using it for this purpose since its sunset in 2024.

      Neocities used to publish their websites over IPFS, this feature got broken (for years) and finally got removed: https://github.com/neocities/neocities/issues/352

      • ValdikSS 1 hour ago
        Also used it several times for almost-live video broadcasts, served over cloudflare ipfs gateway while it was working.

        It used HLS with .ts files, in a special way which circumvented cloudflare protection against .mp4 and .ts files, or something along the lines. Don't remember the details, but it was a cheap way to deliver your stream to any video player using standard protocols only.

    • phae 12 minutes ago
      the thing that prevented me using ipfs in anger.. (granted i may not have looked hard enough) was that i couldn't have stuff in ipfs, and access it via posix filesystem at the same time. i'd have to store things twice.

      fine for publishing, but not for having a live data set that is both used and published at the same time, as you can do with torrent.

    • ydj 4 hours ago
      At Meta, there was a project for delivering internally built libraries / binaries to dev laptops using a private ipfs network. This was live for at least some period of time.
      • boramalper 3 hours ago
        Very interesting! I wonder if it’s still live and there is any writing on it?
    • errpunktjose 5 hours ago
      https://swap.cow.fi uses it for order metadata registering iirc
    • MattCruikshank 5 hours ago
      It doesn't seem like it's popular to put old game ROMs on IPFS...? And that surprises me...
      • boramalper 5 hours ago
        And why would you do that? As opposed to, say, distributing via BitTorrent or serving them using a good-old HTTP server?

        edit: Not opposed to the idea, just curious what makes you pick IPFS over the existing alternatives.

        • ValdikSS 1 hour ago
          IPFS (at least initially) was designed to be a BitTorrent replacement, a new version of it, which you can use not only with special software, but also via HTTP and also directly inside the browser.

          It basically works as BitTorrent, but also provides HTTP access to the files.

          In fact, many pirate websites use IPFS in one way or another (either directly, by serving the downloads over one of the public gateways, or indirectly, for internal needs).

          • Gigachad 1 hour ago
            Couldn't you also just build a bittorrent client that hosts a local webserver to provide http access? I could never get what IPFS actually did that bittorrent didn't.
            • ValdikSS 51 minutes ago
              1. IPFS has addressable content

              The main drawback of the BitTorrent v1 file hashing scheme inside the .torrent file is that it makes a virtual stream of the directory you want to share, splits that stream into blocks, and hashes them all together. That means that each file inside the torrent is unique, even if your directory has very common files, each of which already exists in the BitTorrent network per se.

              IPFS solves this issue by just hashing the contents of the file. The "directory" of the files is a list of each individual file. If somebody has created a directory of 100 mp3 files and you happen to have one of them and have added it to your IPFS daemon, you will be serving this file even if you've downloaded it from elsewhere (not from this "directory").

              2. IPFS has both immutable and mutable files and directories

              In BitTorrent, if you want to update your torrent file (in Russian we use «раздача» (bittorrent "upload"/"seeding"), a very precise word for bittorrent which doesn't have a direct equivalent in English, so I'll stick to "torrent file"), the whole swarm needs to be switched to the updated version due to #1.

              This means you have a split swarm: the majority of people still seed the old version of the torrent, and the minority seed the new version. Because of the lack of content addressing, all the old, already existing files in the new .torrent file are treated as "new", and the old swarm can't seed them.

              In IPFS, you can either create a new immutable directory with all the old files and 1 new file, and all the old files would be seeded by the existing peers, or you can create a mutable directory and just modify it to your liking without the need to update the link.
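
              A toy illustration of points 1 and 2, assuming a heavily simplified model: `v1_style_pieces` hashes fixed-size pieces of the concatenated files, and `per_file_ids` hashes each file on its own. Both function names, the 4-byte piece size and the file contents are made up for the example. Real IPFS CIDs are built from chunked Merkle DAGs and real torrents use much larger pieces, so treat this only as a sketch of why adding one file changes every piece hash but none of the per-file hashes.

```python
import hashlib

PIECE = 4  # tiny piece size so the effect is visible; real torrents use far larger pieces

def v1_style_pieces(files):
    """Concatenate the files in order and hash fixed-size pieces of that stream."""
    stream = b"".join(data for _, data in files)
    return [
        hashlib.sha1(stream[i:i + PIECE]).hexdigest()
        for i in range(0, len(stream), PIECE)
    ]

def per_file_ids(files):
    """Hash every file independently (a simplified stand-in for content addressing)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files}

# Old "directory" and an updated one with a single new file added in front.
old = [("a.mp3", b"abcdef"), ("b.mp3", b"ghijkl")]
new = [("0-intro.mp3", b"xyz"), ("a.mp3", b"abcdef"), ("b.mp3", b"ghijkl")]

# Piece hashes the old swarm could still serve for the new version.
shared_pieces = set(v1_style_pieces(old)) & set(v1_style_pieces(new))

# Per-file hashes the old peers could still serve for the new version.
shared_files = set(per_file_ids(old).values()) & set(per_file_ids(new).values())

if __name__ == "__main__":
    print(len(shared_pieces), "shared pieces,", len(shared_files), "shared files")
```

              The new 3-byte file shifts every piece boundary in the stream, so no piece hash survives, while both old files keep their own hashes and can still be served by whoever already has them.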

              ---

              Both of these issues are solved in BitTorrent v2 more or less, but it's still not very popular, even if the specification is from 2017.

              IPFS however is much more featureful than that. It lets you build decentralized distributed (serverless) websites and services, with user data and such.

              There used to be a ZeroNet project which directly aimed at decentralized distributed web services, and it was very cool: there were many blogs, forums, boards. It all could be implemented in IPFS, but I saw only very simple text editors over IPFS and such, much simpler applications than there were in ZeroNet.

              BitTorrent had their own version of distributed websites, Maelstrom: https://www.ctrl.blog/entry/bittorrent-maelstrom.html

        • topgrain2 3 hours ago
          The idea of simply mounting a filesystem and selecting from a list of titles which roms to download and add to your local games, unloading them and transparently re-downloading when you need to free up space, all without relying on a centralized host even for the file index, is pretty appealing. You can do similar things with torrents but it's not quite as "natural".

          Most of the emulator frontends I've seen are pretty against integrating this kind of ease-of-piracy stuff, though: they accept recognizing and filling in metadata for well-known roms, but not making it easy to integrate with remote libraries of roms... except tools that run on "hacked" consoles, which seem to love just giving you a list of games with a "tap A/X to pirate" UI.

          • Gigachad 1 hour ago
            It's because that crosses the line of plausibly legal. In theory you could use an emulator with only titles you copied from your own physical copies which is legal. But if they implement a download mechanism it's clearly illegal.

            At any rate you can replicate the same thing by just hosting the ROMs on your own cloud storage and using something like macOS virtual files, which will do this transparent download/delete to manage storage.

          • boramalper 3 hours ago
            > The idea of simply mounting a filesystem

            You can use fuse-btfs [0] for mounting torrents as filesystems! Last I checked it was a fairly mature piece of software so hopefully it doesn’t feel unnatural.

            [0] https://github.com/johang/btfs

        • darkwater 3 hours ago
          Maybe fear of Nintendo coming to bite you?
    • Borg3 4 hours ago
      Yeah.. IPFS is a bit of a disappointment. I was a bit excited about it back in the day. Recently, I wanted to download something large from archive.org; I used torrent (and my legacy torrent client) and it worked like a charm!

      It seems pure HTTP tracker + Torrent is good enough.

    • preisschild 30 minutes ago
      the containerd stargz snapshotter has an IPFS integration so you can use IPFS instead of a traditional OCI registry to store your OCI containers

      Not using it in production but i found it pretty cool to test

    • pixel_popping 5 hours ago
      It's funny because even in Piracy, IPFS has never really taken off and that's a massive use case.
      • boramalper 5 hours ago
        It slowly was taking off—e.g. Library Genesis on IPFS[0]—but then IPFS introduced Bad Bits Denylist [1] which killed it on arrival.

        [0] https://freeread.org/ipfs.html

        [1] https://badbits.dwebops.pub/

        • RobotToaster 3 hours ago
          Suddenly it looks a lot less decentralised.
          • msm_ 20 minutes ago
            I don't know, it looks pretty decentralised to me?

            >The purpose of this list is to allow IPFS node operators (e.g. someone running a public IPFS gateway) to opt into not hosting previously flagged content.

            IPFS node operators, who are supposedly interested in hosting malicious content (and i2p-hosted phishings are a real problem) can OPT INTO using this list.

            In this case, I don't see how that's any problem for piracy - people can just use one of the bad/unfiltered nodes.

          • throwaway8388 2 hours ago
            Well, badbits are only enforced on the centralized http gateways. LibGen CIDs would still resolve fine using the DHT as the decentralised discovery mechanism
      • frollogaston 5 hours ago
        Also public key lists like what Whatsapp now publishes
    • frollogaston 5 hours ago
      NFT artwork, if you count that. Briefly checked, the ones that were traded for the most were using IPFS rather than HTTP. But I also don't trust that these aren't self-wash sales (easy given the "NF" part), also NFTs are dumb.
      • boramalper 5 hours ago
        I don’t think NFTs (should) count: My first impressions of web3 by Moxie Marlinspike

        https://moxie.org/2022/01/07/web3-first-impressions.html

        • tehjoker 19 minutes ago
          His statement at the end was pretty interesting:

          "If we do want to change our relationship to technology, I think we’d have to do it intentionally. My basic thoughts are roughly:

              We should accept the premise that people will not run their own servers by designing systems that can distribute trust without having to distribute infrastructure. This means architecture that anticipates and accepts the inevitable outcome of relatively centralized client/server relationships, but uses cryptography (rather than infrastructure) to distribute trust. One of the surprising things to me about web3, despite being built on “crypto,” is how little cryptography seems to be involved!
              We should try to reduce the burden of building software. At this point, software projects require an enormous amount of human effort. Even relatively simple apps require a group of people to sit in front of a computer for eight hours a day, every day, forever. This wasn’t always the case, and there was a time when 50 people working on a software project wasn’t considered a “small team.” As long as software requires such concerted energy and so much highly specialized human focus, I think it will have the tendency to serve the interests of the people sitting in that room every day rather than what we may consider our broader goals. I think changing our relationship to technology will probably require making software easier to create, but in my lifetime I’ve seen the opposite come to pass. Unfortunately, I think distributed systems have a tendency to exacerbate this trend by making things more complicated and more difficult, not less complicated and less difficult."
          
          Funnily enough, later that year ChatGPT came out and blew away the excitement around cryptocurrencies by making software easier to manufacture to some degree. Though even with the latest LLM tools, I don't think this has changed at all so far: "Even relatively simple apps require a group of people to sit in front of a computer for eight hours a day, every day, forever." Maybe those people can program by texting from the coffee machine, but they're still working.
        • benatkin 2 hours ago
          Moxie doesn't trash NFTs or Web3 in that article. He just points out some limitations of the ecosystem.

          Also, ipfs directly fixes one of the bigger issues:

          > Instead of storing the data on-chain, NFTs instead contain a URL that points to the data.

          If it's ipfs, it points to the content. If it's ipns, it points to a changeable link to the content, but one that is made consistent through the network, preventing the trick of making it differ based on the referrer.

  • hannesfur 5 hours ago
    Having worked on libp2p’s DHT (Double Hashing for rust-libp2p) for a bit two years ago, it’s really great to see that there are improvements. To get to CDN-level speeds on dense networks, though, I still see it as an architectural flaw to not somehow encode network topology into the PeerID / identity in the DHT. A start would be to use the five RIRs. If you want to be more sophisticated, and I spent a lot of time theorising about this, you could have a decentrally governed anycast IP address or Geo DNS to bootstrap new peers into their neighbourhood and couple that into their DHT identity. But do you want to put BGP into the hands of a decentralised system? Could you even do it in the governance structure of the internet?

    Btw when we were working on our project HyveOS, we used batman-adv's routing table to quickly (really, really quickly) bootstrap new peers into the system.

    Ah… sometimes i really miss working on this.

  • someonebaggy 7 hours ago
    Is it also possible to speed up lookup? I never used IPFS much as it took several minutes to find a cid.
  • davidwritesbugs 5 hours ago
    Slightly tangential to the article, which seems interesting, but the main issue with IPFS was the horrendous performance of clients, which I seem to recall was related to refresh storms, sparse routing tables, unreachable peers as well as lookup speeds. Mostly the reputation was so bad that people didn't bother with it; I dismissed it for my own project. If your only users are crypto-grift projects you're in a bad place.
  • nekusar 6 hours ago
    Are the defaults still leaking your whole internal and external IP allocations to the DHT?

    Its security posture was absolutely fucking gross the last time I reviewed it.

    And of course, there's a shitcoin bolted on as well. Last thing I want to do is feed into FileCoin. Of course, everything new these days has some financial interaction crap bolted on to entice speculators and ilk.

  • catapart 6 hours ago
    I'll add to the "is it still...?" questions.

    Last I was told about it, there was no way to delete stuff from IPFS. Nothing enforceable, at least. Setting aside that public stuff is "impossible" to delete on the internet, there's something appealing to me about being able to shut off my server. Feels like that is less possible with IPFS hosted content.

    Does anyone have some perspective for me about removing content?

    • deno 6 hours ago
      Imagine you created a torrent (and/or magnet link) with a file and then stopped seeding after some time. If it was popular it will probably live on, if not then eventually it disappears.
      • catapart 5 hours ago
        Thanks! Yeah, I kind of figured that was still the case. Not really any use cases I have that I would feel comfortable with that paradigm, but I'm glad it's available!
        • somat 5 hours ago
          Is that not the same with anything published to the internet? For example I could keep your comment published and available for as long as I had interest in doing so despite any effort you may take to remove it from HN. I mean I guess tech like ipfs and bittorrent try to automate this process (keeping something on the internet as long as there is interest), but once you let something out on the internet it could stay there a long while. Or it could go poof and disappear; it depends on how much interest there is in the subject.
          • iamnothere 1 hour ago
            True, and in fact HN will not allow you to remove your past comments after the delete period has passed. People who are obsessed with deleting their past history should probably not be posting here! (Not even a GDPR request will help you, for those in Europe.)