I love this idea. Unfortunately, it says "Unsupported browser/GPU" for me. This is desktop Chrome version 147 (the page says it requires 134+) and I have a 1060 card with 6 GB of VRAM on this specific device, so it should fit. I have more than 4 GB of free RAM as well.
firefox has webgpu already, but the subgroups extension isn't in yet. every matmul / softmax kernel here leans on subgroupShuffleXor for reductions, that's the blocker. same reason mlc webllm and friends don't run on firefox either. once mozilla ships it this should work
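For anyone debugging the "Unsupported browser/GPU" message, here is a minimal sketch of a feature check that reports which piece is missing. It assumes the demo only needs the standard WebGPU `subgroups` feature; the function names and the exact list of required features are hypothetical, not taken from the demo's source.

```typescript
// Minimal shape of GPUAdapter.features (a set-like object in the WebGPU API).
type FeatureSet = { has(name: string): boolean };

// Returns the required features that the adapter does NOT expose.
function missingFeatures(features: FeatureSet, required: string[]): string[] {
  return required.filter((name) => !features.has(name));
}

// Browser-side check: resolves to a human-readable reason, or null if everything is there.
// The default list ["subgroups"] is an assumption about what the demo requires.
async function whyUnsupported(required: string[] = ["subgroups"]): Promise<string | null> {
  const gpu = (globalThis as any).navigator?.gpu; // undefined when WebGPU is absent
  if (!gpu) return "WebGPU is not available in this browser";
  const adapter = await gpu.requestAdapter();
  if (!adapter) return "No suitable GPU adapter was found";
  const missing = missingFeatures(adapter.features, required);
  return missing.length > 0 ? `Missing WebGPU features: ${missing.join(", ")}` : null;
}
```

Running `whyUnsupported().then(console.log)` in the devtools console should tell you whether the browser lacks WebGPU entirely, returned no adapter, or just lacks subgroups.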
so multiple of these browser wasm demos make me re-download the models, can someone make a cdn for it or some sort of uberfast downloader? just throw some claude credits at it ty!
A CDN wouldn't help much. These days browsers partition their HTTP caches by the top-level site, so if two different tools (running on different domains) fetch the same model from the same CDN, the browser will download it twice.
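To make the double download concrete, here is a toy model of a partitioned cache. It is a deliberate simplification, not how any browser implements it: real cache keys carry more than this, and the class name, site names and URL are made up for illustration.

```typescript
// Toy model: cache entries are keyed by the site the user is visiting
// AND the resource URL, so the same URL is stored once per site.
class PartitionedCache {
  private entries = new Map<string, string>();
  downloads = 0; // how many times we had to go to the network

  fetch(topLevelSite: string, url: string): string {
    const key = `${topLevelSite} ${url}`;
    let body = this.entries.get(key);
    if (body === undefined) {
      this.downloads += 1; // cache miss: simulate a network download
      body = `bytes of ${url}`;
      this.entries.set(key, body);
    }
    return body;
  }
}

const cache = new PartitionedCache();
const modelUrl = "https://cdn.example/model.bin"; // hypothetical URL
cache.fetch("https://demo-a.example", modelUrl); // miss
cache.fetch("https://demo-b.example", modelUrl); // miss again, different partition
cache.fetch("https://demo-a.example", modelUrl); // hit
```

After those three calls `cache.downloads` is 2: one download per site, even though the URL is identical.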
Adding a file input, where users can load model files into the frontend directly from their file manager, would probably work as a quick stop-gap measure that lets people manage their own "cache" of model files.
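A rough sketch of what that could look like on the frontend: prefer a file the user picked, and only download when there isn't one. Everything here is hypothetical (the class name, and matching files by name is an assumption; a real tool would probably want to verify a hash too).

```typescript
// Structural subset of the browser's File type that this sketch relies on.
type LocalFile = { name: string; arrayBuffer(): Promise<ArrayBuffer> };

class LocalModelStore {
  private files = new Map<string, LocalFile>();

  // Call with input.files from an <input type="file" multiple> change event.
  add(files: Iterable<LocalFile>): void {
    for (const f of files) this.files.set(f.name, f);
  }

  has(name: string): boolean {
    return this.files.has(name);
  }

  // Use the user-supplied file if present, otherwise run the download callback.
  async load(name: string, download: () => Promise<ArrayBuffer>): Promise<ArrayBuffer> {
    const local = this.files.get(name);
    return local ? local.arrayBuffer() : download();
  }
}
```

In the page you would wire it up with something like `store.add(input.files)` in the input's change handler, and pass `() => fetch(url).then((r) => r.arrayBuffer())` as the download callback.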
Would you be okay with it using your upload bandwidth at the same time? If so, a p2p model would work. (This is potentially a good match for p2p because edge connections are very fast; they don't have to go across the whole Internet.) You could be downloading from uploaders in your region. Let me know if you'd be okay with uploading while you download, and I can build it for people to use this way.
Ah, let me clarify: many of the in-browser demos make me download certain models even if I already have them. It would be great if there was a way to avoid redownloading them across demos, so that I just have one cache, or an in-browser model manager. Hope this makes sense.
Or indeed use some sort of Hugging Face model downloader (if one exists with Xet support)