The Prompt API

(developer.chrome.com)

26 points | by gslin 2 hours ago

9 comments

  • avaer 51 minutes ago
    It works; I've shipped this as "local inference", a poor person's Ollama, for low-end LLM tasks like search. The main win is that it's free and privacy-preserving, and (mostly) transparent to users in that they don't have to do anything, which is great for giving non-technical users local inference without making them do scary native things.

    But keep in mind the actual experience for users is not great; the model download is orders of magnitude larger than the browser itself, and it has to happen before you get your first token back. That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
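
    For reference, a rough sketch of how a page can gate on that download (API shape as described on developer.chrome.com for the origin trial; the exact surface may still change):

      // Check whether the on-device model is usable before promising it to the user.
      const availability = await LanguageModel.availability();
      if (availability === 'unavailable') {
        // Fall back to a server-side model, or hide the feature entirely.
      }

      // create() triggers the model download if needed; surfacing progress keeps
      // the long wait for the first token from looking like a silent hang.
      const session = await LanguageModel.create({
        monitor(m) {
          m.addEventListener('downloadprogress', (e) => {
            console.log(`Model download: ${Math.round(e.loaded * 100)}%`);
          });
        },
      });
      const answer = await session.prompt('Summarize this page in one sentence.');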

    • Yokohiii 42 minutes ago
      > That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.

      Maybe the next big thing will be premium software subscription offers that bundle a bunch of 5090s as an extra.

    • subhobroto 35 minutes ago
      > It works; I've shipped this as "local inference", a poor person's Ollama, for low-end LLM tasks like search

      fantastic!

      > the model download is orders of magnitude larger than the browser itself, and it has to happen before you get your first token back

      Sure, but does this mean the model is lazily downloaded? That is, if I used this and mine was the first call that needed the model, would the user be waiting until the download finished at that point?

      That sounds like a horrible user experience. Maybe Chrome reduces the confusion by showing a download status dialog or similar?
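
      Something like this gating flow would help, assuming the availability() states from the docs (showConsentDialog is a hypothetical app-side helper, not part of the API):

        const status = await LanguageModel.availability();
        if (status === 'downloadable') {
          // Don't start a multi-gigabyte download silently; ask the user first.
          const ok = await showConsentDialog('This feature downloads a large on-device model. Continue?');
          if (!ok) throw new Error('user declined model download');
        }
        const session = await LanguageModel.create(); // downloads the model if needed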

      Also, any idea what the on-disk impact is?

  • jameslk 44 minutes ago
    Seems like a good way for a rogue JS script to offload token generation to a bunch of unsuspecting visitors

    It would actually be pretty interesting to see if it's possible to decentralize the compute: break a larger prompt down and send the pieces to a bunch of browsers using a subagent pattern or something like RLM, with each browser working on a smaller part of the prompt.
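
    Roughly, as a hedged sketch (the /api endpoints and the chunking scheme are invented for illustration; LanguageModel is the Prompt API surface):

      // Hypothetical fan-out: a server splits one big prompt into chunks, each
      // visiting browser handles one chunk with the local model, and posts the
      // partial result back for the server to combine.
      const { chunkId, chunkText } = await (await fetch('/api/next-chunk')).json();

      const session = await LanguageModel.create();
      const partial = await session.prompt(
        `Summarize the following text in two sentences:\n\n${chunkText}`
      );

      await fetch('/api/partial-result', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ chunkId, partial }),
      });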

    • varun_ch 20 minutes ago
      This feels like a lot of work for low reward; the technical/business infrastructure would be wild. And if anyone wants to offload their prompts to users' browsers, they might as well just use the Chrome API as intended, right? How many server-side prompts would realistically be useful to offload to a low-end model like this?

      Plus, even if you really wanted to do that, WebGPU exists and has for a while, right?

      • jameslk 11 minutes ago
        > How many server-side prompts would realistically be useful to offload to a low-end model like this?

        There are a lot of ways this API could go, e.g. more powerful models eventually, or perhaps integration with cloud models. For example, I could see Google trying to make Gemini the default model for users signed into Chrome

        • varun_ch 7 minutes ago
          I think we’ll get more powerful models when they become reasonable to run on regular people’s computers, in which case the compute costs would hopefully fall enough that people don’t need to resort to this kind of weird stuff.

          As for cloud models, that would be interesting, although I guess the easier fraud would then be spoofing whatever parameters (IP address? domain name? some Chrome install identifier?) they use for rate limiting, rather than actually using people's computers.

          Anyway, I'm sure that if it ends up being abused, they can throw a permissions dialog in front of it. They just need to figure out a way to make normal people understand it.

  • gorgoiler 15 minutes ago
    Imagine a Vendor API that adds a way to link from the page straight into a device purchase workflow. As a trial of the API in Chrome you can order a new Google Pixel 9b directly from any page with the word Android in it!

    Or a LocalNet API that integrates with trusted hardware devices on your local network. As a trial (Chrome beta programme, strictly limited, but here are 3x signup links to share with your friends) you can adjust your Google Next Mini underfloor heating directly from Chrome!

    Or a DirectCast API that lets you stream <video> elements to a device of your choice even over a VPN. As a Chrome trial, you can use your Google Cloud account to stream directly from YouTube Premium to any linked Google Chromecast devices you own!

  • nl 42 minutes ago
    The model this uses is useless for anything beyond a 2-round chat at most.

    If you want to do anything interesting you need transformers.js and a decent model. Qwen 0.9B is where things start working usefully.
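
    Something like this, as a sketch (the model ID is illustrative; any similarly sized ONNX-converted instruct model should work):

      import { pipeline } from '@huggingface/transformers';

      // Load a small instruct model; device: 'webgpu' needs WebGPU support,
      // omit it to use the default WASM backend.
      const generator = await pipeline(
        'text-generation',
        'onnx-community/Qwen2.5-0.5B-Instruct',
        { device: 'webgpu' }
      );

      const messages = [{ role: 'user', content: 'List three uses for a local LLM.' }];
      const output = await generator(messages, { max_new_tokens: 128 });
      console.log(output[0].generated_text.at(-1).content);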

  • skybrian 54 minutes ago
    Still in origin trial? Looks like they're adding a temperature parameter:

    https://chromestatus.com/feature/6325545693478912
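
    If it lands the way the docs sketch it, usage would look roughly like this (params() and its fields are per the explainer; treat the surface as unstable):

      // params() reports the model's default and maximum sampling settings.
      const { defaultTemperature, maxTemperature, defaultTopK, maxTopK } =
        await LanguageModel.params();

      const session = await LanguageModel.create({
        temperature: Math.min(1.2, maxTemperature), // hotter than the default
        topK: defaultTopK, // the API expects temperature and topK together
      });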

  • fg137 54 minutes ago
    "sorry, to use our website, you must have at least 22 GB of free disk space."