On-device inference with Chrome's Prompt API and React

Jeff Huleatt


After diving into Firebase AI Logic for a video on the Firebase YouTube channel, I wanted to step back and learn how AI Logic’s hybrid inference works under the hood. In this post, I’ll explore Chrome’s built-in Prompt API and use the on-device model it provides from React components.


Caveats of the on-device model

It’s just cool to be able to run the local model (Gemini Nano) so easily on an end-user’s device. At the moment, though, there are a lot of caveats:

Browser compatibility

This is a Chrome-only API for now, and it hasn’t rolled out to general availability yet: enabling it for your users requires signing your site up for an origin trial. I’ve signed my site up to make the interactive demo below work.

You can see the full standardization details at the Chrome Platform Status page for the Prompt API.
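If you want to try this on your own site, enrolling in the origin trial means serving a token. Here’s a minimal sketch of registering one at runtime, assuming you’ve generated a token from the Chrome Origin Trials dashboard (the token string below is a placeholder; Chrome also accepts tokens via an `Origin-Trial` HTTP response header instead):

```typescript
// Build the attributes for the origin-trial <meta> tag. The token is a
// placeholder; real tokens come from the Chrome Origin Trials dashboard.
function buildOriginTrialMeta(token: string): { httpEquiv: string; content: string } {
  return { httpEquiv: "origin-trial", content: token };
}

// In the browser, inject the tag into <head> before using the API.
if (typeof document !== "undefined") {
  const attrs = buildOriginTrialMeta("PLACEHOLDER_TOKEN");
  const meta = document.createElement("meta");
  meta.httpEquiv = attrs.httpEquiv;
  meta.content = attrs.content;
  document.head.append(meta);
}
```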

Device capability

Even among users who visit your site in Chrome, quality will vary: Chrome uses different models depending on the device’s GPU capabilities.

This means on-device inference could work great when I run it locally, but may break for a user on a lower-end device.

Model download

Chrome also doesn’t ship with the model preinstalled, so a user’s first use of the Prompt API can trigger a multi-GB download.

Thankfully, Chrome requires an unmetered internet connection for the model download, and will stop the download if a metered connection is detected. Still, as a web developer who’s used to caring about shaving KB off of a site, I’ll always feel guilty kicking off a download this big.

If the Prompt API becomes more widely used, it’s increasingly likely that a user already has the model installed because another site kicked off the download. At the moment, though, the download is a big blocker to the local-inference user experience.
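For reference, `LanguageModel.availability()` resolves to one of four states, and the download caveat shows up as the `downloadable` and `downloading` values. Here’s a small helper of my own (the wording is mine, not part of the API) that maps them to user-facing copy:

```typescript
// The four availability states reported by the Prompt API.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Map each state to user-facing copy (wording is my own).
function describeAvailability(availability: Availability): string {
  switch (availability) {
    case "unavailable":
      return "No local model can run in this browser.";
    case "downloadable":
      return "A local model can be downloaded (it may be several GB).";
    case "downloading":
      return "A model download is already in progress.";
    case "available":
      return "The model is downloaded and ready.";
  }
}
```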

Demo

With all the caveats behind me, it’s time to try it out!

Each section also contains TypeScript source code for the interactive component. I’m getting types from @types/dom-chromium-ai.

Check compatibility

Here is the output of LanguageModel.availability() for your browser:

export function usePromptApiAvailability(): "loading" | Availability {
  // Hooks must be called unconditionally, so declare state before any early return
  const [availability, setAvailability] = useState<"loading" | Availability>(
    "loading",
  );
  useEffect(() => {
    if ("LanguageModel" in window) {
      LanguageModel.availability().then(setAvailability);
    }
  }, []);

  if (!("LanguageModel" in window)) {
    return "unavailable";
  }
  return availability;
}
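Outside of React, the same check is just a feature detection plus one async call. A minimal sketch, with no hooks or components involved:

```typescript
// Feature-detect the Prompt API and query availability. `LanguageModel` is
// the global Chrome exposes; on other runtimes we short-circuit.
async function checkAvailability(): Promise<string> {
  const LanguageModel = (globalThis as any).LanguageModel;
  if (!LanguageModel) {
    return "unavailable";
  }
  return await LanguageModel.availability();
}
```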

Download the model

Remember that caveat about how big the model download can be? Proceed at your own risk!

export function ModelDownloader({
  availability,
}: {
  availability: Availability;
}) {
  // Hooks must be called unconditionally, so useState comes before any early return
  const [progress, setProgress] = useState<"not-started" | number>(
    "not-started",
  );

  // Return early if we don't need to start the download
  if (availability === "unavailable") {
    return <span>A local model is not available in your browser.</span>;
  } else if (availability === "available") {
    return <span>Model is ready.</span>;
  }

  async function handleDownload() {
    if (availability === "downloading") {
      setProgress(5);
    } else {
      setProgress(0);
    }
    await LanguageModel.create({
      monitor(m) {
        m.addEventListener("downloadprogress", (e) => {
          setProgress(Math.round(e.loaded * 100));
        });
      },
    });
    setProgress(100);
  }

  if (progress === "not-started") {
    const buttonPrompt =
      availability === "downloading"
        ? "Continue model download"
        : "Download model";
    return <button onClick={handleDownload}>{buttonPrompt}</button>;
  } else if (progress < 100) {
    return (
      <>
        <label htmlFor="progress-bar">Downloading model...</label>
        <progress id="progress-bar" value={progress} max="100">
          {progress}%
        </progress>
      </>
    );
  }

  return <span>Model is ready.</span>;
}
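One detail worth calling out: the `downloadprogress` event reports `loaded` as a fraction between 0 and 1, which is why the component multiplies it by 100. A tiny helper of my own that also clamps out-of-range values before they reach the `<progress>` element:

```typescript
// Convert the 0–1 `loaded` fraction from a downloadprogress event into a
// clamped integer percentage.
function toPercent(loaded: number): number {
  return Math.min(100, Math.max(0, Math.round(loaded * 100)));
}
```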

Interact with the model

Enter a prompt to get a response from Gemini Nano running locally on your machine (if you’re on a compatible browser and have the model downloaded).

export function PromptLocalModel({
  availability,
}: {
  availability: Availability;
}) {
  // Hooks must be called unconditionally, so state comes before the early return
  const [output, setOutput] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  if (availability !== "available") {
    return <span>No model available.</span>;
  }

  async function handleSubmit(formData: FormData) {
    const prompt = formData.get("prompt") as string;
    if (!prompt) return;

    // Calling setOutput a bunch of times in the for await loop
    // seems to cause these state updates to get ignored, so
    // this forces them to render immediately
    flushSync(() => {
      setIsStreaming(true);
      setOutput("");
    });

    performance.mark("send-prompt-to-model");
    try {
      const session = await LanguageModel.create();
      const stream = session.promptStreaming(prompt);
      for await (const chunk of stream) {
        setOutput((prev) => prev + chunk);
      }
      // Free the session's resources once the stream is finished
      session.destroy();
    } catch (e) {
      setOutput(`Error: ${e instanceof Error ? e.message : "unknown"}`);
      console.error(e);
    } finally {
      setIsStreaming(false);
    }
    performance.mark("model-sent-response");

    const { duration } = performance.measure(
      "model-response-time",
      "send-prompt-to-model",
      "model-sent-response",
    );
    console.log(`completed in ${(duration / 1000).toPrecision(3)} seconds`);
  }

  return (
    <>
      <form action={handleSubmit}>
        <textarea name="prompt" rows={4} placeholder="Enter a prompt..." />
        <br />
        <button type="submit" disabled={isStreaming}>
          {isStreaming ? "Generating..." : "Submit"}
        </button>
      </form>
      {output && <Markdown>{output}</Markdown>}
    </>
  );
}
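The `performance.mark`/`performance.measure` timing in that handler can be factored out into a reusable helper. A sketch of my own (`performance` is available both in browsers and in Node.js):

```typescript
// Time an async operation with the Performance API and return its result
// alongside the elapsed seconds.
async function timed<T>(
  label: string,
  work: () => Promise<T>,
): Promise<{ result: T; seconds: number }> {
  performance.mark(`${label}-start`);
  const result = await work();
  performance.mark(`${label}-end`);
  const { duration } = performance.measure(
    label,
    `${label}-start`,
    `${label}-end`,
  );
  return { result, seconds: duration / 1000 };
}
```

In the component above, this could wrap the session creation and the streaming loop, replacing the two manual marks and the measure.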

Summary

The Prompt API has a lot of caveats right now, but it is also an exciting glimpse into a future where running a local AI model in a web app only takes a few lines of code. And, it’s fun to experiment with as part of the origin trial.

For production web apps, hybrid inference with Firebase AI Logic is a safer bet to ensure consistent quality across all browsers, perhaps with Remote Config to swap between on-device and Cloud models as you experiment.