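Running AI Inference in the Front End

Preface

Transformers.js is a machine learning library that lets developers run pretrained Hugging Face models directly in the browser, with no server required. It supports a range of tasks, including natural language processing, computer vision, and audio processing. This article covers the basics of Transformers.js and walks through examples for several tasks.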

Transformers.js

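Transformers.js is currently published as two npm packages: @xenova/transformers and @huggingface/transformers. Both come from the same project, which xenova originally developed and maintained in a personal repository. With v3.0.0 the project moved to the Hugging Face organization and received a major update: WebGPU support, new models and tasks, new quantization options, and compatibility with Deno and Bun.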

Installation

Install via npm

`npm install @huggingface/transformers`

Load from a CDN

<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.3.1';
</script>

API Overview

Transformers.js exposes a large API, but this article focuses on two core pieces, the pipeline function and the <Task>Pipeline classes, which cover most inference needs.

pipeline

The pipeline function takes three arguments: task (the task type), model (the model ID), and options (optional settings).

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('image-to-text', 'Mozilla/distilvit', { device: 'webgpu' });
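WebGPU is not available in every browser, so a hard-coded device: 'webgpu' may fail when the pipeline loads. The sketch below shows one way to pick a device before calling pipeline. pickDevice is a hypothetical helper, not part of Transformers.js, and it assumes that 'wasm' is an acceptable fallback value for the device option.

```javascript
// Sketch: prefer WebGPU when the browser exposes a usable adapter,
// otherwise fall back to WASM. `pickDevice` is a hypothetical helper.
async function pickDevice(nav = globalThis.navigator) {
  // No navigator (older Node) or no WebGPU entry point: use WASM.
  if (!nav || !nav.gpu) return 'wasm';
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}

// Usage (assumed, not executed here):
// const pipe = await pipeline('image-to-text', 'Mozilla/distilvit', {
//   device: await pickDevice(),
// });
```

Passing the navigator object as a parameter keeps the helper easy to exercise outside a browser.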

<Task>Pipeline

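pipeline returns a Promise<AllTasks[T]>, where AllTasks[T] is an instance of the matching <Task>Pipeline class. For example, calling pipeline with the image-to-text task resolves to an ImageToTextPipeline.

`const imageToText = await pipeline('image-to-text');`

The returned <Task>Pipeline instance is callable. Its first argument is the input (text for text tasks, an image URL for image tasks), and the second is an options object.

`const output = await imageToText('http://xxx.xx.xx/xxx.jpg', { max_new_tokens: 20 });`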

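Examples

Text generation

Code:

The following code uses the Xenova/llama2.c-stories15M model for text generation. Inference is fast, and the generated text is reasonably good for such a small model.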

import { pipeline } from 'https://unpkg.com/@huggingface/transformers/dist/transformers.js';

(async function run() {
  const pipe = await pipeline('text-generation', 'Xenova/llama2.c-stories15M');
  const output = await pipe("Tell a joke", { max_new_tokens: 200 });
  console.log(output.map(x => x.generated_text).join('\n'));
})();

Inference result: (image omitted)

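Translation

Code:

This example uses the Xenova/opus-mt-en-zh model. Inference is fast, but translation quality is mediocre; long sentences and proper nouns in particular may be handled inaccurately.

import { pipeline } from 'https://unpkg.com/@huggingface/transformers/dist/transformers.js';

(async function run() {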
  const pipe = await pipeline('translation', 'Xenova/opus-mt-en-zh');
  const output = await pipe("why are you so angry?");
  console.log(output.map(x => x.translation_text).join('\n'));
})();

Output:

`你为什么这么生气?`

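Summarization

Code:

This example uses the Xenova/distilbart-cnn-6-6 model for summarization.

import { pipeline } from 'https://unpkg.com/@huggingface/transformers/dist/transformers.js';

(async function run() {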
  const pipe = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
  const text = `
  Run 🤗 Transformers directly in your browser, with no need for a server!
  Transformers.js is designed to be functionally equivalent to Hugging Face’s transformers python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:
  📝 Natural Language Processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
  🖼️ Computer Vision: image classification, object detection, segmentation, and depth estimation.
  🗣️ Audio: automatic speech recognition, audio classification, and text-to-speech.
  🐙 Multimodal: embeddings, zero-shot audio classification, zero-shot image classification, and zero-shot object detection.
  Transformers.js uses ONNX Runtime to run models in the browser. The best part about it, is that you can easily convert your pretrained PyTorch, TensorFlow, or JAX models to ONNX using 🤗 Optimum.
  `;
  const output = await pipe(text);
  console.log(output.map(x => x.summary_text).join('\n'));
})();

Inference result:

The best part of the browser is that you can run your pretrained models using ONNX.js. The best thing to do is run yourpretrained models to ONNx using OnnX. The browser can run models using the same API as the Python library.

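Image to text

Code:

This example uses the Mozilla/distilvit model to generate a caption for an image. Inference is fast, but the captions are of mediocre quality.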

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Document</title>

  <style>
    /* Page container */
    body {
      font-family: Arial, sans-serif;
      padding: 20px;
      background-color: #f9f9f9;
    }

    /* File input container */
    .file-upload-container {
      margin-bottom: 20px;
      text-align: center;
    }

    .file-upload-container input {
      padding: 10px;
      font-size: 16px;
    }

    /* Image and output container */
    .content-container {
      display: flex;
      justify-content: center;
      align-items: flex-start;
      gap: 20px;
    }

    /* Image container */
    .image-container {
      width: 50%;
      max-width: 700px;
      overflow: hidden;
      padding: 0 10px;
    }

    .image-container img {
      width: 100%;
      height: auto;
      display: block;
      border: 1px solid #ccc;
      border-radius: 8px;
      box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
    }

    .output-container {
      width: 50%;
    }

    /* Output box */
    .output-container textarea {
      width: 100%;
      height: 100%;
      padding: 10px;
      font-size: 14px;
      border: 1px solid #ccc;
      border-radius: 8px;
      resize: none;
      box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
    }
  </style>
</head>

<body>
  <div>
    <!-- File input -->
    <div class="file-upload-container">
      <input id="file-upload" type="file" />
    </div>

    <!-- Image and output -->
    <div class="content-container">
      <!-- Original image -->
      <div class="image-container">
        <h3>Loaded image:</h3>
        <img id="img" alt="Uploaded Image" />
      </div>
      <!-- AI output box -->
      <div class="output-container">
        <h3>AI output:</h3>
        <textarea id="output" rows="10"></textarea>
      </div>
    </div>
  </div>

  <script type="module">
    import { pipeline } from 'https://unpkg.com/@huggingface/transformers/dist/transformers.js';
    const fileInput = document.getElementById('file-upload');
    const img = document.getElementById('img');
    const textarea = document.getElementById('output');

    async function handleUpload(evt) {
      const file = evt.target.files[0];
      if (!file) return;
      textarea.value = '';
      // Turn the selected file into a blob URL that both <img> and the pipeline can read
      const blobUrl = URL.createObjectURL(file);
      img.src = blobUrl;
      // Load the model (created on every upload here; hoist it out of the handler to reuse it)
      const pipe = await pipeline('image-to-text', 'Mozilla/distilvit');
      // Run inference
      const output = await pipe(blobUrl);
      textarea.value = output.map(x => x.generated_text).join('\n');
    }
    fileInput.addEventListener('change', handleUpload);
  </script>
</body>

</html>

Inference result: (image omitted)
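The page above calls pipeline inside the upload handler, so a new pipeline is created for every selected file. One possible refinement is to cache the loading promise per task and model. The sketch below is a minimal version of that idea; createPipelineCache and its load parameter are hypothetical names, with load standing in for a call such as (task, model) => pipeline(task, model).

```javascript
// Sketch: cache one loading promise per (task, model) pair so repeated
// requests reuse the same pipeline. `load` is a hypothetical loader that
// returns a Promise, e.g. (task, model) => pipeline(task, model).
function createPipelineCache(load) {
  const cache = new Map(); // "task::model" -> Promise of a pipeline
  return function getPipeline(task, model) {
    const key = `${task}::${model}`;
    if (!cache.has(key)) {
      const promise = load(task, model).catch((err) => {
        cache.delete(key); // drop the failed entry so a later call can retry
        throw err;
      });
      cache.set(key, promise);
    }
    return cache.get(key);
  };
}

// Usage (assumed, not executed here):
// const getPipeline = createPipelineCache((task, model) => pipeline(task, model));
// const pipe = await getPipeline('image-to-text', 'Mozilla/distilvit');
```

Caching the promise rather than the resolved pipeline also means that two uploads arriving while the model is still loading share a single load.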

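Text to audio

If playback fails because the browser blocks autoplay, create an <audio> element manually, set the generated audio URL as its source, and start playback manually, for example from a click.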

import { pipeline } from 'https://unpkg.com/@huggingface/transformers/dist/transformers.js';

(async function run() {
  const pipe = await pipeline('text-to-audio', 'Xenova/speecht5_tts');
  const speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin'
  const output = await pipe("Hello, I'm Ziyang, a front-end development engineer.", { speaker_embeddings });
  const url = URL.createObjectURL(output.toBlob());
  const audio = new Audio(url);
  audio.play();
  audio.addEventListener('ended', () => URL.revokeObjectURL(url));
})();
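The autoplay case mentioned above can be handled in code, because play() on a media element returns a Promise that rejects (typically with a NotAllowedError) when the browser blocks playback. The following is a minimal sketch of that handling; playOrFallback and the onBlocked callback are hypothetical names introduced for illustration.

```javascript
// Sketch: try to play; if the browser blocks autoplay, hand the element
// to a fallback so the user can start playback manually.
// `playOrFallback` and `onBlocked` are hypothetical names.
async function playOrFallback(audio, onBlocked) {
  try {
    await audio.play(); // rejects when autoplay is not allowed
    return true;
  } catch (err) {
    if (err && err.name === 'NotAllowedError') {
      onBlocked(audio); // e.g. show the element with controls
      return false;
    }
    throw err; // unrelated errors are not swallowed
  }
}

// Usage in a browser (assumed, not executed here):
// const audio = new Audio(url);
// await playOrFallback(audio, (el) => {
//   el.controls = true;
//   document.body.append(el);
// });
```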

FAQ

1. Runtime error

Running the official example as-is may fail with a syntax error.

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('sentiment-analysis');

const out = await pipe('I love transformers!');

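This may be caused by a bug in Transformers.js. The code runs without error in a plain HTML page, but the error can appear when it runs inside a Vue project.

Solution

Explicitly specifying a model may resolve the problem.

`const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');`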

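2. Unsupported models

Not every model on Hugging Face works with Transformers.js. Loading an unsupported model throws an error.

Solution

On the Hugging Face website, filter models by Transformers.js under Libraries to see which models are supported.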
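Since loading an unsupported model throws, an application can also guard the load and fall back to a model that is known to work. The sketch below tries a list of model IDs in order; loadFirstSupported and its load parameter are hypothetical, with load standing in for something like (id) => pipeline('sentiment-analysis', id).

```javascript
// Sketch: try model IDs in order and return the first one that loads.
// `load` is a hypothetical function (modelId) => Promise of a pipeline.
async function loadFirstSupported(load, modelIds) {
  const errors = [];
  for (const id of modelIds) {
    try {
      return { modelId: id, pipe: await load(id) };
    } catch (err) {
      // Remember why this candidate failed and move on to the next one.
      errors.push(`${id}: ${err.message}`);
    }
  }
  throw new Error(`No candidate model could be loaded:\n${errors.join('\n')}`);
}

// Usage (assumed, not executed here; the first ID is a made-up placeholder):
// const { modelId, pipe } = await loadFirstSupported(
//   (id) => pipeline('sentiment-analysis', id),
//   ['some-org/maybe-unsupported', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english'],
// );
```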

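Conclusion

This article showed how to use Transformers.js to run several kinds of AI models in the browser, covering text generation, translation, summarization, image to text, and text to audio. The examples show how capable and flexible Transformers.js is: developers can run complex AI models directly in the browser without relying on a server.

Running models in the browser does have limitations. On first load the model must be downloaded over the network, and model sizes range from tens of megabytes to more than a thousand megabytes, so users may have to wait. Because the download depends on network conditions, users in regions where the model host is unreachable may also need a proxy or VPN, or the download will fail.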
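One way to soften the first-load wait is to show download progress. The sketch below aggregates per-file progress into a single percentage. It assumes that the pipeline options accept a progress_callback and that progress events have the shape { status: 'progress', file, loaded, total }; createProgressTracker is a hypothetical helper name.

```javascript
// Sketch: combine per-file download progress into one overall percentage.
// Assumption: events look like { status: 'progress', file, loaded, total }.
function createProgressTracker(onUpdate) {
  const files = new Map(); // file name -> { loaded, total } in bytes
  return function handleProgress(event) {
    if (event.status !== 'progress') return; // ignore other event types
    files.set(event.file, { loaded: event.loaded, total: event.total });
    let loaded = 0;
    let total = 0;
    for (const f of files.values()) {
      loaded += f.loaded;
      total += f.total;
    }
    onUpdate(total > 0 ? Math.round((loaded / total) * 100) : 0);
  };
}

// Usage (assumed, not executed here):
// const pipe = await pipeline('translation', 'Xenova/opus-mt-en-zh', {
//   progress_callback: createProgressTracker((p) => console.log(`${p}%`)),
// });
```

The percentage only reflects files seen so far, so it can drop when a new file starts downloading.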

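Despite these issues, Transformers.js is a promising tool, particularly for applications that need real-time inference on the client. As web technology advances, the barrier to running AI models in the browser should keep falling, and the user experience should improve with it.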

[!note]
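Transformers.js downloads models over the network. Where the model host is blocked, a proxy or VPN is required, otherwise the models may fail to load.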
