巧用 Puppeteer + Cheerio：批量生成高质量 Emoji 图片

1、前言

在开发过程中，笔者遇到了一个需求：需要使用大量的emoji图片资源。联系设计同学帮忙提供一下，设计同学二话不说丢过来一个网站 getemoji.com ，说需要的emoji上面都有。

作为程序员，我们第一步肯定想到，写个爬虫进行获取，但我发现了一个更简便的解决方案。通过观察 getemoji.com 这个网站的前端 HTML 结构，我发现所有渲染emoji的节点都使用了相同的类名emoji emoji-button，且渲染的是真实的 Unicode Emoji 字符，而不是图片。

基于这个发现，决定直接将网站的 HTML 复制下来，然后通过解析 HTML 结构来提取emoji，最后使用Puppeteer将这些emoji渲染成高质量的 PNG 图片。

这种方法相比传统爬虫有以下优势：

避免了复杂的反爬虫机制
可以一次性获取所有 emoji 数据
渲染质量可控，支持自定义尺寸

2、技术实现

2-1 Puppeteer和Cheerio 介绍

Puppeteer是 Google 开发的 Node.js 库，提供了一套高级 API 来控制 Chrome/Chromium 浏览器。它可以做任何你在浏览器中手动做的事情。
Cheerio是一个专为服务端设计的核心 jQuery 实现，它可以在 Node.js 环境中解析和操作 HTML/XML 文档。可以把它理解为"没有浏览器的jQuery"。
两者搭配使用可以在 Node 环境下实现浏览器渲染和操作DOM节点.

2-2 关键功能实现

2-2-1 HTML 解析与 Emoji 提取

javascript 复制代码

async function readAllEmojisFromIndex() {
  const indexPath = path.resolve(__dirname, 'index.html');
  const html = await fs.readFile(indexPath, 'utf8');
  const $ = loadHtml(html);
  const nodes = $('div.emoji.emoji-button');
  const emojis = [];
  nodes.each((_, el) => {
    const text = $(el).text().trim();
    if (!text) return;
    // 只保留非 ASCII 字符（可能是 emoji）
    const graphemes = [...text].filter((ch) => !/^[\u0000-\u007F]$/.test(ch));
    if (graphemes.length > 0) emojis.push(graphemes.join(''));
  });
  return [...new Set(emojis)];
}

这个函数：

读取本地保存的 index.html 文件
使用 CSS 选择器定位所有 emoji 节点
过滤掉 ASCII 字符，只保留 emoji
去重处理，避免重复下载

2-2-2 Unicode 码点转换

javascript 复制代码

function toCodepoints(input) {
  const codepoints = [];
  for (const char of input) {
    const cp = char.codePointAt(0);
    if (cp !== undefined && cp !== null) {
      codepoints.push(cp.toString(16));
    }
  }
  return codepoints.join('-');
}

这个函数将 emoji 字符转换为 Unicode 码点的十六进制表示，用作文件名。例如：

🐶 → 1f436
👨‍💻 → 1f468-200d-1f4bb

2-2-3 高质量 Emoji 渲染成 Png 图片

javascript 复制代码

async function renderEmojiToPng(page, emoji, targetSize) {
  const deviceScaleFactor = Math.max(1, Math.ceil(targetSize / 72));
  await page.setViewport({ width: targetSize, height: targetSize, deviceScaleFactor });

  const padding = Math.floor(targetSize * 0.01);
  const effectiveSize = targetSize - (padding * 2);

  await page.setContent(`<!doctype html>
    <html>
      <head><meta charset="utf-8"/></head>
      <body style="margin:0; padding:${padding}px; background:transparent; width:${targetSize}px; height:${targetSize}px; box-sizing:border-box;">
        <div id="renderer" style="
          width: ${effectiveSize}px; height: ${effectiveSize}px;
          display: flex; align-items: center; justify-content: center;
          background: transparent;
          font-size: ${Math.floor(effectiveSize * 0.8)}px;
          line-height: 1;
          font-family: 'Apple Color Emoji', 'Noto Color Emoji', 'Segoe UI Emoji', 'EmojiOne Color', 'Twemoji Mozilla', sans-serif;
          user-select: none; -webkit-user-select: none; pointer-events: none;
          text-align: center;
        ">${emoji}</div>
      </body>
    </html>`);

  const renderer = await page.$('#renderer');
  if (!renderer) throw new Error('Renderer element not found');
  const buffer = await renderer.screenshot({ type: 'png', omitBackground: true });
  return buffer;
}

这个函数通过浏览器页面将emoji字符渲染成PNG图片。

首先设置页面视口和HTML内容，让emoji在指定尺寸的容器中居中显示。

最后截取包含emoji的元素生成透明背景的PNG图片并返回。