ID Card Detection and Cropping in a WeChat Mini Program (with VisionKit)

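1. Introduction

Real-name verification in a mini program usually requires the user to upload a photo of their ID card.

Uploading alone is not enough, though. We also want to check automatically that the photo contains a complete ID card, locate the card region, crop it, and return a normalized image.

In the past, this kind of requirement meant either server-side OCR or a third-party recognition SDK.

WeChat now provides the VisionKit (VK) vision capability inside mini programs, so the whole pipeline can run on the front end.

This article walks through the implementation step by step:

📷 Automatic ID card detection → 🎯 automatic cropping → 🖼 a normalized Base64 image.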

2. Full code

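/**
 * Create a VKSession configured for ID card detection.
 * @param {*} gl WebGL context of a canvas in the mini program
 */
function createVKSession(gl) {
  const session = wx.createVKSession({
    track: {
      IDCard: { mode: 2 }, // photo mode: detect in a static image
    },
    version: "v1",
    gl,
  })
  return session
}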

/**
 * Crop the ID card region using the affine matrix.
 * @param {Image} img source image object
 * @param {number} width source image width
 * @param {number} height source image height
 * @param {Array<number>} affineMat affine matrix (6 values)
 * @param {number} affineImgWidth output image width
 * @param {number} affineImgHeight output image height
 * @returns {string} the cropped image as a base64 data URL
 */
function cropIDCard(img, width, height, affineMat, affineImgWidth, affineImgHeight) {
  // Create an offscreen canvas with the size of the output image
  const canvas = wx.createOffscreenCanvas({
    type: "2d",
    width: affineImgWidth,
    height: affineImgHeight,
  })
  const ctx = canvas.getContext("2d")

  // Clear the canvas
  ctx.clearRect(0, 0, affineImgWidth, affineImgHeight)

  /**
   * setTransform(a, b, c, d, e, f)
   * applies this affine matrix to everything drawn afterwards:
   * [ a  c  e ]
   * [ b  d  f ]
   * [ 0  0  1 ]
   *
   * affineMat stores the first two rows of this 3x3 matrix in row-major order:
   * [a, c, e, b, d, f]
   */
  ctx.setTransform(
    Number(affineMat[0]), // a: horizontal scaling
    Number(affineMat[3]), // b: vertical skewing
    Number(affineMat[1]), // c: horizontal skewing
    Number(affineMat[4]), // d: vertical scaling
    Number(affineMat[2]), // e: horizontal translation
    Number(affineMat[5])  // f: vertical translation
  )

  // Draw the source image; after the transform only the ID card region falls inside the canvas
  ctx.drawImage(img, 0, 0, width, height)

  // Export the cropped region as base64
  return canvas.toDataURL()
}

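/**
 * Write a base64 data URL to a local file and return the file path.
 * @param {string} base64Data data URL such as "data:image/png;base64,..."
 * @returns {string} path of the file under wx.env.USER_DATA_PATH
 */
function base64ToTempFilePath(base64Data) {
  const time = new Date().getTime()
  const imgPath = wx.env.USER_DATA_PATH + "/poster" + time + "share" + ".png"
  // Strip the "data:image/<type>;base64," prefix ("/" must be escaped inside a regex literal)
  const imageData = base64Data.replace(/^data:image\/\w+;base64,/, "")
  const file = wx.getFileSystemManager()
  file.writeFileSync(imgPath, imageData, "base64")
  return imgPath
}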

/**
 * Detect an ID card in a static image and return the cropped result.
 * @param {Object} options
 * @param {string} options.imgUrl image path
 * @param {number} options.width source image width
 * @param {number} options.height source image height
 * @param {*} options.gl WebGL context from the mini program
 * @returns {Promise<Object|false>} { cropImg, affineImgWidth, affineImgHeight, box },
 *   where cropImg is a local file path; false when no complete card is found
 */
function detectIDCard({ imgUrl, width, height, gl }) {
  return new Promise(async (resolve, reject) => {
    try {
      const session = createVKSession(gl)

      // Listen for detection results
      session.on("updateAnchors", async (anchors) => {
        if (anchors && anchors[0]) {
          const anchor = anchors[0]
          const isComplete = anchor.isComplete
          if (!isComplete) return resolve(false)
          const { affineImgWidth, affineImgHeight, affineMat, box } = anchor
          if (affineImgWidth && affineImgHeight && affineMat) {
            // img is declared further down; it is fully loaded before detection starts
            const cropImg = cropIDCard(
              img,
              width,
              height,
              affineMat,
              affineImgWidth,
              affineImgHeight
            )
            resolve({
              cropImg: base64ToTempFilePath(cropImg),
              affineImgWidth,
              affineImgHeight,
              box,
            })
          } else {
            resolve(false)
          }
        } else {
          resolve(false)
        }
      })

      // No ID card detected
      session.on("removeAnchors", () => {
        resolve(false)
      })

      // Prepare the image buffer
      const canvas = wx.createOffscreenCanvas({
        type: "2d",
        width,
        height,
      })
      const ctx = canvas.getContext("2d")
      const img = canvas.createImage()
      await new Promise((res, rej) => {
        img.onload = res
        img.onerror = rej // surface load failures instead of waiting forever
        img.src = imgUrl
      })
      ctx.drawImage(img, 0, 0, width, height)
      const imgData = ctx.getImageData(0, 0, width, height)

      // Start detection
      session.start((errno) => {
        // A non-zero status means the session failed to start
        if (errno) return reject(new Error("VKSession start failed: " + errno))
        session.detectIDCard({
          frameBuffer: imgData.data.buffer,
          width,
          height,
          getAffineImg: true,
        })
      })
    } catch (err) {
      reject(err)
    }
  })
}

module.exports = {
  detectIDCard,
}
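
The data URL handling inside base64ToTempFilePath is easy to get wrong, so it helps to check it outside the mini program. Below is a minimal sketch that runs in plain JavaScript; `stripDataUrlPrefix` is a hypothetical helper name used for illustration only, not part of the code above or of any WeChat API.

```javascript
// Hypothetical helper: removes the "data:image/<type>;base64," header that
// canvas.toDataURL() places in front of the base64 payload.
function stripDataUrlPrefix(dataUrl) {
  return dataUrl.replace(/^data:image\/\w+;base64,/, "")
}

// "aGVsbG8=" is the base64 encoding of the ASCII string "hello".
const sample = "data:image/png;base64,aGVsbG8="
const payload = stripDataUrlPrefix(sample)
console.log(payload) // aGVsbG8=
```

A string without the prefix passes through unchanged, so calling the helper twice is harmless.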

3. Step-by-step breakdown

1️⃣ Create the VKSession

const session = createVKSession(gl)

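gl is a WebGL context object.
VK uses WebGL to accelerate image analysis and detection.
Here we set track: { IDCard: { mode: 2 } }, which selects the photo mode of ID card detection, that is, detection on a static image that we pass in ourselves.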

2️⃣ Register the detection events

VK returns detection results in the form of anchors.

session.on("updateAnchors", async (anchors) => { ... })

Each anchor represents one detected ID card.

The first field to check is isComplete:

const anchor = anchors[0]
const isComplete = anchor.isComplete
if (!isComplete) return resolve(false)

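We continue only when isComplete is true; in every other case the code treats the photo as having no usable ID card and resolves with false.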
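
The nested checks in the updateAnchors handler can be collapsed into one guard. The sketch below is one way to do it; `pickUsableAnchor` is a hypothetical helper, and the requirement of at least 6 matrix entries is an assumption taken from how cropIDCard reads affineMat, not something stated by the VK documentation.

```javascript
// Hypothetical helper: returns the first anchor that is safe to crop, or null.
// Field names follow the anchor object used in the updateAnchors handler.
function pickUsableAnchor(anchors) {
  const anchor = anchors && anchors[0]
  if (!anchor || !anchor.isComplete) return null
  const { affineMat, affineImgWidth, affineImgHeight } = anchor
  // Assumption: cropping needs the 6 affine values and a positive output size.
  const hasMatrix = affineMat != null && affineMat.length >= 6
  const hasSize = affineImgWidth > 0 && affineImgHeight > 0
  return hasMatrix && hasSize ? anchor : null
}

// Dry run with hand-written anchors (the values are made up).
const complete = {
  isComplete: true,
  affineMat: [1, 0, 0, 0, 1, 0],
  affineImgWidth: 400,
  affineImgHeight: 250,
}
console.log(pickUsableAnchor([complete]) === complete) // true
console.log(pickUsableAnchor([{ isComplete: false }])) // null
```

Inside the handler this would reduce the branching to a single `if (!anchor) return resolve(false)`.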

3️⃣ Get the affine matrix and crop the image

Once detection succeeds, the anchor gives us:

const { affineImgWidth, affineImgHeight, affineMat, box } = anchor

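These fields mean the following:

Field	Meaning
`affineMat`	Affine transform matrix that maps the ID card region onto the output image
`affineImgWidth` / `affineImgHeight`	Size of the corrected output image
`box`	Bounding rectangle of the ID card in the source image

With this information, cropIDCard() applies the affine transform to the image and crops it: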

/**
 * Crop the ID card region using the affine matrix.
 * @param {Image} img source image object
 * @param {number} width source image width
 * @param {number} height source image height
 * @param {Array<number>} affineMat affine matrix (6 values)
 * @param {number} affineImgWidth output image width
 * @param {number} affineImgHeight output image height
 * @returns {string} the cropped image as a base64 data URL
 */
function cropIDCard(img, width, height, affineMat, affineImgWidth, affineImgHeight) {
  // Create an offscreen canvas with the size of the output image
  const canvas = wx.createOffscreenCanvas({
    type: "2d",
    width: affineImgWidth,
    height: affineImgHeight,
  })
  const ctx = canvas.getContext("2d")

  // Clear the canvas
  ctx.clearRect(0, 0, affineImgWidth, affineImgHeight)

  /**
   * setTransform(a, b, c, d, e, f)
   * applies this affine matrix to everything drawn afterwards:
   * [ a  c  e ]
   * [ b  d  f ]
   * [ 0  0  1 ]
   *
   * affineMat stores the first two rows of this 3x3 matrix in row-major order:
   * [a, c, e, b, d, f]
   */
  ctx.setTransform(
    Number(affineMat[0]), // a: horizontal scaling
    Number(affineMat[3]), // b: vertical skewing
    Number(affineMat[1]), // c: horizontal skewing
    Number(affineMat[4]), // d: vertical scaling
    Number(affineMat[2]), // e: horizontal translation
    Number(affineMat[5])  // f: vertical translation
  )

  // Draw the source image; after the transform only the ID card region falls inside the canvas
  ctx.drawImage(img, 0, 0, width, height)

  // Export the cropped region as base64
  return canvas.toDataURL()
}

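How the correction works

If the card is tilted when the user takes the photo, it appears in the source image as a slanted quadrilateral rather than an upright rectangle.

VK's detection algorithm estimates the mapping between that quadrilateral and a standard rectangle and expresses it as an affine matrix. You can think of affineMat as a transform template:

It straightens a tilted rectangle;
it keeps the aspect ratio;
the output image has the proportions of a standard ID card.

Strictly speaking, this is an affine correction (rotation, scaling, skew and translation). A 6-value matrix cannot express a full perspective transform, so strong perspective distortion is only approximated.

The correction looks roughly like this (schematic):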

Source (tilted)          →     Corrected (flattened)
┌────────┐                     ┌─────────┐
│ \      │                     │         │
│  \     │     transform       │ ID card │
│   \    │    ─────────▶       │         │
└────\───┘                     └─────────┘
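
To make the mapping concrete, the sketch below applies a matrix in the row-major layout [a, c, e, b, d, f] to individual points, which is the same arithmetic the canvas performs for every pixel. `mapPoint` is a hypothetical helper and the matrix values are made up for illustration; a real affineMat comes from the anchor.

```javascript
// Maps a point from the source image into the corrected output image:
//   x' = a * x + c * y + e
//   y' = b * x + d * y + f
function mapPoint(affineMat, x, y) {
  const [a, c, e, b, d, f] = Array.from(affineMat, Number)
  return { x: a * x + c * y + e, y: b * x + d * y + f }
}

// Made-up matrix: scale by 0.5, then shift by (-100, -50).
// It sends the source rectangle (200,100)-(1000,600) to a 400x250 output.
const demoMat = [0.5, 0, -100, 0, 0.5, -50]
console.log(mapPoint(demoMat, 200, 100))  // { x: 0, y: 0 }
console.log(mapPoint(demoMat, 1000, 600)) // { x: 400, y: 250 }
```

Points that land outside the output rectangle are simply not visible on the canvas, which is why drawing the whole source image ends up showing only the card.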

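Why use `setTransform()` instead of computing it by hand?

Because CanvasRenderingContext2D.setTransform() accepts an affine matrix directly and applies the coordinate transform for us.

That means we do not have to compute point mappings or interpolate triangles ourselves; the underlying canvas renderer performs the geometric transform when the image is drawn.

In other words, affineMat works like a key that tells the canvas:

"Straighten this ID card region, fit it to the output, and draw it."

The resulting Base64 image is the cropped and corrected ID card in its standard form.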

4️⃣ Build the source image buffer

Before calling detection, we load the source image into an offscreen canvas:

const canvas = wx.createOffscreenCanvas({ type: "2d", width, height })
const ctx = canvas.getContext("2d")
const img = canvas.createImage()
await new Promise(r => { img.onload = r; img.src = imgUrl })
ctx.drawImage(img, 0, 0, width, height)
const imgData = ctx.getImageData(0, 0, width, height)

This yields the pixel data (imgData.data.buffer) that detection needs as its frameBuffer.
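
getImageData() returns RGBA pixels, 4 bytes each, so the buffer length must match the width and height passed to detection. The sketch below is a small sanity check under the assumption that VK expects this RGBA layout; `isRgbaFrame` is a hypothetical helper, and the typed array stands in for real canvas data.

```javascript
// Hypothetical sanity check: an RGBA frame needs exactly width * height * 4 bytes.
function isRgbaFrame(buffer, width, height) {
  return buffer != null && buffer.byteLength === width * height * 4
}

// Stand-in for ctx.getImageData(0, 0, 2, 3).data on a 2x3 image.
const pixels = new Uint8ClampedArray(2 * 3 * 4)
console.log(isRgbaFrame(pixels.buffer, 2, 3)) // true
console.log(isRgbaFrame(pixels.buffer, 3, 3)) // false
```

A mismatch usually means the canvas was created with a different size than the width and height later passed to detectIDCard.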

5️⃣ Start ID card detection

Finally, start detection through session.start():

session.start(() => {
  session.detectIDCard({
    frameBuffer: imgData.data.buffer,
    width,
    height,
    getAffineImg: true,
  })
})

VK analyzes the image in the background and fires the updateAnchors callback once an ID card is detected.

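4. VisionKit overview

wx.createVKSession(), which our createVKSession(gl) wraps, is the mini program's built-in entry point for AR and visual recognition.

Its core capabilities include:

📷 Object detection in images (such as ID cards, faces, business cards)
🧠 GPU-accelerated matrix computation and image transforms
🧩 Anchor objects that carry the detected region, the matrix, the completeness flag and more

The commonly used events are:

Event	Meaning
`updateAnchors`	A new ID card was detected or the detection result was updated
`removeAnchors`	The detection result was removed (no ID card detected)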
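
Since either event may arrive, and possibly more than once (an assumption; the article does not state how often VK fires them), callback-style code benefits from a "first outcome wins" rule, which is what a Promise gives for free. The sketch below makes that rule explicit; `onFirstOutcome` is a hypothetical wrapper that only relies on the session having an on(eventName, handler) method.

```javascript
// Hypothetical wrapper: reports only the first outcome of a detection run.
function onFirstOutcome(session, callback) {
  let settled = false
  const settle = (outcome) => {
    if (settled) return // ignore every later event
    settled = true
    callback(outcome)
  }
  session.on("updateAnchors", (anchors) => {
    const anchor = anchors && anchors[0]
    settle(anchor && anchor.isComplete ? { found: true, anchor } : { found: false })
  })
  session.on("removeAnchors", () => settle({ found: false }))
}

// Dry run with a fake session that just records the handlers.
const handlers = {}
const fakeSession = { on: (name, fn) => { handlers[name] = fn } }
const outcomes = []
onFirstOutcome(fakeSession, (outcome) => outcomes.push(outcome))
handlers.updateAnchors([{ isComplete: true }])
handlers.removeAnchors() // arrives later and is ignored
console.log(outcomes.length, outcomes[0].found) // 1 true
```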

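5. Summary

With VisionKit in a WeChat mini program, ID card detection and cropping can be done locally on the front end, and a normalized result is available without uploading the image to a server.

This improves the user experience and also reduces the privacy risk of transmitting the image.

💡 One last thing

If this article helped you, don't forget to give it a like or bookmark it ❤️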
