pdf-engine Released

npm: @zouchengxin/pdf-engine

Demo: pdf-engine

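Introduction

Internally, pdf-engine uses a WebAssembly binary compiled from PDFium.

Currently supported features:

  • Parsing XObject and Annotation objects
  • Page preview
  • Editing and saving (creating Link annotations)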

Installation

bash
# yarn add @zouchengxin/pdf-engine
# pnpm install @zouchengxin/pdf-engine
npm install @zouchengxin/pdf-engine

Usage

Initialization

javascript
import { PdfEngine } from '@zouchengxin/pdf-engine';

const pdfEngine = new PdfEngine();
// If no API_KEY is provided, the engine remains usable until 2026-01-01; you can set the system clock back for testing.
// Alternatively, contact the developer to obtain an API key.
await pdfEngine.init(API_KEY);

Parsing

javascript
// Load a PDF document; `data` is a Uint8Array.
const pdfDoc = pdfEngine.loadPdf(data);
// Get the number of pages
const count = pdfDoc.getPageCount();
// Retrieve the PDF metadata. The result contains the fields
// Title, Author, Subject, Keywords, Creator, Producer, CreationDate, and ModDate.
const meta = pdfDoc.getMetaData();
console.log('Page Count:', count);
console.log('Pdf Meta:', meta);
for (let i = 0; i < count; i++) {
  // Obtain the page proxy object and perform operations such as parsing and editing.
  const page = pdfDoc.getPageProxy(i);
  // Get page width
  const width = page.getPageWidth();
  // Get page height
  const height = page.getPageHeight();
  // Retrieve all XObject objects on the page,
  // including those of type TEXT, PATH, IMAGE, SHADING, and FORM.
  const objs = page.getObjects();
  // Retrieve all annotation objects on the page, including those of type
  // TEXT, LINK, FREETEXT, LINE, SQUARE, CIRCLE, HIGHLIGHT, UNDERLINE, STAMP, INK, etc.
  const annots = page.getAnnotions();
  console.log('Page Size:', width, height);
  console.log('Page Objects:', objs);
  console.log('Page Annotations:', annots);
}

Rendering

javascript
// Get the rendered bitmap of the page. Only XObject objects are rendered; annotations are excluded.
// Return value: ImageData object
const bitmap = page.getBitmap();
// Get the page thumbnail; the result is empty if the PDF does not store one.
// Return value: ImageData object
const thumbnail = page.getThumbnail();
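
Since both calls return an ImageData object, one way to display the result is to paint it onto a canvas. The following is a minimal sketch, not part of pdf-engine: `drawImageData` is a hypothetical helper name, and it assumes the "empty" thumbnail result is a falsy value such as `null` or `undefined`.

```javascript
// Hypothetical helper (not part of pdf-engine): paint an ImageData object onto a canvas.
// Returns true if something was drawn, false if there was nothing to draw.
function drawImageData(canvas, imageData) {
  // Assumption: a missing thumbnail is reported as a falsy value.
  if (!imageData) {
    return false;
  }
  // Resize the canvas to the bitmap so the image is neither cropped nor stretched.
  canvas.width = imageData.width;
  canvas.height = imageData.height;
  const ctx = canvas.getContext('2d');
  if (!ctx) {
    throw new Error('2D canvas context is not available');
  }
  // putImageData copies the pixels as-is, starting at the top-left corner.
  ctx.putImageData(imageData, 0, 0);
  return true;
}
```

In a browser this could be called as `drawImageData(document.querySelector('canvas'), page.getBitmap())`, falling back to the full bitmap whenever the thumbnail call yields nothing.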

Editing

javascript
// Create a link annotation.
// rect: the rectangular area of the link, as [x, y, w, h].
// url: the URL the link opens.
page.createLinkAnno({
  rect: [100, 100, 160, 27],
  url: 'https://www.baidu.com',
});

// More features are under development.

// Save the PDF data and return a Uint8Array.
const uint8Arr = pdfDoc.savePdf();
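
The saved Uint8Array can be handed to the browser as a file. The sketch below only uses standard Web APIs (`Blob`, `URL.createObjectURL`); `toPdfBlob` is a hypothetical helper name and the file name is a placeholder.

```javascript
// Hypothetical helper (not part of pdf-engine): wrap saved PDF bytes in a Blob
// with the PDF MIME type so the browser treats it as a PDF file.
function toPdfBlob(bytes) {
  return new Blob([bytes], { type: 'application/pdf' });
}

// Browser-only usage sketch, left as comments because it needs a DOM:
// const url = URL.createObjectURL(toPdfBlob(uint8Arr));
// const a = document.createElement('a');
// a.href = url;
// a.download = 'edited.pdf'; // placeholder file name
// a.click();
// URL.revokeObjectURL(url); // release the object URL once the download has started
```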

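Notes

  • color: [r, g, b, a], an array of the red, green, blue, and alpha channels.
  • rect: [x, y, w, h], where x/y are the coordinates of the rectangle's bottom-left corner and w/h are its width and height.
  • Coordinate system: the origin is the bottom-left corner of the page, and the x-axis runs horizontally.
  • API_KEY: contact the developer to obtain one, or set the system time to before 2026/01/01 for testing.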
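
Because rects are measured from the bottom-left corner while canvas and DOM coordinates start at the top-left, overlaying a rect on a rendered page needs a vertical flip. A minimal sketch, assuming the y-axis points upward and that rect and page height share the same unit; `pdfRectToCanvasRect` and its `scale` parameter are hypothetical, not part of pdf-engine:

```javascript
// Hypothetical helper (not part of pdf-engine): convert a rect [x, y, w, h] with a
// bottom-left origin (y assumed to point up) into top-left-origin coordinates (y down).
function pdfRectToCanvasRect(rect, pageHeight, scale = 1) {
  const [x, y, w, h] = rect;
  return {
    left: x * scale,
    // The rect's top edge sits at y + h in page space; measure it from the page top instead.
    top: (pageHeight - (y + h)) * scale,
    width: w * scale,
    height: h * scale,
  };
}

// Example: the link rect used above, on a page 842 units high.
const box = pdfRectToCanvasRect([100, 100, 160, 27], 842);
console.log(box); // { left: 100, top: 715, width: 160, height: 27 }
```

Here `pageHeight` would come from `page.getPageHeight()`, and `scale` is the ratio between rendered pixels and page units if the bitmap is displayed at a different size.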