Sharing: Docling: Automatically Annotating PDF Images Locally

Technology and Health, 2025-11-30 22:24

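Problem:

Images inside PDF files, such as charts, diagrams, and figures, cannot be searched or analyzed. Writing descriptions by hand for hundreds of figures is impractical.

You could use a cloud API such as Gemini or ChatGPT, but at scale that means paying API costs, and your documents leave your infrastructure.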

Solution:

bash

pip install docling

python

from docling.document_converter import DocumentConverter

# Initialize converter with default settings
converter = DocumentConverter()

# Convert any document format - we'll use the Docling technical report itself
source_url = "https://arxiv.org/pdf/2408.09869"
result = converter.convert(source_url)

# Access structured data immediately
doc = result.document
print(f"Successfully processed document from: {source_url}")

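Docling runs vision-language models locally (Granite Vision, SmolVLM) and automatically generates a descriptive annotation for every image in a document, while keeping your data private.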
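The conversion snippet above uses the default settings, and as far as I know picture description is not switched on by default. Below is a minimal sketch of how it might be enabled, based on the option names in the Docling documentation (`PdfPipelineOptions`, `do_picture_description`, `smolvlm_picture_description`). The helper names `build_annotating_converter` and `collect_descriptions` are my own, and reading descriptions from `picture.annotations` with `kind == "description"` is an assumption about the document model that may differ between Docling releases.

```python
from typing import Any, List


def build_annotating_converter() -> Any:
    """Return a DocumentConverter that describes PDF pictures with a local VLM."""
    # Imported lazily so the helper below can be used without docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        smolvlm_picture_description,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_picture_description = True  # turn on the description step
    options.picture_description_options = smolvlm_picture_description
    options.generate_picture_images = True  # keep cropped picture images in the result
    options.images_scale = 2.0  # render images at 2x resolution

    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )


def collect_descriptions(doc: Any) -> List[str]:
    """Gather the description text attached to each picture in a document.

    Assumes each picture carries an `annotations` list whose description
    entries have `kind == "description"` and a `text` attribute.
    """
    texts: List[str] = []
    for picture in doc.pictures:
        for annotation in getattr(picture, "annotations", []):
            if getattr(annotation, "kind", None) == "description":
                texts.append(annotation.text)
    return texts


# Usage (downloads the model on first run, then works offline):
# converter = build_annotating_converter()
# doc = converter.convert("https://arxiv.org/pdf/2408.09869").document
# for text in collect_descriptions(doc):
#     print(text)
```

Swapping `smolvlm_picture_description` for `granite_picture_description` (imported from the same module) should select Granite Vision instead, at the cost of a larger download.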

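Key advantages:

Privacy: data stays on your machine, and the pipeline can run offline
Cost: no per-image API fees
Flexibility: prompts are customizable, and any HuggingFace vision-language model can be used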
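To make the flexibility point concrete, here is a small sketch of choosing a different model and prompt. `PictureDescriptionVlmOptions` with `repo_id` and `prompt` is taken from the Docling documentation; the `CaptionModel` wrapper, its default prompt, and the example prompt are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CaptionModel:
    """Hypothetical holder for a model id and prompt (not part of Docling)."""

    repo_id: str
    prompt: str = "Describe this figure in two sentences."

    def to_docling_options(self) -> Any:
        # Lazy import: docling is only needed when the options are built.
        from docling.datamodel.pipeline_options import PictureDescriptionVlmOptions

        return PictureDescriptionVlmOptions(repo_id=self.repo_id, prompt=self.prompt)


# Example choice: a small SmolVLM checkpoint with a chart-focused prompt.
chart_model = CaptionModel(
    repo_id="HuggingFaceTB/SmolVLM-256M-Instruct",
    prompt="Describe the chart: axes, units, and the main trend.",
)

# The result is assigned to PdfPipelineOptions.picture_description_options:
# options.picture_description_options = chart_model.to_docling_options()
```

Keeping the model id and prompt in one place makes it easy to try several prompts against the same document and compare the resulting annotations.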