Extracting text from a PDF with Python

柚见 2024-02-24 20:05

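This uses pdfplumber. It reads the text layer of a PDF, so it cannot extract text from scanned PDFs or from images inserted into a page.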


import pdfplumber

file_path = r'D:\UserData\admindesktop\官方文档\1903_Mesh-Models-Overview_FINAL.pdf'
with pdfplumber.open(file_path) as pdf:
    page = pdf.pages[0]
    print(page.extract_text())  # all text on the page, as one string
    print([word["text"] for word in page.extract_words()])  # the text of each word found on the page, as a list
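The snippet above reads only the first page. As a rough sketch, the same calls can be looped over every page, noting which pages return no text. Those pages are likely scanned or image-only, which is the limitation mentioned at the top. The helper names `extract_pages` and `extract_file` are made up for this example and are not part of pdfplumber.

```python
def extract_pages(pdf):
    """Collect the text of every page of an opened PDF.

    `pdf` is assumed to be an object like the one returned by
    pdfplumber.open(): it has a `.pages` sequence whose items
    provide `extract_text()`.
    Returns (texts, empty_pages), where empty_pages holds the
    1-based numbers of pages that produced no text.
    """
    texts = []
    empty_pages = []
    for number, page in enumerate(pdf.pages, start=1):
        # Treat a missing result the same as an empty string.
        text = page.extract_text() or ""
        if not text.strip():
            empty_pages.append(number)  # probably a scan or an image-only page
        texts.append(text)
    return texts, empty_pages


def extract_file(file_path):
    """Open file_path with pdfplumber and run extract_pages on it."""
    # Imported here so extract_pages can be used without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(file_path) as pdf:
        return extract_pages(pdf)
```

Calling `texts, empty_pages = extract_file(file_path)` returns one string per page. Any page number listed in `empty_pages` would need OCR, which pdfplumber does not provide.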
