Dependencies
- Python library: fitz (the `fitz` module is provided by the PyMuPDF package)
```shell
$ python -m pip install PyMuPDF
```
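- Note: if running the script below fails with `ModuleNotFoundError: No module named 'frontend'`, you most likely installed the unrelated `fitz` package from PyPI instead of PyMuPDF. You can run `python -m pip install frontend` (requires Python >= 3.8), but the more reliable fix is to remove it with `python -m pip uninstall fitz` and then run `python -m pip install PyMuPDF`.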
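To check which package actually provides `fitz` before running the script, a small diagnostic like the one below may help. It is a minimal sketch using only the standard library's `importlib.metadata`; it assumes the distribution names `PyMuPDF` and `fitz` as published on PyPI, and the helper names `installed_version` and `diagnose` are hypothetical.

```python
# check_pdf_deps.py
# Hypothetical diagnostic sketch: report whether PyMuPDF and/or the unrelated
# PyPI package named "fitz" are installed, without importing either of them.
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def diagnose():
    """Return a list of human-readable findings about the PDF dependencies."""
    messages = []
    stray_fitz = installed_version("fitz")  # unrelated project, not PyMuPDF
    pymupdf = installed_version("PyMuPDF")
    if stray_fitz is not None:
        messages.append(
            f"Unrelated 'fitz' {stray_fitz} is installed; consider: "
            "python -m pip uninstall fitz, then "
            "python -m pip install --force-reinstall PyMuPDF"
        )
    if pymupdf is None:
        messages.append("PyMuPDF is missing; run: python -m pip install PyMuPDF")
    else:
        messages.append(f"PyMuPDF {pymupdf} is installed")
    return messages


if __name__ == "__main__":
    for line in diagnose():
        print(line)
```

Reading package metadata instead of doing `import fitz` avoids triggering the `frontend` error while diagnosing it.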
Python script
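```python
# extract_pdf_text.py
import fitz  # the fitz module is provided by the PyMuPDF package


def parsePDF(filePath):
    """Concatenate the plain text of every page, in page order."""
    text = ""
    with fitz.open(filePath) as doc:
        for page in doc.pages():
            text += page.get_text()
    # Always return a string (empty if the PDF has no text layer),
    # so that f.write() below never receives None
    return text


text = parsePDF(r'D:\downloads\input.pdf')
with open('output.txt', mode='w', encoding='utf8') as f:
    f.write(text)
```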