Converting PDF to TXT

Converting a PDF document to a TXT document

First, install PyPDF2 in the Python 3 virtual environment.

Python 3.6.8 (default, Jun 20 2023, 11:53:23)

[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux

Type "help", "copyright", "credits" or "license" for more information.

>>> import sys

>>> sys.path

['', '/usr/lib64/python36.zip', '/usr/lib64/python3.6', '/usr/lib64/python3.6/lib-dynload', '/home/clusteruser/env3/lib64/python3.6/site-packages', '/home/clusteruser/env3/lib64/python3.6/site-packages/setuptools-58.0.4-py3.6.egg', '/home/clusteruser/env3/lib64/python3.6/site-packages/selenium-3.141.0-py3.6.egg', '/home/clusteruser/env3/lib64/python3.6/site-packages/urllib3-1.26.6-py3.6.egg', '/home/clusteruser/env3/lib/python3.6/site-packages', '/home/clusteruser/env3/lib/python3.6/site-packages/setuptools-58.0.4-py3.6.egg', '/home/clusteruser/env3/lib/python3.6/site-packages/selenium-3.141.0-py3.6.egg', '/home/clusteruser/env3/lib/python3.6/site-packages/urllib3-1.26.6-py3.6.egg']

>>> quit();

(env3) [clusteruser@node0xc7 pdf-txt]$ pip3 install --target='/home/clusteruser/env3/lib64/python3.6/site-packages' PyPDF2

Collecting PyPDF2

Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)

|████████████████████████████████| 232 kB 407 kB/s

Collecting typing_extensions>=3.10.0.0

Downloading typing_extensions-4.1.1-py3-none-any.whl (26 kB)

Collecting dataclasses

Downloading dataclasses-0.8-py3-none-any.whl (19 kB)

Installing collected packages: typing-extensions, dataclasses, PyPDF2

Successfully installed PyPDF2-3.0.1 dataclasses-0.8 typing-extensions-4.1.1
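Because the package was installed with an explicit --target directory, it can be worth confirming that Python actually finds it on sys.path. The following is a minimal sketch using only the standard library; the helper name locate_module is hypothetical and not part of the original post.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of PyPDF2's __init__.py if the install is visible, otherwise None.
print(locate_module("PyPDF2"))
```

If this prints None inside the activated environment, the target directory passed to pip is probably not one of the entries listed in sys.path.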

***************************************************************************************

The complete code

(env3) [clusteruser@node0xc7 pdf-txt]$ cat pdf-text.py

import PyPDF2

def pdf_to_text(pdf_path, txt_path):
    # Read every page of the PDF and collect the extracted text
    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        text = ''
        for page_number in range(len(reader.pages)):
            text += reader.pages[page_number].extract_text()

    # Write the collected text to the output file as UTF-8
    with open(txt_path, 'w', encoding='utf-8') as txt_file:
        txt_file.write(text)

# Call the function to perform the conversion
pdf_to_text('input.pdf', 'output.txt')
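The function above concatenates page text directly, so the last line of one page runs into the first line of the next. A possible variant, sketched here under assumptions, joins pages with a newline and treats an empty or missing extraction result (for example from a scanned, image-only page) as an empty string. The helper name join_page_texts is hypothetical, and PyPDF2 is imported inside the function only so the helper can be used on its own.

```python
def join_page_texts(pages):
    """Join the text of each page with a newline between pages."""
    parts = []
    for page in pages:
        # Defensive assumption: fall back to '' if a page yields no text.
        parts.append(page.extract_text() or "")
    return "\n".join(parts)


def pdf_to_text(pdf_path, txt_path):
    # Imported here so join_page_texts stays usable without PyPDF2 installed.
    import PyPDF2

    with open(pdf_path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        text = join_page_texts(reader.pages)

    with open(txt_path, "w", encoding="utf-8") as txt_file:
        txt_file.write(text)
```

Collecting the pieces in a list and joining once also avoids repeated string concatenation on long documents.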

Run the code

python3 pdf-text.py
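The script hardcodes input.pdf and output.txt. One way to make it reusable is to read the paths from the command line. This is only a sketch: the function name parse_args, the argument names, and the default of reusing the PDF name with a .txt suffix are all assumptions, not part of the original script.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the PDF path and an optional output path from the command line."""
    parser = argparse.ArgumentParser(description="Convert a PDF file to plain text.")
    parser.add_argument("pdf_path", help="path to the input PDF")
    parser.add_argument("txt_path", nargs="?", default=None,
                        help="path to the output text file (default: PDF name with .txt)")
    args = parser.parse_args(argv)
    if args.txt_path is None:
        # Assumed default: same location and name as the PDF, with a .txt suffix.
        args.txt_path = str(Path(args.pdf_path).with_suffix(".txt"))
    return args
```

With this in place, the last line of the script could become a call such as pdf_to_text(args.pdf_path, args.txt_path) after args = parse_args(), and the script would be run as python3 pdf-text.py report.pdf.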
