python将pdf转换成word

说明:

我计划用python,把pdf文件转换成word文件

step1:把python环境安装好,然后把helloworld跑起来

step2:安装依赖: 首先需要安装必要的Python库,在终端中运行,会开始下载依赖包,等待下载完成

bash 复制代码
C:\Users\Administrator>pip --version
pip 25.0.1 from C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip (python 3.12)

C:\Users\Administrator>pip install pdf2docx python-docx

Collecting pdf2docx
  Using cached pdf2docx-0.5.8-py3-none-any.whl.metadata (3.2 kB)
  Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl.metadata (103 kB)
Collecting numpy>=1.17.2 (from pdf2docx)
  Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl.metadata (60 kB)
Collecting opencv-python-headless>=4.5 (from pdf2docx)
  Using cached opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl.metadata (20 kB)
Collecting fire>=0.3.0 (from pdf2docx)
  Using cached fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... done
Collecting lxml>=3.1.0 (from python-docx)
  Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl.metadata (3.8 kB)
Collecting typing-extensions>=4.9.0 (from python-docx)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting termcolor (from fire>=0.3.0->pdf2docx)
  Using cached termcolor-2.5.0-py3-none-any.whl.metadata (6.1 kB)
Using cached pdf2docx-0.5.8-py3-none-any.whl (132 kB)
Using cached python_docx-1.1.2-py3-none-any.whl (244 kB)
Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl (2.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 104.8 kB/s eta 0:00:00
Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl (3.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 91.1 kB/s eta 0:00:00
Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl (12.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.9/12.9 MB 95.3 kB/s eta 0:00:00
Downloading opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl (39.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.4/39.4 MB 94.9 kB/s eta 0:00:00
Downloading pymupdf-1.25.3-cp39-abi3-win_amd64.whl (16.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 91.1 kB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading termcolor-2.5.0-py3-none-any.whl (7.8 kB)
Building wheels for collected packages: fire
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.7.0-py3-none-any.whl size=114261 sha256=d635b5056afde4d2d8f1b04a79ad6c18207aef0d16c499c683ef7eac73194231
  Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\46\54\24\1624fd5b8674eb1188623f7e8e17cdf7c0f6c24b609dfb8a89
Successfully built fire
Installing collected packages: typing-extensions, termcolor, PyMuPDF, numpy, lxml, fonttools, python-docx, opencv-python-headless, fire, pdf2docx
Successfully installed PyMuPDF-1.25.3 fire-0.7.0 fonttools-4.56.0 lxml-5.3.1 numpy-2.2.3 opencv-python-headless-4.11.0.86 pdf2docx-0.5.8 python-docx-1.1.2 termcolor-2.5.0 typing-extensions-4.12.2

step3:C:\Users\Administrator\PycharmProjects\PythonProject2.venv\Scripts\activate_this.py

python 复制代码
# pdf_to_word.py
import os
from pdf2docx import Converter
from datetime import datetime


def convert_pdf_to_word(input_path, output_folder):
    """
    将PDF转换为Word文档

    参数:
    input_path (str): 输入的PDF文件路径
    output_folder (str): 输出文件夹路径

    返回:
    str: 生成的Word文件路径
    """
    try:
        # 生成输出文件名(带时间戳)
        filename = os.path.basename(input_path)
        output_name = f"{os.path.splitext(filename)[0]}_converted_{datetime.now().strftime('%Y%m%d%H%M%S')}.docx"
        output_path = os.path.join(output_folder, output_name)
        # 初始化转换器
        cv = Converter(input_path)

        # 设置进度回调函数
        def progress_callback(percentage):
            print(f"\r转换进度: {percentage:.0f}%", end='', flush=True)

        # 执行转换(多线程优化)
        cv.convert(output_path,
                   start=0,
                   end=None,
                   progress_callback=progress_callback)

        cv.close()
        print(f"\n转换成功!文件已保存至:{output_path}")
        return output_path
    except Exception as e:
        print(f"\n转换失败:{str(e)}")
        return None


if __name__ == "__main__":
    # 输入输出路径配置
    desktop = os.path.join(os.path.expanduser("~"), "Desktop")
    input_pdf = r"C:\Users\Administrator\Desktop\qe.pdf"
    output_dir = desktop
    # 检查输入文件
    if not os.path.exists(input_pdf):
        print(f"错误:输入的PDF文件不存在 - {input_pdf}")
        exit(1)
    # 创建输出目录(如果不存在)
    os.makedirs(output_dir, exist_ok=True)
    # 执行转换
    print("开始PDF转换...")
    result = convert_pdf_to_word(input_pdf, output_dir)

    if result:
        print("操作完成!")
    else:
        print("转换过程中出现问题。")

step4:点击run运行

bash 复制代码
C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\python.exe C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\activate_this.py 
[INFO] Start to convert C:\Users\Administrator\Desktop\files.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
开始PDF转换...
[INFO] [3/4] Parsing pages...
[INFO] (1/1) Page 1
[INFO] [4/4] Creating pages...
[INFO] (1/1) Page 1
[INFO] Terminated in 0.10s.

转换成功!文件已保存至:C:\Users\Administrator\Desktop\files_converted_20250306142813.docx
操作完成!

Process finished with exit code 0

end

相关推荐
声声codeGrandMaster18 分钟前
Django框架的前端部分使用Ajax请求一
前端·后端·python·ajax·django
卡尔曼的BD SLAMer30 分钟前
计算机视觉与深度学习 | Python实现EMD-SSA-VMD-LSTM时间序列预测(完整源码和数据)
python·深度学习·算法·cnn·lstm
nuclear20112 小时前
使用Python将 Excel 中的图表、形状和其他元素导出为图片
python·excel·将excel图表转换为图片·将excel文本框转换为图片
三少的笔记2 小时前
Windows中PDF TXT Excel Word PPT等Office文件在预览窗格无法预览的终级解决方法大全
pdf·word·excel
小袁拒绝摆烂4 小时前
OpenCV-python灰度变化和直方图修正类型
python·opencv·计算机视觉
SEO-狼术6 小时前
Easily Add Pages to PDF Documents
pdf
Dxy12393102167 小时前
Python 条件语句详解
开发语言·python
龙泉寺天下行走7 小时前
Python 翻译词典小程序
python·oracle·小程序
践行见远7 小时前
django之视图
python·django·drf
love530love8 小时前
Windows避坑部署CosyVoice多语言大语言模型
人工智能·windows·python·语言模型·自然语言处理·pycharm