python将pdf转换成word

说明:

我计划用python,把pdf文件转换成word文件

step1:把python环境安装好,然后把helloworld跑起来

step2:安装依赖: 首先需要安装必要的Python库,在终端中运行,会开始下载依赖包,等待下载完成

bash 复制代码
C:\Users\Administrator>pip --version
pip 25.0.1 from C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip (python 3.12)

C:\Users\Administrator>pip install pdf2docx python-docx

Collecting pdf2docx
  Using cached pdf2docx-0.5.8-py3-none-any.whl.metadata (3.2 kB)
  Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl.metadata (103 kB)
Collecting numpy>=1.17.2 (from pdf2docx)
  Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl.metadata (60 kB)
Collecting opencv-python-headless>=4.5 (from pdf2docx)
  Using cached opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl.metadata (20 kB)
Collecting fire>=0.3.0 (from pdf2docx)
  Using cached fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... done
Collecting lxml>=3.1.0 (from python-docx)
  Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl.metadata (3.8 kB)
Collecting typing-extensions>=4.9.0 (from python-docx)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting termcolor (from fire>=0.3.0->pdf2docx)
  Using cached termcolor-2.5.0-py3-none-any.whl.metadata (6.1 kB)
Using cached pdf2docx-0.5.8-py3-none-any.whl (132 kB)
Using cached python_docx-1.1.2-py3-none-any.whl (244 kB)
Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl (2.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 104.8 kB/s eta 0:00:00
Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl (3.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 91.1 kB/s eta 0:00:00
Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl (12.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.9/12.9 MB 95.3 kB/s eta 0:00:00
Downloading opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl (39.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.4/39.4 MB 94.9 kB/s eta 0:00:00
Downloading pymupdf-1.25.3-cp39-abi3-win_amd64.whl (16.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 91.1 kB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading termcolor-2.5.0-py3-none-any.whl (7.8 kB)
Building wheels for collected packages: fire
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.7.0-py3-none-any.whl size=114261 sha256=d635b5056afde4d2d8f1b04a79ad6c18207aef0d16c499c683ef7eac73194231
  Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\46\54\24\1624fd5b8674eb1188623f7e8e17cdf7c0f6c24b609dfb8a89
Successfully built fire
Installing collected packages: typing-extensions, termcolor, PyMuPDF, numpy, lxml, fonttools, python-docx, opencv-python-headless, fire, pdf2docx
Successfully installed PyMuPDF-1.25.3 fire-0.7.0 fonttools-4.56.0 lxml-5.3.1 numpy-2.2.3 opencv-python-headless-4.11.0.86 pdf2docx-0.5.8 python-docx-1.1.2 termcolor-2.5.0 typing-extensions-4.12.2

step3:C:\Users\Administrator\PycharmProjects\PythonProject2.venv\Scripts\activate_this.py

python 复制代码
# pdf_to_word.py
import os
from pdf2docx import Converter
from datetime import datetime


def convert_pdf_to_word(input_path, output_folder):
    """
    将PDF转换为Word文档

    参数:
    input_path (str): 输入的PDF文件路径
    output_folder (str): 输出文件夹路径

    返回:
    str: 生成的Word文件路径
    """
    try:
        # 生成输出文件名(带时间戳)
        filename = os.path.basename(input_path)
        output_name = f"{os.path.splitext(filename)[0]}_converted_{datetime.now().strftime('%Y%m%d%H%M%S')}.docx"
        output_path = os.path.join(output_folder, output_name)
        # 初始化转换器
        cv = Converter(input_path)

        # 设置进度回调函数
        def progress_callback(percentage):
            print(f"\r转换进度: {percentage:.0f}%", end='', flush=True)

        # 执行转换(多线程优化)
        cv.convert(output_path,
                   start=0,
                   end=None,
                   progress_callback=progress_callback)

        cv.close()
        print(f"\n转换成功!文件已保存至:{output_path}")
        return output_path
    except Exception as e:
        print(f"\n转换失败:{str(e)}")
        return None


if __name__ == "__main__":
    # 输入输出路径配置
    desktop = os.path.join(os.path.expanduser("~"), "Desktop")
    input_pdf = r"C:\Users\Administrator\Desktop\qe.pdf"
    output_dir = desktop
    # 检查输入文件
    if not os.path.exists(input_pdf):
        print(f"错误:输入的PDF文件不存在 - {input_pdf}")
        exit(1)
    # 创建输出目录(如果不存在)
    os.makedirs(output_dir, exist_ok=True)
    # 执行转换
    print("开始PDF转换...")
    result = convert_pdf_to_word(input_pdf, output_dir)

    if result:
        print("操作完成!")
    else:
        print("转换过程中出现问题。")

step4:点击run运行

bash 复制代码
C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\python.exe C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\activate_this.py 
[INFO] Start to convert C:\Users\Administrator\Desktop\files.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
开始PDF转换...
[INFO] [3/4] Parsing pages...
[INFO] (1/1) Page 1
[INFO] [4/4] Creating pages...
[INFO] (1/1) Page 1
[INFO] Terminated in 0.10s.

转换成功!文件已保存至:C:\Users\Administrator\Desktop\files_converted_20250306142813.docx
操作完成!

Process finished with exit code 0

end

相关推荐
ronindong29 分钟前
Cursor 插件分享 | md-couture:一键将 Markdown 转换成带精美样式的 HTML
人工智能·python·ai编程
智慧地球(AI·Earth)29 分钟前
规则引擎实战:Python中re库和pyknow库规则引擎实战教程
开发语言·python·程序人生
是小蟹呀^42 分钟前
【总结】LangChain中的中间件Middleware
python·中间件·langchain·agent
qq_3422958243 分钟前
如何为容器内多个列表实现统一滚动条.txt
jvm·数据库·python
FreakStudio1 小时前
MicroPython对接大模型:uopenai + 火山方舟实现文字聊天和图片理解
python·单片机·ai·嵌入式·面向对象·电子diy
Kimliao1661 小时前
TFT-LCD液晶显示模组常见驱动电压的作用
python
qq_206901391 小时前
CSS如何引入自适应图标_利用svg外链配合css控制颜色
jvm·数据库·python
weixin_408717771 小时前
Go语言怎么编译Linux程序_Go语言编译Linux可执行文件教程【避坑】
jvm·数据库·python
2501_944934732 小时前
财务岗位如何在工作中提升数据分析能力?从财务分析到经营分析这样练
python·信息可视化·数据分析
2401_865439632 小时前
CSS怎么在flex布局中实现项目均分间距_设置justify-content space-evenly
jvm·数据库·python