【python013】pyinstaller打包PDF提取脚本为exe工具

1.在日常工作和学习中,遇到类似问题处理场景,如pdf文件核心内容截取,这里将文件打包成exe可执行文件,实现功能简便使用。

2.欢迎点赞、关注、批评、指正,互三走起来,小手动起来!

3.欢迎点赞、关注、批评、指正,互三走起来,小手动起来!

文章目录

1.环境准备

  • 历史安装的环境打包失败,或环境包不兼容,或历史安装包太多等问题,新建环境可能会更快些。问题如4小节记录。

    bash 复制代码
    # 在 Anaconda Prompt 环境中创建虚拟环境
    conda create -n youli python==3.8.0
    
    # 激活新建的虚拟环境
    conda activate youli
    
    # 安装必要的Python环境包
    pip install fitz
    pip install pymupdf
    pip install wxpython
    pip install pyinstaller
    pip install frontend wxpython
    
    # 删除虚拟环境
    # conda create -n youli python==3.8.0
    
    # 查看当前存在哪些虚拟环境
    # conda env list 
    # conda info -e
  • 虚拟环境效果如下:

  • 环境包版本详情如下:

2.pyinstaller打包输出脚本工具

  • 命令行

    bash 复制代码
    pyinstaller -F -w ..\pdfextract.py --noconfirm --noconsole -p ..\Anaconda3\envs\python8\Lib\site-packages
  • 执行结果详情

    bash 复制代码
    (python8) C:\Users\Administrator>pyinstaller -F -w ..\pdfextract2.py --noconfirm --noconsole -p ..\Anaconda3\envs\python8\Lib\site-packages
    420 INFO: PyInstaller: 6.8.0, contrib hooks: 2024.7
    420 INFO: Python: 3.8.0 (conda)
    421 INFO: Platform: Windows-10-10.0.19041-SP0
    422 INFO: Python environment: ..\Anaconda3\envs\python8
    423 INFO: wrote C:\Users\Administrator\pdfextract2.spec
    429 DEPRECATION: Foreign Python environment's site-packages paths added to --paths/pathex:
    ['..\\Anaconda3\\envs\\python8\\Lib\\site-packages']
    This is ALWAYS the wrong thing to do. If your environment's site-packages is not in PyInstaller's module search path then you are running PyInstaller from a different environment to the one your packages are in. Run print(sys.prefix) without PyInstaller to get the environment you should be using then install and run PyInstaller from that environment instead of this one. This warning will become an error in PyInstaller 7.0.
    430 INFO: Module search paths (PYTHONPATH):
    ['..\\Anaconda3\\envs\\python8\\Scripts\\pyinstaller.exe',
     '..\\Anaconda3\\envs\\python8\\python38.zip',
     '..\\Anaconda3\\envs\\python8\\DLLs',
     '..\\Anaconda3\\envs\\python8\\lib',
     '..\\Anaconda3\\envs\\python8',
     '..\\Anaconda3\\envs\\python8\\lib\\site-packages',
     'E:\\PycharmSpace\\orclblobtest',
     '..\\Anaconda3\\envs\\python8\\Lib\\site-packages']
    733 INFO: checking Analysis
    733 INFO: Building Analysis because Analysis-00.toc is non existent
    733 INFO: Running Analysis Analysis-00.toc
    735 INFO: Target bytecode optimization level: 0
    735 INFO: Initializing module dependency graph...
    736 INFO: Caching module graph hooks...
    754 INFO: Analyzing base_library.zip ...
    1824 INFO: Loading module hook 'hook-heapq.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    1897 INFO: Loading module hook 'hook-encodings.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    3157 INFO: Loading module hook 'hook-pickle.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    3815 INFO: Caching module dependency graph...
    3965 INFO: Looking for Python shared library...
    3973 INFO: Using Python shared library: ..\Anaconda3\envs\python8\python38.dll
    3974 INFO: Analyzing E:\PycharmSpace\orclblobtest\pdfextract2.py
    5889 INFO: Loading module hook 'hook-PIL.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    5956 INFO: Loading module hook 'hook-PIL.Image.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    6952 INFO: Loading module hook 'hook-numpy.py' from '..\\Anaconda3\\envs\\python8\\Lib\\site-packages\\numpy\\_pyinstaller'...
    7029 WARNING: Conda distribution 'numpy', dependency of 'numpy', was not found. If you installed this distribution with pip then you may ignore this warning.
    7572 INFO: Loading module hook 'hook-multiprocessing.util.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    7681 INFO: Loading module hook 'hook-xml.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    8209 INFO: Loading module hook 'hook-difflib.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    8295 INFO: Loading module hook 'hook-platform.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    8669 INFO: Loading module hook 'hook-sysconfig.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    9621 INFO: Loading module hook 'hook-packaging.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    9745 INFO: Loading module hook 'hook-PIL.ImageFilter.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    10018 INFO: Loading module hook 'hook-pandas.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    11702 INFO: Loading module hook 'hook-pytz.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    12067 INFO: Loading module hook 'hook-pkg_resources.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    14388 INFO: Loading module hook 'hook-scipy.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    14539 INFO: Loading module hook 'hook-scipy.linalg.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    14909 INFO: Loading module hook 'hook-scipy.sparse.csgraph.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    15183 INFO: Loading module hook 'hook-scipy.special._ufuncs.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    15243 INFO: Loading module hook 'hook-scipy.special._ellip_harm_2.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    16644 INFO: Loading module hook 'hook-scipy.spatial.transform.rotation.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    17421 INFO: Loading module hook 'hook-scipy.stats._stats.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    18308 INFO: Loading module hook 'hook-pandas.io.formats.style.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    20782 INFO: Loading module hook 'hook-pandas.plotting.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    21009 INFO: Processing pre-safe import module hook six.moves from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\pre_safe_import_module\\hook-six.moves.py'.
    22334 INFO: Loading module hook 'hook-sqlite3.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    22879 INFO: Loading module hook 'hook-pandas.io.clipboard.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    23076 INFO: Loading module hook 'hook-xml.etree.cElementTree.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    23078 INFO: Loading module hook 'hook-lxml.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    23627 INFO: Loading module hook 'hook-lxml.etree.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    23633 INFO: Loading module hook 'hook-xml.dom.domreg.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    24205 INFO: Processing module hooks...
    24282 INFO: Loading module hook 'hook-lxml.isoschematron.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    24301 WARNING: Hidden import "jinja2" not found!
    24515 INFO: Loading module hook 'hook-PIL.SpiderImagePlugin.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    25007 INFO: Loading module hook 'hook-lxml.objectify.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    25027 INFO: Performing binary vs. data reclassification (624 entries)
    25172 INFO: Looking for ctypes DLLs
    25217 INFO: Analyzing run-time hooks ...
    25225 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_pkgutil.py'
    25229 INFO: Processing pre-find module path hook _pyi_rth_utils from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\pre_find_module_path\\hook-_pyi_rth_utils.py'.
    25230 INFO: Loading module hook 'hook-_pyi_rth_utils.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    25231 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_multiprocessing.py'
    25234 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_pkgres.py'
    25237 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_inspect.py'
    25287 INFO: Looking for dynamic libraries
    ..\Anaconda3\envs\python8\lib\site-packages\PyInstaller\building\build_main.py:205: UserWarning: The numpy.array_api submodule is still experimental. See NEP 47.
      __import__(package)
    26904 INFO: Extra DLL search directories (AddDllDirectory): ['..\\Anaconda3\\envs\\python8\\lib\\site-packages\\numpy\\.libs']
    26904 INFO: Extra DLL search directories (PATH): []
    30079 INFO: Warnings written to C:\Users\Administrator\build\pdfextract2\warn-pdfextract2.txt
    30264 INFO: Graph cross-reference written to C:\Users\Administrator\build\pdfextract2\xref-pdfextract2.html
    30336 INFO: checking PYZ
    30337 INFO: Building PYZ because PYZ-00.toc is non existent
    30337 INFO: Building PYZ (ZlibArchive) C:\Users\Administrator\build\pdfextract2\PYZ-00.pyz
    32361 INFO: Building PYZ (ZlibArchive) C:\Users\Administrator\build\pdfextract2\PYZ-00.pyz completed successfully.
    32416 INFO: checking PKG
    32416 INFO: Building PKG because PKG-00.toc is non existent
    32417 INFO: Building PKG (CArchive) pdfextract2.pkg
    59038 INFO: Building PKG (CArchive) pdfextract2.pkg completed successfully.
    59060 INFO: Bootloader ..\Anaconda3\envs\python8\lib\site-packages\PyInstaller\bootloader\Windows-64bit-intel\runw.exe
    59060 INFO: checking EXE
    59061 INFO: Building EXE because EXE-00.toc is non existent
    59061 INFO: Building EXE from EXE-00.toc
    59061 INFO: Copying bootloader EXE to C:\Users\Administrator\dist\pdfextract2.exe
    59067 INFO: Copying icon to EXE
    59072 INFO: Copying 0 resources to EXE
    59072 INFO: Embedding manifest in EXE
    59076 INFO: Appending PKG archive to EXE
    59148 INFO: Fixing EXE headers
    59664 INFO: Building EXE from EXE-00.toc completed successfully.

3.参数及生成文件释义

  • pyinstaller参数含义
  • 输出文件内容含义
    • Analysis:主要是分析py文件的依赖信息
    • PYZ:是一个.pyz的压缩包,包含程序运行需要的依赖
    • EXE:是根据上述两项内容而生成的
    • COLLECT:主要是输出信息dist文件夹:最终的exe文件存放位置
    • build文件夹:中间过程,创建好之后可以直接删除
      • 整体详情如下图所示:

4.打包错误示例及工具代码

  • 错误示例部分

  • 打包代码

    python 复制代码
    # -*- coding: utf-8 -*-
    
    import fitz
    import wx
    
    class PDFExtractor(wx.Frame):
        def __init__(self, parent, _title):
            wx.Frame.__init__(self, parent, id=wx.ID_ANY, title=_title, pos=wx.DefaultPosition,
                              size=wx.Size(500, 254), style=wx.DEFAULT_FRAME_STYLE | wx.TAB_TRAVERSAL )
    
            self.SetSizeHints(wx.DefaultSize, wx.DefaultSize)
            self.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_WINDOW))
            self.SetBackgroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_ACTIVECAPTION))
    
            bSizer2 = wx.BoxSizer(wx.VERTICAL)
    
            self.m_filePicker2 = wx.FilePickerCtrl(self, wx.ID_ANY, wx.EmptyString, u"Select a file", u"*.*",
                                                   wx.DefaultPosition, wx.DefaultSize, wx.FLP_DEFAULT_STYLE)
            self.m_filePicker2.SetFont(wx.Font(9, 74, 90, 92, False, "微软雅黑"))
            self.m_filePicker2.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_HIGHLIGHT))
            self.m_filePicker2.SetBackgroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_HIGHLIGHT))
    
            bSizer2.Add(self.m_filePicker2, 0, wx.ALL | wx.EXPAND, 5)
    
            self.m_staticText5 = wx.StaticText(self, wx.ID_ANY, u"Start Page:", wx.DefaultPosition, wx.DefaultSize, 0)
            self.m_staticText5.Wrap(-1)
            self.m_staticText5.SetFont(wx.Font(9, 74, 90, 92, True, "微软雅黑"))
            self.m_staticText5.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNTEXT))
    
            bSizer2.Add(self.m_staticText5, 0, wx.ALL, 5)
    
            self.m_textCtrl1 = wx.TextCtrl(self, wx.ID_ANY, wx.EmptyString, wx.DefaultPosition, wx.DefaultSize, 0)
            bSizer2.Add(self.m_textCtrl1, 0, wx.EXPAND, 5)
    
            self.m_staticText6 = wx.StaticText(self, wx.ID_ANY, u"End Page:", wx.DefaultPosition, wx.DefaultSize, 0)
            self.m_staticText6.Wrap(-1)
            self.m_staticText6.SetFont(wx.Font(9, 74, 90, 92, True, "微软雅黑"))
            self.m_staticText6.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNTEXT))
    
            bSizer2.Add(self.m_staticText6, 0, wx.ALL, 5)
    
            self.m_textCtrl2 = wx.TextCtrl(self, wx.ID_ANY, wx.EmptyString, wx.DefaultPosition, wx.DefaultSize, 0)
            bSizer2.Add(self.m_textCtrl2, 0, wx.EXPAND, 5)
    
            self.m_button18 = wx.Button(self, wx.ID_ANY, u"Extract", wx.DefaultPosition, wx.DefaultSize, wx.NO_BORDER)
            self.m_button18.SetFont(wx.Font(12, 74, 90, 92, False, "微软雅黑"))
            self.m_button18.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNTEXT))
            self.m_button18.SetBackgroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNHIGHLIGHT))
    
    
            self.m_button18.Bind(wx.EVT_BUTTON, self.extract_pages)
    
            bSizer2.Add(self.m_button18, 0, wx.ALIGN_CENTER_HORIZONTAL | wx.SHAPED, 5)
    
            self.SetSizer(bSizer2)
            self.Layout()
            self.Centre(wx.BOTH)
        def __del__(self):
            pass
    
        def extract_pages(self, event):
            file_path = self.m_filePicker2.GetPath()
            start_page = int(self.m_textCtrl1.GetValue())
            end_page = int(self.m_textCtrl2.GetValue())
    
            doc = fitz.open(file_path)
            output_doc = fitz.open()
    
            for page_num in range(start_page - 1, end_page):
                output_doc.insert_pdf(doc, from_page=page_num, to_page=page_num)
    
            output_path = file_path.replace(".pdf", "_extracted.pdf")
            output_doc.save(output_path)
            output_doc.close()
            doc.close()
    
            wx.MessageBox("Extraction complete!", "Success", wx.OK | wx.ICON_INFORMATION)
    
    # app = wx.App()
    # PDFExtractor(None, title="PDF Extractor")
    # app.MainLoop()
    
    if __name__ == '__main__':
        app = wx.App()  # 运行wx.App()方法
        title = "PDF Extractor"
        frame = PDFExtractor( None , title )  # 调用Frame类,并且不指定父类,当前就成为父类
        frame.Show()  # 运行展示界面的方法Show()
        app.MainLoop()  # 进入程序wx.App()循环
相关推荐
神色自若13 小时前
Net9为PDF文字替换,使用Spire.PDF版本10.12.4.1360
pdf
机器懒得学习15 小时前
解析交通事故报告:利用 PDF、AI 与数据标准化技术构建智能分析系统
pdf
合合技术团队1 天前
高效准确的PDF解析工具,赋能企业非结构化数据治理
人工智能·科技·pdf·aigc·文档
jingling5551 天前
如何使用免费资源--知网篇
开发语言·经验分享·搜索引擎·pdf·开源
haha_qasim2 天前
怎么将pdf中的某一个提取出来?介绍几种提取PDF中页面的方法
前端·pdf
m0_748249542 天前
前端预览pdf文件流
前端·pdf
百年孤独_2 天前
高阶:基于Python paddleocr库 提取pdf 文档高亮显示的内容
开发语言·python·pdf
m0_748236582 天前
前端如何将pdf等文件传入后端
前端·pdf·状态模式
翔云API2 天前
通用文档识别接口包含PDF文档识别么?集成方式是什么
pdf
觅远2 天前
python实现Word转PDF(comtypes、win32com、docx2pdf)
python·pdf·自动化·word