【python013】pyinstaller打包PDF提取脚本为exe工具

1.在日常工作和学习中,遇到类似问题处理场景,如pdf文件核心内容截取,这里将文件打包成exe可执行文件,实现功能简便使用。

2.欢迎点赞、关注、批评、指正,互三走起来,小手动起来!

3.欢迎点赞、关注、批评、指正,互三走起来,小手动起来!

文章目录

1.环境准备

  • 历史安装的环境打包失败,或环境包不兼容,或历史安装包太多等问题,新建环境可能会更快些。问题如4小节记录。

    bash 复制代码
    # 在 Anaconda Prompt 环境中创建虚拟环境
    conda create -n youli python==3.8.0
    
    # 激活新建的虚拟环境
    conda activate youli
    
    # 安装必要的Python环境包
    pip install fitz
    pip install pymupdf
    pip install wxpython
    pip install pyinstaller
    pip install frontend wxpython
    
    # 删除虚拟环境
    # conda create -n youli python==3.8.0
    
    # 查看当前存在哪些虚拟环境
    # conda env list 
    # conda info -e
  • 虚拟环境效果如下:

  • 环境包版本详情如下:

2.pyinstaller打包输出脚本工具

  • 命令行

    bash 复制代码
    pyinstaller -F -w ..\pdfextract.py --noconfirm --noconsole -p ..\Anaconda3\envs\python8\Lib\site-packages
  • 执行结果详情

    bash 复制代码
    (python8) C:\Users\Administrator>pyinstaller -F -w ..\pdfextract2.py --noconfirm --noconsole -p ..\Anaconda3\envs\python8\Lib\site-packages
    420 INFO: PyInstaller: 6.8.0, contrib hooks: 2024.7
    420 INFO: Python: 3.8.0 (conda)
    421 INFO: Platform: Windows-10-10.0.19041-SP0
    422 INFO: Python environment: ..\Anaconda3\envs\python8
    423 INFO: wrote C:\Users\Administrator\pdfextract2.spec
    429 DEPRECATION: Foreign Python environment's site-packages paths added to --paths/pathex:
    ['..\\Anaconda3\\envs\\python8\\Lib\\site-packages']
    This is ALWAYS the wrong thing to do. If your environment's site-packages is not in PyInstaller's module search path then you are running PyInstaller from a different environment to the one your packages are in. Run print(sys.prefix) without PyInstaller to get the environment you should be using then install and run PyInstaller from that environment instead of this one. This warning will become an error in PyInstaller 7.0.
    430 INFO: Module search paths (PYTHONPATH):
    ['..\\Anaconda3\\envs\\python8\\Scripts\\pyinstaller.exe',
     '..\\Anaconda3\\envs\\python8\\python38.zip',
     '..\\Anaconda3\\envs\\python8\\DLLs',
     '..\\Anaconda3\\envs\\python8\\lib',
     '..\\Anaconda3\\envs\\python8',
     '..\\Anaconda3\\envs\\python8\\lib\\site-packages',
     'E:\\PycharmSpace\\orclblobtest',
     '..\\Anaconda3\\envs\\python8\\Lib\\site-packages']
    733 INFO: checking Analysis
    733 INFO: Building Analysis because Analysis-00.toc is non existent
    733 INFO: Running Analysis Analysis-00.toc
    735 INFO: Target bytecode optimization level: 0
    735 INFO: Initializing module dependency graph...
    736 INFO: Caching module graph hooks...
    754 INFO: Analyzing base_library.zip ...
    1824 INFO: Loading module hook 'hook-heapq.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    1897 INFO: Loading module hook 'hook-encodings.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    3157 INFO: Loading module hook 'hook-pickle.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    3815 INFO: Caching module dependency graph...
    3965 INFO: Looking for Python shared library...
    3973 INFO: Using Python shared library: ..\Anaconda3\envs\python8\python38.dll
    3974 INFO: Analyzing E:\PycharmSpace\orclblobtest\pdfextract2.py
    5889 INFO: Loading module hook 'hook-PIL.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    5956 INFO: Loading module hook 'hook-PIL.Image.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    6952 INFO: Loading module hook 'hook-numpy.py' from '..\\Anaconda3\\envs\\python8\\Lib\\site-packages\\numpy\\_pyinstaller'...
    7029 WARNING: Conda distribution 'numpy', dependency of 'numpy', was not found. If you installed this distribution with pip then you may ignore this warning.
    7572 INFO: Loading module hook 'hook-multiprocessing.util.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    7681 INFO: Loading module hook 'hook-xml.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    8209 INFO: Loading module hook 'hook-difflib.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    8295 INFO: Loading module hook 'hook-platform.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    8669 INFO: Loading module hook 'hook-sysconfig.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    9621 INFO: Loading module hook 'hook-packaging.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    9745 INFO: Loading module hook 'hook-PIL.ImageFilter.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    10018 INFO: Loading module hook 'hook-pandas.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    11702 INFO: Loading module hook 'hook-pytz.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    12067 INFO: Loading module hook 'hook-pkg_resources.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    14388 INFO: Loading module hook 'hook-scipy.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    14539 INFO: Loading module hook 'hook-scipy.linalg.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    14909 INFO: Loading module hook 'hook-scipy.sparse.csgraph.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    15183 INFO: Loading module hook 'hook-scipy.special._ufuncs.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    15243 INFO: Loading module hook 'hook-scipy.special._ellip_harm_2.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    16644 INFO: Loading module hook 'hook-scipy.spatial.transform.rotation.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    17421 INFO: Loading module hook 'hook-scipy.stats._stats.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    18308 INFO: Loading module hook 'hook-pandas.io.formats.style.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    20782 INFO: Loading module hook 'hook-pandas.plotting.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    21009 INFO: Processing pre-safe import module hook six.moves from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\pre_safe_import_module\\hook-six.moves.py'.
    22334 INFO: Loading module hook 'hook-sqlite3.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    22879 INFO: Loading module hook 'hook-pandas.io.clipboard.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    23076 INFO: Loading module hook 'hook-xml.etree.cElementTree.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    23078 INFO: Loading module hook 'hook-lxml.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    23627 INFO: Loading module hook 'hook-lxml.etree.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    23633 INFO: Loading module hook 'hook-xml.dom.domreg.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    24205 INFO: Processing module hooks...
    24282 INFO: Loading module hook 'hook-lxml.isoschematron.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    24301 WARNING: Hidden import "jinja2" not found!
    24515 INFO: Loading module hook 'hook-PIL.SpiderImagePlugin.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    25007 INFO: Loading module hook 'hook-lxml.objectify.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\_pyinstaller_hooks_contrib\\hooks\\stdhooks'...
    25027 INFO: Performing binary vs. data reclassification (624 entries)
    25172 INFO: Looking for ctypes DLLs
    25217 INFO: Analyzing run-time hooks ...
    25225 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_pkgutil.py'
    25229 INFO: Processing pre-find module path hook _pyi_rth_utils from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\pre_find_module_path\\hook-_pyi_rth_utils.py'.
    25230 INFO: Loading module hook 'hook-_pyi_rth_utils.py' from '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks'...
    25231 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_multiprocessing.py'
    25234 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_pkgres.py'
    25237 INFO: Including run-time hook '..\\Anaconda3\\envs\\python8\\lib\\site-packages\\PyInstaller\\hooks\\rthooks\\pyi_rth_inspect.py'
    25287 INFO: Looking for dynamic libraries
    ..\Anaconda3\envs\python8\lib\site-packages\PyInstaller\building\build_main.py:205: UserWarning: The numpy.array_api submodule is still experimental. See NEP 47.
      __import__(package)
    26904 INFO: Extra DLL search directories (AddDllDirectory): ['..\\Anaconda3\\envs\\python8\\lib\\site-packages\\numpy\\.libs']
    26904 INFO: Extra DLL search directories (PATH): []
    30079 INFO: Warnings written to C:\Users\Administrator\build\pdfextract2\warn-pdfextract2.txt
    30264 INFO: Graph cross-reference written to C:\Users\Administrator\build\pdfextract2\xref-pdfextract2.html
    30336 INFO: checking PYZ
    30337 INFO: Building PYZ because PYZ-00.toc is non existent
    30337 INFO: Building PYZ (ZlibArchive) C:\Users\Administrator\build\pdfextract2\PYZ-00.pyz
    32361 INFO: Building PYZ (ZlibArchive) C:\Users\Administrator\build\pdfextract2\PYZ-00.pyz completed successfully.
    32416 INFO: checking PKG
    32416 INFO: Building PKG because PKG-00.toc is non existent
    32417 INFO: Building PKG (CArchive) pdfextract2.pkg
    59038 INFO: Building PKG (CArchive) pdfextract2.pkg completed successfully.
    59060 INFO: Bootloader ..\Anaconda3\envs\python8\lib\site-packages\PyInstaller\bootloader\Windows-64bit-intel\runw.exe
    59060 INFO: checking EXE
    59061 INFO: Building EXE because EXE-00.toc is non existent
    59061 INFO: Building EXE from EXE-00.toc
    59061 INFO: Copying bootloader EXE to C:\Users\Administrator\dist\pdfextract2.exe
    59067 INFO: Copying icon to EXE
    59072 INFO: Copying 0 resources to EXE
    59072 INFO: Embedding manifest in EXE
    59076 INFO: Appending PKG archive to EXE
    59148 INFO: Fixing EXE headers
    59664 INFO: Building EXE from EXE-00.toc completed successfully.

3.参数及生成文件释义

  • pyinstaller参数含义
  • 输出文件内容含义
    • Analysis:主要是分析py文件的依赖信息
    • PYZ:是一个.pyz的压缩包,包含程序运行需要的依赖
    • EXE:是根据上述两项内容而生成的
    • COLLECT:主要是输出信息dist文件夹:最终的exe文件存放位置
    • build文件夹:中间过程,创建好之后可以直接删除
      • 整体详情如下图所示:

4.打包错误示例及工具代码

  • 错误示例部分

  • 打包代码

    python 复制代码
    # -*- coding: utf-8 -*-
    
    import fitz
    import wx
    
    class PDFExtractor(wx.Frame):
        def __init__(self, parent, _title):
            wx.Frame.__init__(self, parent, id=wx.ID_ANY, title=_title, pos=wx.DefaultPosition,
                              size=wx.Size(500, 254), style=wx.DEFAULT_FRAME_STYLE | wx.TAB_TRAVERSAL )
    
            self.SetSizeHints(wx.DefaultSize, wx.DefaultSize)
            self.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_WINDOW))
            self.SetBackgroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_ACTIVECAPTION))
    
            bSizer2 = wx.BoxSizer(wx.VERTICAL)
    
            self.m_filePicker2 = wx.FilePickerCtrl(self, wx.ID_ANY, wx.EmptyString, u"Select a file", u"*.*",
                                                   wx.DefaultPosition, wx.DefaultSize, wx.FLP_DEFAULT_STYLE)
            self.m_filePicker2.SetFont(wx.Font(9, 74, 90, 92, False, "微软雅黑"))
            self.m_filePicker2.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_HIGHLIGHT))
            self.m_filePicker2.SetBackgroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_HIGHLIGHT))
    
            bSizer2.Add(self.m_filePicker2, 0, wx.ALL | wx.EXPAND, 5)
    
            self.m_staticText5 = wx.StaticText(self, wx.ID_ANY, u"Start Page:", wx.DefaultPosition, wx.DefaultSize, 0)
            self.m_staticText5.Wrap(-1)
            self.m_staticText5.SetFont(wx.Font(9, 74, 90, 92, True, "微软雅黑"))
            self.m_staticText5.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNTEXT))
    
            bSizer2.Add(self.m_staticText5, 0, wx.ALL, 5)
    
            self.m_textCtrl1 = wx.TextCtrl(self, wx.ID_ANY, wx.EmptyString, wx.DefaultPosition, wx.DefaultSize, 0)
            bSizer2.Add(self.m_textCtrl1, 0, wx.EXPAND, 5)
    
            self.m_staticText6 = wx.StaticText(self, wx.ID_ANY, u"End Page:", wx.DefaultPosition, wx.DefaultSize, 0)
            self.m_staticText6.Wrap(-1)
            self.m_staticText6.SetFont(wx.Font(9, 74, 90, 92, True, "微软雅黑"))
            self.m_staticText6.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNTEXT))
    
            bSizer2.Add(self.m_staticText6, 0, wx.ALL, 5)
    
            self.m_textCtrl2 = wx.TextCtrl(self, wx.ID_ANY, wx.EmptyString, wx.DefaultPosition, wx.DefaultSize, 0)
            bSizer2.Add(self.m_textCtrl2, 0, wx.EXPAND, 5)
    
            self.m_button18 = wx.Button(self, wx.ID_ANY, u"Extract", wx.DefaultPosition, wx.DefaultSize, wx.NO_BORDER)
            self.m_button18.SetFont(wx.Font(12, 74, 90, 92, False, "微软雅黑"))
            self.m_button18.SetForegroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNTEXT))
            self.m_button18.SetBackgroundColour(wx.SystemSettings.GetColour(wx.SYS_COLOUR_BTNHIGHLIGHT))
    
    
            self.m_button18.Bind(wx.EVT_BUTTON, self.extract_pages)
    
            bSizer2.Add(self.m_button18, 0, wx.ALIGN_CENTER_HORIZONTAL | wx.SHAPED, 5)
    
            self.SetSizer(bSizer2)
            self.Layout()
            self.Centre(wx.BOTH)
        def __del__(self):
            pass
    
        def extract_pages(self, event):
            file_path = self.m_filePicker2.GetPath()
            start_page = int(self.m_textCtrl1.GetValue())
            end_page = int(self.m_textCtrl2.GetValue())
    
            doc = fitz.open(file_path)
            output_doc = fitz.open()
    
            for page_num in range(start_page - 1, end_page):
                output_doc.insert_pdf(doc, from_page=page_num, to_page=page_num)
    
            output_path = file_path.replace(".pdf", "_extracted.pdf")
            output_doc.save(output_path)
            output_doc.close()
            doc.close()
    
            wx.MessageBox("Extraction complete!", "Success", wx.OK | wx.ICON_INFORMATION)
    
    # app = wx.App()
    # PDFExtractor(None, title="PDF Extractor")
    # app.MainLoop()
    
    if __name__ == '__main__':
        app = wx.App()  # 运行wx.App()方法
        title = "PDF Extractor"
        frame = PDFExtractor( None , title )  # 调用Frame类,并且不指定父类,当前就成为父类
        frame.Show()  # 运行展示界面的方法Show()
        app.MainLoop()  # 进入程序wx.App()循环
相关推荐
穆友航10 小时前
PDF内容提取,MinerU使用
数据分析·pdf
拾荒的小海螺1 天前
JAVA:探索 PDF 文字提取的技术指南
java·开发语言·pdf
村东头老张1 天前
Java 实现PDF添加水印
java·开发语言·pdf
好美啊啊啊啊!2 天前
Thymeleaf模板引擎生成的html字符串转换成pdf
pdf·html
zhentiya2 天前
曼昆《经济学原理》第八版课后答案及英文版PDF
大数据·pdf
三天不学习2 天前
如何解决pdf.js跨域从url动态加载pdf文档
javascript·pdf
吾店云建站2 天前
9个最佳WordPress PDF插件(查看器、嵌入和下载)
程序人生·pdf·创业创新·流量运营·程序员创富·教育电商
007php0072 天前
基于企业微信客户端设计一个文件下载与预览系统
开发语言·python·docker·golang·pdf·php·企业微信
慧都小妮子2 天前
Spire.PDF for .NET【页面设置】演示:更改 PDF 页面大小
前端·pdf·.net