Deleting pages from PDF files with Python and Flask

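Much of the electronic material we collect online comes as PDF files. Some unscrupulous training institutions and self-media accounts, chasing attention and traffic, run scripts over documents as they pass them along, batch-adding watermarks or contact details. Material that was originally clean and tidy ends up plastered with ad clutter ("牛皮藓"), which is simply awful!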

from flask import Flask, request, send_file, render_template_string, jsonify
from PyPDF2 import PdfReader, PdfWriter
import os
from pdf2image import convert_from_path
import io
import base64

app = Flask(__name__)


# Root URL route
@app.route('/')
def index():
    return render_template_string('''
        <!DOCTYPE html>
        <html>
        <head>
            <title>PDF Page Manager</title>
            <style>
                body {
                    font-family: Arial, sans-serif;
                }
                .grid-container {
                    display: grid;
                    grid-template-columns: repeat(5, 1fr);
                    grid-gap: 10px;
                    margin-bottom: 20px;
                }
                .grid-item {
                    text-align: center;
                }
                .grid-item img {
                    max-width: 100%;
                    height: auto;
                }
                .grid-item input[type="checkbox"] {
                    margin-top: 5px;
                }
            </style>
        </head>
        <body>
            <h1>Select Pages to Delete</h1>
            <div id="pageContainer" class="grid-container"></div>
            <button onclick="loadPages()">Load Pages</button>
            <button onclick="submitForm()">Submit</button>

            <script>
                function loadPages() {
                    fetch('/get-pages', { method: 'GET' })
                        .then(response => response.json())
                        .then(data => {
                            const container = document.getElementById('pageContainer');
                            container.innerHTML = ''; // clear the container
                            data.pages.forEach((page, index) => {
                                const item = document.createElement('div');
                                item.className = 'grid-item';
                                const img = document.createElement('img');
                                img.src = `data:image/png;base64,${page.image}`;
                                img.alt = `Page ${index + 1}`;
                                const checkbox = document.createElement('input');
                                checkbox.type = 'checkbox';
                                checkbox.name = 'page';
                                checkbox.value = index;
                                checkbox.id = `page${index}`; // must match label.htmlFor below
                                const label = document.createElement('label');
                                label.htmlFor = `page${index}`;
                                label.appendChild(document.createTextNode(`Page ${index + 1}`));
                                item.appendChild(img);
                                item.appendChild(checkbox);
                                item.appendChild(label);
                                container.appendChild(item);
                            });
                        });
                }

                function submitForm() {
                    const checkboxes = document.querySelectorAll('input[type=checkbox]:checked');
                    const selectedPages = Array.from(checkboxes).map(checkbox => checkbox.value);
                    fetch('/merge-pdf', {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json'
                        },
                        body: JSON.stringify({ selected_pages: selectedPages })
                    }).then(response => {
                        if (response.ok) {
                            alert('PDF has been modified and saved.');
                        } else {
                            alert('An error occurred while modifying the PDF.');
                        }
                    });
                }
            </script>
        </body>
        </html>
    ''')


@app.route('/get-pages', methods=['GET'])
def get_pages():
    file_path = r"D:\daku\python编辑pdf\2024年县域未成年人网络消费调研报告-佟毕铖.pdf"
    try:
        images = convert_from_path(file_path)
        page_data = []

        for i, image in enumerate(images):
            buffered = io.BytesIO()
            image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode('utf-8')
            page_data.append({'index': i, 'image': img_str})

        return jsonify({'pages': page_data})
    except Exception as e:
        return jsonify({'error': str(e)}), 500


@app.route('/merge-pdf', methods=['POST'])
def merge_pdf():
    # Fall back to an empty dict if the body is missing or is not valid JSON
    data = request.get_json(silent=True) or {}
    selected_pages = data.get('selected_pages', [])

    file_path = r"D:\daku\python编辑pdf\2024年县域未成年人网络消费调研报告-佟毕铖.pdf"
    reader = PdfReader(file_path)

    writer = PdfWriter()

    for page_num in range(len(reader.pages)):
        if str(page_num) not in selected_pages:
            writer.add_page(reader.pages[page_num])

    output_path = r"D:\daku\python编辑pdf\output\modified_report.pdf"
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, 'wb') as f:
        writer.write(f)

    return send_file(output_path, as_attachment=True)


if __name__ == '__main__':
    app.run(debug=True)

Front-end code:

<!DOCTYPE html>
<html>
<head>
    <title>PDF Page Manager</title>
    <style>
        body {
            font-family: Arial, sans-serif;
        }
        .grid-container {
            display: grid;
            grid-template-columns: repeat(5, 1fr);
            grid-gap: 10px;
            margin-bottom: 20px;
        }
        .grid-item {
            text-align: center;
        }
        .grid-item img {
            max-width: 100%;
            height: auto;
        }
        .grid-item input[type="checkbox"] {
            margin-top: 5px;
        }
    </style>
</head>
<body>
    <h1>Select Pages to Delete</h1>
    <div id="pageContainer" class="grid-container"></div>
    <button onclick="loadPages()">Load Pages</button>
    <button onclick="submitForm()">Submit</button>

    <script>
        function loadPages() {
            fetch('/get-pages', { method: 'GET' })
                .then(response => response.json())
                .then(data => {
                    const container = document.getElementById('pageContainer');
                    container.innerHTML = ''; // clear the container
                    data.pages.forEach((page, index) => {
                        const item = document.createElement('div');
                        item.className = 'grid-item';
                        const img = document.createElement('img');
                        img.src = `data:image/png;base64,${page.image}`;
                        img.alt = `Page ${index + 1}`;
                        const checkbox = document.createElement('input');
                        checkbox.type = 'checkbox';
                        checkbox.name = 'page';
                        checkbox.value = index;
                        checkbox.id = `page${index}`; // must match label.htmlFor below
                        const label = document.createElement('label');
                        label.htmlFor = `page${index}`;
                        label.appendChild(document.createTextNode(`Page ${index + 1}`));
                        item.appendChild(img);
                        item.appendChild(checkbox);
                        item.appendChild(label);
                        container.appendChild(item);
                    });
                });
        }

        function submitForm() {
            const checkboxes = document.querySelectorAll('input[type=checkbox]:checked');
            const selectedPages = Array.from(checkboxes).map(checkbox => checkbox.value);
            fetch('/merge-pdf', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json'
                },
                body: JSON.stringify({ selected_pages: selectedPages })
            }).then(response => {
                if (response.ok) {
                    alert('PDF has been modified and saved.');
                } else {
                    alert('An error occurred while modifying the PDF.');
                }
            });
        }
    </script>
</body>
</html>

Python reads the PDF file at the specified path, splits it into individual pages, and loads each page image into the web page.
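
The pages reach the browser as base64 data URIs: each rendered page is saved as PNG bytes, base64-encoded, and dropped into an `img` tag's `src`. Here is a minimal sketch of that encoding step using only the standard library. The helper name `to_data_uri` is hypothetical and does not appear in the code above.

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URI usable in an <img> src attribute."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


if __name__ == "__main__":
    # The 8-byte PNG signature stands in for a real rendered page here.
    print(to_data_uri(b"\x89PNG\r\n\x1a\n"))
```

Base64 makes the payload roughly a third larger than the raw bytes, so for long documents it may be worth rendering thumbnails at a lower resolution, for example through the `dpi` argument of `convert_from_path`.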

Tick the small checkbox under each page you want to delete, then click Submit at the bottom to save, and you're done!
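
The checkbox values arrive at `/merge-pdf` as strings, and the route compares them against `str(page_num)`. One possible way to make that step sturdier is to convert and validate the values first, then compute the list of pages to keep. The sketch below assumes a hypothetical helper `pages_to_keep`, which is not part of the original code, and it silently ignores malformed or out-of-range values.

```python
from typing import Iterable, List


def pages_to_keep(total_pages: int, selected: Iterable[str]) -> List[int]:
    """Return the zero-based indices of pages that survive deletion.

    `selected` holds the checkbox values sent by the browser, e.g. ["0", "3"].
    Values that are not integers or fall outside the document are ignored.
    """
    to_delete = set()
    for value in selected:
        try:
            index = int(value)
        except (TypeError, ValueError):
            continue  # skip malformed entries instead of failing the request
        if 0 <= index < total_pages:
            to_delete.add(index)
    return [i for i in range(total_pages) if i not in to_delete]


if __name__ == "__main__":
    print(pages_to_keep(5, ["0", "3", "oops", "99"]))  # [1, 2, 4]
```

Inside `merge_pdf`, the loop could then iterate over `pages_to_keep(len(reader.pages), selected_pages)` and call `writer.add_page(reader.pages[i])` for each index.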
