Building a Docker image for PaddleOCR v3

Reference: paddlehub_ppocr, which packages the PaddleOCR service into an image so that it can be deployed to production quickly in a Docker or k8s environment.

However, much of that reference project no longer works. The only image still available is duolabmeng666/paddlehub_ppocr:1.0.

The cloud-function images are all gone, and the build script in the README fails as well: paddlepaddle==2.0.2 is no longer available from any of the pip mirrors.

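I am also skeptical of the 500 MB image mentioned in the project: the image on Docker Hub is over 2 GB, and the smallest image I could build myself is also over 2 GB.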

So I spent some time rebuilding it.

requirements.txt

albumentations==1.3.1
astor==0.8.1
Babel==2.14.0
bce-python-sdk==0.9.35
certifi==2025.4.26
cfgv==3.3.1
charset-normalizer==3.4.2
click==8.1.8
colorama==0.4.6
colorlog==6.9.0
cycler==0.11.0
decorator==5.1.1
dill==0.3.7
distlib==0.3.9
easydict==1.13
filelock==3.12.2
flake8==5.0.4
Flask==2.2.5
flask-babel==3.1.0
fonttools==4.38.0
future==1.0.0
gitdb==4.0.12
GitPython==3.1.44
gunicorn==23.0.0
h5py==3.8.0
identify==2.5.24
idna==3.10
imageio==2.31.2
imgaug==0.4.0
importlib-metadata==4.2.0
itsdangerous==2.1.2
jieba==0.42.1
jinja2==3.1.6
joblib==1.3.2
kiwisolver==1.4.5
lmdb==1.3.0
MarkupSafe==2.1.5
matplotlib==3.5.3
mccabe==0.7.0
multiprocess==0.70.15
networkx==2.6.3
nodeenv==1.9.1
numpy==1.21.6
opencv-contrib-python==4.2.0.32
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
opt-einsum==3.3.0
packaging==24.0
paddle-bfloat==0.1.7
paddle2onnx==0.5.1
paddlehub==2.1.0
paddlenlp==2.0.0
paddlepaddle==2.4.2
pandas==1.3.5
Pillow==9.5.0
platformdirs==2.6.2
pre-commit==2.21.0
protobuf==3.20.0
pyclipper==1.3.0.post2
pycodestyle==2.9.1
pycryptodome==3.23.0
pyflakes==2.5.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-Levenshtein==0.12.2
pytz==2025.2
PyWavelets==1.3.0
PyYAML==6.0.1
pyzmq==26.2.1
qudida==0.0.4
rarfile==4.2
requests==2.31.0
scikit-image==0.17.2
scikit-learn==1.0.2
scipy==1.7.3
seqeval==1.2.2
Shapely==1.8.1.post1
shellcheck-py==0.9.0.5
six==1.17.0
smmap==5.0.2
threadpoolctl==3.1.0
tifffile==2021.11.2
tqdm==4.64.0
typing-extensions==4.7.1
urllib3==2.0.7
virtualenv==20.16.2
visualdl==2.2.3
Werkzeug==2.2.3
zipp==3.15.0

Dockerfile

# ================= Stage 1: build stage (builder) =================
FROM python:3.7.10-slim as builder

# Switch the apt-get sources to the Aliyun mirror
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        g++ \
        libglib2.0-dev \
        libgl1-mesa-glx \
        libsm6 \
        libxrender1 \
        wget \
        unzip && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /PaddleOCR/inference

# Set the working directory
WORKDIR /app

# Copy requirements.txt and create the directory for the offline packages
COPY requirements.txt ./
RUN mkdir -p pg

# Download the Python packages locally (for offline installation)
RUN pip download -r requirements.txt -d ./pg --no-cache-dir -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip download paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple -d ./pg && \
    pip download paddlehub==2.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple -d ./pg

# Download the OCR model files (det, cls, rec), switched to the v3 models
RUN wget -O /PaddleOCR/inference/ch_PP-OCRv3_det_infer.tar https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && \
    wget -O /PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer.tar https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && \
    wget -O /PaddleOCR/inference/ch_PP-OCRv3_rec_infer.tar https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar

# Extract the model files (in preparation for the later COPY)
RUN mkdir -p /tmp/ocr_det && tar xf /PaddleOCR/inference/ch_PP-OCRv3_det_infer.tar -C /tmp/ocr_det && \
    mkdir -p /tmp/ocr_cls && tar xf /PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer.tar -C /tmp/ocr_cls && \
    mkdir -p /tmp/ocr_rec && tar xf /PaddleOCR/inference/ch_PP-OCRv3_rec_infer.tar -C /tmp/ocr_rec

# ================= Stage 2: runtime stage (runtime) =================
FROM python:3.7.10-slim

# Switch the apt-get sources to the Aliyun mirror
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list

# Install the required system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libglib2.0-0 \
        libgl1 \
        libsm6 \
        libxrender1 \
        g++ && \
    rm -rf /var/lib/apt/lists/*

# Set the working directory
WORKDIR /app

# Copy the downloaded packages and requirements.txt
COPY --from=builder /app/pg ./pg
COPY --from=builder /app/requirements.txt ./

# Install the Python dependencies (from the downloaded packages)
RUN pip install --no-cache-dir -r requirements.txt --find-links ./pg -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install --no-cache-dir paddlepaddle --find-links ./pg -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install --no-cache-dir paddlehub --find-links ./pg -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    rm -rf ./pg && \
    rm -rf /root/.cache/pip

# Copy the model files
COPY --from=builder /tmp/ocr_det/ch_PP-OCRv3_det_infer /PaddleOCR/inference/PP-OCRv3_mobile_det_infer
COPY --from=builder /tmp/ocr_cls/ch_ppocr_mobile_v2.0_cls_infer /PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer
COPY --from=builder /tmp/ocr_rec/ch_PP-OCRv3_rec_infer /PaddleOCR/inference/ch_PP-OCRv3_rec_infer

# Copy the PaddleOCR source tree (this directory must be present in the build context)
COPY PaddleOCR /PaddleOCR

# Set the working directory to PaddleOCR
WORKDIR /PaddleOCR

# Install the Hubserving modules
RUN hub install deploy/hubserving/ocr_system && \
    hub install deploy/hubserving/ocr_cls && \
    hub install deploy/hubserving/ocr_det && \
    hub install deploy/hubserving/ocr_rec

EXPOSE 9000

CMD ["bash", "-c", "cd /PaddleOCR && hub serving start --modules ocr_system ocr_cls ocr_det ocr_rec -p 9000"]

Then build the image with the following commands:

git clone https://gitee.com/paddlepaddle/PaddleOCR.git
rm -rf PaddleOCR/.git
docker build -t ppocr:1.0 .
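
The Dockerfile copies requirements.txt and the PaddleOCR directory from the build context, so the build fails if either is missing. A minimal sketch of a pre-build check follows; the function name check_build_context is hypothetical, and the list of paths is an assumption derived from the COPY and hub install lines in the Dockerfile above.

```shell
# Verify that a directory contains everything the Dockerfile expects
# in its build context. Returns non-zero if anything is missing.
check_build_context() {
    dir=${1:-.}
    missing=0
    # Paths assumed from the COPY and "hub install" steps of the Dockerfile
    for path in Dockerfile requirements.txt PaddleOCR/deploy/hubserving/ocr_system; do
        if [ ! -e "$dir/$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained in front of the build, for example `check_build_context . && docker build -t ppocr:1.0 .`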

Run

docker run -itd --name ppocr -p 9000:9000 ppocr:1.0

Test the service

curl --location 'http://192.168.36.128:9000/predict/ocr_system' \
--header 'Content-Type: application/json' \
--data '{
    "images": [
        "/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAjAGkDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3sxsxkDOSjjAA4K+uCP8A9fXn0I2GTHuLmMAM5I5OO+Ohxg9B1FNmjinkSOWEtsIlRiOAwPY9iP1z35r57+IV/p0vxiubHx1f3jeGhbJJZw2kv7sMFIBdUy2QxnGcbsleiHgA+gyty24GVFG9WVlXnbnlSD7DGffoMcymQBXOGwnXCn0zx6/hXinwks1j+IHiaHw+J5fBETZg85lkjS7Gz/VtzkgeYNyk5UIWJ+Un2lYEQMAZPmfecyMefbJ4HHQcUAORSqKpcuQACzYyfc44qrbTOsRe4kk2qzAM6bcr94M3Hy4HHp64PA8H8G+BdB8bePPiOmtWhmkttUb7O4ldfLLSz54VhnO1ev4YrQ+ElxDa/E7xNpfhWe5uPCKwrMiMS0ccxaMcMR1A80Dn51jHLbQaAPbHd7iINaTx7HVh5g+bB7MOxwe368YL4ZvNaUBcLG+wN2bgZP5kj6g1FbS2s0a3cD/u3yoOSqklv7vTJPfGa4X4VXHhHUPD89z4P0q/sbGG9IeC5cnMpRQXBLsPusBgH8OlAHoOZF8sbQ5Jw7DgDg84+uBj3qE3DMZBHsASRUDuflbkBgP9rqPTOOc5AgeWW5iI8u6hljALRRsmST0wT1AxnsD0OeRWb4m1qy8GeErvVLiOKSKygDiAAgyyAoiAHkgbigyQcZB7UAbvmOYRIw8gDlxLg4UdeQcD65NNjjby08mfKbgyk5fcvcEkknuQfp1A58k8J/D/AP4T3w7YeJPHl/daveXY86K2eYpBBERgBFiYAMwVWJGOcArkHMfiK3b4Q+INBvtPubj/AIQq5uhBd6dPJ50dm+GIeItucZ3PIQoyShBPzKAAexMHWTzA0jLgL5Y246/e55/Xp2zUlVI5Ab8jzH2yR741DBkcDGWHGQRkcZxznrnFugCvcLOZ4DEp27iHPmYwPpgg9PY/ma8U0xbLx98XvEGl+OnP2jTy9vpGluSkTQncHdQRlnKbHDZDEHcvCrs9xd9iFtrNjqFGTXGeI/hv4e8T373+saXbTzqrYnXcjPyMbxFtZyFUKMscAcDk0AcR4Ys5vAnxwfwj4eu5Lnw/f27Xl3ZtmT7A21iPmzlTlYxluqyIDuIVq9lvZJorSR7dN83ARducknHPI4/Hiszwz4Z0Dw1pyxeH9Ng
tIJVDF0BLyDJI3O2WbG443E4zgYq9NEuoQJvjjZBIHRg/UdQyMOQef5joc0AeE+AfBOgeM/HvxBPiPTvtclrqZ8sebJFsLyz7uEf/AGR3PTrXtGl6Np2g6YukaNp1rbxBMukcWI2YBAd5ySWZcctuJAyc45TTPDeh+GrvV9VsLT7NNqLm5vpfMd/MYFmzgkgcuxwoHX6VqJC6Xssu/Mcij5TyQR6eg9ueT25yAV/Ilulh85w8YJKyRSH5hkFWI4HQdgcEgrjt5N+zkqN4MvSzKXXUpSqsnI/dQglW/Qjn+Hp39D8TeJJfDn2dF8M6rqtjIjJIdMgEzRtxtUx5GVID5PRcKP4hXN/A/wAN6l4a8ANFqkQhnu7x7pYc/PGpVEAcfwtlDleoyAcHIAB6DOfKiyJE85iUjaRgpyx6A4P4DBzgZrlfiH4dl8Y/D+80623NeTQCW2KEbWkUq4X720btu0MSQM5z69DBKbq8uCsJ8pwI/NUsjYAJ746EsODkHgjvVtEljjbasIOXIVQQCSxIye3vweTQI8/+FHijS/E/gjSbKG+ji1HT7NbeWyWVfMj8rCCUAgEgjac8r8205INcx8Zby08Tavonw/09s3z3/wBpu3Vs/Zk2FixUkKflkd+WBAT0YGu91/4f+FfF93HdarokD3qtiWYLLG0mAoILoUL8BQpbOADgda5Y/CXUvCt1dX3gLW47ZZZYp20zUbdZY5GjdiF8376KFYgYBY925yDQZ6pGoaTezq8qKEbaB8p4J9xng4z2FS1maOL5raB9UmsJdREOLlrIMsYbceAGYkgcjJwchjgbiBp0AFNeNZNu4Z2sGU9CDRRQA6o4reGEkxQxxlgASigZAGAPwFFFAElFFFADTGrOrkfMucH6/wCf0FOoooAKKKKAIZLWGW5huHTMsO7y2yflyMGpqKKAEVFQYVQoyTgDHJOSfzpaKKAP/9k="
    ]
}'
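
Pasting a base64 string by hand is error-prone, so the request body can also be generated from a local image file. The sketch below is only an illustration: the function names ocr_payload and ocr_request and the default host 127.0.0.1:9000 are assumptions, and the response format is not shown in this post, so the output is printed as returned by the server.

```shell
# Build the JSON body for /predict/ocr_system from a local image file.
ocr_payload() {
    # base64 may wrap its output; strip newlines so the JSON string
    # stays on a single line
    b64=$(base64 < "$1" | tr -d '\n')
    printf '{"images": ["%s"]}' "$b64"
}

# Post an image to the service and print the raw response.
# $1: image path, $2: host:port (placeholder default, adjust as needed)
ocr_request() {
    image=$1
    host=${2:-127.0.0.1:9000}
    ocr_payload "$image" | curl --silent --location \
        "http://${host}/predict/ocr_system" \
        --header 'Content-Type: application/json' \
        --data @-
}
```

For example, `ocr_request ./sample.jpg 192.168.36.128:9000` would send the file sample.jpg (a placeholder name) to the container started above.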

The build above uses the latest version of the code. If you need an older version, proceed as follows.

Commands for building from another version

curl -O https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/requirements.txt
git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleOCR.git
rm -rf PaddleOCR/.git
docker build -t ppocr:0.9 .
docker tag ppocr:0.9 registry.cn-hongkong.aliyuncs.com/llapi/ppocr:0.9
docker push registry.cn-hongkong.aliyuncs.com/llapi/ppocr:0.9

Adjust the model files in the Dockerfile to match the contents of /deploy/hubserving/ocr_system/params.py.
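
To see which model directories a given checkout expects, the assignments in params.py can be listed before editing the Dockerfile. This is a rough sketch that assumes the file sets its paths with lines of the form `cfg.det_model_dir = "./inference/..."`; the function name list_model_dirs is hypothetical, and the exact variable names may differ between versions.

```shell
# Print the value of every *_model_dir assignment found in a params.py file.
list_model_dirs() {
    grep -E '_model_dir[[:space:]]*=' "$1" |
        sed -E "s/.*=[[:space:]]*[\"']([^\"']*)[\"'].*/\1/"
}
```

Running `list_model_dirs PaddleOCR/deploy/hubserving/ocr_system/params.py` prints one path per line; each should correspond to a destination of the model COPY lines in the Dockerfile.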
