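Reference: paddlehub_ppocr: "We package the PaddleOCR service into an image so that it can be released to production quickly in a Docker or k8s environment."
However, much of that reference no longer works; the only image still available is duolabmeng666/paddlehub_ppocr:1.0.
The cloud-function images are all gone, and the build script in the README fails as well: paddlepaddle==2.0.2 is no longer available on any of the pip mirrors.
I am skeptical of the 500 MB image the project mentions: the image on Docker Hub is over 2 GB, and the smallest image I could build myself is also over 2 GB.
So I spent some time rebuilding it.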
requirements.txt
albumentations==1.3.1
astor==0.8.1
Babel==2.14.0
bce-python-sdk==0.9.35
certifi==2025.4.26
cfgv==3.3.1
charset-normalizer==3.4.2
click==8.1.8
colorama==0.4.6
colorlog==6.9.0
cycler==0.11.0
decorator==5.1.1
dill==0.3.7
distlib==0.3.9
easydict==1.13
filelock==3.12.2
flake8==5.0.4
Flask==2.2.5
flask-babel==3.1.0
fonttools==4.38.0
future==1.0.0
gitdb==4.0.12
GitPython==3.1.44
gunicorn==23.0.0
h5py==3.8.0
identify==2.5.24
idna==3.10
imageio==2.31.2
imgaug==0.4.0
importlib-metadata==4.2.0
itsdangerous==2.1.2
jieba==0.42.1
jinja2==3.1.6
joblib==1.3.2
kiwisolver==1.4.5
lmdb==1.3.0
MarkupSafe==2.1.5
matplotlib==3.5.3
mccabe==0.7.0
multiprocess==0.70.15
networkx==2.6.3
nodeenv==1.9.1
numpy==1.21.6
opencv-contrib-python==4.2.0.32
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
opt-einsum==3.3.0
packaging==24.0
paddle-bfloat==0.1.7
paddle2onnx==0.5.1
paddlehub==2.1.0
paddlenlp==2.0.0
paddlepaddle==2.4.2
pandas==1.3.5
Pillow==9.5.0
platformdirs==2.6.2
pre-commit==2.21.0
protobuf==3.20.0
pyclipper==1.3.0.post2
pycodestyle==2.9.1
pycryptodome==3.23.0
pyflakes==2.5.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-Levenshtein==0.12.2
pytz==2025.2
PyWavelets==1.3.0
PyYAML==6.0.1
pyzmq==26.2.1
qudida==0.0.4
rarfile==4.2
requests==2.31.0
scikit-image==0.17.2
scikit-learn==1.0.2
scipy==1.7.3
seqeval==1.2.2
Shapely==1.8.1.post1
shellcheck-py==0.9.0.5
six==1.17.0
smmap==5.0.2
threadpoolctl==3.1.0
tifffile==2021.11.2
tqdm==4.64.0
typing-extensions==4.7.1
urllib3==2.0.7
virtualenv==20.16.2
visualdl==2.2.3
Werkzeug==2.2.3
zipp==3.15.0
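The pinned list above contains three OpenCV distributions (opencv-contrib-python, opencv-python, opencv-python-headless). They all install the same `cv2` module, and the opencv-python packaging notes recommend installing only one of them per environment. A minimal sketch for flagging this kind of overlap in a pinned requirements file follows. The function names and the `OPENCV_DISTS` table are illustrative and not part of the original project, and whether the overlap matters for this image is untested here.

```python
import re

# Distributions that all ship the same top-level `cv2` module.
OPENCV_DISTS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def parse_pins(text):
    """Return {normalized_name: version} for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize names the way PEP 503 does (case-insensitive, -/_/. equivalent).
        name = re.sub(r"[-_.]+", "-", name.strip()).lower()
        pins[name] = version.strip()
    return pins


def opencv_overlap(pins):
    """Return the sorted OpenCV distributions that appear in the pins."""
    return sorted(name for name in pins if name in OPENCV_DISTS)


def check_file(path):
    """Read a requirements file and return the overlapping OpenCV pins."""
    with open(path, encoding="utf-8") as fh:
        return opencv_overlap(parse_pins(fh.read()))
```

Calling `check_file("requirements.txt")` before `docker build` returns the list of OpenCV packages that are pinned; more than one entry is worth a second look.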
Dockerfile
# ================= Stage 1: build stage (builder) =================
FROM python:3.7.10-slim AS builder
# Switch the apt-get sources to the Aliyun mirror
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list
# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        g++ \
        libglib2.0-dev \
        libgl1-mesa-glx \
        libsm6 \
        libxrender1 \
        wget \
        unzip && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /PaddleOCR/inference
# Set the working directory
WORKDIR /app
# Copy requirements.txt and create the directory for the downloaded packages
COPY requirements.txt ./
RUN mkdir -p pg
# Download the Python packages locally (for offline installation later)
RUN pip download -r requirements.txt -d ./pg --no-cache-dir -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip download paddlepaddle==2.4.2 -i https://pypi.tuna.tsinghua.edu.cn/simple -d ./pg && \
    pip download paddlehub==2.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple -d ./pg
# Download the OCR model files (det, cls, rec); det and rec switched to the v3 versions
RUN wget -O /PaddleOCR/inference/ch_PP-OCRv3_det_infer.tar https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar && \
    wget -O /PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer.tar https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar && \
    wget -O /PaddleOCR/inference/ch_PP-OCRv3_rec_infer.tar https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar
# Extract the model files (in preparation for the COPY in the next stage)
RUN mkdir -p /tmp/ocr_det && tar xf /PaddleOCR/inference/ch_PP-OCRv3_det_infer.tar -C /tmp/ocr_det && \
    mkdir -p /tmp/ocr_cls && tar xf /PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer.tar -C /tmp/ocr_cls && \
    mkdir -p /tmp/ocr_rec && tar xf /PaddleOCR/inference/ch_PP-OCRv3_rec_infer.tar -C /tmp/ocr_rec
# ================= Stage 2: runtime stage (runtime) =================
FROM python:3.7.10-slim
# Switch the apt-get sources to the Aliyun mirror
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list
# Install the required system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libglib2.0-0 \
        libgl1 \
        libsm6 \
        libxrender1 \
        g++ && \
    rm -rf /var/lib/apt/lists/*
# Set the working directory
WORKDIR /app
# Copy the downloaded packages and requirements.txt from the builder stage
COPY --from=builder /app/pg ./pg
COPY --from=builder /app/requirements.txt ./
# Install the Python dependencies using the packages downloaded in the builder stage
# (--find-links ./pg); the Tsinghua index remains available for anything missing
RUN pip install --no-cache-dir -r requirements.txt --find-links ./pg -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install --no-cache-dir paddlepaddle --find-links ./pg -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    pip install --no-cache-dir paddlehub --find-links ./pg -i https://pypi.tuna.tsinghua.edu.cn/simple && \
    rm -rf ./pg && \
    rm -rf /root/.cache/pip
# Copy the model files
COPY --from=builder /tmp/ocr_det/ch_PP-OCRv3_det_infer /PaddleOCR/inference/PP-OCRv3_mobile_det_infer
COPY --from=builder /tmp/ocr_cls/ch_ppocr_mobile_v2.0_cls_infer /PaddleOCR/inference/ch_ppocr_mobile_v2.0_cls_infer
COPY --from=builder /tmp/ocr_rec/ch_PP-OCRv3_rec_infer /PaddleOCR/inference/ch_PP-OCRv3_rec_infer
# Copy the PaddleOCR source tree (this directory must be present in the build context)
COPY PaddleOCR /PaddleOCR
# Set the working directory to PaddleOCR
WORKDIR /PaddleOCR
# Install the Hubserving modules
RUN hub install deploy/hubserving/ocr_system && \
    hub install deploy/hubserving/ocr_cls && \
    hub install deploy/hubserving/ocr_det && \
    hub install deploy/hubserving/ocr_rec
EXPOSE 9000
CMD ["bash", "-c", "cd /PaddleOCR && hub serving start --modules ocr_system ocr_cls ocr_det ocr_rec -p 9000"]
Then build the image with the following commands
git clone https://gitee.com/paddlepaddle/PaddleOCR.git
rm -rf PaddleOCR/.git
docker build -t ppocr:1.0 .
Run
docker run -itd --name ppocr -p 9000:9000 ppocr:1.0
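Right after `docker run` returns, the service may not be accepting connections yet, since the container still has to start `hub serving` and load the modules. A minimal sketch of a readiness wait is shown here; it only checks that the TCP port accepts connections, not that OCR works. The function name `wait_for_port` and the default timeouts are assumptions for illustration.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds,
    or False if `timeout` seconds pass without a successful connection."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while the port is closed.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container started above, `wait_for_port("127.0.0.1", 9000)` on the Docker host can be used before sending the first request.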
Test the endpoint
curl --location 'http://192.168.36.128:9000/predict/ocr_system' \
--header 'Content-Type: application/json' \
--data '{
"images": [
"/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAjAGkDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3sxsxkDOSjjAA4K+uCP8A9fXn0I2GTHuLmMAM5I5OO+Ohxg9B1FNmjinkSOWEtsIlRiOAwPY9iP1z35r57+IV/p0vxiubHx1f3jeGhbJJZw2kv7sMFIBdUy2QxnGcbsleiHgA+gyty24GVFG9WVlXnbnlSD7DGffoMcymQBXOGwnXCn0zx6/hXinwks1j+IHiaHw+J5fBETZg85lkjS7Gz/VtzkgeYNyk5UIWJ+Un2lYEQMAZPmfecyMefbJ4HHQcUAORSqKpcuQACzYyfc44qrbTOsRe4kk2qzAM6bcr94M3Hy4HHp64PA8H8G+BdB8bePPiOmtWhmkttUb7O4ldfLLSz54VhnO1ev4YrQ+ElxDa/E7xNpfhWe5uPCKwrMiMS0ccxaMcMR1A80Dn51jHLbQaAPbHd7iINaTx7HVh5g+bB7MOxwe368YL4ZvNaUBcLG+wN2bgZP5kj6g1FbS2s0a3cD/u3yoOSqklv7vTJPfGa4X4VXHhHUPD89z4P0q/sbGG9IeC5cnMpRQXBLsPusBgH8OlAHoOZF8sbQ5Jw7DgDg84+uBj3qE3DMZBHsASRUDuflbkBgP9rqPTOOc5AgeWW5iI8u6hljALRRsmST0wT1AxnsD0OeRWb4m1qy8GeErvVLiOKSKygDiAAgyyAoiAHkgbigyQcZB7UAbvmOYRIw8gDlxLg4UdeQcD65NNjjby08mfKbgyk5fcvcEkknuQfp1A58k8J/D/AP4T3w7YeJPHl/daveXY86K2eYpBBERgBFiYAMwVWJGOcArkHMfiK3b4Q+INBvtPubj/AIQq5uhBd6dPJ50dm+GIeItucZ3PIQoyShBPzKAAexMHWTzA0jLgL5Y246/e55/Xp2zUlVI5Ab8jzH2yR741DBkcDGWHGQRkcZxznrnFugCvcLOZ4DEp27iHPmYwPpgg9PY/ma8U0xbLx98XvEGl+OnP2jTy9vpGluSkTQncHdQRlnKbHDZDEHcvCrs9xd9iFtrNjqFGTXGeI/hv4e8T373+saXbTzqrYnXcjPyMbxFtZyFUKMscAcDk0AcR4Ys5vAnxwfwj4eu5Lnw/f27Xl3ZtmT7A21iPmzlTlYxluqyIDuIVq9lvZJorSR7dN83ARducknHPI4/Hiszwz4Z0Dw1pyxeH9NgtIJVDF0B
LyDJI3O2WbG443E4zgYq9NEuoQJvjjZBIHRg/UdQyMOQef5joc0AeE+AfBOgeM/HvxBPiPTvtclrqZ8sebJFsLyz7uEf/AGR3PTrXtGl6Np2g6YukaNp1rbxBMukcWI2YBAd5ySWZcctuJAyc45TTPDeh+GrvV9VsLT7NNqLm5vpfMd/MYFmzgkgcuxwoHX6VqJC6Xssu/Mcij5TyQR6eg9ueT25yAV/Ilulh85w8YJKyRSH5hkFWI4HQdgcEgrjt5N+zkqN4MvSzKXXUpSqsnI/dQglW/Qjn+Hp39D8TeJJfDn2dF8M6rqtjIjJIdMgEzRtxtUx5GVID5PRcKP4hXN/A/wAN6l4a8ANFqkQhnu7x7pYc/PGpVEAcfwtlDleoyAcHIAB6DOfKiyJE85iUjaRgpyx6A4P4DBzgZrlfiH4dl8Y/D+80623NeTQCW2KEbWkUq4X720btu0MSQM5z69DBKbq8uCsJ8pwI/NUsjYAJ746EsODkHgjvVtEljjbasIOXIVQQCSxIye3vweTQI8/+FHijS/E/gjSbKG+ji1HT7NbeWyWVfMj8rCCUAgEgjac8r8205INcx8Zby08Tavonw/09s3z3/wBpu3Vs/Zk2FixUkKflkd+WBAT0YGu91/4f+FfF93HdarokD3qtiWYLLG0mAoILoUL8BQpbOADgda5Y/CXUvCt1dX3gLW47ZZZYp20zUbdZY5GjdiF8376KFYgYBY925yDQZ6pGoaTezq8qKEbaB8p4J9xng4z2FS1maOL5raB9UmsJdREOLlrIMsYbceAGYkgcjJwchjgbiBp0AFNeNZNu4Z2sGU9CDRRQA6o4reGEkxQxxlgASigZAGAPwFFFAElFFFADTGrOrkfMucH6/wCf0FOoooAKKKKAIZLWGW5huHTMsO7y2yflyMGpqKKAEVFQYVQoyTgDHJOSfzpaKKAP/9k="
]
}'
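Pasting a long base64 string into a shell command is easy to get wrong, so the same request can also be built from an image file. A minimal sketch using only the Python standard library follows. It sends the same JSON shape as the curl call above (`{"images": [<base64>]}`) to `/predict/ocr_system`. The function names, the default URL, and the file name in the usage note are assumptions, and the response is returned as parsed JSON without assuming its fields.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes):
    """Build the JSON request body: {"images": [<base64 of the image>]}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"images": [encoded]}).encode("utf-8")


def ocr_request(image_path,
                url="http://127.0.0.1:9000/predict/ocr_system",
                timeout=30):
    """POST one image file to the OCR service and return the decoded JSON reply."""
    with open(image_path, "rb") as fh:
        body = build_payload(fh.read())
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `print(ocr_request("test.jpg"))` prints whatever the service returns for a local file named test.jpg (a hypothetical file name); replace the host in `url` with the address of the machine running the container.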
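The build above uses the latest PaddleOCR code. If you need an older version of the code, proceed as follows.
Commands for building from another version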
curl -O https://gitee.com/paddlepaddle/PaddleOCR/raw/release/2.6/requirements.txt
git clone -b release/2.6 https://github.com/PaddlePaddle/PaddleOCR.git
rm -rf PaddleOCR/.git
docker build -t ppocr:0.9 .
docker tag ppocr:0.9 registry.cn-hongkong.aliyuncs.com/llapi/ppocr:0.9
docker push registry.cn-hongkong.aliyuncs.com/llapi/ppocr:0.9
Adjust the model files in the Dockerfile to match the contents of /deploy/hubserving/ocr_system/params.py.
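To see which model directories a given PaddleOCR checkout expects, the paths can be read out of params.py instead of being compared by eye. A minimal sketch follows, assuming params.py assigns the paths in the form `cfg.<name>_model_dir = "<path>"`. That form is an assumption about the file layout and should be checked against the checkout in use; the function name is illustrative.

```python
import re

# Matches lines such as:  cfg.det_model_dir = "./inference/some_model/"
MODEL_DIR_RE = re.compile(
    r"""^\s*cfg\.(\w+_model_dir)\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def model_dirs(params_source):
    """Return {setting_name: path} for every *_model_dir assignment found."""
    return dict(MODEL_DIR_RE.findall(params_source))


def model_dirs_from_file(path):
    """Read a params.py file and return its *_model_dir settings."""
    with open(path, encoding="utf-8") as fh:
        return model_dirs(fh.read())
```

Running `model_dirs_from_file("PaddleOCR/deploy/hubserving/ocr_system/params.py")` after the `git clone` step lists the directories that the COPY destinations in the Dockerfile need to match.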