PPIO adds a new GPU instance template: deploy PaddleOCR-VL in one click

Today PPIO launched Baidu's PaddleOCR-VL, a SOTA model in the OCR field.

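PaddleOCR-VL is an advanced, efficient document-parsing model designed for recognizing the elements in a document. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to recognize elements accurately.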

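PaddleOCR-VL supports 109 languages and performs well on complex elements such as text, tables, formulas, and charts, while keeping resource consumption very low. It significantly outperforms existing pipeline-based solutions, multimodal document-parsing solutions, and advanced general-purpose multimodal large models, and it also offers faster inference.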

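You can now deploy it on a GPU cloud server in one click with the PaddleOCR-VL template in the PPIO compute marketplace, and have your own dedicated PaddleOCR-VL model in about 10 minutes.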

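Templates in the PPIO compute marketplace are templates for deploying large models privately. They help enterprises and individual developers lower deployment costs and call models efficiently and securely.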

The PaddleOCR-VL API service is also live. Click "Read the original" at the end of this article to learn more.

1. GPU instance + template: deploy PaddleOCR-VL in one click

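**Step 1:** Open the compute marketplace in the PPIO console, choose Templates, search for PaddleOCR-VL, and use that template.

**Step 2:** Click Deploy with the configuration you need. The recommended GPU is the RTX 4090.

**Step 3:** Check the disk size and other settings, then click Next once everything is correct.

**Step 4:** Choose a billing method, then click Deploy.

**Step 5:** Wait a moment; creating the instance takes some time.

**Step 6:** The new instance appears under Instance Management.

**Step 7:** Check the instance logs to confirm that the service started correctly.

**Step 8:** Click Start Web Terminal. Once it has started, click Connect to open the web terminal.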
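Besides reading the logs, a quick check from the web terminal is to see whether the service port accepts TCP connections. The sketch below assumes the service listens on localhost port 8080, as in the API_URL of the test script used later in this article. The helper name `wait_for_port` and the timeout values are illustrative, not part of PPIO or PaddleOCR.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8080)` returns True as soon as something is listening on that port. An open port only shows that a process is listening, so the instance logs remain the authoritative check.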

2. How to use it

Step 1: Save the following test script as test.py.

import base64
import requests
import pathlib

API_URL = "http://localhost:8080/layout-parsing"  # Service URL

image_path = "./demo.jpg"

# Encode local image to Base64
with open(image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "file": image_data,  # Base64 encoded file content or file URL
    "fileType": 1,  # File type, 1 means image file
}

# Call the API
response = requests.post(API_URL, json=payload)

# Process the API response data
assert response.status_code == 200
result = response.json()["result"]
for i, res in enumerate(result["layoutParsingResults"]):
    print(res["prunedResult"])
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"])
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"Markdown document saved at {md_dir / 'doc.md'}")
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        pathlib.Path(img_path).parent.mkdir(exist_ok=True)
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")
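The request and response handling in the script above can also be factored into two small helpers, which makes the payload format and the response layout easier to check without a running service. This is a minimal sketch: the field names (`file`, `fileType`, `result`, `layoutParsingResults`, `markdown`, `text`) come from test.py, while the function names `build_payload` and `extract_markdown` are hypothetical.

```python
import base64


def build_payload(image_bytes: bytes, file_type: int = 1) -> dict:
    """Build the JSON body used by test.py: Base64 content plus fileType (1 = image)."""
    return {
        "file": base64.b64encode(image_bytes).decode("ascii"),
        "fileType": file_type,
    }


def extract_markdown(response_json: dict) -> list[str]:
    """Return the Markdown text of each entry in result.layoutParsingResults."""
    results = response_json["result"]["layoutParsingResults"]
    return [res["markdown"]["text"] for res in results]
```

With these helpers the call becomes `requests.post(API_URL, json=build_payload(image_bytes), timeout=120)`, followed by `response.raise_for_status()` in place of the bare `assert`. The 120-second timeout is an arbitrary illustrative value.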

Step 2: Prepare the image to run OCR on. This example uses the official sample image:

https://github.com/PaddlePaddle/PaddleOCR/blob/main/tests/test_files/book.jpg

curl https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/tests/test_files/book.jpg -o demo.jpg 

Copy the port-mapping address and use it to replace API_URL in test.py.
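Instead of editing the script for every instance, the address can be read from an environment variable. The sketch below assumes a hypothetical variable named `PADDLEOCR_VL_BASE_URL`, which is not defined by PPIO or PaddleOCR. It falls back to the `http://localhost:8080` address from the original test.py and appends the `/layout-parsing` path used there.

```python
import os

DEFAULT_BASE = "http://localhost:8080"  # same host and port as the original test.py


def resolve_api_url(env=None) -> str:
    """Build the layout-parsing endpoint from an optional base-address override."""
    env = os.environ if env is None else env
    # PADDLEOCR_VL_BASE_URL is an illustrative name chosen for this sketch.
    base = env.get("PADDLEOCR_VL_BASE_URL", DEFAULT_BASE).rstrip("/")
    return f"{base}/layout-parsing"
```

In test.py, `API_URL = resolve_api_url()` would then pick up the port-mapping address exported in the shell.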

Step 3: Run python test.py and check the output.

{'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_chart_recognition': False, 'format_block_content': False}, 'parsing_res_list': [{'block_label': 'text', 'block_content': "Chances of the lottery jackpot, but it's also use combination formulas to work out the chances of the other prizes, but it all starts to get a bit fiddly so we'll move on to something else. (How to work out the other lottery chances is just one of the amazing features you'll find at: www.murderousmaths.co.uk)", 'block_bbox': [180, 0, 511, 107], 'block_id': 0, 'block_order': 1}, {'block_label': 'paragraph_title', 'block_content': 'The disappearing sum', 'block_bbox': [178, 115, 308, 132], 'block_id': 1, 'block_order': 2}, {'block_label': 'text', 'block_content': "It's Friday evening. The lovely Veronica Gumfloss has been out with the football team who have all escorted her safely back to her doorstep. It's that tender moment when each hopeful player closes his eyes and leans forward with quivering lips. Unfortunately Veronica's parents heard them clumping down the road and Veronica knows she only has time to kiss four out of the eleven of them if she's going to do it properly.", 'block_bbox': [174, 125, 505, 282], 'block_id': 2, 'block_order': 3}, {'block_label': 'image', 'block_content': '', 'block_bbox': [177, 284, 489, 468], 'block_id': 3, 'block_order': None}, {'block_label': 'vision_footnote', 'block_content': "How many choices has she got? It's  $ ^{11}C_{4} $  which is  $ \\frac{11}{4! \\times 7!} $  but for goodness sake DON'T reach for the calculator! The most brilliant thing about perms and", 'block_bbox': [163, 458, 493, 528], 'block_id': 4, 'block_order': None}, {'block_label': 'number', 'block_content': '94', 'block_bbox': [300, 545, 325, 563], 'block_id': 5, 'block_order': None}, {'block_label': 'text', 'block_content': "means that EVERYTHING ON THE BOTTOM ALWAYS CANCELS OUT! 
It's probably the best fun you'll ever have with a pencil so here we go...", 'block_bbox': [551, 0, 890, 76], 'block_id': 6, 'block_order': 4}, {'block_label': 'display_formula', 'block_content': ' $$ \\frac{11!}{4!\\times7!}=\\frac{11\\times10\\times9\\times8\\times7\\times6\\times5\\times4\\times3\\times2\\times1}{4\\times3\\times2\\times1\\times7\\times6\\times5\\times4\\times3\\times2\\times1} $$ ', 'block_bbox': [573, 74, 879, 124], 'block_id': 7, 'block_order': 5}, {'block_label': 'text', 'block_content': "(Before we continue, grab this book and show somebody this sum. Rub their face on it if you need to and tell them that this is the sort of thing you do for fun without a calculator these days because you're so brilliant.)", 'block_bbox': [549, 124, 887, 206], 'block_id': 8, 'block_order': 6}, {'block_label': 'text', 'block_content': "Off we go then. For starters we'll get rid of the 7!bit from top and bottom and get:", 'block_bbox': [549, 205, 886, 244], 'block_id': 9, 'block_order': 7}, {'block_label': 'display_formula', 'block_content': ' $$ \\frac{11\\times10\\times9\\times8}{4\\times3\\times2\\times1} $$ ', 'block_bbox': [676, 254, 768, 290], 'block_id': 10, 'block_order': 8}, {'block_label': 'text', 'block_content': 'Pow! That\'s already got rid of more than half the numbers. Next we\'ll see that the  $ 4 \\times 2 $  on the bottom cancels out the 8 on top (and we don\'t need that "×1" on the bottom either). 
We\'re left with...', 'block_bbox': [546, 300, 885, 372], 'block_id': 11, 'block_order': 9}, {'block_label': 'display_formula', 'block_content': ' $$ \\frac{11\\times10\\times9}{3} $$ ', 'block_bbox': [684, 383, 755, 416], 'block_id': 12, 'block_order': 10}, {'block_label': 'text', 'block_content': "Then the 3 on the bottom divides into the 9 on top leaving it as a 3 so all we've got now is:", 'block_bbox': [545, 429, 883, 465], 'block_id': 13, 'block_order': 11}, {'block_label': 'display_formula', 'block_content': ' $$ Veronica^{\\prime}s\\ choices=11\\times10\\times3 $$ ', 'block_bbox': [617, 476, 816, 495], 'block_id': 14, 'block_order': 12}, {'block_label': 'text', 'block_content': 'Look! No bottom.', 'block_bbox': [543, 507, 665, 529], 'block_id': 15, 'block_order': 13}, {'block_label': 'number', 'block_content': '95', 'block_bbox': [705, 554, 728, 570], 'block_id': 16, 'block_order': None}], 'layout_det_res': {'boxes': [{'cls_id': 22, 'label': 'text', 'score': 0.8623488545417786, 'coordinate': [180.37161254882812, 0, 511.5435485839844, 107.78173828125]}, {'cls_id': 17, 'label': 'paragraph_title', 'score': 0.9094090461730957, 'coordinate': [178.8297119140625, 115.10285949707031, 308.66815185546875, 132.9949188232422]}, {'cls_id': 22, 'label': 'text', 'score': 0.9703303575515747, 'coordinate': [174.90829467773438, 125.8176498413086, 505.4891052246094, 282.4457702636719]}, {'cls_id': 14, 'label': 'image', 'score': 0.9670560956001282, 'coordinate': [177.98138427734375, 284.2871398925781, 489.8233642578125, 468.6240539550781]}, {'cls_id': 24, 'label': 'vision_footnote', 'score': 0.6963039636611938, 'coordinate': [163.943603515625, 458.204833984375, 493.232666015625, 528.9574584960938]}, {'cls_id': 16, 'label': 'number', 'score': 0.8297750353813171, 'coordinate': [300.74310302734375, 545.8948364257812, 325.43939208984375, 563.2888793945312]}, {'cls_id': 22, 'label': 'text', 'score': 0.9042862057685852, 'coordinate': [551.9371948242188, 0.3308563232421875, 
890.8565063476562, 76.84647369384766]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.9609742760658264, 'coordinate': [573.0150146484375, 74.92318725585938, 879.24755859375, 124.89605712890625]}, {'cls_id': 22, 'label': 'text', 'score': 0.9755771160125732, 'coordinate': [549.653076171875, 124.03761291503906, 887.7197265625, 206.6728057861328]}, {'cls_id': 22, 'label': 'text', 'score': 0.959395170211792, 'coordinate': [549.031005859375, 205.06463623046875, 886.72119140625, 244.28329467773438]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.9485629796981812, 'coordinate': [676.5474243164062, 254.51788330078125, 768.7219848632812, 290.23797607421875]}, {'cls_id': 22, 'label': 'text', 'score': 0.9793534874916077, 'coordinate': [546.6148681640625, 300.3444519042969, 885.8001708984375, 372.3039855957031]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.9466578960418701, 'coordinate': [684.8164672851562, 383.488525390625, 755.7780151367188, 416.1806640625]}, {'cls_id': 22, 'label': 'text', 'score': 0.9657009840011597, 'coordinate': [545.3870849609375, 429.8050842285156, 883.2789306640625, 465.9079284667969]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.8961516618728638, 'coordinate': [617.6002197265625, 476.957763671875, 816.8131103515625, 495.5823974609375]}, {'cls_id': 22, 'label': 'text', 'score': 0.9052585363388062, 'coordinate': [543.5634765625, 507.57012939453125, 665.855712890625, 529.0166625976562]}, {'cls_id': 16, 'label': 'number', 'score': 0.8552185893058777, 'coordinate': [705.1265869140625, 554.5432739257812, 728.734375, 570.6980590820312]}]}}
Markdown document saved at markdown_0/doc.md
Output image saved at layout_det_res_0.jpg
Output image saved at layout_order_res_0.jpg
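The printed `prunedResult` dictionary is dense, so a short post-processing step helps when reading it. The sketch below uses only keys visible in the output above (`parsing_res_list`, `block_order`, `block_content`, `layout_det_res`, `boxes`, `label`, `score`). Treating `block_order: None` as "outside the reading order" is an inference from this sample output rather than documented behavior, and the 0.8 score threshold is an arbitrary illustrative value.

```python
def text_in_reading_order(pruned_result: dict) -> list[str]:
    """Return block_content strings sorted by block_order, skipping unordered blocks."""
    blocks = [b for b in pruned_result["parsing_res_list"] if b["block_order"] is not None]
    blocks.sort(key=lambda b: b["block_order"])
    return [b["block_content"] for b in blocks]


def low_confidence_boxes(pruned_result: dict, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (label, score) for layout boxes whose detection score is below threshold."""
    boxes = pruned_result["layout_det_res"]["boxes"]
    return [(b["label"], b["score"]) for b in boxes if b["score"] < threshold]


# A trimmed-down sample with values copied from the output above.
sample = {
    "parsing_res_list": [
        {"block_label": "text", "block_content": "Look! No bottom.", "block_order": 13},
        {"block_label": "number", "block_content": "94", "block_order": None},
        {"block_label": "paragraph_title", "block_content": "The disappearing sum", "block_order": 2},
    ],
    "layout_det_res": {
        "boxes": [
            {"label": "paragraph_title", "score": 0.9094090461730957},
            {"label": "vision_footnote", "score": 0.6963039636611938},
        ]
    },
}

print(text_in_reading_order(sample))
print(low_confidence_boxes(sample))
```

In test.py, the same functions could be applied to `res["prunedResult"]` inside the loop instead of printing the whole dictionary.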

The PPIO compute marketplace currently offers dozens of private-deployment templates. Besides PaddleOCR-VL, users can also quickly deploy models such as Kimi-Linear, DeepSeek-R1-Distill-Qwen-1.5B, and StableDiffusion:v1.10.
