13、Ollama OCR - 技术栈

1、介绍

Ollama OCR，是一个强大的OCR（光学字符识别）工具包。利用 Ollama 的先进视觉语言模型从图像中提取文本，可作为 Python 包和 Streamlit 网络应用程序使用。具有支持多种视觉模型、多种输出格式、批量处理、图像预处理等功能。还介绍了安装方法、快速入门示例、输出格式细节以及 Streamlit 网络应用程序的特点。

1.1 多视觉模型支持

LLaVA 7B：用于实时处理的高效视觉语言模型（LLaVa 模型有时会生成错误的输出）
Llama 3.2 Vision：适用于复杂文档的高精度高级模型

1.2 输出格式

Markdown：保留带有标题和列表的文本格式
纯文本：干净、简单的文本提取
JSON：结构化数据格式
结构化：表和有序的数据
键值对：提取标记信息

1.3 支持批处理

并行处理多个图像
每张图片的进度跟踪
图像预处理（调整大小、标准化等）

2、安装测试

2.1 ollama 安装

第一次安装可参考跳转

已安装的测试下是否需要更新

bash 复制代码

(base) [root@local-staging-x4f7 Ollama-OCR-main]# ollama pull llama3.2-vision:11b
pulling manifest 
Error: pull model manifest: 412: 

The model you are attempting to pull requires a newer version of Ollama.

Please download the latest version at:

	https://ollama.com/download

说明当前版本不支持视觉模型，需要更新。可以参考前面第一次下载的方式重新下载一个新的安装包。

也可以直接下载

bash 复制代码

curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz

sudo tar -xzf ollama-linux-amd64.tgz -C /usr/local/

关掉已经开启的服务

bash 复制代码

(base) [root@local-staging-x4f7 Ollama-OCR-main]# lsof -i :11434
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
ollama  16778 root    3u  IPv6 39719454      0t0  TCP *:11434 (LISTEN)
(base) [root@local-staging-x4f7 Ollama-OCR-main]# kill 16778
(base) [root@local-staging-x4f7 Ollama-OCR-main]# ollama serve

2025/01/06 13:50:38 routes.go:1259: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-01-06T13:50:38.624+08:00 level=INFO source=images.go:757 msg="total blobs: 17"
time=2025-01-06T13:50:38.624+08:00 level=INFO source=images.go:764 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:	export GIN_MODE=release
 - using code:	gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
time=2025-01-06T13:50:38.624+08:00 level=INFO source=routes.go:1310 msg="Listening on [::]:11434 (version 0.5.4)"

2.2 下载模型

bash 复制代码

(base) [root@local-staging-x4f7 szl]# ollama run llama3.2-vision:11b
pulling manifest 
pulling 11f274007f09... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 6.0 GB                         
pulling ece5e659647a... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.9 GB                         
pulling 715415638c9c... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  269 B                         
pulling 0b4284c1f870... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 7.7 KB                         
pulling fefc914e46e6... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   32 B                         
pulling fbd313562bb7... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  572 B                         
verifying sha256 digest 
writing manifest 
success

2.3 下载源码

bash 复制代码

git clone git@github.com:imanoop7/Ollama-OCR.git

或者下载ZIP文件解压

进入源码文件

创建一个虚拟环境

bash 复制代码

(base) [root@local-staging-x4f7 ollama_ocr]# conda create -n ollamaocr python=3.11 -y

激活虚拟环境

bash 复制代码

(base) [root@local-staging-x4f7 ollama_ocr]# conda activate ollamaocr

安装依赖

bash 复制代码

(ollamaocr) [root@local-staging-x4f7 Ollama-OCR-main]# pip install -r requirements.txt

2.4 启动环境

bash 复制代码

(ollamaocr) [root@local-staging-x4f7 Ollama-OCR-main]# cd src/ollama_ocr      
(ollamaocr) [root@local-staging-x4f7 ollama_ocr]# streamlit run app.py

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://****:8501
  External URL: http://****:8501

2.5 测试

英文测试

测试图片

测试结果

可以看到英文的识别效果很不错

中文测试

中文图片

识别结果

效果可以说是惨不忍睹

3、说明

中文识别之所以比较差应该是以下几方面原因

目前只支持LLama3.2 的模型，其中文能力较弱
在源代码的Prompt里面没有对中文进行说明
后续如果能支持Qwen或者deepseek等中文的视觉模型，然后根据项目修改Prompt应该会有所提升，持续关注吧