Paper: OCRBench: On The Hidden Mystery of OCR In Large Multimodal Models (arXiv: 2305.07895)
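OCRBench evaluates the OCR capabilities of large multimodal models (LMMs) on 10 text-related tasks. It contains 1000 question-answer pairs, and each entry has five fields: index, image, question, answer, and category (the task the question belongs to). The categories are as follows: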
Task | Example question | Example answer | # Questions |
---|---|---|---|
Key Information Extraction | what is the total amount of this receipt? Answer this question using the text in the image directly. | `['26.58']` | 200 |
Doc-oriented VQA | Whats the Venue Name? | `['the halfmoon']` | 200 |
Scene Text-centric VQA | What is the title of the book? | `['PENDRAGON']` | 200 |
Handwritten Mathematical Expression Recognition | Please write out the expression of the formula in the image using LaTeX format. | `['x = \\frac { 1 7 } { 5 }\n']` | 100 |
Irregular Text Recognition | what is written in the image? | `['COFFEE']` | 50 |
Regular Text Recognition | what is written in the image? | `['CHAIN']` | 50 |
Non-Semantic Text Recognition | what is written in the image? | `['espt']` | 50 |
Digit String Recognition | what is the number in the image? | `['9557']` | 50 |
Handwriting Recognition | what is written in the image? | `['bread']` | 50 |
Artistic Text Recognition | what is written in the image? | `['Home']` | 50 |
Total | - | - | 1000 |
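The per-task counts in the table add up to the 1000 questions (3 × 200 + 100 + 6 × 50). A minimal sketch for checking a loaded copy of the dataset against these counts follows; it assumes each record is a dict carrying the `category` field described above, and that the category strings in the real file match the task names in the table exactly (an assumption worth verifying).

```python
from collections import Counter

# Per-task question counts, restated from the table above.
# Assumption: the dataset's `category` values use exactly these names.
EXPECTED_COUNTS = {
    "Key Information Extraction": 200,
    "Doc-oriented VQA": 200,
    "Scene Text-centric VQA": 200,
    "Handwritten Mathematical Expression Recognition": 100,
    "Irregular Text Recognition": 50,
    "Regular Text Recognition": 50,
    "Non-Semantic Text Recognition": 50,
    "Digit String Recognition": 50,
    "Handwriting Recognition": 50,
    "Artistic Text Recognition": 50,
}


def tally_categories(records):
    """Count how many records fall into each category."""
    return Counter(record["category"] for record in records)


def check_counts(records, expected=EXPECTED_COUNTS):
    """Return {category: (actual, expected)} for every category whose count differs."""
    actual = tally_categories(records)
    categories = set(actual) | set(expected)
    return {
        c: (actual.get(c, 0), expected.get(c, 0))
        for c in categories
        if actual.get(c, 0) != expected.get(c, 0)
    }


if __name__ == "__main__":
    print(sum(EXPECTED_COUNTS.values()))  # → 1000
```

An empty dict from `check_counts` means every category has the expected number of questions.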
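Note that in the TSV file, images are stored as Base64-encoded strings. Base64 turns a binary image file (PNG, JPEG, GIF) into plain ASCII text, so it can be embedded directly in HTML, CSS, or JSON, or, as here, in a text-based TSV file. The trade-off is size rather than compactness: every 3 bytes become 4 characters, so the encoded text is about 33% larger than the original file.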
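As a rough sketch of reading such a file with only the standard library, the helper below yields one record per row and decodes the image column to raw bytes. It rests on assumptions not confirmed above: the first row is a header named after the five fields (`index`, `image`, `question`, `answer`, `category`), and the answer column holds a Python-style list literal such as `['26.58']`, as the table's example answers suggest. `iter_ocrbench_rows` is a hypothetical name.

```python
import ast
import base64
import csv

# A Base64 image can exceed csv's default per-field limit of 131072 characters.
# 2**31 - 1 is used rather than sys.maxsize because the limit is stored in a
# C long, which is only 32 bits on Windows.
csv.field_size_limit(2**31 - 1)


def iter_ocrbench_rows(lines):
    """Yield one dict per TSV row, with the image decoded to raw bytes.

    `lines` is any iterable of text lines, e.g. an open file.
    Assumed (hypothetical) header: index, image, question, answer, category.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        yield {
            "index": row["index"],
            "image_bytes": base64.b64decode(row["image"]),
            "question": row["question"],
            # Assumption: answers are stored as a list literal like "['26.58']".
            "answers": ast.literal_eval(row["answer"]),
            "category": row["category"],
        }
```

In practice this would be called on a file opened with `encoding="utf-8", newline=""`; the actual file name and column layout should be checked against the downloaded TSV.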

There are three ways to convert a Base64 string back into an image:
(1) Use an online tool, for example "Base64 to Image Converter - Free Online Toolbox - DopuBOX".

(2) Use a script:
```python
import base64

# 1. Paste the complete Base64 string
base64_data = "/9j/4AAQSkZJRgABAQAAAQABAAD/...(full string)/ALz44+gHAooA/9k="

# 2. Decode it and write the raw bytes to an image file
with open("output.jpg", "wb") as f:
    f.write(base64.b64decode(base64_data))
print("Image saved as output.jpg")
```
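The script above assumes a clean string. Text copied from a web page or from an HTML `src` attribute may carry a `data:image/jpeg;base64,` prefix or stray line breaks, and plain `b64decode` silently skips invalid characters by default. The hypothetical helper below (not part of any library) strips both before decoding with `validate=True`, and reports bad input as a `ValueError`.

```python
import base64
import binascii
import re


def decode_base64_image(data: str) -> bytes:
    """Decode a Base64 image string into raw bytes.

    Tolerates an optional data-URI prefix ("data:image/jpeg;base64,") and
    whitespace picked up while copying. Raises ValueError for characters
    outside the Base64 alphabet or for broken padding.
    """
    # Drop a leading "data:<mime>;base64," prefix if present.
    data = re.sub(r"^data:[^,]*;base64,", "", data.strip())
    # Remove embedded spaces and line breaks.
    data = re.sub(r"\s+", "", data)
    try:
        # validate=True rejects non-alphabet characters instead of ignoring them.
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid Base64 image data: {exc}") from exc


def save_base64_image(data: str, path: str) -> None:
    """Decode `data` and write the image bytes to `path`."""
    with open(path, "wb") as f:
        f.write(decode_base64_image(data))
```

A string cut off at an arbitrary point usually fails the padding check, but one cut exactly at a 4-character boundary still decodes, so this check alone does not prove the image is complete.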
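(3) Preview directly in a browser
Put the following in an HTML file. The `src` value must be a data URI, i.e. the Base64 text preceded by `data:image/jpeg;base64,`; a bare Base64 string is not rendered:
```html
<img src="data:image/jpeg;base64,...(full Base64 string).../9k=">
```
Open the HTML file in a browser to display the image.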
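Writing the HTML file by hand is error-prone for long strings, so it can also be generated. A minimal sketch follows; the function names are hypothetical, and the MIME type defaults to `image/jpeg` on the assumption that the image is a JPEG (use `image/png` etc. otherwise).

```python
import html


def build_preview_html(base64_data: str, mime: str = "image/jpeg") -> str:
    """Return a minimal HTML page that displays a Base64-encoded image.

    Browsers render the image only when src is a data URI:
    "data:<mime type>;base64," followed by the Base64 text.
    """
    src = f"data:{mime};base64,{base64_data.strip()}"
    return (
        "<!DOCTYPE html>\n"
        "<html><body>\n"
        f'<img src="{html.escape(src, quote=True)}" alt="decoded image">\n'
        "</body></html>\n"
    )


def write_preview_html(base64_data: str, path: str, mime: str = "image/jpeg") -> None:
    """Write the preview page to `path`; open it in a browser to view the image."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_preview_html(base64_data, mime))
```

Base64 characters (`A-Z`, `a-z`, `0-9`, `+`, `/`, `=`) are left unchanged by `html.escape`, so escaping only matters if the input contains something unexpected.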
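Notes
- **Encoding type**: this string is the Base64 encoding of a **JPEG image**. It starts with `/9j/`, which is how the JPEG start bytes `FF D8 FF` appear in Base64.
- **Caution**: make sure to copy the complete string, from the leading `/9j/` through the final `/9k=`, otherwise the conversion will fail. The `/9k=` ending holds for this image but is not universal: a JPEG file normally ends with the bytes `FF D9`, and these encode as `/9k=` only when the file length leaves a remainder of 2 when divided by 3.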
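The notes above can be turned into a quick check on decoded bytes: identify the format from the leading "magic number" bytes, and for JPEG test whether the data starts with the SOI marker `FF D8` and ends with the EOI marker `FF D9`. This is a heuristic sketch with hypothetical helper names; some JPEG files carry trailing bytes after `FF D9`, so a `False` from the completeness check means "look closer", not "definitely broken".

```python
import base64
from typing import Optional

# Leading bytes ("magic numbers") of common image formats.
MAGIC_NUMBERS = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(image_bytes: bytes) -> Optional[str]:
    """Return "jpeg", "png" or "gif" based on the leading bytes, else None."""
    for magic, fmt in MAGIC_NUMBERS.items():
        if image_bytes.startswith(magic):
            return fmt
    return None


def sniff_base64_image(b64_text: str) -> Optional[str]:
    """Decode a Base64 string and identify its image format."""
    return sniff_image_format(base64.b64decode(b64_text))


def jpeg_looks_complete(image_bytes: bytes) -> bool:
    """Heuristic: a complete JPEG starts with FF D8 (SOI) and usually ends with FF D9 (EOI)."""
    return image_bytes.startswith(b"\xff\xd8") and image_bytes.endswith(b"\xff\xd9")
```

For example, `base64.b64encode(b"\xff\xd8\xff")` gives `b"/9j/"` and `base64.b64encode(b"\xff\xd9")` gives `b"/9k="`, which is where the two markers mentioned in the notes come from.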