OCRBench: Evaluating the OCR Capabilities of Large Multimodal Models

Paper: OCRBench: On The Hidden Mystery of OCR In Large Multimodal Models (arXiv:2305.07895)

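OCRBench evaluates the OCR capabilities of large multimodal models (LMMs) on 10 text-related tasks. It contains 1,000 question-answer pairs, and each pair is stored as a record with five fields: index, image, question, answer, and category (the task type). The categories are as follows: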

Task	Image example	Question example	Answer example	Number of questions
Key Information Extraction	(image omitted)	what is the total amount of this receipt? Answer this question using the text in the image directly.	['26.58']	200
Doc-oriented VQA	(image omitted)	Whats the Venue Name?	['the halfmoon']	200
Scene Text-centric VQA	(image omitted)	What is the title of the book?	['PENDRAGON']	200
Handwritten Mathematical Expression Recognition	(image omitted)	Please write out the expression of the formula in the image using LaTeX format.	['x = \\frac { 1 7 } { 5 }\n']	100
Irregular Text Recognition	(image omitted)	what is written in the image?	['COFFEE']	50
Regular Text Recognition	(image omitted)	what is written in the image?	['CHAIN']	50
Non-Semantic Text Recognition	(image omitted)	what is written in the image?	['espt']	50
Digit String Recognition	(image omitted)	what is the number in the image?	['9557']	50
Handwriting Recognition	(image omitted)	what is written in the image?	['bread']	50
Artistic Text Recognition	(image omitted)	what is written in the image?	['Home']	50
Total	-	-	-	1000
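To make the record layout concrete, here is a minimal sketch of parsing such a TSV and tallying questions per category. It assumes the file has a header row with the five field names listed above and that the answer column holds a Python-style list literal, as in the examples in the table. The helper names and the two-row sample are made up for illustration; the real file may differ in quoting or column order.

```python
import ast
import csv
import io
from collections import Counter

def load_rows(tsv_text):
    """Parse TSV text with a header row into a list of dicts."""
    # Base64 image fields can exceed the csv module's default field size limit.
    csv.field_size_limit(10**9)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        # Assumption: answers are stored as list literals such as "['26.58']".
        row["answer"] = ast.literal_eval(row["answer"])
        rows.append(row)
    return rows

def count_by_category(rows):
    """Count how many records belong to each category."""
    return Counter(row["category"] for row in rows)

# Made-up two-row sample; "AAAA" stands in for a real Base64 image string.
sample_tsv = (
    "index\timage\tquestion\tanswer\tcategory\n"
    "0\tAAAA\twhat is written in the image?\t['COFFEE']\tIrregular Text Recognition\n"
    "1\tAAAA\twhat is the number in the image?\t['9557']\tDigit String Recognition\n"
)
rows = load_rows(sample_tsv)
for category, count in count_by_category(rows).items():
    print(category, count)
```

For the real file, the same functions could be fed the text returned by reading the TSV from disk.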

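Note that in the TSV file, images are stored as Base64-encoded strings. Base64 encoding converts a binary image file (PNG, JPEG, GIF) into a plain ASCII text string, roughly one third larger than the original bytes, which can then be embedded directly in HTML, CSS, or JSON.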
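As a small illustration of what the encoding does, the sketch below round-trips some bytes through Base64 and checks the output length. Standard Base64 maps every 3 input bytes to 4 ASCII characters, padding the final group with "=". The `base64_length` helper and the dummy byte string are illustrative, not part of OCRBench.

```python
import base64

def base64_length(n_bytes):
    """Length of the padded Base64 text produced for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)

# 1024 dummy bytes standing in for real image data.
raw = bytes(range(256)) * 4
encoded = base64.b64encode(raw).decode("ascii")

# Decoding restores the original bytes exactly.
decoded = base64.b64decode(encoded)

print(len(raw), len(encoded))  # prints: 1024 1368
```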

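There are three ways to convert a Base64 string back into an image:

(1) Use an online tool, for example the Base64-to-image converter in the DopuBOX free online toolbox.

(2) Use a script: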

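import base64

# 1. Paste the Base64 string here (elided below; replace it with the full string)
base64_data = "/9j/4AAQSkZJRgABAQAAAQABAAD/...(full string)/ALz44+gHAooA/9k="

# 2. Decode the string and write the resulting bytes to an image file
with open("output.jpg", "wb") as f:
    f.write(base64.b64decode(base64_data))

print("Image saved as output.jpg")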
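When exporting many images, the file extension does not have to be hard-coded. The sketch below decodes a Base64 string and picks an extension from the leading bytes of the decoded data. The function names and the output naming scheme are hypothetical, and the demo payload is only a PNG signature followed by zero bytes, not a viewable image.

```python
import base64
import tempfile
from pathlib import Path

# Leading "magic" bytes of the three formats mentioned above.
MAGIC = {
    b"\xff\xd8\xff": ".jpg",
    b"\x89PNG\r\n\x1a\n": ".png",
    b"GIF8": ".gif",
}

def guess_extension(data):
    """Return a file extension based on the first bytes, or .bin if unknown."""
    for magic, ext in MAGIC.items():
        if data.startswith(magic):
            return ext
    return ".bin"

def save_image(b64_text, stem, out_dir="."):
    """Decode b64_text and write it to out_dir/stem plus a guessed extension."""
    data = base64.b64decode(b64_text)
    path = Path(out_dir) / (stem + guess_extension(data))
    path.write_bytes(data)
    return path

# Demo with a fake payload that only carries the PNG signature.
demo_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(demo_b64, "sample_0", tmp)
    saved_name = saved.name
    saved_size = saved.stat().st_size
print(saved_name, saved_size)
```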

(3) Preview the image directly in a browser

Put the following code in an HTML file:

<img src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ...(full Base64 string).../9k=">

Open the HTML file in a browser to display the image.
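Pasting a very long string into HTML by hand is error-prone, so the preview file can also be generated by a script. This is a minimal sketch: the function names are hypothetical, the MIME type defaults to JPEG as in the example above, and whitespace is stripped on the assumption that the string may have been copied with line breaks.

```python
from pathlib import Path

def img_tag_from_base64(b64_text, mime="image/jpeg"):
    """Build an <img> tag whose src is a data URI holding the Base64 text."""
    # Remove spaces and line breaks that copy-paste may have introduced.
    cleaned = "".join(b64_text.split())
    return f'<img src="data:{mime};base64,{cleaned}">'

def write_preview(b64_text, html_path, mime="image/jpeg"):
    """Write a one-line HTML file that displays the encoded image."""
    Path(html_path).write_text(img_tag_from_base64(b64_text, mime), encoding="utf-8")

# "AAAA\nBBBB" is a placeholder, not real image data.
print(img_tag_from_base64("AAAA\nBBBB"))
```

Calling `write_preview(b64_text, "preview.html")` with a real string from the image column would produce a file that can be opened in a browser.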

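Notes

Encoding type: the string in this example is the Base64 encoding of a JPEG image. JPEG data starts with the bytes FF D8 FF, so its Base64 form starts with /9j/.
Caution: make sure to copy the complete string (in this example, from the leading /9j/ through the trailing /9k=), otherwise the conversion will fail. The exact trailing characters depend on the length of the data, so other images may end differently.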
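A copied string can be checked for completeness before it is saved. The sketch below is a heuristic under stated assumptions: it treats a string as a complete JPEG if it decodes as strict Base64, starts with the JPEG start-of-image marker (FF D8), and ends with the end-of-image marker (FF D9). The function name is hypothetical, and files with extra bytes after the end marker would be reported as incomplete.

```python
import base64
import binascii

def looks_like_complete_jpeg(b64_text):
    """Heuristically check that b64_text decodes to a whole JPEG byte stream."""
    cleaned = "".join(b64_text.split())  # tolerate line breaks from copy-paste
    try:
        # validate=True rejects characters outside the Base64 alphabet;
        # a string cut at the wrong place also fails with a padding error.
        data = base64.b64decode(cleaned, validate=True)
    except binascii.Error:
        return False
    return data.startswith(b"\xff\xd8") and data.endswith(b"\xff\xd9")

# Fake payload carrying only the JPEG start and end markers around zero bytes.
good = base64.b64encode(b"\xff\xd8\xff\xe0" + b"\x00" * 10 + b"\xff\xd9").decode("ascii")
print(looks_like_complete_jpeg(good))
print(looks_like_complete_jpeg(good[:-4]))  # truncated copy
```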