Welcome to follow my CSDN blog: https://spike.blog.csdn.net/
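Disclaimer: This article is based on personal knowledge and publicly available materials and is intended for academic exchange only. Discussion is welcome; reposting is not permitted.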
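VLMEvalKit is an open-source evaluation toolkit for large vision-language models (LVLMs), developed by the OpenCompass team. It offers one-command evaluation without tedious data preparation, supports a wide range of vision-language models, and covers diverse task scenarios.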
VLMEvalKit:GitHub - open-compass/VLMEvalKit
1. Runtime Environment
Prepare the VLMEvalKit project environment:
- Use Python 3.11.
- Install base Python libraries such as PyTorch, Transformers, and flash-attn beforehand to avoid dependency conflicts.
```bash
git clone https://github.com/open-compass/VLMEvalKit
cd VLMEvalKit
conda create -n vlm_eval_kit python=3.11
conda activate vlm_eval_kit
# Install first
pip install torch torchvision torchaudio  # latest version
pip install transformers==4.45.0
# pip install flash-attn  (manual installation recommended)
# Then install
pip install -r requirements.txt
pip install -e .
# Finally install
pip install ipdb
pip install einops transformers_stream_generator
```
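After these steps it is worth confirming which versions actually ended up in the `vlm_eval_kit` environment, since the order of installation matters. The sketch below uses only the standard library (`importlib.metadata`); the package list and the `installed_versions` helper are illustrative choices, not something VLMEvalKit itself provides.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Distribution names as passed to pip in the install steps above
    wanted = ["torch", "torchvision", "transformers", "flash-attn", "numpy"]
    found = installed_versions(wanted)
    for pkg, ver in found.items():
        print(f"{pkg:<14} {ver or 'NOT INSTALLED'}")
    # The steps above pin transformers to 4.45.0
    if found["transformers"] != "4.45.0":
        print("warning: transformers is not the pinned 4.45.0")
```

Run it inside the activated conda environment; a `NOT INSTALLED` line for flash-attn simply means the manual installation step has not been done yet.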
To install flash-attn, refer to the tutorial: Deploying the Qwen2-VL Multimodal Large Model with vLLM (Configuring FlashAttention).
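MME (Multimodal Model Evaluation) is a multimodal large language model evaluation benchmark jointly developed by Tencent Youtu Lab and Xiamen University. It contains 14 subtasks (10 perception and 4 cognition) covering coarse- to fine-grained object recognition, commonsense reasoning, numerical calculation, text translation, code reasoning, and more, providing a comprehensive assessment of a model's perception and cognition abilities.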
Evaluate on the MME multimodal dataset:
```bash
# Multi-GPU alternative (2 GPUs):
# torchrun --nproc-per-node=2 run.py --data MME --model Qwen2-VL-7B-Instruct --verbose
python3 run.py --data MME --model Qwen2-VL-7B-Instruct --verbose
python3 run.py --data MME --model Llama-3.2-11B-Vision-Instruct --verbose
```
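When several models are evaluated one after another, it can help to generate these command lines programmatically. The sketch below is a hypothetical helper (`build_run_cmd` is not part of VLMEvalKit); it only reproduces the two invocation forms shown above, using `torchrun` when more than one GPU process is requested and plain `python3` otherwise.

```python
import shlex


def build_run_cmd(data, model, nproc=1, verbose=True):
    """Build a run.py command line as an argument list (hypothetical helper)."""
    if nproc > 1:
        # Same form as the commented-out torchrun line above
        cmd = ["torchrun", f"--nproc-per-node={nproc}", "run.py"]
    else:
        cmd = ["python3", "run.py"]
    cmd += ["--data", data, "--model", model]
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    for model in ["Qwen2-VL-7B-Instruct", "Llama-3.2-11B-Vision-Instruct"]:
        print(shlex.join(build_run_cmd("MME", model)))
        # To actually launch it: subprocess.run(build_run_cmd("MME", model), check=True)
```

Returning a list rather than a single string lets `subprocess.run` execute it without going through a shell, which avoids quoting problems with model or path names.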
Evaluation results for Qwen2-VL-7B-Instruct:
```bash
[2024-12-09 14:51:21] INFO - run.py: main - 400:
--------------------- --------
perception 1675.9
reasoning 640.714
OCR 155
artwork 151.25
celebrity 149.412
code_reasoning 160
color 180
commonsense_reasoning 155.714
count 160
existence 195
landmark 185
numerical_calculation 125
position 155
posters 182.993
scene 162.25
text_translation 200
--------------------- --------
```
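The two headline numbers are sums of subtask scores: `perception` adds up the 10 perception subtasks and `reasoning` adds up the 4 cognition subtasks (each subtask is worth at most 200, so the maxima are 2000 and 800). The sketch below re-derives both totals from the log above; the grouping follows the MME benchmark's perception/cognition split, and `parse_scores` is a hypothetical helper.

```python
# Subtask scores copied from the Qwen2-VL-7B-Instruct log above
LOG = """\
perception 1675.9
reasoning 640.714
OCR 155
artwork 151.25
celebrity 149.412
code_reasoning 160
color 180
commonsense_reasoning 155.714
count 160
existence 195
landmark 185
numerical_calculation 125
position 155
posters 182.993
scene 162.25
text_translation 200
"""

# MME groups its 14 subtasks into 10 perception and 4 cognition tasks;
# the log prints the cognition total as "reasoning".
PERCEPTION = ["OCR", "artwork", "celebrity", "color", "count",
              "existence", "landmark", "position", "posters", "scene"]
COGNITION = ["code_reasoning", "commonsense_reasoning",
             "numerical_calculation", "text_translation"]


def parse_scores(text):
    """Turn 'name value' lines into a dict, skipping dashed separator lines."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and not parts[0].startswith("-"):
            scores[parts[0]] = float(parts[1])
    return scores


if __name__ == "__main__":
    s = parse_scores(LOG)
    perception = sum(s[k] for k in PERCEPTION)
    cognition = sum(s[k] for k in COGNITION)
    print(f"perception: {perception:.3f} (reported {s['perception']}, max 2000)")
    print(f"reasoning:  {cognition:.3f} (reported {s['reasoning']}, max 800)")
```

The sums come out to 1675.905 and 640.714, matching the (rounded) totals in the log.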
The outputs are saved to `outputs/Qwen2-VL-7B-Instruct`:
```bash
outputs/Qwen2-VL-7B-Instruct
├── Qwen2-VL-7B-Instruct_MME.xlsx -> outputs/Qwen2-VL-7B-Instruct/T20241209_Ga18f5d69/Qwen2-VL-7B-Instruct_MME.xlsx
├── Qwen2-VL-7B-Instruct_MME_auxmatch.xlsx -> outputs/Qwen2-VL-7B-Instruct/T20241209_Ga18f5d69/Qwen2-VL-7B-Instruct_MME_auxmatch.xlsx
├── Qwen2-VL-7B-Instruct_MME_score.csv -> outputs/Qwen2-VL-7B-Instruct/T20241209_Ga18f5d69/Qwen2-VL-7B-Instruct_MME_score.csv
└── T20241209_Ga18f5d69
    ├── Qwen2-VL-7B-Instruct_MME.xlsx
    ├── Qwen2-VL-7B-Instruct_MME_auxmatch.xlsx
    └── Qwen2-VL-7B-Instruct_MME_score.csv
```
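The `->` entries in this tree are symbolic links from the model's output folder into the timestamped run directory (`T20241209_Ga18f5d69`). When several runs accumulate, a small script can report which run each link currently points to. This is a sketch with a hypothetical helper name (`resolve_run_links`); the demo rebuilds a miniature copy of the tree in a temporary directory rather than reading a real `outputs/` folder.

```python
import os
import tempfile


def resolve_run_links(model_dir):
    """Map each symlink in model_dir to the name of the run directory it points into."""
    mapping = {}
    for name in sorted(os.listdir(model_dir)):
        path = os.path.join(model_dir, name)
        if os.path.islink(path):
            target = os.path.realpath(path)  # follow the link to the real file
            mapping[name] = os.path.basename(os.path.dirname(target))
    return mapping


if __name__ == "__main__":
    # Rebuild a miniature version of the tree above in a temporary directory
    with tempfile.TemporaryDirectory() as model_dir:
        run_dir = os.path.join(model_dir, "T20241209_Ga18f5d69")
        os.makedirs(run_dir)
        real = os.path.join(run_dir, "Qwen2-VL-7B-Instruct_MME_score.csv")
        open(real, "w").close()
        os.symlink(real, os.path.join(model_dir, "Qwen2-VL-7B-Instruct_MME_score.csv"))
        print(resolve_run_links(model_dir))
```

On a real checkout, call `resolve_run_links("outputs/Qwen2-VL-7B-Instruct")` directly.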
For comparison, the results of Llama-3.2-11B-Vision-Instruct:
```bash
[2024-12-09 16:33:49] INFO - run.py: main - 400:
--------------------- --------
perception 1343.25
reasoning 325.714
OCR 125
artwork 87
celebrity 127.353
code_reasoning 27.5
color 143.333
commonsense_reasoning 110.714
count 143.333
existence 190
landmark 110.5
numerical_calculation 115
position 123.333
posters 153.401
scene 140
text_translation 72.5
--------------------- --------
```
2. Project Configuration
2.1 Environment Variables (Env)
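Set up the environment variables by writing a `.env` file in the VLMEvalKit directory that specifies the model download path (`HF_HOME`) and the dataset download path (`LMUData`):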
```bash
HF_HOME="[your path]/huggingface/"
LMUData="[your path]/huggingface/LMUData/"
```
The file is loaded with `dotenv_values` from the python-dotenv library (`from dotenv import dotenv_values`); see `vlmeval/smp/misc.py`.
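To illustrate what loading the `.env` file amounts to, the sketch below parses simple `KEY="VALUE"` lines and copies non-empty values into `os.environ`, using only the standard library so it runs without python-dotenv. It is a simplified stand-in for `dotenv_values` (no `export` prefix, escapes, or variable interpolation), and copying only non-empty values into the environment is an assumption about how VLMEvalKit applies them. Variables such as `HF_HOME` are generally read when the Hugging Face libraries are imported, so they should be set before that import.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines; a simplified stand-in for dotenv_values."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key.strip()] = value
    return values


def export_env(values):
    """Copy non-empty values into os.environ (assumed behavior, see lead-in)."""
    for key, value in values.items():
        if value:
            os.environ[key] = value


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as f:
            export_env(parse_env(f.read()))
    print(os.environ.get("HF_HOME"), os.environ.get("LMUData"))
```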
2.2 Evaluation Models
VLMEvalKit's models are defined in `vlmeval/config.py`, which covers the current mainstream models. By default, weights are fetched via the HuggingFace download path `$HF_HOME`:
```python
model_groups = [
    ungrouped, api_models,
    xtuner_series, qwen_series, llava_series, internvl_series, yivl_series,
    xcomposer_series, minigpt4_series, idefics_series, instructblip_series,
    deepseekvl_series, janus_series, minicpm_series, cogvlm_series, wemm_series,
    cambrian_series, chameleon_series, video_models, ovis_series, vila_series,
    mantis_series, mmalaya_series, phi3_series, xgen_mm_series, qwen2vl_series,
    slime_series, eagle_series, moondream_series, llama_series, molmo_series,
    kosmos_series, points_series, nvlm_series, vintern_series, h2ovl_series, aria_series,
    smolvlm_series
]
```
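Each `*_series` entry is a dict that maps a model name (the value passed to `--model`) to a `functools.partial` constructor. A minimal sketch of how such groups can be merged into one registry and looked up by name follows; the merged dict is named `supported_VLM` as in the real `config.py`, but treat the details as an assumption, and `DummyVLM` and `build_model` are hypothetical stand-ins.

```python
from functools import partial


class DummyVLM:
    """Hypothetical stand-in for a VLMEvalKit model class."""

    def __init__(self, model_path, **kwargs):
        self.model_path = model_path
        self.kwargs = kwargs


# Each group: model name -> partial(constructor, model_path=...)
qwen2vl_series = {
    "Qwen2-VL-7B-Instruct": partial(DummyVLM, model_path="Qwen/Qwen2-VL-7B-Instruct"),
}
llama_series = {
    "Llama-3.2-11B-Vision-Instruct": partial(
        DummyVLM, model_path="meta-llama/Llama-3.2-11B-Vision-Instruct"),
}
model_groups = [qwen2vl_series, llama_series]

# Merge all groups into a single name -> constructor registry
supported_VLM = {}
for group in model_groups:
    supported_VLM.update(group)


def build_model(name):
    """Look up a model by its --model name and instantiate it (hypothetical helper)."""
    if name not in supported_VLM:
        raise KeyError(f"unknown model: {name}")
    return supported_VLM[name]()  # calling the partial constructs the model


if __name__ == "__main__":
    print(build_model("Llama-3.2-11B-Vision-Instruct").model_path)
```

Because the constructor is deferred with `partial`, only the selected model is ever instantiated, which keeps importing the config cheap even with dozens of registered models.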
If a model cannot be downloaded automatically (for example, Llama-3.2-11B-Vision-Instruct), change its model path in `vlmeval/config.py`; the model class itself is in `vlmeval/vlm/llama_vision.py`:
```python
# vlmeval/config.py
from functools import partial
# (config.py also imports the model classes, including llama_vision, from vlmeval.vlm)

llama_series = {
    # Replace meta-llama/Llama-3.2-11B-Vision-Instruct with [your path]/huggingface/meta-llama/Llama-3.2-11B-Vision-Instruct
    'Llama-3.2-11B-Vision-Instruct': partial(llama_vision, model_path='[your path]/huggingface/meta-llama/Llama-3.2-11B-Vision-Instruct'),
    'LLaVA-CoT': partial(llama_vision, model_path='[your path]/huggingface/Xkev/Llama-3.2V-11B-cot'),
    'Llama-3.2-90B-Vision-Instruct': partial(llama_vision, model_path='meta-llama/Llama-3.2-90B-Vision-Instruct'),
}

# vlmeval/vlm/llama_vision.py
from .base import BaseModel

class llama_vision(BaseModel):
    INSTALL_REQ = False
    INTERLEAVE = False

    # This function is used to split Llama-3.2-90B
    def split_model(self):
        ...  # (body omitted)

    # Default model_path is the HuggingFace repo ID; it can be replaced with
    # [your path]/huggingface/meta-llama/Llama-3.2-11B-Vision-Instruct
    def __init__(self, model_path='meta-llama/Llama-3.2-11B-Vision-Instruct', **kwargs):
        ...  # (body omitted)
```
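By default, the model path is the HuggingFace repo ID, so weights go to the standard HuggingFace download location. To use a different location, change the `model_path` in `vlmeval/config.py`.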
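Instead of hard-coding one absolute path per model, the config could prefer a manually downloaded copy when it exists and otherwise fall back to the repo ID. The helper below is hypothetical (not part of VLMEvalKit) and assumes weights were placed at `<local_root>/<org>/<name>`, matching the `[your path]/huggingface/meta-llama/...` layout used in the config edit above.

```python
import os
import tempfile


def resolve_model_path(repo_id, local_root=None):
    """Hypothetical helper: prefer a local copy under local_root, else the repo ID."""
    if local_root:
        candidate = os.path.join(local_root, *repo_id.split("/"))
        if os.path.isdir(candidate):
            return candidate
    # Fall back to the HuggingFace repo ID, downloaded and cached under $HF_HOME
    return repo_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        repo = "meta-llama/Llama-3.2-11B-Vision-Instruct"
        print(resolve_model_path(repo, root))  # no local copy yet, so the repo ID
        os.makedirs(os.path.join(root, "meta-llama", "Llama-3.2-11B-Vision-Instruct"))
        print(resolve_model_path(repo, root))  # local copy found, so the local path
```

In `config.py` this could be used as `partial(llama_vision, model_path=resolve_model_path('meta-llama/Llama-3.2-11B-Vision-Instruct', '[your path]/huggingface'))`, so the same config works on machines with and without a local copy.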
2.3 Evaluation Datasets
VLMEvalKit's datasets are registered in `vlmeval/dataset/__init__.py`, grouped mainly into `IMAGE_DATASET`, `VIDEO_DATASET`, `TEXT_DATASET`, `CUSTOM_DATASET`, and `DATASET_COLLECTION`:
```python
# run.py
dataset = build_dataset(dataset_name, **dataset_kwargs)

# vlmeval/dataset/__init__.py
DATASET_CLASSES = IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET + DATASET_COLLECTION

def build_dataset(dataset_name, **kwargs):
    # Instantiate the first class whose supported_datasets() includes dataset_name
    for cls in DATASET_CLASSES:
        if dataset_name in cls.supported_datasets():
            return cls(dataset=dataset_name, **kwargs)
```
Taking MME as an example, it is handled by the yes/no image dataset class in `vlmeval/dataset/image_yorn.py`, whose download URLs are:
```python
DATASET_URL = {
    'MME': 'https://opencompass.openxlab.space/utils/VLMEval/MME.tsv',
    'HallusionBench': 'https://opencompass.openxlab.space/utils/VLMEval/HallusionBench.tsv',
    'POPE': 'https://opencompass.openxlab.space/utils/VLMEval/POPE.tsv',
    'AMBER': 'https://huggingface.co/datasets/yifanzhang114/AMBER_base64/resolve/main/AMBER.tsv',
}
```
The base class `ImageBaseDataset` handles the shared logic; the supported dataset names are simply the keys of `DATASET_URL`:
```python
# Return a list of dataset names that are supported by this class, can override
@classmethod
def supported_datasets(cls):
    return list(cls.DATASET_URL)
```
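Putting `build_dataset` and `supported_datasets` together: the dataset name passed to `--data` is matched against each class's `DATASET_URL` keys, and the first matching class is instantiated. The self-contained sketch below mirrors that dispatch with simplified stand-in classes; the MME, HallusionBench, and POPE URLs are the ones listed above, while `ToyVideoDataset` and its URL are hypothetical, and returning `None` for an unknown name is a choice of this sketch.

```python
class ImageBaseDataset:
    DATASET_URL = {}

    def __init__(self, dataset, **kwargs):
        self.dataset_name = dataset
        self.kwargs = kwargs

    # Supported names are the keys of DATASET_URL, as in the base class above
    @classmethod
    def supported_datasets(cls):
        return list(cls.DATASET_URL)


class ImageYORNDataset(ImageBaseDataset):
    DATASET_URL = {
        "MME": "https://opencompass.openxlab.space/utils/VLMEval/MME.tsv",
        "HallusionBench": "https://opencompass.openxlab.space/utils/VLMEval/HallusionBench.tsv",
        "POPE": "https://opencompass.openxlab.space/utils/VLMEval/POPE.tsv",
    }


class ToyVideoDataset(ImageBaseDataset):  # hypothetical second dataset family
    DATASET_URL = {"ToyVideoBench": "https://example.com/ToyVideoBench.tsv"}


DATASET_CLASSES = [ImageYORNDataset, ToyVideoDataset]


def build_dataset(dataset_name, **kwargs):
    for cls in DATASET_CLASSES:
        if dataset_name in cls.supported_datasets():
            return cls(dataset=dataset_name, **kwargs)
    return None  # no registered class claims this name


if __name__ == "__main__":
    print(type(build_dataset("MME")).__name__)
```

A consequence of this design is that dataset names must be unique across all classes: if two classes listed the same key, the earlier one in `DATASET_CLASSES` would silently win.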
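The data location is controlled by the `LMUData` variable. If it is unset (or the directory does not exist), the root defaults to `~/LMUData`, so MME images end up in `~/LMUData/images/MME`: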
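```python
import os
import os.path as osp

def LMUDataRoot():
    # Use $LMUData if it is set and the directory exists
    if 'LMUData' in os.environ and osp.exists(os.environ['LMUData']):
        return os.environ['LMUData']
    # Otherwise fall back to ~/LMUData, creating it if necessary
    home = osp.expanduser('~')
    root = osp.join(home, 'LMUData')
    os.makedirs(root, exist_ok=True)
    return root
```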
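To check where a given dataset's files should be on disk, the hypothetical helper below derives the expected paths from an LMUData root. The image directory `<root>/images/<name>` matches the `~/LMUData/images/MME` location above; storing the downloaded TSV as `<root>/<name>.tsv` is an assumption of this sketch. The demo resolves the root with the same precedence as `LMUDataRoot()` but without creating any directory.

```python
import os
import os.path as osp


def dataset_paths(root, dataset_name):
    """Hypothetical helper: expected TSV and image locations under an LMUData root."""
    return {
        "tsv": osp.join(root, f"{dataset_name}.tsv"),          # assumed location
        "images": osp.join(root, "images", dataset_name),      # matches ~/LMUData/images/MME
    }


if __name__ == "__main__":
    # Same precedence as LMUDataRoot(): $LMUData if it exists, else ~/LMUData
    env_root = os.environ.get("LMUData")
    root = env_root if env_root and osp.exists(env_root) else osp.expanduser("~/LMUData")
    for kind, path in dataset_paths(root, "MME").items():
        print(f"{kind:<7} {path}  exists={osp.exists(path)}")
```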
3. Radar Chart
To draw the radar chart, refer to `scripts/visualize.ipynb`, which plots the full set of MLLM evaluation results stored in `OpenVLM.json`.
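The geometry behind a radar chart is simple: one spoke per metric at evenly spaced angles, with each score normalized to a common scale and the first point repeated to close the polygon. The sketch below computes those coordinates for the Qwen2-VL-7B-Instruct MME perception scores from Section 1 (each out of 200). It uses only the standard library and is not necessarily how `scripts/visualize.ipynb` normalizes or orders its metrics.

```python
import math

# Qwen2-VL-7B-Instruct MME perception subtask scores from Section 1 (max 200 each)
QWEN2_VL_PERCEPTION = {
    "existence": 195, "count": 160, "position": 155, "color": 180,
    "OCR": 155, "posters": 182.993, "celebrity": 149.412, "scene": 162.25,
    "landmark": 185, "artwork": 151.25,
}


def radar_polygon(scores, max_score):
    """Return (angles, radii) for a closed radar-chart polygon.

    One spoke per metric, evenly spaced around the circle; radii are scores
    divided by max_score. The first point is repeated to close the shape.
    """
    labels = list(scores)
    n = len(labels)
    angles = [2 * math.pi * i / n for i in range(n)]
    radii = [scores[k] / max_score for k in labels]
    return angles + angles[:1], radii + radii[:1]


if __name__ == "__main__":
    angles, radii = radar_polygon(QWEN2_VL_PERCEPTION, 200)
    for label, a, r in zip(QWEN2_VL_PERCEPTION, angles, radii):
        print(f"{label:<10} angle={math.degrees(a):6.1f} deg  r={r:.3f}")
```

With matplotlib, these lists can be drawn on a polar axis via `ax = plt.subplot(polar=True)`, then `ax.plot(angles, radii)` and `ax.fill(angles, radii, alpha=0.25)`; overlaying a second model's polygon makes gaps such as the cognition-heavy subtasks easy to spot.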
Bug: running the evaluation may print the following warning:
```bash
[your path]/miniconda3_62/envs/vlm_eval_kit/lib/python3.11/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
```
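This warning typically indicates that the installed NumPy does not match the NumPy version torch was built against (for example, NumPy 2.x with a torch build compiled against NumPy 1.x). Reinstalling torch, together with numpy, fixes it: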
```bash
pip uninstall torch numpy
pip3 install torch torchvision torchaudio
```