多模态数据,尤其是中文多模态数据,找一些中文多模态的数据
1.XrayGLM
https://github.com/WangRongsheng/XrayGLMhttps://github.com/WangRongsheng/XrayGLM6423张Xray图片,
2.sharegpt4v
用的gpt4v标注的多模态数据,1.2M
4.zero-chinese
5.laion-2b-chinese
数据格式:
这里llava格式是应用最为广泛的,除了qwen-vl系列,几乎所有的系列都是llava格式。
1.InternLM-VL
python
[
{
"id": "identity_0",
"conversations": [
{
"from": "user",
"value": "你好"
},
{
"from": "assistant",
"value": "我是Qwen-VL,一个支持视觉输入的大模型。"
}
]
},
{
"id": "identity_1",
"conversations": [
{
"from": "user",
"value": "Picture 1: <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>\n图中的狗是什么品种?"
},
{
"from": "assistant",
"value": "图中是一只拉布拉多犬。"
},
{
"from": "user",
"value": "框出图中的格子衬衫"
},
{
"from": "assistant",
"value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"
}
]
},
{
"id": "identity_2",
"conversations": [
{
"from": "user",
"value": "Picture 1: <img>assets/mm_tutorial/Chongqing.jpeg</img>\nPicture 2: <img>assets/mm_tutorial/Beijing.jpeg</img>\n图中都是哪"
},
{
"from": "assistant",
"value": "第一张图片是重庆的城市天际线,第二张图片是北京的天际线。"
}
]
}
]
3. swift中的数据格式
qwen:
python
[
{"conversations": [
{"from": "user", "value": "Picture 1:<img>img_path</img>\n11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "Picture 1:<img>img_path</img>\nPicture 2:<img>img_path2</img>\nPicture 3:<img>img_path3</img>\naaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "Picture 1:<img>img_path</img>\nccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
4.bunny
python
{
'id': '0',
'image': 'coco_2017/000000337760.jpg',
'conversations': [
{'from': 'human', 'value': '<image>\n这是一辆现代消防车吗?\n请用单个短语回答问题。'},
{'from': 'gpt', 'value': '不'},
{'from': 'human', 'value': '这张照片是黑白的还是彩色的?'},
{'from': 'gpt', 'value': '黑白的'},
{'from': 'human', 'value': '图中显示了哪些车辆?'},
{'from': 'gpt', 'value': '消防车'}
]
}
5.visualglm
python
[
{"img": "fewshot-data/2p.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是蒙蒙细雨。"},
{"img": "fewshot-data/pig.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是是虚化的。"},
{"img": "fewshot-data/meme.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是蓝色的木质地板。"},
{"img": "fewshot-data/passport.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是棕黄色木质桌子。"},
{"img": "fewshot-data/tower.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是黄昏的天空、云彩和繁华的城市高楼。"},
{"img": "fewshot-data/rub.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是太阳、大树、蓝天白云。"},
{"img": "fewshot-data/push.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是蓝天和沙漠。"},
{"img": "fewshot-data/traf.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是城市街道。"},
{"img": "fewshot-data/music.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是一个音乐混音器。"},
{"img": "fewshot-data/pattern.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是小区的楼房和街道。"},
{"img": "fewshot-data/rou.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是大理石桌子和一个盘子。"},
{"img": "fewshot-data/katong.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是绿色的草地。"},
{"img": "fewshot-data/man.jpg", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是城市的街道和高楼。"},
{"img": "fewshot-data/kobe.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是虚化的观众席。"},
{"img": "fewshot-data/panda.jpg", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是纯白的。"},
{"img": "fewshot-data/titan.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是一座雕像。"},
{"img": "fewshot-data/woman.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是纯蓝的。"},
{"img": "fewshot-data/ghost.jpg", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是一个房间。"},
{"img": "fewshot-data/justice.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是天空和阳光。"},
{"img": "fewshot-data/tianye.png", "prompt": "这张图片的背景里有什么内容?", "label": "这张图片的背景是金黄的田野。"}
]