LLAVA数据集下载

LLAVA数据集下载

1. Data

Data file name Size
llava_instruct_150k.json 229 MB
llava_instruct_80k.json 229 MB
conversation_58k.json 126 MB
detail_23k.json 20.5 MB
complex_reasoning_77k.json 79.6 MB

1.1 Pretraining Dataset

The pretraining dataset used in this release is a subset of CC-3M dataset, filtered with a more balanced concept coverage distribution. Please see here for a detailed description of the dataset structure and how to download the images.

If you already have CC-3M dataset on your disk, the image names follow this format: GCC_train_000000000.jpg. You may edit the image field correspondingly if necessary.

Data Chat File Meta Data Size
CC-3M Concept-balanced 595K chat.json metadata.json 211 MB
LAION/CC/SBU BLIP-Caption Concept-balanced 558K blip_laion_cc_sbu_558k.json [metadata.json](#Data Chat File Meta Data Size CC-3M Concept-balanced 595K chat.json metadata.json 211 MB LAION/CC/SBU BLIP-Caption Concept-balanced 558K blip_laion_cc_sbu_558k.json metadata.json 181 MB) 181 MB

Important notice : Upon the request from the community, as ~15% images of the original CC-3M dataset are no longer accessible, we upload images.zip for better reproducing our work in research community. It must not be used for any other purposes. The use of these images must comply with the CC-3M license. This may be taken down at any time when requested by the original CC-3M dataset owner or owners of the referenced images.

1.2 GPT-4 Prompts

We provide our prompts and few-shot samples for GPT-4 queries, to better facilitate research in this domain. Please check out the prompts folder for three kinds of questions: conversation, detail description, and complex reasoning.

They are organized in a format of system_message.txt for system message, pairs of abc_caps.txt for few-shot sample user input, and abc_conv.txt for few-shot sample reference output.

Note that you may find them in different format. For example, conversation is in jsonl, and detail description is answer-only. The selected format in our preliminary experiments works slightly better than a limited set of alternatives that we tried: jsonl, more natural format, answer-only. If interested, you may try other variants or conduct more careful study in this. Contributions are welcomed!

2. Visual Instruction Tuning

---------2.1 指令调整数据(instruction tuning data)---------:

LLaVA-Instruct-150K

官方llava_v1_5_mix665k.json

---------2.2 图像(images)---------

COCO

官方train2017

GQA

官方images

OCR-VAQ

官方download script
多线程下载(速度更快)Github解决方案 以及 CSDN解决方案
处理好的数据集下载(方便快捷)Huggingface

TextVQA

官方train_val_images

VisualGenome

官方part1, part2

playground
	├──data
	│	├── coco
	│	│   └── train2017
	│	├── gqa
	│	│   └── images
	│	├── ocr_vqa
	│	│   └── images
	│	├── textvqa
	│	│   └── train_images
	│	└── vg
	│	    ├── VG_100K
	│	    └── VG_100K_2
	└── ...   

3. Pretrained Model

---------3.1 语言大模型---------
vicuna-13b-v1.5
vicuna-7b-v1.5
---------3.2 视觉大模型---------
clip-vit-large-patch14-336
---------3.3 LLAVA-1.5预训练模型---------
LLAVA-1.5-13b
LLAVA-1.5-7b
---------3.4 LLAVA-lora微调训练的模型---------
LLAVA-1.5--13b-lora
LLAVA-1.5--7b-lora

相关推荐
陈鋆21 分钟前
智慧城市初探与解决方案
人工智能·智慧城市
qdprobot22 分钟前
ESP32桌面天气摆件加文心一言AI大模型对话Mixly图形化编程STEAM创客教育
网络·人工智能·百度·文心一言·arduino
QQ395753323722 分钟前
金融量化交易模型的突破与前景分析
人工智能·金融
QQ395753323723 分钟前
金融量化交易:技术突破与模型优化
人工智能·金融
The_Ticker36 分钟前
CFD平台如何接入实时行情源
java·大数据·数据库·人工智能·算法·区块链·软件工程
Elastic 中国社区官方博客42 分钟前
Elasticsearch 开放推理 API 增加了对 IBM watsonx.ai Slate 嵌入模型的支持
大数据·数据库·人工智能·elasticsearch·搜索引擎·ai·全文检索
jwolf242 分钟前
摸一下elasticsearch8的AI能力:语义搜索/vector向量搜索案例
人工智能·搜索引擎
有Li1 小时前
跨视角差异-依赖网络用于体积医学图像分割|文献速递-生成式模型与transformer在医学影像中的应用
人工智能·计算机视觉
傻啦嘿哟1 小时前
如何使用 Python 开发一个简单的文本数据转换为 Excel 工具
开发语言·python·excel
B站计算机毕业设计超人1 小时前
计算机毕业设计SparkStreaming+Kafka旅游推荐系统 旅游景点客流量预测 旅游可视化 旅游大数据 Hive数据仓库 机器学习 深度学习
大数据·数据仓库·hadoop·python·kafka·课程设计·数据可视化