LLAVA数据集下载

LLAVA数据集下载

1. Data

Data file name Size
llava_instruct_150k.json 229 MB
llava_instruct_80k.json 229 MB
conversation_58k.json 126 MB
detail_23k.json 20.5 MB
complex_reasoning_77k.json 79.6 MB

1.1 Pretraining Dataset

The pretraining dataset used in this release is a subset of CC-3M dataset, filtered with a more balanced concept coverage distribution. Please see here for a detailed description of the dataset structure and how to download the images.

If you already have CC-3M dataset on your disk, the image names follow this format: GCC_train_000000000.jpg. You may edit the image field correspondingly if necessary.

Data Chat File Meta Data Size
CC-3M Concept-balanced 595K chat.json metadata.json 211 MB
LAION/CC/SBU BLIP-Caption Concept-balanced 558K blip_laion_cc_sbu_558k.json [metadata.json](#Data Chat File Meta Data Size CC-3M Concept-balanced 595K chat.json metadata.json 211 MB LAION/CC/SBU BLIP-Caption Concept-balanced 558K blip_laion_cc_sbu_558k.json metadata.json 181 MB) 181 MB

Important notice : Upon the request from the community, as ~15% images of the original CC-3M dataset are no longer accessible, we upload images.zip for better reproducing our work in research community. It must not be used for any other purposes. The use of these images must comply with the CC-3M license. This may be taken down at any time when requested by the original CC-3M dataset owner or owners of the referenced images.

1.2 GPT-4 Prompts

We provide our prompts and few-shot samples for GPT-4 queries, to better facilitate research in this domain. Please check out the prompts folder for three kinds of questions: conversation, detail description, and complex reasoning.

They are organized in a format of system_message.txt for system message, pairs of abc_caps.txt for few-shot sample user input, and abc_conv.txt for few-shot sample reference output.

Note that you may find them in different format. For example, conversation is in jsonl, and detail description is answer-only. The selected format in our preliminary experiments works slightly better than a limited set of alternatives that we tried: jsonl, more natural format, answer-only. If interested, you may try other variants or conduct more careful study in this. Contributions are welcomed!

2. Visual Instruction Tuning

---------2.1 指令调整数据(instruction tuning data)---------:

LLaVA-Instruct-150K

官方llava_v1_5_mix665k.json

---------2.2 图像(images)---------

COCO

官方train2017

GQA

官方images

OCR-VAQ

官方download script
多线程下载(速度更快)Github解决方案 以及 CSDN解决方案
处理好的数据集下载(方便快捷)Huggingface

TextVQA

官方train_val_images

VisualGenome

官方part1, part2

复制代码
playground
	├──data
	│	├── coco
	│	│   └── train2017
	│	├── gqa
	│	│   └── images
	│	├── ocr_vqa
	│	│   └── images
	│	├── textvqa
	│	│   └── train_images
	│	└── vg
	│	    ├── VG_100K
	│	    └── VG_100K_2
	└── ...   

3. Pretrained Model

---------3.1 语言大模型---------
vicuna-13b-v1.5
vicuna-7b-v1.5
---------3.2 视觉大模型---------
clip-vit-large-patch14-336
---------3.3 LLAVA-1.5预训练模型---------
LLAVA-1.5-13b
LLAVA-1.5-7b
---------3.4 LLAVA-lora微调训练的模型---------
LLAVA-1.5--13b-lora
LLAVA-1.5--7b-lora

相关推荐
惜.己6 分钟前
pytest中使用ordering控制函数的执行顺序
开发语言·python·pytest
旧时光巷7 分钟前
【机器学习-2】 | 决策树算法基础/信息熵
算法·决策树·机器学习·id3算法·信息熵·c4.5算法
数据智能老司机27 分钟前
使用 Python 进行并行与高性能编程——并行编程导论
python·性能优化·编程语言
落了一地秋35 分钟前
4.5 优化器中常见的梯度下降算法
人工智能·算法·机器学习
格林威44 分钟前
Baumer工业相机堡盟工业相机如何通过YoloV8深度学习模型实现卫星图像识别(C#代码,UI界面版)
人工智能·深度学习·数码相机·yolo·计算机视觉
豆浆Whisky1 小时前
字节Coze入场开源,一文搞定基础部署和实践,放弃Dify?
人工智能·coze
精灵vector1 小时前
【Agentic】通过LangGrah实现RAG评分和重写
python
狗都不学爬虫_1 小时前
JS逆向 - (国外)SHEIN站 - 请求头(armorToken、Anti-in)
javascript·python·ajax·网络爬虫·wasm
柠檬味拥抱1 小时前
基于YOLOv8的边坡排水沟堵塞检测与识别项目|完整源码数据集+PyQt5界面+完整训练流程+开箱即用!
人工智能