LLAVA数据集下载

LLAVA数据集下载

1. Data

Data file name Size
llava_instruct_150k.json 229 MB
llava_instruct_80k.json 229 MB
conversation_58k.json 126 MB
detail_23k.json 20.5 MB
complex_reasoning_77k.json 79.6 MB

1.1 Pretraining Dataset

The pretraining dataset used in this release is a subset of CC-3M dataset, filtered with a more balanced concept coverage distribution. Please see here for a detailed description of the dataset structure and how to download the images.

If you already have CC-3M dataset on your disk, the image names follow this format: GCC_train_000000000.jpg. You may edit the image field correspondingly if necessary.

Data Chat File Meta Data Size
CC-3M Concept-balanced 595K chat.json metadata.json 211 MB
LAION/CC/SBU BLIP-Caption Concept-balanced 558K blip_laion_cc_sbu_558k.json [metadata.json](#Data Chat File Meta Data Size CC-3M Concept-balanced 595K chat.json metadata.json 211 MB LAION/CC/SBU BLIP-Caption Concept-balanced 558K blip_laion_cc_sbu_558k.json metadata.json 181 MB) 181 MB

Important notice : Upon the request from the community, as ~15% images of the original CC-3M dataset are no longer accessible, we upload images.zip for better reproducing our work in research community. It must not be used for any other purposes. The use of these images must comply with the CC-3M license. This may be taken down at any time when requested by the original CC-3M dataset owner or owners of the referenced images.

1.2 GPT-4 Prompts

We provide our prompts and few-shot samples for GPT-4 queries, to better facilitate research in this domain. Please check out the prompts folder for three kinds of questions: conversation, detail description, and complex reasoning.

They are organized in a format of system_message.txt for system message, pairs of abc_caps.txt for few-shot sample user input, and abc_conv.txt for few-shot sample reference output.

Note that you may find them in different format. For example, conversation is in jsonl, and detail description is answer-only. The selected format in our preliminary experiments works slightly better than a limited set of alternatives that we tried: jsonl, more natural format, answer-only. If interested, you may try other variants or conduct more careful study in this. Contributions are welcomed!

2. Visual Instruction Tuning

---------2.1 指令调整数据(instruction tuning data)---------:

LLaVA-Instruct-150K

官方llava_v1_5_mix665k.json

---------2.2 图像(images)---------

COCO

官方train2017

GQA

官方images

OCR-VAQ

官方download script
多线程下载(速度更快)Github解决方案 以及 CSDN解决方案
处理好的数据集下载(方便快捷)Huggingface

TextVQA

官方train_val_images

VisualGenome

官方part1, part2

复制代码
playground
	├──data
	│	├── coco
	│	│   └── train2017
	│	├── gqa
	│	│   └── images
	│	├── ocr_vqa
	│	│   └── images
	│	├── textvqa
	│	│   └── train_images
	│	└── vg
	│	    ├── VG_100K
	│	    └── VG_100K_2
	└── ...   

3. Pretrained Model

---------3.1 语言大模型---------
vicuna-13b-v1.5
vicuna-7b-v1.5
---------3.2 视觉大模型---------
clip-vit-large-patch14-336
---------3.3 LLAVA-1.5预训练模型---------
LLAVA-1.5-13b
LLAVA-1.5-7b
---------3.4 LLAVA-lora微调训练的模型---------
LLAVA-1.5--13b-lora
LLAVA-1.5--7b-lora

相关推荐
eve杭1 天前
AI、大数据与智能时代:从理论基石到实战路径
人工智能·python·5g·网络安全·ai
TG:@yunlaoda360 云老大1 天前
腾讯云国际站代理商的QAPM服务能提供哪些专属服务?
人工智能·云计算·腾讯云
Honmaple1 天前
中国四级城市联动数据,包含港澳台,内含json , sql , python 脚本
python·sql·json
BoBoZz191 天前
Curvatures 曲率的计算、边缘曲率的调整以及曲率、颜色的映射
python·vtk·图形渲染·图形处理
明月满西楼1 天前
4.2.1 分类任务
人工智能
AI_56781 天前
Webpack5优化的“双引擎”
大数据·人工智能·性能优化
LZL_SQ1 天前
昇腾NPU架构设计 从抽象硬件模型到物理实现
人工智能·昇腾·cann·ascend c
少吃零食多运动1 天前
【Jupyter notebook修改工作目录】
python·jupyter
慎独4131 天前
家家有平台:Web3.0绿色积分引领消费新纪元
大数据·人工智能·物联网
Swizard1 天前
别买树莓派了!3步教你在安卓手机上跑通 CPython + PaddleOCR,打造随身 AI 识别终端
python·ai·移动开发