Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a JSON file containing a list of all samples. Each sample should contain id (a unique identifier), image (the path to the image), and conversations (the conversation between the human and the AI).

A sample JSON for finetuning LLaVA to generate tag-style captions for Stable Diffusion:

[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      }
    ]
  },
  ...
]
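
The format above can be produced and checked with a short script. The following is a minimal sketch, not part of LLaVA itself: the helper names make_sample, validate_samples, and write_dataset are hypothetical, and the only assumptions about the format are the three keys and the "human"/"gpt" speaker labels shown in the sample.

```python
import json
import uuid
from pathlib import Path

# Keys that every sample carries in the format described above.
REQUIRED_KEYS = ("id", "image", "conversations")
# Speaker labels used in the sample JSON above.
ALLOWED_SPEAKERS = ("human", "gpt")


def make_sample(image_path, prompt, answer, sample_id=None):
    """Build one single-turn sample (hypothetical helper)."""
    return {
        "id": sample_id or str(uuid.uuid4()),
        "image": image_path,
        "conversations": [
            # The human turn starts with the <image> placeholder, as in the sample.
            {"from": "human", "value": "<image>\n" + prompt},
            {"from": "gpt", "value": answer},
        ],
    }


def validate_samples(samples):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, sample in enumerate(samples):
        for key in REQUIRED_KEYS:
            if key not in sample:
                problems.append(f"sample {index}: missing key '{key}'")
        sample_id = sample.get("id")
        if sample_id is not None:
            if sample_id in seen_ids:
                problems.append(f"sample {index}: duplicate id '{sample_id}'")
            seen_ids.add(sample_id)
        for turn in sample.get("conversations", []):
            if turn.get("from") not in ALLOWED_SPEAKERS:
                problems.append(f"sample {index}: unexpected speaker {turn.get('from')!r}")
    return problems


def write_dataset(samples, output_path):
    """Validate the samples, then write them as one JSON list."""
    problems = validate_samples(samples)
    if problems:
        raise ValueError("\n".join(problems))
    text = json.dumps(samples, indent=2, ensure_ascii=False)
    Path(output_path).write_text(text, encoding="utf-8")
```

Writing through json.dumps also avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.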

Command

If you have limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA, following this script.

If you have enough task-specific data, you can also finetune from LLaVA checkpoints with full-model finetuning, following this script.

You may need to adjust the hyperparameters to fit your specific dataset and your hardware constraints.
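
One common adjustment when GPU memory is tight is to lower the per-device batch size and raise gradient accumulation so that the global batch size stays the same. The sketch below illustrates that arithmetic only; it is a general training practice rather than something taken from the scripts referenced above, and the function name and the numbers in the usage lines are hypothetical.

```python
def gradient_accumulation_steps(global_batch_size, per_device_batch_size, num_gpus):
    """Return the accumulation steps needed to reach the target global batch size.

    global batch size = per-device batch size * number of GPUs * accumulation steps
    """
    samples_per_step = per_device_batch_size * num_gpus
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            "global batch size must be a multiple of per_device_batch_size * num_gpus"
        )
    return global_batch_size // samples_per_step


# Hypothetical numbers: same global batch size, smaller per-device batch.
steps_large_gpus = gradient_accumulation_steps(128, 16, 8)
steps_small_gpus = gradient_accumulation_steps(128, 4, 8)
```

Other hyperparameters, such as the learning rate and the number of epochs, depend on the dataset and have no comparable formula.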
