Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a JSON file of a List of all samples. Sample metadata should contain id (a unique identifier), image (the path to the image), and conversations (the conversation data between human and AI).

A sample JSON for finetuning LLaVA for generating tag-style captions for Stable Diffusion:

json 复制代码
[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      },
    ]
  },
  ...
]

Command

If you have a limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA following this script.

If the amount of the task-specific data is sufficient, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.

You may need to adjust the hyperparameters to fit each specific dataset and your hardware constraint.

相关推荐
山顶夕景9 天前
【VLM】Qwen3-VL-SFT微调简要流程
llm·多模态大模型·vlm
山顶夕景11 天前
【VLM】Qwen3-VL模型架构和训练流程
大模型·llm·多模态·vlm
ASS-ASH25 天前
AI时代之向量数据库概览
数据库·人工智能·python·llm·embedding·向量数据库·vlm
一个处女座的程序猿1 个月前
CV之VLM之LLM-OCR:《DeepSeek-OCR 2: Visual Causal Flow》翻译与解读
llm·ocr·cv·vlm
leo03081 个月前
深入解析 π₀ 与 π₀.5:Physical Intelligence 的机器人基础模型演进
vla·vlm
国家一级假勤奋大学生1 个月前
InternVL系列 technical report 解析
大模型·llm·vlm·mllm·internvl·调研笔记
具身智能之心1 个月前
ICLR 2026中稿工作VLASER: 究竟哪些多模态能力和数据对提升机器人的控制表现最关键?
具身智能·vlm·iclr 2026
一颗小树x1 个月前
《VLA 系列》从 VLM 到 VLA 机器人控制,关键的多模态数据和能力是什么?| Vlaser | ICLR 2026
人工智能·深度学习·机器人·vlm·vlaser
安如衫1 个月前
从 OCR 到多模态 VLM Agentic AI:智能文档问答的范式转移全解
人工智能·ocr·agent·cv·rag·vlm
hjs_deeplearning1 个月前
认知篇#15:ms-swift微调中gradient_accumulation_steps和warmup_ratio等参数的意义与设置
开发语言·人工智能·机器学习·swift·vlm