Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a single JSON file containing a list of all samples. Each sample should contain id (a unique identifier), image (the path to the image), and conversations (the conversation turns between the human and the model).

A sample JSON for finetuning LLaVA to generate tag-style captions for Stable Diffusion:

[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      }
    ]
  },
  ...
]
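As a rough sketch of how such a file could be produced and sanity-checked, the Python below builds samples in the format shown above and reports obvious problems. The helper names (make_sample, validate_samples, save_samples) are hypothetical and not part of LLaVA. The checks that turns alternate between "human" and "gpt" and that the first turn contains the <image> placeholder are assumptions inferred from the sample, not documented requirements.

```python
import json
import uuid
from pathlib import Path

# Keys every sample is expected to carry, per the format description above.
REQUIRED_KEYS = ("id", "image", "conversations")


def make_sample(image_path: str, prompt: str, caption: str) -> dict:
    """Build one single-turn sample in the format shown above."""
    return {
        "id": str(uuid.uuid4()),
        "image": image_path,
        "conversations": [
            {"from": "human", "value": f"<image>\n{prompt}"},
            {"from": "gpt", "value": caption},
        ],
    }


def validate_samples(samples: list) -> list:
    """Return a list of problem descriptions; an empty list means all checks passed."""
    problems = []
    seen_ids = set()
    for index, sample in enumerate(samples):
        missing = [key for key in REQUIRED_KEYS if key not in sample]
        if missing:
            problems.append(f"sample {index}: missing keys {missing}")
            continue
        if sample["id"] in seen_ids:
            problems.append(f"sample {index}: duplicate id {sample['id']}")
        seen_ids.add(sample["id"])
        turns = sample["conversations"]
        if not turns:
            problems.append(f"sample {index}: empty conversations")
            continue
        # Assumption: turns alternate human, gpt, human, gpt, ...
        for turn_index, turn in enumerate(turns):
            expected = "human" if turn_index % 2 == 0 else "gpt"
            if turn.get("from") != expected:
                problems.append(
                    f"sample {index}: turn {turn_index} should be from {expected!r}"
                )
        # Assumption: the first human turn carries the <image> placeholder.
        if "<image>" not in turns[0].get("value", ""):
            problems.append(f"sample {index}: first turn lacks <image>")
    return problems


def save_samples(samples: list, path: str) -> None:
    """Write the sample list as one JSON array, keeping non-ASCII text readable."""
    Path(path).write_text(
        json.dumps(samples, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Running validate_samples before save_samples catches structural mistakes early; json.dumps also guarantees the output is strictly valid JSON, with no trailing commas.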

Command

If you have limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA, following this script.

If you have enough task-specific data, you can instead finetune from LLaVA checkpoints with full-model finetuning, following this script.

You may need to adjust the hyperparameters to fit your specific dataset and hardware constraints.
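One common hardware-driven adjustment is to lower the per-GPU batch size and compensate with gradient accumulation so that the effective (global) batch size stays the same. This is general training practice rather than something stated above. The sketch below assumes the training script exposes per-device batch size and gradient accumulation steps as separate settings, and all numbers are illustrative.

```python
def gradient_accumulation_steps(
    global_batch_size: int, per_device_batch_size: int, num_gpus: int
) -> int:
    """Return the accumulation steps needed to reach the target global batch size.

    global batch size = per-device batch size * number of GPUs * accumulation steps
    """
    samples_per_step = per_device_batch_size * num_gpus
    if samples_per_step <= 0:
        raise ValueError("per_device_batch_size and num_gpus must be positive")
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not a multiple of "
            f"{samples_per_step} samples per optimizer micro-step"
        )
    return global_batch_size // samples_per_step


# Illustrative values only: moving from 8 GPUs to 4 smaller-memory GPUs.
steps_large_setup = gradient_accumulation_steps(128, 16, 8)
steps_small_setup = gradient_accumulation_steps(128, 4, 4)
```

Keeping the global batch size fixed means the learning rate usually does not need retuning when only the hardware changes.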
