Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a single JSON file containing a list of all samples. Each sample should contain id (a unique identifier), image (the path to the image), and conversations (the conversation between the human and the AI).

A sample JSON for finetuning LLaVA to generate tag-style captions for Stable Diffusion:

[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      }
    ]
  },
  ...
]
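
The format above can be produced and sanity-checked with a short script. The following is a minimal sketch, not part of LLaVA itself: the helper names (make_sample, validate_samples, write_dataset) are hypothetical, and the assumption that the only valid "from" values are "human" and "gpt" is inferred from the sample above rather than from a formal specification.

```python
import json
import uuid
from pathlib import Path

# Keys every sample must carry, per the format description above.
REQUIRED_KEYS = {"id", "image", "conversations"}
# Assumption: speaker names are inferred from the sample JSON above.
ALLOWED_SPEAKERS = {"human", "gpt"}


def make_sample(image_path, prompt, answer, sample_id=None):
    """Build one single-turn sample; a random UUID is used if no id is given."""
    return {
        "id": sample_id or str(uuid.uuid4()),
        "image": image_path,
        "conversations": [
            # The "<image>\n" prefix mirrors the human turn in the sample above.
            {"from": "human", "value": "<image>\n" + prompt},
            {"from": "gpt", "value": answer},
        ],
    }


def validate_samples(samples):
    """Return a list of problem descriptions; an empty list means no issues found."""
    problems = []
    seen_ids = set()
    for index, sample in enumerate(samples):
        missing = REQUIRED_KEYS - sample.keys()
        if missing:
            problems.append(f"sample {index}: missing keys {sorted(missing)}")
            continue
        if sample["id"] in seen_ids:
            problems.append(f"sample {index}: duplicate id {sample['id']}")
        seen_ids.add(sample["id"])
        for turn_index, turn in enumerate(sample["conversations"]):
            if turn.get("from") not in ALLOWED_SPEAKERS:
                problems.append(
                    f"sample {index}, turn {turn_index}: unexpected speaker {turn.get('from')!r}"
                )
            if not isinstance(turn.get("value"), str):
                problems.append(f"sample {index}, turn {turn_index}: value must be a string")
    return problems


def write_dataset(samples, output_path):
    """Validate the samples, then write them as one JSON list."""
    problems = validate_samples(samples)
    if problems:
        raise ValueError("\n".join(problems))
    # json.dumps never emits trailing commas, so the output is always valid JSON.
    text = json.dumps(samples, indent=2, ensure_ascii=False)
    Path(output_path).write_text(text, encoding="utf-8")
```

For example, write_dataset([make_sample("part-000001/example.jpg", "Write a prompt for Stable Diffusion to generate this image.", "a watercolor landscape")], "train.json") would write a one-sample file; the image path and caption here are placeholders, not real data.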

Command

If you have limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA following this script.

If the amount of the task-specific data is sufficient, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.

You may need to adjust the hyperparameters to fit each specific dataset and your hardware constraints.
