Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a JSON file containing a list of all samples. Each sample should contain id (a unique identifier), image (the path to the image), and conversations (the conversation data between the human and the AI).

A sample JSON for finetuning LLaVA to generate tag-style captions for Stable Diffusion:

```json
[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      }
    ]
  },
  ...
]
```
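
If your raw data is a set of (image, caption) pairs, a small script like the following can produce a JSON file in this format. This is a minimal sketch: the pairs list, the prompt text, and the output filename custom_dataset.json are placeholders you would replace with your own data and task.

```python
import json
import uuid

# Hypothetical input: (image_path, caption) pairs you already have.
# Replace `pairs` with however you load your own caption data.
pairs = [
    ("part-000001/example.jpg", "a beautiful painting of a castle, highly detailed"),
]

records = []
for image_path, caption in pairs:
    records.append({
        "id": str(uuid.uuid4()),   # unique identifier per sample
        "image": image_path,       # path relative to the image folder passed to the training script
        "conversations": [
            {
                "from": "human",
                # The <image> token marks where the image is injected into the prompt.
                "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
            },
            {
                "from": "gpt",
                "value": caption   # the target text the model should learn to produce
            }
        ],
    })

with open("custom_dataset.json", "w") as f:
    json.dump(records, f, indent=2)
```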

Command

If you have limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA, following this script.

If you have sufficient task-specific data, you can also finetune from LLaVA checkpoints with full-model finetuning, following this script.

You may need to adjust the hyperparameters to fit each specific dataset and your hardware constraints. A quick sanity check of the dataset file before launching either script can save a failed run; see the sketch below.
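
The sketch below checks the dataset file against the format described above. It assumes the JSON was written to custom_dataset.json and that image paths are relative to an images/ folder; both names are placeholders. The checks mirror the example above (a single <image> token in the first human turn); adjust them if your conversations differ.

```python
import json
from pathlib import Path

DATA_PATH = Path("custom_dataset.json")   # placeholder: your dataset JSON
IMAGE_ROOT = Path("images")               # placeholder: the folder containing the images

samples = json.loads(DATA_PATH.read_text())

for sample in samples:
    # Every sample needs the three fields described in "Dataset Format".
    assert {"id", "image", "conversations"} <= sample.keys(), f"missing keys in {sample.get('id')}"

    # The referenced image should exist on disk.
    assert (IMAGE_ROOT / sample["image"]).is_file(), f"missing image {sample['image']}"

    convs = sample["conversations"]
    # The conversation starts with a human turn, as in the example above.
    assert convs and convs[0]["from"] == "human", f"{sample['id']}: first turn must be from 'human'"
    # The first human turn carries the <image> placeholder exactly once.
    assert convs[0]["value"].count("<image>") == 1, f"{sample['id']}: expected one <image> token"

print(f"Checked {len(samples)} samples.")
```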
