Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a JSON file containing a list of all samples. Each sample should contain id (a unique identifier), image (the path to the image), and conversations (the conversation turns between the human and the AI).

A sample JSON for finetuning LLaVA to generate tag-style captions for Stable Diffusion:

[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      }
    ]
  },
  ...
]
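As a rough sketch of how a file in this format could be assembled and sanity-checked, the Python below builds samples with the three keys described above and writes them as one JSON array. The helper names (make_sample, validate_samples, write_dataset) are hypothetical and not part of LLaVA; the checks only cover the structure shown in the sample and are assumptions about what is worth verifying before training.

```python
import json
import uuid
from pathlib import Path


def make_sample(image_path, prompt, caption, sample_id=None):
    """Build one sample dict with the id / image / conversations keys."""
    return {
        "id": sample_id or str(uuid.uuid4()),
        "image": image_path,
        "conversations": [
            # The human turn starts with the <image> placeholder, as in the sample above.
            {"from": "human", "value": f"<image>\n{prompt}"},
            {"from": "gpt", "value": caption},
        ],
    }


def validate_samples(samples):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    seen_ids = set()
    for index, sample in enumerate(samples):
        for key in ("id", "image", "conversations"):
            if key not in sample:
                problems.append(f"sample {index}: missing key '{key}'")
        sample_id = sample.get("id")
        if sample_id is not None:
            if sample_id in seen_ids:
                problems.append(f"sample {index}: duplicate id '{sample_id}'")
            seen_ids.add(sample_id)
        for turn in sample.get("conversations", []):
            if turn.get("from") not in ("human", "gpt"):
                problems.append(
                    f"sample {index}: unexpected speaker {turn.get('from')!r}"
                )
    return problems


def write_dataset(samples, out_path):
    """Write the list of samples as a single JSON array."""
    text = json.dumps(samples, indent=2, ensure_ascii=False)
    Path(out_path).write_text(text, encoding="utf-8")
```

A typical use would be to call make_sample once per image, run validate_samples on the resulting list, and call write_dataset only if no problems are reported. Writing with json.dumps also avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.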

Command

If you have limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA following this script.

If you have enough task-specific data, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.

You may need to adjust the hyperparameters to fit your specific dataset and hardware constraints.
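One common hardware-driven adjustment, which is not spelled out above and is offered here only as an illustration, is keeping the effective (global) batch size fixed when the number of GPUs or the per-device batch size changes. The sketch below assumes the usual relation global batch size = per-device batch size × number of GPUs × gradient accumulation steps; the function name and the numbers in the usage note are hypothetical and not taken from the LLaVA scripts.

```python
def grad_accum_steps(global_batch_size, per_device_batch_size, num_gpus):
    """Gradient accumulation steps needed to reach a target global batch size."""
    samples_per_step = per_device_batch_size * num_gpus
    if global_batch_size % samples_per_step != 0:
        raise ValueError(
            "global batch size must be a multiple of "
            "per_device_batch_size * num_gpus"
        )
    return global_batch_size // samples_per_step
```

For example, with an illustrative target of 128 samples per optimizer step and 16 samples per device, moving from 8 GPUs to 4 GPUs would mean raising the accumulation steps from 1 to 2. Other hyperparameters, such as the learning rate or number of epochs, depend on the dataset and are not covered by this sketch.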
