Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a JSON file of a List of all samples. Sample metadata should contain id (a unique identifier), image (the path to the image), and conversations (the conversation data between human and AI).

A sample JSON for finetuning LLaVA for generating tag-style captions for Stable Diffusion:

json 复制代码
[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      },
    ]
  },
  ...
]

Command

If you have a limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA following this script.

If the amount of the task-specific data is sufficient, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.

You may need to adjust the hyperparameters to fit each specific dataset and your hardware constraint.

相关推荐
ASS-ASH6 天前
视觉语言大模型Qwen3-VL-8B-Instruct概述
人工智能·python·llm·多模态·qwen·视觉语言模型·vlm
程序员miki18 天前
多模态模型演变
人工智能·python·llm·多模态·vlm
m0_650108241 个月前
Flamingo:打破模态壁垒的少样本视觉语言模型
论文阅读·人工智能·视觉语言模型·deepmind·vlm·通用智能·通用小样本适配
温柔哥`1 个月前
一种面向整体零样本视频异常分析的统一推理框架
vad·视频异常检测·vlm·异常定位·异常理解·异常推理·推理门控
一颗小树x1 个月前
『大模型量化』Qwen3-VL + Lora监督微调 + 8bit量化 + 实践推理
量化·vlm·qwen3-vl·lora监督微调
oliveray2 个月前
ATPrompt:基于属性的视觉提示
人工智能·prompt·vlm
nenchoumi31192 个月前
LLM 论文精读(十二)DeepSeek-OCR: Contexts Optical Compression
人工智能·计算机视觉·llm·ocr·vlm·deepseek
贾全4 个月前
准备篇:搭建你的AI“炼丹炉“
人工智能·ai·vlm·多模态ai·vlm环境配置
Uzuki4 个月前
LLM 指标 | PPL vs. BLEU vs. ROUGE-L vs. METEOR vs. CIDEr
深度学习·机器学习·llm·vlm
Struart_R4 个月前
SpatialVLM和SpatialRGPT论文解读
计算机视觉·语言模型·transformer·大语言模型·vlm·视觉理解·空间推理