Finetune LLaVA on Custom Datasets

Dataset Format

Convert your data to a JSON file of a List of all samples. Sample metadata should contain id (a unique identifier), image (the path to the image), and conversations (the conversation data between human and AI).

A sample JSON for finetuning LLaVA for generating tag-style captions for Stable Diffusion:

json 复制代码
[
  {
    "id": "997bb945-628d-4724-b370-b84de974a19f",
    "image": "part-000001/997bb945-628d-4724-b370-b84de974a19f.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nWrite a prompt for Stable Diffusion to generate this image."
      },
      {
        "from": "gpt",
        "value": "a beautiful painting of chernobyl by nekro, pascal blanche, john harris, greg rutkowski, sin jong hun, moebius, simon stalenhag. in style of cg art. ray tracing. cel shading. hyper detailed. realistic. ue 5. maya. octane render. "
      },
    ]
  },
  ...
]

Command

If you have a limited task-specific data, we recommend finetuning from LLaVA checkpoints with LoRA following this script.

If the amount of the task-specific data is sufficient, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.

You may need to adjust the hyperparameters to fit each specific dataset and your hardware constraint.

相关推荐
m0_6501082414 天前
Flamingo:打破模态壁垒的少样本视觉语言模型
论文阅读·人工智能·视觉语言模型·deepmind·vlm·通用智能·通用小样本适配
温柔哥`17 天前
一种面向整体零样本视频异常分析的统一推理框架
vad·视频异常检测·vlm·异常定位·异常理解·异常推理·推理门控
一颗小树x20 天前
『大模型量化』Qwen3-VL + Lora监督微调 + 8bit量化 + 实践推理
量化·vlm·qwen3-vl·lora监督微调
oliveray1 个月前
ATPrompt:基于属性的视觉提示
人工智能·prompt·vlm
nenchoumi31191 个月前
LLM 论文精读(十二)DeepSeek-OCR: Contexts Optical Compression
人工智能·计算机视觉·llm·ocr·vlm·deepseek
贾全3 个月前
准备篇:搭建你的AI“炼丹炉“
人工智能·ai·vlm·多模态ai·vlm环境配置
Uzuki3 个月前
LLM 指标 | PPL vs. BLEU vs. ROUGE-L vs. METEOR vs. CIDEr
深度学习·机器学习·llm·vlm
Struart_R4 个月前
SpatialVLM和SpatialRGPT论文解读
计算机视觉·语言模型·transformer·大语言模型·vlm·视觉理解·空间推理
贾全5 个月前
零基础完全理解视觉语言模型(VLM):从理论到代码实践
人工智能·ai·语言模型·自然语言处理·vlm
贾全5 个月前
从LLM到VLM:视觉语言模型的核心技术与Python实现
人工智能·python·ai·机器人·视觉语言模型·vlm