Reinforcement Fine-Tuning

Catching Star
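11 days ago
Paper reading · Reinforcement fine-tuning
[Paper Notes] [Reinforcement Fine-Tuning] Vision-R1: The first reinforcement fine-tuning method designed for multimodal LLMs, with a 7B model rivaling 70B
[2503.06749] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models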
Catching Star
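1 month ago
Paper reading · Reinforcement fine-tuning
[Paper Notes] [Reinforcement Fine-Tuning] TinyLLaVA-Video-R1: Small-parameter models can do video reasoning too
[2504.09641] TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning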
温柔哥`
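2 months ago
vad · var · video anomaly detection · grpo · video anomaly reasoning · reasoning dataset · reinforcement fine-tuning
Vad-R1: Video anomaly reasoning via a perception-to-cognition chain of thought
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought. ¹Sun Yat-sen University, Shenzhen Campus; ²Harbin Institute of Technology (Shenzhen); ³The Hong Kong Polytechnic University. arXiv, 2025-05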