Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles

Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles

Authors: Eric Slyman, Mehrab Tanjim, Kushal Kafle, Stefan Lee

Deep-Dive Summary:

Original Abstract: Multimodal large language models (MLLMs) are increasingly used to evaluate

text-to-image (TTI) generation systems, providing automated judgments based on

visual and textual context. However, these "judge" models often suffer from

biases, overconfidence, and inconsistent performance across diverse image

domains. While prompt ensembling has shown promise for mitigating these issues

in unimodal, text-only settings, our experiments reveal that standard

ensembling methods fail to generalize effectively for TTI tasks. To address

these limitations, we propose a new multimodal-aware method called Multimodal

Mixture-of-Bayesian Prompt Ensembles (MMB). Our method uses a Bayesian prompt

ensemble approach augmented by image clustering, allowing the judge to

dynamically assign prompt weights based on the visual characteristics of each

sample. We show that MMB improves accuracy in pairwise preference judgments and

greatly enhances calibration, making it easier to gauge the judge's true

uncertainty. In evaluations on two TTI benchmarks, HPSv2 and MJBench, MMB

outperforms existing baselines in alignment with human annotations and

calibration across varied image content. Our findings highlight the importance

of multimodal-specific strategies for judge calibration and suggest a promising

path forward for reliable large-scale TTI evaluation.

PDF Link: 2509.08777v1

部分平台可能图片显示异常,请以我的博客内容为准

相关推荐
可信AI Coding41 分钟前
AI产业周报|AI编程工具的代际跃迁:可信智能开发进入自主时代
ai·大模型·编程
彦为君1 小时前
JavaSE-11-BIO/NIO/AIO(多人聊天室)
java·开发语言·python·ai·nio
ashen♂2 小时前
OpenClaw Windows本机安装
ai·openclaw
彦为君2 小时前
JavaSE-10-并发编程(11个案例)
java·开发语言·python·ai·nio
这是谁的博客?3 小时前
AI 领域精选新闻(2026-05-21)
人工智能·gpt·ai·google·大模型·gemini·新闻
Elastic 中国社区官方博客4 小时前
在 Elasticsearch 中,存储向量查询速度最高提升 3 倍
大数据·人工智能·elasticsearch·搜索引擎·ai·全文检索
searchforAI5 小时前
视频画面里的PPT怎么提取?视频转图文讲义的实操教程
人工智能·学习·ai·aigc·powerpoint·音视频·贴图
柱子jason5 小时前
基于IOT-Tree Server,配合IOT-Tree AI Edge实现手势识别
物联网·ai·手势识别·iot-tree
kkkliaoo5 小时前
AI编程Token费用大公开:四种真实场景的年成本对比
ai·开发工具·程序开发