Skimming arXiv Papers on Multimodal Large Language Models (Part 45)

➡️ Paper title: CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
➡️ Authors: Qilang Ye, Zitong Yu, Rui Shao, Xinyu Xie, Philip Torr, Xiaochun Cao
➡️ Institutions: Great Bay University; Harbin Institute of Technology, Shenzhe