Computer Vision · TagCLIP

TagCLIP

Abstract: Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks. However, existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes, leading to confusion between novel classes and semantically similar ones. In this work, we propose a novel approach, TagCLIP (Trusty-aware guided CLIP), to address this issue. We disentangle the ill-posed optimization problem into two parallel processes: semantic matching performed individually and reliability judgment for improving discrimination ability. Building on the idea of special tokens in language modeling representing sentence-level embeddings, we introduce a trusty token that enables distinguishing novel classes from known ones in prediction. To evaluate our approach, we conduct experiments on two benchmark datasets, PASCAL VOC 2012 and COCO-Stuff 164K. Our results show that TagCLIP improves the Intersection over Union (IoU) of unseen classes by 7.4% and 1.7%, respectively, with negligible overheads. The code is available here.

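Motivation

Previous work tends to misclassify unseen classes as semantically similar classes (presumably meaning seen classes).

  • Introduce an additional token $t_C$, the trusty token.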
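The extra token works much like a [CLS] token: a learnable vector is appended to the token sequence and updated by self-attention together with the other tokens. The following is a minimal NumPy sketch, not TagCLIP's implementation. The random `patch_tokens` stand in for CLIP patch embeddings, `t_c` is a hypothetical learnable trusty token, and the single-head attention with random projection weights `w_q`, `w_k`, `w_v` is an assumption (no layer norm, MLP, or training loop). These notes also do not say exactly where TagCLIP inserts $t_C$; appending it to the patch tokens here is for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over an (N, D) token matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N, N) attention logits
    return softmax(scores, axis=-1) @ v          # each row mixes all value vectors

rng = np.random.default_rng(0)
num_patches, dim = 16, 8
patch_tokens = rng.standard_normal((num_patches, dim))  # stand-in for CLIP patch embeddings
t_c = rng.standard_normal((1, dim))                       # hypothetical learnable trusty token t_C

# Append t_C so it can attend to (and be attended by) every patch token.
sequence = np.concatenate([patch_tokens, t_c], axis=0)  # shape (N + 1, D)
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out = self_attention(sequence, w_q, w_k, w_v)

# Split back into updated patch features and the updated trusty token.
patch_out, trusty_out = out[:-1], out[-1]
```

Because the softmax normalizes over every token, the trusty token's output is a weighted mix of the value vectors of the whole sequence. This is what lets it act as a sequence-level summary, analogous to the sentence-level special tokens mentioned in the abstract.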

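Trusty token learner: essentially just a self-attention mechanism.

  • It produces two outputs, $M_A$ and $M_R$; $M_R$ is used to reduce the probabilities assigned with respect to unseen classes.

  • Seen classes are labeled 1 and unseen classes 0.

  • Loss function: simply a Dice loss.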
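The supervision described above (label 1 for seen, 0 for unseen, trained with a Dice loss) can be sketched as follows. Assumptions: `reliability_map` is a hypothetical way to obtain $M_R$ (a sigmoid of the dot product between each patch feature and the trusty token); these notes do not say how TagCLIP actually computes it. `dice_loss` is the standard soft Dice loss, and the smoothing term `eps=1.0` is our own choice.

```python
import numpy as np

def reliability_map(patch_feats, trusty_feat):
    """Hypothetical M_R: sigmoid of each patch's similarity to the trusty token."""
    logits = patch_feats @ trusty_feat          # (N, D) @ (D,) -> (N,)
    return 1.0 / (1.0 + np.exp(-logits))        # per-patch values in (0, 1)

def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss: 1 - (2 * |P ∩ G| + eps) / (|P| + |G| + eps)."""
    pred, target = pred.ravel(), target.ravel()
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: 4 patches; 1 = patch belongs to a seen class, 0 = unseen class.
rng = np.random.default_rng(0)
patch_feats = rng.standard_normal((4, 8))
trusty_feat = rng.standard_normal(8)           # hypothetical trusty-token output
seen_mask = np.array([1.0, 1.0, 0.0, 0.0])

m_r = reliability_map(patch_feats, trusty_feat)
loss = dice_loss(m_r, seen_mask)               # in [0, 1]; lower = M_R closer to the seen/unseen mask
```

A prediction that matches the mask exactly gives a loss of 0; for example, `dice_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0]))` returns `0.0`.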

Inference

  • Reduce the predicted probabilities of seen classes.
  • Adjust the probabilities accordingly.
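One plausible reading of these two inference steps is to scale the seen-class probabilities by the reliability map and then renormalize, so that pixels which $M_R$ does not trust as "seen" shift toward unseen classes. This is a hedged sketch only: the multiplicative rule, the exponent `alpha`, and the function name `adjust_probs` are assumptions, not the formula from the paper.

```python
import numpy as np

def adjust_probs(probs, m_r, seen_idx, alpha=1.0):
    """
    probs:    (N, C) per-pixel class probabilities from semantic matching (rows sum to 1).
    m_r:      (N,) reliability map; high values = pixel likely belongs to a seen class.
    seen_idx: list of seen-class column indices.
    alpha:    hypothetical knob controlling how strongly seen classes are suppressed.
    """
    adjusted = probs.copy()  # leave the caller's array untouched
    # Step 1: reduce seen-class probabilities where the reliability map is low.
    adjusted[:, seen_idx] *= m_r[:, None] ** alpha
    # Step 2: renormalize so each pixel's probabilities sum to 1 again.
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Toy example: class 0 is seen, class 1 is unseen; two pixels with identical raw scores.
probs = np.array([[0.6, 0.4],
                  [0.6, 0.4]])
m_r = np.array([1.0, 0.25])    # pixel 0 trusted as seen, pixel 1 not
out = adjust_probs(probs, m_r, seen_idx=[0])
pred = out.argmax(axis=1)      # pixel 0 stays seen (0); pixel 1 flips to unseen (1)
```

Without the reliability map, both pixels would be assigned to the seen class, which is the seen/unseen confusion described in the motivation.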

Ablation Studies

  • The authors' ablation studies are fairly extensive and worth studying.