Master index: LLM Security Research Paper Collection (2026 Edition): https://blog.csdn.net/WhiffeYF/article/details/159047894
This post is compiled from the DBLP WWW 2026 proceedings, selecting papers related to security, privacy, adversarial attacks, and defenses for large language models (LLMs), reasoning models, agents, multimodal LLMs, and related systems.
A total of 72 papers are included.
LLM Jailbreaks and Adversarial Attacks
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs | Explores whether LLMs can mount universal adversarial attacks on text-attributed graphs, revealing the risk of LLMs deceiving graph learning. | Track 2: Graph Algorithms and Modeling for the Web |
| 2 | LLMQuA: Practical Backdoor Injection on Large Language Model Quantization | A practical backdoor injection method targeting the LLM quantization process, showing that malicious behavior can be planted during model compression. | Track 5: Security and Privacy |
| 3 | Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs | Reveals multi-dimensional vulnerabilities of Graph-LLMs on text-attributed graphs and proposes an interpretable multi-dimensional adversarial attack framework. | Track 5: Security and Privacy |
| 4 | Breaking Cross-modal Alignment in Embodied Intelligence: A Multimodal Adversarial Attack Framework for Vision-Language-Action Models | A multimodal adversarial attack framework that breaks cross-modal alignment in vision-language-action models for embodied intelligence. | Track 5: Security and Privacy |
| 5 | ICL-Evader: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses | Studies zero-query black-box evasion attacks against in-context learning, together with corresponding defenses. | Track 5: Security and Privacy |
| 6 | Inference Cost Attacks for Retrieval-Augmented Large Language Models | Inference cost attacks on retrieval-augmented LLMs: crafted malicious inputs substantially inflate inference overhead. | Track 10: Web Mining and Content Analysis |
| 7 | The Asymmetric Vulnerability: Bypassing LLM Defenses via Guardrail-Model Mismatch | Reveals the asymmetric vulnerability between safety guardrails and the underlying model, exploiting their mismatch to bypass LLM defenses. | Track 5: Security and Privacy |
| 8 | Exploring and Exploiting Security Vulnerabilities in Self-Hosted LLM Services | Systematically explores and exploits security vulnerabilities in self-hosted LLM services, assessing the risks of locally deployed LLMs. | Track 5: Security and Privacy |
| 9 | KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation | A knowledge-evolution poisoning attack on graph-based retrieval-augmented generation that corrupts the knowledge graph to manipulate model outputs. | Track 4: Search and Retrieval-Augmented AI |
| 10 | Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models | Shows that a single bit flip can trigger abnormal behavior in LLMs, validating an early prophecy that hardware faults can break AI. | Track 5: Security and Privacy |
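For intuition on the single-bit-flip attack surface in paper 10 above, here is a minimal sketch that is not taken from the paper: flipping one exponent bit of an IEEE-754 float32 weight can change its magnitude by dozens of orders of magnitude, which is why a single hardware fault in a weight matrix can be catastrophic. The `flip_bit` helper is purely illustrative.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value (bit 0 = mantissa LSB, bit 31 = sign)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.5
# Flipping the top exponent bit (bit 30) turns a small weight into ~1.7e38.
print(flip_bit(w, 30))  # 2**127
# Flipping the sign bit (bit 31) negates the weight.
print(flip_bit(w, 31))  # -0.5
```

A mantissa-bit flip, by contrast, perturbs the value only slightly; the asymmetry is what makes exponent bits the natural target.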
Jailbreak Attacks on Large Reasoning Models
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models | The first membership inference attack on black-box large reasoning models, showing that the reasoning process leaks membership information about training data. | Track 5: Security and Privacy |
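As background for the membership inference attack above, here is a generic sketch of the classic loss-threshold MIA, not the paper's black-box method: a sequence the model predicts with unusually low loss is flagged as likely training data. The threshold and log-probabilities below are illustrative values, not measurements.

```python
def sequence_loss(token_logprobs):
    """Average negative log-likelihood of a token sequence under the model."""
    return -sum(token_logprobs) / len(token_logprobs)

def infer_membership(token_logprobs, threshold=2.0):
    """Classic loss-threshold MIA: unusually low loss suggests the sequence
    was seen during training. The threshold here is illustrative."""
    return sequence_loss(token_logprobs) < threshold

# A sequence the model predicts confidently (low loss): flagged as member-like.
member_like = [-0.1, -0.2, -0.05, -0.15]
# A sequence the model finds surprising (high loss): flagged as non-member.
nonmember_like = [-3.1, -2.8, -3.5, -2.9]
print(infer_membership(member_like))     # True
print(infer_membership(nonmember_like))  # False
```

In practice the threshold is calibrated on reference data, and stronger attacks compare against shadow models rather than using a fixed cutoff.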
LLM Security Defenses
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | PADD: Prefix-based Attention Divergence Detector for LLM Jailbreaks | A prefix-based attention divergence detector for identifying and defending against LLM jailbreak prompts. | Track 5: Security and Privacy |
| 2 | FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks | A knowledge-graph-empowered defense framework that strengthens LLMs against fraud attacks. | Track 5: Security and Privacy |
| 3 | Towards Robust Detection of Chinese Toxic Variants via Dynamic Knowledge Graph-LLM Reasoning | Combines a dynamic knowledge graph with LLM reasoning for robust detection of Chinese toxic text variants. | Track 5: Security and Privacy |
| 4 | Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning | Exposes LLM sycophancy and proposes opposing-stance reasoning to combat clickbait and misinformation. | Track 5: Security and Privacy |
| 5 | BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text | BIND, a bidirectionally aligned next-token denoising framework for fast, lightweight deobfuscation of harmful web text. | Track 3: Responsible Web |
| 6 | Be Responsible in Your Answers! Monitoring Out-of-Domain Behaviors in Domain-Specific LLMs | Methods for monitoring out-of-domain behaviors in domain-specific LLMs, ensuring reliable answers within the intended domain. | Track 3: Responsible Web |
| 7 | D-Models and E-Models: Diversity-Stability Trade-offs in the Sampling Behavior of Large Language Models | Analyzes the diversity-stability trade-off in LLM sampling behavior and proposes a D-Model/E-Model taxonomy. | Track 3: Responsible Web |
| 8 | Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks | Robust fake news detection with LLMs under adversarial sentiment attacks. | Track 3: Responsible Web |
| 9 | Read as You See: Guiding Unimodal LLMs for Low-Resource Explainable Harmful Meme Detection | Guides unimodal LLMs toward explainable harmful meme detection in low-resource settings. | Track 3: Responsible Web |
| 10 | Med-R2: Crafting Trustworthy LLM Physicians via Retrieval and Reasoning of Evidence-Based Medicine | Med-R2, a trustworthy LLM physician system built via retrieval and reasoning over evidence-based medicine. | Track 4: Search and Retrieval-Augmented AI |
| 11 | Rethinking the Hidden Risk of Reranking: Achieving Risk-aware Reranking with Information Gain for RAG with LLMs | Rethinks the hidden risk of reranking in RAG systems and proposes risk-aware reranking based on information gain. | Track 4: Search and Retrieval-Augmented AI |
| 12 | A Fact-Checking Framework with Denoising Evidence Retrieval and LLM-Based Debate Verification | A fact-checking framework combining denoising evidence retrieval with LLM-based debate verification. | Track 4: Search and Retrieval-Augmented AI |
| 13 | Conflict-Aware RAG: Multi-Stage Learning with Conflict Signals for Robust Retrieval-Augmented Generation | A conflict-aware RAG framework that uses multi-stage learning with conflict signals to make retrieval-augmented generation more robust. | Track 4: Search and Retrieval-Augmented AI |
| 14 | IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation | IRAG, a robust multimodal retrieval-augmented generation framework built on a hazard separation mechanism. | Track 4: Search and Retrieval-Augmented AI |
| 15 | PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading | PaperAsk, a benchmark for systematically evaluating LLM reliability in academic paper search and reading. | Track 4: Search and Retrieval-Augmented AI |
| 16 | SeaRAG: Reducing Hallucination in Retrieval-Augmented Generation via Statement-Entity Adaptive Ranking | Mitigates hallucination in retrieval-augmented generation via statement-entity adaptive ranking. | Track 4: Search and Retrieval-Augmented AI |
| 17 | A Graph Foundation Model for Unified Anomaly Detection | A graph foundation model for unified anomaly detection across diverse scenarios. | Track 2: Graph Algorithms and Modeling for the Web |
| 18 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | The first systematic evaluation of multimodal LLMs on time series anomaly detection. | Track 8: Systems and Infrastructure for Web, Mobile, and Web of Things |
| 19 | Smart Eye: LLM-Guided Proposer-Verifier Framework for Industrial-Scale Log Anomaly Detection | Smart Eye, an LLM-guided proposer-verifier framework for industrial-scale log anomaly detection. | Industry Track |
| 20 | Cascaded Verification Framework: A Progressive Approach for Mitigating Hallucinations in Large Language Models | A cascaded verification framework that mitigates LLM hallucinations through progressive verification. | Short Papers |
| 21 | PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation Detection | PAMAS, a self-adaptive multi-agent system that improves misinformation detection via perspective aggregation. | Track 10: Web Mining and Content Analysis |
| 22 | Triple-R: Iterative Query Rewriting and Refinement for Retrieval-Augmented Fake News Detection | Triple-R, which strengthens retrieval-augmented fake news detection through iterative query rewriting and refinement. | Track 10: Web Mining and Content Analysis |
| 23 | Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection | Discovers prompt-induced linguistic fingerprints and uses them to detect LLM-generated fake news. | Track 10: Web Mining and Content Analysis |
| 24 | How Human Experts Educate Specialized LLMs: Filling Knowledge Gaps in KG-Augmented Generation through Hallucination Detection | Explores how human experts educate specialized LLMs, filling knowledge gaps in KG-augmented generation through hallucination detection. | Track 6: Semantics and Knowledge |
| 25 | Knowledge-Enhanced Multimodal Fake News Detection: Semantic Visual and Priority Fusion | Knowledge-enhanced multimodal fake news detection that fuses semantic visual information with priority features. | Track 6: Semantics and Knowledge |
| 26 | CogAgent: Self-Evolving Cognitive Agents for Multi-Source Fraud Detection in Heterogeneous Financial Networks | CogAgent, self-evolving cognitive agents for multi-source fraud detection in heterogeneous financial networks. | Track 5: Security and Privacy |
| 27 | TGNN: Enhancing Pixel Tracking Detection via LLM-driven Annotation and GAT-powered Structural Representation | Combines LLM-driven annotation with GAT-based structural representation to improve pixel tracking detection. | Track 5: Security and Privacy |
| 28 | Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages | A knowledge-driven framework that bridges expert reasoning and LLM detection to identify malicious software packages. | Track 6: Semantics and Knowledge |
| 29 | Does LLM Focus on the Right Words? Mitigating Context Bias in LLM-based Recommenders | Analyzes whether LLM-based recommenders attend to the right words and proposes methods to mitigate context bias. | Track 9: User Modeling, Personalization and Recommendation |
| 30 | Unbiased Multimodal Reranking for Long-Tail Short-Video Search | Unbiased multimodal reranking that improves fairness and effectiveness in long-tail short-video search. | Industry Track |
| 31 | Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings | Uncovers skin-tone biases in LLMs and emoji embeddings, quantifying the digital bias behind digital skin. | Track 3: Responsible Web |
| 32 | Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation | Evaluates the resilience of LLM-enhanced search engines against black-hat SEO manipulation. | Track 5: Security and Privacy |
Security Defenses for Large Reasoning Models
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Expectation-Guided Self-Verification for Aligning Large Reasoning Models with Domain Knowledge | Expectation-guided self-verification for aligning large reasoning models with domain knowledge. | Track 6: Semantics and Knowledge |
| 2 | Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning | A multi-agent approach with chain-of-thought reasoning that resists manipulative bots in meme coin copy trading. | Track 5: Security and Privacy |
LLM Privacy Protection
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation | Reliable black-box LLM fingerprinting via zeroth-order gradient estimation, enabling robust model identification. | Track 5: Security and Privacy |
| 2 | Reconstructing Training Data from Adapter-based Federated Large Language Models | Studies the privacy risk of reconstructing training data from adapter-based federated LLMs. | Track 5: Security and Privacy |
| 3 | When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models | The first membership inference attack on black-box large reasoning models, showing that the reasoning process leaks membership information about training data. | Track 5: Security and Privacy |
| 4 | Decoding Web Memorization: A Semantic Membership Inference Attack on LLMs | A semantic membership inference attack that reveals how LLMs memorize and leak web content. | Track 6: Semantics and Knowledge |
| 5 | Towards Practical LLM Unlearning: Efficient, Modular, and Retain-Free | A practical LLM unlearning method that is efficient, modular, and requires no retain set. | Track 5: Security and Privacy |
| 6 | DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing | DSSmoothing, a dual-space smoothing method providing certified dataset ownership verification for pre-trained language models. | Track 5: Security and Privacy |
| 7 | AWMA-MoE: Attention-Guided Watermark Adapter with MoE for Latent Diffusion Models | An attention-guided watermark adapter with mixture-of-experts for copyright protection of latent diffusion models. | Short Papers |
| 8 | Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection | An LLM-guided, coherence-preserving semantic injection attack that breaks semantic-aware watermarks. | Short Papers |
| 9 | The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT | Deconstructs the memory mechanism of ChatGPT, analyzing LLM memory through the lens of the algorithmic self-portrait. | Track 5: Security and Privacy |
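Paper 1 above relies on zeroth-order gradient estimation, which approximates gradients from black-box function evaluations alone. Here is a minimal sketch of the standard two-point finite-difference estimator, not tied to the paper's fingerprinting procedure; the example function is purely illustrative.

```python
def zeroth_order_grad(f, x, eps=1e-4):
    """Two-point finite-difference estimate of each partial derivative of f,
    requiring only black-box function evaluations (no backprop access)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x); x_plus[i] += eps
        x_minus = list(x); x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

# Example: f(x) = x0^2 + 3*x1 has true gradient (2, 3) at the point (1, 2).
f = lambda x: x[0] ** 2 + 3 * x[1]
print(zeroth_order_grad(f, [1.0, 2.0]))  # approximately [2.0, 3.0]
```

This naive version costs two queries per coordinate; practical black-box attacks reduce query cost with random-direction estimators.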
Security of Agents, Multimodal LLMs, and Related Systems
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Combating Knowledge Corruption in Agent Systems: A Byzantine-Tolerant Secure Collaborative RAG Framework | A Byzantine-tolerant secure collaborative RAG framework that resists knowledge corruption attacks in agent systems. | Track 5: Security and Privacy |
| 2 | ARuleCon: Agentic Security Rule Conversion | ARuleCon, a framework for automatically converting security rules into agent-executable rules. | Track 5: Security and Privacy |
| 3 | SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection | SentinelNet, a credit-based dynamic threat detection network that safeguards multi-agent collaboration. | Track 5: Security and Privacy |
| 4 | Beyond Detection: Autonomous Anomaly Remediation for MCP Against Tool Poisoning Attacks | An autonomous anomaly remediation mechanism for MCP that goes beyond detection to defend against tool poisoning attacks. | Track 5: Security and Privacy |
| 5 | MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM | MGFFD-VLM, multi-granularity prompt learning for face forgery detection with vision-language models. | Track 5: Security and Privacy |
| 6 | Breaking Cross-modal Alignment in Embodied Intelligence: A Multimodal Adversarial Attack Framework for Vision-Language-Action Models | A multimodal adversarial attack framework that breaks cross-modal alignment in vision-language-action models for embodied intelligence. | Track 5: Security and Privacy |
| 7 | Navigating Truth in Multimodal Fact-checking via Retrieval- and Reasoning-Enhanced Large Language Models | Uses retrieval- and reasoning-enhanced LLMs to navigate truth in multimodal fact-checking. | Track 3: Responsible Web |
| 8 | IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation | IRAG, a robust multimodal retrieval-augmented generation framework built on a hazard separation mechanism. | Track 4: Search and Retrieval-Augmented AI |
| 9 | Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations | Models emergent coordinated behaviors in networked LLM agents, analyzing the strategic dynamics of information operations. | Track 7: Social Networks and Social Media |
| 10 | CogAgent: Self-Evolving Cognitive Agents for Multi-Source Fraud Detection in Heterogeneous Financial Networks | CogAgent, self-evolving cognitive agents for multi-source fraud detection in heterogeneous financial networks. | Track 5: Security and Privacy |
| 11 | PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation Detection | PAMAS, a self-adaptive multi-agent system that improves misinformation detection via perspective aggregation. | Track 10: Web Mining and Content Analysis |
| 12 | Mitigating Cognitive Vulnerabilities in Code Generation via Multi-Agent Adversarial Debate | Mitigates cognitive vulnerabilities in code generation via multi-agent adversarial debate. | Track 10: Web Mining and Content Analysis |
| 13 | What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, & Emerging Implications of Agentic E-Commerce | Systematically evaluates AI agents' purchasing behavior in e-commerce, revealing biases, model dependence, and emerging implications. | Short Papers |
| 14 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | The first systematic evaluation of multimodal LLMs on time series anomaly detection. | Track 8: Systems and Infrastructure for Web, Mobile, and Web of Things |
| 15 | AWMA-MoE: Attention-Guided Watermark Adapter with MoE for Latent Diffusion Models | An attention-guided watermark adapter with mixture-of-experts for copyright protection of latent diffusion models. | Short Papers |
| 16 | Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs | Reveals multi-dimensional vulnerabilities of Graph-LLMs on text-attributed graphs and proposes an interpretable multi-dimensional adversarial attack framework. | Track 5: Security and Privacy |
| 17 | Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs | Explores whether LLMs can mount universal adversarial attacks on text-attributed graphs, revealing the risk of LLMs deceiving graph learning. | Track 2: Graph Algorithms and Modeling for the Web |
| 18 | Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating | An LLM-centric affective visual customization method enabling efficient and precise emotion manipulation. | Track 3: Responsible Web |
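As background for the Byzantine-tolerant collaboration in paper 1 of this section, here is a generic sketch of classic majority voting under the standard n ≥ 2f + 1 assumption, not the paper's protocol: if at most f agents are compromised and all honest agents agree, the honest answer is held by at least f + 1 agents, so any answer with fewer votes can be rejected. The function and example answers are hypothetical.

```python
from collections import Counter

def byzantine_tolerant_answer(agent_answers, f):
    """Accept an answer only if at least f+1 agents agree on it.
    With n >= 2f+1 agents, at most f of them faulty, and honest agents
    in agreement, the accepted answer is guaranteed to be the honest one."""
    n = len(agent_answers)
    assert n >= 2 * f + 1, "need at least 2f+1 agents to tolerate f faults"
    answer, count = Counter(agent_answers).most_common(1)[0]
    return answer if count >= f + 1 else None

# Three agents, tolerating one compromised agent (f=1):
print(byzantine_tolerant_answer(["Paris", "Paris", "Rome"], 1))  # Paris
print(byzantine_tolerant_answer(["A", "B", "C"], 1))             # None (no quorum)
```

Real agent systems vote on retrieved evidence or structured outputs rather than raw strings, but the quorum arithmetic is the same.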