Master index: LLM Security Research Paper Collection (2026 Edition): https://blog.csdn.net/WhiffeYF/article/details/159047894
This post is compiled from the DBLP WWW 2026 proceedings, selecting papers related to security, privacy, adversarial attacks, and defenses for large language models (LLMs), reasoning models, agents, multimodal LLMs, and related systems.
A total of 72 papers are included.
LLM Jailbreaks and Adversarial Attacks
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs | Explores whether LLMs can mount universal adversarial attacks on text-attributed graphs, revealing the risk of LLMs deceiving graph learning. | Track 2: Graph Algorithms and Modeling for the Web |
| 2 | LLMQuA: Practical Backdoor Injection on Large Language Model Quantization | A practical backdoor injection method targeting the LLM quantization process, showing that malicious behavior can be planted during model compression. | Track 5: Security and Privacy |
| 3 | Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs | Reveals multi-dimensional vulnerabilities of Graph-LLMs on text-attributed graphs and proposes an interpretable multi-dimensional adversarial attack framework. | Track 5: Security and Privacy |
| 4 | Breaking Cross-modal Alignment in Embodied Intelligence: A Multimodal Adversarial Attack Framework for Vision-Language-Action Models | A multimodal adversarial attack framework that breaks cross-modal alignment in vision-language-action models for embodied intelligence. | Track 5: Security and Privacy |
| 5 | ICL-Evader: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses | Studies zero-query black-box evasion attacks against in-context learning, together with corresponding defenses. | Track 5: Security and Privacy |
| 6 | Inference Cost Attacks for Retrieval-Augmented Large Language Models | Inference cost attacks on retrieval-augmented LLMs: crafted malicious inputs substantially inflate inference overhead. | Track 10: Web Mining and Content Analysis |
| 7 | The Asymmetric Vulnerability: Bypassing LLM Defenses via Guardrail-Model Mismatch | Reveals the asymmetric vulnerability between safety guardrails and the underlying model, exploiting their mismatch to bypass LLM defenses. | Track 5: Security and Privacy |
| 8 | Exploring and Exploiting Security Vulnerabilities in Self-Hosted LLM Services | Systematically explores and exploits security vulnerabilities in self-hosted LLM services, assessing the risks of locally deployed LLMs. | Track 5: Security and Privacy |
| 9 | KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation | A knowledge-evolution poisoning attack on graph-based retrieval-augmented generation that corrupts the knowledge graph to manipulate model outputs. | Track 4: Search and Retrieval-Augmented AI |
| 10 | Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models | Shows that a single bit flip can trigger abnormal behavior in LLMs, validating an early prophecy that hardware faults can break AI. | Track 5: Security and Privacy |
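For intuition on the single-bit-flip attack surface in paper 10 above, here is a minimal sketch that is not taken from the paper: flipping one exponent bit of an IEEE-754 float32 weight can change its magnitude by dozens of orders of magnitude, which is why a single hardware fault in a weight matrix can be catastrophic. The `flip_bit` helper is purely illustrative.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value (bit 0 = mantissa LSB, bit 31 = sign)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

w = 0.5
# Flipping the top exponent bit (bit 30) turns a small weight into ~1.7e38.
print(flip_bit(w, 30))  # 2**127
# Flipping the sign bit (bit 31) negates the weight.
print(flip_bit(w, 31))  # -0.5
```

A mantissa-bit flip, by contrast, perturbs the value only slightly; the asymmetry is what makes exponent bits the natural target.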
Jailbreak Attacks on Large Reasoning Models
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models | The first membership inference attack on black-box large reasoning models, showing that the reasoning process leaks membership information about training data. | Track 5: Security and Privacy |
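As background for the membership inference attack above, here is a generic sketch of the classic loss-threshold MIA, not the paper's black-box method: a sequence the model predicts with unusually low loss is flagged as likely training data. The threshold and log-probabilities below are illustrative values, not measurements.

```python
def sequence_loss(token_logprobs):
    """Average negative log-likelihood of a token sequence under the model."""
    return -sum(token_logprobs) / len(token_logprobs)

def infer_membership(token_logprobs, threshold=2.0):
    """Classic loss-threshold MIA: unusually low loss suggests the sequence
    was seen during training. The threshold here is illustrative."""
    return sequence_loss(token_logprobs) < threshold

# A sequence the model predicts confidently (low loss): flagged as member-like.
member_like = [-0.1, -0.2, -0.05, -0.15]
# A sequence the model finds surprising (high loss): flagged as non-member.
nonmember_like = [-3.1, -2.8, -3.5, -2.9]
print(infer_membership(member_like))     # True
print(infer_membership(nonmember_like))  # False
```

In practice the threshold is calibrated on reference data, and stronger attacks compare against shadow models rather than using a fixed cutoff.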
LLM Security Defenses
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | PADD: Prefix-based Attention Divergence Detector for LLM Jailbreaks | A prefix-based attention divergence detector for identifying and defending against LLM jailbreak prompts. | Track 5: Security and Privacy |
| 2 | FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks | A knowledge-graph-empowered defense framework that strengthens LLMs against fraud attacks. | Track 5: Security and Privacy |
| 3 | Towards Robust Detection of Chinese Toxic Variants via Dynamic Knowledge Graph-LLM Reasoning | Combines a dynamic knowledge graph with LLM reasoning for robust detection of Chinese toxic text variants. | Track 5: Security and Privacy |
| 4 | Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning | Exposes LLM sycophancy and proposes opposing-stance reasoning to combat clickbait and misinformation. | Track 5: Security and Privacy |
| 5 | BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text | BIND, a bidirectionally aligned next-token denoising framework for fast, lightweight deobfuscation of harmful web text. | Track 3: Responsible Web |
| 6 | Be Responsible in Your Answers! Monitoring Out-of-Domain Behaviors in Domain-Specific LLMs | Methods for monitoring out-of-domain behaviors in domain-specific LLMs, ensuring reliable answers within the intended domain. | Track 3: Responsible Web |
| 7 | D-Models and E-Models: Diversity-Stability Trade-offs in the Sampling Behavior of Large Language Models | Analyzes the diversity-stability trade-off in LLM sampling behavior and proposes a D-Model/E-Model taxonomy. | Track 3: Responsible Web |
| 8 | Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks | Robust fake news detection with LLMs under adversarial sentiment attacks. | Track 3: Responsible Web |
| 9 | Read as You See: Guiding Unimodal LLMs for Low-Resource Explainable Harmful Meme Detection | Guides unimodal LLMs toward explainable harmful meme detection in low-resource settings. | Track 3: Responsible Web |
| 10 | Med-R2: Crafting Trustworthy LLM Physicians via Retrieval and Reasoning of Evidence-Based Medicine | Med-R2, a trustworthy LLM physician system built via retrieval and reasoning over evidence-based medicine. | Track 4: Search and Retrieval-Augmented AI |
| 11 | Rethinking the Hidden Risk of Reranking: Achieving Risk-aware Reranking with Information Gain for RAG with LLMs | Rethinks the hidden risk of reranking in RAG systems and proposes risk-aware reranking based on information gain. | Track 4: Search and Retrieval-Augmented AI |
| 12 | A Fact-Checking Framework with Denoising Evidence Retrieval and LLM-Based Debate Verification | A fact-checking framework combining denoising evidence retrieval with LLM-based debate verification. | Track 4: Search and Retrieval-Augmented AI |
| 13 | Conflict-Aware RAG: Multi-Stage Learning with Conflict Signals for Robust Retrieval-Augmented Generation | A conflict-aware RAG framework that uses multi-stage learning with conflict signals to make retrieval-augmented generation more robust. | Track 4: Search and Retrieval-Augmented AI |
| 14 | IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation | IRAG, a robust multimodal retrieval-augmented generation framework built on a hazard separation mechanism. | Track 4: Search and Retrieval-Augmented AI |
| 15 | PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading | PaperAsk, a benchmark for systematically evaluating LLM reliability in academic paper search and reading. | Track 4: Search and Retrieval-Augmented AI |
| 16 | SeaRAG: Reducing Hallucination in Retrieval-Augmented Generation via Statement-Entity Adaptive Ranking | Mitigates hallucination in retrieval-augmented generation via statement-entity adaptive ranking. | Track 4: Search and Retrieval-Augmented AI |
| 17 | A Graph Foundation Model for Unified Anomaly Detection | A graph foundation model for unified anomaly detection across diverse scenarios. | Track 2: Graph Algorithms and Modeling for the Web |
| 18 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | The first systematic evaluation of multimodal LLMs on time series anomaly detection. | Track 8: Systems and Infrastructure for Web, Mobile, and Web of Things |
| 19 | Smart Eye: LLM-Guided Proposer-Verifier Framework for Industrial-Scale Log Anomaly Detection | Smart Eye, an LLM-guided proposer-verifier framework for industrial-scale log anomaly detection. | Industry Track |
| 20 | Cascaded Verification Framework: A Progressive Approach for Mitigating Hallucinations in Large Language Models | A cascaded verification framework that mitigates LLM hallucinations through progressive verification. | Short Papers |
| 21 | PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation Detection | PAMAS, a self-adaptive multi-agent system that improves misinformation detection via perspective aggregation. | Track 10: Web Mining and Content Analysis |
| 22 | Triple-R: Iterative Query Rewriting and Refinement for Retrieval-Augmented Fake News Detection | Triple-R, which strengthens retrieval-augmented fake news detection through iterative query rewriting and refinement. | Track 10: Web Mining and Content Analysis |
| 23 | Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection | Discovers prompt-induced linguistic fingerprints and uses them to detect LLM-generated fake news. | Track 10: Web Mining and Content Analysis |
| 24 | How Human Experts Educate Specialized LLMs: Filling Knowledge Gaps in KG-Augmented Generation through Hallucination Detection | Explores how human experts educate specialized LLMs, filling knowledge gaps in KG-augmented generation through hallucination detection. | Track 6: Semantics and Knowledge |
| 25 | Knowledge-Enhanced Multimodal Fake News Detection: Semantic Visual and Priority Fusion | Knowledge-enhanced multimodal fake news detection that fuses semantic visual information with priority features. | Track 6: Semantics and Knowledge |
| 26 | CogAgent: Self-Evolving Cognitive Agents for Multi-Source Fraud Detection in Heterogeneous Financial Networks | CogAgent, self-evolving cognitive agents for multi-source fraud detection in heterogeneous financial networks. | Track 5: Security and Privacy |
| 27 | TGNN: Enhancing Pixel Tracking Detection via LLM-driven Annotation and GAT-powered Structural Representation | Combines LLM-driven annotation with GAT-based structural representation to improve pixel tracking detection. | Track 5: Security and Privacy |
| 28 | Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages | A knowledge-driven framework that bridges expert reasoning and LLM detection to identify malicious software packages. | Track 6: Semantics and Knowledge |
| 29 | Does LLM Focus on the Right Words? Mitigating Context Bias in LLM-based Recommenders | Analyzes whether LLM-based recommenders attend to the right words and proposes methods to mitigate context bias. | Track 9: User Modeling, Personalization and Recommendation |
| 30 | Unbiased Multimodal Reranking for Long-Tail Short-Video Search | Unbiased multimodal reranking that improves fairness and effectiveness in long-tail short-video search. | Industry Track |
| 31 | Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings | Uncovers skin-tone biases in LLMs and emoji embeddings, quantifying the digital bias behind digital skin. | Track 3: Responsible Web |
| 32 | Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation | Evaluates the resilience of LLM-enhanced search engines against black-hat SEO manipulation. | Track 5: Security and Privacy |
Security Defenses for Large Reasoning Models
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Expectation-Guided Self-Verification for Aligning Large Reasoning Models with Domain Knowledge | Expectation-guided self-verification for aligning large reasoning models with domain knowledge. | Track 6: Semantics and Knowledge |
| 2 | Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning | A multi-agent approach with chain-of-thought reasoning that resists manipulative bots in meme coin copy trading. | Track 5: Security and Privacy |
LLM Privacy Protection
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation | Reliable black-box LLM fingerprinting via zeroth-order gradient estimation, enabling robust model identification. | Track 5: Security and Privacy |
| 2 | Reconstructing Training Data from Adapter-based Federated Large Language Models | Studies the privacy risk of reconstructing training data from adapter-based federated LLMs. | Track 5: Security and Privacy |
| 3 | When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models | The first membership inference attack on black-box large reasoning models, showing that the reasoning process leaks membership information about training data. | Track 5: Security and Privacy |
| 4 | Decoding Web Memorization: A Semantic Membership Inference Attack on LLMs | A semantic membership inference attack that reveals how LLMs memorize and leak web content. | Track 6: Semantics and Knowledge |
| 5 | Towards Practical LLM Unlearning: Efficient, Modular, and Retain-Free | A practical LLM unlearning method that is efficient, modular, and requires no retain set. | Track 5: Security and Privacy |
| 6 | DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing | DSSmoothing, a dual-space smoothing method providing certified dataset ownership verification for pre-trained language models. | Track 5: Security and Privacy |
| 7 | AWMA-MoE: Attention-Guided Watermark Adapter with MoE for Latent Diffusion Models | An attention-guided watermark adapter with mixture-of-experts for copyright protection of latent diffusion models. | Short Papers |
| 8 | Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection | An LLM-guided, coherence-preserving semantic injection attack that breaks semantic-aware watermarks. | Short Papers |
| 9 | The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT | Deconstructs the memory mechanism of ChatGPT, analyzing LLM memory through the lens of the algorithmic self-portrait. | Track 5: Security and Privacy |
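Paper 1 above relies on zeroth-order gradient estimation, which approximates gradients from black-box function evaluations alone. Here is a minimal sketch of the standard two-point finite-difference estimator, not tied to the paper's fingerprinting procedure; the example function is purely illustrative.

```python
def zeroth_order_grad(f, x, eps=1e-4):
    """Two-point finite-difference estimate of each partial derivative of f,
    requiring only black-box function evaluations (no backprop access)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x); x_plus[i] += eps
        x_minus = list(x); x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

# Example: f(x) = x0^2 + 3*x1 has true gradient (2, 3) at the point (1, 2).
f = lambda x: x[0] ** 2 + 3 * x[1]
print(zeroth_order_grad(f, [1.0, 2.0]))  # approximately [2.0, 3.0]
```

This naive version costs two queries per coordinate; practical black-box attacks reduce query cost with random-direction estimators.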
Security of Agents, Multimodal LLMs, and Related Systems
| No. | Paper Title | Summary | Track |
|---|---|---|---|
| 1 | Combating Knowledge Corruption in Agent Systems: A Byzantine-Tolerant Secure Collaborative RAG Framework | A Byzantine-tolerant secure collaborative RAG framework that resists knowledge corruption attacks in agent systems. | Track 5: Security and Privacy |
| 2 | ARuleCon: Agentic Security Rule Conversion | ARuleCon, a framework for automatically converting security rules into agent-executable rules. | Track 5: Security and Privacy |
| 3 | SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection | SentinelNet, a credit-based dynamic threat detection network that safeguards multi-agent collaboration. | Track 5: Security and Privacy |
| 4 | Beyond Detection: Autonomous Anomaly Remediation for MCP Against Tool Poisoning Attacks | An autonomous anomaly remediation mechanism for MCP that goes beyond detection to defend against tool poisoning attacks. | Track 5: Security and Privacy |
| 5 | MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM | MGFFD-VLM, multi-granularity prompt learning for face forgery detection with vision-language models. | Track 5: Security and Privacy |
| 6 | Breaking Cross-modal Alignment in Embodied Intelligence: A Multimodal Adversarial Attack Framework for Vision-Language-Action Models | A multimodal adversarial attack framework that breaks cross-modal alignment in vision-language-action models for embodied intelligence. | Track 5: Security and Privacy |
| 7 | Navigating Truth in Multimodal Fact-checking via Retrieval- and Reasoning-Enhanced Large Language Models | Uses retrieval- and reasoning-enhanced LLMs to navigate truth in multimodal fact-checking. | Track 3: Responsible Web |
| 8 | IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation | IRAG, a robust multimodal retrieval-augmented generation framework built on a hazard separation mechanism. | Track 4: Search and Retrieval-Augmented AI |
| 9 | Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations | Models emergent coordinated behaviors in networked LLM agents, analyzing the strategic dynamics of information operations. | Track 7: Social Networks and Social Media |
| 10 | CogAgent: Self-Evolving Cognitive Agents for Multi-Source Fraud Detection in Heterogeneous Financial Networks | CogAgent, self-evolving cognitive agents for multi-source fraud detection in heterogeneous financial networks. | Track 5: Security and Privacy |
| 11 | PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation Detection | PAMAS, a self-adaptive multi-agent system that improves misinformation detection via perspective aggregation. | Track 10: Web Mining and Content Analysis |
| 12 | Mitigating Cognitive Vulnerabilities in Code Generation via Multi-Agent Adversarial Debate | Mitigates cognitive vulnerabilities in code generation via multi-agent adversarial debate. | Track 10: Web Mining and Content Analysis |
| 13 | What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, & Emerging Implications of Agentic E-Commerce | Systematically evaluates AI agents' purchasing behavior in e-commerce, revealing biases, model dependence, and emerging implications. | Short Papers |
| 14 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | The first systematic evaluation of multimodal LLMs on time series anomaly detection. | Track 8: Systems and Infrastructure for Web, Mobile, and Web of Things |
| 15 | AWMA-MoE: Attention-Guided Watermark Adapter with MoE for Latent Diffusion Models | An attention-guided watermark adapter with mixture-of-experts for copyright protection of latent diffusion models. | Short Papers |
| 16 | Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs | Reveals multi-dimensional vulnerabilities of Graph-LLMs on text-attributed graphs and proposes an interpretable multi-dimensional adversarial attack framework. | Track 5: Security and Privacy |
| 17 | Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs | Explores whether LLMs can mount universal adversarial attacks on text-attributed graphs, revealing the risk of LLMs deceiving graph learning. | Track 2: Graph Algorithms and Modeling for the Web |
| 18 | Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating | An LLM-centric affective visual customization method enabling efficient and precise emotion manipulation. | Track 3: Responsible Web |
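As background for the Byzantine-tolerant collaboration in paper 1 of this section, here is a generic sketch of classic majority voting under the standard n ≥ 2f + 1 assumption, not the paper's protocol: if at most f agents are compromised and all honest agents agree, the honest answer is held by at least f + 1 agents, so any answer with fewer votes can be rejected. The function and example answers are hypothetical.

```python
from collections import Counter

def byzantine_tolerant_answer(agent_answers, f):
    """Accept an answer only if at least f+1 agents agree on it.
    With n >= 2f+1 agents, at most f of them faulty, and honest agents
    in agreement, the accepted answer is guaranteed to be the honest one."""
    n = len(agent_answers)
    assert n >= 2 * f + 1, "need at least 2f+1 agents to tolerate f faults"
    answer, count = Counter(agent_answers).most_common(1)[0]
    return answer if count >= f + 1 else None

# Three agents, tolerating one compromised agent (f=1):
print(byzantine_tolerant_answer(["Paris", "Paris", "Rome"], 1))  # Paris
print(byzantine_tolerant_answer(["A", "B", "C"], 1))             # None (no quorum)
```

Real agent systems vote on retrieved evidence or structured outputs rather than raw strings, but the quorum arithmetic is the same.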