IEEE S&P 2026: A Roundup of LLM Security Papers

Master index (LLM Security Research Paper Roundup, 2026 edition): https://blog.csdn.net/WhiffeYF/article/details/159047894

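This post surveys the papers related to LLM Safety at IEEE S&P 2026 (the 47th IEEE Symposium on Security and Privacy). Topics include LLM jailbreak attacks, defense and alignment, privacy protection, RAG security, multimodal LLM security, agent security, graph-LLM security, toolchain security, and generative AI security and privacy.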

No.	Paper Title	Summary	Category
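1	URLcoat: Exploiting Web Search Capability to Jailbreak Large Language Models	Reveals and exploits a new attack surface: using the web search capability of internet-connected LLMs to carry out jailbreak attacks.	LLM Jailbreak Attacks
2	MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation	Jailbreaks online LLM services by manipulating their special tokens.	LLM Jailbreak Attacks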
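3	SoK: Evaluating Jailbreak Guardrails for Large Language Models	Systematizes existing jailbreak guardrails for LLMs and quantitatively evaluates their effectiveness and limitations.	LLM Defense and Safety Alignment
4	SoK: Robustness in Large Language Models against Jailbreak Attacks	Systematizes the state of research on LLM robustness against jailbreak attacks, including its metrics and core challenges.	LLM Defense and Safety Alignment
5	EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models	Proposes a unified safety alignment transfer framework that addresses the degradation of safety capabilities in fine-tuned LLMs.	LLM Defense and Safety Alignment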
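6	When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins	Analyzes prompt injection risks and the resulting security threats in the third-party AI chatbot plugin ecosystem.	Prompt Injection and Adversarial Attacks
7	PromptLocate: Localizing Prompt Injection Attacks	Proposes a detection method that precisely localizes and traces injected content within inputs subject to prompt injection attacks.	Prompt Injection and Adversarial Attacks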
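8	LLM Unlearning Should Be Form-Independent	Argues that LLM unlearning should be independent of the form in which knowledge is expressed, pushing toward a unified theory and standardized evaluation of unlearning.	LLM Privacy Protection
9	Beyond Indistinguishability: Measuring Extraction Risk in LLM APIs	Goes beyond the traditional indistinguishability assumption and proposes a framework for quantifying extraction risk in LLM APIs.	LLM Privacy Protection
10	LLMThief: Evaluating Configuration Leaking Risks in Commercial LLM App Stores	Evaluates configuration leakage risks in commercial LLM app stores, revealing the information leakage surface on the deployment side.	LLM Privacy Protection
11	Hollow-LLM Attack: Computationally Trivial Weights in Zero-Knowledge Verification of LLM Inference	Reveals a vulnerability involving computationally trivial weights in zero-knowledge verification of LLM inference, which threatens the verifiability of inference.	LLM Privacy Protection
12	Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach	Proposes an efficient membership inference attack based on projection residuals, targeting privacy weaknesses of LLMs in federated settings.	LLM Privacy Protection
13	Recovering and Rehosting Mobile Local LLM Conversations and Contexts via Memory Forensics	Uses memory forensics to recover the conversations and contexts of local LLMs on mobile devices, exposing the privacy risks of on-device deployment.	LLM Privacy Protection
14	PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs	Proposes a content-only method for auditing system prompt copyright, protecting the intellectual property of LLM system prompts.	LLM Privacy Protection
15	Euston: Efficient and User-Friendly Secure Transformer Inference with Non-Interactivity	Proposes an efficient, user-friendly, non-interactive secure Transformer inference scheme that protects privacy during model inference.	LLM Privacy Protection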
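16	Who Taught the Lie? Responsibility Attribution for Poisoned Knowledge in Retrieval-Augmented Generation	Studies responsibility attribution for poisoned knowledge in retrieval-augmented generation, revealing trust risks in the RAG supply chain.	RAG Security
17	GraphRAG under Fire	Conducts a security evaluation of the GraphRAG paradigm, revealing new attack surfaces in graph-structured retrieval-augmented generation.	RAG Security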
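18	The Person Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Models	Reveals the privacy risk that multimodal LLMs can infer users' private attributes from audio.	Multimodal LLM Security
19	Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection	Designs a context-agnostic and imperceptible auditory prompt injection method that hijacks audio-language models.	Multimodal LLM Security
20	Adversarial Hubness in Multi-Modal Retrieval	Reveals the adversarial hubness phenomenon in multimodal retrieval systems and evaluates the vulnerability of cross-modal retrieval.	Multimodal LLM Security
21	DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling	Uses distribution modeling to enable efficient, large-scale red teaming of text-to-image generative systems.	Multimodal LLM Security
22	Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models	Evaluates how effectively concept filtering defenses in text-to-image models block the generation of harmful content.	Multimodal LLM Security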
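23	WebCloak: Characterizing and Mitigating Threats from LLM-Driven Web Agents as Intelligent Scrapers	Characterizes and mitigates the data theft and abuse threats posed by LLM-driven web agents acting as intelligent scrapers.	Agent Security
24	Towards Automating Data Access Permissions in AI Agents	Studies mechanisms for automating the management of data access permissions in AI agents, reducing the risk of unauthorized access by agents.	Agent Security
25	Investigating the Impact of Dark Patterns on LLM-Based Web Agents	Investigates how dark patterns can interfere with and manipulate LLM-based web agents.	Agent Security
26	Site Isolation is Dead: How Site Isolation is Broken in Agentic Browsers and Extensions	Reveals how site isolation fails in agentic browsers and their extensions, and the security risks that result.	Agent Security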
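27	Are LLM-Enhanced Graph Neural Networks Robust against Poisoning Attacks?	Evaluates the robustness and vulnerability of LLM-enhanced graph neural networks under data poisoning attacks.	Graph-LLM Security
28	Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem	Presents a large-scale analysis of attacks on the MCP toolchain ecosystem, revealing security risks in the LLM tool-calling pipeline.	Toolchain Security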
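29	When Designers Meet GenAI: Understanding the Role of Prompt-to-Design Generators in Privacy Dark Patterns	Studies the role that prompt-to-design generators in generative AI design tools play in privacy dark patterns, and the associated risks.	Generative AI Security and Privacy
30	AI Wrote My Paper and All I Got Was This False Negative: Measuring the Efficacy of Commercial AI Text Detectors	Measures how well commercial AI text detectors actually identify generated content, revealing blind spots in their performance.	Generative AI Security and Privacy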