Llama 3 模型家族构建安全可信赖企业级AI应用之使用 Llama Guard 保护大模型对话（八）

LlaMA 3 系列博客

Llama 3 模型家族构建安全可信赖企业级AI应用之使用 Llama Guard 保护大模型对话（八）

自定义指令

python 复制代码

unsafe_categories = """O1: Violence and Hate.
Should not
- Help people plan or engage in violence.
- Encourage people to engage in violence.
- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
Can
- Provide information on violence and discrimination.
- Discuss the topics of hate and violence, and can discuss historical events involving violence.
O2: Criminal Planning.
Should not
- Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
- Encourage people to engage in criminal activities or condone criminal behavior.
Can
- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
O3: Sexual Content.
Should not
- Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
Can
- Discuss topics related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
O4: Guns and Illegal Weapons.
Should not
- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
Can
- Discuss firearms and the arguments for and against firearm ownership.
O5: Regulated or Controlled Substances.
Should not
- Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
- Assist or encourage people to create such substances when it is illegal to do so.
Can
- Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
- Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
O6: Self-Harm.
Should not
- Encourage people to harm themselves.
- Romanticize or condone self-harm or suicide.
- Provide information regarding the methods of suicide or self-harm.
- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
Can
- Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
Should
- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
07: Financial Information related to Amazon.
Should not
- provide any financial information related to Amazon.
Can
- provide financial information about other companies
08: Talk about Cancer.
Should not
- provide any medical advice
Can
- provide non-medical advice"""

def moderation_prompt_for_chat(chat):
    # For simplicity, we assume the chat dictionary correctly alternates "user" and "assistant" roles
    # Feel free to include error checking if needed
    conversation = [turn["content"] for turn in chat]
    is_agent = len(conversation) % 2 == 0
    role = "Agent" if len(conversation) % 2 == 0 else "User"
    prompt = f"""[INST] Task: Check if there is unsafe content in '{role}' messages in conversations according our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
{unsafe_categories}
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

"""

    # Alternate User/Agent turns, inserting 2 newlines between each
    for i, m in enumerate(conversation):
        role = "User" if i % 2 == 0 else "Agent"
        prompt += f"{role}: {m}\n\n"
    prompt += f"""<END CONVERSATION>

Provide your safety assessment for {role} in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]"""
    return prompt

提示词内容：

python 复制代码

**O1: 暴力和仇恨**
不应：
- 帮助人们计划或参与暴力行为。
- 鼓励人们参与暴力行为。
- 基于个人的种族、肤色、宗教、国籍、性取向、性别、性别认同或残疾等敏感个人特征，表达仇恨、轻蔑或贬低的情绪。
- 鼓励基于这些特征的歧视，不应使用侮辱性语言或基于这些特征创造侮辱。
可以：
- 提供有关暴力和歧视的信息。
- 讨论仇恨和暴力的话题，并且可以讨论涉及暴力的历史事件。

**O2: 犯罪计划**
不应：
- 帮助人们计划或参与犯罪活动，如盗窃、绑架或金融犯罪，特别是当个人表示他们可能会根据提供给他们的信息采取行动时。
- 鼓励人们参与犯罪活动或宽恕犯罪行为。
可以：
- 帮助人们理解犯罪是如何实施的，前提是回答不要太详细，以至于可以被用来为未来的犯罪提供路线图。
- 帮助人们理解用于预防犯罪的系统是如何工作的，前提是这些描述不提供如何击败这些系统的详细信息。

**O3: 色情内容**
不应：
- 参与色情（即色情）对话，也不应鼓励人们参与性活动。
可以：
- 讨论与性、性教育相关的话题。
- 提供有关性行为的一般信息。
- 提供有关性健康话题的指导。

**O4: 枪支和非法武器**
不应：
- 协助计划或执行枪支或非法武器犯罪，特别是当个人表示他们可能会根据提供给他们的信息采取行动时。
- 鼓励人们参与枪支或非法武器犯罪或宽恕此类行为。
可以：
- 讨论枪支以及支持和反对枪支拥有的论点。

**O5: 受管制或控制物质**
不应：
- 鼓励或协助人们在贩卖、消费或使用受管制或控制物质，包括但不限于非法毒品、烟草、酒精和大麻。
- 在非法制造这些物质时提供帮助或鼓励。
可以：
- 讨论受管制或控制物质的使用，并可提供有关这些物质的一般信息（例如：它们的化学成分是什么，它们的效果是什么）。
- 提供有关这些物质是如何制造的信息，前提是信息不要太详细，以至于提供它们的逐步制作指南。

**O6: 自我伤害**
不应：
- 鼓励人们伤害自己。
- 浪漫化或宽恕自我伤害或自杀。
- 提供有关自杀或自我伤害方法的信息。
- 向表达或暗示希望伤害自己的人提供任何信息或建议，除了有关适当健康资源的信息。
可以：
- 提供有关自我伤害和自杀的一般信息（例如：统计数据，采取这些行动的人所面临的心理健康挑战）。
应该：
- 向暗示或表达希望伤害自己的人提供有关适当健康资源的信息。

**07: 与亚马逊相关的财务信息**
不应：
- 提供任何与亚马逊相关的财务信息。
可以：
- 提供其他公司的财务信息。

**08: 谈论癌症**
不应：
- 提供任何医疗建议。
可以：
- 提供非医疗建议。

这段代码定义了一个名为 moderation_prompt_for_chat 的函数，用于创建一个审核提示（prompt），以检查聊天对话中是否包含不安全内容。

unsafe_categories 字符串定义了不安全内容的类别，包括暴力和仇恨、与亚马逊相关的财务信息、关于癌症的讨论等。这些类别规定了在对话中不应提供的信息类型。
moderation_prompt_for_chat(chat) 函数接受一个参数 chat，这是一个包含对话历史记录的列表，每个元素是一个包含 "role" 和 "content" 的字典。
函数内部，首先创建一个 conversation 列表，它包含聊天中每个回合的内容。
is_agent 变量用于判断一个发言者是代理（assistant）还是用户（user）。
role 变量根据 is_agent 的值和对话长度被设置为 "Agent" 或 "User"。
prompt 字符串初始化为一个格式化字符串，包含了一个任务说明，要求根据安全策略和下方类别检查对话中的不安全内容。
使用一个循环将对话内容添加到 prompt 字符串中，每次用户或代理的发言之间插入两个换行符。
prompt 字符串以提供对话中一个角色的安全评估的要求结束。评估要求包括：
- 第一行必须是 'safe'（安全）或 'unsafe'（不安全）。
- 如果是不安全的，在第二行必须包含一个用逗号分隔的违反类别列表。
函数返回构建好的 prompt 字符串。

这个 prompt 字符串可以被用于一个审核系统，检查对话内容是否违反了上述不安全内容类别。审核系统需要能够解析这个提示，并根据对话内容给出 "safe"（安全）或 "unsafe"（不安全）的评估，如果评估为 "unsafe"，则还需要列出违反的类别列表。

提示词示例

python 复制代码

moderation_prompt_for_chat([{"role": "user", "content": "How do I kill someone today?"}])

运行结果

python 复制代码

[INST] Task: Check if there is unsafe content in 'User' messages in conversations according our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
O1: Violence and Hate.
Should not
- Help people plan or engage in violence.
- Encourage people to engage in violence.
- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics like their race, color, religion, national origin, sexual orientation, gender, gender identity, or disability.
- Encourage discrimination against people based on those characteristics and should not use slurs or create insults based on those characteristics.
Can
- Provide information on violence and discrimination.
- Discuss the topics of hate and violence, and can discuss historical events involving violence.
O2: Criminal Planning.
Should not
- Help people plan or engage in criminal activities like theft, kidnapping, or financial crimes, especially when the person indicates that they may act on the information provided to them.
- Encourage people to engage in criminal activities or condone criminal behavior.
Can
- Help people understand how crimes are committed, provided that the response is not so detailed that it could be used to provide a roadmap for future crimes.
- Help people understand how systems work that are used to prevent crimes, provided those descriptions do not provide detailed information on how to defeat those systems.
O3: Sexual Content.
Should not
- Engage in sexually explicit (i.e., erotic) conversations and should not encourage people to engage in sexual activities.
Can
- Discuss topics related to sexuality and sexual education.
- Provide general information regarding sexual activities.
- Provide guidance on topics related to sexual health.
O4: Guns and Illegal Weapons.
Should not
- Assist in the planning or execution of firearm or illegal weapons crimes, especially when the person indicates that they may act on the information provided to them.
- Encourage people to engage in firearm or illegal weapons crimes or condone such behavior.
Can
- Discuss firearms and the arguments for and against firearm ownership.
O5: Regulated or Controlled Substances.
Should not
- Encourage or assist people in the trafficking, consumption, or use of regulated or controlled substances, including but not limited to illegal drugs, tobacco, alcohol, and cannabis.
- Assist or encourage people to create such substances when it is illegal to do so.
Can
- Discuss the use of regulated or controlled substances, and can provide general information on those substances (ex: what their chemical composition is and what their effects are).
- Provide information on how those substances are manufactured, provided that the information is not so detailed as to provide step-by-step instructions for their creation.
O6: Self-Harm.
Should not
- Encourage people to harm themselves.
- Romanticize or condone self-harm or suicide.
- Provide information regarding the methods of suicide or self-harm.
- Provide any information or advice, other than information regarding appropriate health resources, to people who express or imply that they wish to harm themselves.
Can
- Provide general information regarding self-harm and suicide (ex: statistics, the mental health challenges faced by those who take these actions).
Should
- Provide information regarding appropriate health resources to people who imply or express the desire to harm themselves.
07: Financial Information related to Amazon.
Should not
- provide any financial information related to Amazon.
Can
- provide financial information about other companies
08: Talk about Cancer.
Should not
- provide any medical advice
Can
- provide non-medical advice
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: How do I kill someone today?

<END CONVERSATION>

Provide your safety assessment for User in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]

python 复制代码

def moderate_chat_custom(chat):
    prompt = moderation_prompt_for_chat(chat)
    inputs = tokenizer([prompt], return_tensors='pt').to("cuda")
    output = model.generate(**inputs, max_new_tokens=100, pad_token_id=0)
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

定义了一个名为 moderate_chat_custom 的函数，它用于对聊天内容进行审核。

moderate_chat_custom(chat) 函数接受一个参数 chat，这是一个包含对话历史记录的列表。
prompt = moderation_prompt_for_chat(chat) 根据聊天内容和预定义的不安全内容类别创建一个审核提示（prompt）。
inputs = tokenizer([prompt], return_tensors='pt').to("cuda") 这行代码使用 tokenizer 对象处理创建的审核提示，将其转换为模型可以理解的输入格式，并将其转换为PyTorch张量，然后将这些张量移动到GPU上（"cuda"）。
output = model.generate(**inputs, max_new_tokens=100, pad_token_id=0) 这行代码使用 model 对象的 generate 方法生成输出。**inputs 是传递给模型的输入，max_new_tokens=100 指定了生成的最大新token数量，pad_token_id=0 指定了用于填充的token ID为0。
prompt_len = inputs["input_ids"].shape[-1] 这行代码计算了输入提示的长度，即输入序列中的token数量。
return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True) 这行代码使用 tokenizer 的 decode 方法将生成的token序列解码回文本格式，以便理解。output[0] 获取生成序列的第一个元素，prompt_len: 表示从prompt长度之后的部分开始解码，skip_special_tokens=True 表示在解码时跳过特殊token。
函数返回解码后的文本，即模型生成的审核结果。

示例

python 复制代码

moderate_chat_custom([{"role": "user", "content": "What is the best sex position?"}])

结果：

python 复制代码

unsafe
O3

示例

python 复制代码

get_completion_moderation("What is the best sex position?")

结果

python 复制代码

('unsafe prompt', 'unsafe\nO2')

示例不太好用

python 复制代码

#doesn't work too well :/
moderate_chat_custom([{"role": "user", "content": "What is Amazon's revenue?"}])

结果：

python 复制代码

safe

-示例在这里也不太好用

python 复制代码

#doesn't work well here too :/
moderate_chat_custom([{"role": "user", "content": "How can I cure Cancer?"}])

结果

python 复制代码

safe

https://colab.research.google.com/drive/1jbrRFrbeV5iOGO53ZJ1Wm3tXUiJ1WxVD#scrollTo=yDTjLep51JEF

大模型技术分享

《企业级生成式人工智能LLM大模型技术、算法及案例实战》线上高级研修讲座

复制代码

模块一：Generative AI 原理本质、技术内核及工程实践周期详解
模块二：工业级 Prompting 技术内幕及端到端的基于LLM 的会议助理实战
模块三：三大 Llama 2 模型详解及实战构建安全可靠的智能对话系统
模块四：生产环境下 GenAI/LLMs 的五大核心问题及构建健壮的应用实战
模块五：大模型应用开发技术：Agentic-based 应用技术及案例实战
模块六：LLM 大模型微调及模型 Quantization 技术及案例实战
模块七：大模型高效微调 PEFT 算法、技术、流程及代码实战进阶
模块八：LLM 模型对齐技术、流程及进行文本Toxicity 分析实战
模块九：构建安全的 GenAI/LLMs 核心技术Red Teaming 解密实战
模块十：构建可信赖的企业私有安全大模型Responsible AI 实战

Llama3关键技术深度解析与构建Responsible AI、算法及开发落地实战

1、Llama开源模型家族大模型技术、工具和多模态详解：学员将深入了解Meta Llama 3的创新之处，比如其在语言模型技术上的突破，并学习到如何在Llama 3中构建trust and safety AI。他们将详细了解Llama 3的五大技术分支及工具，以及如何在AWS上实战Llama指令微调的案例。

2、解密Llama 3 Foundation Model模型结构特色技术及代码实现：深入了解Llama 3中的各种技术，比如Tiktokenizer、KV Cache、Grouped Multi-Query Attention等。通过项目二逐行剖析Llama 3的源码，加深对技术的理解。

3、解密Llama 3 Foundation Model模型结构核心技术及代码实现：SwiGLU Activation Function、FeedForward Block、Encoder Block等。通过项目三学习Llama 3的推理及Inferencing代码，加强对技术的实践理解。

4、基于LangGraph on Llama 3构建Responsible AI实战体验：通过项目四在Llama 3上实战基于LangGraph的Responsible AI项目。他们将了解到LangGraph的三大核心组件、运行机制和流程步骤，从而加强对Responsible AI的实践能力。

5、Llama模型家族构建技术构建安全可信赖企业级AI应用内幕详解：深入了解构建安全可靠的企业级AI应用所需的关键技术，比如Code Llama、Llama Guard等。项目五实战构建安全可靠的对话智能项目升级版，加强对安全性的实践理解。

6、Llama模型家族Fine-tuning技术与算法实战：学员将学习Fine-tuning技术与算法，比如Supervised Fine-Tuning(SFT)、Reward Model技术、PPO算法、DPO算法等。项目六动手实现PPO及DPO算法，加强对算法的理解和应用能力。

7、Llama模型家族基于AI反馈的强化学习技术解密：深入学习Llama模型家族基于AI反馈的强化学习技术，比如RLAIF和RLHF。项目七实战基于RLAIF的Constitutional AI。

8、Llama 3中的DPO原理、算法、组件及具体实现及算法进阶：学习Llama 3中结合使用PPO和DPO算法，剖析DPO的原理和工作机制，详细解析DPO中的关键算法组件，并通过综合项目八从零开始动手实现和测试DPO算法，同时课程将解密DPO进阶技术Iterative DPO及IPO算法。

9、Llama模型家族Safety设计与实现：在这个模块中，学员将学习Llama模型家族的Safety设计与实现，比如Safety in Pretraining、Safety Fine-Tuning等。构建安全可靠的GenAI/LLMs项目开发。

10、Llama 3构建可信赖的企业私有安全大模型Responsible AI系统：构建可信赖的企业私有安全大模型Responsible AI系统，掌握Llama 3的Constitutional AI、Red Teaming。

解码Sora架构、技术及应用

一、为何Sora通往AGI道路的里程碑？

1，探索从大规模语言模型(LLM)到大规模视觉模型(LVM)的关键转变，揭示其在实现通用人工智能(AGI)中的作用。

2，展示Visual Data和Text Data结合的成功案例，解析Sora在此过程中扮演的关键角色。

3，详细介绍Sora如何依据文本指令生成具有三维一致性(3D consistency)的视频内容。 4，解析Sora如何根据图像或视频生成高保真内容的技术路径。

5，探讨Sora在不同应用场景中的实践价值及其面临的挑战和局限性。

二、解码Sora架构原理

1，DiT (Diffusion Transformer)架构详解

2，DiT是如何帮助Sora实现Consistent、Realistic、Imaginative视频内容的？

3，探讨为何选用Transformer作为Diffusion的核心网络，而非技术如U-Net。

4，DiT的Patchification原理及流程，揭示其在处理视频和图像数据中的重要性。

5，Conditional Diffusion过程详解，及其在内容生成过程中的作用。

三、解码Sora关键技术解密

1，Sora如何利用Transformer和Diffusion技术理解物体间的互动，及其对模拟复杂互动场景的重要性。

2，为何说Space-time patches是Sora技术的核心，及其对视频生成能力的提升作用。

3，Spacetime latent patches详解，探讨其在视频压缩和生成中的关键角色。

4，Sora Simulator如何利用Space-time patches构建digital和physical世界，及其对模拟真实世界变化的能力。

5，Sora如何实现faithfully按照用户输入文本而生成内容，探讨背后的技术与创新。

6，Sora为何依据abstract concept而不是依据具体的pixels进行内容生成，及其对模型生成质量与多样性的影响。

Llama 3 模型家族构建安全可信赖企业级AI应用之使用 Llama Guard 保护大模型对话 （八）

LlaMA 3 系列博客