[对比学习LangChain和MAF-04]针对消息的设计

基于对话的Chat Agent是目前最主流的Agent类型，它采用的基于角色的消息是一种结构化对话机制，它通过将对话内容划分为不同的预设身份（Roles）来引导模型理解其职责和当前上下文。这种机制主要由三类核心角色组成：

System ：用于设定模型的人格、行为准则和专业领域；定义AI是谁、该怎么说话、不能做什么。示例："你是一位资深的Python程序员，请用幽默简洁的风格回答问题，不要使用过多的技术术语。"它具有最高优先级，能从宏观上约束模型的输出风格，确保模型不会在长对话中出戏；
User：代表真实用户提出的问题、指令或输入的信息。提供具体的任务请求，比如"帮我写一段冒泡排序的代码"。这种类型的效用用于触发模型的响应机制，是互动的核心驱动力；
Assistant ：代表AI模型之前生成的响应内容。它会记录对话历史，维持上下文连贯性。在多轮对话中，开发者会将模型之前的回答标记为assistant重新传给模型。这样模型才知道自己刚刚说过什么，从而实现记忆功能；
Tool：代表模型调用工具的输入和输出。它用于实现模型与外部工具（如API、数据库、文件系统等）的交互。通过将工具调用的输入输出封装成Tool消息，模型能够理解何时需要调用工具以及如何处理工具返回的结果；

这种基于多角色对话机制用以解决如下的问题：

上下文管理：通过区分谁说了什么，模型能准确区分用户的要求 和自己的回答，避免产生逻辑混乱；
人设稳定性：在角色扮演场景中，系统角色可以持续强化角色的背景故事和性格，提供沉浸感；
安全性与约束：开发者可以在系统消息中设置安全边界，防止模型产生违规内容；
少样本学习 (Few-shot)：开发者常利用user和assistant，通过示例对话来引导模型学习特定的任务格式或风格；

LangChain和MAF针对基于角色的消息设计了一个完整的消息体系，但是它们的设计思路和实现方式却有所不同。

1. LangChain

LangChain的消息类型直接或者间接地继承自如下这个BaseMessage基类，这是一个派生自Serializable的可序列化的类型。对于基类的BaseMessage来说，表示消息内容的content字段可以是单纯的字符串或字典（Key为字符串）的列表，其他于消息相关信息存储在additional_kwargs字段对应的字典中，比如LLM返回的AIMessage可以利用它来保存涉及的工具调用。response_metadata字段用于存储响应的元数据，比如响应的Header、LLM的名称，涉及的Token消费数据等。

python 复制代码

class BaseMessage(Serializable):
    content: str | list[str | dict]
    additional_kwargs: dict = Field(default_factory=dict)
    response_metadata: dict = Field(default_factory=dict)
    type: str
    name: str | None = None
    id: str | None = Field(default=None, coerce_numbers_to_str=True)

    @property
    def content_blocks(self) -> list[types.ContentBlock]

class ChatMessage(BaseMessage):
    role: str
    type: Literal["chat"] = "chat"

消息承载的内容多种多样，可以是简单的文本，还可以是多媒体图片、音频和视频，还可以是一个任意的二进制文件，不同类型的内容具有不同的处理方式，所以LangChain利用ConentBlock这个类型实现了内容的标准化 。类似于HTTP的MIME类型，每个ConentBlock对象都关联一个标准的类型名称（很多采用的就是MIME类型）。BaseMessage的content_blocks属性实现了原始形态的内容到ConentBlock列表的转换。我们可以指定绑定的角色创建一个ChatMessage作为Chat模型的消息，也可以使用如下这些已经绑定好角色的消息类型。

1.1 四种预设消息类型

1.1.1 SystemMessage

在LangChain以及底层的 Chat API 架构中，系统消息是用于定义模型人格与运行规则 的核心组件。它告诉 AI "你是谁"（例如资深Python开发者、苏格拉底式的导师、或是一只可爱的猫娘）。规定模型不能做什么 （例如严禁提及竞争对手、不准输出代码、只能用JSON格式回答）。它通常位于消息列表的最顶端，作为整个对话的宪法，其权重通常高于普通的消息。

python 复制代码

class SystemMessage(BaseMessage):
    type: Literal["system"] = "system"

1.1.2 HumanMessage

HumanMessage代表了对话的需求侧，即真实用户发送给模型的消息。它是用户意图的直接表达，包含了模型需要完成的具体任务或提出的疑问。

python 复制代码

class HumanMessage(BaseMessage):
    type: Literal["human"] = "human"

1.1.3 AIMessage

AIMessage代表模型生成的响应。它是对话闭环的关键，承载了AI的回答、推理逻辑及工具调用指令。它们是模型在接收到SystemMessage和HumanMessage后产生的输出。如果涉及针对工具的调用，描述每个工具调用的ToolCall会出现在tool_calls字段返回的列表中，另一个invalid_tool_calls字段返回于工具调用相关的错误。ToolCall是一个类型化字典，定义了调用的工具名称、传入的参数和当前工具调用的唯一标识。

python 复制代码

class AIMessage(BaseMessage):
    tool_calls: list[ToolCall] = Field(default_factory=list)
    invalid_tool_calls: list[InvalidToolCall] = Field(default_factory=list)
    usage_metadata: UsageMetadata | None = None
    type: Literal["ai"] = "ai"

    @property
    def content_blocks(self) -> list[types.ContentBlock]

class ToolCall(TypedDict):
    name: str
    args: dict[str, Any]
    id: str | None
    type: NotRequired[Literal["tool_call"]]

1.1.4 ToolMessage

ToolMessage属于对话的执行层 ，用于向模型反馈外部工具执行的结果。当Agent接收到LLM发出的带有tool_calls的AIMessage后，它会执行对应的工具，并将结果包装在ToolMessage中反馈给LLM。它的tool_call_id和status字段分别标识工具调用的标识和状态，前者用于关联AIMessage中的某个具体的ToolCall。

python 复制代码

class ToolMessage(BaseMessage, ToolOutputMixin):
    tool_call_id: str
    type: Literal["tool"] = "tool"
    artifact: Any = None
    status: Literal["success", "error"] = "success"

1.2 消息内容块

BaseMessage的content_blocks字段承载消息主体内容，它返回一个ContentBlock对象的列表。表示内容块的ContentBlock被定义成如下所示的Union类型,接下来我们看看具体的类型承载了什么样的内容：

python 复制代码

ContentBlock = (
    TextContentBlock
    | InvalidToolCall
    | ReasoningContentBlock
    | NonStandardContentBlock
    | DataContentBlock
    | ToolContentBlock
)

1.2.1 TextContentBlock

TextContentBlock块是最常见的内容块类型，它承载了文本内容以及与文本相关的元信息。它的text字段存储了文本内容，annotations字段存储了文本的注释信息（如加粗、斜体、链接等），index字段用于表示文本在原始消息中的位置，extras字段可以存储一些额外的信息，比如文本的语言、情感倾向等。

python 复制代码

class TextContentBlock(TypedDict):
    type: Literal["text"]
    id: NotRequired[str]
    text: str
    annotations: NotRequired[list[Annotation]]
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

1.2.2 InvalidToolCall

当LLM返回的AIMessage中包含工具调用时，如果工具调用的格式不正确或者参数有误，就会被记录在invalid_tool_calls字段中。InvalidToolCall类型化字典定义了无效工具调用的相关信息，包括工具名称、传入的参数、错误信息以及当前工具调用的唯一标识等。

python 复制代码

class InvalidToolCall(TypedDict):
    type: Literal["invalid_tool_call"]
    id: str | None
    name: str | None
    args: str | None
    error: str | None
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

1.2.3 ReasoningContentBlock

ReasoningContentBlock块用于承载LLM的推理过程中的一些中间结果或者推理步骤的描述信息。它的reasoning字段可以存储LLM在推理过程中生成的一些解释性文本或者推理步骤的描述，index字段用于表示这个推理内容块在原始消息中的位置，extras字段可以存储一些额外的信息，比如推理的类型、相关的输入输出等。

python 复制代码

class ReasoningContentBlock(TypedDict):
    type: Literal["reasoning"]
    id: NotRequired[str]
    reasoning: NotRequired[str]    
    index: NotRequired[int | str]
    extras: NotRequired[dict[str, Any]]

1.2.4 DataContentBlock

DataContentBlock块用于承载一些非文本类型的内容，比如图片、视频、音频或者文件等。它是一个Union类型，可以表示不同类型的非文本内容块。可以是ImageContentBlock、VideoContentBlock、AudioContentBlock、PlainTextContentBlock或者FileContentBlock中的任意一种，每种类型都对应着不同的内容格式和相关的元信息。

python 复制代码

DataContentBlock = (
    ImageContentBlock
    | VideoContentBlock
    | AudioContentBlock
    | PlainTextContentBlock
    | FileContentBlock
)

1.2.5 ToolContentBlock

ToolContentBlock块用于承载与工具调用相关的内容。它是一个Union类型，可以表示不同类型的工具内容块。可以是ToolCall、ToolCallChunk、ServerToolCall、ServerToolCallChunk或者ServerToolResult中的任意一种，每种类型都对应着不同的工具调用格式和相关的元信息。

python 复制代码

ToolContentBlock = (
    ToolCall | ToolCallChunk | ServerToolCall | ServerToolCallChunk | ServerToolResult
)

对于LangChain中的各种消息和内容块类型，在我们的如下两篇文章中具有详细的介绍：

2. MAF

消息在MAF通过如下这个ChatMessage类来表示。

csharp 复制代码

public class ChatMessage
{
    public string? AuthorName{ get; set; }
    public DateTimeOffset? CreatedAt { get; set; }
    public ChatRole Role { get; set; }    
    public string Text => Contents.ConcatText();
    public IList<AIContent> Contents{ get; set; }
    public string? MessageId { get; set; }    
    public object? RawRepresentation { get; set; }
    public AdditionalPropertiesDictionary? AdditionalProperties { get; set; }
}

ChatMessage的属性成员说明如下：

AuthorName：消息发送者的名称，可以是用户、模型或者工具等；
CreatedAt：消息创建的时间戳；
Role：消息发送者的角色，通常是一个枚举类型，如用户、模型、工具等；
Text：消息的文本内容，实际上是对Contents中所有内容的文本进行拼接后的结果；
Contents：消息的内容列表，每个内容都是一个AIContent对象，AIContent是一个抽象类，代表了消息内容的基类，可以有不同类型的内容，如文本、图片、文件等；
MessageId：消息的唯一标识符，可以用于消息的追踪和管理；
RawRepresentation：消息的原始表示，可以用于存储一些与消息相关的原始数据，如模型返回的原始响应等；
AdditionalProperties：一个字典，用于存储一些额外的属性信息，可以在Agent的运行过程中使用这些属性来进行一些自定义的逻辑处理；

出于可扩展的考虑，Role并不是一个简单的枚举类型，而是一个具有更丰富功能的结构体。它预定义了System、Assistant、User和Tool四个角色，并且允许用户自定义角色。ChatMessage中的Role属性就是利用这个ChatRole结构体来表示消息发送者的角色。

csharp 复制代码

public readonly struct ChatRole : IEquatable<ChatRole>
{
    public static ChatRole System { get; } = new ChatRole("system");
    public static ChatRole Assistant { get; } = new ChatRole("assistant");
    public static ChatRole User { get; } = new ChatRole("user");
    public static ChatRole Tool { get; } = new ChatRole("tool");

    public string Value { get; }
}

2.1 AIContent

在LangChain中，消息内容被抽象成了ContentBlock，而在MAF中，与之对应的类型就是如下所示的AIContent。AIContent是MAF框架中定义一切交互内容的原子基类。它采用高度多态的设计，将AI与用户之间的对话拆解为多种专业化的内容块。在传统的AI开发中，消息通常只有Text。而AIContent将对话模型化为一个多模态、多状态的流。

csharp 复制代码

public class AIContent
{
    public IList<AIAnnotation>? Annotations { get; set; }
    [JsonIgnore]
    public object? RawRepresentation { get; set; }
    public AdditionalPropertiesDictionary? AdditionalProperties { get; set; }
}

AIContent的Annotations返回一个AIAnnotation的列表，AIAnnotation是一个注解类，用于为内容提供一些额外的信息或者标记。AIAnnotation也是一个多态类型，通过JsonDerivedTypeAttribute声明了不同的子类型，如CitationAnnotation等。

csharp 复制代码

[JsonPolymorphic(TypeDiscriminatorPropertyName = "$type")]
[JsonDerivedType(typeof(CitationAnnotation), typeDiscriminator: "citation")]
public class AIAnnotation
{
    public IList<AnnotatedRegion>? AnnotatedRegions { get; set; }
    [JsonIgnore]
    public object? RawRepresentation { get; set; }
    public AdditionalPropertiesDictionary? AdditionalProperties { get; set; }
}

public class CitationAnnotation : AIAnnotation
{
    public string? Title { get; set; }
    public Uri? Url { get; set; }
    public string? FileId { get; set; }
    public string? ToolName { get; set; }
    public string? Snippet { get; set; }
}

CitationAnnotation是AIAnnotation的一个子类，用于表示引用注解，它具有如下的属性成员：

Title：引用的标题，可以是文章标题、网页标题等；
Url：引用的URL地址，可以是文章链接、网页链接等；
FileId：引用的文件ID，如果引用的是一个文件，可以通过这个ID来获取文件的相关信息；
ToolName：引用的工具名称，如果引用的是一个工具，可以通过这个名称来获取工具的相关信息；
Snippet：引用的内容片段，可以是引用内容的摘要、引用内容的一部分等；

AnnotatedRegion是AIAnnotation中的一个属性成员，它代表了被注解的内容区域，可以是文本区域、图像区域等。AnnotatedRegion同样是一个多态类型，通过JsonDerivedTypeAttribute声明了不同的子类型，如TextSpanAnnotatedRegion等。

csharp 复制代码

[JsonPolymorphic(TypeDiscriminatorPropertyName = "$type")]
[JsonDerivedType(typeof(TextSpanAnnotatedRegion), "textSpan")]
public class AnnotatedRegion
{
}

public sealed class TextSpanAnnotatedRegion : AnnotatedRegion
{
    [JsonPropertyName("start")]
    public int? StartIndex { get; set; }

    [JsonPropertyName("end")]
    public int? EndIndex { get; set; }
}

2.2 各种类型的AIContent

MAF定义了多种类型的AIContent，每种类型的AIContent都代表了不同的内容格式和相关的元信息。这些类型体现在标注在AIContent上的19个JsonDerivedTypeAttribute特性。

csharp 复制代码

[JsonPolymorphic(TypeDiscriminatorPropertyName = "$type")]
[JsonDerivedType(typeof(DataContent), typeDiscriminator: "data")]
[JsonDerivedType(typeof(ErrorContent), typeDiscriminator: "error")]
[JsonDerivedType(typeof(FunctionCallContent), typeDiscriminator: "functionCall")]
[JsonDerivedType(typeof(FunctionResultContent), typeDiscriminator: "functionResult")]
[JsonDerivedType(typeof(HostedFileContent), typeDiscriminator: "hostedFile")]
[JsonDerivedType(typeof(HostedVectorStoreContent), typeDiscriminator: "hostedVectorStore")]
[JsonDerivedType(typeof(TextContent), typeDiscriminator: "text")]
[JsonDerivedType(typeof(TextReasoningContent), typeDiscriminator: "reasoning")]
[JsonDerivedType(typeof(UriContent), typeDiscriminator: "uri")]
[JsonDerivedType(typeof(UsageContent), typeDiscriminator: "usage")]
[JsonDerivedType(typeof(ToolCallContent), typeDiscriminator: "toolCall")]
[JsonDerivedType(typeof(ToolResultContent), typeDiscriminator: "toolResult")]
[JsonDerivedType(typeof(InputRequestContent), typeDiscriminator: "inputRequest")]
[JsonDerivedType(typeof(InputResponseContent), typeDiscriminator: "inputResponse")]
[JsonDerivedType(typeof(ToolApprovalRequestContent), typeDiscriminator: "toolApprovalRequest")]
[JsonDerivedType(typeof(ToolApprovalResponseContent), typeDiscriminator: "toolApprovalResponse")]
[JsonDerivedType(typeof(McpServerToolCallContent), typeDiscriminator: "mcpServerToolCall")]
[JsonDerivedType(typeof(McpServerToolResultContent), typeDiscriminator: "mcpServerToolResult")]
public class AIContent
{}

2.2.1 TextContent

TextContent是AIContent的一个子类，用于表示文本内容，它对应着LangChain中TextContentBlock。它具有一个Text属性，用于存储文本内容。

csharp 复制代码

public sealed class TextContent : AIContent
{
	public string Text{get;set;}
	public TextContent(string? text);
}

2.2.2 TextReasoningContent

TextReasoningContent用于表示推理内容，它对应着LangChain中ReasoningContentBlock。它具有一个Text属性，用于存储推理内容，以及一个可选的ProtectedData属性，用于存储一些需要保护的数据，比如敏感信息、隐私数据等。

csharp 复制代码

public sealed class TextReasoningContent : AIContent
{
	public string Text{get;set;}
	public string? ProtectedData { get; set; }

	public TextReasoningContent(string? text);
}

2.2.3 DataContent

DataContent用于承载多媒体数据内容，它对应着LangChain中DataContentBlock，其属性定义如下：

csharp 复制代码

public class DataContent : AIContent
{
	public string Uri { get; }
	public string MediaType { get; }
	public string? Name { get; set; }
	public ReadOnlyMemory<byte> Data{ get; }
	public ReadOnlyMemory<char> Base64Data{ get; }
	public DataContent(Uri uri, string? mediaType = null);
}

属性说明如下：

Uri：数据内容的URI地址，可以是一个文件路径、一个网络链接等；
MediaType：数据内容的媒体类型，通常是一个MIME类型，如image/png、application/pdf等,默认为application/octet-stream；
Name：数据内容的名称，可以是文件名、图片名等；
Data：数据内容的二进制数据，以字节数组的形式存储；
Base64Data：数据内容的Base64编码字符串，以字符数组的形式存储；

2.2.4 ToolCallContent

ToolCallContent用于表示工具调用内容，工具在MAF中通过基类AITool表示，它具有一系列的子类,比较典型是AIFunction。派生于ToolCallContent的FunctionCallContent承载的正式针对AIFunction的工具调用，我们可以认为它与LangChain的TollCall对标。

csharp 复制代码

public class ToolCallContent : AIContent
{
	public string CallId { get; }
	public ToolCallContent(string callId);
}

public class FunctionCallContent : ToolCallContent
{
	public string Name { get; }
	public IDictionary<string, object?>? Arguments { get; set; }
	public Exception? Exception { get; set; }
	public bool InformationalOnly { get; set; }
}

属性说明如下：

CallId：工具调用的唯一标识符，用于关联工具调用的输入和输出；
Name：函数调用的名称，表示要调用的函数的名称；
Arguments：函数调用的参数，以字典的形式存储，键为参数名称，值为参数值；
Exception：函数调用过程中发生的异常，如果调用过程中出现了错误，可以将异常信息存储在这个属性中；
InformationalOnly：一个布尔值，表示这个函数调用是否仅用于提供信息，还是需要实际执行。如果为true，表示这个函数调用只是为了提供一些信息给模型，而不需要真正执行函数的逻辑；

虽然LangChain的工具由两种实现（Tool和StructuredTool），但是ToolCall并不区分这两种实现。MAF则不同，基本上每个具体的AITool类型都具有各自的ToolCallContent类型：

CodeInterpreterToolCallContent：用于表示代码解释器工具的调用内容，包含了代码解释器工具调用的相关信息，如要执行的代码、编程语言等；
ImageGenerationToolCallContent：用于表示图像生成工具的调用内容，包含了图像生成工具调用的相关信息，如要生成的图像的描述、图像的尺寸等；
McpServerToolCallContent：用于表示MCP服务器工具的调用内容，包含了MCP服务器工具调用的相关信息，如要调用的MCP服务器工具的名称、传入的参数等；
WebSearchToolCallContent：用于表示网络搜索工具的调用内容，包含了网络搜索工具调用的相关信息，如要搜索的关键词、搜索引擎等；

2.2.5 ToolResultContent

ToolCallContent表示由LLM生成的针对指定工具的调用意图，Agent根据此对象调用对应的工具，并将执行结果封装成ToolResultContent对象反馈给LLM。继承自的ToolResultContent的FunctionResultContent类型对应的正是LangChain中ToolConentBlock，ToolConentBlock作为ToolMessage的主体内容。FunctionResultContent的Result和Exception属性分别用于存储工具调用的结果和调用过程中发生的异常信息，如果工具调用成功，Result属性将包含工具执行的结果；如果工具调用失败，Exception属性将包含相关的异常信息。

csharp 复制代码

public class ToolResultContent : AIContent
{
	public string CallId { get; }
	public ToolResultContent(string callId); 
}

public class FunctionResultContent : ToolResultContent
{
	public object? Result { get; set; }
	public Exception? Exception { get; set; }
}

对于上面介绍的四个ToolCallContent类型（FunctionCallContent、ImageGenerationToolCallContent、McpServerToolCallContent和WebSearchToolCallContent），它们都有对应的ToolResultContent类型，分别用于表示对应工具调用的结果内容，如FunctionResultContent、ImageGenerationToolResultContent、McpServerToolResultContent和WebSearchToolResultContent等。

2.2.6 其他

除了上面介绍的这些AIContent类型之外，MAF中还有一些其他类型的AIContent：

ErrorContent：用于表示错误内容；
HostedFileContent：用于表示托管文件内容；
HostedVectorStoreContent：用于表示托管向量存储内容；
UriContent：用于表示URI内容；
UsageContent：用于表示Token消费统计内容；
InputRequestContent：LLM在执行过程中如果需要从用户那里获取一些输入信息，可以通过InputRequestContent来向用户发出输入请求。ToolApprovalRequestContent是它的子类，当LLM生成需要人工审批的工具调用时，会反向请求用户进行审批，此对象用于描述这样的审批请求;
InputResponseContent：针对InputRequestContent的响应内容，用于表示用户提供的输入信息。ToolApprovalResponseContent是它的子类，描述针对ToolApprovalRequestContent的响应内容;