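06_LangChain Multimodal Input and Custom Output

Multimodal Data Input

When building modern AI applications, handling several kinds of input data (text, images, audio, and so on) matters more and more. LangChain supports multimodal input, so developers can pass different types of data to a model through the same message interface.

Image Input Example

LangChain expects all inputs to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, LangChain adds logic inside the provider's class to convert to that provider's expected format.

The following example shows how to pass an image to a model as input:

python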

import os
import base64
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Set the OpenAI API key (in real projects, prefer exporting it in the shell
# over hard-coding it in source files)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Initialize a vision-capable model (gpt-4o accepts image input;
# the older gpt-4-vision-preview has been retired)
model = ChatOpenAI(model="gpt-4o", max_tokens=1024)

# Method 1: pass the image as a base64-encoded string (the most common approach)
def get_image_bytes(image_path):
    """Read an image file and return its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Assume a local image file exists
image_path = "path/to/your/image.jpg"
image_bytes = get_image_bytes(image_path)

# Build a message that contains the image
message_with_image = HumanMessage(
    content=[
        {"type": "text", "text": "What is in this picture? Please describe it in detail."},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_bytes}"
            }
        }
    ]
)

# Send the message to the model
response = model.invoke([message_with_image])
print(response.content)
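
The example above hard-codes `image/jpeg` in the data URL, which is wrong for PNG or WebP files. As a minimal sketch, assuming the file extension reflects the real image type, the MIME type can be guessed with the standard library. The helper names `image_to_data_url` and `image_block` are hypothetical and not part of LangChain:

```python
import base64
import mimetypes
import os
import tempfile


def image_to_data_url(image_path: str) -> str:
    """Return a base64 data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_block(image_path: str) -> dict:
    """Build an OpenAI-style image content block for HumanMessage(content=[...])."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}}


# Demo with a throwaway file; the bytes are placeholders, not a real PNG image.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.png")
    with open(demo_path, "wb") as demo_file:
        demo_file.write(b"placeholder-bytes")
    print(image_block(demo_path)["image_url"]["url"])
```

The returned dictionary can be placed directly in the `content` list next to a text block, in the same position as the hand-written `image_url` dictionaries in this chapter.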

Using an Image URL

If the image is already hosted on the web, we can provide its URL directly:

python

# Method 2: use an image URL directly
image_url = "https://example.com/image.jpg"

message_with_image_url = HumanMessage(
    content=[
        {"type": "text", "text": "What does this picture show?"},
        {
            "type": "image_url",
            "image_url": {
                "url": image_url
            }
        }
    ]
)

response = model.invoke([message_with_image_url])
print(response.content)

Passing Multiple Images

LangChain also supports passing several images in a single message:

python

# Method 3: pass multiple images
image_path1 = "path/to/your/image1.jpg"
image_path2 = "path/to/your/image2.jpg"
image_bytes1 = get_image_bytes(image_path1)
image_bytes2 = get_image_bytes(image_path2)

message_with_multiple_images = HumanMessage(
    content=[
        {"type": "text", "text": "Compare these two pictures. How do they differ?"},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_bytes1}"
            }
        },
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_bytes2}"
            }
        }
    ]
)

response = model.invoke([message_with_multiple_images])
print(response.content)

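Multimodal Tool Calling

Some multimodal models also support tool calling. To call tools with such a model, bind the tools to it in the usual way, then invoke the model with content blocks of the required type (for example, blocks containing image data).

python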

from langchain_core.tools import tool

@tool
def image_analyzer(image_description: str) -> str:
    """Analyze image content and return detailed information."""
    return f"Analysis result: {image_description} contains the following elements..."

# Bind the tool to the model
model_with_tools = model.bind_tools([image_analyzer])

# Build a message that contains the image
message = HumanMessage(
    content=[
        {"type": "text", "text": "Analyze this picture and tell me what is in it."},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_bytes}"
            }
        }
    ]
)

# Call the model
response = model_with_tools.invoke([message])
print(response.content)  # may be empty when the model answers with a tool call

# If the model decided to call a tool, inspect the requested calls
if response.tool_calls:
    print("Tool called:", response.tool_calls)
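
When the model does request a tool, `response.tool_calls` holds a list of dictionaries with `name`, `args`, and `id` keys, and the application is responsible for actually running the tool. The following is a minimal sketch of that dispatch step, using plain dictionaries and a plain function in place of a real model response. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and the sample call is made up for illustration:

```python
from typing import Any, Callable, Dict, List


def image_analyzer(image_description: str) -> str:
    """Stand-in for the tool in this section, written as a plain function."""
    return f"Analysis result: {image_description} contains the following elements..."


# Hypothetical registry that maps tool names to callables
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"image_analyzer": image_analyzer}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Run each requested tool and pair its output with the originating call id."""
    results = []
    for call in tool_calls:
        tool_fn = TOOL_REGISTRY.get(call["name"])
        if tool_fn is None:
            output = f"Unknown tool: {call['name']}"
        else:
            output = tool_fn(**call["args"])
        results.append({"tool_call_id": call["id"], "content": output})
    return results


# Made-up example in the same shape as response.tool_calls
sample_calls = [
    {"name": "image_analyzer", "args": {"image_description": "a red bicycle"}, "id": "call_1"}
]
for result in run_tool_calls(sample_calls):
    print(result)
```

Each result could then be wrapped in a `ToolMessage` (from `langchain_core.messages`) and sent back to the model together with the earlier messages, so that the model can write its final answer.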

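Custom Output: JSON, XML, YAML

Large language models usually respond in natural-language text, but many applications need structured data. LangChain provides several output parsers that turn model output into structured formats.

JSON Output Parsing

Some model providers offer built-in ways to return structured output, but not all do. With JsonOutputParser we can specify an arbitrary JSON schema in the prompt, ask the model for output that conforms to that schema, and finally parse that output as JSON.

python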

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List

# Define the expected output structure
class Movie(BaseModel):
    title: str = Field(description="Movie title")
    director: str = Field(description="Director's name")
    year: int = Field(description="Release year")
    rating: float = Field(description="Rating (1-10)")
    genres: List[str] = Field(description="List of genres")

# Create the JSON output parser
parser = JsonOutputParser(pydantic_object=Movie)

# Create the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a movie expert. Provide detailed information about the following movie."),
    ("system", "Return the information in the following JSON format:\n{format_instructions}"),
    ("human", "{movie_name}")
])

# Inject the parser's format instructions into the prompt
prompt_with_parser = prompt.partial(
    format_instructions=parser.get_format_instructions()
)

# Build the chain
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = prompt_with_parser | model | parser

# Invoke the chain. JsonOutputParser returns a plain dict, not a Movie instance,
# so fields are read by key.
result = chain.invoke({"movie_name": "The Matrix"})
print(f"Title: {result['title']}")
print(f"Director: {result['director']}")
print(f"Release year: {result['year']}")
print(f"Rating: {result['rating']}")
print(f"Genres: {', '.join(result['genres'])}")
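
Chat models often wrap JSON in a Markdown code fence or add a sentence around it, which is one reason to use a parser instead of calling `json.loads` on the raw reply. The following standard-library sketch illustrates that idea. It is not LangChain's actual implementation, `extract_json` is a hypothetical helper, and the reply text is made up:

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker

# Matches the body of the first fenced block, with an optional "json" language tag
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply: str):
    """Parse JSON from a model reply, tolerating a surrounding Markdown code fence."""
    match = FENCED_BLOCK.search(reply)
    candidate = match.group(1) if match else reply
    return json.loads(candidate.strip())


# Made-up reply: one sentence of prose followed by fenced JSON
reply = (
    "Here is the data:\n"
    + FENCE + "json\n"
    + '{"title": "The Matrix", "year": 1999}\n'
    + FENCE
)
movie = extract_json(reply)
print(movie["title"], movie["year"])
```

A reply that contains no fence is passed to `json.loads` unchanged, so malformed output still raises `json.JSONDecodeError` and can be handled by the caller.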

JSON Streaming

JsonOutputParser supports streaming partial chunks, which is especially useful for large responses:

python

from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# JSON parser without a Pydantic model
parser = JsonOutputParser()

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a movie recommendation expert. Generate a list of movies with title, director, and a short description."),
    ("system", "Return JSON: a list of movies, each with title, director, and description fields.\n{format_instructions}"),
    ("human", "Recommend 5 {genre} movies")
])

# Inject the parser's format instructions into the prompt
prompt_with_parser = prompt.partial(
    format_instructions=parser.get_format_instructions()
)

# Build the chain
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
chain = prompt_with_parser | model | parser

# Stream the result; each chunk is the partial JSON parsed so far
for chunk in chain.stream({"genre": "science fiction"}):
    # In a real application, the UI could be updated incrementally here
    print("Received new chunk:", chunk)
XML Output Parsing

For some applications, XML is a better fit. LangChain provides XMLOutputParser to handle XML-formatted output:

python

from langchain_core.output_parsers import XMLOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Create the XML output parser
# (by default it relies on the defusedxml package: pip install defusedxml)
parser = XMLOutputParser()

# Create the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a book expert. Provide detailed information about the following book."),
    ("system", "Return the information in XML format. {format_instructions}"),
    ("human", "{book_title}")
])

# Inject the parser's format instructions into the prompt
prompt_with_parser = prompt.partial(
    format_instructions=parser.get_format_instructions()
)

# Build the chain
model = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = prompt_with_parser | model | parser

# Invoke the chain
result = chain.invoke({"book_title": "The Three-Body Problem"})
print(result)

Custom XML Tags

We can steer the XML output toward specific tags by spelling them out in the prompt:

python

# Create a prompt template with custom tags
custom_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a book expert. Provide detailed information about the following book."),
    ("system", """Return the information in the following XML format:
    <book>
        <title>Book title</title>
        <author>Author</author>
        <year>Year of publication</year>
        <summary>Short summary</summary>
        <rating>Rating (1-10)</rating>
    </book>
    """),
    ("human", "{book_title}")
])

# Build the chain
custom_chain = custom_prompt | model | parser

# Invoke the chain
custom_result = custom_chain.invoke({"book_title": "The Three-Body Problem"})
print(custom_result)
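
To make the shape of a parsed result concrete, here is a small standard-library sketch that converts XML text into nested dictionaries and lists, roughly the kind of structure an XML output parser hands back. It is an illustration only, not LangChain's implementation. `element_to_dict` is a hypothetical helper, and the book data is sample input written for this example:

```python
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict:
    """Convert an element into {tag: text} or {tag: [child dicts]} recursively."""
    children = list(element)
    if not children:
        return {element.tag: (element.text or "").strip()}
    return {element.tag: [element_to_dict(child) for child in children]}


# Sample XML in the format requested by the prompt in this section
sample_xml = """
<book>
    <title>The Three-Body Problem</title>
    <author>Liu Cixin</author>
    <year>2008</year>
</book>
"""

parsed = element_to_dict(ET.fromstring(sample_xml.strip()))
print(parsed)
```

Representing children as a list of single-key dictionaries keeps repeated tags (for example several `<author>` elements) without overwriting one another, at the cost of slightly more verbose lookups.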

XML Streaming

XMLOutputParser also supports streaming:

python

# Stream the XML output
for chunk in custom_chain.stream({"book_title": "The Three-Body Problem"}):
    print("Received new XML chunk:", chunk)

YAML Output Parsing

Some models may produce YAML more reliably than JSON. LangChain provides YamlOutputParser to handle YAML-formatted output:

python

# YamlOutputParser is provided by the langchain package (pip install langchain),
# not by langchain_core
from langchain.output_parsers import YamlOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List

# Define the expected output structure
class Recipe(BaseModel):
    name: str = Field(description="Recipe name")
    ingredients: List[str] = Field(description="List of required ingredients")
    steps: List[str] = Field(description="Cooking steps")
    prep_time: str = Field(description="Preparation time")
    cook_time: str = Field(description="Cooking time")
    servings: int = Field(description="Number of servings")

# Create the YAML output parser
yaml_parser = YamlOutputParser(pydantic_object=Recipe)

# Create the prompt template
yaml_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a professional chef. Provide a detailed recipe for the following dish."),
    ("system", "Return the information in the following YAML format:\n{format_instructions}"),
    ("human", "{dish_name}")
])

# Inject the parser's format instructions into the prompt
yaml_prompt_with_parser = yaml_prompt.partial(
    format_instructions=yaml_parser.get_format_instructions()
)

# Build the chain
yaml_chain = yaml_prompt_with_parser | model | yaml_parser

# Invoke the chain. YamlOutputParser returns a Recipe instance,
# so attribute access works here.
recipe = yaml_chain.invoke({"dish_name": "Kung Pao chicken"})
print(f"Dish: {recipe.name}")
print(f"Ingredients: {', '.join(recipe.ingredients)}")
print(f"Prep time: {recipe.prep_time}")
print(f"Cook time: {recipe.cook_time}")
print(f"Servings: {recipe.servings}")
print("Steps:")
for i, step in enumerate(recipe.steps, 1):
    print(f"{i}. {step}")

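Complete Example: Multimodal Input with Structured Output

The following complete example combines multimodal input with structured output. It takes an image as input and returns the image analysis as JSON:

python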

import os
import base64
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field
from typing import List

# Set the OpenAI API key
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Define the expected output structure
class ImageAnalysis(BaseModel):
    main_subject: str = Field(description="Main subject of the image")
    objects: List[str] = Field(description="List of objects identified in the image")
    colors: List[str] = Field(description="Dominant colors in the image")
    scene_type: str = Field(description="Scene type (indoor, outdoor, etc.)")
    mood: str = Field(description="Mood or atmosphere conveyed by the image")
    description: str = Field(description="Short description of the image")

# Create the JSON output parser
parser = JsonOutputParser(pydantic_object=ImageAnalysis)

# Initialize a vision-capable model (gpt-4o accepts image input;
# the older gpt-4-vision-preview has been retired)
model = ChatOpenAI(model="gpt-4o", max_tokens=1024)

# Read the image file and return it as a base64-encoded string
def get_image_bytes(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Assume a local image file exists
image_path = "path/to/your/image.jpg"
image_bytes = get_image_bytes(image_path)

# Build a message that contains the image and the format instructions
message_with_image = HumanMessage(
    content=[
        {
            "type": "text",
            "text": f"""Analyze this picture and provide detailed information.
            Return the result in the following JSON format:
            {parser.get_format_instructions()}
            """
        },
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{image_bytes}"
            }
        }
    ]
)

# Send the message to the model and parse the result
response = model.invoke([message_with_image])
print("Raw response:", response.content)

# Try to parse the JSON. JsonOutputParser returns a plain dict,
# so fields are read by key.
try:
    analysis = parser.parse(response.content)
    print("\nParsed structured data:")
    print(f"Subject: {analysis['main_subject']}")
    print(f"Objects: {', '.join(analysis['objects'])}")
    print(f"Colors: {', '.join(analysis['colors'])}")
    print(f"Scene type: {analysis['scene_type']}")
    print(f"Mood: {analysis['mood']}")
    print(f"Description: {analysis['description']}")
except Exception as e:
    print(f"Parsing error: {e}")
    print("Check whether the model output matches the expected JSON format")
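
`JsonOutputParser` uses the Pydantic model to generate the format instructions, but it returns a plain dictionary and does not check that every field is present. As a minimal standard-library sketch of a post-parse check, assuming the field names of the `ImageAnalysis` model in this section (`REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the sample dictionary is made up):

```python
from typing import Any, Dict, Iterable, List

# Field names copied from the ImageAnalysis model in this section
REQUIRED_FIELDS = ("main_subject", "objects", "colors", "scene_type", "mood", "description")


def missing_fields(data: Dict[str, Any], required: Iterable[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required keys that are absent from the parsed dictionary."""
    return [name for name in required if name not in data]


# Made-up parser output that lacks two fields
partial_result = {
    "main_subject": "a cat",
    "objects": ["cat", "sofa"],
    "colors": ["gray"],
    "scene_type": "indoor",
}
print(missing_fields(partial_result))
```

When full type validation is needed, `PydanticOutputParser` from `langchain_core.output_parsers` validates the output against the model and returns a model instance instead of a dictionary.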

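Real-World Scenario: E-commerce Product Analysis System

The following example shows how to build an e-commerce product analysis system. It accepts a product image, analyzes the product's features, and outputs the result in a structured format that is easy to integrate into an existing e-commerce platform.

System Architecture

Front-end interface: users upload product images
Back-end service: processes the image and calls the LangChain multimodal chain
Database: stores analysis results
API: exposes analysis results to other systems

Implementation

python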

import os
import base64
import json
from fastapi import FastAPI, File, Form, UploadFile, HTTPException
from fastapi.responses import JSONResponse
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field
from typing import List, Optional
import sqlite3
from datetime import datetime
import uvicorn

# Product analysis result model
class ProductFeatures(BaseModel):
    product_type: str = Field(description="Product type, such as 'clothing', 'electronics', 'furniture'")
    brand: Optional[str] = Field(default=None, description="Brand name (if visible)")
    colors: List[str] = Field(description="Main colors of the product")
    materials: List[str] = Field(description="Product materials, such as 'cotton', 'leather', 'plastic'")
    features: List[str] = Field(description="List of product features")
    target_audience: str = Field(description="Target audience, such as 'young people', 'professionals', 'children'")
    style: str = Field(description="Product style, such as 'modern', 'retro', 'minimalist'")
    quality_impression: str = Field(description="Quality impression, rated from 1 to 10")
    suggested_price_range: str = Field(description="Suggested price range")
    description: str = Field(description="Short marketing description of the product, 100 characters or fewer")
    seo_keywords: List[str] = Field(description="Suggested SEO keywords, at most 5")

# Create the FastAPI application
app = FastAPI(title="E-commerce Product Analysis API")

# Initialize the database
def init_db():
    conn = sqlite3.connect('product_analysis.db')
    cursor = conn.cursor()
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS analyses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id TEXT,
        analysis_data TEXT,
        created_at TIMESTAMP
    )
    ''')
    conn.commit()
    conn.close()

# Save an analysis result to the database and return its row id
def save_analysis(product_id, analysis_data):
    conn = sqlite3.connect('product_analysis.db')
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO analyses (product_id, analysis_data, created_at) VALUES (?, ?, ?)",
        (product_id, json.dumps(analysis_data), datetime.now().isoformat())
    )
    conn.commit()
    analysis_id = cursor.lastrowid
    conn.close()
    return analysis_id

# Analyze a product image
async def analyze_product_image(image_bytes, product_id):
    try:
        # Set the OpenAI API key
        os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

        # Create the JSON output parser
        parser = JsonOutputParser(pydantic_object=ProductFeatures)

        # Initialize a vision-capable model (gpt-4o accepts image input;
        # the older gpt-4-vision-preview has been retired)
        model = ChatOpenAI(model="gpt-4o", max_tokens=1024)

        # Encode the image as base64
        encoded_image = base64.b64encode(image_bytes).decode('utf-8')

        # Build a message that contains the image and the format instructions
        message_with_image = HumanMessage(
            content=[
                {
                    "type": "text",
                    "text": f"""As an e-commerce product analysis expert, analyze this product image and provide detailed information.
                    The analysis should cover product type, materials, colors, style, target audience, and similar attributes.
                    Return the result in the following JSON format:
                    {parser.get_format_instructions()}
                    """
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded_image}"
                    }
                }
            ]
        )

        # Send the message to the model (ainvoke avoids blocking the event loop)
        response = await model.ainvoke([message_with_image])

        # Parse the JSON; JsonOutputParser returns a plain dict
        analysis = parser.parse(response.content)

        # Save to the database
        analysis_id = save_analysis(product_id, analysis)

        # Return the result
        return {
            "analysis_id": analysis_id,
            "product_id": product_id,
            "analysis": analysis
        }

    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Analysis failed: {str(e)}")

# API routes
# product_id is declared as a form field so that it matches a multipart request
# that sends both the file and the id as form parts
@app.post("/analyze-product")
async def analyze_product(product_id: str = Form(...), file: UploadFile = File(...)):
    if not (file.content_type or "").startswith("image/"):
        raise HTTPException(status_code=400, detail="Only image files are accepted")

    image_bytes = await file.read()
    result = await analyze_product_image(image_bytes, product_id)
    return JSONResponse(content=result)

@app.get("/analysis/{analysis_id}")
async def get_analysis(analysis_id: int):
    conn = sqlite3.connect('product_analysis.db')
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM analyses WHERE id = ?", (analysis_id,))
    result = cursor.fetchone()
    conn.close()

    if not result:
        raise HTTPException(status_code=404, detail="Analysis result not found")

    return {
        "analysis_id": result[0],
        "product_id": result[1],
        "analysis": json.loads(result[2]),
        "created_at": result[3]
    }

# Create the table when the application starts
@app.on_event("startup")
async def startup_event():
    init_db()

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
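
The storage functions in this service open and close each connection by hand, so an exception raised between `connect` and `close` would leave the connection open. As a sketch of one alternative, `contextlib.closing` guarantees the close, and `with conn:` commits on success or rolls back on error. The example uses an in-memory database and made-up data so that it runs on its own. Passing the connection as a parameter and the `load_analysis` helper are additions for illustration:

```python
import json
import sqlite3
from contextlib import closing
from datetime import datetime

SCHEMA = """
CREATE TABLE IF NOT EXISTS analyses (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id TEXT,
    analysis_data TEXT,
    created_at TIMESTAMP
)
"""


def save_analysis(conn: sqlite3.Connection, product_id: str, analysis_data: dict) -> int:
    """Insert one analysis as JSON text and return the new row id."""
    with conn:  # commits on success, rolls back if the insert raises
        cursor = conn.execute(
            "INSERT INTO analyses (product_id, analysis_data, created_at) VALUES (?, ?, ?)",
            (product_id, json.dumps(analysis_data), datetime.now().isoformat()),
        )
    return cursor.lastrowid


def load_analysis(conn: sqlite3.Connection, analysis_id: int):
    """Return the stored analysis as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT product_id, analysis_data FROM analyses WHERE id = ?", (analysis_id,)
    ).fetchone()
    if row is None:
        return None
    return {"product_id": row[0], "analysis": json.loads(row[1])}


# In-memory database so the sketch leaves no file behind
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute(SCHEMA)
    new_id = save_analysis(conn, "12345", {"product_type": "chair", "colors": ["brown"]})
    print(load_analysis(conn, new_id))
```

In the service itself the same pattern would use the `product_analysis.db` file path instead of `:memory:`.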

Usage

Install the dependencies:

bash

pip install fastapi uvicorn python-multipart langchain-openai pydantic

Run the server (assuming the code above is saved as product_analysis_service.py):

bash

python product_analysis_service.py

Send a request:

bash

curl -X POST -F "file=@product_image.jpg" -F "product_id=12345" http://localhost:8000/analyze-product

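System Advantages

Automated product description generation: reduces the time spent writing product descriptions by hand
Consistent product classification: standardized categories based on AI analysis
SEO optimization: relevant keywords are generated automatically
Multilingual support: can be extended to generate product descriptions in multiple languages
Extensibility: can be integrated into existing e-commerce platforms

Practical Applications

This system can be applied in the following scenarios:

Bulk product upload: when a merchant needs to upload many products, descriptions and features are generated automatically
Product classification: products are automatically assigned the most suitable categories and tags
Marketing content generation: appealing marketing descriptions are generated for products
Competitor analysis: the features and positioning of competitors' products are analyzed
User experience improvement: similar or complementary products are recommended based on product features

With this system, an e-commerce platform can speed up product uploads, improve product data quality, and offer better search and recommendations.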

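Conclusion

LangChain's multimodal input and custom output features let us build more sophisticated and practical AI applications. With multimodal input, we can pass images and other non-text data to a model. With custom output parsers, we get structured response data that is easy to process and integrate downstream.

These features are particularly well suited to the following scenarios:

Image analysis and description applications
Multimedia content management systems
APIs and services that require structured data
Data extraction and transformation tools
Automated report generation systems

As large language models continue to improve, combining them with the tools LangChain provides lets us build smarter, more practical AI applications that meet a wide range of complex business needs.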

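Chapter Summary

In this chapter, we explored LangChain's multimodal input and custom output features, which greatly widen what AI applications can do. The main topics were:

Multimodal input:
- How to pass an image to a model as input
- The different approaches for local images and image URLs
- Passing multiple images in a single request
- Combining multimodal input with tool calling
Custom output formats:
- JSON output parsing and streaming
- XML output parsing and custom tags
- YAML output parsing
- Using Pydantic models to define the output structure
Real-world scenario:
- Building an e-commerce product analysis system
- Combining multimodal input with structured output
- Serving results through an API and storing analysis results

These techniques let us build smarter, more practical AI applications, from simple image description to a full product analysis system. Structured output makes it easier to integrate AI into existing systems and improves both data-processing efficiency and user experience.

Next chapter: 07_LangChain Agents and Tool Use. We will explore how to build intelligent agents with LangChain. These agents can use tools, perform actions, and solve complex problems, further extending what AI applications can do.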