Agent Framework: Performance Optimization

Overview

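When building AI agent applications, performance optimization is key to making them run efficiently and deliver a good user experience. This article covers the key areas of performance optimization in AI agent applications, along with practical techniques and testing methods.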

Why does performance optimization matter?

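Imagine your AI customer-service assistant took 30 seconds to answer every question. How would users feel? Performance optimization is like fitting your agent with a turbocharger: it makes the agent faster and more efficient.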

Common symptoms of performance problems

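Slow responses: users wait more than 5 seconds
High resource usage: excessive CPU and memory consumption
Poor concurrency: the app cannot handle multiple requests at once
High cost: API charges exceed the budget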

Key areas of performance optimization

1. Reduce the number of API calls

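Every call to an AI model costs time and money, so cutting unnecessary calls is the most direct optimization. A related source of waste is setup work: creating a new client, agent, and conversation thread for every message repeats initialization and throws away the conversation context.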

Techniques

❌ Bad practice: recreate everything on every call

// Creates a new agent and conversation thread for every user message
public async Task<string> ProcessMessage(string userMessage)
{
    var agent = new ChatCompletionAgent(/* ... */);
    var thread = new AgentThread();
    
    await thread.AddUserMessageAsync(userMessage);
    var response = await agent.InvokeAsync(thread);
    
    return response.Content;
}

✅ Good practice: reuse the agent and conversation threads

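using System.Collections.Concurrent;

// Reuse the agent instance and one conversation thread per user
private readonly ChatCompletionAgent _agent; // created once, e.g. in the constructor
private readonly ConcurrentDictionary<string, AgentThread> _userThreads = new();

public async Task<string> ProcessMessage(string userId, string userMessage)
{
    // Get this user's conversation thread, creating it on first use (safe under concurrent requests)
    var thread = _userThreads.GetOrAdd(userId, _ => new AgentThread());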
    
    await thread.AddUserMessageAsync(userMessage);
    var response = await _agent.InvokeAsync(thread);
    
    return response.Content;
}

Performance gain: about 50% less initialization overhead
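Keeping one thread per user means the dictionary gains an entry for every user and never shrinks. A minimal sketch of a per-user store that also evicts idle conversations, assuming a hypothetical `SessionStore<TSession>` helper (not an Agent Framework type) and an idle timeout chosen by the caller:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical helper, not an Agent Framework type: one conversation object per user,
// created on first use and evicted after a period of inactivity.
public sealed class SessionStore<TSession>
{
    private sealed class Entry
    {
        public Entry(TSession session) => Session = session;
        public TSession Session { get; }
        public DateTime LastUsedUtc { get; set; } = DateTime.UtcNow;
    }

    private readonly ConcurrentDictionary<string, Entry> _entries = new();
    private readonly Func<TSession> _factory;

    public SessionStore(Func<TSession> factory) => _factory = factory;

    public int Count => _entries.Count;

    // Returns the user's existing session or creates one. Under a race GetOrAdd may call
    // the factory twice, but only one result is stored and returned to every caller.
    public TSession GetOrCreate(string userId)
    {
        var entry = _entries.GetOrAdd(userId, _ => new Entry(_factory()));
        entry.LastUsedUtc = DateTime.UtcNow;
        return entry.Session;
    }

    // Removes sessions idle for longer than maxIdle and returns how many were removed.
    public int EvictIdle(TimeSpan maxIdle)
    {
        var cutoff = DateTime.UtcNow - maxIdle;
        var removed = 0;
        foreach (var pair in _entries.Where(p => p.Value.LastUsedUtc < cutoff).ToList())
        {
            // Removes only if the key still maps to the same entry object
            if (_entries.TryRemove(pair)) removed++;
        }
        return removed;
    }
}
```

With the agent code above, the factory would be `() => new AgentThread()`, and `EvictIdle` could run periodically, for example from a timer. The eviction check is approximate: a session evicted at the moment it is being used is simply recreated on the user's next message, without its history.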

2. Use a caching strategy

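For repeated questions, a cache lets you skip calling the AI model again. The simple cache below matches only identical text (ignoring case); catching similar but differently worded questions requires normalizing the text or a semantic, embedding-based cache.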

Implementing a simple cache

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public class AgentResponseCache
{
    private readonly ConcurrentDictionary<string, CacheEntry> _cache = new();
    private readonly TimeSpan _expirationTime = TimeSpan.FromMinutes(30);
    
    private class CacheEntry
    {
        public string Response { get; set; }
        public DateTime CreatedAt { get; set; }
    }
    
    // Build the cache key: SHA-256 hash of the case-normalized message
    private string GenerateCacheKey(string message)
    {
        using var sha256 = SHA256.Create();
        var hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(message.ToLowerInvariant()));
        return Convert.ToBase64String(hash);
    }
    
    // Try to get a response from the cache
    public bool TryGetCachedResponse(string message, out string response)
    {
        var key = GenerateCacheKey(message);
        
        if (_cache.TryGetValue(key, out var entry))
        {
            // Check whether the entry has expired
            if (DateTime.UtcNow - entry.CreatedAt < _expirationTime)
            {
                response = entry.Response;
                return true;
            }
            else
            {
                // Remove the expired entry
                _cache.TryRemove(key, out _);
            }
        }
        
        response = null;
        return false;
    }
    
    // Add a response to the cache
    public void CacheResponse(string message, string response)
    {
        var key = GenerateCacheKey(message);
        _cache[key] = new CacheEntry
        {
            Response = response,
            CreatedAt = DateTime.UtcNow
        };
    }
    
    // Remove all expired entries (call periodically, e.g. from a background timer)
    public void CleanupExpiredEntries()
    {
        var expiredKeys = _cache
            .Where(kvp => DateTime.UtcNow - kvp.Value.CreatedAt >= _expirationTime)
            .Select(kvp => kvp.Key)
            .ToList();
            
        foreach (var key in expiredKeys)
        {
            _cache.TryRemove(key, out _);
        }
    }
}
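The cache above treats "What is AI?" and "what is  AI ？" as different questions, because only letter case is normalized before hashing. A minimal sketch of a stricter normalization step, assuming a hypothetical `CacheKeyNormalizer` helper whose rules (Unicode NFKC, lowercasing, collapsing whitespace, dropping trailing punctuation) are illustrative choices:

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: normalizes a question before it is hashed into a cache key, so
// trivially different spellings of the same text share one cache entry.
public static class CacheKeyNormalizer
{
    private static readonly Regex Whitespace = new(@"\s+", RegexOptions.Compiled);

    public static string Normalize(string message)
    {
        // NFKC folds compatibility characters, e.g. the full-width '？' becomes '?'
        var text = message.Normalize(NormalizationForm.FormKC).ToLowerInvariant();

        // Collapse runs of whitespace, then drop trailing sentence punctuation
        text = Whitespace.Replace(text, " ").Trim();
        return text.TrimEnd('?', '!', '.', '。').TrimEnd();
    }
}
```

It would be applied as `GenerateCacheKey(CacheKeyNormalizer.Normalize(message))`. Matching questions that are worded differently, rather than merely formatted differently, needs a semantic cache based on embeddings, which this sketch does not attempt.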

An agent wrapper that uses the cache

public class CachedAgent
{
    private readonly ChatCompletionAgent _agent;
    private readonly AgentResponseCache _cache;
    
    public CachedAgent(ChatCompletionAgent agent)
    {
        _agent = agent;
        _cache = new AgentResponseCache();
    }
    
    public async Task<string> ProcessMessageAsync(AgentThread thread, string message)
    {
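        // Check the cache first. The key is the message text only, so only cache answers
        // that do not depend on earlier turns in the conversation.
        if (_cache.TryGetCachedResponse(message, out var cachedResponse))
        {
            Console.WriteLine("✓ Returned response from cache");
            return cachedResponse;
        }
        
        // Cache miss: call the AI model
        Console.WriteLine("→ Calling the AI model");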
        await thread.AddUserMessageAsync(message);
        var response = await _agent.InvokeAsync(thread);
        var content = response.Content;
        
        // Store the response in the cache
        _cache.CacheResponse(message, content);
        
        return content;
    }
}

Performance gain: on a cache hit, response time drops by about 90%
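The cache above still lets duplicate work through: if several users send the same question at the same moment, every request misses the cache and calls the model. One way to avoid this is request coalescing (sometimes called single-flight). A sketch, assuming the model call can be passed in as a `Func<string, Task<string>>`; `CoalescingCache` is a hypothetical name and the sketch has no expiry:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: concurrent requests for the same question share one model call.
// The model call is a delegate, so the sketch does not depend on any agent API.
public sealed class CoalescingCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _entries = new();
    private readonly Func<string, Task<string>> _callModel;

    public CoalescingCache(Func<string, Task<string>> callModel) => _callModel = callModel;

    public async Task<string> GetAsync(string question)
    {
        // Lazy<T> ensures the stored entry starts the model call at most once,
        // even if GetOrAdd creates a spare Lazy under a race
        var lazy = _entries.GetOrAdd(question, q => new Lazy<Task<string>>(
            () => _callModel(q), LazyThreadSafetyMode.ExecutionAndPublication));
        try
        {
            return await lazy.Value;
        }
        catch
        {
            // Do not keep failures: drop this entry so the next request retries
            _entries.TryRemove(new KeyValuePair<string, Lazy<Task<string>>>(question, lazy));
            throw;
        }
    }
}
```

Successful answers stay in the dictionary indefinitely here, so in practice this would be combined with an expiry policy like the one in `AgentResponseCache`.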

3. Optimize prompt length

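The longer the prompt, the longer it takes to process and the more it costs. System instructions are sent with every request, so every token saved there is saved on every call.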

Techniques

❌ Verbose prompt

var instructions = @"
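You are a very professional customer-service assistant. You need to help users solve all kinds of problems.
You should always stay polite and professional. You need to carefully understand the user's question and then give a detailed answer.
If you don't know the answer, you should honestly tell the user that you don't know.
You should use simple, easy-to-understand language and avoid overly technical jargon.
You should make sure your answers are accurate and helpful.
...(plus a lot more repetitive text)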
";

✅ Concise prompt

var instructions = @"
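You are a professional customer-service assistant.
- Answer user questions politely and accurately
- Use simple, easy-to-understand language
- Say so honestly when you are unsure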
";

Performance gain: 30-40% fewer tokens consumed
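Because the instructions travel with every request, their length is multiplied by the request count. A rough way to compare two prompts, assuming character counts as a stand-in for tokens; `PromptSize` is a hypothetical helper, exact numbers need the model's tokenizer, and the characters-per-token ratio differs between Chinese and English text:

```csharp
using System;

// Rough sketch: character counts stand in for tokens here. Treat the result as an
// estimate of the relative saving, not as an exact token count.
public static class PromptSize
{
    // System instructions are sent with every request, so their size multiplies by the request count
    public static long InstructionCharsSent(string instructions, long requests) =>
        instructions.Trim().Length * requests;

    // Fraction of instruction characters saved by replacing 'before' with 'after'
    public static double Savings(string before, string after) =>
        1.0 - (double)after.Trim().Length / before.Trim().Length;
}
```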

4. Use streaming responses

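For long responses, streaming lets users see output sooner. The total generation time stays the same; what improves is the time until the first words appear.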

public async Task StreamResponseAsync(AgentThread thread, string message)
{
    await thread.AddUserMessageAsync(message);
    
    Console.Write("AI: ");
    
    // Stream the response and print each chunk as soon as it arrives
    await foreach (var update in _agent.InvokeStreamingAsync(thread))
    {
        if (update.Content != null)
        {
            Console.Write(update.Content);
        }
    }
    
    Console.WriteLine();
}

User experience gain: about 70% less perceived waiting time
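Streaming changes when the user first sees output, not how long generation takes in total. A small sketch that measures both for any `IAsyncEnumerable<string>`; `StreamingTiming` and `FakeStream` are hypothetical, and `FakeStream` only simulates a model by yielding chunks after a fixed delay:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

public static class StreamingTiming
{
    // Hypothetical stand-in for a model's streaming response
    public static async IAsyncEnumerable<string> FakeStream(int chunks, int delayMs)
    {
        for (var i = 0; i < chunks; i++)
        {
            await Task.Delay(delayMs); // simulated generation time per chunk
            yield return $"chunk{i} ";
        }
    }

    // Returns the time to the first chunk, the total time, and the concatenated text
    public static async Task<(TimeSpan FirstChunk, TimeSpan Total, string Text)> MeasureAsync(
        IAsyncEnumerable<string> stream)
    {
        var stopwatch = Stopwatch.StartNew();
        TimeSpan? firstChunk = null;
        var text = new StringBuilder();

        await foreach (var chunk in stream)
        {
            firstChunk ??= stopwatch.Elapsed; // the moment the user first sees output
            text.Append(chunk);
        }

        return (firstChunk ?? stopwatch.Elapsed, stopwatch.Elapsed, text.ToString());
    }
}
```

For a real agent, project each streamed update's `Content` to a string and pass that sequence to `MeasureAsync`.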

5. Process multiple requests in parallel

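When you need to handle several independent requests, processing them in parallel can improve throughput significantly. Keep the provider's rate limits in mind: sending a large batch all at once can trigger throttling errors.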

public async Task<List<string>> ProcessMultipleQuestionsAsync(List<string> questions)
{
    // Create an independent task, with its own thread, for each question
    var tasks = questions.Select(async question =>
    {
        var thread = new AgentThread();
        await thread.AddUserMessageAsync(question);
        var response = await _agent.InvokeAsync(thread);
        return response.Content;
    });
    
    // Run all tasks concurrently and wait until every one has finished
    var results = await Task.WhenAll(tasks);
    return results.ToList();
}

Performance gain: processing 10 questions went from 50 seconds to 8 seconds
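`Task.WhenAll` over a large list starts every request at once, which can run into the provider's rate limits. A sketch of the same pattern with a concurrency cap using `SemaphoreSlim`; `BoundedParallel.RunAsync` is a hypothetical helper and `process` stands in for the agent call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Runs independent requests in parallel, but never more than maxConcurrency at once
    public static async Task<TResult[]> RunAsync<TInput, TResult>(
        IEnumerable<TInput> inputs,
        Func<TInput, Task<TResult>> process,
        int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = inputs.Select(async input =>
        {
            await gate.WaitAsync(); // wait for a free slot
            try
            {
                return await process(input);
            }
            finally
            {
                gate.Release(); // free the slot even if the call fails
            }
        }).ToList(); // materialize once so each request starts exactly once

        // Results come back in input order, as with Task.WhenAll in the original example
        return await Task.WhenAll(tasks);
    }
}
```

The cap is a tuning knob: a value matched to your rate limit keeps most of the speedup while avoiding throttling errors.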

6. Limit conversation history length

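The longer the conversation history, the more each call costs, because the whole history is sent to the model on every call. Limiting the history to a sensible length matters.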

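using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.AI; // assumed source of ChatMessage and ChatRole

public class OptimizedAgentThread
{
    private readonly List<ChatMessage> _messages = new();
    private const int MaxHistoryMessages = 20; // keep at most 20 messages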
    
    public void AddMessage(ChatMessage message)
    {
        _messages.Add(message);
        
        // Over the limit: drop the oldest messages but keep every system message
        if (_messages.Count > MaxHistoryMessages)
        {
            var systemMessages = _messages.Where(m => m.Role == ChatRole.System).ToList();
            var recentMessages = _messages
                .Where(m => m.Role != ChatRole.System)
                .TakeLast(MaxHistoryMessages - systemMessages.Count)
                .ToList();
                
            _messages.Clear();
            _messages.AddRange(systemMessages);
            _messages.AddRange(recentMessages);
        }
    }
    
    public IReadOnlyList<ChatMessage> GetMessages() => _messages.AsReadOnly();
}
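Counting messages treats a one-line greeting and a long pasted document the same. An alternative is to trim by size. The sketch below uses a hypothetical `Msg` record (to stay free of framework types) and character counts as a rough proxy for tokens:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical message type so the sketch has no framework dependency
public record Msg(string Role, string Text);

public static class HistoryTrimmer
{
    // Keeps every system message plus as many of the newest other messages as fit in maxChars
    public static List<Msg> TrimToBudget(IReadOnlyList<Msg> history, int maxChars)
    {
        var system = history.Where(m => m.Role == "system").ToList();
        var budget = maxChars - system.Sum(m => m.Text.Length);

        var kept = new List<Msg>();
        // Walk from newest to oldest and stop at the first message that no longer fits
        for (var i = history.Count - 1; i >= 0; i--)
        {
            var m = history[i];
            if (m.Role == "system") continue;
            if (m.Text.Length > budget) break;
            budget -= m.Text.Length;
            kept.Add(m);
        }

        kept.Reverse(); // restore chronological order
        return system.Concat(kept).ToList();
    }
}
```

As in the class above, system messages are always kept; if they alone exceed the budget, only they remain.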

7. Choose the right model

Different models have different performance characteristics and costs.

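Model	Speed	Quality	Cost	Use cases
GPT-4	Slow	Highest	High	Complex reasoning, creative writing
GPT-4-turbo	Medium	High	Medium	Balancing performance and quality
GPT-3.5-turbo	Fast	Medium	Low	Simple conversations, classification tasks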

// Choose a model based on task complexity
public ChatCompletionAgent CreateAgentForTask(TaskComplexity complexity)
{
    string modelId = complexity switch
    {
        TaskComplexity.Simple => "gpt-3.5-turbo",      // fast, low cost
        TaskComplexity.Medium => "gpt-4-turbo",        // balanced
        TaskComplexity.Complex => "gpt-4",             // highest quality
        _ => "gpt-3.5-turbo"
    };
    
    return new ChatCompletionAgent(
        chatClient: _chatClient,
        name: "OptimizedAgent",
        instructions: "You are an efficient assistant",
        modelId: modelId
    );
}

public enum TaskComplexity
{
    Simple,   // Simple tasks: greetings, basic Q&A
    Medium,   // Medium tasks: information retrieval, summarization
    Complex   // Complex tasks: reasoning, creative generation
}
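`CreateAgentForTask` expects the caller to already know how complex the task is. One simple way to decide is a rule-based router. The sketch below is hypothetical: its keyword lists and length thresholds are illustrative assumptions rather than tuned values, and it redeclares `TaskComplexity` so it compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum TaskComplexity { Simple, Medium, Complex }

// Hypothetical rule-based router; keywords and thresholds are illustrative only
public static class ComplexityRouter
{
    private static readonly HashSet<string> ComplexHints = new() { "why", "compare", "design", "analyze" };
    private static readonly HashSet<string> MediumHints = new() { "summarize", "explain", "find", "list" };
    private static readonly char[] Separators = { ' ', ',', '.', '?', '!', ';', ':' };

    public static TaskComplexity Classify(string question)
    {
        // Match whole words so that, e.g., "realistic" does not match "list"
        var words = question.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        if (words.Any(w => ComplexHints.Contains(w)) || question.Length > 400)
            return TaskComplexity.Complex;
        if (words.Any(w => MediumHints.Contains(w)) || question.Length > 80)
            return TaskComplexity.Medium;
        return TaskComplexity.Simple;
    }
}
```

The result can be passed straight on, as in `CreateAgentForTask(ComplexityRouter.Classify(question))`. Because the keywords are English words split on spaces and punctuation, Chinese input would need a different rule set.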

Performance testing methods

1. Response time test

using System.Diagnostics;

public class PerformanceTester
{
    public async Task<PerformanceMetrics> MeasureResponseTimeAsync(
        Func<Task<string>> agentCall)
    {
        var stopwatch = Stopwatch.StartNew();
        
        var response = await agentCall();
        
        stopwatch.Stop();
        
        return new PerformanceMetrics
        {
            ResponseTime = stopwatch.Elapsed,
            ResponseLength = response.Length,
            // Characters, not tokens: string length is only a rough proxy for output size
            CharsPerSecond = response.Length / stopwatch.Elapsed.TotalSeconds
        };
    }
}

public class PerformanceMetrics
{
    public TimeSpan ResponseTime { get; set; }
    public int ResponseLength { get; set; }
    public double CharsPerSecond { get; set; }
    
    public override string ToString()
    {
        return $"Response time: {ResponseTime.TotalSeconds:F2} s, " +
               $"response length: {ResponseLength} chars, " +
               $"speed: {CharsPerSecond:F2} chars/s";
    }
}
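A single timing is noisy: network jitter and model load can make one call much slower than the next. A sketch that repeats a call and reports nearest-rank percentiles such as p50 and p95; `LatencyStats` is a hypothetical helper and `call` stands in for any agent invocation:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class LatencyStats
{
    // Runs the call sequentially 'runs' times and records each duration
    public static async Task<List<TimeSpan>> SampleAsync(Func<Task> call, int runs)
    {
        var samples = new List<TimeSpan>(runs);
        for (var i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            await call();
            samples.Add(stopwatch.Elapsed);
        }
        return samples;
    }

    // Nearest-rank percentile, p in (0, 100]: the smallest sample with at least p% of samples at or below it
    public static TimeSpan Percentile(IReadOnlyList<TimeSpan> samples, double p)
    {
        var sorted = samples.OrderBy(t => t).ToList();
        var rank = (int)Math.Ceiling(p * sorted.Count / 100.0);
        return sorted[Math.Clamp(rank, 1, sorted.Count) - 1];
    }
}
```

Reporting p95 next to p50 shows how slow the worst typical requests are, which an average can hide.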

2. Concurrency test

public async Task<ConcurrencyTestResult> TestConcurrencyAsync(
    int concurrentRequests,
    Func<Task<string>> agentCall)
{
    var stopwatch = Stopwatch.StartNew();
    var tasks = new List<Task<string>>();
    
    // Start all requests concurrently
    for (int i = 0; i < concurrentRequests; i++)
    {
        tasks.Add(agentCall());
    }
    
    // Wait for all requests to finish
    var results = await Task.WhenAll(tasks);
    stopwatch.Stop();
    
    return new ConcurrencyTestResult
    {
        TotalRequests = concurrentRequests,
        TotalTime = stopwatch.Elapsed,
        AverageTime = stopwatch.Elapsed.TotalSeconds / concurrentRequests,
        RequestsPerSecond = concurrentRequests / stopwatch.Elapsed.TotalSeconds
    };
}

public class ConcurrencyTestResult
{
    public int TotalRequests { get; set; }
    public TimeSpan TotalTime { get; set; }
    public double AverageTime { get; set; }
    public double RequestsPerSecond { get; set; }
    
    public override string ToString()
    {
        return $"Total requests: {TotalRequests}, " +
               $"total time: {TotalTime.TotalSeconds:F2} s, " +
               $"time per request (total / count): {AverageTime:F2} s, " +
               $"throughput: {RequestsPerSecond:F2} requests/s";
    }
}
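`AverageTime` above is the total time divided by the request count, which describes throughput rather than how long each caller waited. A sketch that records each request's own latency, with a hypothetical `StubAgentCall` (a fixed 100 ms delay) so the harness can be checked without API costs:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrencyProbe
{
    // Hypothetical stand-in for an agent call, so the harness runs without API costs
    public static async Task<string> StubAgentCall()
    {
        await Task.Delay(100);
        return "ok";
    }

    // Returns the wall-clock time for the whole batch and the latency of each request
    public static async Task<(TimeSpan Total, TimeSpan[] PerRequest)> RunAsync(
        Func<Task<string>> agentCall, int concurrentRequests)
    {
        var total = Stopwatch.StartNew();

        var tasks = Enumerable.Range(0, concurrentRequests).Select(async _ =>
        {
            var single = Stopwatch.StartNew();
            await agentCall();
            return single.Elapsed; // latency seen by this one request
        }).ToList(); // materialize so every request starts now, exactly once

        var perRequest = await Task.WhenAll(tasks);
        return (total.Elapsed, perRequest);
    }
}
```

With a real agent, pass the same lambda that `TestConcurrencyAsync` receives.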

3. Complete performance test example

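using System;
using System.ClientModel;   // ApiKeyCredential
using Azure.AI.OpenAI;      // AzureOpenAIClient

public class Program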
{
    public static async Task Main(string[] args)
    {
        // Initialize the agent
        var chatClient = new AzureOpenAIClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")),
            new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY"))
        ).GetChatClient("gpt-35-turbo");
        
        var agent = new ChatCompletionAgent(
            chatClient: chatClient,
            name: "PerformanceTestAgent",
            instructions: "You are a test assistant"
        );
        
        var tester = new PerformanceTester();
        
        Console.WriteLine("=== Performance test started ===\n");
        
        // Test 1: single response time
        Console.WriteLine("Test 1: single response time");
        var thread1 = new AgentThread();
        var metrics = await tester.MeasureResponseTimeAsync(async () =>
        {
            await thread1.AddUserMessageAsync("Hello, please introduce yourself");
            var response = await agent.InvokeAsync(thread1);
            return response.Content;
        });
        Console.WriteLine(metrics);
        Console.WriteLine();
        
        // Test 2: effect of caching
        Console.WriteLine("Test 2: with vs. without cache");
        var cachedAgent = new CachedAgent(agent);
        var thread2 = new AgentThread();
        
        // First call (cache miss)
        var metrics1 = await tester.MeasureResponseTimeAsync(async () =>
        {
            return await cachedAgent.ProcessMessageAsync(thread2, "What is AI?");
        });
        Console.WriteLine($"Without cache: {metrics1}");
        
        // Second call (cache hit)
        var metrics2 = await tester.MeasureResponseTimeAsync(async () =>
        {
            return await cachedAgent.ProcessMessageAsync(thread2, "What is AI?");
        });
        Console.WriteLine($"With cache: {metrics2}");
        Console.WriteLine($"Time saved: {(1 - metrics2.ResponseTime.TotalSeconds / metrics1.ResponseTime.TotalSeconds) * 100:F1}%");
        Console.WriteLine();
        
        // Test 3: concurrency
        Console.WriteLine("Test 3: concurrency");
        var concurrencyResult = await tester.TestConcurrencyAsync(10, async () =>
        {
            var thread = new AgentThread();
            await thread.AddUserMessageAsync("Hello");
            var response = await agent.InvokeAsync(thread);
            return response.Content;
        });
        Console.WriteLine(concurrencyResult);
        
        Console.WriteLine("\n=== Performance test finished ===");
    }
}

Performance optimization checklist

Before deploying your application, use this checklist to review performance:

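[ ] **Agent reuse**: Are agent instances reused?
[ ] **Prompt optimization**: Are prompts concise and clear?
[ ] **Parallel processing**: Are independent tasks run in parallel?
[ ] **Model selection**: Is the model chosen to fit the task?
[ ] **Resource cleanup**: Are resources released properly?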

Case study: before and after optimization

Code before optimization

// Performance problems: a new agent on every call, no caching, a verbose prompt
public class SlowCustomerService
{
    public async Task<string> HandleQuestionAsync(string question)
    {
        // Problem 1: creates a new client and agent on every call
        var chatClient = new AzureOpenAIClient(/* ... */).GetChatClient("gpt-4");
        
        // Problem 2: the prompt is too long
        var agent = new ChatCompletionAgent(
            chatClient: chatClient,
            name: "CustomerService",
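            instructions: @"You are a very professional customer-service assistant. You need to help users solve all kinds of problems.
                          You should always stay polite and professional. You need to carefully understand the user's question and then give a detailed answer.
                          If you don't know the answer, you should honestly tell the user that you don't know.
                          You should use simple, easy-to-understand language and avoid overly technical jargon.
                          You should make sure your answers are accurate and helpful."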
        );
        
        // Problem 3: no caching
        var thread = new AgentThread();
        await thread.AddUserMessageAsync(question);
        var response = await agent.InvokeAsync(thread);
        
        return response.Content;
    }
}

Performance metrics:

Average response time: 8.5 seconds
Monthly API cost: $450
Concurrency: 5 requests/second

Code after optimization

// After optimization: reuse the agent, add caching, simplify the prompt, pick a suitable model
public class FastCustomerService
{
    private readonly ChatCompletionAgent _agent;
    private readonly AgentResponseCache _cache;
    
    public FastCustomerService()
    {
        // Optimization 1: create the client and agent once and reuse them
        var chatClient = new AzureOpenAIClient(/* ... */)
            .GetChatClient("gpt-3.5-turbo"); // Optimization 2: use a faster model
        
        // Optimization 3: simplify the prompt
        _agent = new ChatCompletionAgent(
            chatClient: chatClient,
            name: "CustomerService",
            instructions: "You are a professional customer-service agent. Answer questions politely and accurately, in simple language."
        );
        
        // Optimization 4: add caching
        _cache = new AgentResponseCache();
    }
    
    public async Task<string> HandleQuestionAsync(string question)
    {
        // Optimization 5: check the cache first
        if (_cache.TryGetCachedResponse(question, out var cachedResponse))
        {
            return cachedResponse;
        }
        
        var thread = new AgentThread();
        await thread.AddUserMessageAsync(question);
        var response = await _agent.InvokeAsync(thread);
        
        // Store the response in the cache
        _cache.CacheResponse(question, response.Content);
        
        return response.Content;
    }
}

Performance metrics after optimization:

Average response time: 2.1 seconds (75% lower)
Monthly API cost: $180 (60% savings)
Concurrency: 25 requests/second (up 400%)
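The percentages above are relative changes measured against the "before" value: (2.1 - 8.5) / 8.5 ≈ -75.3%, (180 - 450) / 450 = -60%, and (25 - 5) / 5 = +400%, that is, five times the original throughput. A tiny hypothetical helper, for checking the arithmetic only:

```csharp
using System;

// Worked check of the before/after figures: relative change in percent
public static class Improvement
{
    // Negative result = decrease, positive = increase
    public static double PercentChange(double before, double after) =>
        (after - before) / before * 100.0;
}
```

For example, `Math.Round(Improvement.PercentChange(8.5, 2.1), 1)` returns -75.3.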

Summary

Performance optimization is an ongoing process. Key points:

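Measure first: measure before you optimize, and avoid premature optimization
Find the bottleneck: use performance tests to locate the real bottlenecks
Optimize incrementally: change one thing at a time and verify the effect
Balance trade-offs: find the right balance between performance, cost, and quality
Monitor continuously: keep tracking performance metrics after deployment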

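Remember: the best optimization is avoiding unnecessary work. Designing for performance while you write the code is much easier than optimizing after the fact.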
