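Intelligent Customer Service System

The system will provide the following capabilities:

Hybrid intent recognition: ML.NET quickly classifies user intent (returns, inquiries, complaints, etc.), and simple questions are routed straight to canned answers to reduce LLM cost.
RAG-enhanced conversation: for complex questions, Semantic Kernel retrieves relevant passages from the knowledge base stored in a vector database and generates an accurate answer.
Plugin-based business integration: function calling queries order status and submits tickets, connecting directly to the backend business systems.
Real-time streaming: SignalR streams the response to produce a typewriter effect and improve the user experience.
End-to-end observability: OpenTelemetry records token consumption, latency, and error rate.
Production-grade deployment: supports canary releases, cost control, and security auditing.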

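1 Requirements Analysis and Technology Selection

1.1 Business Requirements

An e-commerce platform needs an intelligent customer service system that covers the following scenarios:

Pre-sales inquiries: product information, promotions, shipping policy.
In-order support: order status lookup, address changes, requests to expedite shipping.
After-sales service: return requests, complaints and suggestions, invoice issues.
Common questions: automatic FAQ answers, such as the return and exchange policy and delivery times.

Requirements:

24/7 service, with a response time under 2 seconds for simple questions and under 5 seconds for complex ones.
Multi-turn conversations that retain context across turns.
Masking of sensitive data (order numbers, phone numbers).
Seamless handoff to a human agent.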

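1.2 Technology Selection

Component	Choice	Rationale
Backend framework	ASP.NET Core 9	High performance, dependency injection, seamless integration with the .NET AI ecosystem
Frontend communication	SignalR	Bidirectional real-time communication with streaming output
Traditional classification	ML.NET (SDCA Maximum Entropy)	Lightweight intent recognition, CPU inference < 5 ms, lowers LLM cost
LLM orchestration	Semantic Kernel	Plugin architecture, function calling, RAG support
Large language model	Azure OpenAI (GPT-4o)	Enterprise compliance, high availability, deep .NET SDK integration
Embedding model	text-embedding-ada-002	Knowledge base retrieval (1536-dimensional vectors)
Vector database	Redis Stack	Low latency, vector similarity search, reuses existing Redis infrastructure
Knowledge base	Internal documents + FAQ	Markdown, chunked and re-indexed periodically
Business plugins	.NET SDKs of the order and ticket services	Call the existing microservices directly
Observability	OpenTelemetry + Application Insights	Unified monitoring and cost tracking
Containerization	Docker + Kubernetes	Elastic scaling and canary releases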

1.3 Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Web/App)                      │
│             SignalR Client (streaming receive)              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 ASP.NET Core gateway layer                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │   SignalR Hub                                       │    │
│  │   - Receives user messages                          │    │
│  │   - Streams AI responses back                       │    │
│  └─────────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │   Intent classifier (ML.NET)                        │    │
│  │   - Classifies user intent in real time             │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│  FAQ routing    │ │  RAG pipeline   │ │  Plugin calls   │
│  (canned        │ │  (Semantic      │ │  (order lookup/ │
│   answers)      │ │   Kernel)       │ │   tickets)      │
└─────────────────┘ └─────────────────┘ └─────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  Vector DB      │
                    │  (Redis Stack)  │
                    └─────────────────┘
```

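2 End-to-End Implementation

2.1 Project Structure and Dependencies

Create the solution with the following structure:

```
CustomerService/
├── src/
│   ├── CustomerService.Api/               # ASP.NET Core entry point
│   ├── CustomerService.Application/       # Business logic (intent classification, conversation orchestration)
│   ├── CustomerService.Infrastructure/    # Infrastructure (vector store, plugin implementations)
│   ├── CustomerService.Domain/            # Domain models
│   └── CustomerService.Shared/            # Shared utilities
├── tests/
└── docs/
```

Key NuGet packages (introduced in earlier chapters and summarized here):

Microsoft.ML
Microsoft.SemanticKernel
Microsoft.SemanticKernel.Plugins.Memory
NRedisStack
Microsoft.AspNetCore.SignalR (part of the ASP.NET Core shared framework, so no separate package reference is needed)
OpenTelemetry.Extensions.Hosting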

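2.2 Intent Classifier (ML.NET)

1. Data preparation

Collect historical customer service conversations and label each one with an intent category:

OrderInquiry (order inquiry)
ReturnRequest (return request)
Complaint (complaint)
FAQ (common question)
Other (other)

Data format: CSV with the columns text,label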
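ML.NET's `LoadFromTextFile` splits every line on the separator. A text field that contains commas must therefore be quoted and loaded with `allowQuoting: true`, or its columns will shift. The following is a minimal pre-training check, assuming one record per line and a header row. It counts examples per label, which shows class imbalance. It also flags rows with an unknown label or an unquoted comma inside the text. `IntentDataCheck` is a hypothetical helper, not part of ML.NET.

```csharp
using System.Collections.Generic;

// Hypothetical pre-training check for the text,label CSV (one record per line, header first).
public static class IntentDataCheck
{
    public static readonly HashSet<string> Labels = new() { "OrderInquiry", "ReturnRequest", "Complaint", "FAQ", "Other" };

    // Returns per-label counts and the 1-based line numbers of rows that would load incorrectly.
    public static (Dictionary<string, int> Counts, List<int> BadRows) Inspect(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        var bad = new List<int>();
        int row = 0;
        foreach (var line in lines)
        {
            row++;
            if (row == 1) continue; // skip the header line
            // The label is the text after the last comma
            int lastComma = line.LastIndexOf(',');
            string label = lastComma < 0 ? "" : line[(lastComma + 1)..].Trim();
            // An extra comma in an unquoted text field would shift the columns when ML.NET splits the line
            bool unquotedComma = line.IndexOf(',') != lastComma && !line.StartsWith('"');
            if (!Labels.Contains(label) || unquotedComma)
            {
                bad.Add(row);
                continue;
            }
            counts[label] = counts.GetValueOrDefault(label) + 1;
        }
        return (counts, bad);
    }
}
```

This check does not handle escaped quotes or multi-line fields. For those, a full CSV parser is the safer choice.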

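2. Training pipeline

```csharp
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public class IntentClassifier
{
    private const string ModelPath = "models/intent_model.zip";
    private readonly MLContext _mlContext = new(seed: 42);
    private readonly object _sync = new();
    private ITransformer? _model;
    private PredictionEngine<IntentData, IntentPrediction>? _engine;

    public IntentClassifier()
    {
        // Preload an existing model so the first request does not pay the loading cost
        if (File.Exists(ModelPath))
            Load();
    }

    public void Train(string dataPath)
    {
        // allowQuoting: a quoted text field may contain commas
        var data = _mlContext.Data.LoadFromTextFile<IntentData>(dataPath, separatorChar: ',', hasHeader: true, allowQuoting: true);
        // Multiclass trainers require a key-typed label, so map the label strings to keys first
        var pipeline = _mlContext.Transforms.Conversion.MapValueToKey("Label", nameof(IntentData.Label))
            .Append(_mlContext.Transforms.Text.FeaturizeText("Features", nameof(IntentData.Text)))
            .Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy())
            // Convert the predicted key back to the original label string
            .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        var model = pipeline.Fit(data);
        Directory.CreateDirectory(Path.GetDirectoryName(ModelPath)!);
        _mlContext.Model.Save(model, data.Schema, ModelPath);

        lock (_sync)
        {
            _model = model;
            _engine = null; // rebuilt from the new model on next use
        }
    }

    public void Load()
    {
        lock (_sync)
        {
            _model ??= _mlContext.Model.Load(ModelPath, out _);
            // Creating a PredictionEngine is expensive, so build it once and reuse it
            _engine ??= _mlContext.Model.CreatePredictionEngine<IntentData, IntentPrediction>(_model);
        }
    }

    public string Predict(string text)
    {
        Load();
        // PredictionEngine is not thread-safe: serialize calls (PredictionEnginePool in Microsoft.Extensions.ML is an alternative)
        lock (_sync)
        {
            return _engine!.Predict(new IntentData { Text = text }).PredictedLabel;
        }
    }
}

public class IntentData
{
    [LoadColumn(0)]
    public string Text { get; set; } = "";
    [LoadColumn(1)]
    public string Label { get; set; } = "";
}

public class IntentPrediction
{
    [ColumnName("PredictedLabel")]
    public string PredictedLabel { get; set; } = "";
}
```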

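3. Integrating with the service

```csharp
public interface IIntentService
{
    string Classify(string text);
}

public class IntentService : IIntentService
{
    // Injected so that one classifier (and one loaded model) is shared for the app's lifetime;
    // register it as a singleton: builder.Services.AddSingleton<IntentClassifier>();
    private readonly IntentClassifier _classifier;

    public IntentService(IntentClassifier classifier) => _classifier = classifier;

    public string Classify(string text) => _classifier.Predict(text);
}
```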
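Routing a message straight to a canned answer is only safe when the classifier is confident. One possible design, which is not part of the original system, gates the label on the top class probability and falls back to `Other` so the message takes the LLM path. This sketch assumes the prediction class is extended to expose ML.NET's per-class `Score` vector. It also assumes the label order is known, for example from the Score column's slot names. `IntentGate` and the 0.7 threshold are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing guard: accept the predicted label only above a probability threshold.
public static class IntentGate
{
    public static string Decide(IReadOnlyList<string> labels, float[] scores, float threshold = 0.7f)
    {
        if (labels.Count != scores.Length) throw new ArgumentException("labels and scores must align");
        int best = 0;
        for (int i = 1; i < scores.Length; i++)
            if (scores[i] > scores[best]) best = i; // index of the most probable class
        // Low confidence (or no scores at all) falls back to "Other", i.e. the general LLM path
        return scores.Length > 0 && scores[best] >= threshold ? labels[best] : "Other";
    }
}
```

The threshold trades LLM cost against wrong canned answers. It is worth tuning on a held-out set rather than fixing it up front.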

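2.3 Knowledge Base Indexing and Vector Storage

1. Knowledge base document processing

Split the FAQ documents (Markdown) into chunks of about 500 characters each, with a 50-character overlap between neighboring chunks.

```csharp
using Microsoft.SemanticKernel.Embeddings;

public class KnowledgeBaseIndexer
{
    private readonly ITextEmbeddingGenerationService _embeddingService;
    private readonly IVectorDatabase _vectorDb;

    public KnowledgeBaseIndexer(ITextEmbeddingGenerationService embeddingService, IVectorDatabase vectorDb)
    {
        _embeddingService = embeddingService;
        _vectorDb = vectorDb;
    }

    public async Task IndexAsync(string filePath)
    {
        var text = await File.ReadAllTextAsync(filePath);
        var chunks = SplitIntoChunks(text);
        var embeddings = await _embeddingService.GenerateEmbeddingsAsync(chunks);
        var docId = Path.GetFileNameWithoutExtension(filePath);
        for (int i = 0; i < chunks.Count; i++)
        {
            // Prefix chunk ids with the document name so indexing a second file does not overwrite the first
            await _vectorDb.AddAsync($"{docId}_chunk_{i}", chunks[i], embeddings[i].ToArray(),
                new Dictionary<string, object> { { "source", filePath } });
        }
    }

    private List<string> SplitIntoChunks(string text, int chunkSize = 500, int overlap = 50)
    {
        // Chunking logic omitted here (see Chapter 6)
        throw new NotImplementedException();
    }
}
```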
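The indexer leaves `SplitIntoChunks` as a placeholder. The following is a minimal sketch of fixed-size character chunking with overlap, matching the 500-character / 50-character scheme. It is not necessarily the Chapter 6 implementation. `CharChunker` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: fixed-size character windows that overlap by `overlap` characters.
public static class CharChunker
{
    public static List<string> Split(string text, int chunkSize = 500, int overlap = 50)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        int step = chunkSize - overlap; // each chunk starts this many characters after the previous one
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // this chunk reached the end of the text
        }
        return chunks;
    }
}
```

Wiring it in is a one-liner: `private List<string> SplitIntoChunks(string text, int chunkSize = 500, int overlap = 50) => CharChunker.Split(text, chunkSize, overlap);`. Character windows can cut through words and sentences. Splitting on paragraph or heading boundaries first tends to give cleaner retrieval units for Markdown FAQs.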

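2. Redis Stack vector store wrapper

```csharp
using System.Globalization;
using System.Runtime.InteropServices;
using StackExchange.Redis;

public interface IVectorDatabase
{
    Task AddAsync(string id, string text, float[] vector, Dictionary<string, object>? metadata = null);
    // Score is the COSINE distance returned by Redis: smaller means more similar
    Task<List<(string Text, float Score)>> SearchAsync(float[] queryVector, int topK = 3);
}

public class RedisVectorDatabase : IVectorDatabase
{
    private const string IndexName = "idx:knowledge";
    private readonly IDatabase _db;

    public RedisVectorDatabase(IConnectionMultiplexer redis)
    {
        _db = redis.GetDatabase();
        CreateIndex();
    }

    private void CreateIndex()
    {
        try
        {
            // HNSW is followed by 6 attribute tokens: TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE
            // (DIM 1536 matches text-embedding-ada-002)
            _db.Execute("FT.CREATE", IndexName, "ON", "HASH", "PREFIX", "1", "kb:", "SCHEMA",
                "text", "TEXT", "vector", "VECTOR", "HNSW", "6", "TYPE", "FLOAT32", "DIM", "1536", "DISTANCE_METRIC", "COSINE");
        }
        catch (RedisServerException ex) when (ex.Message.Contains("Index already exists"))
        {
            // Created by an earlier run; any other error still propagates
        }
    }

    public async Task AddAsync(string id, string text, float[] vector, Dictionary<string, object>? metadata = null)
    {
        var fields = new List<HashEntry>
        {
            new("text", text),
            // FLOAT32 blob: 4 bytes per dimension in the machine's byte order (little-endian on x64/ARM64)
            new("vector", MemoryMarshal.AsBytes(vector.AsSpan()).ToArray())
        };
        // Store metadata (e.g. "source") as extra hash fields
        if (metadata != null)
            fields.AddRange(metadata.Select(kv => new HashEntry(kv.Key, kv.Value?.ToString())));
        await _db.HashSetAsync($"kb:{id}", fields.ToArray());
    }

    public async Task<List<(string Text, float Score)>> SearchAsync(float[] queryVector, int topK = 3)
    {
        var vectorBytes = MemoryMarshal.AsBytes(queryVector.AsSpan()).ToArray();
        var query = $"*=>[KNN {topK} @vector $vec AS score]";
        // KNN syntax requires DIALECT 2; LIMIT lifts the default page size of 10 results
        var result = await _db.ExecuteAsync("FT.SEARCH", IndexName, query,
            "PARAMS", "2", "vec", vectorBytes,
            "SORTBY", "score", "ASC",
            "RETURN", "2", "text", "score",
            "LIMIT", "0", topK.ToString(CultureInfo.InvariantCulture),
            "DIALECT", "2");

        // Reply layout: [total, key1, [field, value, ...], key2, [field, value, ...], ...]
        var reply = (RedisResult[]?)result ?? Array.Empty<RedisResult>();
        var items = new List<(string Text, float Score)>();
        for (int i = 1; i + 1 < reply.Length; i += 2)
        {
            var fields = (RedisResult[]?)reply[i + 1] ?? Array.Empty<RedisResult>();
            string text = "";
            float score = float.NaN;
            for (int j = 0; j + 1 < fields.Length; j += 2)
            {
                var name = (string?)fields[j];
                if (name == "text") text = (string?)fields[j + 1] ?? "";
                else if (name == "score") score = float.Parse((string?)fields[j + 1] ?? "NaN", CultureInfo.InvariantCulture);
            }
            items.Add((text, score));
        }
        return items;
    }
}
```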
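The wrapper depends on two details that are easy to get wrong. First, a FLOAT32 vector field stores exactly 4 bytes per dimension. Second, with `DISTANCE_METRIC COSINE` the returned score is a distance (1 minus cosine similarity), so smaller means closer. The following is a small sketch for checking both in unit tests. `VectorMath` is a hypothetical helper, and the byte order is assumed to be little-endian, as on x64/ARM64.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helpers mirroring what RedisVectorDatabase stores and what Redis reports.
public static class VectorMath
{
    // float[] -> FLOAT32 blob (machine byte order, little-endian on common hardware)
    public static byte[] ToFloat32Blob(float[] vector) => MemoryMarshal.AsBytes(vector.AsSpan()).ToArray();

    // FLOAT32 blob -> float[]
    public static float[] FromFloat32Blob(byte[] blob) => MemoryMarshal.Cast<byte, float>(blob.AsSpan()).ToArray();

    // Cosine distance = 1 - cos(a, b): 0 for the same direction, 1 for orthogonal, 2 for opposite.
    // Returns NaN if either vector is all zeros.
    public static double CosineDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same dimension");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1 - dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }
}
```

Because the score is a distance, `SORTBY score ASC` returns the best match first. A relevance cutoff should therefore be an upper bound on the score, not a lower bound.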

2.4 Business Plugin Implementation

Order query plugin

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;

// IOrderService is the existing order-service SDK (not shown)
public class OrderPlugin
{
    private readonly IOrderService _orderService;
    private readonly IDataMaskingService _masker;

    public OrderPlugin(IOrderService orderService, IDataMaskingService masker)
    {
        _orderService = orderService;
        _masker = masker;
    }

    [KernelFunction]
    [Description("Get the status of an order")]
    public async Task<string> GetOrderStatusAsync(
        [Description("Order number")] string orderId)
    {
        var order = await _orderService.GetOrderAsync(orderId);
        if (order == null) return "Order not found.";
        // Mask any PII in the tool result before it is sent to the model and shown to the user
        return _masker.MaskPii($"Order {orderId} status: {order.Status}. Estimated delivery: {order.EstimatedDelivery}");
    }

    [KernelFunction]
    [Description("Submit a return request for an order")]
    public async Task<string> RequestReturnAsync(
        [Description("Order number")] string orderId,
        [Description("Reason for return")] string reason)
    {
        var result = await _orderService.SubmitReturnAsync(orderId, reason);
        return result ? "Return request submitted successfully." : "Failed to submit return request.";
    }
}
```

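Data masking service

```csharp
using System.Text.RegularExpressions;

public interface IDataMaskingService { string MaskPii(string text); }

public class DataMaskingService : IDataMaskingService
{
    // Phone: 11 digits starting 13-19; lookarounds skip digit runs inside longer numbers (e.g. order IDs)
    private static readonly Regex Phone = new(@"(?<!\d)1[3-9]\d{9}(?!\d)", RegexOptions.Compiled);
    // Email: ASCII classes, since .NET's \w also matches CJK and would swallow adjacent Chinese text
    private static readonly Regex Email = new(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+", RegexOptions.Compiled);

    public string MaskPii(string text)
    {
        // Phone: keep the first 3 and last 4 digits
        text = Phone.Replace(text, m => $"{m.Value[..3]}****{m.Value[^4..]}");
        // Email: keep the first character and the domain
        text = Email.Replace(text, m => $"{m.Value[0]}***@{m.Value.Split('@')[1]}");
        return text;
    }
}
```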

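2.5 Conversation Orchestration and Semantic Kernel Integration

Kernel factory: configures the kernel's plugins dynamically based on the intent. The matching system prompt is chosen by the orchestrator.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;

public class KernelFactory
{
    private readonly IServiceProvider _services;

    public KernelFactory(IServiceProvider services) => _services = services;

    public Kernel CreateKernel(string intent)
    {
        var builder = Kernel.CreateBuilder();
        builder.Services.AddLogging();
        // Shared AI services
        builder.AddAzureOpenAIChatCompletion(...);
        builder.AddAzureOpenAITextEmbeddingGeneration(...);
        // Shared plugins. Their constructor dependencies (IOrderService, IDataMaskingService, ...)
        // live in the application's container, so the instances are created from _services.
        // OrderPlugin exposes both GetOrderStatus and RequestReturn.
        builder.Plugins.AddFromObject(ActivatorUtilities.CreateInstance<OrderPlugin>(_services), "OrderPlugin");

        // Intent-specific plugins
        if (intent == "FAQ" || intent == "OrderInquiry")
        {
            // RAG plugin
            builder.Plugins.AddFromObject(ActivatorUtilities.CreateInstance<KnowledgeRetrievalPlugin>(_services), "KnowledgeRetrievalPlugin");
        }

        return builder.Build();
    }
}
```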

RAG plugin (knowledge base retrieval)

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Embeddings;

public class KnowledgeRetrievalPlugin
{
    private readonly IVectorDatabase _vectorDb;
    private readonly ITextEmbeddingGenerationService _embeddingService;

    public KnowledgeRetrievalPlugin(IVectorDatabase vectorDb, ITextEmbeddingGenerationService embeddingService)
    {
        _vectorDb = vectorDb;
        _embeddingService = embeddingService;
    }

    [KernelFunction]
    [Description("Search the knowledge base for relevant information")]
    public async Task<string> SearchAsync(
        [Description("Search query")] string query,
        [Description("Number of results")] int topK = 3)
    {
        // Embed the query with the same model used at indexing time, then run a KNN search
        var embedding = await _embeddingService.GenerateEmbeddingAsync(query);
        var results = await _vectorDb.SearchAsync(embedding.ToArray(), topK);
        // Results are ordered closest-first; only the text is returned to the model
        return string.Join("\n\n", results.Select(r => r.Text));
    }
}
```

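Conversation flow orchestration service

```csharp
using System.Runtime.CompilerServices;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

public class ChatOrchestrator
{
    private readonly IIntentService _intentService;
    private readonly KernelFactory _kernelFactory;
    private readonly ILogger<ChatOrchestrator> _logger;

    public ChatOrchestrator(IIntentService intentService, KernelFactory kernelFactory, ILogger<ChatOrchestrator> logger)
    {
        _intentService = intentService;
        _kernelFactory = kernelFactory;
        _logger = logger;
    }

    public async IAsyncEnumerable<string> StreamChatAsync(string userMessage, string userId,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // 1. Intent recognition
        var intent = _intentService.Classify(userMessage);
        _logger.LogInformation("Intent for user {UserId}: {Intent}", userId, intent);

        // 2. Simple intents are answered directly from the FAQ (optional)
        if (intent == "FAQ")
        {
            var faqAnswer = await GetFaqAnswerAsync(userMessage);
            if (!string.IsNullOrEmpty(faqAnswer))
            {
                yield return faqAnswer;
                yield break;
            }
        }

        // 3. Create the kernel and build the prompt. The user message is passed as a template
        //    variable instead of being concatenated, so "{{...}}" typed by a user is not parsed as template syntax.
        var kernel = _kernelFactory.CreateKernel(intent);
        var prompt = GetSystemPrompt(intent) + "\nUser: {{$userMessage}}\nAssistant:";

        // FunctionChoiceBehavior.Auto() lets the model call the registered plugins; without it no functions are invoked
        var arguments = new KernelArguments(new PromptExecutionSettings { FunctionChoiceBehavior = FunctionChoiceBehavior.Auto() })
        {
            ["userMessage"] = userMessage
        };

        // 4. Stream the LLM response; each chunk is a StreamingKernelContent, so convert it to text
        await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt, arguments, cancellationToken: cancellationToken))
        {
            yield return chunk.ToString();
        }
    }

    private string GetSystemPrompt(string intent)
    {
        return intent switch
        {
            "OrderInquiry" => "You are a customer service assistant. Use the OrderPlugin to help users check order status. Be concise and polite.",
            "ReturnRequest" => "You are a returns specialist. Use the RequestReturn function of the OrderPlugin to process return requests. Collect the order number and reason first.",
            _ => "You are a helpful customer service assistant. Answer questions based on your knowledge and available tools."
        };
    }

    private Task<string?> GetFaqAnswerAsync(string question)
    {
        // Optional: look up a lightweight vector index (e.g. FAISS) or a cache here.
        // Simplified: null means "fall through to the LLM".
        return Task.FromResult<string?>(null);
    }
}
```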

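2.6 SignalR Hub Implementation

```csharp
using System.Runtime.CompilerServices;
using Microsoft.AspNetCore.SignalR;
using Microsoft.Extensions.Logging;

public class ChatHub : Hub
{
    private readonly ChatOrchestrator _orchestrator;
    private readonly IDataMaskingService _masker;
    private readonly ILogger<ChatHub> _logger;

    public ChatHub(ChatOrchestrator orchestrator, IDataMaskingService masker, ILogger<ChatHub> logger)
    {
        _orchestrator = orchestrator;
        _masker = masker;
        _logger = logger;
    }

    // Streaming hub method: each yielded chunk is pushed to the caller as it arrives.
    // A JavaScript client consumes it with connection.stream("SendMessage", message).
    public async IAsyncEnumerable<string> SendMessage(string message, [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        var userId = Context.UserIdentifier ?? Context.ConnectionId;
        // Mask PII before the message reaches the logs
        _logger.LogInformation("User {UserId} sent: {Message}", userId, _masker.MaskPii(message));

        await foreach (var chunk in _orchestrator.StreamChatAsync(message, userId, cancellationToken))
        {
            yield return chunk;
        }
    }
}
```

```csharp
// Program.cs (excerpt): registrations the classes above depend on
builder.Services.AddSignalR();
builder.Services.AddSingleton<IntentClassifier>().AddSingleton<IIntentService, IntentService>();
builder.Services.AddSingleton<IDataMaskingService, DataMaskingService>();
builder.Services.AddScoped<KernelFactory>().AddScoped<ChatOrchestrator>();
app.MapHub<ChatHub>("/chatHub");
```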

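2.7 Observability and Cost Tracking

Add OpenTelemetry (see Chapter 10) and record token usage around each Semantic Kernel call.

Use an ActivitySource in ChatOrchestrator:

```csharp
public async IAsyncEnumerable<string> StreamChatAsync(string userMessage, string userId,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    // AIDiagnostics.ActivitySource is the shared ActivitySource set up in Chapter 10
    using var activity = AIDiagnostics.ActivitySource.StartActivity("ChatOrchestration");
    activity?.SetTag("user_id", userId);

    var intent = _intentService.Classify(userMessage);
    activity?.SetTag("intent", intent); // tag only once the intent is known
    // ... streaming call as above
}
```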
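Token usage fits better in a metric than in span tags, because metrics can be summed per intent and alerted on. The following is a self-contained sketch using only `System.Diagnostics`: one span per LLM call plus a token counter. The source/meter name `CustomerService.AI`, the counter name `llm.tokens`, and `ChatTelemetry` are hypothetical. How token counts are read from a response depends on the connector, so the sketch takes them as parameters. To export these through OpenTelemetry, register the names with `.AddSource("CustomerService.AI")` on the tracer provider and `.AddMeter("CustomerService.AI")` on the meter provider.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical telemetry helper: names are illustrative, not taken from the original chapter.
public static class ChatTelemetry
{
    public static readonly ActivitySource Source = new("CustomerService.AI");
    public static readonly Meter Meter = new("CustomerService.AI");
    public static readonly Counter<long> Tokens = Meter.CreateCounter<long>("llm.tokens", unit: "{token}");

    // Records one LLM call: a span tagged with intent and latency, plus a token-counter increment tagged by intent.
    public static void RecordCall(string intent, int promptTokens, int completionTokens, TimeSpan latency)
    {
        using var activity = Source.StartActivity("LlmCall");
        activity?.SetTag("intent", intent);
        activity?.SetTag("latency_ms", latency.TotalMilliseconds);
        Tokens.Add(promptTokens + completionTokens, new KeyValuePair<string, object?>("intent", intent));
    }
}
```

`StartActivity` returns null when nothing is listening, which is why every tag call uses `?.`.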

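3 Load Testing and Lessons Learned in Production

3.1 Load Testing Plan

Tools: k6 or JMeter, simulating 100 to 500 concurrent users.
Scenarios:
1. Intent recognition path (ML.NET): target latency < 10 ms, QPS > 5,000.
2. RAG + LLM path: target latency < 5 s, QPS > 50.
3. Mixed path: blend the two according to real business traffic ratios.
Metrics: CPU, memory, network I/O, GPU utilization, token consumption, error rate.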
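Little's law relates throughput and latency to the number of requests in flight. It offers a quick way to check the targets above against the planned concurrency. The figures below simply plug in the stated targets, treating each latency target as the time a request stays in flight. For a closed-loop load generator (for example k6 virtual users with no think time), L is the number of virtual users needed.

```latex
\begin{aligned}
L &= \lambda W \\
L_{\text{ML.NET}} &= 5000\ \text{req/s} \times 0.01\ \text{s} = 50 \\
L_{\text{RAG+LLM}} &= 50\ \text{req/s} \times 5\ \text{s} = 250
\end{aligned}
```

Suppose each LLM request really takes 5 s. Then 100 virtual users can drive at most λ = L / W = 100 / 5 = 20 req/s on that path. The upper part of the planned 100 to 500 range is needed to demonstrate 50 QPS there.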

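3.2 Production Pitfalls and Solutions

Pitfall 1: Slow ML.NET model loading

Symptom: the first prediction takes 1 to 2 seconds because the model is loaded on first use.
Solution: preload the model at application startup (load it in the IntentClassifier constructor). Also reuse one PredictionEngine, or a PredictionEnginePool, instead of creating one per call, so cold starts do not affect users.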

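Pitfall 2: Degrading Redis Stack vector search performance

Symptom: as the knowledge base grows (over 100,000 entries), search latency rises from under 10 ms to over 200 ms.
Solution: use an HNSW index, tune its parameters (M and EF_CONSTRUCTION), and rebuild the index periodically. At larger scale, consider migrating to a dedicated vector database such as Qdrant.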
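Adding HNSW tuning parameters changes the attribute count that follows `HNSW` in `FT.CREATE`. The index in section 2.3 hard-codes this count as `"6"`, and it is easy to leave it stale. The following sketch builds the same `FT.CREATE` arguments and computes the count automatically. `HnswIndexArgs` is hypothetical. The defaults shown (M = 16, EF_CONSTRUCTION = 200, EF_RUNTIME = 10) are Redis's documented defaults, not tuned values. Larger M and EF_CONSTRUCTION generally raise recall at the cost of memory and build time. EF_RUNTIME trades query latency for recall.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical builder for the FT.CREATE arguments used by RedisVectorDatabase, plus HNSW tuning knobs.
public static class HnswIndexArgs
{
    public static object[] Build(string indexName, int dim, int m = 16, int efConstruction = 200, int efRuntime = 10)
    {
        string S(int value) => value.ToString(CultureInfo.InvariantCulture);
        var attrs = new List<string>
        {
            "TYPE", "FLOAT32",
            "DIM", S(dim),
            "DISTANCE_METRIC", "COSINE",
            "M", S(m),
            "EF_CONSTRUCTION", S(efConstruction),
            "EF_RUNTIME", S(efRuntime)
        };
        var args = new List<object> { indexName, "ON", "HASH", "PREFIX", "1", "kb:", "SCHEMA",
            "text", "TEXT", "vector", "VECTOR", "HNSW", S(attrs.Count) }; // count of attribute tokens that follow
        args.AddRange(attrs);
        return args.ToArray();
    }
}
```

With StackExchange.Redis this would be passed as `_db.Execute("FT.CREATE", HnswIndexArgs.Build("idx:knowledge", 1536, m: 32));`. Changing these parameters on an existing index means dropping and recreating it, which fits the periodic-rebuild advice above.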

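Pitfall 3: Frequent SignalR disconnects

Symptom: in Kubernetes, long-lived connections are cut off by the load balancer.
Solution: tune the SignalR keep-alive and timeout intervals, send client heartbeats, and use a Redis backplane for horizontal scaling.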
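The following is a configuration sketch for these settings. The values shown are SignalR's defaults, written out explicitly. The server keep-alive must stay below the load balancer's idle timeout, and `ClientTimeoutInterval` should be at least twice the client's keep-alive interval. The connection-string name `"Redis"` is an assumption. On the client, `WithAutomaticReconnect()` (.NET) or `withAutomaticReconnect()` (JavaScript) re-establishes dropped connections.

```csharp
// Program.cs (sketch): values are SignalR's defaults; the "Redis" connection-string name is assumed
builder.Services.AddSignalR(options =>
{
    // How often the server pings clients; must be shorter than the load balancer's idle timeout
    options.KeepAliveInterval = TimeSpan.FromSeconds(15);
    // How long the server waits for any client message before closing the connection;
    // keep it at least twice the client's KeepAliveInterval
    options.ClientTimeoutInterval = TimeSpan.FromSeconds(30);
})
// Redis backplane (package Microsoft.AspNetCore.SignalR.StackExchangeRedis) fans messages out across pods
.AddStackExchangeRedis(builder.Configuration.GetConnectionString("Redis")!);
```

The backplane only affects messages sent through `Clients` or `Groups`. The streamed reply from `ChatHub.SendMessage` travels back on the caller's own connection. A backplane also does not remove the need for sticky sessions, unless every client uses WebSockets with negotiation skipped.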

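Pitfall 4: Malformed function-call arguments from the LLM

Symptom: the model generates function arguments that omit required fields or have the wrong type.
Solution: add argument validation and default values in the plugin methods, and state the expected argument format explicitly in the prompt. Use a Semantic Kernel auto function invocation filter (IAutoFunctionInvocationFilter) to check arguments before the function runs. Register it in the kernel builder's services, for example with builder.Services.AddSingleton<IAutoFunctionInvocationFilter, ValidationFilter>().

```csharp
using Microsoft.SemanticKernel;
public class ValidationFilter : IAutoFunctionInvocationFilter
{
    public async Task OnAutoFunctionInvocationAsync(AutoFunctionInvocationContext context, Func<AutoFunctionInvocationContext, Task> next)
    {
        // If the function takes orderId but the model omitted it or left it blank, skip the call and
        // return a hint as the function result so the model asks the user for the order number
        bool needsOrderId = context.Function.Metadata.Parameters.Any(p => p.Name == "orderId");
        if (needsOrderId && (context.Arguments is null || !context.Arguments.TryGetValue("orderId", out var value) || string.IsNullOrWhiteSpace(value?.ToString())))
        {
            context.Result = new FunctionResult(context.Function, "Missing orderId: ask the user for the order number.");
            return;
        }
        await next(context);
    }
}
```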

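Pitfall 5: Runaway costs

Symptom: token consumption spiked abnormally one day. The cause was a malicious user repeatedly sending expensive, complex prompts.
Solution: enforce per-user quotas (a daily token cap) and set cost alerts (see Chapter 10). Limit the context retrieved by RAG to at most 2,000 tokens.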
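The following is a minimal sketch of the per-user daily token cap, kept in process memory for clarity. `DailyTokenQuota` is hypothetical. With several replicas, the counters would have to live in a shared store, for example a Redis key per user and day updated with INCRBY and given an expiry. The quota is checked before a call and charged afterwards, so a single large call can overshoot the cap once.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-process daily token quota per user (old days are never evicted in this sketch).
public sealed class DailyTokenQuota
{
    private readonly long _dailyLimit;
    private readonly Func<DateOnly> _today;
    private readonly ConcurrentDictionary<(string UserId, DateOnly Day), long> _used = new();

    public DailyTokenQuota(long dailyLimit, Func<DateOnly>? today = null)
    {
        _dailyLimit = dailyLimit;
        _today = today ?? (() => DateOnly.FromDateTime(DateTime.UtcNow)); // injectable clock for tests
    }

    // Call before an LLM request: false means today's budget is already used up.
    public bool CanSpend(string userId) => Used(userId) < _dailyLimit;

    // Call after the request with the token count reported for it; returns the new daily total.
    public long Record(string userId, long tokens) =>
        _used.AddOrUpdate((userId, _today()), tokens, (_, current) => current + tokens);

    public long Used(string userId) => _used.TryGetValue((userId, _today()), out var used) ? used : 0;
}
```

Keying by (user, UTC date) makes the budget reset at midnight UTC without a background job.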

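Pitfall 6: Context truncation in multi-turn conversations

Symptom: in long conversations, the model "forgets" earlier content.
Solution: manage the conversation with Semantic Kernel's ChatHistory. Periodically compress older history by generating summaries, or store long-term memory in the vector database.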
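The simplest form of history compression is a sliding window: keep the system prompt and the most recent turns, and drop the rest. This sketch shows only that window. It does not generate summaries; a summarization step would replace the dropped turns with a model-written recap. `ChatTurn` and `HistoryWindow` are hypothetical stand-ins for Semantic Kernel's `ChatHistory` and its message type, used here so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical message type for illustration; Semantic Kernel's ChatHistory holds ChatMessageContent items.
public record ChatTurn(string Role, string Content);

public static class HistoryWindow
{
    // Keeps every system message plus the newest `maxRecent` non-system messages, in original order.
    public static List<ChatTurn> Trim(IReadOnlyList<ChatTurn> history, int maxRecent)
    {
        if (maxRecent < 0) throw new ArgumentOutOfRangeException(nameof(maxRecent));
        int toSkip = Math.Max(0, history.Count(m => m.Role != "system") - maxRecent);
        var result = new List<ChatTurn>();
        foreach (var m in history)
        {
            if (m.Role == "system") { result.Add(m); continue; } // the system prompt is always kept
            if (toSkip > 0) { toSkip--; continue; }              // drop the oldest non-system messages
            result.Add(m);
        }
        return result;
    }
}
```

With function calling enabled, the window should keep an assistant tool-call message together with its tool result. Cutting between them can leave the model with a dangling call.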

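Pitfall 7: GPU memory leaks

Symptom: with ONNX Runtime + CUDA, GPU memory grows steadily during long runs and eventually runs out (OOM).
Solution: make sure every InferenceSession is disposed correctly, using using statements or dependency-injection lifetime management, and check for tensor objects that are never disposed.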

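3.3 Operations Best Practices

Health checks: expose a /health endpoint that checks the model load status, vector database connectivity, and LLM service availability.
Graceful shutdown: implement graceful shutdown in an IHostedService, waiting for in-flight requests to complete before releasing resources.
Log masking: mask sensitive information in user input (phone numbers, email addresses) before it is written to the logs.
Rollback strategy: use Kubernetes rolling updates and keep previous image versions, so you can roll back quickly when problems occur.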
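The following is a configuration sketch of the /health endpoint built on ASP.NET Core health checks. It covers two of the three checks listed. The model check here only verifies that the trained model file from section 2.2 exists. The vector store check pings Redis. `RedisHealthCheck` and the check names are hypothetical. An LLM probe would follow the same `IHealthCheck` pattern, but since each probe costs tokens, it is worth running it sparingly.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;
using StackExchange.Redis;

// Hypothetical check: the vector store counts as healthy if Redis answers a PING
public class RedisHealthCheck : IHealthCheck
{
    private readonly IConnectionMultiplexer _redis;

    public RedisHealthCheck(IConnectionMultiplexer redis) => _redis = redis;

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            var latency = await _redis.GetDatabase().PingAsync();
            return HealthCheckResult.Healthy($"Redis PING {latency.TotalMilliseconds:F0} ms");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Redis unreachable", ex);
        }
    }
}

// Program.cs (excerpt)
builder.Services.AddHealthChecks()
    // Model check: here simply whether the trained model file is present
    .AddCheck("intent-model", () => File.Exists("models/intent_model.zip")
        ? HealthCheckResult.Healthy()
        : HealthCheckResult.Unhealthy("Intent model file missing"))
    .AddCheck<RedisHealthCheck>("vector-db");
app.MapHealthChecks("/health");
```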

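Summary

This complete intelligent customer service case study ties the core technologies together:

ML.NET provides low-cost intent recognition that filters out simple questions.
Semantic Kernel handles complex conversation orchestration, function calling, and RAG.
The vector database powers semantic search over the knowledge base.
SignalR delivers a smooth streaming chat experience.
OpenTelemetry and cost controls keep the system observable and sustainable.

The system is more than a customer service application. It is a reusable AI architecture template whose patterns carry over to intelligent document processing, code-generation assistants, internal enterprise Q&A, and similar scenarios.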