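Building a Production-Grade AI Customer Service System from Scratch (Part 7): Baseline Optimization and One-Click Deployment for a Ready-to-Run System

Author: 大洪聊AI

Last updated: June 2026

Chapter goals: add two-layer caching to cut cost and latency, handle exceptions in one place, optimize front-end performance, package the front end and back end into one artifact, and start everything with one click.

Prerequisites: the intent-recognition architecture from Part 6 is fully in place and all business features work.

Preface

By the end of Part 6 we had finished every core business feature: streaming chat, the RAG knowledge base, multi-turn context, intent recognition, and task-oriented dialogue. The system already runs the full business flow end to end.

A few gaps still separate it from being truly production-ready:

❌ High cost: every repeated question goes to the LLM again, wasting API spend
❌ Poor experience: the front end stutters on long conversations, a dropped connection surfaces as a raw error, and error messages are inconsistent
❌ Hard to deploy: you need a development environment and must start the front end and back end separately, which makes customer demos painful

This chapter does the final round of optimization and packaging to turn the project into something you can hand over: two-layer caching that removes over 90% of repeated API calls when users ask the same question verbatim, unified exception handling, front-end performance work, and a single executable Jar that a double-clickable start script brings up.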

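1. Two-Layer Caching: Saving Over 90% of Repeated API Calls

1.1 Cache design

The two most token-hungry steps in the system are intent recognition and answer generation. We cache both, which saves API spend and makes repeated questions respond much faster.

Cache layer	Key format	Cached content	Call saved
Layer 1: intent cache	`intent:<raw user message>`	Intent classification result	The synchronous LLM call for intent recognition
Layer 2: answer cache	`answer:MD5(enhanced prompt)`	The complete AI answer	The streaming LLM call for generation

Design details:

The intent cache uses the raw user message as its key, which is simple and direct
The answer cache uses an MD5 digest as its key, because the RAG-enhanced prompt is too long to serve as a key itself
A global switch turns all caching on or off, which helps when debugging and comparing behaviour
The implementation sits on Spring's Cache abstraction with Caffeine as the provider, so switching to Redis later requires no change to business code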
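The two key formats from the table can be sketched with the JDK alone. This is a minimal illustration, not the chapter's service class: the class name `CacheKeys` and its method names are hypothetical, and it uses `MessageDigest` where the chapter's code uses Spring's `DigestUtils`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that only illustrates how the two cache keys are derived.
final class CacheKeys {

    // Layer 1: the raw user message is short, so it is used directly.
    static String intentKey(String userMessage) {
        return "intent:" + userMessage;
    }

    // Layer 2: the RAG-enhanced prompt can be very long,
    // so it is reduced to a fixed 32-character hex digest.
    static String answerKey(String enhancedPrompt) {
        return "answer:" + md5Hex(enhancedPrompt);
    }

    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(intentKey("hello"));
        System.out.println(answerKey("hello"));
    }
}
```

MD5 is used here only to shorten the key, not for security, so its known collision weaknesses matter little for a cache of this size.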

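1.2 Step 1: Add the Spring Boot Cache dependencies

Add the official cache starter and Caffeine to pom.xml. The starter does not pull in Caffeine on its own, but Spring Boot's dependency management supplies the Caffeine version, so no version needs to be specified:

xml

<!-- Spring Boot cache starter (the Spring Cache abstraction) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<!-- Caffeine cache provider; the version is managed by Spring Boot -->
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>

Refresh the Maven dependencies afterwards.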

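1.3 Step 2: Configure the cache rules and the global switch

Add the cache settings to application.yml, together with a custom global cache switch:

yaml

spring:
  # Spring Cache settings
  cache:
    type: caffeine
    caffeine:
      spec: maximumSize=1000,expireAfterWrite=1h
    cache-names:
      - qaCache

# Custom cache switch
ai:
  cache:
    enabled: true  # set to false to turn off all caching while testing or debugging

Parameters:

maximumSize=1000: keep at most 1000 entries; once the limit is exceeded, Caffeine evicts the entries it judges least likely to be used again, based on recency and frequency
expireAfterWrite=1h: an entry expires one hour after it is written, so cached answers do not outlive knowledge-base updates for long
ai.cache.enabled: the global switch; set it to false to bypass the cache completely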
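To make the two limits concrete, here is a deliberately tiny stand-in built on `LinkedHashMap`. It is a sketch of the semantics only, under stated assumptions: the class `TinyTtlCache` is hypothetical, it evicts in plain least-recently-used order (Caffeine's real policy is more sophisticated), and it checks expiry lazily on read.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical illustration of "maximumSize" plus "expireAfterWrite"; not Caffeine.
final class TinyTtlCache {

    private static final class Slot {
        final String value;
        final long writtenAtMillis;

        Slot(String value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, Slot> slots;

    TinyTtlCache(final int maximumSize, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.slots = new LinkedHashMap<String, Slot>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Slot> eldest) {
                return size() > maximumSize;
            }
        };
    }

    synchronized void put(String key, String value) {
        slots.put(key, new Slot(value, clock.getAsLong()));
    }

    synchronized String get(String key) {
        Slot slot = slots.get(key);
        if (slot == null) {
            return null;
        }
        // Expire relative to the write time, never the last read.
        if (clock.getAsLong() - slot.writtenAtMillis >= ttlMillis) {
            slots.remove(key);
            return null;
        }
        return slot.value;
    }
}
```

Passing the clock in as a `LongSupplier` (for example `System::currentTimeMillis`) keeps the expiry rule testable without sleeping.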

1.4 Step 3: Enable caching on the application class

Add the @EnableCaching annotation to the application class to turn on Spring's caching support:

java

package org.example;

import org.mybatis.spring.annotation.MapperScan;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cache.annotation.EnableCaching;

@SpringBootApplication
@MapperScan("org.example.mapper")
@EnableCaching // enable Spring's caching support
public class AiKefuApplication {
    public static void main(String[] args) {
        SpringApplication.run(AiKefuApplication.class, args);
    }
}

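1.5 Step 4: Remove old configuration

If you previously wrote a cache configuration class by hand, such as CacheConfig.java, delete it. Spring Boot creates the CacheManager from the yml settings automatically, and a hand-written CacheManager bean would make that auto-configuration back off, so the settings above would be ignored.

1.6 Step 5: Create a unified cache service

Wrap cache reads and writes behind one entry point. Business code never touches CacheManager directly, so swapping the cache implementation later does not ripple into it.

Create a new file: src/main/java/org/example/service/LocalCacheService.java

java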

package org.example.service;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.stereotype.Service;
import org.springframework.util.DigestUtils;

@Slf4j
@Service
public class LocalCacheService {

    @Autowired
    private CacheManager cacheManager;

    @Value("${ai.cache.enabled:true}")
    private boolean cacheEnabled;

    private static final String CACHE_NAME = "qaCache";
    private static final String INTENT_KEY_PREFIX = "intent:";
    private static final String ANSWER_KEY_PREFIX = "answer:";

    /**
     * Reads the intent-recognition cache; returns null on a miss or when caching is disabled
     */
    public String getIntentCache(String userMessage) {
        if (!cacheEnabled) {
            return null;
        }

        String key = INTENT_KEY_PREFIX + userMessage;
        Cache cache = cacheManager.getCache(CACHE_NAME);
        Cache.ValueWrapper valueWrapper = cache.get(key);

        if (valueWrapper != null) {
            log.info("✅ 意图识别缓存命中: {}", userMessage);
            return (String) valueWrapper.get();
        }

        log.info("❌ 意图识别缓存未命中: {}", userMessage);
        return null;
    }

    /**
     * Writes an intent-recognition result to the cache
     */
    public void putIntentCache(String userMessage, String intent) {
        if (!cacheEnabled) {
            return;
        }

        String key = INTENT_KEY_PREFIX + userMessage;
        Cache cache = cacheManager.getCache(CACHE_NAME);
        cache.put(key, intent);
        log.info("📝 意图识别缓存已写入: {} → {}", userMessage, intent);
    }

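    /**
     * Reads the answer cache (the key is an MD5 digest, because the prompt is too long to use directly)
     */
    public String getAnswerCache(String prompt) {
        if (!cacheEnabled) {
            return null;
        }

        // Hash the UTF-8 bytes explicitly so the key does not depend on the platform default charset
        String md5Key = DigestUtils.md5DigestAsHex(prompt.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        String key = ANSWER_KEY_PREFIX + md5Key;
        Cache cache = cacheManager.getCache(CACHE_NAME);
        Cache.ValueWrapper valueWrapper = cache.get(key);

        if (valueWrapper != null) {
            log.info("✅ 回答缓存命中，跳过大模型调用");
            return (String) valueWrapper.get();
        }

        log.info("❌ 回答缓存未命中，调用大模型生成");
        return null;
    }

    /**
     * Writes an answer to the cache
     */
    public void putAnswerCache(String prompt, String answer) {
        if (!cacheEnabled) {
            return;
        }

        // Must hash exactly the same bytes as getAnswerCache, or lookups will never hit
        String md5Key = DigestUtils.md5DigestAsHex(prompt.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        String key = ANSWER_KEY_PREFIX + md5Key;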
        Cache cache = cacheManager.getCache(CACHE_NAME);
        cache.put(key, answer);
        log.info("📝 回答缓存已写入，key: {}", md5Key);
    }

    /**
     * Clears every cached entry
     */
    public void clearAllCache() {
        Cache cache = cacheManager.getCache(CACHE_NAME);
        cache.clear();
        log.info("🗑️ 所有缓存已清空");
    }
}

1.7 Step 6: Hook the intent cache into intent recognition

Modify IntentRecognizer.java to check the cache before calling the LLM:

java

package org.example.util;

import dev.langchain4j.model.chat.ChatLanguageModel;
import org.example.service.LocalCacheService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

import java.util.Arrays;
import java.util.List;

@Component
public class IntentRecognizer {

    @Autowired
    @Qualifier("doubaoChatModel")
    private ChatLanguageModel chatModel;

    @Autowired
    private LocalCacheService localCacheService;

    private static final List<String> INTENTS = Arrays.asList(
            "咨询产品", "查询订单", "申请退款", "投诉建议", "转人工", "其他问题"
    );

    public String recognize(String message) {
        // 1. Check the cache first
        String cachedIntent = localCacheService.getIntentCache(message);
        if (cachedIntent != null) {
            return cachedIntent;
        }

        // 2. Cache miss: ask the LLM to classify the message
        StringBuilder prompt = new StringBuilder();
        prompt.append("你是专业的客服意图识别助手，请严格从下面的意图列表中选择最匹配的一项。\n");
        prompt.append("只返回意图名称本身，不要加任何解释、标点、序号。\n\n");
        prompt.append("可选意图列表：\n");
        prompt.append(String.join("\n", INTENTS)).append("\n\n");
        prompt.append("用户消息：").append(message);

        String result = chatModel.generate(prompt.toString()).trim();

        if (!INTENTS.contains(result)) {
            result = "其他问题";
        }

        // 3. Store the result in the cache
        localCacheService.putIntentCache(message, result);

        return result;
    }
}

1.8 Step 7: Hook the answer cache into streaming

Modify the streamWithContext method in ChatStreamHelper.java to check the cache before streaming a new answer:

java

package org.example.service.intent;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.output.Response;
import org.example.service.LocalCacheService;
import org.example.service.MessageService;
import org.example.utils.CustomerServicePrompt;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

import java.io.IOException;
import java.util.List;

@Component
public class ChatStreamHelper {

    @Autowired
    private MessageService messageService;

    @Autowired
    private LocalCacheService localCacheService;

    private static final int WINDOW_SIZE = 10;

    public void streamWithContext(
            String sessionId,
            String userPrompt,
            StreamingChatLanguageModel streamingModel,
            SseEmitter emitter
    ) {
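        // 1. Check the answer cache first. The key is the prompt alone, so a hit
        //    returns the same answer regardless of the conversation history.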
        String cachedAnswer = localCacheService.getAnswerCache(userPrompt);
        if (cachedAnswer != null) {
            try {
                emitter.send(cachedAnswer, MediaType.TEXT_PLAIN);
                emitter.send("[DONE]", MediaType.TEXT_PLAIN);
                emitter.complete();
                messageService.addMessage(sessionId, "assistant", cachedAnswer);
                return;
            } catch (IOException e) {
                emitter.completeWithError(e);
                return;
            }
        }

        // 2. Cache miss: build the sliding-window context as usual
        List<ChatMessage> contextMessages = messageService.buildSlidingWindowContext(sessionId, WINDOW_SIZE);

        if (!contextMessages.isEmpty() && contextMessages.get(contextMessages.size() - 1) instanceof UserMessage) {
            contextMessages.remove(contextMessages.size() - 1);
        }

        contextMessages.add(0, SystemMessage.from(CustomerServicePrompt.systemPrompt()));
        contextMessages.add(UserMessage.from(userPrompt));

        StringBuilder fullResponse = new StringBuilder();

        // 3. Stream the answer from the LLM
        streamingModel.generate(contextMessages, new StreamingResponseHandler<AiMessage>() {
            @Override
            public void onNext(String token) {
                try {
                    emitter.send(token, MediaType.TEXT_PLAIN);
                    fullResponse.append(token);
                } catch (IOException e) {
                    emitter.completeWithError(e);
                }
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                try {
                    emitter.send("[DONE]", MediaType.TEXT_PLAIN);
                    emitter.complete();
                    messageService.addMessage(sessionId, "assistant", fullResponse.toString());

                    // 4. Cache only context-free questions: exactly the system message plus the current user message
                    if (contextMessages.size() == 2) {
                        localCacheService.putAnswerCache(userPrompt, fullResponse.toString());
                    }
                } catch (IOException e) {
                    emitter.completeWithError(e);
                }
            }

            @Override
            public void onError(Throwable error) {
                emitter.completeWithError(error);
            }
        });
    }

    public void sendFixedResponse(String sessionId, String response, SseEmitter emitter) {
        try {
            emitter.send(response, MediaType.TEXT_PLAIN);
            emitter.send("[DONE]", MediaType.TEXT_PLAIN);
            emitter.complete();
            messageService.addMessage(sessionId, "assistant", response);
        } catch (IOException e) {
            emitter.completeWithError(e);
        }
    }
}

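1.9 Verifying the cache

Restart the back end, clear the console log, and start from a new session, because an answer is only written to the cache when the conversation has no history
Send 「你们产品多少钱？」 for the first time: the log shows two cache misses and the answer arrives slowly

Send exactly the same text again. The log shows:

✅ 意图识别缓存命中: 你们产品多少钱？
✅ 回答缓存命中，跳过大模型调用

The front end receives the complete answer almost instantly, with no typewriter effect

✅ Both cache layers are working.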

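2. Global Exception Handling: Friendly Error Messages

2.1 Why global exception handling is needed

Until now, exception handling was scattered across individual methods and the response format was inconsistent. When something failed, users could see a stack trace, garbled text, or a blank page. Intercepting exceptions globally with @RestControllerAdvice gives us:

Every exception returned in the same AjaxResult format
Category-specific messages that users can understand
The full stack trace logged on the back end for troubleshooting

2.2 Implementing the global exception handler

Create a new file: src/main/java/org/example/exception/GlobalExceptionHandler.java

java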

package org.example.exception;

import lombok.extern.slf4j.Slf4j;
import org.example.model.vo.AjaxResult;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@Slf4j
@RestControllerAdvice
public class GlobalExceptionHandler {

    /**
     * Fallback for any exception that no more specific handler catches
     */
    @ExceptionHandler(Exception.class)
    public AjaxResult<String> handleException(Exception e) {
        log.error("系统异常：", e);
        return AjaxResult.error("系统繁忙，请稍后再试");
    }

    /**
     * Runtime exceptions, which include failures of LLM calls
     */
    @ExceptionHandler(RuntimeException.class)
    public AjaxResult<String> handleRuntimeException(RuntimeException e) {
        log.error("运行时异常：", e);
        String message = e.getMessage();
        if (message != null) {
            if (message.contains("timeout")) {
                return AjaxResult.error("大模型响应超时，请稍后再试");
            }
            if (message.contains("401") || message.contains("Unauthorized")) {
                return AjaxResult.error("API密钥无效，请检查配置");
            }
            if (message.contains("429")) {
                return AjaxResult.error("请求过于频繁，请稍后再试");
            }
        }
        return AjaxResult.error("请求处理失败，请稍后再试");
    }
}
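The handler above matches on the top-level message only and is case-sensitive, so a message such as "Read timed out" or an error wrapped inside another exception falls through to the generic reply. A possible refinement is sketched below. The class `ErrorClassifier` and its `Category` enum are hypothetical names, and substring matching on "401" or "429" remains a heuristic that can misfire on unrelated messages.

```java
import java.net.SocketTimeoutException;
import java.util.Locale;

// Hypothetical helper: maps an exception (including its causes) to a coarse category.
final class ErrorClassifier {

    enum Category { TIMEOUT, UNAUTHORIZED, RATE_LIMITED, UNKNOWN }

    static Category classify(Throwable error) {
        Throwable current = error;
        // Walk at most 10 levels of the cause chain; client libraries often wrap the real error.
        for (int depth = 0; current != null && depth < 10; depth++) {
            if (current instanceof SocketTimeoutException) {
                return Category.TIMEOUT;
            }
            String message = current.getMessage();
            if (message != null) {
                String lower = message.toLowerCase(Locale.ROOT);
                if (lower.contains("timeout") || lower.contains("timed out")) {
                    return Category.TIMEOUT;
                }
                if (lower.contains("401") || lower.contains("unauthorized")) {
                    return Category.UNAUTHORIZED;
                }
                if (lower.contains("429")) {
                    return Category.RATE_LIMITED;
                }
            }
            current = current.getCause();
        }
        return Category.UNKNOWN;
    }
}
```

A handler could then switch on the returned category to pick the user-facing message, which keeps the matching rules in one testable place.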

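3. Front-End Performance: Smooth Long Conversations and Automatic Reconnection

3.1 Optimization 1: virtual scrolling for long conversations

Once a conversation passes about 50 messages, the number of DOM nodes starts to make the page stutter. Virtual scrolling renders only the messages inside the visible area and creates no DOM for the rest, so rendering cost stays roughly constant however long the history grows.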
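The window arithmetic behind fixed-height virtual scrolling is small enough to work through by hand. The sketch below writes it in Java purely as a worked example (the component itself is Vue); the class `VisibleWindow` is a hypothetical name, and the values 80 and 12 are this chapter's `itemHeight` and `visibleCount`.

```java
// Hypothetical worked example of the fixed-height virtual-scroll window.
final class VisibleWindow {

    final int startIndex; // first rendered message (inclusive)
    final int endIndex;   // last rendered message (exclusive)
    final int offsetY;    // pixels to shift the rendered slice down

    private VisibleWindow(int startIndex, int endIndex, int offsetY) {
        this.startIndex = startIndex;
        this.endIndex = endIndex;
        this.offsetY = offsetY;
    }

    static VisibleWindow of(int scrollTop, int itemHeight, int visibleCount, int totalMessages) {
        int start = scrollTop / itemHeight;                       // integer division acts as floor
        int end = Math.min(start + visibleCount, totalMessages);  // never run past the list
        return new VisibleWindow(start, end, start * itemHeight);
    }

    // Height of the spacer element that gives the scrollbar its full length.
    static int totalHeight(int totalMessages, int itemHeight) {
        return totalMessages * itemHeight;
    }

    public static void main(String[] args) {
        VisibleWindow w = VisibleWindow.of(1000, 80, 12, 500);
        System.out.println(w.startIndex + " " + w.endIndex + " " + w.offsetY);
    }
}
```

With 500 messages and the list scrolled 1000 px down, only messages 12 through 23 are rendered, shifted down by 960 px inside a 40000 px spacer.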

Modify src/views/ChatView.vue:

vue

<template>
  <div class="chat-page">
    <div class="chat-header">
      <h1>AI智能客服</h1>
    </div>

    <!-- Chat content area: the virtual-scroll container -->
    <div class="chat-content" ref="chatContentRef" @scroll="handleScroll">
      <div :style="{ height: totalHeight + 'px', position: 'relative' }">
        <div :style="{ transform: `translateY(${offsetY}px)` }">
          <div
            v-for="(msg, index) in visibleMessages"
            :key="index"
            :class="['message', msg.role]"
          >
            <div class="message-content">{{ msg.content }}</div>
          </div>
        </div>
      </div>
    </div>

    <div class="chat-input">
      <el-input
        v-model="inputMessage"
        placeholder="请输入您的问题..."
        @keyup.enter="handleSend"
        :disabled="isLoading"
      ></el-input>
      <el-button type="primary" @click="handleSend" :loading="isLoading">发送</el-button>
    </div>
  </div>
</template>

<script setup>
import { ref, computed, onMounted, nextTick } from 'vue'
import { ElMessage } from 'element-plus'

const messageList = ref([])
const inputMessage = ref('')
const isLoading = ref(false)
const chatContentRef = ref(null)
const currentSessionId = ref('')

// ========== Virtual scrolling ==========
const itemHeight = 80 // estimated average height of one message, in px
const visibleCount = 12 // maximum number of messages rendered at once
const scrollTop = ref(0)

// Messages inside the visible range
const visibleMessages = computed(() => {
  const startIndex = Math.floor(scrollTop.value / itemHeight)
  const endIndex = Math.min(startIndex + visibleCount, messageList.value.length)
  return messageList.value.slice(startIndex, endIndex)
})

// Total height (gives the scrollbar its full length)
const totalHeight = computed(() => messageList.value.length * itemHeight)

// Offset that positions the rendered slice inside the scroll area
const offsetY = computed(() => Math.floor(scrollTop.value / itemHeight) * itemHeight)

// Scroll listener
const handleScroll = () => {
  scrollTop.value = chatContentRef.value.scrollTop
}
// ================================

const scrollToBottom = async () => {
  await nextTick()
  if (chatContentRef.value) {
    chatContentRef.value.scrollTop = chatContentRef.value.scrollHeight
    // Keep scrollTop in sync after jumping to the bottom
    scrollTop.value = chatContentRef.value.scrollTop
  }
}

const loadHistory = async () => {
  if (!currentSessionId.value) return
  try {
    const res = await fetch(`/api/chat/history?sessionId=${currentSessionId.value}`)
    const data = await res.json()
    if (data.code === 200) {
      messageList.value = data.data
      await scrollToBottom()
    }
  } catch (error) {
    console.error('加载历史失败：', error)
  }
}

onMounted(async () => {
  let sessionId = localStorage.getItem('currentSessionId')
  if (!sessionId) {
    const res = await fetch('/api/chat/session', { method: 'POST' })
    const data = await res.json()
    sessionId = data.data
    localStorage.setItem('currentSessionId', sessionId)
  }
  currentSessionId.value = sessionId
  await loadHistory()
})

// ========== SSE auto-retry ==========
const maxRetries = 3
const retryDelayMs = 2000

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Sends one streaming request and writes the answer into messageList[aiIndex]
const streamAnswer = async (content, aiIndex) => {
  const response = await fetch('/api/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      sessionId: currentSessionId.value,
      message: content
    })
  })

  if (!response.ok || !response.body) {
    throw new Error(`HTTP ${response.status}`)
  }

  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    buffer += decoder.decode(value, { stream: true })

    if (buffer.includes('[DONE]')) {
      messageList.value[aiIndex].content = buffer.replace('[DONE]', '')
      break
    }
    messageList.value[aiIndex].content = buffer
    await scrollToBottom()
  }

  if (!messageList.value[aiIndex].content) {
    messageList.value[aiIndex].content = '未收到回复，请重试。'
  }
}

const handleSend = async () => {
  const content = inputMessage.value.trim()
  if (!content || isLoading.value) return

  inputMessage.value = ''
  isLoading.value = true

  messageList.value.push({ role: 'user', content })
  const aiIndex = messageList.value.length
  messageList.value.push({ role: 'ai', content: '' })
  await scrollToBottom()

  try {
    // Retry with the saved text: the input box is already cleared, so calling
    // handleSend() again would see an empty message and return immediately.
    for (let attempt = 0; ; attempt++) {
      try {
        await streamAnswer(content, aiIndex)
        break
      } catch (error) {
        console.error('流式请求失败：', error)
        if (attempt >= maxRetries) throw error
        ElMessage.info(`连接断开，正在重试(${attempt + 1}/${maxRetries})...`)
        messageList.value[aiIndex].content = '' // discard any partial answer
        await sleep(retryDelayMs)
      }
    }
  } catch (error) {
    messageList.value[aiIndex].content = '连接出错，请检查网络后重试。'
    ElMessage.error('请求失败，请检查网络')
  } finally {
    isLoading.value = false
    await scrollToBottom()
  }
}
</script>

<style scoped>
/* Base styles are unchanged from earlier chapters; only the core rules are listed here */
.chat-page {
  width: 100vw;
  height: 100vh;
  display: flex;
  flex-direction: column;
  background-color: #f5f7fa;
}

.chat-header {
  padding: 16px 20px;
  background: white;
  border-bottom: 1px solid #e6e6e6;
  text-align: center;
}

.chat-content {
  flex: 1;
  overflow-y: auto;
  padding: 20px;
}

.message {
  margin-bottom: 16px;
  max-width: 70%;
  display: flex;
  min-height: 40px;
}

.message.user {
  margin-left: auto;
  justify-content: flex-end;
}

.message.ai {
  margin-right: auto;
  justify-content: flex-start;
}

.message-content {
  padding: 12px 16px;
  border-radius: 8px;
  line-height: 1.5;
  word-break: break-word;
}

.message.user .message-content {
  background-color: #409eff;
  color: white;
}

.message.ai .message-content {
  background-color: white;
  color: #303133;
  box-shadow: 0 1px 2px rgba(0,0,0,0.05);
}

.chat-input {
  padding: 16px 20px;
  background: white;
  border-top: 1px solid #e6e6e6;
  display: flex;
  gap: 10px;
}

.chat-input .el-input {
  flex: 1;
}
</style>

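3.2 Results

Smooth long conversations: the page stays responsive even with hundreds or thousands of messages
Recovery from dropped connections: a failed request is retried up to 3 times, so brief network blips go largely unnoticed
Clear failure message: if every retry fails, the user gets an explicit message instead of a blank, frozen page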
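The retry policy itself (one initial attempt, then up to 3 retries spaced a fixed delay apart) is independent of the front end. Below is a minimal Java sketch of the same policy, for example for a back-end client that calls the chat endpoint. The class `Retry` and its signature are hypothetical, and the fixed delay is an assumption that mirrors the 2-second wait used in this chapter.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs an action, retrying a bounded number of times on failure.
final class Retry {

    static <T> T call(Supplier<T> action, int maxRetries, long delayMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        // attempt 0 is the initial call; attempts 1..maxRetries are the retries
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxRetries) {
                    pause(delayMillis);
                }
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

For example, `Retry.call(() -> fetchAnswer(question), 3, 2000)` would try up to four times in total, where `fetchAnswer` stands for whatever request function you already have.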

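4. Packaging the Front End and Back End Together: One Executable File

4.1 Packaging goal

Bundle the front end's static files into the back-end Jar to produce a single executable Jar. Running it requires no Node.js, no VSCode, and no separate front-end and back-end start-up; a Java runtime is enough.

4.2 Step 1: Build the front-end project

Run the build command in the front-end project root:

bash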

npm run build

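When it finishes, a dist folder appears in the front-end root, containing the compiled static files.

4.3 Step 2: Move the static files into the back end

Copy everything inside the dist folder into src/main/resources/static in the back-end project. Create the static directory if it does not exist.

Resulting directory layout: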

src/main/resources/
├── application.yml
└── static/
    ├── index.html
    ├── assets/
    └── ...other front-end static files

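4.4 Step 3: Check the back-end build configuration

Spring Boot packages everything under resources/static into the Jar by default, so no extra configuration is needed. Just make sure spring-boot-maven-plugin is configured in pom.xml:

xml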

<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <excludes>
                    <exclude>
                        <groupId>org.projectlombok</groupId>
                        <artifactId>lombok</artifactId>
                    </exclude>
                </excludes>
            </configuration>
        </plugin>
    </plugins>
</build>

4.5 Step 4: Build the executable Jar

Run the build command in the back-end project root:

bash

mvn clean package -DskipTests

When the build finishes, target contains ai-kefu-0.0.1-SNAPSHOT.jar, the combined executable file.

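5. Local Deployment: One-Click Start

5.1 Environment requirements

The target machine needs only three things and no development tools:

JDK 8 or OpenJDK 8
MySQL 8.0+ (with the service running)
Python 3.10+ (with Chroma installed)

5.2 Preparing the deployment files

Create a deployment folder (for example D:\ai-kefu-deploy) with the following contents: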

ai-kefu-deploy/
├── ai-kefu-0.0.1-SNAPSHOT.jar   # the Jar built in the previous step
├── config/
│   └── application.yml           # external configuration file
└── start.bat                     # one-click start script

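External configuration: put application.yml in the config folder. Spring Boot gives a config directory under the working directory precedence over the copy packaged in the Jar, so you can change settings without rebuilding. Remember to update the database address, API key, and similar values for the deployment environment.

5.3 Writing the one-click start script

Create start.bat in the deployment folder:

bat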

@echo off
rem Run from the script's own folder so the relative paths below resolve
cd /d "%~dp0"
title AI Customer Service System

echo ======================================
echo   AI Customer Service System - one-click start
echo ======================================
echo.
echo [1/2] Starting the Chroma vector database...
start "Chroma" cmd /k "chroma run --host 0.0.0.0 --port 8000 --path ./chroma-data"

echo Waiting for Chroma to start...
timeout /t 5 /nobreak >nul

echo [2/2] Starting the AI customer service application...
echo.
echo Once start-up finishes, open http://localhost:8080
echo Press Ctrl+C to stop the service
echo ======================================
echo.

java -jar ai-kefu-0.0.1-SNAPSHOT.jar

pause

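5.4 Starting the system

Make sure the MySQL service is running
Double-click start.bat
Wait about 10 seconds for the service to finish starting
Open http://localhost:8080 in a browser and start using it

✅ The whole system lives in one folder that can be copied to any Windows machine meeting the requirements in 5.1 and run as-is.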

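6. Full Verification Checklist

After deploying, verify each item in turn:

✅ The page opens and the chat interface renders correctly
✅ An ordinary question streams back with the typewriter effect
✅ The same question sent a second time returns instantly, and the back-end log shows cache hits
✅ Sending 「我要查订单」 prompts for the order number
✅ After uploading a knowledge-base document, RAG answers work and do not invent content
✅ Refreshing the page keeps the chat history
✅ Sending 「转人工」 returns the human-agent information
✅ Cutting the network and restoring it triggers automatic retries with a friendly message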

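7. Troubleshooting

Issue 1: the cache has no effect and every request reaches the LLM

Check that ai.cache.enabled is true
Check that the application class has @EnableCaching
Make sure both messages are exactly identical, including punctuation and spaces
Make sure the first send was the first message of a new session, because the answer is only cached when the conversation has no history
Look for cache-related lines in the back-end log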
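Because the keys are built from the exact text, a stray space or a full-width question mark is enough to cause a miss. One optional mitigation is to normalize the message before building the key. The sketch below is an assumption on top of this chapter's design, not part of it: the class `QuestionKey` is hypothetical, and folding case and width may merge questions you would rather keep distinct.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: canonicalizes a user message before it is used as a cache key.
final class QuestionKey {

    static String normalize(String message) {
        // NFKC folds full-width forms (such as U+FF1F) into their ASCII equivalents.
        String folded = Normalizer.normalize(message, Normalizer.Form.NFKC);
        return folded.trim()
                .replaceAll("\\s+", " ")   // collapse runs of whitespace
                .toLowerCase(Locale.ROOT); // ignore letter case
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Hello   World "));
    }
}
```

If you adopt something like this, apply it on both the read and the write path so the two always agree on the key.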

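Issue 2: the page returns 404 after packaging

Check that the front-end files were copied into src/main/resources/static
Make sure you are visiting http://localhost:8080 with no extra path
Run mvn clean package again to make sure the static files end up in the Jar

Issue 3: the Jar fails at start-up because it cannot reach the database

Check the database address, username, and password in config/application.yml
Make sure the MySQL service is running and the ai_kefu database exists
Check that the server firewall allows the database port

Issue 4: messages are misaligned under virtual scrolling

Adjust itemHeight to match the real height of a single message
If message heights vary widely, raise itemHeight somewhat to leave headroom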
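A single `itemHeight` can only approximate messages of different lengths. If tuning it is not enough, the usual next step is to record each message's height and locate the first visible message with prefix sums and a binary search. The sketch below shows only that arithmetic, in Java as a worked example. The class `VariableHeightIndex` is hypothetical, and it assumes every height is positive and already measured.

```java
import java.util.Arrays;

// Hypothetical index over per-message heights for variable-height virtual scrolling.
final class VariableHeightIndex {

    // tops[i] is the pixel offset of the top of message i; tops[n] is the total height.
    private final int[] tops;

    VariableHeightIndex(int[] heights) {
        tops = new int[heights.length + 1];
        for (int i = 0; i < heights.length; i++) {
            tops[i + 1] = tops[i] + heights[i];
        }
    }

    int totalHeight() {
        return tops[tops.length - 1];
    }

    int offsetOf(int index) {
        return tops[index];
    }

    // Index of the message that contains the pixel row scrollTop.
    int firstVisible(int scrollTop) {
        int pos = Arrays.binarySearch(tops, 0, tops.length - 1, scrollTop);
        // Exact hit: scrollTop is the top edge of message pos.
        // Otherwise pos == -(insertionPoint) - 1 and the message is insertionPoint - 1.
        return pos >= 0 ? pos : Math.max(-pos - 2, 0);
    }

    public static void main(String[] args) {
        VariableHeightIndex index = new VariableHeightIndex(new int[] {40, 120, 80});
        System.out.println(index.firstVisible(100) + " at " + index.offsetOf(1));
    }
}
```

The lookup costs O(log n) per scroll event, and the prefix sums need rebuilding only when a message is added or its measured height changes.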

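8. Chapter Summary and Series Wrap-Up

What this chapter covered

✅ Two-layer caching: intent and answer caches that remove over 90% of repeated API calls for verbatim repeat questions
✅ Global exception handling: one error format and user-friendly messages
✅ Front-end performance: virtual scrolling for long conversations and automatic retry for the SSE stream
✅ Combined packaging: a single Jar that contains all front-end and back-end code
✅ One-click deployment: double-click to start, with no development environment required

What the whole series delivers

🎉 Congratulations, the AI customer service system is now complete. You have a full system built with production use in mind:

✅ SSE streaming chat with a ChatGPT-style typewriter effect
✅ A RAG knowledge base that grounds answers in your documents to curb fabrication
✅ Multi-turn context management, so the AI remembers the conversation
✅ Intent recognition plus task-oriented dialogue for standard flows such as order lookup and handoff to a human agent
✅ A strategy-plus-factory architecture that follows the open/closed principle and is easy to extend
✅ Two-layer caching that lowers API cost
✅ Persistent data that survives restarts
✅ One-click packaging and deployment

Where to go next

Add a session list and multi-session management
Build a knowledge-base admin console for managing documents online
Add a permission system that separates administrators from ordinary users
Integrate more LLMs with one-click switching
Add a statistics dashboard for call volume, cache hit rate, and frequent questions
Switch to a Redis distributed cache to support clustered deployment

That wraps up the whole series! If you found it useful, please like, bookmark, and follow; more hands-on AI projects are on the way.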