Table of Contents
- What Is a Streaming Response
- StreamingChatResponseHandler in Detail
- Simplified Usage with LambdaStreamingResponseHandler
- Cancelling a Stream
- A Working Implementation
- API Reference
1. What Is a Streaming Response
LLMs generate text one token at a time, so many LLM providers offer a way to stream the response token by token instead of waiting for the whole text to be generated.
Benefits of streaming responses:
- Greatly improved user experience
- Users don't have to wait an unknown amount of time
- Users can start reading the response immediately
The ChatModel and LanguageModel interfaces have streaming counterparts: StreamingChatModel and StreamingLanguageModel.
Their APIs are similar, but they stream the response: they accept a parameter that implements the StreamingChatResponseHandler interface.
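To make the difference concrete, here is a minimal sketch of the two call shapes (imports omitted; all parameters are assumed to be already built, as in section 5 below):

```java
void compare(ChatModel chatModel, StreamingChatModel streamingChatModel,
             List<ChatMessage> messages, StreamingChatResponseHandler handler) {
    // Synchronous API: blocks until the complete response is available
    ChatResponse response = chatModel.chat(messages);

    // Streaming API: returns immediately; partial results are pushed to the handler
    streamingChatModel.chat(messages, handler);
}
```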
2. StreamingChatResponseHandler in Detail
By implementing StreamingChatResponseHandler, you can define handlers for the following events:
2.1 Partial text responses
When the next partial text response is generated, one of the following is called:
- onPartialResponse(String), or
- onPartialResponse(PartialResponse, PartialResponseContext)
Depending on the LLM provider, a partial response may consist of one or several tokens.
For example, you can send each token straight to the UI as soon as it becomes available.
2.2 Partial thinking/reasoning text
When the next partial thinking/reasoning text is generated, one of the following is called (see the combined sketch after section 2.4):
- onPartialThinking(PartialThinking), or
- onPartialThinking(PartialThinking, PartialThinkingContext)
2.3 Partial tool calls
When the next part of a tool call is generated, one of the following is called:
- onPartialToolCall(PartialToolCall), or
- onPartialToolCall(PartialToolCall, PartialToolCallContext)
2.4 Completed tool calls
When the LLM finishes streaming a single tool call:
- onCompleteToolCall(CompleteToolCall) is called
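Below is a combined sketch of the thinking and tool-call callbacks from sections 2.2–2.4. The package locations and accessor names (`PartialThinking.text()`, `PartialToolCall.id()`/`partialArguments()`, `CompleteToolCall.toolExecutionRequest()`) are assumptions based on recent LangChain4j releases; check the API of your version:

```java
import dev.langchain4j.agent.tool.ToolExecutionRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.CompleteToolCall;
import dev.langchain4j.model.chat.response.PartialThinking;
import dev.langchain4j.model.chat.response.PartialToolCall;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ToolCallStreamingSketch {

    // Buffers streamed tool-call argument fragments, keyed by tool-call id
    private final Map<String, StringBuilder> argumentsByToolId = new ConcurrentHashMap<>();

    public StreamingChatResponseHandler handler() {
        return new StreamingChatResponseHandler() {

            @Override
            public void onPartialResponse(String partialResponse) {
                System.out.print(partialResponse);
            }

            @Override
            public void onPartialThinking(PartialThinking partialThinking) {
                System.out.print(partialThinking.text()); // reasoning text, if the model exposes it
            }

            @Override
            public void onPartialToolCall(PartialToolCall partialToolCall) {
                // Argument JSON arrives in fragments; collect them per tool-call id
                argumentsByToolId
                        .computeIfAbsent(partialToolCall.id(), id -> new StringBuilder())
                        .append(partialToolCall.partialArguments());
            }

            @Override
            public void onCompleteToolCall(CompleteToolCall completeToolCall) {
                // The fully assembled request: tool name plus complete JSON arguments
                ToolExecutionRequest request = completeToolCall.toolExecutionRequest();
                System.out.println(request.name() + ": " + request.arguments());
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) { /* see 2.5 */ }

            @Override
            public void onError(Throwable error) {
                error.printStackTrace();
            }
        };
    }
}
```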
2.5 Generation complete
When the LLM finishes generating:
- onCompleteResponse(ChatResponse) is called. The ChatResponse object contains the complete response (AiMessage) as well as ChatResponseMetadata.
2.6 Error handling
When an error occurs:
- onError(Throwable error) is called
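Putting these callbacks together, here is a minimal runnable sketch. The OpenAI model and the `gpt-4o-mini` model name are example choices of mine, not something this tutorial's stack requires; any StreamingChatModel implementation behaves the same way:

```java
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.openai.OpenAiStreamingChatModel;

import java.util.concurrent.CountDownLatch;

public class StreamingDemo {

    public static void main(String[] args) throws InterruptedException {
        StreamingChatModel model = OpenAiStreamingChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o-mini")
                .build();

        CountDownLatch done = new CountDownLatch(1); // keep main alive until the stream ends

        model.chat("Tell me a joke", new StreamingChatResponseHandler() {

            @Override
            public void onPartialResponse(String partialResponse) {
                System.out.print(partialResponse); // print each fragment as it arrives
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) {
                // Complete AiMessage plus metadata such as token usage
                System.out.println("\nToken usage: " + completeResponse.metadata().tokenUsage());
                done.countDown();
            }

            @Override
            public void onError(Throwable error) {
                error.printStackTrace();
                done.countDown();
            }
        });

        done.await();
    }
}
```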
3. Simplified Usage with LambdaStreamingResponseHandler
The LambdaStreamingResponseHandler utility class lets you stream responses more compactly.
It provides static methods for creating a StreamingChatResponseHandler from lambda expressions.
3.1 Basic usage
```java
import static dev.langchain4j.model.LambdaStreamingResponseHandler.onPartialResponse;

model.chat("Tell me a joke", onPartialResponse(System.out::print));
```
3.2 With error handling
The onPartialResponseAndError() method lets you define handlers for both the onPartialResponse() and onError() events:
```java
import static dev.langchain4j.model.LambdaStreamingResponseHandler.onPartialResponseAndError;

model.chat("Tell me a joke", onPartialResponseAndError(System.out::print, Throwable::printStackTrace));
```
3.3 Blocking variants
LangChain4j also provides blocking variants, which wait until streaming has finished and are convenient in simple console programs:
```java
import static dev.langchain4j.model.LambdaStreamingResponseHandler.onPartialResponseBlocking;

onPartialResponseBlocking(model, "Why is the sky blue?", System.out::print);
```
```java
import static dev.langchain4j.model.LambdaStreamingResponseHandler.onPartialResponseAndErrorBlocking;

onPartialResponseAndErrorBlocking(model, "Why is the sky blue?",
        System.out::print, Throwable::printStackTrace);
```
4. Cancelling a Stream
If you want to cancel streaming, you can do so from any of the following callbacks:
- onPartialResponse(PartialResponse, PartialResponseContext)
- onPartialThinking(PartialThinking, PartialThinkingContext)
- onPartialToolCall(PartialToolCall, PartialToolCallContext)
The context object contains a StreamingHandle that can be used to cancel the stream:
```java
model.chat(userMessage, new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(PartialResponse partialResponse, PartialResponseContext context) {
        process(partialResponse);
        if (shouldCancel()) {
            context.streamingHandle().cancel();
        }
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("onCompleteResponse: " + completeResponse);
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```
When StreamingHandle.cancel() is called, LangChain4j closes the connection and stops streaming. Once cancel() has been called, the StreamingChatResponseHandler receives no further callbacks.
5. A Working Implementation
5.1 StreamingChatService (service layer)
StreamingChatService.java implements the complete streaming flow:
```java
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import jakarta.annotation.Resource;
import org.springframework.stereotype.Service;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Service
public class StreamingChatService {

    @Resource
    private ChatModel chatModel;

    @Resource
    private StreamingChatModel streamingChatModel;

    private final Map<String, MessageWindowChatMemory> memories = new ConcurrentHashMap<>();

    private static final int DEFAULT_MAX_MESSAGES = 10;

    public String chat(String memoryId, String userMessage) {
        MessageWindowChatMemory memory = getOrCreateMemory(memoryId);
        memory.add(UserMessage.from(userMessage));
        List<ChatMessage> messages = new ArrayList<>(memory.messages());
        ChatResponse response = chatModel.chat(messages);
        String aiResponse = response.aiMessage().text();
        memory.add(AiMessage.from(aiResponse));
        return aiResponse;
    }

    public SseEmitter chatStream(String memoryId, String userMessage) {
        SseEmitter emitter = new SseEmitter(60000L); // 60-second timeout
        emitter.onTimeout(() -> {
            emitter.complete();
        });
        emitter.onError((e) -> {
            // error handling (e.g. logging)
        });
        MessageWindowChatMemory memory = getOrCreateMemory(memoryId);
        memory.add(UserMessage.from(userMessage));
        List<ChatMessage> messages = new ArrayList<>(memory.messages());
        StringBuilder fullResponse = new StringBuilder();
        try {
            streamingChatModel.chat(messages, new StreamingChatResponseHandler() {

                @Override
                public void onPartialResponse(String partialResponse) {
                    try {
                        if (partialResponse != null && !partialResponse.isEmpty()) {
                            emitter.send(SseEmitter.event()
                                    .id(String.valueOf(System.currentTimeMillis()))
                                    .name("message")
                                    .data(partialResponse));
                            fullResponse.append(partialResponse);
                        }
                    } catch (IOException e) {
                        emitter.completeWithError(e);
                    }
                }

                @Override
                public void onCompleteResponse(ChatResponse completeResponse) {
                    try {
                        // persist the assembled answer into the conversation memory
                        memory.add(AiMessage.from(fullResponse.toString()));
                        emitter.send(SseEmitter.event()
                                .name("done")
                                .data("[DONE]"));
                        emitter.complete();
                    } catch (IOException e) {
                        emitter.completeWithError(e);
                    }
                }

                @Override
                public void onError(Throwable error) {
                    try {
                        emitter.send(SseEmitter.event()
                                .name("error")
                                .data("[ERROR]: " + error.getMessage()));
                    } catch (IOException e) {
                        // ignore: the emitter is completed with the original error below
                    }
                    emitter.completeWithError(error);
                }
            });
        } catch (Exception e) {
            emitter.completeWithError(e);
        }
        return emitter;
    }

    public SseEmitter chatStreamWithSystem(String memoryId, String systemPrompt, String userMessage) {
        MessageWindowChatMemory memory = getOrCreateMemory(memoryId);
        if (!hasSystemMessage(memory)) {
            memory.add(SystemMessage.from(systemPrompt));
        }
        return chatStream(memoryId, userMessage);
    }

    private boolean hasSystemMessage(MessageWindowChatMemory memory) {
        return memory.messages().stream()
                .anyMatch(msg -> msg instanceof SystemMessage);
    }

    private MessageWindowChatMemory getOrCreateMemory(String memoryId) {
        return memories.computeIfAbsent(memoryId, id ->
                MessageWindowChatMemory.builder()
                        .id(id)
                        .maxMessages(DEFAULT_MAX_MESSAGES)
                        .build()
        );
    }
}
```
5.2 Key features
1. Dual model support
To use these models, you first need to provide the corresponding configuration:
```java
import dev.langchain4j.community.model.dashscope.QwenChatModel;
import dev.langchain4j.community.model.dashscope.QwenStreamingChatModel;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.StreamingChatModel;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

@Configuration
public class LangChain4jConfig {

    @Value("${langchain4j.community.dashscope.chat-model.api-key}")
    private String apiKey;

    @Value("${langchain4j.community.dashscope.chat-model.model-name}")
    private String modelName;

    @Bean
    @Primary
    public ChatModel chatModel() {
        return QwenChatModel.builder()
                .apiKey(apiKey)
                .modelName(modelName)
                .enableSearch(true)
                .temperature(0.3F)
                .build();
    }

    @Bean
    public StreamingChatModel streamingChatModel() {
        return QwenStreamingChatModel.builder()
                .apiKey(apiKey)
                .modelName(modelName)
                .enableSearch(true)
                .temperature(0.3F)
                .build();
    }
}
```
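The two @Value placeholders above map to entries like the following in application.properties; the key names come straight from the config class, while the values shown here are placeholders:

```properties
langchain4j.community.dashscope.chat-model.api-key=sk-your-dashscope-key
langchain4j.community.dashscope.chat-model.model-name=qwen-plus
```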
```java
@Resource
private ChatModel chatModel;                   // plain (synchronous) chat

@Resource
private StreamingChatModel streamingChatModel; // streaming chat
```
2. SSE emitter configuration
```java
SseEmitter emitter = new SseEmitter(60000L); // 60-second timeout
emitter.onTimeout(() -> {
    emitter.complete();
});
emitter.onError((e) -> {
    // error handling (e.g. logging)
});
```
3. The StreamingChatResponseHandler implementation (simplified excerpt; the full version in 5.1 also handles the IOException that emitter.send() can throw)
```java
streamingChatModel.chat(messages, new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        // push each partial response to the client
        emitter.send(SseEmitter.event()
                .name("message")
                .data(partialResponse));
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        // signal completion
        emitter.send(SseEmitter.event()
                .name("done")
                .data("[DONE]"));
        emitter.complete();
    }

    @Override
    public void onError(Throwable error) {
        // signal the error
        emitter.send(SseEmitter.event()
                .name("error")
                .data("[ERROR]: " + error.getMessage()));
        emitter.completeWithError(error);
    }
});
```
4. Session management (a standalone memory sketch follows this list)
- MessageWindowChatMemory manages the conversation history
- The 10 most recent messages are kept automatically
- A system prompt can be set per conversation
- Multiple users/sessions are isolated from one another
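For reference, here is a minimal standalone sketch of the memory that getOrCreateMemory() builds per memoryId, using the same builder calls as the service above:

```java
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;

import java.util.List;

public class MemoryDemo {

    public static void main(String[] args) {
        MessageWindowChatMemory memory = MessageWindowChatMemory.builder()
                .id("user123")    // one memory per conversation/user
                .maxMessages(10)  // sliding window: oldest messages are evicted first
                .build();

        memory.add(UserMessage.from("Hello"));
        List<ChatMessage> history = memory.messages(); // the currently retained window
        System.out.println(history);
    }
}
```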
6. API Reference
6.1 StreamingChatController (controller layer)
StreamingChatController.java exposes the SSE streaming endpoints:
```java
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.annotation.Resource;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
@RequestMapping("/streaming-chat")
@Tag(name = "Streaming chat", description = "Streaming responses based on StreamingChatModel")
public class StreamingChatController {

    @Resource
    private StreamingChatService streamingChatService;

    @PostMapping(value = "/chat", produces = "text/event-stream")
    @Operation(summary = "Send a message (streaming)", description = "Chats via StreamingChatModel and returns an SSE stream")
    public SseEmitter chatStream(
            @RequestParam String memoryId,
            @RequestParam String message
    ) {
        return streamingChatService.chatStream(memoryId, message);
    }

    @PostMapping(value = "/chat-with-system", produces = "text/event-stream")
    @Operation(summary = "Streaming chat with a system prompt", description = "Chats with a system prompt and returns an SSE stream")
    public SseEmitter chatStreamWithSystem(
            @RequestParam String memoryId,
            @RequestParam String systemPrompt,
            @RequestParam String message
    ) {
        return streamingChatService.chatStreamWithSystem(memoryId, systemPrompt, message);
    }
}
```
6.2 Endpoint usage examples
1. Basic streaming chat
```bash
curl -N -X POST http://localhost:8080/streaming-chat/chat \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "memoryId=user123&message=请写一首关于春天的诗"
```
(-N disables curl's output buffering so the tokens show up as they arrive.)
Response format (SSE; note that the .name(...) set on the server side becomes the event: field on the wire):
```text
id: 1706789000000
event: message
data: 春

id: 1706789000001
event: message
data: 天

id: 1706789000002
event: message
data: 来

event: done
data: [DONE]
```
2. Streaming chat with a system prompt
```bash
curl -N -X POST http://localhost:8080/streaming-chat/chat-with-system \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "memoryId=user123&systemPrompt=你是一个诗人&message=请写一首关于春天的诗"
```
6.3 Front-end JavaScript example
Note that EventSource can only issue GET requests, so the snippet below assumes the endpoint also accepts GET; the complete HTML example further down uses fetch with POST to match the controller above.
```javascript
const eventSource = new EventSource(
    '/streaming-chat/chat?memoryId=user123&message=' +
    encodeURIComponent('请写一首关于春天的诗')
);

eventSource.addEventListener('message', (event) => {
    console.log('Received:', event.data);
    // append the token to the page
    updateResponse(event.data);
});

eventSource.addEventListener('done', (event) => {
    console.log('Response complete');
    eventSource.close();
});

eventSource.addEventListener('error', (event) => {
    console.error('Error occurred:', event.data);
    eventSource.close();
});
```
Complete example:
```html
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Streaming Chat Test</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
padding: 20px;
}
.container {
max-width: 800px;
margin: 0 auto;
background: white;
border-radius: 12px;
box-shadow: 0 10px 40px rgba(0, 0, 0, 0.1);
overflow: hidden;
}
.header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 30px;
text-align: center;
}
.header h1 {
font-size: 28px;
margin-bottom: 10px;
}
.header p {
opacity: 0.9;
font-size: 14px;
}
.chat-container {
padding: 20px;
min-height: 400px;
max-height: 500px;
overflow-y: auto;
background: #f8f9fa;
}
.message {
margin-bottom: 20px;
padding: 15px;
border-radius: 8px;
animation: slideIn 0.3s ease-out;
}
@keyframes slideIn {
from {
opacity: 0;
transform: translateY(10px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
.message.user {
background: #e3f2fd;
border-left: 4px solid #2196f3;
}
.message.assistant {
background: #f3e5f5;
border-left: 4px solid #9c27b0;
}
.message .label {
font-weight: bold;
margin-bottom: 8px;
font-size: 12px;
color: #666;
text-transform: uppercase;
}
.message .content {
line-height: 1.6;
color: #333;
}
.input-container {
padding: 20px;
background: white;
border-top: 1px solid #e0e0e0;
}
.input-group {
display: flex;
gap: 10px;
margin-bottom: 10px;
}
input[type="text"] {
flex: 1;
padding: 12px 15px;
border: 2px solid #e0e0e0;
border-radius: 8px;
font-size: 14px;
transition: border-color 0.3s;
}
input[type="text"]:focus {
outline: none;
border-color: #667eea;
}
button {
padding: 12px 30px;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border: none;
border-radius: 8px;
font-size: 14px;
font-weight: bold;
cursor: pointer;
transition: transform 0.2s, box-shadow 0.2s;
}
button:hover {
transform: translateY(-2px);
box-shadow: 0 5px 15px rgba(102, 126, 234, 0.4);
}
button:active {
transform: translateY(0);
}
button:disabled {
opacity: 0.6;
cursor: not-allowed;
}
.system-prompt {
margin-bottom: 10px;
}
.status {
padding: 10px;
margin-bottom: 10px;
border-radius: 6px;
font-size: 13px;
display: none;
}
.status.streaming {
background: #fff3cd;
color: #856404;
border: 1px solid #ffeaa7;
display: block;
}
.status.error {
background: #f8d7da;
color: #721c24;
border: 1px solid #f5c6cb;
display: block;
}
.typing-indicator {
display: inline-block;
animation: blink 1s infinite;
}
@keyframes blink {
0%, 50% { opacity: 1; }
51%, 100% { opacity: 0; }
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>🚀 LangChain4j Streaming Chat Demo</h1>
<p>Real-time streaming responses with multi-turn conversation</p>
</div>
<div class="chat-container" id="chatContainer">
<div class="message assistant">
<div class="label">助手</div>
<div class="content">你好!我是 AI 助手,有什么可以帮助你的吗?</div>
</div>
</div>
<div class="status" id="status"></div>
<div class="input-container">
<div class="system-prompt">
<input type="text" id="systemPrompt" placeholder="系统提示词(可选)例如:你是一个诗人" />
</div>
<div class="input-group">
<input type="text" id="userMessage" placeholder="输入你的消息..." autofocus />
<button id="sendButton" onclick="sendMessage()">发送</button>
</div>
</div>
</div>
<script>
const chatContainer = document.getElementById('chatContainer');
const userMessageInput = document.getElementById('userMessage');
const systemPromptInput = document.getElementById('systemPrompt');
const sendButton = document.getElementById('sendButton');
const statusDiv = document.getElementById('status');
let memoryId = 'user_' + Date.now();
let currentAssistantMessage = null;
function showStatus(message, type = 'streaming') {
statusDiv.textContent = message;
statusDiv.className = 'status ' + type;
}
function hideStatus() {
statusDiv.className = 'status';
statusDiv.textContent = '';
}
function addMessage(role, content) {
const messageDiv = document.createElement('div');
messageDiv.className = 'message ' + role;
messageDiv.innerHTML = `
<div class="label">${role === 'user' ? '用户' : '助手'}</div>
<div class="content">${content}</div>
`;
chatContainer.appendChild(messageDiv);
chatContainer.scrollTop = chatContainer.scrollHeight;
}
async function sendMessage() {
const message = userMessageInput.value.trim();
if (!message) return;
const systemPrompt = systemPromptInput.value.trim();
addMessage('user', message);
userMessageInput.value = '';
sendButton.disabled = true;
currentAssistantMessage = document.createElement('div');
currentAssistantMessage.className = 'message assistant';
currentAssistantMessage.innerHTML = `
<div class="label">助手 <span class="typing-indicator">●</span></div>
<div class="content"></div>
`;
chatContainer.appendChild(currentAssistantMessage);
const contentDiv = currentAssistantMessage.querySelector('.content');
showStatus('🔄 Receiving streaming response...');
try {
const url = systemPrompt
? `streaming-chat/chat-with-system?memoryId=${memoryId}&systemPrompt=${encodeURIComponent(systemPrompt)}&message=${encodeURIComponent(message)}`
: `streaming-chat/chat?memoryId=${memoryId}&message=${encodeURIComponent(message)}`;
const response = await fetch(url, {
method: 'POST'
});
if (!response.ok) {
throw new Error(`HTTP error! status: ${response.status}`);
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (line.startsWith('data:')) {
const data = line.substring(5).trim();
if (data === '[DONE]') {
hideStatus();
continue;
}
if (data.startsWith('[ERROR]')) {
showStatus('❌ ' + data, 'error');
continue;
}
contentDiv.textContent += data;
chatContainer.scrollTop = chatContainer.scrollHeight;
}
}
}
currentAssistantMessage.querySelector('.label').innerHTML = 'Assistant';
hideStatus();
} catch (error) {
console.error('Error:', error);
showStatus('❌ Error: ' + error.message, 'error');
contentDiv.textContent = 'Sorry, an error occurred: ' + error.message;
} finally {
sendButton.disabled = false;
userMessageInput.focus();
}
}
userMessageInput.addEventListener('keypress', (e) => {
if (e.key === 'Enter' && !sendButton.disabled) {
sendMessage();
}
});
</script>
</body>
</html>
```
6.4 Sample run

Summary
This tutorial covered the core concepts of LangChain4j streaming responses:
- How streaming works: the LLM generates tokens one by one, so there is no need to wait for the complete response
- StreamingChatResponseHandler: handles partial responses, the completion event, and errors
- LambdaStreamingResponseHandler: simplifies streaming-response handling
- SSE integration: pushing tokens with Spring's Server-Sent Events
- Session management: MessageWindowChatMemory automatically manages conversation history
Key advantages
- ✅ Real-time feedback: users see generated text immediately
- ✅ Time saved: no waiting for the full response
- ✅ Better experience: a typewriter-like effect
- ✅ State management: conversation context is maintained automatically
Use cases
- Chatbots
- Content generators
- Coding assistants
- Document-writing tools
With this implementation, you can build AI applications with real-time streaming responses!