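1. What is streaming output?
Streaming output displays the AI's answer piece by piece as it is generated, instead of waiting for the whole answer and showing it all at once. It is a key feature for a responsive user experience.
2. Streaming vs. non-streaming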
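| Mode | Request parameter | User experience | Technical characteristics |
|---|---|---|---|
| Non-streaming | stream: false | Wait a few seconds, then the full content appears at once | One request, one response |
| Streaming | stream: true | Displayed in real time, a few characters at a time | One request, a continuous series of data chunks |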
3. Streaming response format
Qwen uses the SSE (Server-Sent Events) format:
```text
data: {"choices":[{"delta":{"content":"你"}}]}

data: {"choices":[{"delta":{"content":"好"}}]}

data: {"choices":[{"delta":{"content":"!"}}]}

data: [DONE]
```
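Notes:
【1】 Is there a space after "data:"? In SSE the space is optional: the JSON may follow the colon directly or after a single space. Strip it when parsing instead of relying on either form.
【2】 Events are separated by a blank line, i.e. two newline characters: \n\n
【3】 The final event is data: [DONE]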
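The three notes above can be sketched as a small line parser. This is only an illustration: `SseLine`, `TryGetData` and `IsDone` are hypothetical helper names, not part of any SDK, and the sample lines are made up to cover both the "space" and "no space" forms.

```csharp
using System;

public static class SseLine
{
    // Returns true and the payload when the line is an SSE "data" field.
    public static bool TryGetData(string line, out string payload)
    {
        payload = null;
        if (line == null || !line.StartsWith("data:", StringComparison.Ordinal))
            return false; // blank separator lines and other fields are skipped

        payload = line.Substring(5); // "data:" is 5 characters long
        // One optional space may follow the colon; strip it if present.
        if (payload.StartsWith(" ", StringComparison.Ordinal))
            payload = payload.Substring(1);
        return true;
    }

    // The end-of-stream marker is the literal payload [DONE].
    public static bool IsDone(string payload) => payload == "[DONE]";
}

public static class SseLineDemo
{
    public static void Main()
    {
        string[] lines =
        {
            "data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}",
            "",
            "data:{\"choices\":[{\"delta\":{\"content\":\"!\"}}]}",
            "",
            "data: [DONE]"
        };

        foreach (var line in lines)
        {
            if (!SseLine.TryGetData(line, out var payload)) continue;
            Console.WriteLine(SseLine.IsDone(payload) ? "<end of stream>" : payload);
        }
    }
}
```

Keeping the prefix handling in one place means the read loop only has to deal with payloads, whichever form the server sends.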
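4. Structure of a streaming chunk
```json
{
  "choices": [
    {
      "delta": {
        "content": "one character or a few characters",
        "role": "assistant"
      },
      "index": 0,
      "finish_reason": null // "stop" in the final chunk
    }
  ],
  "usage": null // in a streaming response, usage (if reported) appears only in the final chunk
}
```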
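To make the structure above concrete, here is a minimal sketch that pulls `delta.content` and `finish_reason` out of a chunk with `JsonDocument`, without any model classes. `ChunkReader` and the three sample payloads are assumptions made for illustration; real chunks carry more fields.

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class ChunkReader
{
    // Extracts delta.content and finish_reason from one chunk payload.
    // Missing or null fields are returned as null.
    public static (string Content, string FinishReason) Read(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            var root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object
                || !root.TryGetProperty("choices", out var choices)
                || choices.ValueKind != JsonValueKind.Array
                || choices.GetArrayLength() == 0)
                return (null, null);

            var choice = choices[0];
            if (choice.ValueKind != JsonValueKind.Object) return (null, null);

            string content = null, finish = null;
            if (choice.TryGetProperty("delta", out var delta)
                && delta.ValueKind == JsonValueKind.Object
                && delta.TryGetProperty("content", out var c)
                && c.ValueKind == JsonValueKind.String)
                content = c.GetString();
            if (choice.TryGetProperty("finish_reason", out var f)
                && f.ValueKind == JsonValueKind.String)
                finish = f.GetString();
            return (content, finish);
        }
    }
}

public static class ChunkReaderDemo
{
    public static void Main()
    {
        // Hypothetical payloads shaped like the chunk shown above.
        string[] payloads =
        {
            "{\"choices\":[{\"delta\":{\"role\":\"assistant\",\"content\":\"Hel\"},\"index\":0,\"finish_reason\":null}]}",
            "{\"choices\":[{\"delta\":{\"content\":\"lo\"},\"index\":0,\"finish_reason\":null}]}",
            "{\"choices\":[{\"delta\":{\"content\":\"\"},\"index\":0,\"finish_reason\":\"stop\"}]}"
        };

        var full = new StringBuilder();
        foreach (var p in payloads)
        {
            var (content, finish) = ChunkReader.Read(p);
            if (!string.IsNullOrEmpty(content)) full.Append(content);
            if (finish != null) Console.WriteLine($"finish_reason: {finish}");
        }
        Console.WriteLine(full.ToString());
    }
}
```

The full answer is simply the concatenation of every non-empty `delta.content`, and a non-null `finish_reason` marks the last content chunk.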
5. Example
Main method:
```csharp
using ConsoleApp1.Common;
using ConsoleApp1.Model;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

namespace ConsoleApp1.BLL
{
    public class CommonClass
    {
        /// <summary>
        /// Streaming request (stream = true).
        /// </summary>
        public static async Task RequstAI_Stream(List<ChatMessage> chatHistory, Action<string> onTokenReceived, Action<string> onComplete = null, string model = "qwen-turbo", float? temperature = null, int? maxTokens = null)
        {
            // Replace with your own Alibaba Cloud Bailian API key
            const string apiKey = ConfigCommon.apiKey;
            const string url = ConfigCommon.url_chat; // "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Authorization", $"Bearer {apiKey}");

            // 1. Build the request body (stream = true is the key setting)
            var requestBody = new
            {
                model = model,
                messages = chatHistory,
                temperature = temperature, // lower values give more stable output, higher values more creative output
                max_tokens = maxTokens,
                stream = true
            };

            // 2. Serialize the request, omitting optional parameters that were left null
            //    (DefaultIgnoreCondition requires System.Text.Json 5.0 or later)
            var jsonOptions = new JsonSerializerOptions { DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull };
            var json = JsonSerializer.Serialize(requestBody, jsonOptions);
            // Note: if you add response_format: json_object, the word "json" must appear
            // somewhere in messages, otherwise the call fails.
            var content = new StringContent(json, Encoding.UTF8, "application/json");

            try
            {
                // 3. Send with HttpRequestMessage + SendAsync, because PostAsync
                //    has no overload that accepts an HttpCompletionOption
                using (var httpRequest = new HttpRequestMessage(HttpMethod.Post, url) { Content = content })
                using (var response = await client.SendAsync(httpRequest, HttpCompletionOption.ResponseHeadersRead))
                {
                    response.EnsureSuccessStatusCode();
                    using (var stream = await response.Content.ReadAsStreamAsync())
                    using (var reader = new StreamReader(stream, Encoding.UTF8))
                    {
                        string line;
                        var fullContent = new StringBuilder();
                        while ((line = await reader.ReadLineAsync()) != null)
                        {
                            // Skip the blank lines that separate events
                            if (!line.StartsWith("data:", StringComparison.Ordinal)) continue;

                            // "data:" is 5 characters long; Trim() removes the optional space after it
                            var data = line.Substring(5).Trim();
                            if (data == "[DONE]") break; // end-of-stream marker

                            try
                            {
                                var chunk = JsonSerializer.Deserialize<StreamChunk>(data);
                                var contentDelta = chunk?.Choices?.FirstOrDefault()?.Delta?.Content;
                                if (!string.IsNullOrEmpty(contentDelta))
                                {
                                    // Invoke the callback for every piece of text received
                                    onTokenReceived(contentDelta);
                                    fullContent.Append(contentDelta);
                                }
                            }
                            catch (JsonException)
                            {
                                // Ignore chunks that fail to parse and keep reading
                                continue;
                            }
                        }
                        // Report the full text when the stream ends, whether through
                        // [DONE] or because the server closed the connection
                        onComplete?.Invoke(fullContent.ToString());
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Exception: {ex.Message}");
                onComplete?.Invoke($"Error: {ex.Message}");
            }
            finally
            {
                client.Dispose();
            }
        }
    }
}
```
Model entity class 1, ChatMessage:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

namespace ConsoleApp1.Model
{
    // A single chat message
    public class ChatMessage
    {
        /// <summary>
        /// Message role: system / user / assistant
        /// </summary>
        [JsonPropertyName("role")]
        public string Role { get; set; } = string.Empty;

        /// <summary>
        /// Message content
        /// </summary>
        [JsonPropertyName("content")]
        public string Content { get; set; } = string.Empty;

        public ChatMessage() { }

        public ChatMessage(string role, string content)
        {
            Role = role;
            Content = content;
        }
    }
}
```
Model entity class 2, StreamChunk:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

namespace ConsoleApp1.Model
{
    /// <summary>
    /// One data chunk of a streaming response
    /// </summary>
    public class StreamChunk
    {
        [JsonPropertyName("choices")]
        public List<StreamChoice> Choices { get; set; } = new List<StreamChoice>();
    }

    public class StreamChoice
    {
        [JsonPropertyName("index")]
        public int Index { get; set; }

        [JsonPropertyName("delta")]
        public StreamDelta Delta { get; set; } = new StreamDelta();

        [JsonPropertyName("finish_reason")]
        public string FinishReason { get; set; }
    }

    public class StreamDelta
    {
        [JsonPropertyName("role")]
        public string Role { get; set; }

        [JsonPropertyName("content")]
        public string Content { get; set; }
    }
}
```
Usage example:
```csharp
public static async Task Day()
{
    try
    {
        Console.WriteLine("Streaming output\r\n");

        // Prepare the conversation history
        var chatHistory = new List<ChatMessage>
        {
            new ChatMessage("system", "You are a C# programming assistant"),
            new ChatMessage("user", "Introduce C# in one sentence")
        };

        // Call the streaming method
        await CommonClass.RequstAI_Stream(
            chatHistory,
            onTokenReceived: (token) =>
            {
                Console.Write(token);
                Console.Out.Flush(); // show the text immediately
            },
            onComplete: (fullResponse) =>
            {
                Console.WriteLine("\n\n Answer complete");
                Console.WriteLine($"Full response length: {fullResponse?.Length ?? 0} characters");
            },
            model: "qwen-turbo",
            temperature: 0.7f,
            maxTokens: 500
        );
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
    }
}
```
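Key points:
Streaming responses use delta, not message.
content may carry only one character or a few characters per chunk.
finish_reason is null until the final chunk.
ResponseHeadersRead vs. ResponseContentRead:
ResponseHeadersRead: returns as soon as the response headers arrive, so the body can be processed while it is still being read. Suitable for streaming output.
ResponseContentRead: returns only after the entire body has been read. Not suitable for streaming output.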
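Common issues:
【1】 The console output stutters during streaming. Console.Out normally flushes automatically, but if output has been redirected to a buffered writer (for example through Console.SetOut), text can arrive in bursts. Calling Console.Out.Flush(); after each write forces immediate output. Also check that the request uses ResponseHeadersRead, since ResponseContentRead delivers everything at once.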
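【2】 How to handle network interruptions: add a timeout and retries.
```csharp
public class QwenStreamClient
{
    private readonly HttpClient _httpClient;

    public QwenStreamClient(string apiKey)
    {
        _httpClient = new HttpClient();
        _httpClient.Timeout = TimeSpan.FromMinutes(5); // streaming needs a longer timeout
        // ...
    }
}
```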
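The snippet above covers only the timeout. For the retry half, one possible approach is a small wrapper with exponential backoff. This is a sketch under assumptions: `RetryHelper`, `RetryAsync`, the default attempt count and the delays are all hypothetical choices, and the demo simulates failures instead of calling the real API.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Runs `action` up to maxAttempts times. Only errors that look transient
    // (network failures, I/O errors, timeouts) are retried; the last failure is rethrown.
    public static async Task<T> RetryAsync<T>(
        Func<CancellationToken, Task<T>> action,
        int maxAttempts = 3,
        int baseDelayMs = 500,
        CancellationToken ct = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action(ct);
            }
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex, ct))
            {
                // Exponential backoff: baseDelay, 2 x baseDelay, 4 x baseDelay, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)), ct);
            }
        }
    }

    // A TaskCanceledException counts as a timeout only if the caller did not cancel.
    private static bool IsTransient(Exception ex, CancellationToken ct) =>
        ex is HttpRequestException
        || ex is IOException
        || (ex is TaskCanceledException && !ct.IsCancellationRequested);
}

public static class RetryDemo
{
    public static async Task Main()
    {
        int calls = 0;
        string result = await RetryHelper.RetryAsync(async ct =>
        {
            calls++;
            await Task.Yield();
            // Simulate two dropped connections before a successful call.
            if (calls < 3) throw new HttpRequestException("simulated network drop");
            return "ok";
        }, maxAttempts: 3, baseDelayMs: 10);

        Console.WriteLine($"{result} after {calls} attempts");
    }
}
```

One caveat for streaming: if a retry happens after some text has already been shown, the new attempt starts the answer from the beginning, so the caller should clear the partial output or retry only when no token has been received yet.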