Server + Mobile: The Best Architecture for an AI Assistant?
Why Is Server + Client the Best Choice?
In the previous posts we explored Termux, Chaquopy, Kivy, BeeWare, and other approaches. Each has its strengths and weaknesses, but they all share one problem:
AI applications are too heavy.
- LLM inference needs serious compute
- Model files easily run to several GB
- Mobile hardware is tightly constrained
- Third-party library compatibility is poor
The Server + Client architecture solves these problems cleanly:
┌─────────────────┐           ┌─────────────────┐
│  Mobile Client  │  <----->  │     Server      │
│ (lightweight UI)│    API    │   (AI logic)    │
└─────────────────┘           └─────────────────┘
Pros:                          Pros:
✅ Native experience           ✅ Ample compute
✅ Fast iteration              ✅ Any model you like
✅ Offline caching             ✅ Centralized maintenance
Architecture Design
Overall Architecture
┌────────────────────────────────────────────────────────────┐
│                        Client Layer                        │
│  ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐      │
│  │ iOS App  │ │Android App│ │ Web App  │ │ Desktop  │      │
│  └────┬─────┘ └────┬──────┘ └────┬─────┘ └────┬─────┘      │
└───────┼────────────┼────────────┼─────────────┼────────────┘
        │            │            │             │
        └────────────┴─────┬──────┴─────────────┘
                           │
                  ┌────────▼────────┐
                  │   API Gateway   │
                  │ (Nginx/Traefik) │
                  └────────┬────────┘
                           │
┌──────────────────────────▼─────────────────────────────────┐
│                       Service Layer                        │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐    │
│  │ REST API │ │WebSocket │ │   Auth   │ │Rate Limiter │    │
│  │ (FastAPI)│ │(realtime)│ │  (JWT)   │ │ (throttling)│    │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬──────┘    │
└───────┼────────────┼────────────┼──────────────┼───────────┘
        │            │            │              │
        └────────────┴─────┬──────┴──────────────┘
                           │
┌──────────────────────────▼─────────────────────────────────┐
│                      AI Service Layer                      │
│  ┌──────────┐ ┌──────────┐ ┌────────────┐ ┌───────────┐    │
│  │ LLM API  │ │ TTS/ASR  │ │Vector Store│ │ Tool Exec │    │
│  │ (Ollama) │ │(Whisper) │ │  (Chroma)  │ │ (Python)  │    │
│  └──────────┘ └──────────┘ └────────────┘ └───────────┘    │
└─────────────────────────────────────────────────────────────┘
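In the AI service layer, the "LLM API" box can be as simple as forwarding a chat turn to a local Ollama instance. Here is a minimal sketch using the official `ollama` Python client; the local instance, the `llama3` model name, and the helper function are assumptions for illustration, not part of the article's stack:

```python
# Sketch: the service layer forwarding one chat turn to a local Ollama instance.
# Assumes Ollama is running on its default port with a "llama3" model pulled.
import ollama

def ask_local_llm(prompt: str) -> str:
    # Send a single-turn conversation to the local model and return the reply text
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

if __name__ == "__main__":
    print(ask_local_llm("Summarize the Server + Client architecture in one sentence."))
```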
API Design
REST API: Basic Operations
```python
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from typing import Optional, List

app = FastAPI(title="AI Assistant API")

class ChatRequest(BaseModel):
    message: str
    conversation_id: Optional[str] = None
    stream: bool = False

class ChatResponse(BaseModel):
    reply: str
    conversation_id: str
    tokens_used: int

@app.post("/api/chat", response_model=ChatResponse)
async def chat(
    request: ChatRequest,
    user_id: str = Depends(get_current_user)
):
    """Send a message to the AI assistant."""
    # get_or_create_conversation and llm_client are app-specific helpers
    conversation = get_or_create_conversation(
        request.conversation_id,
        user_id
    )

    # Call the LLM
    reply = await llm_client.chat(
        messages=conversation.messages + [{"role": "user", "content": request.message}],
        stream=request.stream
    )

    # Persist the exchange
    conversation.add_message("user", request.message)
    conversation.add_message("assistant", reply.content)

    return ChatResponse(
        reply=reply.content,
        conversation_id=conversation.id,
        tokens_used=reply.tokens
    )
```
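A quick way to exercise this endpoint from any client (or a test script) is a plain HTTP call. A sketch with `httpx`, assuming a local deployment; the base URL and token value are placeholders:

```python
# Sketch: a thin client calling the /api/chat endpoint defined above.
# API_BASE and TOKEN are placeholders; adjust for your deployment.
from typing import Optional
import httpx

API_BASE = "http://localhost:8000"
TOKEN = "<your-jwt-here>"

def send_message(message: str, conversation_id: Optional[str] = None) -> dict:
    response = httpx.post(
        f"{API_BASE}/api/chat",
        json={"message": message, "conversation_id": conversation_id},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=60.0,  # LLM replies can be slow
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    reply = send_message("Hello, what can you do?")
    print(reply["reply"], f"({reply['tokens_used']} tokens)")
```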
WebSocket: Real-Time Communication
For streaming output and real-time interaction, WebSocket is the better choice:
```python
from fastapi import WebSocket, WebSocketDisconnect
import asyncio

@app.websocket("/ws/chat/{user_id}")
async def websocket_chat(websocket: WebSocket, user_id: str):
    await websocket.accept()
    conversation_id = None

    try:
        while True:
            # Receive a message from the client
            data = await websocket.receive_json()
            message = data.get("message")

            # Get or create the conversation
            conversation = get_or_create_conversation(
                data.get("conversation_id"),
                user_id
            )
            conversation_id = conversation.id

            # Stream the LLM response chunk by chunk
            async for chunk in llm_client.stream_chat(
                messages=conversation.messages + [{"role": "user", "content": message}]
            ):
                await websocket.send_json({
                    "type": "chunk",
                    "content": chunk.content,
                    "conversation_id": conversation_id
                })

            # Signal end of stream
            await websocket.send_json({
                "type": "done",
                "conversation_id": conversation_id
            })
    except WebSocketDisconnect:
        # Client disconnected; nothing to clean up in this sketch
        pass
```
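To try the streaming endpoint outside of a mobile app, a small script with the `websockets` library works as a sanity check. The URL and user id are placeholders:

```python
# Sketch: exercising the /ws/chat endpoint with the "websockets" library.
# Prints streamed chunks as they arrive and stops at the "done" signal.
import asyncio
import json
import websockets

async def chat_once(message: str) -> None:
    uri = "ws://localhost:8000/ws/chat/user123"  # placeholder host and user id
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"message": message}))
        while True:
            event = json.loads(await ws.recv())
            if event["type"] == "chunk":
                print(event["content"], end="", flush=True)
            elif event["type"] == "done":
                print()
                break

if __name__ == "__main__":
    asyncio.run(chat_once("Tell me a joke"))
```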
Client Implementation
iOS (Swift)
```swift
import Foundation
import Combine

// Minimal message model used by the view layer
struct Message {
    enum Role { case user, assistant }
    let role: Role
    let content: String
}

class AIAssistantClient: ObservableObject {
    private var webSocket: URLSessionWebSocketTask?
    @Published var messages: [Message] = []
    @Published var currentResponse: String = ""

    func connect() {
        let url = URL(string: "wss://your-api.com/ws/chat/user123")!
        let session = URLSession(configuration: .default)
        webSocket = session.webSocketTask(with: url)
        webSocket?.resume()

        // Start receiving messages
        receiveMessage()
    }

    func send(message: String) {
        let payload: [String: Any] = ["message": message]
        guard let data = try? JSONSerialization.data(withJSONObject: payload) else { return }
        webSocket?.send(.data(data)) { error in
            if let error = error {
                print("Send failed: \(error)")
            }
        }
    }

    private func receiveMessage() {
        webSocket?.receive { [weak self] result in
            switch result {
            case .success(let message):
                switch message {
                case .data(let data):
                    self?.handleMessage(data)
                case .string(let text):
                    // FastAPI's send_json sends text frames, so this is the common path
                    if let data = text.data(using: .utf8) {
                        self?.handleMessage(data)
                    }
                @unknown default:
                    break
                }
                self?.receiveMessage() // Keep listening
            case .failure(let error):
                print("Receive failed: \(error)")
            }
        }
    }

    private func handleMessage(_ data: Data) {
        guard let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let type = json["type"] as? String else { return }

        DispatchQueue.main.async {
            if type == "chunk" {
                self.currentResponse += json["content"] as? String ?? ""
            } else if type == "done" {
                self.messages.append(Message(
                    role: .assistant,
                    content: self.currentResponse
                ))
                self.currentResponse = ""
            }
        }
    }
}
```
Android (Kotlin)
```kotlin
import androidx.lifecycle.ViewModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.WebSocket
import okhttp3.WebSocketListener
import org.json.JSONObject
import java.util.concurrent.TimeUnit

// Minimal message model used by the UI layer
enum class Role { USER, ASSISTANT }
data class Message(val role: Role, val content: String)

class AIAssistantClient : ViewModel() {
    private val _messages = MutableStateFlow<List<Message>>(emptyList())
    val messages: StateFlow<List<Message>> = _messages

    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse

    private var webSocket: WebSocket? = null

    fun connect(userId: String) {
        val client = OkHttpClient.Builder()
            .pingInterval(30, TimeUnit.SECONDS)
            .build()

        val request = Request.Builder()
            .url("wss://your-api.com/ws/chat/$userId")
            .build()

        webSocket = client.newWebSocket(request, object : WebSocketListener() {
            override fun onMessage(webSocket: WebSocket, text: String) {
                val json = JSONObject(text)
                when (json.getString("type")) {
                    "chunk" -> {
                        // Append streamed tokens to the in-progress reply
                        _currentResponse.value += json.getString("content")
                    }
                    "done" -> {
                        // Promote the finished reply to the message list
                        val newMessage = Message(
                            role = Role.ASSISTANT,
                            content = _currentResponse.value
                        )
                        _messages.value = _messages.value + newMessage
                        _currentResponse.value = ""
                    }
                }
            }
        })
    }

    fun send(message: String) {
        val payload = JSONObject().put("message", message)
        webSocket?.send(payload.toString())
    }
}
```
Offline Support Strategy
Mobile networks are unreliable, so offline support matters:
1. Message Queue
```python
# Client side: offline message queue backed by SQLite
import sqlite3
from datetime import datetime
from typing import List

class MessageQueue:
    def __init__(self, db_path: str):
        self.db = sqlite3.connect(db_path)
        self._create_table()

    def _create_table(self):
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS pending_messages (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                message TEXT NOT NULL,
                conversation_id TEXT,
                created_at TIMESTAMP
            )"""
        )
        self.db.commit()

    def enqueue(self, message: str, conversation_id: str):
        """Queue a message for later delivery."""
        self.db.execute(
            "INSERT INTO pending_messages (message, conversation_id, created_at) VALUES (?, ?, ?)",
            (message, conversation_id, datetime.now())
        )
        self.db.commit()

    def get_pending(self) -> List[dict]:
        """Return messages still waiting to be sent."""
        cursor = self.db.execute(
            "SELECT id, message, conversation_id FROM pending_messages ORDER BY created_at"
        )
        return [{"id": row[0], "message": row[1], "conversation_id": row[2]} for row in cursor]

    def mark_sent(self, message_id: int):
        """Remove a message once delivery succeeded."""
        self.db.execute("DELETE FROM pending_messages WHERE id = ?", (message_id,))
        self.db.commit()
```
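The queue only pays off with a flush step that replays pending messages once connectivity returns (the network-monitor example later in this section calls such a routine `flush()`). A minimal sketch; `send_to_server` is a placeholder for whatever HTTP call the client actually uses:

```python
# Sketch: replaying queued messages once the device is back online.
# send_to_server is a placeholder callable: (message, conversation_id) -> delivered?
from typing import Callable

def flush_queue(queue: MessageQueue, send_to_server: Callable[[str, str], bool]) -> int:
    """Try to deliver every pending message; return how many went through."""
    delivered = 0
    for item in queue.get_pending():
        ok = send_to_server(item["message"], item["conversation_id"])
        if not ok:
            break  # still offline or server error: keep the rest queued
        queue.mark_sent(item["id"])
        delivered += 1
    return delivered
```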
2. Local Cache
```python
# Client side: LRU cache of recent conversations
# (Conversation is the app's own conversation model)
from collections import OrderedDict
from typing import Optional

class ConversationCache:
    def __init__(self, max_size: int = 100):
        self.cache = OrderedDict()
        self.max_size = max_size

    def get(self, conversation_id: str) -> Optional["Conversation"]:
        if conversation_id in self.cache:
            self.cache.move_to_end(conversation_id)  # mark as most recently used
            return self.cache[conversation_id]
        return None

    def put(self, conversation: "Conversation"):
        if conversation.id in self.cache:
            self.cache.move_to_end(conversation.id)
        else:
            if len(self.cache) >= self.max_size:
                self.cache.popitem(last=False)  # evict the least recently used entry
        self.cache[conversation.id] = conversation
```
3. Network Status Detection
```kotlin
// Android: listen for network connectivity changes
import android.content.Context
import android.net.ConnectivityManager
import android.net.Network
import android.net.NetworkCapabilities
import android.net.NetworkRequest
import kotlinx.coroutines.channels.awaitClose
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.callbackFlow

class NetworkMonitor(context: Context) {
    private val connectivityManager = context.getSystemService(ConnectivityManager::class.java)

    val isOnline: Flow<Boolean> = callbackFlow {
        val callback = object : ConnectivityManager.NetworkCallback() {
            override fun onAvailable(network: Network) {
                trySend(true)
            }
            override fun onLost(network: Network) {
                trySend(false)
            }
        }

        val request = NetworkRequest.Builder()
            .addCapability(NetworkCapabilities.NET_CAPABILITY_INTERNET)
            .build()

        connectivityManager.registerNetworkCallback(request, callback)
        awaitClose { connectivityManager.unregisterNetworkCallback(callback) }
    }
}

// Usage (inside a coroutine scope, e.g. viewModelScope.launch { ... })
networkMonitor.isOnline.collect { online ->
    if (online) {
        // Back online: flush the offline message queue
        messageQueue.flush()
    }
}
```
Security Considerations
1. Authentication
```python
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt  # PyJWT

security = HTTPBearer()

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security)
) -> str:
    try:
        payload = jwt.decode(
            credentials.credentials,
            settings.JWT_SECRET,  # settings is the app's own configuration object
            algorithms=["HS256"]
        )
        return payload["user_id"]
    except jwt.ExpiredSignatureError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Token has expired"
        )
    except jwt.InvalidTokenError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid token"
        )
```
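The decode side above assumes the token was issued somewhere. A matching sketch of the issuing side with PyJWT; the 7-day lifetime and how the secret is supplied are illustrative assumptions:

```python
# Sketch: issuing the JWT that get_current_user() verifies above.
# The 7-day lifetime and secret handling are illustrative assumptions.
from datetime import datetime, timedelta, timezone
import jwt

def create_access_token(user_id: str, secret: str, days_valid: int = 7) -> str:
    payload = {
        "user_id": user_id,
        "exp": datetime.now(timezone.utc) + timedelta(days=days_valid),  # expiry claim
    }
    return jwt.encode(payload, secret, algorithm="HS256")
```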
2. Rate Limiting
```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
# Register the limiter so slowapi can return 429 responses
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/chat")
@limiter.limit("30/minute")  # 30 requests per minute per client IP
async def chat(request: Request, ...):
    ...
```
3. Data Encryption
```python
# Encrypt sensitive data at rest
from cryptography.fernet import Fernet

class SecureStorage:
    def __init__(self, key: bytes):
        self.cipher = Fernet(key)

    def encrypt(self, data: str) -> bytes:
        return self.cipher.encrypt(data.encode())

    def decrypt(self, encrypted: bytes) -> str:
        return self.cipher.decrypt(encrypted).decode()
```
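Key generation and a round trip, as a short usage sketch; in practice the key would live in a secrets manager or the platform keystore, not in code:

```python
# Usage sketch: generate a key once, then do an encrypt/decrypt round trip.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store this securely; never hard-code it
storage = SecureStorage(key)

token = storage.encrypt("user refresh token")
assert storage.decrypt(token) == "user refresh token"
```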
Cost Optimization
1. Model Selection
```python
class LLMRouter:
    """Pick a model based on task complexity."""

    def __init__(self):
        self.models = {
            "simple": "gpt-3.5-turbo",   # simple tasks
            "medium": "gpt-4-turbo",     # medium tasks
            "complex": "gpt-4",          # complex tasks
            "local": "llama3:8b"         # local model
        }

    def select_model(self, prompt: str) -> str:
        # Naive heuristics; swap in a real classifier for production use
        if len(prompt) < 100:
            return self.models["simple"]
        elif "code" in prompt or "analysis" in prompt:
            return self.models["medium"]
        elif "complex reasoning" in prompt:
            return self.models["complex"]
        else:
            return self.models["local"]  # default to the local model
```
2. Caching Strategy
```python
from hashlib import md5
from typing import Optional
import redis

class ResponseCache:
    def __init__(self, redis_url: str):
        self.redis = redis.from_url(redis_url)
        self.ttl = 3600  # 1 hour

    def get(self, prompt: str) -> Optional[str]:
        key = md5(prompt.encode()).hexdigest()
        cached = self.redis.get(key)
        return cached.decode() if cached else None

    def set(self, prompt: str, response: str):
        key = md5(prompt.encode()).hexdigest()
        self.redis.setex(key, self.ttl, response)
```
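Both optimizations slot into the chat handler right before the LLM call: check the cache first, and only pick (and pay for) a model on a miss. A sketch combining the two classes above; `call_llm` is a placeholder for the actual model invocation:

```python
# Sketch: combining ResponseCache and LLMRouter in the request path.
# call_llm is a placeholder for however the service actually invokes a model.
async def answer(prompt: str, cache: ResponseCache, router: LLMRouter) -> str:
    cached = cache.get(prompt)
    if cached is not None:
        return cached                    # cache hit: no model cost at all

    model = router.select_model(prompt)  # route to the cheapest adequate model
    response = await call_llm(model=model, prompt=prompt)
    cache.set(prompt, response)
    return response
```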
Summary
The Server + Client architecture is the best choice for an AI assistant:
| Advantage | Description |
|---|---|
| Ample compute | The server can use high-end GPUs |
| Any model | No mobile hardware constraints |
| Fast iteration | Update the server without forcing client upgrades |
| Centralized maintenance | One backend serving every client platform |
| Controllable cost | Route to cheaper models and cache responses |
Of course there are downsides:
| Drawback | Description |
|---|---|
| Network dependency | Requires a network connection |
| Latency | Network round trips add delay |
| Server cost | A server has to be run and maintained |
For AI applications, though, these trade-offs are acceptable. After all, you don't really want to run a 70B-parameter model on your phone, do you?
Coming next: Chrome DevTools Protocol: an introduction to browser automation. We'll look at how to drive the browser with CDP, laying the groundwork for automated social media posting.
Have you used a Server + Client architecture? Share your experience in the comments!