Server + Mobile: The Best Architecture for AI Assistants?

Why Is Server + Client the Best Choice?

In the previous articles we explored Termux, Chaquopy, Kivy, BeeWare, and similar approaches. Each has its pros and cons, but they all share one fundamental problem:

AI applications are too heavy for a phone.

  • LLM inference needs serious compute
  • Model files easily run to several GB
  • Mobile hardware is tightly constrained
  • Third-party library compatibility is poor

A Server + Client architecture addresses all of these:

┌─────────────────┐         ┌─────────────────┐
│  Mobile Client  │  <--->  │     Server      │
│ (lightweight UI)│   API   │   (AI logic)    │
└─────────────────┘         └─────────────────┘
      Pros:                       Pros:
      ✅ Native experience        ✅ Ample compute
      ✅ Fast iteration           ✅ Model freedom
      ✅ Offline caching          ✅ Unified maintenance

Architecture Design

Overall Architecture

┌────────────────────────────────────────────────────────────┐
│                        Client Layer                        │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌──────────┐   │
│  │ iOS App  │  │Android App│  │ Web App  │  │ Desktop  │   │
│  └────┬─────┘  └─────┬─────┘  └────┬─────┘  └────┬─────┘   │
└───────┼──────────────┼─────────────┼─────────────┼─────────┘
        │              │             │             │
        └──────────────┴──────┬──────┴─────────────┘
                              │
                     ┌────────▼─────────┐
                     │   API Gateway    │
                     │ (Nginx/Traefik)  │
                     └────────┬─────────┘
                              │
┌─────────────────────────────▼──────────────────────────────┐
│                       Service Layer                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │
│  │ REST API │  │WebSocket │  │   Auth   │  │Rate Limiter│  │
│  │(FastAPI) │  │(realtime)│  │  (JWT)   │  │(throttling)│  │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └─────┬──────┘  │
└───────┼─────────────┼─────────────┼──────────────┼─────────┘
        │             │             │              │
        └─────────────┴──────┬──────┴──────────────┘
                             │
┌────────────────────────────▼───────────────────────────────┐
│                      AI Service Layer                      │
│  ┌──────────┐  ┌──────────┐  ┌────────────┐  ┌──────────┐  │
│  │ LLM API  │  │ TTS/ASR  │  │Vector Store│  │Tool Exec │  │
│  │ (Ollama) │  │(Whisper) │  │  (Chroma)  │  │ (Python) │  │
│  └──────────┘  └──────────┘  └────────────┘  └──────────┘  │
└────────────────────────────────────────────────────────────┘

API Design

REST API: Basic Operations
python
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from typing import Optional, List

app = FastAPI(title="AI Assistant API")

class ChatRequest(BaseModel):
    message: str
    conversation_id: Optional[str] = None
    stream: bool = False

class ChatResponse(BaseModel):
    reply: str
    conversation_id: str
    tokens_used: int

@app.post("/api/chat", response_model=ChatResponse)
async def chat(
    request: ChatRequest,
    user_id: str = Depends(get_current_user)
):
    """发送消息给 AI 助手"""
    conversation = get_or_create_conversation(
        request.conversation_id, 
        user_id
    )
    
    # Call the LLM
    reply = await llm_client.chat(
        messages=conversation.messages + [{"role": "user", "content": request.message}],
        stream=request.stream
    )
    
    # Persist the conversation
    conversation.add_message("user", request.message)
    conversation.add_message("assistant", reply.content)
    
    return ChatResponse(
        reply=reply.content,
        conversation_id=conversation.id,
        tokens_used=reply.tokens
    )
WebSocket: Real-Time Communication

For streaming output and real-time interaction, WebSocket is the better choice:

python
from fastapi import WebSocket, WebSocketDisconnect
import asyncio

@app.websocket("/ws/chat/{user_id}")
async def websocket_chat(websocket: WebSocket, user_id: str):
    await websocket.accept()
    
    conversation_id = None
    
    try:
        while True:
            # Receive a message
            data = await websocket.receive_json()
            message = data.get("message")
            
            # Get or create the conversation
            conversation = get_or_create_conversation(
                data.get("conversation_id"),
                user_id
            )
            conversation_id = conversation.id
            
            # Stream the LLM response
            async for chunk in llm_client.stream_chat(
                messages=conversation.messages + [{"role": "user", "content": message}]
            ):
                await websocket.send_json({
                    "type": "chunk",
                    "content": chunk.content,
                    "conversation_id": conversation_id
                })
            
            # Send the end-of-stream signal
            await websocket.send_json({
                "type": "done",
                "conversation_id": conversation_id
            })
            
    except WebSocketDisconnect:
        # Client disconnected
        pass
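
A client consuming this protocol has to stitch the `chunk` frames back into one reply. A minimal, transport-agnostic sketch (the `chunk`/`done` frame names match the server above; the frames here stand in for parsed WebSocket JSON messages):

```python
# Reassemble a streamed reply from "chunk"/"done" frames.

def assemble_reply(frames):
    """Concatenate chunk contents until a 'done' frame arrives."""
    parts = []
    for frame in frames:
        if frame["type"] == "chunk":
            parts.append(frame["content"])
        elif frame["type"] == "done":
            break
    return "".join(parts)

frames = [
    {"type": "chunk", "content": "Hel"},
    {"type": "chunk", "content": "lo!"},
    {"type": "done"},
]
print(assemble_reply(frames))  # → Hello!
```

In a real client the same accumulation happens incrementally so the UI can render partial text as it arrives, as the Swift and Kotlin examples below do with `currentResponse`.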

Client Implementation

iOS (Swift)
swift
import Foundation

class AIAssistantClient: ObservableObject {
    private var webSocket: URLSessionWebSocketTask?
    @Published var messages: [Message] = []
    @Published var currentResponse: String = ""
    
    func connect() {
        let url = URL(string: "wss://your-api.com/ws/chat/user123")!
        let session = URLSession(configuration: .default)
        webSocket = session.webSocketTask(with: url)
        webSocket?.resume()
        
        // Start receiving messages
        receiveMessage()
    }
    
    func send(message: String) {
        let payload: [String: Any] = ["message": message]
        // Avoid force-try: bail out if serialization fails
        guard let data = try? JSONSerialization.data(withJSONObject: payload) else { return }
        
        webSocket?.send(.data(data)) { error in
            if let error = error {
                print("发送失败: \(error)")
            }
        }
    }
    
    private func receiveMessage() {
        webSocket?.receive { [weak self] result in
            switch result {
            case .success(let message):
                if case .data(let data) = message {
                    self?.handleMessage(data)
                }
                self?.receiveMessage()  // Keep receiving
            case .failure(let error):
                print("接收失败: \(error)")
            }
        }
    }
    
    private func handleMessage(_ data: Data) {
        guard let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let type = json["type"] as? String else { return }
        
        DispatchQueue.main.async {
            if type == "chunk" {
                self.currentResponse += json["content"] as? String ?? ""
            } else if type == "done" {
                self.messages.append(Message(
                    role: .assistant,
                    content: self.currentResponse
                ))
                self.currentResponse = ""
            }
        }
    }
}
Android (Kotlin)
kotlin
class AIAssistantClient : ViewModel() {
    private val _messages = MutableStateFlow<List<Message>>(emptyList())
    val messages: StateFlow<List<Message>> = _messages
    
    private val _currentResponse = MutableStateFlow("")
    val currentResponse: StateFlow<String> = _currentResponse
    
    private var webSocket: WebSocket? = null
    
    fun connect(userId: String) {
        val client = OkHttpClient.Builder()
            .pingInterval(30, TimeUnit.SECONDS)
            .build()
        
        val request = Request.Builder()
            .url("wss://your-api.com/ws/chat/$userId")
            .build()
        
        webSocket = client.newWebSocket(request, object : WebSocketListener() {
            override fun onMessage(webSocket: WebSocket, text: String) {
                val json = JSONObject(text)
                when (json.getString("type")) {
                    "chunk" -> {
                        _currentResponse.value += json.getString("content")
                    }
                    "done" -> {
                        val newMessage = Message(
                            role = Role.ASSISTANT,
                            content = _currentResponse.value
                        )
                        _messages.value = _messages.value + newMessage
                        _currentResponse.value = ""
                    }
                }
            }
        })
    }
    
    fun send(message: String) {
        val payload = JSONObject().put("message", message)
        webSocket?.send(payload.toString())
    }
}

Offline Support Strategy

Mobile networks are unreliable, so offline support matters:

1. Message Queue

python
# Client side: offline message queue, backed by SQLite
import sqlite3
from datetime import datetime
from typing import List

class MessageQueue:
    def __init__(self, db_path: str):
        self.db = sqlite3.connect(db_path)
        self._create_table()
    
    def _create_table(self):
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending_messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "message TEXT, conversation_id TEXT, created_at TIMESTAMP)"
        )
        self.db.commit()
    
    def enqueue(self, message: str, conversation_id: str):
        """Queue a message for later delivery."""
        self.db.execute(
            "INSERT INTO pending_messages (message, conversation_id, created_at) VALUES (?, ?, ?)",
            (message, conversation_id, datetime.now())
        )
        self.db.commit()
    
    def get_pending(self) -> List[dict]:
        """Fetch messages still waiting to be sent."""
        cursor = self.db.execute(
            "SELECT id, message, conversation_id FROM pending_messages ORDER BY created_at"
        )
        return [{"id": row[0], "message": row[1], "conversation_id": row[2]} for row in cursor]
    
    def mark_sent(self, message_id: int):
        """Delete a message once the server has acknowledged it."""
        self.db.execute("DELETE FROM pending_messages WHERE id = ?", (message_id,))
        self.db.commit()
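
When connectivity returns, the queue should be flushed in order, deleting each message only after delivery succeeds. A sketch with injected callables (hypothetical names; a real client would POST to the API and call `MessageQueue.mark_sent`):

```python
def flush_queue(pending, send, mark_sent):
    """Deliver queued messages in order; stop at the first network
    failure so retry order is preserved."""
    sent = 0
    for item in pending:
        try:
            send(item["message"], item["conversation_id"])
        except OSError:
            break  # connection dropped again; retry on the next reconnect
        mark_sent(item["id"])
        sent += 1
    return sent

# Usage with stand-in callables:
pending = [
    {"id": 1, "message": "hi", "conversation_id": "c1"},
    {"id": 2, "message": "again", "conversation_id": "c1"},
]
delivered = []
count = flush_queue(pending, lambda m, c: delivered.append(m), lambda mid: None)
print(count, delivered)  # → 2 ['hi', 'again']
```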

2. Local Cache

python
# Client side: conversation cache with LRU eviction
from collections import OrderedDict
from typing import Optional

class ConversationCache:
    def __init__(self, max_size: int = 100):
        self.cache = OrderedDict()
        self.max_size = max_size
    
    def get(self, conversation_id: str) -> Optional[Conversation]:
        if conversation_id in self.cache:
            self.cache.move_to_end(conversation_id)
            return self.cache[conversation_id]
        return None
    
    def put(self, conversation: Conversation):
        if conversation.id in self.cache:
            self.cache.move_to_end(conversation.id)
        else:
            if len(self.cache) >= self.max_size:
                self.cache.popitem(last=False)
            self.cache[conversation.id] = conversation
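
The cache above is a classic LRU built on `OrderedDict`: a `get` refreshes an entry's recency, and `popitem(last=False)` evicts the stalest one. A quick standalone demonstration of the eviction order (strings stand in for `Conversation` objects):

```python
from collections import OrderedDict

cache = OrderedDict()
max_size = 2

def put(key, value):
    if key in cache:
        cache.move_to_end(key)       # refresh recency
    elif len(cache) >= max_size:
        cache.popitem(last=False)    # evict the least recently used entry
    cache[key] = value

put("a", "conv-a")
put("b", "conv-b")
cache.move_to_end("a")               # simulate a get("a")
put("c", "conv-c")                   # evicts "b", not "a"
print(list(cache))                   # → ['a', 'c']
```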

3. Network Status Detection

kotlin
// Android: listen for connectivity changes
class NetworkMonitor(context: Context) {
    private val connectivityManager = context.getSystemService(ConnectivityManager::class.java)
    
    val isOnline: Flow<Boolean> = callbackFlow {
        val callback = object : ConnectivityManager.NetworkCallback() {
            override fun onAvailable(network: Network) {
                trySend(true)
            }
            override fun onLost(network: Network) {
                trySend(false)
            }
        }
        
        val request = NetworkRequest.Builder()
            .addCapability(NetworkCapabilities.NET_CAPABILITY_INTERNET)
            .build()
        
        connectivityManager.registerNetworkCallback(request, callback)
        
        awaitClose { connectivityManager.unregisterNetworkCallback(callback) }
    }
}

// Usage
networkMonitor.isOnline.collect { online ->
    if (online) {
        // Flush the offline message queue
        messageQueue.flush()
    }
}

Security Considerations

1. Authentication

python
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
import jwt

security = HTTPBearer()

async def get_current_user(
    credentials: HTTPAuthorizationCredentials = Depends(security)
) -> str:
    try:
        payload = jwt.decode(
            credentials.credentials,
            settings.JWT_SECRET,
            algorithms=["HS256"]
        )
        return payload["user_id"]
    except jwt.ExpiredSignatureError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Token 已过期"
        )
    except jwt.InvalidTokenError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="无效 Token"
        )
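
PyJWT does the heavy lifting above. To make the HS256 mechanics concrete, here is a stdlib-only sketch of how such a token's signature is computed and checked (illustrative only, not a replacement for the library; claims like `user_id` match the handler above):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWT uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build a JWT-shaped token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_hs256(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = sign_hs256({"user_id": "u1"}, b"secret")
print(verify_hs256(token, b"secret"))        # → True
print(verify_hs256(token, b"wrong-secret"))  # → False
```

Note what this sketch omits: `exp` handling, algorithm pinning, and key rotation, which is exactly why the endpoint uses PyJWT rather than hand-rolled verification.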

2. Rate Limiting

python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/chat")
@limiter.limit("30/minute")  # 30 requests per minute per client IP
async def chat(request: Request):
    ...
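
slowapi implements the limit for you; the underlying idea is simple enough to sketch. Roughly what "30/minute" means is a fixed-window counter per client key (a toy in-memory version with an injectable clock so it can be exercised deterministically; not production code):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds per key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> [window_start, count]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, key: str) -> bool:
        now = self.clock()
        state = self.counters[key]
        if now - state[0] >= self.window:
            state[0], state[1] = now, 0   # start a new window
        if state[1] >= self.limit:
            return False                  # over the limit: reject
        state[1] += 1
        return True

# Deterministic usage with a fake clock:
t = [0.0]
limiter = FixedWindowLimiter(limit=3, window=60.0, clock=lambda: t[0])
print([limiter.allow("user1") for _ in range(4)])  # → [True, True, True, False]
t[0] = 61.0                                        # next minute: counter resets
print(limiter.allow("user1"))                      # → True
```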

3. Data Encryption

python
# Encrypt sensitive data at rest
from cryptography.fernet import Fernet

class SecureStorage:
    def __init__(self, key: bytes):
        self.cipher = Fernet(key)
    
    def encrypt(self, data: str) -> bytes:
        return self.cipher.encrypt(data.encode())
    
    def decrypt(self, encrypted: bytes) -> str:
        return self.cipher.decrypt(encrypted).decode()

Cost Optimization

1. Model Selection

python
class LLMRouter:
    """Pick a model based on task complexity."""
    
    def __init__(self):
        self.models = {
            "simple": "gpt-3.5-turbo",    # simple tasks
            "medium": "gpt-4-turbo",      # medium tasks
            "complex": "gpt-4",           # complex tasks
            "local": "llama3:8b"          # local model
        }
    
    def select_model(self, prompt: str) -> str:
        # Simple heuristic rules
        if len(prompt) < 100:
            return self.models["simple"]
        elif "code" in prompt or "analysis" in prompt:
            return self.models["medium"]
        elif "complex reasoning" in prompt:
            return self.models["complex"]
        else:
            return self.models["local"]  # default to the local model

2. Caching Strategy

python
from hashlib import md5
import redis

class ResponseCache:
    def __init__(self, redis_url: str):
        self.redis = redis.from_url(redis_url)
        self.ttl = 3600  # 1 hour
    
    def get(self, prompt: str) -> Optional[str]:
        key = md5(prompt.encode()).hexdigest()
        cached = self.redis.get(key)
        return cached.decode() if cached else None
    
    def set(self, prompt: str, response: str):
        key = md5(prompt.encode()).hexdigest()
        self.redis.setex(key, self.ttl, response)
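
Redis aside, the keying scheme is just a hash of the prompt: identical prompts hit the cache, anything else misses. A dict-backed sketch of the same idea (TTL omitted for brevity; example prompts are illustrative):

```python
from hashlib import md5

class DictResponseCache:
    """Same keying scheme as the Redis version, backed by a plain dict."""

    def __init__(self):
        self.store = {}

    def _key(self, prompt: str) -> str:
        # Hash the prompt so keys have a fixed, safe shape
        return md5(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        return self.store.get(self._key(prompt))

    def set(self, prompt: str, response: str):
        self.store[self._key(prompt)] = response

cache = DictResponseCache()
cache.set("What is FastAPI?", "A Python web framework.")
print(cache.get("What is FastAPI?"))  # → A Python web framework.
print(cache.get("What is Flask?"))    # → None
```

One caveat this makes visible: exact-match hashing only helps when users repeat prompts verbatim; semantically similar prompts still miss.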

Summary

The Server + Client architecture is the best choice for AI assistants:

Advantage             Description
Ample compute         The server can use high-end GPUs
Model freedom         No mobile hardware constraints
Fast iteration        Update the server without shipping client upgrades
Unified maintenance   One backend serves every platform
Controllable cost     Choose models on demand and lean on caching

It does have drawbacks:

Disadvantage          Description
Network dependency    A connection is required
Latency               Network round-trips add delay
Server cost           A server has to be run and maintained

For AI applications these trade-offs are acceptable. After all, you don't want to run a 70B-parameter model on a phone, right?


Coming up next: Chrome DevTools Protocol: an introduction to browser automation. We will look at how to drive a browser with CDP, laying the groundwork for automated social media posting.


Have you used a Server + Client architecture? Share your experience in the comments!
