Message Bus Design: asyncio.Queue in Practice

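It took two days, but I finally finished refactoring nanobot's message bus. Honestly, I thought this would be easy at first. It's just a queue, right? Then I hit a pile of pitfalls, so I'm writing them down.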

Why a message bus?

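nanobot is a multi-channel AI assistant that has to support more than 20 platforms at once: Telegram, Discord, Feishu, WhatsApp and others. For every message a platform sends, it has to:

Parse the format
Call the LLM to generate a reply
Send the reply back to the originating platform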

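Handled synchronously, one stalled platform makes every other platform wait, which makes for a terrible user experience.

The fix: a message bus. Each channel sends and receives on its own, and messages are processed asynchronously through queues.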

asyncio.Queue basics

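Python's asyncio.Queue is safe to share between coroutines on the same event loop (it is not thread-safe), which makes it a good fit for a message bus.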

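import asyncio

# Create the queue
message_queue = asyncio.Queue()

# Placeholder for the real handler (parse, call the LLM, send the reply)
async def handle_message(msg):
    ...

# Producer: put a message on the queue
async def send_message(msg):
    await message_queue.put(msg)

# Consumer: take messages off the queue
async def process_messages():
    while True:
        msg = await message_queue.get()
        # Process the message
        await handle_message(msg)
        message_queue.task_done()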
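
To see the pattern run end to end, here is a minimal self-contained sketch. The handler body, the `run_demo` helper and the message strings are made up for illustration; none of this is nanobot code.

```python
import asyncio


async def handle_message(msg: str, results: list) -> None:
    # Hypothetical handler: yield to the event loop once, then record a result.
    await asyncio.sleep(0)
    results.append(msg.upper())


async def consumer(queue: asyncio.Queue, results: list) -> None:
    while True:
        msg = await queue.get()
        try:
            await handle_message(msg, results)
        finally:
            # Always mark the item done so queue.join() can return.
            queue.task_done()


async def run_demo(messages: list) -> list:
    queue = asyncio.Queue()
    results = []
    worker = asyncio.create_task(consumer(queue, results))
    for msg in messages:
        await queue.put(msg)
    await queue.join()  # wait until every queued item has been processed
    worker.cancel()     # the consumer loops forever, so stop it explicitly
    await asyncio.gather(worker, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_demo(["hello", "world"])))
```

`queue.join()` only returns once `task_done()` has been called for every item that was put, which is why the consumer calls it in a `finally` block.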

Looks simple, right? In practice there are a few pitfalls.

Pitfall 1: what happens when the queue fills up?

asyncio.Queue(maxsize=N) caps the queue size. The catch: once the queue is full, put() waits.

# This line suspends the producer until the queue has a free slot
await queue.put(msg)

If the consumer has died, the producer hangs forever. One fix is put_nowait() plus exception handling:

try:
    queue.put_nowait(msg)
except asyncio.QueueFull:
    # Queue is full: log it, then drop the message or retry
    logger.warning("Queue full, dropping message: %r", msg)
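
Dropping immediately is not the only option. A middle ground is to wait for a bounded time and only then give up. The sketch below does that with `asyncio.wait_for`. The `put_with_timeout` helper and the timeout values are my own illustration, not something nanobot necessarily does.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def put_with_timeout(queue: asyncio.Queue, msg, timeout: float = 1.0) -> bool:
    """Try to enqueue msg; return False if the queue stays full for `timeout` seconds."""
    try:
        await asyncio.wait_for(queue.put(msg), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        logger.warning("Queue still full after %.2fs, dropping message: %r", timeout, msg)
        return False


async def demo() -> tuple:
    queue = asyncio.Queue(maxsize=1)
    first = await put_with_timeout(queue, "a", timeout=0.05)   # fits
    second = await put_with_timeout(queue, "b", timeout=0.05)  # full, and nobody is consuming
    return first, second, queue.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The producer still gets backpressure when the consumer is merely slow, but it can no longer hang forever when the consumer is gone.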

Pitfall 2: the consumer dies on an exception

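The consumer is a while True loop. If anything inside it raises, the whole consumer stops. Nobody drains the queue, and messages pile up.

The fix: wrap the loop body in try/except, log the exception and keep going: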

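async def process_messages():
    while True:
        msg = await queue.get()
        try:
            await handle_message(msg)
        except Exception:
            # Log with traceback, then move on to the next message
            logger.exception("Error processing message")
        finally:
            # Mark the item done even on failure, so queue.join() cannot hang
            queue.task_done()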
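A more robust approach is to run the consumer under a supervisor coroutine (itself started as an asyncio.Task) that restarts it after a crash: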

async def run_consumer():
    while True:
        try:
            await process_messages()
        except Exception as e:
            logger.error(f"Consumer crashed: {e}, restarting...")
            await asyncio.sleep(1)  # avoid a tight restart loop
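
Here is a small runnable sketch that puts both layers together and shows the consumer surviving a bad message. The failing handler, the `processed` list and the `restart_delay` parameter are assumptions made for the demo.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def handle_message(msg: str, processed: list) -> None:
    # Hypothetical handler that fails on one specific input.
    if msg == "bad":
        raise ValueError("cannot handle this message")
    processed.append(msg)


async def process_messages(queue: asyncio.Queue, processed: list) -> None:
    while True:
        msg = await queue.get()
        try:
            await handle_message(msg, processed)
        except Exception:
            logger.exception("Error processing message %r", msg)
        finally:
            queue.task_done()


async def run_consumer(queue: asyncio.Queue, processed: list, restart_delay: float = 0.01) -> None:
    # Supervisor: restart the consumer if anything escapes its own handling.
    while True:
        try:
            await process_messages(queue, processed)
        except Exception:
            logger.exception("Consumer crashed, restarting")
            await asyncio.sleep(restart_delay)


async def demo() -> list:
    queue = asyncio.Queue()
    processed = []
    supervisor = asyncio.create_task(run_consumer(queue, processed))
    for msg in ["one", "bad", "two"]:
        queue.put_nowait(msg)
    await queue.join()
    supervisor.cancel()
    await asyncio.gather(supervisor, return_exceptions=True)
    return processed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so `except Exception` does not swallow cancellation and `supervisor.cancel()` still stops the loop cleanly.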

Pitfall 3: multiple consumers competing

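nanobot handles several kinds of messages: text, images, voice, files and so on. With a single queue everything gets mixed together and the processing logic turns messy.

The fix: multiple queues plus multiple consumers.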

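# One queue per message type
text_queue = asyncio.Queue()
media_queue = asyncio.Queue()
voice_queue = asyncio.Queue()

# One consumer per queue. create_task() needs a running event loop,
# so call this from inside a coroutine and keep the returned tasks.
async def start_consumers():
    return [
        asyncio.create_task(process_text(text_queue)),
        asyncio.create_task(process_media(media_queue)),
        asyncio.create_task(process_voice(voice_queue)),
    ]

# Route each message to its queue
async def route_message(msg):
    if msg.type == "text":
        await text_queue.put(msg)
    elif msg.type in ["image", "video", "file"]:
        await media_queue.put(msg)
    elif msg.type == "voice":
        await voice_queue.put(msg)
    else:
        # Unknown types would otherwise vanish silently
        logger.warning("No queue for message type %r, dropping", msg.type)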

This gives each message type its own processing pipeline, and they don't interfere with each other.
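
As the number of message types grows, the if/elif chain can be swapped for a lookup table. This is a sketch of that idea, not what nanobot actually does. The `Message` dataclass, `build_routes` and the fallback queue for unknown types are all hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message shape; the real nanobot type may differ.
    type: str
    body: str


def build_routes(text_q: asyncio.Queue, media_q: asyncio.Queue, voice_q: asyncio.Queue) -> dict:
    # Several message types may share one queue.
    return {
        "text": text_q,
        "image": media_q,
        "video": media_q,
        "file": media_q,
        "voice": voice_q,
    }


async def route_message(msg: Message, routes: dict, fallback: asyncio.Queue) -> None:
    # Unknown types land in a fallback queue instead of disappearing.
    await routes.get(msg.type, fallback).put(msg)


async def demo() -> dict:
    text_q, media_q, voice_q, unknown_q = (asyncio.Queue() for _ in range(4))
    routes = build_routes(text_q, media_q, voice_q)
    for kind in ["text", "image", "file", "voice", "sticker"]:
        await route_message(Message(type=kind, body="..."), routes, unknown_q)
    return {
        "text": text_q.qsize(),
        "media": media_q.qsize(),
        "voice": voice_q.qsize(),
        "unknown": unknown_q.qsize(),
    }


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Adding a new message type then means adding one dictionary entry, and nothing in `route_message` changes.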

The complete message bus architecture

This is the architecture nanobot ended up with:

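class MessageBus:
    # call_llm() and send_to_channel() are not shown here

    def __init__(self):
        self.queues = {}  # channel_id -> Queue
        self.agent_loop = None

    async def start(self):
        """Start the message bus."""
        self.agent_loop = asyncio.create_task(self._agent_loop())

    async def receive(self, channel_id: str, message: dict):
        """Accept a message from a channel."""
        if channel_id not in self.queues:
            self.queues[channel_id] = asyncio.Queue()

        await self.queues[channel_id].put(message)

    async def _agent_loop(self):
        """Agent main loop: process messages from every queue."""
        while True:
            # Iterate over a snapshot: receive() may register a new channel
            # while we are awaiting below, and a dict that changes size
            # during iteration raises RuntimeError.
            for channel_id, queue in list(self.queues.items()):
                if not queue.empty():
                    msg = queue.get_nowait()
                    # Call the LLM to generate a reply
                    reply = await self.call_llm(msg)
                    # Send it back to the originating channel
                    await self.send_to_channel(channel_id, reply)

            await asyncio.sleep(0.01)  # avoid pinning the CPU at 100%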

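What this architecture buys you:

One queue per channel: a backlog on one channel does not starve the others, because the loop takes at most one message from each queue per pass. (LLM calls still run one at a time, so a single slow call delays every channel.)
Single agent loop: every message goes through the same LLM call logic
Easy to extend: adding a channel just means registering one more queue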
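
For comparison, here is a variation that drops the polling loop and gives each channel its own worker task, so a slow reply on one channel does not hold up another. It is a sketch under my own assumptions, not the nanobot implementation. `PerChannelBus`, `drain_and_stop` and the injected `call_llm` / `send_to_channel` callables are stand-ins.

```python
import asyncio


class PerChannelBus:
    """Variation on MessageBus: one worker task per channel, no polling."""

    def __init__(self, call_llm, send_to_channel):
        # Both callables are injected stand-ins for the real LLM and channel code.
        self._call_llm = call_llm
        self._send = send_to_channel
        self._queues = {}   # channel_id -> asyncio.Queue
        self._workers = {}  # channel_id -> asyncio.Task

    async def receive(self, channel_id: str, message: dict) -> None:
        if channel_id not in self._queues:
            self._queues[channel_id] = asyncio.Queue()
            self._workers[channel_id] = asyncio.create_task(self._worker(channel_id))
        await self._queues[channel_id].put(message)

    async def _worker(self, channel_id: str) -> None:
        queue = self._queues[channel_id]
        while True:
            msg = await queue.get()  # suspends until a message arrives
            try:
                reply = await self._call_llm(msg)
                await self._send(channel_id, reply)
            finally:
                queue.task_done()

    async def drain_and_stop(self) -> None:
        # Wait for all queued messages, then cancel the workers.
        await asyncio.gather(*(q.join() for q in self._queues.values()))
        for task in self._workers.values():
            task.cancel()
        await asyncio.gather(*self._workers.values(), return_exceptions=True)


async def demo() -> list:
    sent = []

    async def fake_llm(msg: dict) -> str:
        await asyncio.sleep(msg["delay"])  # simulate a slow or fast LLM call
        return msg["text"].upper()

    async def fake_send(channel_id: str, reply: str) -> None:
        sent.append((channel_id, reply))

    bus = PerChannelBus(fake_llm, fake_send)
    await bus.receive("slow", {"text": "a", "delay": 0.2})
    await bus.receive("fast", {"text": "b", "delay": 0.0})
    await bus.drain_and_stop()
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off is that LLM calls now run concurrently across channels, so any rate limiting has to be added separately, for example with an `asyncio.Semaphore`. A worker that hits an unhandled exception also stops in this sketch, so in practice it would need the same try/except treatment as the consumers above.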

Performance tuning: priority queues

Some messages need to jump the queue, for example a user's cancel request. asyncio.PriorityQueue handles this:

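import asyncio
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedMessage:
    priority: int
    message: Any = field(compare=False)

async def main():
    # Placeholder payloads standing in for real messages
    cancel_request = {"type": "cancel"}
    normal_message = {"type": "text"}

    # Lower priority value = served first
    queue = asyncio.PriorityQueue()
    await queue.put(PrioritizedMessage(priority=1, message=cancel_request))
    await queue.put(PrioritizedMessage(priority=10, message=normal_message))

asyncio.run(main())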
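
One detail worth knowing: asyncio.PriorityQueue is built on a heap, which does not promise first-in-first-out order among items with equal priority. If arrival order matters, a common trick is to add a monotonically increasing sequence number as a tie-breaker. The `seq` field and the `make_item` helper below are my additions for illustration.

```python
import asyncio
import itertools
from dataclasses import dataclass, field
from typing import Any


@dataclass(order=True)
class PrioritizedMessage:
    priority: int
    seq: int  # tie-breaker: lower seq means enqueued earlier
    message: Any = field(compare=False)


_counter = itertools.count()


def make_item(priority: int, message: Any) -> PrioritizedMessage:
    # Stamp each item with the next sequence number.
    return PrioritizedMessage(priority, next(_counter), message)


async def demo() -> list:
    queue = asyncio.PriorityQueue()
    for priority, text in [(10, "normal-1"), (10, "normal-2"), (1, "cancel"), (10, "normal-3")]:
        queue.put_nowait(make_item(priority, text))

    order = []
    while not queue.empty():
        item = queue.get_nowait()
        order.append(item.message)
    return order


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The cancel request comes out first even though it was enqueued third, and the normal messages keep their arrival order.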

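Closing thoughts

A message bus looks simple, but getting fault tolerance, extensibility and performance right takes attention to quite a few details.

nanobot has been running on this architecture for nearly a year, supporting 20+ platforms and handling several thousand messages a day, and it has been reasonably stable.

If you are also building multi-channel integration, feel free to borrow this design. Questions are welcome in the comments.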
