【Bug】RuntimeError: Engine loop has died

目录

报错前置条件

使用vllm启动qwen2.5-32b-instruct模型后发生的报错

GPU是GeForce RTX 4090 Laptop GPU

系统是Windows 11

运行系统是WSL2-Ubuntu22.04

报错内容

bash 复制代码
INFO 10-22 22:29:31 engine.py:290] Added request chat-993cbe95e73d4a1db5d1e89e433f727a.
ERROR 10-22 22:29:32 client.py:250] RuntimeError('Engine loop has died')
ERROR 10-22 22:29:32 client.py:250] Traceback (most recent call last):
ERROR 10-22 22:29:32 client.py:250]   File "/home/ai/miniconda3/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 150, in run_heartbeat_loop
ERROR 10-22 22:29:32 client.py:250]     await self._check_success(
ERROR 10-22 22:29:32 client.py:250]   File "/home/ai/miniconda3/lib/python3.10/site-packages/vllm/engine/multiprocessing/client.py", line 314, in _check_success
ERROR 10-22 22:29:32 client.py:250]     raise response
ERROR 10-22 22:29:32 client.py:250] RuntimeError: Engine loop has died
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 259, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 255, in wrap
    await func()
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 232, in listen_for_disconnect
    message = await receive()
  File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 555, in receive
    await self.message_event.wait()
  File "/home/ai/miniconda3/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f385017b9d0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/ai/miniconda3/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
    raise exc
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    await response(scope, receive, send)
  File "/home/ai/miniconda3/lib/python3.10/site-packages/starlette/responses.py", line 252, in __call__
    async with anyio.create_task_group() as task_group:
  File "/home/ai/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 763, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)

解决方案

判断是内存不足导致

bash 复制代码
d$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       6.9Gi       8.2Gi        80Mi       435Mi       8.2Gi
Swap:          4.0Gi       4.0Gi       0.0Ki

从输出可以看到,系统总内存为 15GB,目前使用了约 6.9GB,剩余约 8.2GB 可用

交换空间(Swap)总共为 4GB,目前已全部使用,且没有可用的交换空间。

如果交换空间不足,会严重影响系统性能

要将交换空间设置为与你的物理内存相同的大小(15GB),可以按照以下步骤操作:

  1. 创建一个新的交换文件

    bash 复制代码
    sudo fallocate -l 15G /swapfile
  2. 设置正确的权限

    bash 复制代码
    sudo chmod 600 /swapfile
  3. 将文件设置为交换空间

    bash 复制代码
    sudo mkswap /swapfile
  4. 启用交换文件

    bash 复制代码
    sudo swapon /swapfile
  5. 确认交换空间已启用

    bash 复制代码
    free -h
  6. 要使更改永久生效 ,请编辑 /etc/fstab 文件,添加以下行:

    bash 复制代码
    sudo vim /etc/fstab
    /swapfile swap swap defaults 0 0
    :wq

这样,就能将交换空间设置为 15GB,性能完全发挥

如果/etc/fstab编辑后不起作用,可以将前面5个步骤的命令写入~/.bashrc

相关推荐
一个通信老学姐5 天前
专业125+总分400+南京理工大学818考研经验南理工电子信息与通信工程,真题,大纲,参考书。
考研·信息与通信·信号处理·1024程序员节
sheng12345678rui5 天前
mfc140.dll文件缺失的修复方法分享,全面分析mfc140.dll的几种解决方法
游戏·电脑·dll文件·dll修复工具·1024程序员节
huipeng9266 天前
第十章 类和对象(二)
java·开发语言·学习·1024程序员节
earthzhang20216 天前
《深入浅出HTTPS》读书笔记(19):密钥
开发语言·网络协议·算法·https·1024程序员节
爱吃生蚝的于勒7 天前
计算机基础 原码反码补码问题
经验分享·笔记·计算机网络·其他·1024程序员节
earthzhang20217 天前
《深入浅出HTTPS》读书笔记(20):口令和PEB算法
开发语言·网络协议·算法·https·1024程序员节
一个通信老学姐7 天前
专业140+总分410+浙江大学842信号系统与数字电路考研经验浙大电子信息与通信工程,真题,大纲,参考书。
考研·信息与通信·信号处理·1024程序员节
earthzhang20217 天前
《深入浅出HTTPS》读书笔记(18):公开密钥算法RSA(续)
网络·网络协议·算法·https·1024程序员节
明明真系叻8 天前
第二十五周机器学习笔记:卷积神经网络复习、动手深度学习—线性回归、感知机
笔记·机器学习·线性回归·1024程序员节
java李杨勇10 天前
基于大数据爬虫数据挖掘技术+Python的网络用户购物行为分析与可视化平台(源码+论文+PPT+部署文档教程等)
大数据·爬虫·数据挖掘·1024程序员节·网络用户购物行为分析可视化平台·大数据毕业设计