Trying the DeepSeek model on an exo cluster: step one, getting llama to run

exo is a multi-machine collaborative AI large-model cluster tool: it unifies multiple devices into one powerful virtual GPU, supports many models, and offers dynamic model partitioning and automatic device discovery.

Problem

Previous hands-on post: exo, the multi-machine collaborative AI large-model cluster tool that topped GitHub daily trending (CSDN blog)

After installing exo, nothing would run successfully; the error in the web UI said to debug with DEBUG>=2.

It turns out the model can be run and debugged from the command line: DEBUG=9 exo run llama-3.2-1b --disable-tui --prompt "hello"

Summary

Conclusion first: this model works: llama-3.2-1b

Verbose trace output can be enabled for debugging, e.g. with DEBUG=9.

Debugging

On the Mac, after typing a message into the exo web chat UI, it errored:

Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Invalid Metal library. b''

Expanding the details:

Error: Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Invalid Metal library. b''
    at Proxy.openaiChatCompletion (http://192.168.0.108:52415/index.js:416:17)
    at async Proxy.processMessage (http://192.168.0.108:52415/index.js:320:19)

The web UI is awkward for debugging, so run the model manually from the command line:

DEBUG=9 exo run llama-3.2-1b --disable-tui --prompt "hello"

It failed with: AssertionError: Invalid Metal library. b''

The AI suggested possible causes: a source-code issue, a compiler issue, an environment issue, or a tinygrad bug.

It may also simply be that this Mac is too old: with only 8 GB of RAM it probably can't run the model. DeepSeek is impressive by comparison; it manages to run in 8 GB.

Giving up on the Mac for now.

Everything from here on is debugged on Ubuntu.

First error: Unsupported model 'deepseek-r1-distill-qwen-1.5b' for inference engine TinygradDynamicShardInferenceEngine

The command:

DEBUG=9 exo run deepseek-r1-distill-qwen-1.5b --disable-tui --prompt "What is the meaning of exo?"

Error: Unsupported model 'deepseek-r1-distill-qwen-1.5b' for inference engine TinygradDynamicShardInferenceEngine

Task was destroyed but it is pending!

So tinygrad doesn't support DeepSeek!
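One way to check which model ids are actually wired up is to search exo's own source for the name. The repo path below is an assumption, taken from the checkout location visible in the tracebacks later in this post:

```shell
# Search exo's source tree for the model id to see whether (and for
# which inference engines) it is registered. The checkout path
# ~/github/exo is assumed from the tracebacks in this post.
grep -rn "deepseek-r1-distill-qwen-1.5b" ~/github/exo/exo/ \
  || echo "model id not found in this checkout"
```

If the grep comes back empty, that particular exo checkout simply has no card for the model, which matches the "Unsupported model" error.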


Switching to the llama-3.2-1b model produced a different error: File not found

DEBUG=9 exo run llama-3.2-1b --disable-tui --prompt "What is the meaning of exo?"

报错:FileNotFoundError: File not found: https://hf-mirror.com/unsloth/Llama-3.2-1B-Instruct/resolve/main/model.safetensors.index.json

Downloading shard.model_id='llama-3.2-1b' with allow_patterns=['*']
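The failing URL above points at hf-mirror.com, a mirror of Hugging Face. Hugging Face tooling conventionally selects its download host via the HF_ENDPOINT environment variable; that exo's downloader honors the same variable is an assumption here, inferred from the mirror URL in the log rather than from exo's documentation:

```shell
# The log shows weights being fetched from hf-mirror.com. The download
# host is conventionally chosen via HF_ENDPOINT (assumption: exo's
# downloader reads the same variable). To switch mirrors, set it before
# launching exo:
export HF_ENDPOINT=https://hf-mirror.com
# Then re-run with some tracing enabled:
#   DEBUG=2 exo run llama-3.2-1b --disable-tui --prompt "hello"
echo "endpoint set to $HF_ENDPOINT"
```

If a particular file (like model.safetensors.index.json above) is missing on the mirror, pointing HF_ENDPOINT back at https://huggingface.co may work where the mirror fails.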

The llama-3.2-3b model, on the other hand, is available, but its download takes far too long.

Testing with llama-3.2-1b, a new error appeared: Error processing prompt: [Errno 2] No such file or directory: 'clang'

no clang

Install it:

pip install clang

Still the same clang error, which proves that pip install clang is not the fix.
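The reason pip doesn't help is that the `clang` package on PyPI only provides Python bindings for libclang; tinygrad shells out to the actual `clang` compiler binary via subprocess, and that binary must come from the system package manager. A quick check for which situation you are in:

```shell
# pip's "clang" package installs Python bindings for libclang, not the
# compiler itself. tinygrad invokes the compiler binary directly, so it
# must be on PATH. Check whether it is:
if command -v clang >/dev/null 2>&1; then
  echo "compiler present: $(command -v clang)"
else
  echo "compiler absent: install the system package, e.g. sudo apt install clang"
fi
```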

The full traceback:

  File "/home/skywalk/py312/lib/python3.12/site-packages/tinygrad/runtime/ops_clang.py", line 32, in compile
    obj = subprocess.check_output(['clang', '-c', '-x', 'c', *args, *arch_args, '-', '-o', '-'], input=src.encode('utf-8'))
  File "/home/skywalk/py312/lib/python3.12/subprocess.py", line 468, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/skywalk/py312/lib/python3.12/subprocess.py", line 550, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/skywalk/py312/lib/python3.12/subprocess.py", line 1028, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/skywalk/py312/lib/python3.12/subprocess.py", line 1963, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'clang'

Task was destroyed but it is pending!
task: <Task pending name='Task-6' coro=<Node.periodic_topology_collection() running at /home/skywalk/github/exo/exo/orchestration/node.py:530> wait_for=<Future pending cb=[Task.task_wakeup()]>>

Task was destroyed but it is pending!
task: <Task pending name='Task-11' coro=<TinygradDynamicShardInferenceEngine.ensure_shard() running at /home/skywalk/github/exo/exo/inference/tinygrad/inference.py:152> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /home/skywalk/py312/lib/python3.12/asyncio/futures.py:389, Task.task_wakeup()]>>

The message says clang isn't installed; confirm it:

clang --version

It really is missing; I had assumed clang was installed all along. Install it with apt:

sudo apt install clang

Now check the version:

clang --version

Ubuntu clang version 14.0.0-1ubuntu1.1

Target: x86_64-pc-linux-gnu

Thread model: posix

InstalledDir: /usr/bin

That solves the clang problem.

Next error: ModuleNotFoundError: No module named 'llvmlite'

Install the library:

pip install llvmlite -i https://pypi.tuna.tsinghua.edu.cn/simple
# or
uv pip install llvmlite -i https://pypi.tuna.tsinghua.edu.cn/simple
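A ModuleNotFoundError after a seemingly successful install usually means the package landed in a different Python environment than the one exo runs in (the tracebacks in this post show exo using /home/skywalk/py312). A small check, assuming `python3` on PATH is the interpreter exo uses:

```shell
# Confirm llvmlite is importable from the interpreter exo actually runs
# under; if this prints "missing", the pip install went into a
# different environment.
python3 - <<'EOF'
import importlib.util, sys
print("interpreter:", sys.executable)
print("llvmlite:", "found" if importlib.util.find_spec("llvmlite") else "missing")
EOF
```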

Testing again, finally some light at the end of the tunnel:

exo run llama-3.2-1b --disable-tui --prompt "What is the meaning of exo?"

Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: SingletonShardDownloader
[61315, 49752, 52571, 64414, 59701, 50907, 60899, 49960, 51965, 57009, 59299, 56902, 63535, 54565, 59561, 55710, 65069, 52294, 52290]
Chat interface started:
 - http://172.25.183.186:52415
 - http://127.0.0.1:52415
ChatGPT API endpoint served at:
 - http://172.25.183.186:52415/v1/chat/completions
 - http://127.0.0.1:52415/v1/chat/completions
has_read=True, has_write=True
Processing prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 20 Feb 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the meaning of exo?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


ram used:  4.94 GB, freqs_cis                                         : 100%|█████████| 148/148 [00:38<00:00,  3.82it/s]
loaded weights in 38754.27 ms, 4.94 GB loaded at 0.13 GB/s
ram used:  9.89 GB, freqs_cis                                         : 100%|█████████| 148/148 [00:09<00:00, 16.30it/s]
loaded weights in 9081.58 ms, 4.94 GB loaded at 0.54 GB/s
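The startup log above also shows exo serving a ChatGPT-compatible API. Once a node is up, the same model can be queried over HTTP instead of `exo run`; this is a sketch against the endpoint printed in the log (adjust host and port to your own machine):

```shell
# Send one chat completion request to exo's OpenAI-compatible endpoint.
# The URL comes from the "ChatGPT API endpoint served at" lines in the
# startup log above.
curl -s http://127.0.0.1:52415/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-3.2-1b",
        "messages": [{"role": "user", "content": "hello"}]
      }'
```

This is the same route the web chat UI uses, so it is also handy for reproducing web-UI errors from the terminal.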

After running for a long time it failed with raise TimeoutError from exc_val:

Error processing prompt:

Traceback (most recent call last):
  File "/home/skywalk/py312/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
  File "/home/skywalk/py312/lib/python3.12/asyncio/locks.py", line 293, in wait_for
    await self.wait()
  File "/home/skywalk/py312/lib/python3.12/asyncio/locks.py", line 266, in wait
    await fut
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/skywalk/github/exo/exo/main.py", line 243, in run_model_cli
    await callback.wait(on_token, timeout=300)
  File "/home/skywalk/github/exo/exo/helpers.py", line 111, in wait
    await asyncio.wait_for(self.condition.wait_for(lambda: self.result is not None and check_condition(*self.result)), timeout)
  File "/home/skywalk/py312/lib/python3.12/asyncio/tasks.py", line 519, in wait_for
    async with timeouts.timeout(timeout):
  File "/home/skywalk/py312/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

Task was destroyed but it is pending!
task: <Task pending name='Task-6' coro=<Node.periodic_topology_collection() running at /home/skywalk/github/exo/exo/orchestration/node.py:530> wait_for=<Future pending cb=[Task.task_wakeup()]>>

Task was destroyed but it is pending!
task: <Task pending name='Task-1211' coro=<Node.forward_tensor() running at /home/skywalk/github/exo/exo/orchestration/node.py:445> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /home/skywalk/py312/lib/python3.12/asyncio/futures.py:389, Task.task_wakeup()]>>

No real problem, just a timeout: the traceback shows run_model_cli waiting at most 300 seconds for output (timeout=300 in exo/main.py), and this low-spec machine is too slow for the longer prompt. Try a simpler one:

exo run llama-3.2-1b --disable-tui --prompt "hello"

loaded weights in 7460.65 ms, 4.94 GB loaded at 0.66 GB/s

Generated response:

HelloHello! How can I assist you today?<|eot_id|>

It answered, excellent! The first successful run!

I do miss DeepSeek, though; it is much faster than llama. Next goal: get DeepSeek running!
