Deploying LLMs with RKLLM
This guide runs large language models (LLMs) on the Rockchip NPU of an RK3588 board. Models tested:
- Qwen3-1.7B
- deepseekocr_w8a8_rk3588.rkllm (pre-converted model)
1. Environment
Device
- Board: NanoPi M6
- SoC: RK3588S
- NPU: 6 TOPS
- NPU cores: 3
- OS: Linux (Debian/Ubuntu)
PC-side environment
Used to download Hugging Face models and convert them to .rkllm:
- Ubuntu/Linux PC
- Python virtual environment
- rknn-llm release-v1.2.3
- rkllm-toolkit 1.2.3
Board-side runtime environment
- rkllm-runtime 1.2.3
- rknpu driver v0.9.8
Verify on the board:
```bash
sudo cat /sys/kernel/debug/rknpu/version
```
Example output:
```text
RKNPU driver: v0.9.8
```
This shows that the NPU driver is loaded and working.
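To script this check (for example at the top of a setup script), the version line can be parsed and compared with the driver version used in this guide. This is a minimal sketch. The helper names are hypothetical, and treating v0.9.8 as a minimum is our assumption, not a documented requirement.

```python
import re


def parse_rknpu_version(text: str) -> tuple:
    """Extract the version tuple from a line like 'RKNPU driver: v0.9.8'."""
    match = re.search(r"v(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def driver_is_at_least(text: str, minimum: tuple = (0, 9, 8)) -> bool:
    """True if the reported driver version is >= minimum (hypothetical threshold)."""
    # Tuples compare element by element, so (0, 9, 10) > (0, 9, 8) as intended.
    return parse_rknpu_version(text) >= minimum
```

On the board, pass it the file contents, e.g. `driver_is_at_least(open('/sys/kernel/debug/rknpu/version').read())`. Reading debugfs normally requires root.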
2. Directory structure
PC side
```text
./project-llm/
├── models/
│   ├── DeepSeek-R1-Distill-Qwen-1.5B
│   └── Qwen3-1.7B
└── rknn-llm/
    ├── examples/
    │   └── rkllm_api_demo/
    │       ├── deploy/
    │       └── export/
    └── scripts/
```
Board side
```text
./my-project/
├── DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm
├── Qwen3-1.7B_W8A8_RK3588.rkllm
├── deepseekocr_w8a8_rk3588.rkllm
├── fix_freq_rk3588.sh
└── demo_Linux_aarch64/
    ├── llm_demo
    └── lib/
        └── librkllmrt.so
```
3. Deployment from scratch
3.1 Prepare rknn-llm on the PC
Clone the repository and check out release-v1.2.3:
```bash
git clone https://github.com/airockchip/rknn-llm.git
cd rknn-llm
git checkout release-v1.2.3
```
Create a Python virtual environment and install the dependencies:
```bash
python3.10 -m venv .venv-llm
source .venv-llm/bin/activate
python -m pip install --upgrade pip
pip install -r rkllm-toolkit/packages/requirements.txt
pip install rkllm-toolkit/packages/rkllm_toolkit-1.2.3-cp310-cp310-linux_x86_64.whl
```
Note: `pip install rkllm-toolkit/packages/requirements.txt` fails. The `-r` flag is required so that pip reads the file as a list of requirements.
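The wheel's file name carries the tag cp310, so it only installs into CPython 3.10. That is why the virtual environment is created with python3.10. Here is a small sketch that checks the match before calling pip. The helper names are hypothetical.

```python
import sys


def wheel_python_tag(wheel_name: str) -> str:
    """Return the Python tag of a wheel file name.

    Wheel names follow name-version[-build]-pythontag-abitag-platform.whl,
    so the Python tag is the third field from the end.
    """
    stem = wheel_name.rsplit("/", 1)[-1].removesuffix(".whl")
    return stem.split("-")[-3]


def interpreter_tag(version_info=sys.version_info) -> str:
    """CPython tag of an interpreter, e.g. 'cp310' for Python 3.10."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_matches_interpreter(wheel_name: str, version_info=sys.version_info) -> bool:
    """True if the wheel's (single) CPython tag matches the interpreter."""
    return wheel_python_tag(wheel_name) == interpreter_tag(version_info)
```

Inside the activated .venv-llm, `wheel_matches_interpreter('rkllm-toolkit/packages/rkllm_toolkit-1.2.3-cp310-cp310-linux_x86_64.whl')` should return True.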
3.2 Download the models
Method 1: git lfs
DeepSeek-R1-Distill-Qwen-1.5B
```bash
cd project-llm/models
git lfs install
git clone https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
```
Qwen3-1.7B
```bash
cd project-llm/models
git lfs install
git clone https://huggingface.co/Qwen/Qwen3-1.7B
```
Running `git lfs install` outside a Git repository prints `fatal: not a git repository`. This does not affect Git LFS initialization or the model download.
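Method 2: Hugging Face CLI (`hf`)
Logging in is required for gated repositories such as google/gemma-3-1b-it, whose license must first be accepted on the model page.
```bash
cd project-llm/models
# Install the tool
pip install -U huggingface_hub
# Create an access token at:
# https://huggingface.co/settings/tokens
# Log in
hf auth login
# Download
hf download google/gemma-3-1b-it --local-dir ./gemma-3-1b-it
```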
3.3 Generate quantization (calibration) data
Go to the export directory:
```bash
cd project-llm/rknn-llm/examples/rkllm_api_demo/export
```
Run:
```bash
python generate_data_quant.py -m project-llm/models/<model_dir>
```
For example:
```bash
python generate_data_quant.py -m project-llm/models/Qwen3-1.7B
```
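This generates data_quant.json, which export_rkllm.py reads as ./data_quant.json. Because the command runs inside export/, the model path must be given as an absolute path or as the equivalent relative path `../../../../models/Qwen3-1.7B`. The shortened form `project-llm/models/...` does not resolve from there.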
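Before exporting, you may want to confirm that the calibration file loads cleanly. This is a minimal sketch with a hypothetical helper name. It assumes data_quant.json is a JSON list of objects, so check that against the file generate_data_quant.py produced for you. It reports the sample count and the keys it finds without assuming specific key names.

```python
import json
from pathlib import Path


def summarize_quant_dataset(path) -> dict:
    """Load a calibration JSON file and report basic facts about it.

    Assumption: the file is a non-empty JSON list of objects.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON list")
    if not all(isinstance(item, dict) for item in data):
        raise ValueError("expected every entry to be a JSON object")
    # Union of keys over all samples, sorted for stable output
    keys = sorted({key for item in data for key in item})
    return {"samples": len(data), "keys": keys}
```

Run from the export directory, `summarize_quant_dataset('./data_quant.json')` fails early if the file is empty or malformed, instead of partway through a long export.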
3.4 Edit export_rkllm.py
The key settings are:
- modelpath = 'project-llm/models/Qwen3-1.7B'
- device='cpu'
- target_platform = "RK3588"

Working example for Qwen3-1.7B:
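```python
import os
import sys

from rkllm.api import RKLLM

# Only relevant when loading on a GPU; harmless with device='cpu' below
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

# Hugging Face model directory; the path must resolve from the directory
# the script is run in (use an absolute path if in doubt)
modelpath = 'project-llm/models/Qwen3-1.7B'
llm = RKLLM()

# Load the checkpoint on the CPU in float32 (memory-hungry, see section 4.1)
ret = llm.load_huggingface(
    model=modelpath,
    model_lora=None,
    device='cpu',
    dtype="float32",
    custom_config=None,
    load_weight=True
)
if ret != 0:
    print('Load model failed!')
    sys.exit(ret)

# Quantization and build settings
dataset = "./data_quant.json"  # calibration data from section 3.3
qparams = None
target_platform = "RK3588"
optimization_level = 1
quantized_dtype = "W8A8"
quantized_algorithm = "normal"
num_npu_core = 3  # the RK3588 NPU has 3 cores

ret = llm.build(
    do_quantization=True,
    optimization_level=optimization_level,
    quantized_dtype=quantized_dtype,
    quantized_algorithm=quantized_algorithm,
    target_platform=target_platform,
    num_npu_core=num_npu_core,
    extra_qparams=qparams,
    dataset=dataset,
    hybrid_rate=0,
    max_context=4096  # maximum context length of the exported model
)
if ret != 0:
    print('Build model failed!')
    sys.exit(ret)

# Writes e.g. ./Qwen3-1.7B_W8A8_RK3588.rkllm
ret = llm.export_rkllm(f"./{os.path.basename(modelpath)}_{quantized_dtype}_{target_platform}.rkllm")
if ret != 0:
    print('Export model failed!')
    sys.exit(ret)
```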
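The export file name is derived from the last component of modelpath. The following sketch isolates that logic in a hypothetical helper, `rkllm_output_name`, and shows one pitfall. If modelpath ends with a slash, `os.path.basename` returns an empty string. Normalizing the path first avoids this.

```python
import os


def rkllm_output_name(model_path: str, quantized_dtype: str, target_platform: str) -> str:
    """Reproduce the export file name built in export_rkllm.py.

    os.path.normpath removes a trailing slash first; without it,
    os.path.basename('.../Qwen3-1.7B/') is '' and the file would be
    named '_W8A8_RK3588.rkllm'.
    """
    model_name = os.path.basename(os.path.normpath(model_path))
    return f"./{model_name}_{quantized_dtype}_{target_platform}.rkllm"
```

For example, `rkllm_output_name('project-llm/models/Qwen3-1.7B/', 'W8A8', 'RK3588')` yields `./Qwen3-1.7B_W8A8_RK3588.rkllm`, which matches the file name used in the rest of this guide.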
3.5 Export the model
Run:
```bash
python export_rkllm.py
```
Successful export log for Qwen3-1.7B (key lines):
```text
INFO: rkllm-toolkit version: 1.2.3
...
INFO: Setting token_id of eos to 151645
INFO: Setting token_id of bos to 151643
INFO: Setting add_bos_token to False
...
INFO: Model has been saved to ./Qwen3-1.7B_W8A8_RK3588.rkllm!
```
This confirms that Qwen3-1.7B was converted successfully.
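When comparing exports, such as this log against the DeepSeek log in section 4.3, it helps to extract the token ids and warnings automatically. This is a minimal sketch with a hypothetical helper name. The regular expressions were written against the log lines quoted in this guide and are not a documented log format.

```python
import re

# Patterns for the INFO/WARNING lines quoted in this guide (assumed format)
TOKEN_ID_RE = re.compile(r"INFO: Setting token_id of (\w+) to (\d+)")
SAVED_RE = re.compile(r"INFO: Model has been saved to (\S+?)!?$")
TWO_IDS_RE = re.compile(r"WARNING: The (\w+) token has two ids")


def summarize_export_log(log_text: str) -> dict:
    """Collect token ids, the saved file path and BOS/EOS warnings from an export log."""
    summary = {"token_ids": {}, "saved_to": None, "warnings": []}
    for line in log_text.splitlines():
        line = line.strip()
        if m := TOKEN_ID_RE.search(line):
            summary["token_ids"][m.group(1)] = int(m.group(2))
        elif m := SAVED_RE.search(line):
            summary["saved_to"] = m.group(1)
        elif m := TWO_IDS_RE.search(line):
            summary["warnings"].append(f"{m.group(1)} token has two ids")
    return summary
```

A clean Qwen3-1.7B log gives an empty `warnings` list. The DeepSeek export in section 4.3 would report `bos token has two ids`.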
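3.6 Build llm_demo for the board
Cross-compile llm_demo on the PC so that it runs on the board.
Go to the deploy directory:
```bash
cd project-llm/rknn-llm/examples/rkllm_api_demo/deploy
```
Set the cross-compiler prefix (the path of the cross gcc/g++ without the `-gcc`/`-g++` suffix), then build:
```bash
sed -i 's|^GCC_COMPILER_PATH=.*|GCC_COMPILER_PATH=/usr/bin/aarch64-linux-gnu|' build-linux.sh
chmod +x build-linux.sh
./build-linux.sh
```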
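The sed command rewrites every line that starts with GCC_COMPILER_PATH=. The sketch below performs the same edit in Python and also checks that the cross compilers exist. It assumes build-linux.sh appends `-gcc` and `-g++` to the prefix. On Ubuntu, /usr/bin/aarch64-linux-gnu-gcc and -g++ are provided by the gcc-aarch64-linux-gnu and g++-aarch64-linux-gnu packages. The helper names are hypothetical.

```python
import re
import shutil

NEW_PREFIX = "/usr/bin/aarch64-linux-gnu"  # prefix used in this guide


def set_compiler_prefix(script_text: str, prefix: str = NEW_PREFIX) -> str:
    """Python equivalent of the sed command: rewrite every GCC_COMPILER_PATH= line."""
    # A function replacement avoids re interpreting backslashes in the prefix.
    return re.sub(r"^GCC_COMPILER_PATH=.*$",
                  lambda _match: f"GCC_COMPILER_PATH={prefix}",
                  script_text, flags=re.MULTILINE)


def missing_compilers(prefix: str = NEW_PREFIX) -> list:
    """Return the compilers (prefix + '-gcc' / '-g++') that are not installed.

    Assumption: build-linux.sh appends '-gcc' and '-g++' to the prefix.
    """
    return [tool for tool in (f"{prefix}-gcc", f"{prefix}-g++")
            if shutil.which(tool) is None]
```

An empty list from `missing_compilers()` means both compilers were found before `./build-linux.sh` runs.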
After a successful build, the output is:
```text
install/demo_Linux_aarch64/
├── llm_demo
└── lib/
    └── librkllmrt.so
```
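3.7 Copy the files to the board
For example, with scp (or copy them directly with a USB drive):
```bash
# Create the target directory on the board first
ssh mo@192.168.0.54 'mkdir -p ~/my-project'
# From export/, where the .rkllm file was written
scp Qwen3-1.7B_W8A8_RK3588.rkllm mo@192.168.0.54:~/my-project/
# From deploy/
scp -r install/demo_Linux_aarch64 mo@192.168.0.54:~/my-project/
# From the directory that contains project-llm/
scp project-llm/rknn-llm/scripts/fix_freq_rk3588.sh mo@192.168.0.54:~/my-project/
```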
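3.8 Run on the board
Go to the demo directory and set up the environment:
```bash
cd ~/my-project/demo_Linux_aarch64
# Let the dynamic loader find librkllmrt.so
export LD_LIBRARY_PATH=$PWD/lib:$LD_LIBRARY_PATH
# Raise the RKLLM runtime log level (more diagnostic output)
export RKLLM_LOG_LEVEL=1
```
Run the model:
```bash
# Arguments: model file, max_new_tokens, max_context_len
./llm_demo ../Qwen3-1.7B_W8A8_RK3588.rkllm 256 1024
```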
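To launch the demo from a script with the same environment, you can use the following sketch. It is a sketch under two assumptions, both ours: max_new_tokens should not exceed max_context_len, and max_context_len should not exceed the max_context=4096 used at export. The helper names are hypothetical.

```python
import os
import subprocess
from pathlib import Path

# max_context used in export_rkllm.py; treating it as an upper bound for
# max_context_len is an assumption, not a documented rule.
EXPORTED_MAX_CONTEXT = 4096


def llm_demo_command(model, max_new_tokens, max_context_len,
                     exported_max_context=EXPORTED_MAX_CONTEXT):
    """Build the llm_demo argument list after basic sanity checks."""
    if not 0 < max_new_tokens <= max_context_len:
        raise ValueError("max_new_tokens must be > 0 and <= max_context_len")
    if max_context_len > exported_max_context:
        raise ValueError(
            f"max_context_len {max_context_len} exceeds the exported {exported_max_context}")
    return ["./llm_demo", model, str(max_new_tokens), str(max_context_len)]


def run_llm_demo(demo_dir, cmd):
    """Run llm_demo inside demo_dir with demo_dir/lib prepended to LD_LIBRARY_PATH."""
    lib_dir = str(Path(demo_dir, "lib").resolve())
    env = dict(os.environ)
    old = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + old if old else "")
    env["RKLLM_LOG_LEVEL"] = "1"
    # On POSIX, a relative executable such as ./llm_demo is resolved against cwd.
    return subprocess.call(cmd, cwd=demo_dir, env=env)
```

`run_llm_demo(os.path.expanduser('~/my-project/demo_Linux_aarch64'), llm_demo_command('../Qwen3-1.7B_W8A8_RK3588.rkllm', 256, 1024))` mirrors the shell commands above.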
4. Problems encountered
4.1 Exporting DeepSeek-R1-Distill-Qwen-1.5B on the PC ends with "Killed"
Symptom:
```text
INFO: rkllm-toolkit version: 1.2.3
Killed
```
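Cause:
- Not a script error
- Most likely the PC ran out of memory (OOM); an out-of-memory entry in `sudo dmesg` would confirm this
- Loading the model on the CPU in float32 is memory-hungry; plan for at least 16 GB of RAM
- The export succeeded after switching to a machine with more memory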
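A back-of-the-envelope estimate shows why float32 loading is heavy. float32 uses 4 bytes per parameter, so the weights of a nominal 1.5B-parameter model already take about 5.6 GiB, before any copies made while loading, calibrating and quantizing. The sketch below (hypothetical helper) computes that lower bound. It is not a measurement of the toolkit's peak memory use, and the parameter counts are the nominal ones from the model names.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30
```

`weight_memory_gib(1.5e9, 4)` is about 5.59, and `weight_memory_gib(1.7e9, 4)` is about 6.33 for Qwen3-1.7B. Peak usage during export is higher than these figures, which is consistent with the 16 GB recommendation above.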
4.2 Running fix_freq_rk3588.sh on the board fails with "No such file or directory"
Symptom:
```bash
sudo ./fix_freq_rk3588.sh
sudo: unable to execute ./fix_freq_rk3588.sh: No such file or directory
```
ls shows the file, but `file` reports:
```text
fix_freq_rk3588.sh: a /system/bin/sh script, ASCII text executable
```
and the first line is:
```bash
#!/system/bin/sh
```
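Cause:
- The script was written for Android
- The NanoPi M6 runs Linux (Debian/Ubuntu), which has no /system/bin/sh, so the error refers to the missing interpreter named in the first line, not to the script itself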
Fix:
Change the first line to:
```bash
#!/bin/sh
```
Or run it through sh directly:
```bash
sudo sh ./fix_freq_rk3588.sh
```
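To apply the first fix programmatically, for example when copying the script to several boards, the following sketch rewrites only an Android shebang and leaves every other line unchanged. The helper name is hypothetical.

```python
from pathlib import Path

ANDROID_SHEBANG = "#!/system/bin/sh"
LINUX_SHEBANG = "#!/bin/sh"


def fix_shebang(script: Path) -> bool:
    """Replace an Android '#!/system/bin/sh' first line with '#!/bin/sh'.

    Returns True if the file was changed. Writing to the existing file
    keeps its permissions, so the execute bit survives.
    """
    text = script.read_text()
    first_line, sep, rest = text.partition("\n")
    if first_line != ANDROID_SHEBANG:
        return False
    script.write_text(LINUX_SHEBANG + sep + rest)
    return True
```

`fix_shebang(Path('fix_freq_rk3588.sh'))` returns True the first time and False on later runs, so it is safe to call repeatedly.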
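4.3 Garbled model output (DeepSeek-R1-Distill-Qwen-1.5B)
Symptom:
- Initialization on the board succeeds
- The first few sentences of the answer are fine
- After that the output is flooded with [PAD151935]
- Generation always runs until the token limit
The log looks like this:
```text
robot: First, in total there are 2[PAD151935][PAD151935]...
```
The behavior closely matches a report in the official issue tracker.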
Key warning in the export log:
```text
WARNING: The bos token has two ids: 151646 and 151643, please ensure that the bos token ids in config.json and tokenizer_config.json are consistent!
INFO: Setting token_id of bos to 151646
INFO: Setting token_id of eos to 151643
INFO: Setting add_bos_token to True
INFO: Setting add_eos_token to False
```
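Cause:
- Under RKLLM v1.2.3, the token / BOS-EOS handling for this model goes wrong: the toolkit finds two BOS ids (151646 and 151643) and picks 151646
- This is a known compatibility problem, not an operator error
- The official repository has a similar issue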
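Conclusion:
With rkllm-toolkit 1.2.3, DeepSeek-R1-Distill-Qwen-1.5B can be exported, but the resulting model is not usable because its output degenerates into [PAD] tokens.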
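The warning asks you to make sure that config.json and tokenizer_config.json agree on the BOS token. The sketch below only reports the two ids so you can see where they disagree. It does not fix anything, and whether aligning them cures the [PAD] output under RKLLM 1.2.3 was not tested here. It assumes the usual Hugging Face layout: `bos_token_id` in config.json, and in tokenizer_config.json a `bos_token` (a string or a `{"content": ...}` object) plus an `added_tokens_decoder` map from id to token. The helper names are hypothetical.

```python
import json
from pathlib import Path


def _token_content(token):
    """Special tokens are stored either as a string or as {'content': ...}."""
    return token.get("content") if isinstance(token, dict) else token


def bos_ids(config: dict, tokenizer_config: dict) -> tuple:
    """Return (bos_token_id from config.json, id of bos_token per tokenizer_config.json)."""
    config_id = config.get("bos_token_id")
    bos_text = _token_content(tokenizer_config.get("bos_token"))
    if bos_text is None:
        return config_id, None
    for token_id, entry in tokenizer_config.get("added_tokens_decoder", {}).items():
        if entry.get("content") == bos_text:
            return config_id, int(token_id)
    return config_id, None


def check_model_dir(model_dir) -> tuple:
    """Read both files from a Hugging Face model directory and compare BOS ids."""
    model = Path(model_dir)
    config = json.loads((model / "config.json").read_text(encoding="utf-8"))
    tok_config = json.loads((model / "tokenizer_config.json").read_text(encoding="utf-8"))
    return bos_ids(config, tok_config)
```

If `check_model_dir('project-llm/models/DeepSeek-R1-Distill-Qwen-1.5B')` returns two different ids, that matches the warning above.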
4.4 NPU memory allocation failure
Symptom:
```text
E RKNN: failed to allocate handle, ret: -1, errno: 14, errstr: Bad address
E RKNN: failed to malloc npu memory, size: 1720451072, flags: 0x2
E rkllm: rkllm_init failed
```
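Cause:
- An earlier llm_demo had been suspended with ^Z (Ctrl+Z) instead of being exited
- The suspended process had not released its NPU runtime resources
- As a result, Qwen3-1.7B failed to allocate NPU memory during initialization (1720451072 bytes, about 1.6 GiB)
The terminal showed:
```text
[1]+  Stopped                 ./llm_demo ../deepseekocr_w8a8_rk3588.rkllm 512 1024
```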
Fix:
- Reboot the board
- From now on, quit llm_demo with Ctrl+C instead of suspending it with ^Z
After the reboot, Qwen3-1.7B ran normally.
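Before rebooting, you can check whether a suspended llm_demo is still around. The sketch below scans /proc for processes with that name in state T (stopped). Bringing such a process back with `fg` and pressing Ctrl+C, or `kill -9 <pid>`, should also release its NPU memory, but only the reboot was verified here. The helper names are hypothetical.

```python
from pathlib import Path


def parse_proc_stat(stat_line: str) -> tuple:
    """Return (comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split at the last ')' rather than on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return comm, state


def stopped_processes(name: str = "llm_demo", proc: Path = Path("/proc")) -> list:
    """PIDs of processes called `name` that are stopped (state 'T'), e.g. by Ctrl+Z."""
    pids = []
    for stat in proc.glob("[0-9]*/stat"):
        try:
            comm, state = parse_proc_stat(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited during the scan, or the line was unreadable
        if comm == name and state == "T":
            pids.append(int(stat.parent.name))
    return pids
```

On the board, a non-empty `stopped_processes()` before starting a new model explains the allocation failure above.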
5. Monitor the NPU
NPU load:
```bash
sudo cat /sys/kernel/debug/rknpu/load
```
Current NPU frequency (in Hz):
```bash
cat /sys/class/devfreq/fdab0000.npu/cur_freq
```
Refresh both every 0.5 s:
```bash
sudo watch -n 0.5 'echo "--- load ---"; cat /sys/kernel/debug/rknpu/load; echo "--- freq ---"; cat /sys/class/devfreq/fdab0000.npu/cur_freq'
```
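To log NPU usage over time instead of watching it live, you can parse both files. This is a minimal sketch with hypothetical helper names. The regular expression assumes load lines of the form `NPU load:  Core0: 35%, Core1:  0%, Core2:  0%,`, so compare it with what your driver prints. cur_freq is a devfreq value in Hz.

```python
import re

# Matches entries such as "Core0: 35%" (assumed format of the rknpu load file)
CORE_LOAD_RE = re.compile(r"Core(\d+):\s*(\d+)%")


def parse_npu_load(text: str) -> dict:
    """Map NPU core index -> load percentage."""
    return {int(core): int(pct) for core, pct in CORE_LOAD_RE.findall(text)}


def hz_to_mhz(cur_freq_text: str) -> float:
    """Convert the devfreq cur_freq value (Hz) to MHz."""
    return int(cur_freq_text.strip()) / 1_000_000
```

Reading the two files in a loop (the load file as root) and appending `parse_npu_load(...)` and `hz_to_mhz(...)` to a CSV gives a per-core utilization trace for a whole `llm_demo` session.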
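6. Suggested generation parameters
The two numbers after the model file are max_new_tokens and max_context_len.
Quick check:
```bash
./llm_demo ../Qwen3-1.7B_W8A8_RK3588.rkllm 256 1024
```
Fuller answers:
```bash
./llm_demo ../Qwen3-1.7B_W8A8_RK3588.rkllm 512 1024
```
Longer context:
```bash
./llm_demo ../Qwen3-1.7B_W8A8_RK3588.rkllm 512 2048
```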