MLC-LLM is a high-performance, universal deployment solution that lets you deploy any large language model natively, through native APIs with compiler acceleration. The project's mission is to enable everyone to develop, optimize, and deploy AI models natively on everyone's devices using machine-learning compilation techniques.
|              | AMD GPU | NVIDIA GPU | Apple GPU | Intel GPU |
| ------------ | ------- | ---------- | --------- | --------- |
| Linux / Win  | ✅ Vulkan, ROCm | ✅ Vulkan, CUDA | N/A | ✅ Vulkan |
| macOS        | ✅ Metal (dGPU) | N/A | ✅ Metal | ✅ Metal (iGPU) |
| Web Browser  | ✅ WebGPU and WASM | | | |
| iOS / iPadOS | ✅ Metal on Apple A-series GPU | | | |
| Android      | ✅ OpenCL on Adreno GPU | ✅ OpenCL on Mali GPU | | |
List of supported models:
Environment Setup
bash
# create and activate a clean Python 3.11 environment
conda create --name mlc python=3.11
conda activate mlc
# install the nightly CUDA 12.1 wheels of mlc-chat and mlc-ai
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-chat-nightly-cu121 mlc-ai-nightly-cu121
# verify the installation
python -c "import mlc_chat; print(mlc_chat)"
Model Conversion
Model conversion consists of two steps:
- Convert the model weights
- Generate the MLC chat configuration
The Qwen model is used as the example below; Qwen2 is not supported yet.
Convert the Model Weights
bash
mlc_chat convert_weight /home/chuan/models/qwen/Qwen-7B-Chat \
--quantization q4f16_1 \
-o /home/chuan/models/qwen/Qwen-7B-Chat/mlc
Parameters of convert_weight:
CONFIG (positional)
It can be one of the following:
- Path to a HuggingFace model directory that contains a config.json, or
- Path to a config.json in HuggingFace format, or
- The name of a pre-defined model architecture.
--quantization QUANTIZATION_MODE
Options: q0f16, q0f32, q3f16_1, q4f16_1, q4f32_1, and q4f16_awq. q4f16_1 is recommended.
--model-type MODEL_TYPE
Model architecture such as "llama". If not set, it is inferred from config.json.
--device DEVICE
The device used to run the quantization, such as "cuda" or "cuda:0". Detected from the locally available GPUs if not specified.
--source SOURCE
The path to the original model weights; inferred from config.json if missing.
--source-format SOURCE_FORMAT
The format of the source model weights; inferred from config.json if missing.
--output OUTPUT
The output directory for the quantized model weights. params_shard_*.bin and ndarray-cache.json will be created in this directory.
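After the conversion finishes, it is easy to confirm that the weight shards and the index file were written. A minimal sketch, assuming the output directory from the example above:
python
# list the weight shards produced by convert_weight and report their total size
# (assumes the output directory used in the example above)
import glob, os

out_dir = "/home/chuan/models/qwen/Qwen-7B-Chat/mlc"
shards = sorted(glob.glob(os.path.join(out_dir, "params_shard_*.bin")))
total_gib = sum(os.path.getsize(p) for p in shards) / 1024**3
print(f"{len(shards)} shards, {total_gib:.2f} GiB total")
print("index present:", os.path.exists(os.path.join(out_dir, "ndarray-cache.json")))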
Generate the MLC Chat Configuration
bash
mlc_chat gen_config /home/chuan/models/qwen/Qwen-7B-Chat \
--quantization q4f16_1 --conv-template chatml \
-o /home/chuan/models/qwen/Qwen-7B-Chat/mlc
Note
The value of conv-template should be picked from the templates listed at github.com/mlc-ai/mlc-...
If your model is not covered, you can also define a custom template, but then mlc has to be rebuilt from source.
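For reference, chatml wraps every turn in <|im_start|> / <|im_end|> markers, which is the prompt format the Qwen chat models were trained on. An illustrative sketch of the rendered prompt (not taken verbatim from the MLC template source):
python
# illustrative only: roughly what a chatml-formatted prompt looks like for a single user turn
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nhello<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(prompt)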
Parameters of gen_config:
CONFIG (positional)
It can be one of the following:
- Path to a HuggingFace model directory that contains a config.json, or
- Path to a config.json in HuggingFace format, or
- The name of a pre-defined model architecture.
--quantization QUANTIZATION_MODE
--model-type MODEL_TYPE
--conv-template CONV_TEMPLATE
--context-window-size CONTEXT_WINDOW_SIZE
The maximum context window size (i.e. the maximum sequence length).
--output OUTPUT
Other, less important parameters are not listed here.
Run MLC
python
from mlc_chat import ChatModule
cm = ChatModule(model="/home/chuan/models/qwen/Qwen1___5-7B-Chat/mlc")
print(cm.generate("hello"))
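To stream tokens as they are generated instead of waiting for the whole reply, mlc_chat ships a stdout callback. A minimal sketch, assuming the installed nightly provides mlc_chat.callback.StreamToStdout:
python
# stream the reply token by token, then print prefill/decode statistics
# (assumes mlc_chat.callback.StreamToStdout exists in the installed nightly build)
from mlc_chat import ChatModule
from mlc_chat.callback import StreamToStdout

cm = ChatModule(model="/home/chuan/models/qwen/Qwen1___5-7B-Chat/mlc")
cm.generate("hello", progress_callback=StreamToStdout(callback_interval=2))
print(cm.stats())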
Compiling the Qwen Model into an Android App
I have already built a version of the app; feel free to download it and try it out (a VPN is required to access it).
- First, install Android Studio and download the NDK and CMake, as shown in the screenshot:
- Set the environment variables
bash
# path to the Android NDK installed through Android Studio
export ANDROID_NDK=/home/chuan/Android/Sdk/ndk/26.2.11394342
# the NDK clang cross-compiler TVM uses to build for aarch64 Android (API level 34)
export TVM_NDK_CC=/home/chuan/Android/Sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android34-clang
# the TVM bundled with mlc-llm, and the JDK used for the JNI build
export TVM_HOME=/home/chuan/github/mlc-llm/3rdparty/tvm
export JAVA_HOME=/home/chuan/tools/jdk-17.0.10
The mapping between Android versions and API levels is documented at the link below; my phone runs Android 14, so I chose API level 34.
developer.android.com/guide/topic...
- Download mlc-llm and compile the model
bash
git clone --recursive https://github.com/mlc-ai/mlc-llm/
cd ./mlc-llm/
MODEL_NAME=/home/chuan/models/qwen/Qwen-7B-Chat
QUANTIZATION=q4f16_1
# quantize the weights and generate the chat config (smaller context window for the phone)
mlc_chat convert_weight $MODEL_NAME --quantization $QUANTIZATION -o $MODEL_NAME/mlc
mlc_chat gen_config $MODEL_NAME --quantization $QUANTIZATION \
--conv-template chatml --context-window-size 768 -o $MODEL_NAME/mlc
# compile the model library for Android into a .tar archive
mlc_chat compile $MODEL_NAME/mlc/mlc-chat-config.json \
--device android -o $MODEL_NAME/mlc/Qwen-7B-Chat-${QUANTIZATION}-android.tar
- Upload the model to Hugging Face
bash
git clone https://huggingface.co/chuan-niy/qwen_q4f16_1
cd qwen_q4f16_1
# configure the committer identity for this repository
git config user.name chuan-niy
git config user.email 1500546481@qq.com
# copy the converted weights and config into the repository, then push
cp /home/chuan/models/qwen/Qwen-7B-Chat/mlc/* ./
git add . && git commit -m "Add qwen model weights for android"
git push origin main
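Pushing multi-gigabyte weight shards over plain git can be slow or fail without git-lfs; an alternative is the huggingface_hub Python API, which handles large files automatically. A sketch, assuming you have run huggingface-cli login and the repo id from the example above:
python
# alternative upload path: push the converted weights with the huggingface_hub API
# (assumes `huggingface-cli login` has been run; repo id matches the example above)
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="/home/chuan/models/qwen/Qwen-7B-Chat/mlc",
    repo_id="chuan-niy/qwen_q4f16_1",
    repo_type="model",
)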
- Build the Android library
bash
cd ./android/library
vim ./src/main/assets/app-config.json
json
{
"model_list": [
{
"model_url": "https://huggingface.co/chuan-niy/qwen_q4f16_1",
"model_lib": "qwen_q4f16_1",
"estimated_vram_bytes": 4348727787,
"model_id": "Qwen-7B-Chat-hf-q4f16_1"
}
],
"model_lib_path_for_prepare_libs": {
"Qwen-7B-Chat-hf-q4f16_1": "/home/chuan/models/qwen/Qwen-7B-Chat/mlc/Qwen-7B-Chat-q4f16_1-android.tar"
}
}
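estimated_vram_bytes tells the app roughly how much GPU memory the model will need. A back-of-the-envelope sketch for a 7B model in 4-bit quantization (the overhead term is an assumption, not a value from MLC):
python
# rough sanity check of estimated_vram_bytes for a ~7B-parameter model at ~4 bits per weight
# (the overhead term for KV cache / activations is an assumption)
params = 7.7e9            # Qwen-7B has roughly 7.7 billion parameters
bytes_per_param = 0.5     # q4f16_1 stores weights in about 4 bits
overhead = 0.5 * 1024**3  # assumed headroom for KV cache and activations
print(int(params * bytes_per_param + overhead))  # ~4.4e9, close to 4348727787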
Add the following to CMakeLists.txt:
vi CMakeLists.txt
cmake
# point CMake's FindJNI at the local JDK so the JNI headers are resolved
set(JAVA_AWT_LIBRARY "/home/chuan/tools/jdk-17.0.10/include/linux")
set(JAVA_JVM_LIBRARY "/home/chuan/tools/jdk-17.0.10/include/linux")
set(JAVA_INCLUDE_PATH "/home/chuan/tools/jdk-17.0.10/include")
set(JAVA_INCLUDE_PATH2 "/home/chuan/tools/jdk-17.0.10/include/linux")
set(JAVA_AWT_INCLUDE_PATH "/home/chuan/tools/jdk-17.0.10/include")
The code guarded by #ifdef TVM4J_ANDROID also needs to be modified:
vi mlc-llm/3rdparty/tvm/jvm/native/src/main/native/org_apache_tvm_native_c_api.cc
cpp
#ifdef TVM4J_ANDROID
  // changed: use the same reinterpret_cast form as the non-Android branch
  _jvm->AttachCurrentThread(reinterpret_cast<void**>(&env), nullptr);
#else
  _jvm->AttachCurrentThread(reinterpret_cast<void**>(&env), nullptr);
#endif
Finally, run the build:
bash
./prepare_libs.sh
- Next, open the android project in Android Studio, connect a phone for debugging, and run the app.
Running it on the phone looks like this: