Chinese-llama-2部署踩坑记录

Chinese-llama-2部署踩坑记录

  • [1. Chinese-LLaMA-Alpaca-2](#1. Chinese-LLaMA-Alpaca-2)
    • [A. 部署](#A. 部署)
      • [a. inference_with_transformers_zh](#a. inference_with_transformers_zh)
      • [b. text generation webui_zh](#b. text generation webui_zh)
      • [c. api_calls_zh](#c. api_calls_zh)
      • [d. llamacpp_zh](#d. llamacpp_zh)
      • [e. privategpt_zh](#e. privategpt_zh)
      • [f. langchain_zh](#f. langchain_zh)
  • [Tool Github](#Tool Github)

1. Chinese-LLaMA-Alpaca-2

A. 部署

a. inference_with_transformers_zh

本地命令行方式交互

复制代码
python scripts/inference/inference_hf.py --base_model meta-llama/Llama-2-7b-chat-hf --with_prompt --interactive --load_in_8bit

报错:

复制代码
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

报错原因:transformers版本不兼容导致 报错版本:transformers版本为4.36.2(在有的环境中不会报错)

transformers的版本改为4.35.0则不会报错(但是随之datasets的版本也要降低)

原生的接口推理速度较慢

Web图形界面可以在localhost:xxx端口运行

通过gradio构建

复制代码
pip install gradio
pip install mdtex2html

完整模型加载:

复制代码
python scripts/inference/gradio_demo.py --base_model meta-llama/Llama-2-7b-chat-hf 

冲突:gradio最新版本4.13.0版本依赖的huggingface-hub与tokenizer 0.14.1依赖的huggingface-hub有冲突

加载时报错:

复制代码
AttributeError: 'Textbox' object has no attribute 'style'

冲突和报错的解决办法:降低gradio的版本
pip uninstall gradio
pip install gradio==3.50.0
pip默认安装最新版本,降低版本即可。

b. text generation webui_zh

复制代码
./start_linux.sh

由于python版本为3.11.5 而 yaml最多支持到3.8,所以安装失败

要尝试手动安装大概

GitHub上有相关issue

c. api_calls_zh

通过fastapi构建类chatgpt api的效果,可以快速部署

复制代码
pip install fastapi uvicorn shortuuid sse_starlette

python scripts/openai_server_demo/openai_api_server.py --base_model meta-llama/Llama-2-7b-chat-hf --gpus 0

GPUS设置为多个时可能会报错

复制代码
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

改为单个GPU即可

发送简单的请求:

复制代码
curl http://localhost:19327/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "告诉我中国的首都在哪里"
  }'

回复:

复制代码
{"id":"cmpl-NoAwmqpY9WrdybiAQHTjyr","object":"text_completion","created":1704452701,"model":"chinese-llama-alpaca-2","choices":[{"index":0,"text":"Ah, a great question! China's capital city is none other than Beijing (北京). Located in the northern part of the country, Beijing has a rich history and culture that spans over 3,000 years. It's known for its iconic landmarks such as the Great Wall of China, the Forbidden City, and the Temple of Heaven. Today, Beijing is a bustling metropolis with a vibrant economy, modern infrastructure, and a blend of traditional and contemporary cultures."}]}

d. llamacpp_zh

llamacpp可以使得模型在CPU上进行较快速的推理

复制项目

复制代码
git clone https://github.com/ggerganov/llama.cpp

make编译

复制代码
make

该命令可能运行不了,但是不影响推理结果(会影响推理速度)

复制代码
make LLAMA_CUBLAS=1

将.bin或者.pth的完整模型权重转换为GGML的FP16格式 (不能是软连接)

复制代码
python convert.py ../llama_from_hf/chinese-alpaca-2-7b/

output:
Wrote ../llama_from_hf/chinese-alpaca-2-7b/ggml-model-f16.gguf

将转换好的FP16格式的gguf量化为4bit的

复制代码
./quantize ../llama_from_hf/chinese-alpaca-2-7b/ggml-model-f16.gguf ../llama_from_hf/chinese-alpaca-2-7b/ggml-model-q4_0.gguf q4_0

本地使用:

chmod更改文件权限 -x为chat.sh添加可执行权限

复制代码
chmod +x chat.sh
./chat.sh ../llama_from_hf/chinese-alpaca-2-7b/ggml-model-f16.gguf '请列举5条文明乘车的建议'

服务器架设

复制代码
./server -m ../llama_from_hf/chinese-alpaca-2-7b/ggml-model-q4_0.gguf -c 4096 -ngl 1

简单的调用(脚本在Chinese-llama-alpaca-2中)

复制代码
cd scripts/llamacpp

bash server_curl_example.sh

response:
{"content":" 好的,那我来为您介绍一些有关机器学习的基本概念和应用。\n\n**1.** 什么是机器学习?\n\n机器学习是人工智能的一个分支,通过让计算机自动学习数据并改进其性能来实现目标任务的方法。它使用统计学、模式识别以及优化算法等方法来自动发现数据之间的规律性和关联性。机器学习的目标是开发能够从经验中学习的模型或算法,以便在没有明确编程的情况下自主做出决策和预测结果。\n\n**2.** 常见的应用场景是什么?\n\n1. 图像识别:如人脸识别、车辆检测等;\n2.","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"../llama_from_hf/chinese-alpaca-2-7b/ggml-model-q4_0.gguf","n_ctx":4096,"n_keep":0,"n_predict":128,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"seed":4294967295,"stop":[],"stream":false,"temperature":0.800000011920929,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"model":"../llama_from_hf/chinese-alpaca-2-7b/ggml-model-q4_0.gguf","prompt":"[INST] <<SYS>>\nYou are a helpful assistant. 你是一个乐于助人的助手。\n<</SYS>>\n\n [/INST]","slot_id":0,"stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":8846.486,"predicted_n":128,"predicted_per_second":14.469021937071961,"predicted_per_token_ms":69.113171875,"prompt_ms":1689.315,"prompt_n":35,"prompt_per_second":20.71845688933088,"prompt_per_token_ms":48.26614285714286},"tokens_cached":163,"tokens_evaluated":35,"tokens_predicted":128,"truncated":false}

e. privategpt_zh

在本地部署私人GPT

pip 无法直接安装python 3.11.x版本

复制代码
解决办法:从conda-forge中可以找到
conda create -n gpt11 python=3.11 -c conda-forge

f. langchain_zh

当前主要集成了QA和检索式回答两个功能

环境准备:

复制代码
pip install langchain==0.0.351
pip install sentence_transformers==2.2.2
pip install pydantic==1.10.13
pip install faiss-gpu==1.7.2

text2vec-large-chinese 和 chinese-alpaca-2-7b 都需要在huggingface上下载

复制代码
cd scripts/langchain
python langchain_qa.py \
  --embedding_path GanymedeNil/text2vec-large-chinese \
  --model_path hfl/chinese-alpaca-2-7b \
  --file_path doc.txt \
  --chain_type refine

Tool Github

ymcui/Chinese-LLaMA-Alpaca-2

langchain-ai/langchain

vllm-project/vllm

imartinez/privateGPT

ggerganov/llama.cpp

oobabooga/text-generation-webui

相关推荐
不是吧这都有重名16 小时前
[论文阅读]Deeply-Supervised Nets
论文阅读·人工智能·算法·大语言模型
亚里随笔4 天前
StreamRL:弹性、可扩展、异构的RLHF架构
人工智能·架构·大语言模型·rlhf·推理加速
deephub11 天前
加速LLM大模型推理,KV缓存技术详解与PyTorch实现
人工智能·pytorch·深度学习·缓存·大语言模型
苹果二14 天前
【学习资源】知识图谱与大语言模型融合
知识图谱·大语言模型
deephub14 天前
SecMulti-RAG:兼顾数据安全与智能检索的多源RAG框架,为企业构建不泄密的智能搜索引擎
人工智能·深度学习·大语言模型·rag·智能检索
cxr82814 天前
基于Playwright的浏览器自动化MCP服务
人工智能·自动化·大语言模型·mcp
维度攻城狮15 天前
通过DeepSeek大语言模型控制panda机械臂,听懂人话,拟人性回答。智能机械臂助手又进一步啦
大语言模型·控制·ros2·moveit·deepseek·rviz2
changzz200815 天前
低配置电脑预训练minimind的实践
llm·大语言模型·minimind·低配置
cxr82816 天前
微调灾情分析报告生成模型
人工智能·大语言模型·模型微调
deephub16 天前
CLIMB自举框架:基于语义聚类的迭代数据混合优化及其在LLM预训练中的应用
人工智能·深度学习·大语言模型·聚类