Meta Llama 3本地部署

感谢阅读

环境安装

项目文件

下载完后在根目录进入命令终端(windows下cmd、linux下终端、conda的话activate)

运行

python 复制代码
pip install -e .

不要控制台,因为还要下载模型。这里挂着是节省时间

模型申请链接

复制如图所示的链接

然后在刚才的控制台

python 复制代码
bash download.sh

在验证哪里直接输入刚才链接即可

如果报错没有wget,则点我下载wget

然后放到C:\Windows\System32 下

python 复制代码
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

收尾

创建chat.py脚本

python 复制代码
# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed in accordance with the terms of the Llama 3 Community License Agreement.

from typing import List, Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """
    Examples to run with the models finetuned for chat. Prompts correspond of chat
    turns between the user and assistant with the final one always being the user.

    An optional system prompt at the beginning to control how the model should respond
    is also supported.

    The context window of llama3 models is 8192 tokens, so `max_seq_len` needs to be <= 8192.

    `max_gen_len` is optional because finetuned models are able to stop generations naturally.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    # Modify the dialogs list to only include user inputs
    dialogs: List[Dialog] = [
        [{"role": "user", "content": ""}],  # Initialize with an empty user input
    ]

    # Start the conversation loop
    while True:
        # Get user input
        user_input = input("You: ")
        
        # Exit loop if user inputs 'exit'
        if user_input.lower() == 'exit':
            break
        
        # Append user input to the dialogs list
        dialogs[0][0]["content"] = user_input

        # Use the generator to get model response
        result = generator.chat_completion(
            dialogs,
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        # Print model response
        print(f"Model: {result['generation']['content']}")

if __name__ == "__main__":
    fire.Fire(main)

然后运行

python 复制代码
torchrun --nproc_per_node 1 chat.py     --ckpt_dir Meta-Llama-3-8B-Instruct/     --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model     --max_seq_len 512 --max_batch_size 6
相关推荐
木卫二号Coding7 小时前
第七十九篇-E5-2680V4+V100-32G+llama-cpp编译运行+Qwen3-Next-80B
linux·llama
lili-felicity8 小时前
CANN优化LLaMA大语言模型推理:KV-Cache与FlashAttention深度实践
人工智能·语言模型·llama
大傻^2 天前
大模型基于llama.cpp量化详解
llama·大模型量化
大傻^2 天前
大模型微调-基于llama-factory详解
llama·模型微调
空中楼阁,梦幻泡影2 天前
主流4 大模型(GPT、LLaMA、DeepSeek、QWE)的训练与推理算力估算实例详细数据
人工智能·gpt·llama
蓝田生玉1232 天前
LLaMA论文阅读笔记
论文阅读·笔记·llama
木卫二号Coding2 天前
第七十七篇-V100+llama-cpp-python-server+Qwen3-30B+GGUF
开发语言·python·llama
木卫二号Coding2 天前
第七十六篇-V100+llama-cpp-python+Qwen3-30B+GGUF
开发语言·python·llama
姚华军3 天前
在本地(Windows环境)部署LLaMa-Factory,进行模型微调步骤!!!
windows·ai·llama·llama-factory
Honmaple3 天前
openclaw使用llama.cpp 本地大模型部署教程
llama