
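In the early hours of this morning, the long-awaited Llama 3 was released. First thing in the morning I rushed off to request access, like someone hurrying to a market. The application is simple: it asks only for your name, region, and contact details, and mine was approved after a short wait. This release includes only four models: 8B, 8B-Instruct, 70B, and 70B-Instruct. Other versions are reportedly scheduled for a later date. Either way, I can just about run all of these, so it makes little difference to me.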

The Llama 3 repository

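I had originally updated my copy of the Llama GitHub repository, but according to the official announcement, Llama 3 now lives in a new, separate repository.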


Remember to upgrade transformers first: pip install transformers --upgrade

8B (Hugging Face, CUDA)

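import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    # The official example recommends "auto"; that raised an error asking me to pick a device,
    # so I chose "cuda". In my test, running 70B on the CPU was extremely slow.
    device="cuda",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the messages as a single prompt string using the model's chat template.
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stop at either the regular EOS token or Llama 3's end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Sampling settings as in the model card example (assumed here; adjust freely).
outputs = pipeline(
    prompt, max_new_tokens=256, eos_token_id=terminators,
    do_sample=True, temperature=0.6, top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])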
Output: Arrr, shiver me timbers! Me name be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas! I be here to swab the decks with me trusty keyboard and answer yer questions, savvy? So hoist the colors and let's set sail fer a swashbucklin' good time, matey!

DeepL translation into Chinese: 啊,我的心在颤抖!我是聊天船长 七大洋上最卑鄙的海盗聊天机器人 我在这里用我可靠的键盘敲打甲板 回答你们的问题,明白吗?那就升起旗帜,让我们扬帆起航,享受一段海盗的美好时光,伙计们!
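To make the role of the chat template and the `<|eot_id|>` terminator more concrete, here is a minimal sketch of the prompt layout used by the Llama 3 Instruct models. The function `format_llama3_prompt` is a hypothetical hand-written approximation, not part of transformers; the tokenizer's own `apply_chat_template` remains the authoritative source.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 Instruct prompt layout (approximation)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
prompt = format_llama3_prompt(messages)
print(prompt)
```

Because every turn ends with `<|eot_id|>`, generation should stop when the model emits that token, which is why it appears in the terminators list next to the regular EOS token.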

70B (llama.cpp)

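I recommend the GGUF builds in MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF on Hugging Face. The command below fetches the Q2_K quantization; the runs that follow use a Q5_K_M file, so adjust the pattern to the quantization you want.

huggingface-cli download MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF --local-dir . --include '*Q2_K*gguf'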
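The `--include` option takes a glob pattern, so the wildcards matter. The sketch below illustrates which file names a pattern such as `*Q2_K*gguf` would select, assuming the CLI matches names with shell-style globs the way Python's `fnmatch` does. The file list is hypothetical: only the Q5_K_M name appears in this post, and the Q2_K name is formed by analogy.

```python
from fnmatch import fnmatch

# Hypothetical repository listing, for illustration only.
repo_files = [
    "Meta-Llama-3-70B-Instruct.Q2_K.gguf",
    "Meta-Llama-3-70B-Instruct.Q5_K_M.gguf",
    "README.md",
]


def select_files(files, pattern):
    """Return the file names that match a shell-style glob pattern."""
    return [name for name in files if fnmatch(name, pattern)]


q2_files = select_files(repo_files, "*Q2_K*gguf")
q5_files = select_files(repo_files, "*Q5_K_M*gguf")
no_wildcards = select_files(repo_files, "Q2_Kgguf")

print(q2_files)
print(q5_files)
print(no_wildcards)
```

A pattern without wildcards has to equal the whole file name, so it selects nothing in this listing.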

$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 1024, n_keep = 0

Building a website can be done in 10 simple steps:
Step 1: Choose a Domain Name
Step 2: Register Your Domain Name
Step 3: Choose a Web Host
Step 4: Plan Your Website
Step 5: Design Your Website
Step 6: Build Your Website
Step 7: Add Content to Your Website
Step 8: Launch Your Website
Step 9: Test and Refine Your Website
Step 10: Maintain and Update Your Website

Here are some additional tips to help you create a successful website:

* Make sure your website is mobile-friendly, as more and more users are accessing the internet through their mobile devices.
* Use search engine optimization (SEO) techniques to improve your website's visibility in search engines.
* Use social media to promote your website and engage with your audience.
* Regularly update your website with fresh content to keep your audience engaged.
* Use analytics tools to track your website's performance and make data-driven decisions.
* Make sure your website is secure and protected from cyber threats.

Remember, creating a successful website takes time and effort, but with the right planning and execution, it can be a valuable asset for your business or personal brand.assistant

Here are some additional tips to help you create a successful website:

1. **Define your target audience**: Understand who your website is for and what they are looking for. This will help you create content and design that resonates with them.
2. **Keep it simple and intuitive**: Make it easy for users to navigate your website and find what they're looking for.
3. **Use high-quality images and graphics**: Visual content can help engage users and make your website more appealing.
4. **Write compelling and concise content**: Use clear and concise language to communicate your message and keep users engaged.
5. **Make it fast and responsive**: Ensure that your website loads quickly and is optimized for different devices and screen sizes.
6. **Use calls-to-action (CTAs)**: Encourage users to take action by using clear and prominent CTAs.
7. **Use analytics and tracking tools**: Monitor your website's performance and make data-driven decisions to improve it.
8. **Keep it up-to-date and fresh**: Regularly update your website with new content and features to keep users engaged and coming back.
9. **Make it secure**: Ensure that your website is secure and protected from cyber threats by using HTTPS and keeping your software up-to-date.
llama_print_timings:        load time =   16648.95 ms
llama_print_timings:      sample time =      37.87 ms /   691 runs   (    0.05 ms per token, 18248.56 tokens per second)
llama_print_timings: prompt eval time =    5782.04 ms /    17 tokens (  340.12 ms per token,     2.94 tokens per second)
llama_print_timings:        eval time =  683846.34 ms /   690 runs   (  991.08 ms per token,     1.01 tokens per second)
llama_print_timings:       total time =  690448.33 ms /   707 tokens
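The `llama_print_timings` lines above are easy to turn into throughput figures when comparing runs. The following is a minimal sketch that parses this log format with a regular expression and recomputes tokens per second. The helper names `parse_timings` and `tokens_per_second` are hypothetical, and the pattern is only written against the lines shown in this post.

```python
import re

# Lines copied from the run above (trailing parenthetical parts included as printed).
LOG = """
llama_print_timings:        load time =   16648.95 ms
llama_print_timings: prompt eval time =    5782.04 ms /    17 tokens (  340.12 ms per token,     2.94 tokens per second)
llama_print_timings:        eval time =  683846.34 ms /   690 runs   (  991.08 ms per token,     1.01 tokens per second)
"""

TIMING_RE = re.compile(
    r"llama_print_timings:\s+(?P<name>[a-z ]+?) time =\s+(?P<ms>[\d.]+) ms"
    r"(?: /\s+(?P<count>\d+) (?:runs|tokens))?"
)


def parse_timings(log_text):
    """Map each timing name to (milliseconds, count or None)."""
    timings = {}
    for match in TIMING_RE.finditer(log_text):
        count = match.group("count")
        timings[match.group("name")] = (
            float(match.group("ms")),
            int(count) if count else None,
        )
    return timings


def tokens_per_second(ms, count):
    """Convert a duration in milliseconds and a token count to tokens/s."""
    return count / (ms / 1000.0)


timings = parse_timings(LOG)
eval_rate = tokens_per_second(*timings["eval"])
prompt_rate = tokens_per_second(*timings["prompt eval"])
print(f"generation: {eval_rate:.2f} tok/s, prompt: {prompt_rate:.2f} tok/s")
```

At about one token per second, a full 1024-token generation would take roughly 17 minutes on this machine, which matches the impression that CPU inference with the 70B model is very slow.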


$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "上海是在一座" -n 256 -e
system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 256, n_keep = 0



小岛的主人是一个名叫 "莉莉丝" 的少女,她有着一头银发和一双碧绿色的眼睛,她总是穿着一件白色的长裙,裙子上绣着一些精美的花纹。她拥有着神奇的力量,可以控制小岛上的各种生物和植物,整个小岛都是她魔法的世界。



llama_print_timings:        load time =    1866.31 ms
llama_print_timings:      sample time =      13.83 ms /   233 runs   (    0.06 ms per token, 16847.43 tokens per second)
llama_print_timings: prompt eval time =    1577.67 ms /     4 tokens (  394.42 ms per token,     2.54 tokens per second)
llama_print_timings:        eval time =  229418.17 ms /   232 runs   (  988.87 ms per token,     1.01 tokens per second)
llama_print_timings:       total time =  231358.70 ms /   236 tokens

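I mistyped the prompt here (it should have been "上海是一座", "Shanghai is a ..."), but Llama 3 was very nice about it and made up a piece of fiction instead. It is quite imaginative about the "Magic City" (魔都, a nickname for Shanghai).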

$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "上海是一座" -n 256 -e
system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 256, n_keep = 0

* The city of Shanghai is a metropolis. (上海是一个大都市)
* The city of Shanghai is situated at the mouth of the Yangtze River. (上海位于长江口)
* Shanghai is a global financial center. (上海是一个全球金融中心)
* Shanghai is a city with a rich cultural heritage. (上海是一个文化遗产丰富的城市)

Note that in English, it's common to use "Shanghai" as a standalone noun to refer to the city, whereas in Chinese, "上海" is typically used with a preceding noun or phrase to clarify what is being referred to.

In Chinese, when referring to a specific aspect of the city, you would typically use a phrase like "上海市" (Shanghai city) or "上海都市" (Shanghai metropolis). For example:

* 上海市是中国最大的城市。 (Shanghai city is the largest city in China.)
* 上海都市的经济发展很快。 (The economy of Shanghai metropolis is developing very quickly.)

However, when referring to the city in a more general sense, it's common to simply use "上海" on its own. For example:

* 我喜欢上海。 (I like Shanghai.)
* 上海是一个很
llama_print_timings:        load time =    1894.65 ms
llama_print_timings:      sample time =      14.08 ms /   256 runs   (    0.06 ms per token, 18180.53 tokens per second)
llama_print_timings: prompt eval time =    1308.29 ms /     3 tokens (  436.10 ms per token,     2.29 tokens per second)
llama_print_timings:        eval time =  251453.62 ms /   255 runs   (  986.09 ms per token,     1.01 tokens per second)
llama_print_timings:       total time =  252938.58 ms /   258 tokens

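Huh? The output mixes Chinese and English. The model is said to support more than 30 languages, so doesn't that suggest it could rival professional translation? A proper assessment will have to wait until others run real benchmarks, but I believe Llama 3 will give open-source AI a shot of "endorphins". Let's wait and see!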
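Until proper benchmarks are available, one rough way to put a number on how mixed an output is would be to measure the share of Chinese characters among all letters. The sketch below is only an illustration: `cjk_ratio` is a hypothetical helper, and it counts only the basic CJK Unified Ideographs range (U+4E00 to U+9FFF).

```python
def cjk_ratio(text):
    """Fraction of alphabetic characters that are basic CJK ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return cjk / len(letters)


# One line taken from the generated output above.
mixed = "* 我喜欢上海。 (I like Shanghai.)"
chinese_only = "我喜欢上海。"
english_only = "I like Shanghai."

print(f"mixed: {cjk_ratio(mixed):.2f}")
print(f"chinese only: {cjk_ratio(chinese_only):.2f}")
print(f"english only: {cjk_ratio(english_only):.2f}")
```

A value near 0 or 1 indicates a mostly single-language reply, while values in between point to the kind of mixed output seen in the last run.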

