[Exclusive First Look] Hands-On with Meta's Just-Released Llama 3

In the early hours of this morning, the long-awaited Llama 3 was released. First thing in the morning I rushed off to apply for access. The application is simple (it just asks for your name, region, and contact details), and I was approved after a short wait. This release includes only 8B, 8B-Instruct, 70B, and 70B-Instruct; other versions are reportedly scheduled for a later date. Either way, my machine can just about run all of these, so it makes little difference to me.

The Llama 3 Repository

I had originally just pulled updates to the existing Llama GitHub repo, but according to the official announcement, Llama 3 lives in its own new standalone repository (meta-llama/llama3 on GitHub).

Running the Code

Remember to update transformers first:

```bash
pip install transformers --upgrade
```

8B (Hugging Face, CUDA)

```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # the official example suggests "auto"; running with it raised an error asking me to choose, so I picked "cuda" (running 70B on CPU is painfully slow)
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
        messages, 
        tokenize=False, 
        add_generation_prompt=True
)

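# Llama 3 ends each assistant turn with <|eot_id|> rather than the default EOS
# token, so we treat either one as a terminator.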
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
# Output:
# Arrr, shiver me timbers! Me name be Captain Chat, the scurviest pirate chatbot
# to ever sail the Seven Seas! I be here to swab the decks with me trusty keyboard
# and answer yer questions, savvy? So hoist the colors and let's set sail fer a
# swashbucklin' good time, matey!
```
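A note on device placement: if you would rather let Accelerate spread the weights automatically (the `device_map="auto"` route that the error message points at), a minimal sketch without the pipeline wrapper looks like the following. This is my own variant rather than the official snippet, and it assumes the `accelerate` package is installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # let accelerate place layers across available GPUs/CPU
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```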


70B (llama.cpp)

For GGUF weights I recommend MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF on Hugging Face:

```bash
huggingface-cli download MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF --local-dir . --include '*Q2_K*.gguf'
```
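The same download can also be scripted from Python with `huggingface_hub`. A small sketch, filtering for the Q5_K_M file that the runs below actually use (the pattern is my choice, not part of the original command):

```python
from huggingface_hub import snapshot_download

# Download only the Q5_K_M quantization from the GGUF repo.
snapshot_download(
    repo_id="MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF",
    local_dir=".",
    allow_patterns=["*Q5_K_M*.gguf"],
)
```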

```console
$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
Log start
main: build = 2647 (8228b66d)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed  = 1713504125
llama_model_loader: loaded meta data with 22 key-value pairs and 723 tensors from Meta-Llama-3-70B-Instruct.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv   5:                          llama.block_count u32              = 80
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 17
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,128256]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128001
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  161 tensors
llama_model_loader: - type q5_K:  481 tensors
llama_model_loader: - type q6_K:   81 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 70.55 B
llm_load_print_meta: model size       = 46.51 GiB (5.66 BPW)
llm_load_print_meta: general.name     = hub
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_tensors: ggml ctx size =    0.28 MiB
llm_load_tensors:        CPU buffer size = 47628.36 MiB
...................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   160.00 MiB
llama_new_context_with_model: KV self size  =  160.00 MiB, K (f16):   80.00 MiB, V (f16):   80.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.49 MiB
llama_new_context_with_model:        CPU compute buffer size =   266.50 MiB
llama_new_context_with_model: graph nodes  = 2566
llama_new_context_with_model: graph splits = 1

system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 1024, n_keep = 0


Building a website can be done in 10 simple steps:
Step 1: Choose a Domain Name
Step 2: Register Your Domain Name
Step 3: Choose a Web Host
Step 4: Plan Your Website
Step 5: Design Your Website
Step 6: Build Your Website
Step 7: Add Content to Your Website
Step 8: Launch Your Website
Step 9: Test and Refine Your Website
Step 10: Maintain and Update Your Website

Here are some additional tips to help you create a successful website:

* Make sure your website is mobile-friendly, as more and more users are accessing the internet through their mobile devices.
* Use search engine optimization (SEO) techniques to improve your website's visibility in search engines.
* Use social media to promote your website and engage with your audience.
* Regularly update your website with fresh content to keep your audience engaged.
* Use analytics tools to track your website's performance and make data-driven decisions.
* Make sure your website is secure and protected from cyber threats.

Remember, creating a successful website takes time and effort, but with the right planning and execution, it can be a valuable asset for your business or personal brand.assistant

Here are some additional tips to help you create a successful website:

1. **Define your target audience**: Understand who your website is for and what they are looking for. This will help you create content and design that resonates with them.
2. **Keep it simple and intuitive**: Make it easy for users to navigate your website and find what they're looking for.
3. **Use high-quality images and graphics**: Visual content can help engage users and make your website more appealing.
4. **Write compelling and concise content**: Use clear and concise language to communicate your message and keep users engaged.
5. **Make it fast and responsive**: Ensure that your website loads quickly and is optimized for different devices and screen sizes.
6. **Use calls-to-action (CTAs)**: Encourage users to take action by using clear and prominent CTAs.
7. **Use analytics and tracking tools**: Monitor your website's performance and make data-driven decisions to improve it.
8. **Keep it up-to-date and fresh**: Regularly update your website with new content and features to keep users engaged and coming back.
9. **Make it secure**: Ensure that your website is secure and protected from cyber threats by using HTTPS and keeping your software up-to-date.
10. **Test and iterate**: Contin and ** to to that to to ** that that **** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **** **** ** ****
 ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **** ** ** ** ** ** ** ** ** ** ** ** **** ** **** **** ** ** ** ** **** ** ******** ** **** ** **
** ** ** ** ** ** **** ** ****** ** ** ** ** ** ** ** **** ** ** **** ** **** **
 ** ** ** ** **** ** ** ** ** ** **** ** ** ********.
** ******** **** ** ** **** **** ** ** **** **** **** ** ** **.
.
 ** **** **

llama_print_timings:        load time =   16648.95 ms
llama_print_timings:      sample time =      37.87 ms /   691 runs   (    0.05 ms per token, 18248.56 tokens per second)
llama_print_timings: prompt eval time =    5782.04 ms /    17 tokens (  340.12 ms per token,     2.94 tokens per second)
llama_print_timings:        eval time =  683846.34 ms /   690 runs   (  991.08 ms per token,     1.01 tokens per second)
llama_print_timings:       total time =  690448.33 ms /   707 tokens
```

Generation is slow, so I didn't let it run to the end, but the robustness already feels different from the Mistral family. Note the stray "assistant" marker mid-output and the collapse into "**" spam at the end. My guess: the Instruct model ends its turns with <|eot_id|> (token 128009), while this GGUF's metadata declares EOS as <|end_of_text|> (128001, visible in the log above), so generation runs straight past the natural stopping point and eventually degenerates. This was a common quirk of early Llama 3 GGUF conversions.
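Part of the oddness is also self-inflicted: I fed an Instruct model a raw completion prompt. Driving the same GGUF through a chat-aware front end should behave better. Here is a sketch with llama-cpp-python (my assumption, since this post drives the `main` binary directly), which can apply the chat template embedded in the GGUF (the `tokenizer.chat_template` key in the metadata dump above):

```python
from llama_cpp import Llama

# Load the same quantized model; n_threads mirrors the 16 threads in system_info above.
llm = Llama(
    model_path="Meta-Llama-3-70B-Instruct.Q5_K_M.gguf",
    n_ctx=2048,
    n_threads=16,
)

# create_chat_completion wraps the message in the model's chat template instead
# of treating it as a raw text-completion prompt.
out = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Building a website can be done in 10 simple steps. List them."}
    ],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```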

```console
$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "上海是在一座" -n 256 -e
Log start
main: build = 2647 (8228b66d)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed  = 1713504870
[... model loading and sampling logs identical to the first run, omitted ...]
generate: n_ctx = 512, n_batch = 2048, n_predict = 256, n_keep = 0


上海是在一座小岛上,整个小岛被一层柔和的绿色光芒所笼罩着,岛上遍布着各种奇特的生物和植物,整个小岛充满着浓郁的神秘色彩。


在小岛的中心有一座高大的古树,树干上刻着一些古老的符号,符号似乎在发出着微弱的光芒,照亮着周围的环境。


小岛的主人是一个名叫 "莉莉丝" 的少女,她有着一头银发和一双碧绿色的眼睛,她总是穿着一件白色的长裙,裙子上绣着一些精美的花纹。她拥有着神奇的力量,可以控制小岛上的各种生物和植物,整个小岛都是她魔法的世界。


莉莉丝是一个温柔、善良的人,她总是帮助小岛上的生物和来访的客人,她的魔法也总是为了帮助他人和保护小岛。


然而,小岛上也隐藏着一些秘

llama_print_timings:        load time =    1866.31 ms
llama_print_timings:      sample time =      13.83 ms /   233 runs   (    0.06 ms per token, 16847.43 tokens per second)
llama_print_timings: prompt eval time =    1577.67 ms /     4 tokens (  394.42 ms per token,     2.54 tokens per second)
llama_print_timings:        eval time =  229418.17 ms /   232 runs   (  988.87 ms per token,     1.01 tokens per second)
llama_print_timings:       total time =  231358.70 ms /   236 tokens
```

I mistyped the prompt here (上海是在一座, "Shanghai is on a...", instead of 上海是一座, "Shanghai is a..."), but Llama 3 rolled with it very nicely, making up a stretch of novel-like plot: Shanghai as a small island bathed in soft green light, ruled by a silver-haired girl named 莉莉丝 (Lilith). Quite an imaginative take on the "Magic City".

```console
$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "上海是一座" -n 256 -e
Log start
main: build = 2647 (8228b66d)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed  = 1713505108
[... model loading and sampling logs identical to the first run, omitted ...]
generate: n_ctx = 512, n_batch = 2048, n_predict = 256, n_keep = 0


上海是一座城市的名称)
* The city of Shanghai is a metropolis. (上海是一个大都市)
* The city of Shanghai is situated at the mouth of the Yangtze River. (上海位于长江口)
* Shanghai is a global financial center. (上海是一个全球金融中心)
* Shanghai is a city with a rich cultural heritage. (上海是一个文化遗产丰富的城市)

Note that in English, it's common to use "Shanghai" as a standalone noun to refer to the city, whereas in Chinese, "上海" is typically used with a preceding noun or phrase to clarify what is being referred to.

In Chinese, when referring to a specific aspect of the city, you would typically use a phrase like "上海市" (Shanghai city) or "上海都市" (Shanghai metropolis). For example:

* 上海市是中国最大的城市。 (Shanghai city is the largest city in China.)
* 上海都市的经济发展很快。 (The economy of Shanghai metropolis is developing very quickly.)

However, when referring to the city in a more general sense, it's common to simply use "上海" on its own. For example:

* 我喜欢上海。 (I like Shanghai.)
* 上海是一个很
llama_print_timings:        load time =    1894.65 ms
llama_print_timings:      sample time =      14.08 ms /   256 runs   (    0.06 ms per token, 18180.53 tokens per second)
llama_print_timings: prompt eval time =    1308.29 ms /     3 tokens (  436.10 ms per token,     2.29 tokens per second)
llama_print_timings:        eval time =  251453.62 ms /   255 runs   (  986.09 ms per token,     1.01 tokens per second)
llama_print_timings:       total time =  252938.58 ms /   258 tokens
```

Huh? Mixed Chinese-and-English generation, and this from a model that reportedly supports 30-plus languages, which was supposed to put it on par with professional translators. The real verdict will need benchmarks from peers who test it properly, but I think Llama 3 is a shot of "endorphins" for open-source AI regardless. Let's wait and see!
