Early this morning, the long-awaited Llama 3 was released. I rushed to apply for access first thing; the application is simple (just name, region, and contact details), and approval came through after a short wait. This release includes only four models: 8B, 8B-Instruct, 70B, and 70B-Instruct; larger versions are reportedly still in training and will be released on a later schedule. All four are at least runnable on my machine, so any of them will do for a first look.
The Llama 3 repository
I had originally just pulled the latest changes to the old Llama GitHub repository, but according to the official announcement this release lives in a separate new repository, github.com/meta-llama/llama3.
Running the code
Remember to upgrade transformers first: `pip install transformers --upgrade`.
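If you'd rather fail fast than hit a cryptic loading error, you can assert the version up front. A minimal sketch; I'm assuming Llama 3 support landed in transformers 4.40.0, which matches the release notes as I understand them:

```python
from packaging import version  # pip install packaging if it isn't already present

import transformers

# Llama 3 loading was added in transformers 4.40.0 (assumption noted above);
# older versions fail when resolving the model's tokenizer/config.
assert version.parse(transformers.__version__) >= version.parse("4.40.0"), \
    f"transformers {transformers.__version__} is too old for Llama 3"
```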
8B (Hugging Face, CUDA)
```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # the official example suggests "auto"; running it as-is raised an error asking me to choose, so I picked cuda (testing 70B on CPU was painfully slow)
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat into Llama 3's prompt format.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the default EOS or Llama 3's end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```

Output: Arrr, shiver me timbers! Me name be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas! I be here to swab the decks with me trusty keyboard and answer yer questions, savvy? So hoist the colors and let's set sail fer a swashbucklin' good time, matey!
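One detail from the snippet worth calling out: the terminators list exists because Llama-3-Instruct ends each assistant turn with <|eot_id|> rather than the tokenizer's default <|end_of_text|>, so stopping only on the default EOS tends to run past the answer. A quick check; the ids in the comments are what I'd expect from the Meta-Llama-3 tokenizer, so treat them as assumptions:

```python
tok = pipeline.tokenizer
print(tok.eos_token_id)                         # 128001, i.e. <|end_of_text|>
print(tok.convert_tokens_to_ids("<|eot_id|>"))  # 128009, the end-of-turn marker
```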
70B (llama.cpp)
Recommended: MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF (Hugging Face)
huggingface-cli download MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF --local-dir . --include '*Q5_K_M.gguf'
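If you'd rather fetch it from Python, huggingface_hub's snapshot_download does the same job. A sketch, assuming the quant you want is the single Q5_K_M file used in the runs below:

```python
from huggingface_hub import snapshot_download

# Download only the Q5_K_M quant (~46.5 GiB) into the current directory.
snapshot_download(
    repo_id="MaziyarPanahi/Meta-Llama-3-70B-Instruct-GGUF",
    local_dir=".",
    allow_patterns=["*Q5_K_M.gguf"],
)
```

Then run the quantized model on CPU: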
```shell
$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
Log start
main: build = 2647 (8228b66d)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed = 1713504125
llama_model_loader: loaded meta data with 22 key-value pairs and 723 tensors from Meta-Llama-3-70B-Instruct.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = hub
llama_model_loader: - kv 2: llama.vocab_size u32 = 128256
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 8192
llama_model_loader: - kv 5: llama.block_count u32 = 80
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 28672
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 8: llama.attention.head_count u32 = 64
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 12: general.file_type u32 = 17
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,128256] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 161 tensors
llama_model_loader: - type q5_K: 481 tensors
llama_model_loader: - type q6_K: 81 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 28672
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 70.55 B
llm_load_print_meta: model size = 46.51 GiB (5.66 BPW)
llm_load_print_meta: general.name = hub
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.28 MiB
llm_load_tensors: CPU buffer size = 47628.36 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 160.00 MiB
llama_new_context_with_model: KV self size = 160.00 MiB, K (f16): 80.00 MiB, V (f16): 80.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 266.50 MiB
llama_new_context_with_model: graph nodes = 2566
llama_new_context_with_model: graph splits = 1
system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 1024, n_keep = 0
Building a website can be done in 10 simple steps:
Step 1: Choose a Domain Name
Step 2: Register Your Domain Name
Step 3: Choose a Web Host
Step 4: Plan Your Website
Step 5: Design Your Website
Step 6: Build Your Website
Step 7: Add Content to Your Website
Step 8: Launch Your Website
Step 9: Test and Refine Your Website
Step 10: Maintain and Update Your Website
Here are some additional tips to help you create a successful website:
* Make sure your website is mobile-friendly, as more and more users are accessing the internet through their mobile devices.
* Use search engine optimization (SEO) techniques to improve your website's visibility in search engines.
* Use social media to promote your website and engage with your audience.
* Regularly update your website with fresh content to keep your audience engaged.
* Use analytics tools to track your website's performance and make data-driven decisions.
* Make sure your website is secure and protected from cyber threats.
Remember, creating a successful website takes time and effort, but with the right planning and execution, it can be a valuable asset for your business or personal brand.assistant
Here are some additional tips to help you create a successful website:
1. **Define your target audience**: Understand who your website is for and what they are looking for. This will help you create content and design that resonates with them.
2. **Keep it simple and intuitive**: Make it easy for users to navigate your website and find what they're looking for.
3. **Use high-quality images and graphics**: Visual content can help engage users and make your website more appealing.
4. **Write compelling and concise content**: Use clear and concise language to communicate your message and keep users engaged.
5. **Make it fast and responsive**: Ensure that your website loads quickly and is optimized for different devices and screen sizes.
6. **Use calls-to-action (CTAs)**: Encourage users to take action by using clear and prominent CTAs.
7. **Use analytics and tracking tools**: Monitor your website's performance and make data-driven decisions to improve it.
8. **Keep it up-to-date and fresh**: Regularly update your website with new content and features to keep users engaged and coming back.
9. **Make it secure**: Ensure that your website is secure and protected from cyber threats by using HTTPS and keeping your software up-to-date.
10. **Test and iterate**: Contin and ** to to that to to ** that that **** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **** **** ** ****
** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** ** **** ** ** ** ** ** ** ** ** ** ** ** **** ** **** **** ** ** ** ** **** ** ******** ** **** ** **
** ** ** ** ** ** **** ** ****** ** ** ** ** ** ** ** **** ** ** **** ** **** **
** ** ** ** **** ** ** ** ** ** **** ** ** ********.
** ******** **** ** ** **** **** ** ** **** **** **** ** ** **.
.
** **** **
llama_print_timings: load time = 16648.95 ms
llama_print_timings: sample time = 37.87 ms / 691 runs ( 0.05 ms per token, 18248.56 tokens per second)
llama_print_timings: prompt eval time = 5782.04 ms / 17 tokens ( 340.12 ms per token, 2.94 tokens per second)
llama_print_timings: eval time = 683846.34 ms / 690 runs ( 991.08 ms per token, 1.01 tokens per second)
llama_print_timings: total time = 690448.33 ms / 707 tokens
```
It's slow, and I didn't let it run to the end, but notice how the output degenerates into runs of asterisks, with a stray "assistant" marker leaking in partway through; the robustness doesn't feel like the Mistral family. My guess is this is less the model than the setup: the GGUF metadata pins EOS to <|end_of_text|> (128001) while the instruct model actually emits <|eot_id|> to end its turns, and I'm also feeding a raw completion prompt to an instruct-tuned model instead of its chat template.
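For reference, this is the turn structure Llama-3-Instruct expects (as documented in the model card); a small helper sketch that builds it by hand, so a llama.cpp completion run can at least start from a properly framed prompt:

```python
def llama3_prompt(system: str, user: str) -> str:
    """Build a single-turn Llama-3-Instruct prompt (format from the model card)."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Who are you?"))
```

That said, I carried on in raw completion mode and tried a Chinese prompt next.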
```shell
$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "上海是在一座" -n 256 -e
[model loading and sampling log identical to the first run, omitted; seed = 1713504870, generate: n_ctx = 512, n_batch = 2048, n_predict = 256]
上海是在一座小岛上,整个小岛被一层柔和的绿色光芒所笼罩着,岛上遍布着各种奇特的生物和植物,整个小岛充满着浓郁的神秘色彩。
在小岛的中心有一座高大的古树,树干上刻着一些古老的符号,符号似乎在发出着微弱的光芒,照亮着周围的环境。
小岛的主人是一个名叫 "莉莉丝" 的少女,她有着一头银发和一双碧绿色的眼睛,她总是穿着一件白色的长裙,裙子上绣着一些精美的花纹。她拥有着神奇的力量,可以控制小岛上的各种生物和植物,整个小岛都是她魔法的世界。
莉莉丝是一个温柔、善良的人,她总是帮助小岛上的生物和来访的客人,她的魔法也总是为了帮助他人和保护小岛。
然而,小岛上也隐藏着一些秘
llama_print_timings: load time = 1866.31 ms
llama_print_timings: sample time = 13.83 ms / 233 runs ( 0.06 ms per token, 16847.43 tokens per second)
llama_print_timings: prompt eval time = 1577.67 ms / 4 tokens ( 394.42 ms per token, 2.54 tokens per second)
llama_print_timings: eval time = 229418.17 ms / 232 runs ( 988.87 ms per token, 1.01 tokens per second)
llama_print_timings: total time = 231358.70 ms / 236 tokens
```
I mistyped the prompt here: "上海是在一座" ("Shanghai is on a...") instead of "上海是一座" ("Shanghai is a..."). Llama 3 handled it very nicely, spinning the fragment into a bit of fantasy fiction about an island bathed in green light, watched over by a silver-haired girl named "莉莉丝" (Lilith); quite an imaginative take on the "Magic City" (Shanghai's nickname).
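Prompting an instruct model in raw completion mode like this is really a misuse; to get proper chat behavior over the same GGUF from Python, llama-cpp-python can apply the chat format for you. A sketch, assuming your llama-cpp-python build already ships the "llama-3" chat format (support arrived shortly after release, so treat that as an assumption):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-70B-Instruct.Q5_K_M.gguf",
    n_ctx=512,
    chat_format="llama-3",  # assumption: your build includes this format
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "上海是一座什么样的城市?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Retrying in completion mode with the typo fixed: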
```shell
$ llama.cpp/main -m Meta-Llama-3-70B-Instruct.Q5_K_M.gguf -p "上海是一座" -n 256 -e
[model loading and sampling log identical to the first run, omitted; seed = 1713505108, generate: n_ctx = 512, n_batch = 2048, n_predict = 256]
上海是一座城市的名称)
* The city of Shanghai is a metropolis. (上海是一个大都市)
* The city of Shanghai is situated at the mouth of the Yangtze River. (上海位于长江口)
* Shanghai is a global financial center. (上海是一个全球金融中心)
* Shanghai is a city with a rich cultural heritage. (上海是一个文化遗产丰富的城市)
Note that in English, it's common to use "Shanghai" as a standalone noun to refer to the city, whereas in Chinese, "上海" is typically used with a preceding noun or phrase to clarify what is being referred to.
In Chinese, when referring to a specific aspect of the city, you would typically use a phrase like "上海市" (Shanghai city) or "上海都市" (Shanghai metropolis). For example:
* 上海市是中国最大的城市。 (Shanghai city is the largest city in China.)
* 上海都市的经济发展很快。 (The economy of Shanghai metropolis is developing very quickly.)
However, when referring to the city in a more general sense, it's common to simply use "上海" on its own. For example:
* 我喜欢上海。 (I like Shanghai.)
* 上海是一个很
llama_print_timings: load time = 1894.65 ms
llama_print_timings: sample time = 14.08 ms / 256 runs ( 0.06 ms per token, 18180.53 tokens per second)
llama_print_timings: prompt eval time = 1308.29 ms / 3 tokens ( 436.10 ms per token, 2.29 tokens per second)
llama_print_timings: eval time = 251453.62 ms / 255 runs ( 986.09 ms per token, 1.01 tokens per second)
llama_print_timings: total time = 252938.58 ms / 258 tokens
```
Huh? Mixed Chinese-English generation. Meta does say the pretraining data includes high-quality non-English text covering more than 30 languages (while cautioning that performance in those languages won't match English), so could it really rival professional translation? Proper benchmarks will have to come from the community, but I think Llama 3 is a shot of "endorphins" for open-source AI. Let's wait and see!