Learn AI from scratch with me: hand-rolling grok on the shoulders of AI guru GG

github.com/ggerganov/l...

The guru went straight for an Apple M2 Ultra box and, in a 35-second video, hit 9 tokens/s running on the GPU.
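For context, here is roughly what such a run looks like. This is a sketch of my own, not GG's exact commands: the LLAMA_METAL=1 build switch and the full-offload flag are assumptions based on how llama.cpp was typically built on Apple Silicon around that time (the model path matches the one I use below).

```bash
# Sketch only: build llama.cpp with the Metal backend and run grok-1
# fully offloaded to the GPU of an Apple Silicon machine.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# Builds of this era enabled Metal by default on Apple Silicon;
# LLAMA_METAL=1 forces it explicitly.
LLAMA_METAL=1 make -j

# -ngl 99 asks to offload all layers to the GPU; llama.cpp picks up the
# remaining 8 split files automatically when given the first one.
./main -m ../grok-GGUF/grok-1-IQ3_S-split-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:\nStep 1:" \
  -n 400 -e -ngl 99
```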

My experiment

```bash
$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   32
  On-line CPU(s) list:    0-31
Vendor ID:                GenuineIntel
  Model name:             13th Gen Intel(R) Core(TM) i9-13900K

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       2.4Gi       123Gi       293Mi       828Mi       123Gi
Swap:          128Gi       610Mi       127Gi
```

My GPU doesn't have enough VRAM, it's a discrete card rather than unified memory, and my skills are limited (I haven't learned GPUDirect Storage yet; something to study later), so I had to make do with the CPU.
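The CPU setup is then minimal. A sketch, assuming the nine IQ3_S split files have already been downloaded into grok-GGUF/ (the quantized files come from the Hugging Face repo linked further down):

```bash
# Sketch only: plain CPU build, no GPU backend compiled in.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j

# Same invocation as the real run below, with the thread count made
# explicit via llama.cpp's -t flag; the log shows it defaulted to
# 16 of my 32 hardware threads.
cd .. && llama.cpp/main -m grok-GGUF/grok-1-IQ3_S-split-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:\nStep 1:" \
  -n 1024 -e -t 16
```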

```bash
After you have launched your website, you should optimize it for search engines. This involves optimizing your website's
llama_print_timings:        load time =    1144.33 ms
llama_print_timings:      sample time =      20.57 ms /   400 runs   (    0.05 ms per token, 19447.69 tokens per second)
llama_print_timings: prompt eval time =     998.69 ms /    19 tokens (   52.56 ms per token,    19.03 tokens per second)
llama_print_timings:        eval time =   34346.82 ms /   399 runs   (   86.08 ms per token,    11.62 tokens per second)
llama_print_timings:       total time =   35548.74 ms /   418 tokens
Log end
$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
    CPU family:           6
    Model:                69
    Thread(s) per core:   2
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             1
    CPU(s) scaling MHz:   62%
    CPU max MHz:          2600.0000
    CPU min MHz:          800.0000
    BogoMIPS:             4589.13
```

Hmm, not bad at all... er, oops, sorry, I pasted the wrong thing (hahaha~).

```bash
$ llama.cpp/main -m grok-GGUF/grok-1-IQ3_S-split-00001-of-00009.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 1024 -e
Log start
main: build = 2647 (8228b66d)
main: built with cc (GCC) 13.2.1 20230801 for x86_64-pc-linux-gnu
main: seed  = 1713085616
llama_model_loader: additional 8 GGUFs metadata loaded.
llama_model_loader: loaded meta data with 21 key-value pairs and 2114 tensors from grok-GGUF/grok-1-IQ3_S-split-00001-of-00009.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = grok
llama_model_loader: - kv   1:                               general.name str              = grok-1
llama_model_loader: - kv   2:                           grok.block_count u32              = 64
llama_model_loader: - kv   3:                        grok.context_length u32              = 8192
llama_model_loader: - kv   4:                      grok.embedding_length u32              = 6144
llama_model_loader: - kv   5:                   grok.feed_forward_length u32              = 32768
llama_model_loader: - kv   6:                  grok.attention.head_count u32              = 48
llama_model_loader: - kv   7:               grok.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                        grok.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv   9:      grok.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          grok.expert_count u32              = 8
llama_model_loader: - kv  11:                     grok.expert_used_count u32              = 2
llama_model_loader: - kv  12:                          general.file_type u32              = 26
llama_model_loader: - kv  13:                                   split.no u16              = 0
llama_model_loader: - kv  14:                                split.count u16              = 9
llama_model_loader: - kv  15:                        split.tensors.count i32              = 2114
llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,131072]  = ["[PAD]", "[BOS]", "[EOS]", "[UNK]", ...
llama_model_loader: - kv  18:                      tokenizer.ggml.scores arr[f32,131072]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,131072]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  20:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  321 tensors
llama_model_loader: - type q8_0:  128 tensors
llama_model_loader: - type q5_K:   64 tensors
llama_model_loader: - type q6_K:    1 tensors
llama_model_loader: - type iq3_s: 1600 tensors
llm_load_vocab: mismatch in special tokens definition ( 284/131072 vs 260/131072 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = grok
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 131072
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 6144
llm_load_print_meta: n_head           = 48
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 64
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 6
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 32768
llm_load_print_meta: n_expert         = 8
llm_load_print_meta: n_expert_used    = 2
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 314B
llm_load_print_meta: model ftype      = IQ3_S - 3.4375 bpw
llm_load_print_meta: model params     = 315.68 B
llm_load_print_meta: model size       = 127.69 GiB (3.47 BPW)
llm_load_print_meta: general.name     = grok-1
llm_load_print_meta: BOS token        = 1 '[BOS]'
llm_load_print_meta: EOS token        = 2 '[EOS]'
llm_load_print_meta: UNK token        = 0 '[PAD]'
llm_load_print_meta: LF token         = 79 '<0x0A>'
llm_load_tensors: ggml ctx size =    1.00 MiB
llm_load_tensors:        CPU buffer size = 131388.02 MiB
....................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   128.00 MiB
llama_new_context_with_model: KV self size  =  128.00 MiB, K (f16):   64.00 MiB, V (f16):   64.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   357.01 MiB
llama_new_context_with_model: graph nodes  = 3720
llama_new_context_with_model: graph splits = 1

system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
        repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 1024, n_keep = 1


 Building a website can be done in 10 simple steps:
Step 1: Plan your website
Step 2: Register a domain name
Step 3: Get a web hosting account
Step 4: Plan your site's structure
Step 5: Build your site
Step 6: Test and publish your website
Step 7: Maintain your website
Step 8: Get traffic to your website
Step 9: Grow your traffic
Step 10: Get feedback for your website
You need to think about what you want your website to show and how you want the users to interact with it.
Step 1: Plan your website
Before you start building your website, you need to have a plan for it. Think about what you want your website to be about and how you want users to interact with it. Here are some things to consider:
What is the purpose of your website?
What do you want to achieve with your website?
Who is your target audience?
What kind of content will you be posting on your website?
Once you have a good idea of what you want your website to be, you can start planning out the individual pages and features.
Step 2: Register a domain name
A domain name is the web address of your website. It's what users will type into their browser to visit your site. You need to register a domain name before you can start building your website.
There are many different domain name registrars to choose from. Some popular ones are GoDaddy, 1&1, and Namecheap.
Once you've chosen a registrar, you'll need to choose a domain name for your website. Your domain name should be easy to remember and relevant to your site's content.
Step 3: Get a web hosting account
A web host provides a place for your website's files to live on the internet. Without a web host, your website would not be accessible to anyone.
There are many different web hosting companies to choose from. Some popular ones are Hostgator, Bluehost, and WP Engine.
Once you've chosen a web host, you'll need to choose a hosting plan. Most hosting plans come with a free domain name registration.
Step 4: Plan your site's structure
Now that you have a domain name and a web host, it's time to start planning out your website's structure. This includes deciding what pages will be on your site and what content will go on each page.
Start by creating a list of all the pages you want on your website. Then, for each page, list out the content you want to include.
Once you have a plan for your site's structure, it's time to start building!
Step 5: Create your website's pages
Now it's time to start building your website! You'll need to create a new page for each page on your site.
To create a new page, log in to your web host's control panel and look for the "Website Builder" or "Site Manager" tool. This tool will allow you to create new pages and add content to them.
Once you've created all the pages for your website, it's time to add content to them.
To add content to a page, log in to your web host's control panel and look for the "Page Editor" or "Content Editor" tool. This tool will allow you to add text, images, and other content to your pages.
Once you've added content to all the pages on your site, it's time to publish your site!
Step 7: Publish your website
To publish your website, log in to your web host's control panel and look for the "Publish Website" or "Site Manager" tool. This tool will allow you to publish your site so that it is live on the internet.
Once you've published your site, congratulations! You've built your first website.

## Tips for making your website successful

A website is a great way to share your ideas, thoughts, and creations with the world. But how do you make sure that your website is successful? Here are some tips:

1. Keep it simple. Don't try to cram too much information onto your website. Stick to the essentials and make sure that everything is easy to find.

2. Make it visually appealing. Use high-quality images and graphics, and choose a color scheme that is pleasing to the eye.

3. Write compelling content. Your website should be full of interesting and useful information. Write in a clear and concise manner, and make sure to proofread your work before you publish it.

4. Promote your site. Use social media, word of mouth, and other marketing strategies to get people to visit your site.

5. Keep it up to date. Regularly add new content to your website to keep people coming back for more.

By following these tips, you can create a successful website
llama_print_timings:        load time =  299213.62 ms
llama_print_timings:      sample time =     103.33 ms /  1024 runs   (    0.10 ms per token,  9910.00 tokens per second)
llama_print_timings: prompt eval time =   73534.14 ms /    19 tokens ( 3870.22 ms per token,     0.26 tokens per second)
llama_print_timings:        eval time = 2711785.61 ms /  1023 runs   ( 2650.82 ms per token,     0.38 tokens per second)
llama_print_timings:       total time = 2786380.91 ms /  1042 tokens
Log end
```

Embarrassing beyond words: a mere 0.38 tokens/s (eval averaged 2650.82 ms per token, and 1000 / 2650.82 ≈ 0.38).

huggingface.co/Arki05/Grok...

That one seems to have been run on a MacBook Pro, so I finally won back a little face. My take: the RAM is simply not enough. The IQ3_S weights are 127.69 GiB while this box has only 125 GiB of RAM, so the mmap'd weights can never stay fully resident, swap fills up, and the speed gets dragged down by disk I/O. That is why the author recommends 256 GiB of RAM.
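To test that theory instead of guessing, one can watch disk traffic in a second terminal while generation is running. A minimal sketch with stock Linux tools (iostat comes from the sysstat package):

```bash
# Sketch only: check whether generation is disk-bound.

# vmstat: watch the 'bi' column (blocks read from disk per second).
# Sustained high 'bi' while tokens are generating means the mmap'd
# weight pages keep getting evicted and re-read; 'si'/'so' show
# actual swap traffic.
vmstat 1

# iostat -x: '%util' near 100 on the disk holding grok-GGUF/ confirms
# the bottleneck is I/O, not compute.
iostat -x 1
```

If it really is I/O bound, the only real fixes are more RAM or a quant small enough to fit in 125 GiB; partial GPU offload would not help much when the weights cannot even stay in system memory.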

Rounding up, you could say I just ran at least 2.5 Porsche Panameras, right?
