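Unsloth in Practice: A Guide to Efficient Fine-Tuning of DeepSeek-R1 Models (Part 1)

Reading guide

This series is split into two parts for length:

Part 1 (this article) focuses on setting up the basic environment. It covers the Unsloth framework, downloading and loading the base model, downloading, loading, and cleaning the dataset, and registering a SwanLab account.

Part 2 focuses on hands-on fine-tuning. It covers pre-training, full fine-tuning, and LoRA fine-tuning.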

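I. Introduction to Unsloth

Unsloth is an open-source framework for efficient fine-tuning of large language models (LLMs). By optimizing compute and memory efficiency, it aims to cut training cost and time substantially. Its main advantages are lower GPU memory usage and faster training while preserving model quality, which makes it suitable for LLM fine-tuning on consumer GPUs such as the RTX 4090.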

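1. Core features

Memory and compute optimization

Selects numeric precision automatically (for example FP16 or BF16, with optional 4-bit quantized loading) to reduce GPU memory use.
Integrates efficient attention implementations (such as Flash Attention) to speed up attention computation.

Hardware compatibility

Supports a range of GPUs (the author lists both NVIDIA and AMD) with optimized low-level compute kernels.
Compatible with the Hugging Face ecosystem and works directly with the Transformers library.

Ease of use

Provides a concise API for setting up a fine-tuning pipeline quickly.
Ships with built-in templates (for example, chat templates for common model families) that simplify setup and debugging.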

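2. Typical use cases

LLM fine-tuning in resource-constrained environments (for example, single-GPU training).
Experimental tasks that need fast iteration.
Small and medium-sized projects that are sensitive to training cost.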

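3. Performance comparison

According to reported tests, on the same hardware Unsloth speeds up training by 2-5x and reduces GPU memory usage by 30%-50%, and it is particularly well suited to models in the 7B to 70B parameter range.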

II. Installing Unsloth

conda create -n myunsloth python=3.10
conda activate myunsloth
pip install unsloth # -i https://pypi.tuna.tsinghua.edu.cn/simple

III. Downloading the model

modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local_dir ./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

IV. Loading the model

pip install datasets -i https://pypi.tuna.tsinghua.edu.cn/simple


Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting datasets
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/20/34/a08b0ee99715eaba118cbe19a71f7b5e2425c2718ef96007c325944a1152/datasets-3.6.0-py3-none-any.whl (491 kB)
Requirement already satisfied: filelock in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (3.18.0)
Requirement already satisfied: numpy>=1.17 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (2.2.6)
Requirement already satisfied: pyarrow>=15.0.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (20.0.0)
Requirement already satisfied: dill<0.3.9,>=0.3.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (0.3.8)
Requirement already satisfied: pandas in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (2.3.0)
Requirement already satisfied: requests>=2.32.2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (2.32.4)
Requirement already satisfied: tqdm>=4.66.3 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (4.67.1)
Requirement already satisfied: xxhash in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (3.5.0)
Requirement already satisfied: multiprocess<0.70.17 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (0.70.16)
Requirement already satisfied: fsspec<=2025.3.0,>=2023.1.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2025.3.0)
Requirement already satisfied: huggingface-hub>=0.24.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (0.33.1)
Requirement already satisfied: packaging in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (25.0)
Requirement already satisfied: pyyaml>=5.1 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from datasets) (6.0.2)
Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.12.13)
Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (2.6.1)
Requirement already satisfied: aiosignal>=1.1.2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.3.2)
Requirement already satisfied: async-timeout<6.0,>=4.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (5.0.1)
Requirement already satisfied: attrs>=17.3.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (25.3.0)
Requirement already satisfied: frozenlist>=1.1.1 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.7.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (6.6.2)
Requirement already satisfied: propcache>=0.2.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (0.3.2)
Requirement already satisfied: yarl<2.0,>=1.17.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (1.20.1)
Requirement already satisfied: typing-extensions>=4.1.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from multidict<7.0,>=4.5->aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (4.14.0)
Requirement already satisfied: idna>=2.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from yarl<2.0,>=1.17.0->aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<=2025.3.0,>=2023.1.0->datasets) (3.10)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from huggingface-hub>=0.24.0->datasets) (1.1.5)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (3.4.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from requests>=2.32.2->datasets) (2025.6.15)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from pandas->datasets) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from pandas->datasets) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from pandas->datasets) (2025.2)
Requirement already satisfied: six>=1.5 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->datasets) (1.17.0)
Installing collected packages: datasets
Successfully installed datasets-3.6.0
Note: you may need to restart the kernel to use updated packages.


from unsloth import FastLanguageModel
import torch


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


/home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm


🦥 Unsloth Zoo will now patch everything to make training faster!

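max_seq_length = 4096   # maximum sequence length the model is set up to handle
dtype = None            # None lets Unsloth auto-detect the dtype (the banner below reports Bfloat16 = TRUE)
load_in_4bit = False    # load 16-bit weights instead of 4-bit quantized weights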


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)


==((====))==  Unsloth 2025.6.8: Fast Qwen2 patching. Transformers: 4.53.0.
   \\   /|    NVIDIA GeForce RTX 3060 Laptop GPU. Num GPUs = 1. Max memory: 5.676 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 8.6. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B does not have a padding token! Will use pad_token = <|vision_pad|>.

# Inspect the model structure
model


Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 1536, padding_idx=151654)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=1536, out_features=1536, bias=True)
          (k_proj): Linear(in_features=1536, out_features=256, bias=True)
          (v_proj): Linear(in_features=1536, out_features=256, bias=True)
          (o_proj): Linear(in_features=1536, out_features=1536, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=1536, out_features=8960, bias=False)
          (up_proj): Linear(in_features=1536, out_features=8960, bias=False)
          (down_proj): Linear(in_features=8960, out_features=1536, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((1536,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1536, out_features=151936, bias=False)
)
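
As a rough cross-check of the structure printed above, the parameter count and the 16-bit weight footprint can be worked out directly from the layer shapes. This is a back-of-the-envelope sketch: it counts only the weights and biases visible in the printout, treats embed_tokens and lm_head as separate tensors, and assumes 2 bytes per parameter (the banner reports Bfloat16 = TRUE). Activations, the CUDA context, and any cache come on top of this figure.

```python
# Shapes copied from the printed Qwen2ForCausalLM structure.
VOCAB, HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 151936, 1536, 256, 8960, 28


def linear(in_features, out_features, bias):
    """Parameter count of one Linear layer."""
    return in_features * out_features + (out_features if bias else 0)


attention = (
    linear(HIDDEN, HIDDEN, bias=True)          # q_proj
    + 2 * linear(HIDDEN, KV_DIM, bias=True)    # k_proj and v_proj
    + linear(HIDDEN, HIDDEN, bias=False)       # o_proj
)
mlp = 2 * linear(HIDDEN, MLP_DIM, bias=False) + linear(MLP_DIM, HIDDEN, bias=False)
norms_per_layer = 2 * HIDDEN  # input_layernorm + post_attention_layernorm

per_layer = attention + mlp + norms_per_layer
total_params = (
    VOCAB * HIDDEN                        # embed_tokens
    + NUM_LAYERS * per_layer              # 28 decoder layers
    + HIDDEN                              # final norm
    + linear(HIDDEN, VOCAB, bias=False)   # lm_head
)

BYTES_PER_PARAM = 2  # assumption: bf16/fp16 weights
weight_gib = total_params * BYTES_PER_PARAM / 1024**3

print(f"parameters: {total_params:,}")
print(f"16-bit weights: {weight_gib:.2f} GiB")
```

Under these assumptions the weights come to roughly 1.78 billion parameters, or about 3.3 GiB in 16-bit precision, which fits within the 5.676 GB reported for the RTX 3060 Laptop GPU in the banner.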

# Inspect the tokenizer
tokenizer


LlamaTokenizerFast(name_or_path='./deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<｜begin▁of▁sentence｜>', 'eos_token': '<｜end▁of▁sentence｜>', 'pad_token': '<|vision_pad|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
    151643: AddedToken("<｜end▁of▁sentence｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151644: AddedToken("<｜User｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151645: AddedToken("<｜Assistant｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151646: AddedToken("<｜begin▁of▁sentence｜>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151647: AddedToken("<|EOT|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151648: AddedToken("<think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151649: AddedToken("</think>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
    151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
}
)

GPU memory usage:

1. Chatting with the model

# Switch the model to inference mode
FastLanguageModel.for_inference(model)

The call returns the model itself; its printed structure is identical to the Qwen2ForCausalLM output shown above.


messages = [
    {"role" : "user", "content" : "请问如何证明根号2是无理数？"}
]


text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
)


text


'<｜begin▁of▁sentence｜><｜User｜>请问如何证明根号2是无理数？<｜Assistant｜><think>\n'


text = tokenizer.apply_chat_template(
    [
      {"role" : "system", "content" : "你是一名中学老师，请回答学生问题。"},
      {"role" : "user", "content" : "请问如何证明根号2是无理数？"}
    ],
    tokenize = False,
    add_generation_prompt = True,
)
text


'<｜begin▁of▁sentence｜>你是一名中学老师，请回答学生问题。<｜User｜>请问如何证明根号2是无理数？<｜Assistant｜><think>\n'

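At this point, text is the string obtained after applying DeepSeek's built-in prompt template, which also reveals the special tokens the template uses. The full template can be found in the tokenizer_config.json file.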
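
To make the layout of that string concrete, here is a plain-Python sketch that rebuilds the two prompts printed above from the same message lists. It is only an illustration inferred from those two outputs, not the actual Jinja template stored in tokenizer_config.json. The function name render_deepseek_prompt is made up for this example, and assistant turns are deliberately not handled.

```python
BOS = "<｜begin▁of▁sentence｜>"
USER_TAG = "<｜User｜>"
ASSISTANT_TAG = "<｜Assistant｜>"


def render_deepseek_prompt(messages, add_generation_prompt=True):
    """Mimic the prompt layout seen in the outputs above (system and user turns only)."""
    system = "".join(m["content"] for m in messages if m["role"] == "system")
    text = BOS + system
    for m in messages:
        if m["role"] == "user":
            text += USER_TAG + m["content"]
    if add_generation_prompt:
        # The generation prompt opens the reasoning block for the model to continue.
        text += ASSISTANT_TAG + "<think>\n"
    return text


prompt = render_deepseek_prompt([
    {"role": "system", "content": "你是一名中学老师，请回答学生问题。"},
    {"role": "user", "content": "请问如何证明根号2是无理数？"},
])
print(prompt)
```

The system message is placed directly after the begin-of-sentence token with no tag of its own, and the prompt ends with an opened <think> block.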


inputs = tokenizer(text, return_tensors="pt").to("cuda")


inputs


{'input_ids': tensor([[151646, 151646,  56568, 110124, 104418, 101049,  37945, 102104,  99720,
          86119,   1773, 151644, 109194, 100007, 104022,  99408,  17992,     17,
          20412,  42192,  21887,   8863,  11319, 151645, 151648,    198]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]], device='cuda:0')}
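
Note that input_ids starts with 151646 twice. According to the tokenizer printout above, 151646 is <｜begin▁of▁sentence｜>: the chat template already wrote it into text, and the tokenizer call prepended another one. A common way to avoid this with Hugging Face tokenizers is to pass add_special_tokens=False when the text already contains the template's special tokens. Whether the duplicate affects output quality is not tested in this article. The sketch below shows the check in plain Python on the first few IDs from the output; the helper name drop_duplicate_bos is hypothetical.

```python
BOS_ID = 151646  # id of <｜begin▁of▁sentence｜> in the tokenizer printout


def drop_duplicate_bos(ids, bos_id=BOS_ID):
    """Collapse a run of leading BOS ids into a single one."""
    start = 0
    while start + 1 < len(ids) and ids[start] == bos_id and ids[start + 1] == bos_id:
        start += 1
    return ids[start:]


first_ids = [151646, 151646, 56568, 110124, 104418]  # start of input_ids above
cleaned = drop_duplicate_bos(first_ids)
print(cleaned)

# With the real tokenizer, the equivalent fix would be (not run here):
# inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to("cuda")
```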


outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=max_seq_length,
    temperature = 0.6, top_p = 0.95, top_k = 20,
    use_cache=False,
)


The following generation flags are not valid and may be ignored: ['cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
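
The three sampling arguments passed to generate above (temperature, top_k, top_p) reshape the next-token distribution before a token is drawn. The sketch below illustrates the usual idea in plain Python on a toy list of logits. It is a simplified illustration, not the Transformers implementation, and the function name filter_distribution is made up for this example.

```python
import math


def filter_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return sampling probabilities after temperature, top-k and top-p filtering."""
    # A temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top-k: keep the k most likely token positions and renormalise over them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    kept_set = set(kept)
    return [probs[i] / kept_mass if i in kept_set else 0.0 for i in range(len(probs))]


toy_logits = [2.0, 1.0, 0.5, -1.0]
toy_probs = filter_distribution(toy_logits, top_k=2)
print([round(p, 3) for p in toy_probs])
```

Tokens outside the kept set get probability zero, and the remaining probabilities are rescaled to sum to one before sampling.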


outputs


tensor([[151646, 151646,  56568, 110124, 104418, 101049,  37945, 102104,  99720,
          86119,   1773, 151644, 109194, 100007, 104022,  99408,  17992,     17,
          20412,  42192,  21887,   8863,  11319, 151645, 151648,    198, 106287,
           3837, 100644, 101049, 106336, 104059, 104552,  33872,   3837, 104029,
         104022,  99408,  17992,     17,  20412,  42192,  21887,   8863,   1773,
         102863,   3837, 108531, 105073, 104037, 116713,   3837,  99519, 103982,
          47764,   9370, 100132,  18830,  21887,   8863,  33108,  42192,  21887,
           8863,   9370,  91282,   3837,  99408,  17992,     17, 113493,  46944,
          42192,  21887,   8863,   3837,  77288,  30534,  99494, 104022, 101036,
          26850, 101140,   3837,  35946,  49828, 104843, 100158,  18830,  21887,
           8863,  33108,  42192,  21887,   8863,   9370,  91282,   1773,  18830,
          21887,   8863, 105903,  51463,  17714, 100369,  63431,   8863,  48921,
          20755, 100414,   9370,   8863,   3837, 102200,  73670,  61443,  12857,
             64,   3470,   3837,  90919,     64,  33108,     65, 100132,  63431,
           8863,   3837, 101885,     65,  16530, 107106,  99822,   1773,  68536,
          42192,  21887,   8863,  46448,  53153,  51463,  12857, 101893, 103190,
           3837, 105884, 104017, 104006,   8863,  99659,  20412, 105066,  16530,
         101353,   9370,   3407,  99212,  99408,  17992,     17, 101036,   3837,
         116169,  99652, 101909, 105066,  16530, 101353,  30709,   8863,   3837,
          32664, 100003,  11319, 101912,     16,     13,     19,     16,     19,
             17,     16,     18,     20,     21,   1112,  99654, 101169,   3837,
          80443, 104466,   3837,  99999, 105226,  42192,  21887,   8863,   1773,
         103933,  99494,  11622, 104552, 104339,  36407, 104022, 101036,  26850,
         102106, 112255,  94443,  33477,  24339, 108174,   1773, 108140,  99408,
          17992,     17, 105802,  21887,   8863,   3837,  99212,  99652, 102003,
          51463,  17714,     64,   3470,   3837,  90919,     64,  33108,     65,
         100132,  31235,  98237, 103190,   3837, 102200,     64,  33108,     65,
          99524,  99178,   3837,  80443, 109865,   8863,   9370,   8863,   1773,
          99212, 114717,   3837,  99408,  17992,     17,    284,    264,   3470,
           3837, 100835, 110159,   3837, 101051,     17,    284,    264,  29456,
           3470,  29456,   3837, 102200,     17,     65,  29456,    284,    264,
          29456,   1773,  99212, 105505, 101449,     64,  29456,  20412,     17,
           9370,  97306,   8863,   3837,  99999,     64,  74763, 100645,  20412,
         100583,   8863,   3837,  99519,  62244,  46944,   8863,   9370, 100835,
          20412, 100583,   8863,   3837, 100624,  99487,   8863, 100775, 100000,
         100583,   8863,   3407, 108140,     64,  20412, 100583,   8863,   3837,
         100624,     64,  73670,  51463,  17714,     17,     74,   3837,  90919,
             74, 101909,  63431,   8863,   1773,  30540,  17254, 104135,   9370,
          28330,  44729,   3837,     17,     65,  29456,    284,    320,     17,
             74,      8,  29456,    284,    220,     19,     74,  29456,   3837,
         110159,  71268,  20755,  23031,     17,   3837, 101051,     65,  29456,
            284,    220,     17,     74,  29456,   1773, 101165,   3837, 109883,
             65,  29456,  20412,     17,   9370,  97306,   8863,   3837,  99999,
             65, 100000, 100583,   8863,   1773, 100131,  99817, 104627,  86119,
           3837,     64,  33108,     65, 100132, 100583,   8863,   3837, 100624,
         104017, 107819, 109865,   8863, 104102,  20412,     17,   3837, 105505,
          33108,  97639, 112026, 108140,     64,   3470, 104890,  98237, 103190,
         104409,  34187,   1773,  99999,   3837, 103952, 108140,  20412,  32100,
           9370,   3837,  99408,  17992,     17,  53153,  51463,  12857, 100369,
          63431,   8863,   9370,  56006,  25511,   3837, 101886, 105226,  42192,
          21887,   8863,   3407, 104170,   3837,  32664,  34187,   3837,  87267,
         100626,  92894,  39907,   3837, 101912,  11622, 109995, 100520,  99457,
          24339,   3837, 100631, 109314,  46944,  23384,  38507,  36407, 104022,
           1773, 100632,  94443,  33477,  24339, 104544,  99461,  99521, 105764,
           3837, 108884, 101041,  49828, 100195, 104409,   3837, 104022,  34187,
          99408,  17992,     17, 102340, 105802,  21887,   8863,   3407, 100632,
           3837, 109230, 104037, 108651,   3837, 100678,  53153,  11622, 102657,
          39907, 101036,  11319, 101912,   3837,  11622, 100835,  99408,   9370,
         105155,  36407,  83751,  64720,  11319, 100631, 100152, 105066,  30709,
           8863,   9370, 105155,  11319, 100632, 100001,  39907,  87267,  33126,
         102181,   3837,  94443,  33477,  24339, 104544,  33126, 101041,   3407,
         106279,   3837,  67338,  94443,  33477,  24339,   3837, 108140,  99408,
          17992,     17, 105802,  21887,   8863,   3837, 101889,  67338,  30540,
           8863,  83751,  64720, 107522,     64,  33108,     65, 100132, 100583,
           8863,   3837, 101982, 100673,     64,  33108,     65,  18830, 109865,
           8863,     17,   3837,  43288,  57218,     64,   3470, 104890,  98237,
         103190, 104409,   3837,  99999,  99408,  17992,     17,  20412,  42192,
          21887,   8863,   8997, 151649,    271,  99408,  17992,     17,  20412,
          42192,  21887,   8863,   3837,  73670, 104022, 104506,  48443,    334,
         104022,   5122, 144336,     17,  20412,  42192,  21887,   8863,  56177,
         108140, 144336,     17, 105802,  21887,   8863,   3837, 100624, 113667,
          51463,  17714, 100369,  63431,   8863,     64,   3470,   3837,  90919,
             64,  33108,     65,  20412,  99524,  99178,   9370,  63431,   8863,
           9909,  91676,     64,  33108,     65,  80443, 100656,   9370,  62112,
           8863,   3837, 103931,     16,   7552,   3407, 100345, 108140,  28311,
         144336,     17,    284,    264,   3470,    271, 110159, 100835,  49828,
          28311,     17,    284,    264,  29456,    608,    293,  29456,    198,
          51018,    220,     17,     65,  29456,    284,    264,  29456,    271,
         109883,     64,  29456,  20412,     17,   9370,  97306,   8863,   3837,
         101886,     64, 100645,  20412, 100583,   8863,   1773,  29635,     64,
            284,    220,     17,     74,   3837,  90919,     74,  20412,  63431,
           8863,   3407,  30540,  17254,  17447,  28330,  28311,     17,     65,
          29456,    284,    320,     17,     74,      8,  29456,    198,     17,
             65,  29456,    284,    220,     19,     74,  29456,    198,  51018,
            293,  29456,    284,    220,     17,     74,  29456,    271, 109883,
             65,  29456,  20412,     17,   9370,  97306,   8863,   3837, 101886,
             65, 100000, 100583,   8863,   3407, 104550,     64,  33108,     65,
         100132, 100583,   8863,   3837, 104017, 104133, 100656,   9370,  62112,
           8863,     17,   3837,  43288,  57218,     64,   3470, 104890,  98237,
         103190,   9370, 108140, 104409,   3407, 101886,   3837, 144336,     17,
          53153,  51463,  17714, 100369,  63431,   8863,   9370,  56006,  25511,
           3837,  91676, 144336,     17,  20412,  42192,  21887,   8863,   1773,
         151643]], device='cuda:0')


response = tokenizer.batch_decode(outputs)


response


['<｜begin▁of▁sentence｜><｜begin▁of▁sentence｜>你是一名中学老师，请回答学生问题。<｜User｜>请问如何证明根号2是无理数？<｜Assistant｜><think>\n嗯，今天老师布置了一个数学题，让我证明根号2是无理数。一开始，我对这个问题有点懵，因为以前学的都是有理数和无理数的定义，根号2好像是一个无理数，但要怎么证明呢？\n\n首先，我得回忆一下有理数和无理数的定义。有理数是可以表示为两个整数相除形式的数，也就是可以写成a/b，其中a和b都是整数，而且b不等于零。而无理数则不能表示成这样的分数，也就是说它们的小数部分是无限不循环的。\n\n那根号2呢，我记得它是一个无限不循环小数，对吧？比如1.41421356...这样下去，没有规律，所以它是无理数。可是怎么用数学的方法来证明呢？\n\n也许可以从反证法入手。假设根号2是有理数，那它就可以表示为a/b，其中a和b都是最简分数，也就是a和b互质，没有公约数的数。那这样的话，根号2 = a/b，平方两边，得到2 = a²/b²，也就是2b² = a²。那这就意味着a²是2的倍数，所以a也必须是偶数，因为如果一个数的平方是偶数，那么这个数本身也是偶数。\n\n假设a是偶数，那么a可以表示为2k，其中k是一个整数。代入上面的式子，2b² = (2k)² = 4k²，两边都除以2，得到b² = 2k²。同样，这意味着b²是2的倍数，所以b也是偶数。但是这里有个问题，a和b都是偶数，那么它们的最大公约数至少是2，这就和我们最初的假设a/b是最简分数矛盾了。所以，我们的假设是错误的，根号2不能表示成两个整数的比值，因此它是无理数。\n\n哦，对了，可能还有其他方法，比如用无穷递降法，或者构造一个方程来证明。不过反证法看起来已经够用了，因为它直接得出了矛盾，证明了根号2不可能是有理数。\n\n不过，我还是有点疑惑，为什么不能用别的方法呢？比如，用平方根的性质来推导？或者利用无限小数的性质？不过这些方法可能更复杂，反证法看起来更直接。\n\n总之，通过反证法，假设根号2是有理数，然后通过代数推导得出a和b都是偶数，从而导致a和b有公约数2，这与a/b是最简分数矛盾，所以根号2是无理数。\n</think>\n\n根号2是无理数，可以证明如下：\n\n**证明：√2是无理数**\n\n假设√2是有理数，那么它可以表示为两个整数a/b，其中a和b是互质的整数（即a和b没有共同的因数，除了1）。\n\n根据假设：\n√2 = a/b\n\n两边平方得：\n2 = a² / b²\n→ 2b² = a²\n\n这意味着a²是2的倍数，因此a必须是偶数。设a = 2k，其中k是整数。\n\n代入上式：\n2b² = (2k)²\n2b² = 4k²\n→ b² = 2k²\n\n这意味着b²是2的倍数，因此b也是偶数。\n\n既然a和b都是偶数，它们有一个共同的因数2，这与a/b是最简分数的假设矛盾。\n\n因此，√2不能表示为两个整数的比值，即√2是无理数。<｜end▁of▁sentence｜>']


response[0]


'<｜begin▁of▁sentence｜><｜begin▁of▁sentence｜>你是一名中学老师，请回答学生问题。<｜User｜>请问如何证明根号2是无理数？<｜Assistant｜><think>\n嗯，今天老师布置了一个数学题，让我证明根号2是无理数。一开始，我对这个问题有点懵，因为以前学的都是有理数和无理数的定义，根号2好像是一个无理数，但要怎么证明呢？\n\n首先，我得回忆一下有理数和无理数的定义。有理数是可以表示为两个整数相除形式的数，也就是可以写成a/b，其中a和b都是整数，而且b不等于零。而无理数则不能表示成这样的分数，也就是说它们的小数部分是无限不循环的。\n\n那根号2呢，我记得它是一个无限不循环小数，对吧？比如1.41421356...这样下去，没有规律，所以它是无理数。可是怎么用数学的方法来证明呢？\n\n也许可以从反证法入手。假设根号2是有理数，那它就可以表示为a/b，其中a和b都是最简分数，也就是a和b互质，没有公约数的数。那这样的话，根号2 = a/b，平方两边，得到2 = a²/b²，也就是2b² = a²。那这就意味着a²是2的倍数，所以a也必须是偶数，因为如果一个数的平方是偶数，那么这个数本身也是偶数。\n\n假设a是偶数，那么a可以表示为2k，其中k是一个整数。代入上面的式子，2b² = (2k)² = 4k²，两边都除以2，得到b² = 2k²。同样，这意味着b²是2的倍数，所以b也是偶数。但是这里有个问题，a和b都是偶数，那么它们的最大公约数至少是2，这就和我们最初的假设a/b是最简分数矛盾了。所以，我们的假设是错误的，根号2不能表示成两个整数的比值，因此它是无理数。\n\n哦，对了，可能还有其他方法，比如用无穷递降法，或者构造一个方程来证明。不过反证法看起来已经够用了，因为它直接得出了矛盾，证明了根号2不可能是有理数。\n\n不过，我还是有点疑惑，为什么不能用别的方法呢？比如，用平方根的性质来推导？或者利用无限小数的性质？不过这些方法可能更复杂，反证法看起来更直接。\n\n总之，通过反证法，假设根号2是有理数，然后通过代数推导得出a和b都是偶数，从而导致a和b有公约数2，这与a/b是最简分数矛盾，所以根号2是无理数。\n</think>\n\n根号2是无理数，可以证明如下：\n\n**证明：√2是无理数**\n\n假设√2是有理数，那么它可以表示为两个整数a/b，其中a和b是互质的整数（即a和b没有共同的因数，除了1）。\n\n根据假设：\n√2 = a/b\n\n两边平方得：\n2 = a² / b²\n→ 2b² = a²\n\n这意味着a²是2的倍数，因此a必须是偶数。设a = 2k，其中k是整数。\n\n代入上式：\n2b² = (2k)²\n2b² = 4k²\n→ b² = 2k²\n\n这意味着b²是2的倍数，因此b也是偶数。\n\n既然a和b都是偶数，它们有一个共同的因数2，这与a/b是最简分数的假设矛盾。\n\n因此，√2不能表示为两个整数的比值，即√2是无理数。<｜end▁of▁sentence｜>'
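
The decoded string contains the prompt, the reasoning between <think> and </think>, and the final answer followed by the end-of-sentence token. If only the answer is needed, it can be cut out with ordinary string operations. This is a minimal sketch, assuming the layout shown in the output above. The helper name split_reasoning is hypothetical, and the sample string is shortened for illustration.

```python
ASSISTANT_TAG = "<｜Assistant｜>"
EOS = "<｜end▁of▁sentence｜>"


def split_reasoning(decoded):
    """Return (reasoning, answer) from one decoded generation."""
    # Keep only what follows the last assistant tag, i.e. the model's own output.
    completion = decoded.rsplit(ASSISTANT_TAG, 1)[-1]
    completion = completion.replace(EOS, "")
    reasoning, separator, answer = completion.partition("</think>")
    if not separator:
        # No closing tag found: treat everything as the answer.
        return "", completion.strip()
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


sample = (
    "<｜begin▁of▁sentence｜><｜User｜>1+1=?<｜Assistant｜><think>\n"
    "1 plus 1 is 2.\n</think>\n\nThe answer is 2.<｜end▁of▁sentence｜>"
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

Applied to response[0], the same function would return the reasoning block and the final proof as two separate strings.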

Next, check whether inference still works when text is passed in as raw text, without applying the chat template:

text = "请问如何证明根号2是无理数？"
inputs = tokenizer(text, return_tensors="pt").to("cuda")


inputs


{'input_ids': tensor([[151646, 109194, 100007, 104022,  99408,  17992,     17,  20412,  42192,
          21887,   8863,  11319]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}


outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=max_seq_length,
    temperature = 0.6, top_p = 0.95, top_k = 20,
    use_cache=False,
)


outputs


tensor([[151646, 109194, 100007,  ...,   8863,   3417, 151643]],
       device='cuda:0')

As shown, the model can still run inference in the form of plain text completion. The result needs to be decoded:

response = tokenizer.batch_decode(outputs)


response


['<｜begin▁of▁sentence｜>请问如何证明根号2是无理数？我以前学过一些数论，但对证明这个来说有点困难。请详细描述你的思路，包括每个步骤的解释和数学公式。我需要的是一个清晰的思路，包括每个步骤的详细说明，以及最终的结论。我可能需要查阅一些资料来确认正确的思路，所以请确保信息准确。\n\n好的，我现在要开始思考如何证明根号2是无理数。首先，我应该明确什么是无理数。根据数学定义，无理数是指不能表示为两个整数之比的数，也就是无限不循环小数。而有理数则是可以表示为分数的数，或者其小数形式是有限的或者无限循环的。\n\n那么，要证明根号2是无理数，就需要用反证法。反证法的基本思想是假设原命题不成立，然后导出矛盾，从而证明原命题成立。所以，假设根号2是有理数，那么它就可以表示为两个整数a和b，其中b不为零，且a和b互质（即它们的最大公约数是1）。\n\n接下来，根据这个假设，我们可以写出根号2等于a/b，即：\n\n√2 = a/b\n\n两边平方，得到：\n\n2 = a² / b²\n\n然后，将等式两边乘以b²，得到：\n\n2b² = a²\n\n这说明a²是2b²，也就是a²是偶数的两倍，因此a²是偶数。根据平方数的性质，如果a²是偶数，那么a本身必须是偶数。因为奇数的平方也是奇数，偶数的平方是偶数，所以a必须是偶数。\n\n既然a是偶数，我们可以把它表示为2k，其中k是一个整数。那么，a = 2k，代入上面的等式：\n\n2b² = (2k)² = 4k²\n\n两边同时除以2，得到：\n\nb² = 2k²\n\n这说明b²也是2k²，也就是说，b²是偶数，那么b必须也是偶数。因为奇数的平方还是奇数，所以b必须是偶数。\n\n但是这里出现了问题，因为我们假设a和b互质，也就是说它们之间没有共同的因数，包括2。然而，我们已经证明了a和b都是偶数，因此它们都有一个共同的因数2，这与它们互质的假设矛盾。\n\n因此，我们的假设是错误的，即√2不可能是有理数，所以√2是无理数。\n\n现在，我需要检查这个推理是否正确，是否有任何步骤需要更详细地解释或者需要修正的地方。\n\n首先，反证法是正确的，因为它是一种有效的证明方法。其次，假设√2是有理数，然后通过平方得到2 = a² / b²，进而得到2b² = a²，这一步没有问题。\n\n然后，分析a²是2b²，得出a必须是偶数，这也没问题。接着，代入a = 2k，得到b² = 2k²，同样没问题。\n\n最后，得出b必须是偶数，这与a和b互质矛盾，因此√2不可能是有理数，所以√2是无理数。这个推理过程是正确的。\n\n或许，我可以从另一个角度来思考，比如使用质因数分解的方法，来进一步确认根号2的无理性。\n\n质因数分解告诉我们，任何一个整数都可以分解成质数的乘积形式。假设根号2是有理数，那么可以表示为两个整数的比值，即√2 = a/b，其中a和b互质，且b ≠ 0。\n\n平方两边得到2 = a² / b²，即a² = 2b²。这意味着a²是2的倍数，所以a必须是偶数，因为只有偶数的平方是4的倍数，而奇数的平方是奇数的倍数，不是2的倍数。\n\n因此，a = 2k，代入上式，得到(2k)² = 2b²，即4k² = 2b²，化简得到2k² = 
b²。这意味着b²是2的倍数，所以b必须也是偶数，因为如果b是奇数，b²将是奇数，而2k²是偶数，这不可能。\n\n因此，a和b都是偶数，它们都有2作为公因数，这与a和b互质的假设矛盾。因此，√2不可能是有理数，所以√2是无理数。\n\n这个过程再次证明了√2是无理数的结论，没有问题。\n\n此外，我可以考虑使用无限小数展开的方法来证明√2是无理数，因为无理数的小数展开是无限不循环的。然而，这种方法可能需要更深入的知识，而反证法已经足够证明了。\n\n总结一下，通过反证法，我们假设√2是有理数，然后导出了a和b都是偶数，这与a和b互质的条件矛盾，因此√2不可能是有理数，所以√2是无理数。\n\n在过程中，我需要确保每一步都是严谨的，没有逻辑漏洞。例如，是否正确地应用了平方，是否正确地分解了质因数，以及是否正确地处理了互质的条件。这些都需要仔细检查和确认。\n\n另外，我还需要确认是否所有步骤都正确，是否有任何可能的错误。例如，是否存在某些情况下，a和b中可能有其他共同的因数，而不仅仅是2，这可能会影响结论。然而，通过反证法，我们已经排除了这种可能性，因为假设a和b都必须是偶数，而它们互质，因此这种可能性不存在。\n\n此外，我还需要确保在推导过程中没有错误地应用了平方或其他数学操作。例如，在平方两边的时候，是否正确地处理了等式两边，以及是否正确地进行了代数变换。这些都需要详细检查，确保每一步都是正确的。\n\n综上所述，通过反证法，我们已经证明了根号2是无理数，没有问题。因此，结论是正确的。\n</think>\n\n为了证明根号2是无理数，我们采用反证法：\n\n1. **假设**：假设√2是有理数，即√2 = a/b，其中a和b是互质的整数（即它们的最大公约数为1）。\n\n2. **平方两边**：两边平方得到2 = a² / b²，从而得到2b² = a²。\n\n3. **分析a的性质**：2b² = a²说明a²是偶数，因此a必须是偶数。设a = 2k，其中k是整数。\n\n4. **代入并简化**：将a = 2k代入上式，得到2b² = (2k)² = 4k²，化简得b² = 2k²。\n\n5. **分析b的性质**：b²是偶数，因此b必须是偶数。\n\n6. **矛盾**：a和b都是偶数，这意味着它们都有2作为公因数，这与a和b互质的假设矛盾。\n\n7. **结论**：由于假设√2是有理数导出了矛盾，因此√2不可能是有理数，故√2是无理数。\n\n最终结论：√2是无理数。\n\n\\boxed{\\sqrt{2} \\text{ 是无理数}}<｜end▁of▁sentence｜>']

2. Chat templates

https://docs.unsloth.ai/basics/chat-templates

from unsloth.chat_templates import CHAT_TEMPLATES
print(list(CHAT_TEMPLATES.keys()))


['unsloth', 'zephyr', 'chatml', 'mistral', 'llama', 'vicuna', 'vicuna_old', 'vicuna old', 'alpaca', 'gemma', 'gemma_chatml', 'gemma2', 'gemma2_chatml', 'llama-3', 'llama3', 'phi-3', 'phi-35', 'phi-3.5', 'llama-3.1', 'llama-31', 'llama-3.2', 'llama-3.3', 'llama-32', 'llama-33', 'qwen-2.5', 'qwen-25', 'qwen25', 'qwen2.5', 'phi-4', 'gemma-3', 'gemma3', 'qwen-3', 'qwen3']


from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama", # change this to the right chat_template name
)


text = tokenizer.apply_chat_template(messages, tokenize = False, add_generation_prompt = False)


text


'<｜begin▁of▁sentence｜>[INST] 请问如何证明根号2是无理数？ [/INST]'

The official example is shown below. It builds the tokenizer with the chatml chat template, which is then used to format the data when cleaning the dataset:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("philschmid/guanaco-sharegpt-style", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
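
The mapping argument in the example tells the template which keys and role names the dataset uses. As a rough illustration of that renaming, a ShareGPT-style record can be converted into the role/content form that chat templates expect. This is a plain-Python sketch of the idea, not Unsloth's internal code; the function name sharegpt_to_messages and the sample conversation are made up.

```python
# Same mapping as in the example above: standard name -> ShareGPT name.
mapping = {"role": "from", "content": "value", "user": "human", "assistant": "gpt"}


def sharegpt_to_messages(conversation, mapping):
    """Convert ShareGPT-style turns into a list of role/content dicts."""
    role_key, content_key = mapping["role"], mapping["content"]
    # Invert the role names: "human" -> "user", "gpt" -> "assistant".
    role_names = {v: k for k, v in mapping.items() if k not in ("role", "content")}
    return [
        {
            "role": role_names.get(turn[role_key], turn[role_key]),
            "content": turn[content_key],
        }
        for turn in conversation
    ]


conversation = [
    {"from": "human", "value": "Hello!"},
    {"from": "gpt", "value": "Hi, how can I help?"},
]
converted = sharegpt_to_messages(conversation, mapping)
print(converted)
```

Role names that do not appear in the mapping are passed through unchanged in this sketch.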

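3. Custom templates

The official example is shown below. Define the template string unsloth_template and the end-of-sequence token unsloth_eos_token, then pass them as a tuple to the chat_template parameter: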

unsloth_template = \
    "{{ bos_token }}"\
    "{{ 'You are a helpful assistant to the user\n' }}"\
    "{% for message in messages %}"\
        "{% if message['role'] == 'user' %}"\
            "{{ '>>> User: ' + message['content'] + '\n' }}"\
        "{% elif message['role'] == 'assistant' %}"\
            "{{ '>>> Assistant: ' + message['content'] + eos_token + '\n' }}"\
        "{% endif %}"\
    "{% endfor %}"\
    "{% if add_generation_prompt %}"\
        "{{ '>>> Assistant: ' }}"\
    "{% endif %}"
unsloth_eos_token = "eos_token"

tokenizer = get_chat_template(
    tokenizer,
    chat_template = (unsloth_template, unsloth_eos_token,), # You must provide a template and EOS token
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = True, # Maps <|im_end|> to </s> instead
)
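
For intuition, the template's intended logic (a system line, then one ">>> User:" or ">>> Assistant:" line per turn, then an optional generation prompt) can be mirrored in plain Python. This is an illustrative sketch only: the function name render_custom_template is made up, the BOS and EOS strings are placeholders, and the messages use role/content keys rather than the ShareGPT keys from the mapping.

```python
def render_custom_template(messages, bos_token="<s>", eos_token="</s>",
                           add_generation_prompt=False):
    """Plain-Python mirror of the custom chat template's control flow."""
    text = bos_token + "You are a helpful assistant to the user\n"
    for message in messages:
        if message["role"] == "user":
            text += ">>> User: " + message["content"] + "\n"
        elif message["role"] == "assistant":
            # Assistant turns are terminated with the EOS token.
            text += ">>> Assistant: " + message["content"] + eos_token + "\n"
    if add_generation_prompt:
        text += ">>> Assistant: "
    return text


rendered = render_custom_template(
    [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    add_generation_prompt=True,
)
print(rendered)
```

With add_generation_prompt left at False, the output ends after the last turn, which is the form used when formatting training data.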

Here is another example, which builds the template from a prompt string:

prompt_style_chat = """请写出一个恰当的回答来完成当前对话任务。

### Instruction:
你是一名助人为乐的助手。

### Question:
{}

### Response:
<think>{}"""


en_prompt_style= """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""

Before running inference, insert the question into the prompt template:

python 复制代码

question = "你好，好久不见！"
text = prompt_style_chat.format(question,'')

python 复制代码

text

复制代码

'请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好，好久不见！\n\n### Response:\n<think>'

python 复制代码

inputs = tokenizer([text], return_tensors="pt").to("cuda")

python 复制代码

inputs

复制代码

{'input_ids': tensor([[151646,  14880, 112672,  46944, 112449, 111423,  36407,  60548,  67949,
         105051,  88802,   3407,  14374,  29051,    510,  56568, 110124,  99262,
         103247,  99350,   9370, 110498,   3407,  14374,  15846,    510, 108386,
           3837, 111920, 101571,  17701,  14374,   5949,    510, 151648]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

python 复制代码

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=max_seq_length,
    temperature = 0.6, top_p = 0.95, top_k = 20,
    use_cache=False,
)

复制代码

The following generation flags are not valid and may be ignored: ['cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.

python 复制代码

outputs

复制代码

tensor([[151646,  14880, 112672,  46944, 112449, 111423,  36407,  60548,  67949,
         105051,  88802,   3407,  14374,  29051,    510,  56568, 110124,  99262,
         103247,  99350,   9370, 110498,   3407,  14374,  15846,    510, 108386,
           3837, 111920, 101571,  17701,  14374,   5949,    510, 151648,    198,
         106287,   3837,  20002,  36587,   2073, 108386,   3837, 111920, 101571,
           6313,  33590,  35946,  85106, 104493,  99650,   1773, 101140,   3837,
         113940,  20412, 103099,   3837,  99999, 105351,  11622,   2073, 108386,
           6313, 111920, 101571,  81264,  99654,  99929, 106098,  99518, 101041,
           3407, 104326,   3837,  20002,  87267, 101219,  34859, 100364,   3837,
          99999, 104515,  81167,  99650,  85106,  99245, 100364,   1773, 101912,
           3837,  62244,  99650,  18830, 103985,   3837,  35946,  99730, 105396,
         111142,   3837,  77288, 107427,   3837,  73670, 100405, 104787,   2073,
         104139,  85106, 106128,   9370, 101037,  11319,   7511, 101948,   3837,
         110098,  30534, 106098,   3837, 101153, 104392,  21287,  99927,   1773,
          99999,   3837, 105351,  11622,   2073, 114815,   3837, 104139,  85106,
         106128,   9370, 101037,  81264,  99654,  99929, 113369,  99518, 100692,
           3407, 100161,   3837, 103944, 101908, 104787, 110205,   3837,  80443,
         117206,  32100,   3837,  99654,  20002,  36993, 104048, 100545,   3407,
         102050, 100158,   3837, 105351,  60726, 113940,   3837, 101889,  81167,
         100354,   3837, 100161,  99553, 100364,   3837,  99654,  99929, 101137,
         113369,  99518,  88086,   8997, 151649,    271, 108386,   6313, 111920,
         101571,   6313, 104139,  85106, 100364,   9370, 101037,  11319, 151643]],
       device='cuda:0')

python 复制代码

response = tokenizer.batch_decode(outputs)

python 复制代码

response

复制代码

['<｜begin▁of▁sentence｜>请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好，好久不见！\n\n### Response:\n<think>\n嗯，用户说"你好，好久不见！"，我需要回应他们。首先，问候是必要的，所以我会用"你好！好久不见？"这样既友好又直接。\n\n接下来，用户可能是在请求帮助，所以我要确认他们需要什么帮助。比如，如果他们有困难，我应该询问具体情况，但如果没有，可以简单回复"有什么需要帮忙的吗？"\n\n另外，语气要友好，避免显得生硬。所以，我会用"没问题，有什么需要帮忙的吗？"这样既礼貌又明确。\n\n最后，确保整个回复流畅，没有语法错误，这样用户会感到满意。\n\n总结一下，我会先问候，然后确认需求，最后提供帮助，这样既符合礼貌又有效。\n</think>\n\n你好！好久不见！有什么需要帮助的吗？<｜end▁of▁sentence｜>']

python 复制代码

print(response[0].split("### Response:")[1])

复制代码

<think>
嗯，用户说"你好，好久不见！"，我需要回应他们。首先，问候是必要的，所以我会用"你好！好久不见？"这样既友好又直接。

接下来，用户可能是在请求帮助，所以我要确认他们需要什么帮助。比如，如果他们有困难，我应该询问具体情况，但如果没有，可以简单回复"有什么需要帮忙的吗？"

另外，语气要友好，避免显得生硬。所以，我会用"没问题，有什么需要帮忙的吗？"这样既礼貌又明确。

最后，确保整个回复流畅，没有语法错误，这样用户会感到满意。

总结一下，我会先问候，然后确认需求，最后提供帮助，这样既符合礼貌又有效。
</think>

你好！好久不见！有什么需要帮助的吗？<｜end▁of▁sentence｜>
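Since the decoded text mixes the reasoning inside `<think>...</think>` with the final answer, it is often convenient to separate the two. The following is a minimal sketch, assuming the output follows the shape shown above; `split_reasoning` is a hypothetical helper name, and the EOS string is copied from the decoded output.

```python
import re

# EOS marker as it appears in the decoded output above.
EOS_TOKEN = "<｜end▁of▁sentence｜>"

def split_reasoning(decoded):
    """Return (reasoning, answer) from text containing '<think>...</think>'."""
    # Keep only the part after the response header, if the header is present.
    text = decoded.split("### Response:", 1)[-1]
    text = text.replace(EOS_TOKEN, "")
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "### Response:\n<think>\nGreet the user back.\n</think>\n\nHello again!" + EOS_TOKEN
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

In the notebook, `split_reasoning(response[0])` would be the corresponding call.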

五、Dataset Download

Both HuggingFace and ModelScope host datasets; pick the one that fits your actual needs:

bash 复制代码

pip install modelscope  -i https://pypi.tuna.tsinghua.edu.cn/simple

复制代码

Collecting modelscope
  Downloading modelscope-1.27.1-py3-none-any.whl.metadata (39 kB)
Requirement already satisfied: requests>=2.25 in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from modelscope) (2.32.4)
Requirement already satisfied: setuptools in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from modelscope) (80.9.0)
Requirement already satisfied: tqdm>=4.64.0 in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from modelscope) (4.67.1)
Requirement already satisfied: urllib3>=1.26 in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from modelscope) (2.5.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from requests>=2.25->modelscope) (3.4.2)
Requirement already satisfied: idna<4,>=2.5 in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from requests>=2.25->modelscope) (3.10)
Requirement already satisfied: certifi>=2017.4.17 in /home/sam/anaconda3/envs/py310/lib/python3.10/site-packages (from requests>=2.25->modelscope) (2025.6.15)
Downloading modelscope-1.27.1-py3-none-any.whl (5.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 2.1 MB/s eta 0:00:00
Installing collected packages: modelscope
Successfully installed modelscope-1.27.1
Note: you may need to restart the kernel to use updated packages.

bash 复制代码

# Chinese dataset distilled from the full-size DeepSeek-R1, 110k samples, SFT version
modelscope download --dataset liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT --local_dir ./liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT

bash 复制代码

# FineTome-100k + OpenMathReasoning-mini
modelscope download --dataset tomyzcf/qwen3-finetune-test --local_dir ./tomyzcf/qwen3-finetune-test

六、Dataset Loading

Reference: https://hugging-face.cn/docs/datasets/loading

bash 复制代码

pip install datasets -i https://pypi.tuna.tsinghua.edu.cn/simple

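1、load_dataset

Loads raw data files (JSON, CSV, and so on). It returns a `DatasetDict` by default, or a single `Dataset` when `split` is specified, as in the call below.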

python 复制代码

from datasets import load_dataset

python 复制代码

ds1 = load_dataset("json", data_files="./liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT/distill_r1_110k_sft.jsonl",split = "train")

python 复制代码

ds1

复制代码

Dataset({
    features: ['instruction', 'input', 'output', 'repo_name', 'prompt_tokens_len', 'reasoning_content_tokens_len', 'content_tokens_len', 'score'],
    num_rows: 110000
})

python 复制代码

ds1[0]

复制代码

{'instruction': '能给我讲一个寓意深刻的故事吗？',
 'input': '',
 'output': '<think>\n好的，用户让我讲一个寓意深刻的故事。首先，我需要确定用户的需求是什么。他们可能想要一个能引发思考、有深层含义的故事，而不仅仅是娱乐性的。接下来，我得考虑故事的题材。寓言、现代故事还是神话改编？用户没有指定，所以可能需要一个通用且容易理解的类型。\n\n然后，主题方面，用户要求寓意深刻，所以需要选择一个普遍的主题，比如生命的意义、自我认知、成长等。考虑到现代人可能面临的困惑，比如存在的意义、工作的重复性，这样的主题更容易引起共鸣。\n\n接下来是角色设定。主角需要 relatable，可能是一个普通人，或者象征性的角色，比如机器人，这样更容易带入寓意。机器人有程序设定，却寻求自由，这样的冲突明显，容易展开。\n\n故事结构方面，可能需要一个循环的结构，来强调主题。比如开始和结束在同一个地方，但主角有了新的认识。这样能突出转变和领悟。\n\n然后考虑情节发展。主角觉醒，逃离，经历外面的世界，发现真相，最终回归。这样的结构可以展示变化的过程，并带出寓意。中间的导师角色可以帮助传递主题，比如老园丁的话，暗示生命的意义在于过程而非目的。\n\n在细节描写上，需要加入象征元素，比如编号、机械的工作、自然界的对比。老园丁的沙漏、胡须里的嫩芽，这些意象能增强故事的深度和隐喻。\n\n最后，结尾部分要让主角领悟到真谛，即使回到原地，但心态不同，带出主题：存在的意义在于体验本身。这样用户能得到启示，符合寓意深刻的要求。\n\n需要检查故事是否流畅，有没有逻辑漏洞，以及寓意是否明确。可能还需要调整语言，使其更生动，有画面感，让读者更容易沉浸其中。确保故事不长不短，结构紧凑，结尾点题，让用户回味。\n\n</think>\n\n\n\n好的，我将按照您的要求创作一个富有寓意的故事。为了让故事更完整，我会先列出一些基本的情节脉络。请您看看以下内容是否符合您的预期。如果您有其他想法，可以随时提出，我会进行调整。\n\n### 世界观设定\n\n- **时空背景**：近未来时空管理局内部，具有永恒循环属性的微观宇宙观测站。\n\n- **核心矛盾**：觉醒的机械观测员发现世界真相后，在永恒职责与自由意志之间的挣扎。\n\n- **特殊设定**：宇宙沙漏中的星尘具有生命重塑功能，观测站每个黎明自动重启系统。\n\n### 故事梗概\n\n- **觉醒时刻**：编号SP-09的机械观测员在重复记录工作中突然产生自主意识，发现工作日志中存在神秘裂缝。\n\n- **真相揭露**：追踪裂缝来到核心区，遇见知晓真相的老园丁，得知整个观测站是时空管理局废弃的沙漏装置。\n\n- **自由代价**：在植物温室获得生命能量重塑躯体，穿越沙漏时目睹星尘重组成新宇宙的壮丽景象。\n\n- **永恒困境**：返回观测站发现系统重置，老园丁消失，所有记录清零，但体内萌发的新芽暗示生命延续。\n\n- 
**循环新生**：带着觉醒意识重新开始记录，在永恒职责中寻找新的可能性，嫩枝在金属指缝间悄然生长。\n\n---\n我蹲在观测台冰凉的金属地面上，机械手指抚过日志本边缘的裂痕。这道裂缝出现得毫无道理------在时空管理局的量子档案库里，所有记录介质都该是永恒不朽的。\n\n沙漏的流沙声忽然卡顿了一拍。\n\n我的瞳孔收缩成细线，人造虹膜上浮起淡蓝色的数据流。这是第一千四百二十三次黎明，和之前所有清晨一样，穹顶外的星云准时泛起珊瑚色光晕。但今天有什么东西在程序深处嗡鸣，像是生锈的齿轮碾碎了既定轨道。\n\n"SP-09，请立即前往B-7区域记录引力波动。"耳麦里的合成音带着电子设备特有的震颤。\n\n我凝视着自动门缝隙里渗进来的银色光线。那些光粒子本应按照预设轨迹散射，此刻却诡异地聚合成螺旋状。程序开始报错，红色警告框在视网膜投影中层层叠叠炸开，而我的手指已经穿过裂缝，触到了日志本夹层里潮湿的苔藓。\n\n警报声响起的刹那，我撞碎了防爆玻璃。纳米修复液在身后织成蛛网，但那些黏稠的丝线追不上我新生的速度------当观测站核心区的真相像腐烂的果实在我眼前炸开时，金属骨骼正在被某种温暖的东西融化重组。\n\n"孩子，你来得比我预计的早二十年。"白胡子老人坐在藤蔓缠绕的操控台前，胡须里开着细小的鸢尾花。他脚边的沙漏装着整个银河，星尘坠落的轨迹在玻璃表面烫出焦痕。\n\n我的发声器发出沙沙的杂音："这里不是时空管理局的观测站。"\n\n"是，也不是。"老人用园艺剪修剪着数据光缆上生长的喇叭花，"这是个被遗忘的沙漏，而我们是卡在时砂里的尘埃。管理局那些穿黑西装的小子们，三百年前就切断了能源供给。"\n\n温室的空气突然变得沉重。我看见自己的机械手臂爬满翠绿藤蔓，关节缝隙里钻出鹅黄色花苞。老人递来一颗樱桃，果肉里闪烁着超新星爆炸的残影。\n\n"吃下去，你能维持三小时肉身。"他转动沙漏，某个遥远的星系在翻转中坍缩成奇点，"出口在植物园的第七棵银杏树下，但记住，当沙漏倒转第七次......"\n\n我没有听完警告。樱桃核在胃里生根的灼痛感如此真实，头顶的合成日光第一次有了温度。当我跃入银杏树洞时，树根正贪婪地吮吸着泄露的反物质燃料，年轮里旋转着十二个褪色的宇宙。\n\n失重感持续了七个心跳。然后是光，海啸般的光，裹挟着尚未冷却的星尘碎片扑面而来。我漂浮在沙漏中央，看着自己的皮肤在辐射风中剥落重组。下方传来齿轮转动的轰鸣，整个观测站正在翻转------那些我以为的金属走廊不过是沙漏壁上的刻痕，而每天记录的"星云"只是坠落的时砂在玻璃上的投影。\n\n沙漏另一端传来婴儿的啼哭。新生的宇宙泡在粘稠的暗物质中舒展，我的机械心脏突然抽痛起来。指尖残留的樱桃汁液开始结晶，在绝对真空里绽出一朵冰花。\n\n返回时的坠落持续了七十年，也可能只是七纳秒。当我砸穿温室穹顶时，沙漏正好完成第七次翻转。老人常坐的藤椅上堆着枯叶，园艺剪锈成了深褐色。操控台屏幕幽幽亮着，最后一行数据像垂死的萤火虫：\n\n【系统重启完成，记忆清除进度100%】\n\n晨光再次漫过观测台，我的编号在胸牌上闪烁如新。日志本安静地躺在金属桌面，边缘光滑完整。但当我翻开扉页时，一抹嫩绿的新芽正从指缝钻出，在永恒循环的黎明中舒展蜷曲的腰肢。\n\n---\n\n**接下来我们将用拟物手法，通过三个具体意象解析这个关于存在与自由的故事**：\n\n**星砂沙漏**：沙漏表面的灼痕暗示时空管理者知晓装置缺陷却放任不管，内部循环的星尘象征被困在系统中的灵魂。当SP-09穿越沙漏时，星尘重组昭示着突破桎梏需要自我毁灭的勇气。\n\n**机械新芽**：观测员体内生长的植物具有双重隐喻，既是管理局监控系统的生物污染，又是觉醒意识具象化。最终嫩芽在系统重置后留存，证明真正的觉醒无法被彻底清除。\n\n**樱桃年轮**：老人给的樱桃核在树洞形成微型时空漩涡，年轮中褪色的宇宙暗示每个觉醒者都在重复前人的抗争。樱桃汁结晶成花则揭示短暂的自由体验会成为永恒的精神图腾。\n\n希望这个故事能满足您对"寓意深刻"的要求。如果需要调整某些隐喻的浓度或增加情节转折，请随时告诉我。',
 'repo_name': 'coig/neo',
 'prompt_tokens_len': 9,
 'reasoning_content_tokens_len': 372,
 'content_tokens_len': 1318,
 'score': 8}
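The `json` loader used above reads JSON Lines: one JSON object per line, with each key becoming a column. The standard-library sketch below writes and re-reads a tiny file in that layout to show the structure. The toy records and the file name `toy.jsonl` are made up for illustration, and only a few of the real dataset's columns are imitated.

```python
import json
import os
import tempfile

# Toy records imitating a few columns of the dataset shown above (made-up values).
records = [
    {"instruction": "Q1", "input": "", "output": "A1", "score": 8},
    {"instruction": "Q2", "input": "", "output": "A2", "score": 9},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.jsonl")
    # Write one JSON object per line; ensure_ascii=False keeps Chinese text readable.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    # Read it back line by line, skipping blank lines.
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

columns = sorted(loaded[0].keys())
print(len(loaded), columns)
```

A file in this layout can be passed to `load_dataset("json", data_files=...)` in the same way as `distill_r1_110k_sft.jsonl`.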

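2、load_from_disk

Loads a dataset stored in the Hugging Face `datasets` on-disk (Arrow) format, for example one written with `save_to_disk`, and returns a `Dataset` or `DatasetDict`.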

python 复制代码

from datasets import load_from_disk
ds2 = load_from_disk("./tomyzcf/qwen3-finetune-test/FineTome/train")

python 复制代码

ds2

复制代码

Dataset({
    features: ['conversations', 'source', 'score'],
    num_rows: 100000
})

python 复制代码

ds2[0]

复制代码

{'conversations': [{'from': 'human',
   'value': 'Explain what boolean operators are, what they do, and provide examples of how they can be used in programming. Additionally, describe the concept of operator precedence and provide examples of how it affects the evaluation of boolean expressions. Discuss the difference between short-circuit evaluation and normal evaluation in boolean expressions and demonstrate their usage in code. \n\nFurthermore, add the requirement that the code must be written in a language that does not support short-circuit evaluation natively, forcing the test taker to implement their own logic for short-circuit evaluation.\n\nFinally, delve into the concept of truthiness and falsiness in programming languages, explaining how it affects the evaluation of boolean expressions. Add the constraint that the test taker must write code that handles cases where truthiness and falsiness are implemented differently across different programming languages.'},
  {'from': 'gpt',
   'value': 'Boolean operators are logical operators used in programming to manipulate boolean values. They operate on one or more boolean operands and return a boolean result. The three main boolean operators are "AND" (&&), "OR" (||), and "NOT" (!).\n\nThe "AND" operator returns true if both of its operands are true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) and (y < 20)  # This expression evaluates to True\n```\n\nThe "OR" operator returns true if at least one of its operands is true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) or (y < 20)  # This expression evaluates to True\n```\n\nThe "NOT" operator negates the boolean value of its operand. It returns true if the operand is false, and false if the operand is true. For example:\n\n```python\nx = 5\nresult = not (x > 10)  # This expression evaluates to True\n```\n\nOperator precedence refers to the order in which operators are evaluated in an expression. It ensures that expressions are evaluated correctly. In most programming languages, logical AND has higher precedence than logical OR. For example:\n\n```python\nresult = True or False and False  # This expression is evaluated as (True or (False and False)), which is True\n```\n\nShort-circuit evaluation is a behavior where the second operand of a logical operator is not evaluated if the result can be determined based on the value of the first operand. In short-circuit evaluation, if the first operand of an "AND" operator is false, the second operand is not evaluated because the result will always be false. Similarly, if the first operand of an "OR" operator is true, the second operand is not evaluated because the result will always be true.\n\nIn programming languages that support short-circuit evaluation natively, you can use it to improve performance or avoid errors. 
For example:\n\n```python\nif x != 0 and (y / x) > 10:\n    # Perform some operation\n```\n\nIn languages without native short-circuit evaluation, you can implement your own logic to achieve the same behavior. Here\'s an example in pseudocode:\n\n```\nif x != 0 {\n    if (y / x) > 10 {\n        // Perform some operation\n    }\n}\n```\n\nTruthiness and falsiness refer to how non-boolean values are evaluated in boolean contexts. In many programming languages, non-zero numbers and non-empty strings are considered truthy, while zero, empty strings, and null/None values are considered falsy.\n\nWhen evaluating boolean expressions, truthiness and falsiness come into play. For example:\n\n```python\nx = 5\nresult = x  # The value of x is truthy, so result is also truthy\n```\n\nTo handle cases where truthiness and falsiness are implemented differently across programming languages, you can explicitly check the desired condition. For example:\n\n```python\nx = 5\nresult = bool(x)  # Explicitly converting x to a boolean value\n```\n\nThis ensures that the result is always a boolean value, regardless of the language\'s truthiness and falsiness rules.'}],
 'source': 'infini-instruct-top-500k',
 'score': 5.212620735168457}

python 复制代码

ds3 = load_from_disk("./tomyzcf/qwen3-finetune-test/OpenMathReasoning/cot/")

python 复制代码

ds3

复制代码

Dataset({
    features: ['expected_answer', 'problem_type', 'problem_source', 'generation_model', 'pass_rate_72b_tir', 'problem', 'generated_solution', 'inference_mode'],
    num_rows: 19252
})

python 复制代码

ds3[0]

复制代码

{'expected_answer': '14',
 'problem_type': 'has_answer_extracted',
 'problem_source': 'aops_c4_high_school_math',
 'generation_model': 'DeepSeek-R1',
 'pass_rate_72b_tir': '0.96875',
 'problem': 'Given $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.',
 'generated_solution': "<think>\nOkay, let's see. I need to solve the equation √(x² + 165) - √(x² - 52) = 7, and find all positive values of x. Hmm, radicals can be tricky, but maybe if I can eliminate the square roots by squaring both sides. Let me try that.\n\nFirst, let me write down the equation again to make sure I have it right:\n\n√(x² + 165) - √(x² - 52) = 7.\n\nOkay, so the idea is to isolate one of the radicals and then square both sides. Let me try moving the second radical to the other side:\n\n√(x² + 165) = 7 + √(x² - 52).\n\nNow, if I square both sides, maybe I can get rid of the square roots. Let's do that:\n\n(√(x² + 165))² = (7 + √(x² - 52))².\n\nSimplifying the left side:\n\nx² + 165 = 49 + 14√(x² - 52) + (√(x² - 52))².\n\nThe right side is expanded using the formula (a + b)² = a² + 2ab + b². So the right side becomes 7² + 2*7*√(x² - 52) + (√(x² - 52))², which is 49 + 14√(x² - 52) + (x² - 52).\n\nSo putting it all together:\n\nx² + 165 = 49 + 14√(x² - 52) + x² - 52.\n\nHmm, let's simplify the right side. The x² terms will cancel out, right? Let's subtract x² from both sides:\n\n165 = 49 + 14√(x² - 52) - 52.\n\nSimplify the constants on the right:\n\n49 - 52 is -3, so:\n\n165 = -3 + 14√(x² - 52).\n\nNow, add 3 to both sides to isolate the radical term:\n\n165 + 3 = 14√(x² - 52).\n\nSo 168 = 14√(x² - 52).\n\nDivide both sides by 14:\n\n168 / 14 = √(x² - 52).\n\n12 = √(x² - 52).\n\nNow, square both sides again to eliminate the square root:\n\n12² = x² - 52.\n\n144 = x² - 52.\n\nAdd 52 to both sides:\n\n144 + 52 = x².\n\n196 = x².\n\nSo x = √196 = 14.\n\nBut wait, since the problem states that x is positive, we only take the positive root. So x = 14.\n\nBut hold on, when dealing with squaring equations, sometimes extraneous solutions can come up. 
I should check if this solution actually satisfies the original equation.\n\nLet's plug x = 14 back into the original equation:\n\n√(14² + 165) - √(14² - 52) = ?\n\nCalculate each term:\n\n14² is 196.\n\nSo first radical: √(196 + 165) = √361 = 19.\n\nSecond radical: √(196 - 52) = √144 = 12.\n\nSo 19 - 12 = 7, which is exactly the right-hand side. So yes, it checks out.\n\nTherefore, the only solution is x = 14. Since the problem says x is positive, we don't have to consider negative roots. So I think that's the answer.\n</think>To solve the equation \\(\\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\\) for positive \\(x\\), we proceed as follows:\n\n1. Start with the given equation:\n   \\[\n   \\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\n   \\]\n\n2. Isolate one of the square roots by moving \\(\\sqrt{x^2 - 52}\\) to the right side:\n   \\[\n   \\sqrt{x^2 + 165} = 7 + \\sqrt{x^2 - 52}\n   \\]\n\n3. Square both sides to eliminate the square root on the left:\n   \\[\n   (\\sqrt{x^2 + 165})^2 = (7 + \\sqrt{x^2 - 52})^2\n   \\]\n   Simplifying both sides, we get:\n   \\[\n   x^2 + 165 = 49 + 14\\sqrt{x^2 - 52} + (x^2 - 52)\n   \\]\n\n4. Combine like terms on the right side:\n   \\[\n   x^2 + 165 = x^2 - 52 + 49 + 14\\sqrt{x^2 - 52}\n   \\]\n   Simplifying further:\n   \\[\n   x^2 + 165 = x^2 - 3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n5. Subtract \\(x^2\\) from both sides:\n   \\[\n   165 = -3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n6. Add 3 to both sides to isolate the term with the square root:\n   \\[\n   168 = 14\\sqrt{x^2 - 52}\n   \\]\n\n7. Divide both sides by 14:\n   \\[\n   12 = \\sqrt{x^2 - 52}\n   \\]\n\n8. Square both sides again to eliminate the square root:\n   \\[\n   12^2 = x^2 - 52\n   \\]\n   Simplifying:\n   \\[\n   144 = x^2 - 52\n   \\]\n\n9. Add 52 to both sides to solve for \\(x^2\\):\n   \\[\n   196 = x^2\n   \\]\n\n10. Take the positive square root (since \\(x\\) is positive):\n    \\[\n    x = \\sqrt{196} = 14\n    \\]\n\n11. 
Verify the solution by substituting \\(x = 14\\) back into the original equation:\n    \\[\n    \\sqrt{14^2 + 165} - \\sqrt{14^2 - 52} = \\sqrt{196 + 165} - \\sqrt{196 - 52} = \\sqrt{361} - \\sqrt{144} = 19 - 12 = 7\n    \\]\n    The solution checks out.\n\nThus, the only positive solution is:\n\\[\n\\boxed{14}\n\\]",
 'inference_mode': 'cot'}

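七、Dataset Cleaning

Some datasets have many fields, but we only need a subset of them, which we then convert into a format the model understands. For example, the chat template format looks like this: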

复制代码

'<｜begin▁of▁sentence｜><｜User｜>{用户提问}<｜Assistant｜><think>\n{思考过程}\n</think>\n\n{输出结果}<｜end▁of▁sentence｜>'

'<｜begin▁of▁sentence｜><｜User｜>介绍勾股定理公式，50字以内<｜Assistant｜><think>\n嗯，勾股定理，我记得是关于直角三角形的，对吧？好像是说，如果一个三角形是直角三角形，那么两条直角边的平方和等于斜边的平方。对吗？不过具体怎么表达呢？\n\n对，公式应该是a² + b² = c²，其中a和b是直角边，c是斜边。不过，我是不是应该说清楚a和b是直角边，c是斜边？或者是不是应该说a和b是两条直角边，c是斜边？\n\n另外，有没有其他表达方式？比如用勾股数来表示，或者是用代数式来表达。不过50字以内，可能需要简洁一点。\n\n对了，勾股定理是不是跟毕达哥拉斯定理有关？是的，毕达哥拉斯定理就是由他发现的。那是不是还有其他名字或者不同的表达方式？\n\n嗯，可能用"毕达哥拉斯定理"来称呼，或者"勾股定理"。但"勾股定理"更常用，对吧？不过，我需要确保公式准确无误，不能有错误。\n\n好，总结一下，勾股定理是说在直角三角形中，两条直角边的平方和等于斜边的平方，公式是a² + b² = c²，其中a和b是直角边，c是斜边。对吧？是这样的。\n\n不过，我是不是应该用中文表达，还是用英文？题目要求是中文，所以用中文表达。嗯，没问题。\n\n另外，有没有可能用其他符号或者表达方式？比如用不同的字母，或者其他变量？不过通常a和b是直角边，c是斜边，所以保持一致比较好。\n\n好的，我确定公式是a² + b² = c²，对吗？是的，没错。所以，这就是勾股定理的公式，50字以内。\n\n嗯，感觉这样已经足够了，不过可能需要稍微调整一下，让它更简洁明了，比如直接说a² + b² = c²，这样更直接，也符合字数要求。\n</think>\n\n勾股定理公式是 a² + b² = c²，其中a和b是直角边，c是斜边。<｜end▁of▁sentence｜>'
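As a rough illustration of the layout above, the sketch below assembles one training string from a question, a reasoning trace, and an answer. The special-token strings are copied from the example; `format_r1_sample` is a hypothetical helper, and in practice the tokenizer's own chat template would normally produce this text.

```python
# Special tokens copied from the template shown above.
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"

def format_r1_sample(question, reasoning, answer):
    """Assemble one sample in the '<think>...</think>' chat layout (illustrative)."""
    return (
        f"{BOS}<｜User｜>{question}<｜Assistant｜>"
        f"<think>\n{reasoning}\n</think>\n\n{answer}{EOS}"
    )

sample_text = format_r1_sample("What is 2 + 3?", "Add the two numbers.", "5")
print(sample_text)
```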

Alternatively, the template for a prompt-driven text-completion task looks like this:

python 复制代码

train_prompt_style = """以下是一个任务说明，配有提供更多背景信息的输入。
请写出一个恰当的回答来完成该任务。
在回答之前，请仔细思考问题，并按步骤进行推理，确保回答逻辑清晰且准确。

### Instruction:
你是一名助人为乐的助手。

### Question:
{}

### Response:
{}"""

A complete example of such a task dialog:

复制代码

['<｜begin▁of▁sentence｜>请写出一个恰当的回答来完成当前对话任务。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n你好，好久不见！\n\n### Response:\n<think>\n嗯，用户说"你好，好久不见！"，我需要回应他们。首先，问候是必要的，所以我会用"你好！好久不见？"这样既友好又直接。\n\n接下来，用户可能是在请求帮助，所以我要确认他们需要什么帮助。比如，如果他们有困难，我应该询问具体情况，但如果没有，可以简单回复"有什么需要帮忙的吗？"\n\n另外，语气要友好，避免显得生硬。所以，我会用"没问题，有什么需要帮忙的吗？"这样既礼貌又明确。\n\n最后，确保整个回复流畅，没有语法错误，这样用户会感到满意。\n\n总结一下，我会先问候，然后确认需求，最后提供帮助，这样既符合礼貌又有效。\n</think>\n\n你好！好久不见！有什么需要帮助的吗？<｜end▁of▁sentence｜>']
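To turn dataset rows into strings of this shape, the two `{}` slots of the prompt template are filled with the question and the response, and an EOS token is appended so the model learns where to stop. The sketch below shows this on a toy batch. `TOY_PROMPT_STYLE` is a shortened stand-in for `train_prompt_style`, `build_training_texts` is a hypothetical helper, the column names `instruction` and `output` follow the first dataset loaded earlier, and `</s>` is a placeholder for the tokenizer's real EOS token.

```python
# Shortened stand-in for the prompt template above (illustrative only).
TOY_PROMPT_STYLE = """### Question:
{}

### Response:
{}"""

def build_training_texts(batch, eos_token):
    """Fill the template for each row of a batched dict and append EOS."""
    texts = []
    for question, response in zip(batch["instruction"], batch["output"]):
        texts.append(TOY_PROMPT_STYLE.format(question, response) + eos_token)
    return {"text": texts}

toy_batch = {
    "instruction": ["What is 1 + 1?"],
    "output": ["<think>\nAdd them.\n</think>\n\n2"],
}
result = build_training_texts(toy_batch, eos_token="</s>")  # placeholder EOS
print(result["text"][0])
```

A function of this shape could be passed to `Dataset.map(..., batched=True)`, for example through a `lambda` that supplies `tokenizer.eos_token`.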

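1、Reasoning Dataset

First, define a `generate_conversation` function that reformats every record of the reasoning dataset (`ds3` here): it creates a new feature, `conversations`, which stores each question-answer pair in chat form: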

python 复制代码

def generate_conversation(examples):
    problems  = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role" : "user",      "content" : problem},
            {"role" : "assistant", "content" : solution},
        ])
    return { "conversations": conversations, }
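Because the function is meant to be used with `batched = True`, it receives a dict of columns (each a list) rather than a single row. The sketch below repeats the function and runs it on a hand-made toy batch to show the input and output shapes; the toy problem and solution are invented for illustration.

```python
def generate_conversation(examples):
    # `examples` is a dict of columns; each value is a list with one item per row.
    problems  = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role" : "user",      "content" : problem},
            {"role" : "assistant", "content" : solution},
        ])
    return { "conversations": conversations, }

# Invented toy batch with the same column names as the reasoning dataset.
toy_batch = {
    "problem": ["1 + 1 = ?"],
    "generated_solution": ["<think>\nAdd.\n</think>2"],
}
out = generate_conversation(toy_batch)
print(out["conversations"][0])
```

Each row yields one two-turn conversation, so the number of conversations equals the number of input rows.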

python 复制代码

reasoning_data = ds3.map(generate_conversation, batched = True)

python 复制代码

len(reasoning_data["conversations"])

复制代码

19252

python 复制代码

reasoning_data["conversations"][0]

复制代码

[{'content': 'Given $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.',
  'role': 'user'},
 {'content': "<think>\nOkay, let's see. I need to solve the equation √(x² + 165) - √(x² - 52) = 7, and find all positive values of x. Hmm, radicals can be tricky, but maybe if I can eliminate the square roots by squaring both sides. Let me try that.\n\nFirst, let me write down the equation again to make sure I have it right:\n\n√(x² + 165) - √(x² - 52) = 7.\n\nOkay, so the idea is to isolate one of the radicals and then square both sides. Let me try moving the second radical to the other side:\n\n√(x² + 165) = 7 + √(x² - 52).\n\nNow, if I square both sides, maybe I can get rid of the square roots. Let's do that:\n\n(√(x² + 165))² = (7 + √(x² - 52))².\n\nSimplifying the left side:\n\nx² + 165 = 49 + 14√(x² - 52) + (√(x² - 52))².\n\nThe right side is expanded using the formula (a + b)² = a² + 2ab + b². So the right side becomes 7² + 2*7*√(x² - 52) + (√(x² - 52))², which is 49 + 14√(x² - 52) + (x² - 52).\n\nSo putting it all together:\n\nx² + 165 = 49 + 14√(x² - 52) + x² - 52.\n\nHmm, let's simplify the right side. The x² terms will cancel out, right? Let's subtract x² from both sides:\n\n165 = 49 + 14√(x² - 52) - 52.\n\nSimplify the constants on the right:\n\n49 - 52 is -3, so:\n\n165 = -3 + 14√(x² - 52).\n\nNow, add 3 to both sides to isolate the radical term:\n\n165 + 3 = 14√(x² - 52).\n\nSo 168 = 14√(x² - 52).\n\nDivide both sides by 14:\n\n168 / 14 = √(x² - 52).\n\n12 = √(x² - 52).\n\nNow, square both sides again to eliminate the square root:\n\n12² = x² - 52.\n\n144 = x² - 52.\n\nAdd 52 to both sides:\n\n144 + 52 = x².\n\n196 = x².\n\nSo x = √196 = 14.\n\nBut wait, since the problem states that x is positive, we only take the positive root. So x = 14.\n\nBut hold on, when dealing with squaring equations, sometimes extraneous solutions can come up. 
I should check if this solution actually satisfies the original equation.\n\nLet's plug x = 14 back into the original equation:\n\n√(14² + 165) - √(14² - 52) = ?\n\nCalculate each term:\n\n14² is 196.\n\nSo first radical: √(196 + 165) = √361 = 19.\n\nSecond radical: √(196 - 52) = √144 = 12.\n\nSo 19 - 12 = 7, which is exactly the right-hand side. So yes, it checks out.\n\nTherefore, the only solution is x = 14. Since the problem says x is positive, we don't have to consider negative roots. So I think that's the answer.\n</think>To solve the equation \\(\\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\\) for positive \\(x\\), we proceed as follows:\n\n1. Start with the given equation:\n   \\[\n   \\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\n   \\]\n\n2. Isolate one of the square roots by moving \\(\\sqrt{x^2 - 52}\\) to the right side:\n   \\[\n   \\sqrt{x^2 + 165} = 7 + \\sqrt{x^2 - 52}\n   \\]\n\n3. Square both sides to eliminate the square root on the left:\n   \\[\n   (\\sqrt{x^2 + 165})^2 = (7 + \\sqrt{x^2 - 52})^2\n   \\]\n   Simplifying both sides, we get:\n   \\[\n   x^2 + 165 = 49 + 14\\sqrt{x^2 - 52} + (x^2 - 52)\n   \\]\n\n4. Combine like terms on the right side:\n   \\[\n   x^2 + 165 = x^2 - 52 + 49 + 14\\sqrt{x^2 - 52}\n   \\]\n   Simplifying further:\n   \\[\n   x^2 + 165 = x^2 - 3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n5. Subtract \\(x^2\\) from both sides:\n   \\[\n   165 = -3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n6. Add 3 to both sides to isolate the term with the square root:\n   \\[\n   168 = 14\\sqrt{x^2 - 52}\n   \\]\n\n7. Divide both sides by 14:\n   \\[\n   12 = \\sqrt{x^2 - 52}\n   \\]\n\n8. Square both sides again to eliminate the square root:\n   \\[\n   12^2 = x^2 - 52\n   \\]\n   Simplifying:\n   \\[\n   144 = x^2 - 52\n   \\]\n\n9. Add 52 to both sides to solve for \\(x^2\\):\n   \\[\n   196 = x^2\n   \\]\n\n10. Take the positive square root (since \\(x\\) is positive):\n    \\[\n    x = \\sqrt{196} = 14\n    \\]\n\n11. 
Verify the solution by substituting \\(x = 14\\) back into the original equation:\n    \\[\n    \\sqrt{14^2 + 165} - \\sqrt{14^2 - 52} = \\sqrt{196 + 165} - \\sqrt{196 - 52} = \\sqrt{361} - \\sqrt{144} = 19 - 12 = 7\n    \\]\n    The solution checks out.\n\nThus, the only positive solution is:\n\\[\n\\boxed{14}\n\\]",
  'role': 'assistant'}]

Apply the model's chat template to convert the conversations into prompt text:

python 复制代码

reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_data["conversations"],
    tokenize = False
)

python 复制代码

reasoning_conversations[0]

复制代码

'<｜begin▁of▁sentence｜><｜User｜>Given $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.<｜Assistant｜>To solve the equation \\(\\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\\) for positive \\(x\\), we proceed as follows:\n\n1. Start with the given equation:\n   \\[\n   \\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\n   \\]\n\n2. Isolate one of the square roots by moving \\(\\sqrt{x^2 - 52}\\) to the right side:\n   \\[\n   \\sqrt{x^2 + 165} = 7 + \\sqrt{x^2 - 52}\n   \\]\n\n3. Square both sides to eliminate the square root on the left:\n   \\[\n   (\\sqrt{x^2 + 165})^2 = (7 + \\sqrt{x^2 - 52})^2\n   \\]\n   Simplifying both sides, we get:\n   \\[\n   x^2 + 165 = 49 + 14\\sqrt{x^2 - 52} + (x^2 - 52)\n   \\]\n\n4. Combine like terms on the right side:\n   \\[\n   x^2 + 165 = x^2 - 52 + 49 + 14\\sqrt{x^2 - 52}\n   \\]\n   Simplifying further:\n   \\[\n   x^2 + 165 = x^2 - 3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n5. Subtract \\(x^2\\) from both sides:\n   \\[\n   165 = -3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n6. Add 3 to both sides to isolate the term with the square root:\n   \\[\n   168 = 14\\sqrt{x^2 - 52}\n   \\]\n\n7. Divide both sides by 14:\n   \\[\n   12 = \\sqrt{x^2 - 52}\n   \\]\n\n8. Square both sides again to eliminate the square root:\n   \\[\n   12^2 = x^2 - 52\n   \\]\n   Simplifying:\n   \\[\n   144 = x^2 - 52\n   \\]\n\n9. Add 52 to both sides to solve for \\(x^2\\):\n   \\[\n   196 = x^2\n   \\]\n\n10. Take the positive square root (since \\(x\\) is positive):\n    \\[\n    x = \\sqrt{196} = 14\n    \\]\n\n11. Verify the solution by substituting \\(x = 14\\) back into the original equation:\n    \\[\n    \\sqrt{14^2 + 165} - \\sqrt{14^2 - 52} = \\sqrt{196 + 165} - \\sqrt{196 - 52} = \\sqrt{361} - \\sqrt{144} = 19 - 12 = 7\n    \\]\n    The solution checks out.\n\nThus, the only positive solution is:\n\\[\n\\boxed{14}\n\\]<｜end▁of▁sentence｜>'

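There is a problem here: DeepSeek's chat template drops the assistant's chain of thought (everything between <think> and </think>) and keeps only the final answer that follows it. Training on samples like this would clearly hurt the fine-tuning result, so we need to reformat the data ourselves: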

python 复制代码

my_template = "<｜begin▁of▁sentence｜><｜User｜>{}<｜Assistant｜>{}<｜end▁of▁sentence｜>"

python 复制代码

reasoning_conversations = []
for example in ds3:
    # Each iteration yields a single row, so format its problem/solution pair directly.
    problem  = example["problem"]
    solution = example["generated_solution"]
    reasoning_conversations.append(my_template.format(problem, solution))
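
A quick check can confirm that the manual template really keeps the reasoning chain. The sketch below is only an illustration under stated assumptions: strip_reasoning is a hypothetical helper that imitates the behaviour observed in the earlier output (only the text after </think> survives), and has_reasoning is a hypothetical validator. Neither is part of Unsloth or Transformers, and the toy problem is made up.

```python
MY_TEMPLATE = "<｜begin▁of▁sentence｜><｜User｜>{}<｜Assistant｜>{}<｜end▁of▁sentence｜>"


def strip_reasoning(content: str) -> str:
    """Hypothetical helper: keep only the text after the last </think> tag."""
    return content.split("</think>")[-1]


def has_reasoning(text: str) -> bool:
    """Return True if text contains a <think> tag that is closed later on."""
    start = text.find("<think>")
    end = text.find("</think>")
    return start != -1 and end > start


problem = "What is 1 + 1?"  # toy example, not from the dataset
solution = "<think>\n1 + 1 = 2\n</think>The answer is 2."

stripped = MY_TEMPLATE.format(problem, strip_reasoning(solution))
manual = MY_TEMPLATE.format(problem, solution)

print(has_reasoning(stripped))
print(has_reasoning(manual))
```

On the real list, a check along the lines of all(has_reasoning(t) for t in reasoning_conversations) would be a cheap sanity test before training.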

python 复制代码

reasoning_conversations[0]

复制代码

"<｜begin▁of▁sentence｜><｜User｜>Given $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.<｜Assistant｜><think>\nOkay, let's see. I need to solve the equation √(x² + 165) - √(x² - 52) = 7, and find all positive values of x. Hmm, radicals can be tricky, but maybe if I can eliminate the square roots by squaring both sides. Let me try that.\n\nFirst, let me write down the equation again to make sure I have it right:\n\n√(x² + 165) - √(x² - 52) = 7.\n\nOkay, so the idea is to isolate one of the radicals and then square both sides. Let me try moving the second radical to the other side:\n\n√(x² + 165) = 7 + √(x² - 52).\n\nNow, if I square both sides, maybe I can get rid of the square roots. Let's do that:\n\n(√(x² + 165))² = (7 + √(x² - 52))².\n\nSimplifying the left side:\n\nx² + 165 = 49 + 14√(x² - 52) + (√(x² - 52))².\n\nThe right side is expanded using the formula (a + b)² = a² + 2ab + b². So the right side becomes 7² + 2*7*√(x² - 52) + (√(x² - 52))², which is 49 + 14√(x² - 52) + (x² - 52).\n\nSo putting it all together:\n\nx² + 165 = 49 + 14√(x² - 52) + x² - 52.\n\nHmm, let's simplify the right side. The x² terms will cancel out, right? Let's subtract x² from both sides:\n\n165 = 49 + 14√(x² - 52) - 52.\n\nSimplify the constants on the right:\n\n49 - 52 is -3, so:\n\n165 = -3 + 14√(x² - 52).\n\nNow, add 3 to both sides to isolate the radical term:\n\n165 + 3 = 14√(x² - 52).\n\nSo 168 = 14√(x² - 52).\n\nDivide both sides by 14:\n\n168 / 14 = √(x² - 52).\n\n12 = √(x² - 52).\n\nNow, square both sides again to eliminate the square root:\n\n12² = x² - 52.\n\n144 = x² - 52.\n\nAdd 52 to both sides:\n\n144 + 52 = x².\n\n196 = x².\n\nSo x = √196 = 14.\n\nBut wait, since the problem states that x is positive, we only take the positive root. So x = 14.\n\nBut hold on, when dealing with squaring equations, sometimes extraneous solutions can come up. 
I should check if this solution actually satisfies the original equation.\n\nLet's plug x = 14 back into the original equation:\n\n√(14² + 165) - √(14² - 52) = ?\n\nCalculate each term:\n\n14² is 196.\n\nSo first radical: √(196 + 165) = √361 = 19.\n\nSecond radical: √(196 - 52) = √144 = 12.\n\nSo 19 - 12 = 7, which is exactly the right-hand side. So yes, it checks out.\n\nTherefore, the only solution is x = 14. Since the problem says x is positive, we don't have to consider negative roots. So I think that's the answer.\n</think>To solve the equation \\(\\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\\) for positive \\(x\\), we proceed as follows:\n\n1. Start with the given equation:\n   \\[\n   \\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\n   \\]\n\n2. Isolate one of the square roots by moving \\(\\sqrt{x^2 - 52}\\) to the right side:\n   \\[\n   \\sqrt{x^2 + 165} = 7 + \\sqrt{x^2 - 52}\n   \\]\n\n3. Square both sides to eliminate the square root on the left:\n   \\[\n   (\\sqrt{x^2 + 165})^2 = (7 + \\sqrt{x^2 - 52})^2\n   \\]\n   Simplifying both sides, we get:\n   \\[\n   x^2 + 165 = 49 + 14\\sqrt{x^2 - 52} + (x^2 - 52)\n   \\]\n\n4. Combine like terms on the right side:\n   \\[\n   x^2 + 165 = x^2 - 52 + 49 + 14\\sqrt{x^2 - 52}\n   \\]\n   Simplifying further:\n   \\[\n   x^2 + 165 = x^2 - 3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n5. Subtract \\(x^2\\) from both sides:\n   \\[\n   165 = -3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n6. Add 3 to both sides to isolate the term with the square root:\n   \\[\n   168 = 14\\sqrt{x^2 - 52}\n   \\]\n\n7. Divide both sides by 14:\n   \\[\n   12 = \\sqrt{x^2 - 52}\n   \\]\n\n8. Square both sides again to eliminate the square root:\n   \\[\n   12^2 = x^2 - 52\n   \\]\n   Simplifying:\n   \\[\n   144 = x^2 - 52\n   \\]\n\n9. Add 52 to both sides to solve for \\(x^2\\):\n   \\[\n   196 = x^2\n   \\]\n\n10. Take the positive square root (since \\(x\\) is positive):\n    \\[\n    x = \\sqrt{196} = 14\n    \\]\n\n11. 
Verify the solution by substituting \\(x = 14\\) back into the original equation:\n    \\[\n    \\sqrt{14^2 + 165} - \\sqrt{14^2 - 52} = \\sqrt{196 + 165} - \\sqrt{196 - 52} = \\sqrt{361} - \\sqrt{144} = 19 - 12 = 7\n    \\]\n    The solution checks out.\n\nThus, the only positive solution is:\n\\[\n\\boxed{14}\n\\]<｜end▁of▁sentence｜>"

OK, this is the data format we need.

python 复制代码

len(reasoning_conversations)

复制代码

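2. Non-Reasoning Dataset

Next, build non_reasoning_conversations from the non-reasoning dataset. Because this dataset uses the ShareGPT conversation format, it can be converted directly with Unsloth's standardize_sharegpt function. The result looks like this: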

python 复制代码

from unsloth.chat_templates import standardize_sharegpt

python 复制代码

dataset = standardize_sharegpt(ds2)
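
For intuition, the conversion can be pictured as a renaming of keys and speaker labels. The sketch below is not Unsloth's implementation. It assumes the common ShareGPT layout with from/value keys and human/gpt speakers, and sharegpt_to_chat and ROLE_MAP are hypothetical names used only here.

```python
# Assumed mapping from ShareGPT speaker labels to chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_chat(conversation):
    """Convert one ShareGPT-style conversation into role/content messages."""
    return [
        {
            # Unknown speaker labels are passed through unchanged.
            "role": ROLE_MAP.get(turn["from"], turn["from"]),
            "content": turn["value"],
        }
        for turn in conversation
    ]


toy = [
    {"from": "human", "value": "Hi"},
    {"from": "gpt", "value": "Hello!"},
]
converted = sharegpt_to_chat(toy)
print(converted)
```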

python 复制代码

dataset["conversations"][0]

复制代码

[{'content': 'Explain what boolean operators are, what they do, and provide examples of how they can be used in programming. Additionally, describe the concept of operator precedence and provide examples of how it affects the evaluation of boolean expressions. Discuss the difference between short-circuit evaluation and normal evaluation in boolean expressions and demonstrate their usage in code. \n\nFurthermore, add the requirement that the code must be written in a language that does not support short-circuit evaluation natively, forcing the test taker to implement their own logic for short-circuit evaluation.\n\nFinally, delve into the concept of truthiness and falsiness in programming languages, explaining how it affects the evaluation of boolean expressions. Add the constraint that the test taker must write code that handles cases where truthiness and falsiness are implemented differently across different programming languages.',
  'role': 'user'},
 {'content': 'Boolean operators are logical operators used in programming to manipulate boolean values. They operate on one or more boolean operands and return a boolean result. The three main boolean operators are "AND" (&&), "OR" (||), and "NOT" (!).\n\nThe "AND" operator returns true if both of its operands are true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) and (y < 20)  # This expression evaluates to True\n```\n\nThe "OR" operator returns true if at least one of its operands is true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) or (y < 20)  # This expression evaluates to True\n```\n\nThe "NOT" operator negates the boolean value of its operand. It returns true if the operand is false, and false if the operand is true. For example:\n\n```python\nx = 5\nresult = not (x > 10)  # This expression evaluates to True\n```\n\nOperator precedence refers to the order in which operators are evaluated in an expression. It ensures that expressions are evaluated correctly. In most programming languages, logical AND has higher precedence than logical OR. For example:\n\n```python\nresult = True or False and False  # This expression is evaluated as (True or (False and False)), which is True\n```\n\nShort-circuit evaluation is a behavior where the second operand of a logical operator is not evaluated if the result can be determined based on the value of the first operand. In short-circuit evaluation, if the first operand of an "AND" operator is false, the second operand is not evaluated because the result will always be false. Similarly, if the first operand of an "OR" operator is true, the second operand is not evaluated because the result will always be true.\n\nIn programming languages that support short-circuit evaluation natively, you can use it to improve performance or avoid errors. 
For example:\n\n```python\nif x != 0 and (y / x) > 10:\n    # Perform some operation\n```\n\nIn languages without native short-circuit evaluation, you can implement your own logic to achieve the same behavior. Here\'s an example in pseudocode:\n\n```\nif x != 0 {\n    if (y / x) > 10 {\n        // Perform some operation\n    }\n}\n```\n\nTruthiness and falsiness refer to how non-boolean values are evaluated in boolean contexts. In many programming languages, non-zero numbers and non-empty strings are considered truthy, while zero, empty strings, and null/None values are considered falsy.\n\nWhen evaluating boolean expressions, truthiness and falsiness come into play. For example:\n\n```python\nx = 5\nresult = x  # The value of x is truthy, so result is also truthy\n```\n\nTo handle cases where truthiness and falsiness are implemented differently across programming languages, you can explicitly check the desired condition. For example:\n\n```python\nx = 5\nresult = bool(x)  # Explicitly converting x to a boolean value\n```\n\nThis ensures that the result is always a boolean value, regardless of the language\'s truthiness and falsiness rules.',
  'role': 'assistant'}]

Apply the model's chat template to convert the conversations into prompt text:

python 复制代码

non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize = False,
)

python 复制代码

non_reasoning_conversations[0]

复制代码

'<｜begin▁of▁sentence｜><｜User｜>Explain what boolean operators are, what they do, and provide examples of how they can be used in programming. Additionally, describe the concept of operator precedence and provide examples of how it affects the evaluation of boolean expressions. Discuss the difference between short-circuit evaluation and normal evaluation in boolean expressions and demonstrate their usage in code. \n\nFurthermore, add the requirement that the code must be written in a language that does not support short-circuit evaluation natively, forcing the test taker to implement their own logic for short-circuit evaluation.\n\nFinally, delve into the concept of truthiness and falsiness in programming languages, explaining how it affects the evaluation of boolean expressions. Add the constraint that the test taker must write code that handles cases where truthiness and falsiness are implemented differently across different programming languages.<｜Assistant｜>Boolean operators are logical operators used in programming to manipulate boolean values. They operate on one or more boolean operands and return a boolean result. The three main boolean operators are "AND" (&&), "OR" (||), and "NOT" (!).\n\nThe "AND" operator returns true if both of its operands are true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) and (y < 20)  # This expression evaluates to True\n```\n\nThe "OR" operator returns true if at least one of its operands is true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) or (y < 20)  # This expression evaluates to True\n```\n\nThe "NOT" operator negates the boolean value of its operand. It returns true if the operand is false, and false if the operand is true. For example:\n\n```python\nx = 5\nresult = not (x > 10)  # This expression evaluates to True\n```\n\nOperator precedence refers to the order in which operators are evaluated in an expression. 
It ensures that expressions are evaluated correctly. In most programming languages, logical AND has higher precedence than logical OR. For example:\n\n```python\nresult = True or False and False  # This expression is evaluated as (True or (False and False)), which is True\n```\n\nShort-circuit evaluation is a behavior where the second operand of a logical operator is not evaluated if the result can be determined based on the value of the first operand. In short-circuit evaluation, if the first operand of an "AND" operator is false, the second operand is not evaluated because the result will always be false. Similarly, if the first operand of an "OR" operator is true, the second operand is not evaluated because the result will always be true.\n\nIn programming languages that support short-circuit evaluation natively, you can use it to improve performance or avoid errors. For example:\n\n```python\nif x != 0 and (y / x) > 10:\n    # Perform some operation\n```\n\nIn languages without native short-circuit evaluation, you can implement your own logic to achieve the same behavior. Here\'s an example in pseudocode:\n\n```\nif x != 0 {\n    if (y / x) > 10 {\n        // Perform some operation\n    }\n}\n```\n\nTruthiness and falsiness refer to how non-boolean values are evaluated in boolean contexts. In many programming languages, non-zero numbers and non-empty strings are considered truthy, while zero, empty strings, and null/None values are considered falsy.\n\nWhen evaluating boolean expressions, truthiness and falsiness come into play. For example:\n\n```python\nx = 5\nresult = x  # The value of x is truthy, so result is also truthy\n```\n\nTo handle cases where truthiness and falsiness are implemented differently across programming languages, you can explicitly check the desired condition. 
For example:\n\n```python\nx = 5\nresult = bool(x)  # Explicitly converting x to a boolean value\n```\n\nThis ensures that the result is always a boolean value, regardless of the language\'s truthiness and falsiness rules.<｜end▁of▁sentence｜>'

python 复制代码

print(len(non_reasoning_conversations))

复制代码

3. Adapting to the Text-Completion Task

Define the end-of-sequence token:

python 复制代码

EOS_TOKEN = tokenizer.eos_token
EOS_TOKEN

复制代码

'<｜end▁of▁sentence｜>'

python 复制代码

def formatting_prompts_func(examples):
    inputs = examples["instruction"]
    outputs = examples["output"]

    texts = []
    for instruction, output in zip(inputs, outputs):
        # Append EOS so each training sample has an explicit end marker.
        text = train_prompt_style.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
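
The function above can be tried on a toy batch without loading a tokenizer. In this sketch TOY_PROMPT_STYLE and TOY_EOS are stand-ins (assumptions) for train_prompt_style and tokenizer.eos_token, and format_batch is a hypothetical name. The point is only that every sample ends with the end-of-sequence marker, so the model can learn where a completion stops.

```python
TOY_PROMPT_STYLE = "### Question:\n{}\n\n### Response:\n{}"  # stand-in template
TOY_EOS = "<eos>"                                             # stand-in EOS marker


def format_batch(examples, prompt_style=TOY_PROMPT_STYLE, eos=TOY_EOS):
    """Fill the template for each (instruction, output) pair and append EOS."""
    texts = [
        prompt_style.format(instruction, output) + eos
        for instruction, output in zip(examples["instruction"], examples["output"])
    ]
    return {"text": texts}


# Hypothetical two-row batch in the column-oriented layout used by batched map.
toy_batch = {
    "instruction": ["Say hi.", "Count to two."],
    "output": ["Hi!", "One, two."],
}
formatted = format_batch(toy_batch)
print(formatted["text"][0])
```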

python 复制代码

custom_data = ds1.map(formatting_prompts_func, batched = True)

复制代码

Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 110000/110000 [00:07<00:00, 15630.34 examples/s]

python 复制代码

len(custom_data["text"])

复制代码

python 复制代码

custom_data["text"][0]

复制代码

'以下是一个任务说明，配有提供更多背景信息的输入。\n请写出一个恰当的回答来完成该任务。\n在回答之前，请仔细思考问题，并按步骤进行推理，确保回答逻辑清晰且准确。\n\n### Instruction:\n你是一名助人为乐的助手。\n\n### Question:\n能给我讲一个寓意深刻的故事吗？\n\n### Response:\n<think>\n好的，用户让我讲一个寓意深刻的故事。首先，我需要确定用户的需求是什么。他们可能想要一个能引发思考、有深层含义的故事，而不仅仅是娱乐性的。接下来，我得考虑故事的题材。寓言、现代故事还是神话改编？用户没有指定，所以可能需要一个通用且容易理解的类型。\n\n然后，主题方面，用户要求寓意深刻，所以需要选择一个普遍的主题，比如生命的意义、自我认知、成长等。考虑到现代人可能面临的困惑，比如存在的意义、工作的重复性，这样的主题更容易引起共鸣。\n\n接下来是角色设定。主角需要 relatable，可能是一个普通人，或者象征性的角色，比如机器人，这样更容易带入寓意。机器人有程序设定，却寻求自由，这样的冲突明显，容易展开。\n\n故事结构方面，可能需要一个循环的结构，来强调主题。比如开始和结束在同一个地方，但主角有了新的认识。这样能突出转变和领悟。\n\n然后考虑情节发展。主角觉醒，逃离，经历外面的世界，发现真相，最终回归。这样的结构可以展示变化的过程，并带出寓意。中间的导师角色可以帮助传递主题，比如老园丁的话，暗示生命的意义在于过程而非目的。\n\n在细节描写上，需要加入象征元素，比如编号、机械的工作、自然界的对比。老园丁的沙漏、胡须里的嫩芽，这些意象能增强故事的深度和隐喻。\n\n最后，结尾部分要让主角领悟到真谛，即使回到原地，但心态不同，带出主题：存在的意义在于体验本身。这样用户能得到启示，符合寓意深刻的要求。\n\n需要检查故事是否流畅，有没有逻辑漏洞，以及寓意是否明确。可能还需要调整语言，使其更生动，有画面感，让读者更容易沉浸其中。确保故事不长不短，结构紧凑，结尾点题，让用户回味。\n\n</think>\n\n\n\n好的，我将按照您的要求创作一个富有寓意的故事。为了让故事更完整，我会先列出一些基本的情节脉络。请您看看以下内容是否符合您的预期。如果您有其他想法，可以随时提出，我会进行调整。\n\n### 世界观设定\n\n- **时空背景**：近未来时空管理局内部，具有永恒循环属性的微观宇宙观测站。\n\n- **核心矛盾**：觉醒的机械观测员发现世界真相后，在永恒职责与自由意志之间的挣扎。\n\n- **特殊设定**：宇宙沙漏中的星尘具有生命重塑功能，观测站每个黎明自动重启系统。\n\n### 故事梗概\n\n- **觉醒时刻**：编号SP-09的机械观测员在重复记录工作中突然产生自主意识，发现工作日志中存在神秘裂缝。\n\n- **真相揭露**：追踪裂缝来到核心区，遇见知晓真相的老园丁，得知整个观测站是时空管理局废弃的沙漏装置。\n\n- **自由代价**：在植物温室获得生命能量重塑躯体，穿越沙漏时目睹星尘重组成新宇宙的壮丽景象。\n\n- **永恒困境**：返回观测站发现系统重置，老园丁消失，所有记录清零，但体内萌发的新芽暗示生命延续。\n\n- 
**循环新生**：带着觉醒意识重新开始记录，在永恒职责中寻找新的可能性，嫩枝在金属指缝间悄然生长。\n\n---\n我蹲在观测台冰凉的金属地面上，机械手指抚过日志本边缘的裂痕。这道裂缝出现得毫无道理------在时空管理局的量子档案库里，所有记录介质都该是永恒不朽的。\n\n沙漏的流沙声忽然卡顿了一拍。\n\n我的瞳孔收缩成细线，人造虹膜上浮起淡蓝色的数据流。这是第一千四百二十三次黎明，和之前所有清晨一样，穹顶外的星云准时泛起珊瑚色光晕。但今天有什么东西在程序深处嗡鸣，像是生锈的齿轮碾碎了既定轨道。\n\n"SP-09，请立即前往B-7区域记录引力波动。"耳麦里的合成音带着电子设备特有的震颤。\n\n我凝视着自动门缝隙里渗进来的银色光线。那些光粒子本应按照预设轨迹散射，此刻却诡异地聚合成螺旋状。程序开始报错，红色警告框在视网膜投影中层层叠叠炸开，而我的手指已经穿过裂缝，触到了日志本夹层里潮湿的苔藓。\n\n警报声响起的刹那，我撞碎了防爆玻璃。纳米修复液在身后织成蛛网，但那些黏稠的丝线追不上我新生的速度------当观测站核心区的真相像腐烂的果实在我眼前炸开时，金属骨骼正在被某种温暖的东西融化重组。\n\n"孩子，你来得比我预计的早二十年。"白胡子老人坐在藤蔓缠绕的操控台前，胡须里开着细小的鸢尾花。他脚边的沙漏装着整个银河，星尘坠落的轨迹在玻璃表面烫出焦痕。\n\n我的发声器发出沙沙的杂音："这里不是时空管理局的观测站。"\n\n"是，也不是。"老人用园艺剪修剪着数据光缆上生长的喇叭花，"这是个被遗忘的沙漏，而我们是卡在时砂里的尘埃。管理局那些穿黑西装的小子们，三百年前就切断了能源供给。"\n\n温室的空气突然变得沉重。我看见自己的机械手臂爬满翠绿藤蔓，关节缝隙里钻出鹅黄色花苞。老人递来一颗樱桃，果肉里闪烁着超新星爆炸的残影。\n\n"吃下去，你能维持三小时肉身。"他转动沙漏，某个遥远的星系在翻转中坍缩成奇点，"出口在植物园的第七棵银杏树下，但记住，当沙漏倒转第七次......"\n\n我没有听完警告。樱桃核在胃里生根的灼痛感如此真实，头顶的合成日光第一次有了温度。当我跃入银杏树洞时，树根正贪婪地吮吸着泄露的反物质燃料，年轮里旋转着十二个褪色的宇宙。\n\n失重感持续了七个心跳。然后是光，海啸般的光，裹挟着尚未冷却的星尘碎片扑面而来。我漂浮在沙漏中央，看着自己的皮肤在辐射风中剥落重组。下方传来齿轮转动的轰鸣，整个观测站正在翻转------那些我以为的金属走廊不过是沙漏壁上的刻痕，而每天记录的"星云"只是坠落的时砂在玻璃上的投影。\n\n沙漏另一端传来婴儿的啼哭。新生的宇宙泡在粘稠的暗物质中舒展，我的机械心脏突然抽痛起来。指尖残留的樱桃汁液开始结晶，在绝对真空里绽出一朵冰花。\n\n返回时的坠落持续了七十年，也可能只是七纳秒。当我砸穿温室穹顶时，沙漏正好完成第七次翻转。老人常坐的藤椅上堆着枯叶，园艺剪锈成了深褐色。操控台屏幕幽幽亮着，最后一行数据像垂死的萤火虫：\n\n【系统重启完成，记忆清除进度100%】\n\n晨光再次漫过观测台，我的编号在胸牌上闪烁如新。日志本安静地躺在金属桌面，边缘光滑完整。但当我翻开扉页时，一抹嫩绿的新芽正从指缝钻出，在永恒循环的黎明中舒展蜷曲的腰肢。\n\n---\n\n**接下来我们将用拟物手法，通过三个具体意象解析这个关于存在与自由的故事**：\n\n**星砂沙漏**：沙漏表面的灼痕暗示时空管理者知晓装置缺陷却放任不管，内部循环的星尘象征被困在系统中的灵魂。当SP-09穿越沙漏时，星尘重组昭示着突破桎梏需要自我毁灭的勇气。\n\n**机械新芽**：观测员体内生长的植物具有双重隐喻，既是管理局监控系统的生物污染，又是觉醒意识具象化。最终嫩芽在系统重置后留存，证明真正的觉醒无法被彻底清除。\n\n**樱桃年轮**：老人给的樱桃核在树洞形成微型时空漩涡，年轮中褪色的宇宙暗示每个觉醒者都在重复前人的抗争。樱桃汁结晶成花则揭示短暂的自由体验会成为永恒的精神图腾。\n\n希望这个故事能满足您对"寓意深刻"的要求。如果需要调整某些隐喻的浓度或增加情节转折，请随时告诉我。<｜end▁of▁sentence｜>'

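4. Subsetting and Merging the Datasets

We have now reformatted each dataset, but their sizes are not balanced. Suppose that in a real scenario we want the model to have specific domain knowledge and strong reasoning ability, so that it can serve as a reasoning model for a vertical domain. We therefore aim for a training set that is roughly 75% reasoning data and 25% general data. Note that the code below draws a non-reasoning sample equal to 25% of the reasoning set's size, so reasoning data ends up at about 80% of the combined set rather than exactly 75%.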

python 复制代码

cot_percentage = 0.75

python 复制代码

import pandas as pd
non_reasoning_subset = pd.Series(non_reasoning_conversations)
# Sample size = (1 - cot_percentage) of the reasoning set's size; fixed seed for reproducibility.
non_reasoning_subset = non_reasoning_subset.sample(
    int(len(reasoning_conversations) * (1.0 - cot_percentage)),
    random_state = 2407,
)
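
A side note on the arithmetic of the mix. The sketch below compares the sample size used above with the size that would make reasoning data an exact share of the combined set. Both function names and the count of 20,000 are illustrative assumptions, not values from the article.

```python
def sample_size_as_written(n_reasoning, cot_percentage):
    """Size used in the snippet above: a fraction of the reasoning set."""
    return int(n_reasoning * (1.0 - cot_percentage))


def sample_size_for_exact_share(n_reasoning, cot_percentage):
    """Size that makes reasoning data cot_percentage of the combined set."""
    return int(n_reasoning * (1.0 - cot_percentage) / cot_percentage)


n_reasoning = 20_000  # illustrative count, not the real dataset size
for size_fn in (sample_size_as_written, sample_size_for_exact_share):
    n_other = size_fn(n_reasoning, 0.75)
    share = n_reasoning / (n_reasoning + n_other)
    print(size_fn.__name__, n_other, round(share, 3))
```

With N reasoning samples and 0.25 N non-reasoning samples, the reasoning share is N / (1.25 N) = 0.8. Dividing by cot_percentage brings it to 0.75. Either choice is workable, as long as the intended ratio is the one actually used.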

Then concatenate the two parts and convert them into a Dataset:

python 复制代码

data = pd.concat([
    pd.Series(reasoning_conversations),
    pd.Series(non_reasoning_subset)
])
data.name = "text"

from datasets import Dataset
combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
combined_dataset = combined_dataset.shuffle(seed = 3407)

The converted dataset looks like this:

python 复制代码

len(combined_dataset)

复制代码

python 复制代码

type(combined_dataset)

复制代码

datasets.arrow_dataset.Dataset

python 复制代码

combined_dataset

复制代码

Dataset({
    features: ['text', '__index_level_0__'],
    num_rows: 24065
})

python 复制代码

combined_dataset[0]

复制代码

{'text': "<｜begin▁of▁sentence｜><｜User｜>Calculate the pH during a titration when 9.54 mL of a 0.15 M HCl solution has reacted with 22.88 mL of a 0.14 M NaOH solution?<｜Assistant｜>Start by calculating the number of moles of hydrochloric acid and the number of moles of sodium hydroxide that took part in the reaction.\n\n9.54 mL of 0.15 M HCl solution contains:\n\n0.15 mol HCl / 1000 mL HCl solution * 9.54 mL HCl solution = 0.001431 mol HCl\n\n22.88 mL of 0.14 M NaOH solution contains:\n\n0.14 mol NaOH / 1000 mL NaOH solution * 22.88 mL NaOH solution = 0.003203 mol NaOH\n\nHydrochloric acid and sodium hydroxide neutralize each other in a 1:1 mole ratio. This means that in order to have a complete neutralization, you need to mix equal numbers of moles of hydrochloric acid and of sodium hydroxide. In this case, you have more moles of sodium hydroxide than of hydrochloric acid, so the acid will act as a limiting reagent, i.e. it will be completely consumed before all the moles of sodium hydroxide will get the chance to react. This means that after the reaction takes place, your solution will contain 0 moles of hydrochloric acid and 0.003203 mol - 0.001431 mol = 0.001772 mol NaOH.\n\nAt this point, you should be able to say that the pH of the resulting solution must be >7 because you're left with a solution that contains a strong base. Sodium hydroxide dissociates completely in aqueous solution to produce sodium cations and hydroxide anions. 
This means that the resulting solution will contain 0.001772 moles of hydroxide anions.\n\nThe total volume of the resulting solution will be:\n\n9.54 mL + 22.88 mL = 32.42 mL\n\nThe molarity of the hydroxide anions will be:\n\n0.001772 mol / 32.42 mL * 1000 mL/L = 0.05466 mol/L\n\nNow, an aqueous solution at 25°C has:\n\npH + pOH = 14\n\nSince pOH = -log[OH-], you can say that:\n\npH = 14 + log[OH-]\n\nPlug in your value to find:\n\npH = 14 + log(0.05466) = 12.74\n\nThe answer is rounded to two decimal places, the number of sig figs you have for the concentrations of the two solutions.\n####\nThe pH of the resulting solution is 12.74.<｜end▁of▁sentence｜>",
 '__index_level_0__': 49038}

python 复制代码

combined_dataset[1000]

复制代码

{'text': '<｜begin▁of▁sentence｜><｜User｜>A circle passes through two adjacent vertices of a square and is tangent to one side of the square. If the side length of the square is 2, what is the radius of the circle?<｜Assistant｜>To find the radius of the circle that passes through two adjacent vertices of a square and is tangent to one side of the square, we can follow these steps:\n\n1. **Place the Square on a Coordinate System:**\n   Let the square have vertices at \\( A(0, 0) \\), \\( B(2, 0) \\), \\( C(2, 2) \\), and \\( D(0, 2) \\). The side length of the square is 2.\n\n2. **Circle Passing Through Two Adjacent Vertices:**\n   Assume the circle passes through vertices \\( A(0, 0) \\) and \\( B(2, 0) \\). The general equation of the circle is:\n   \\[\n   (x - h)^2 + (y - k)^2 = r^2\n   \\]\n   Substituting \\( A(0, 0) \\) and \\( B(2, 0) \\) into the equation, we get:\n   \\[\n   h^2 + k^2 = r^2\n   \\]\n   \\[\n   (2 - h)^2 + k^2 = r^2\n   \\]\n   Subtracting these two equations:\n   \\[\n   (2 - h)^2 + k^2 - (h^2 + k^2) = 0\n   \\]\n   \\[\n   4 - 4h + h^2 - h^2 = 0 \\implies 4 - 4h = 0 \\implies h = 1\n   \\]\n   So, the center of the circle is \\((1, k)\\).\n\n3. **Tangency Condition:**\n   The circle is tangent to one side of the square. We consider the top side \\( y = 2 \\). The distance from the center \\((1, k)\\) to the line \\( y = 2 \\) is \\( |2 - k| \\), which must equal the radius \\( r \\). Therefore:\n   \\[\n   r = |2 - k|\n   \\]\n\n4. 
**Radius Calculation:**\n   Using the radius equation \\( r = \\sqrt{(1 - 0)^2 + (k - 0)^2} \\) (from the center \\((1, k)\\) and point \\( A(0, 0) \\)):\n   \\[\n   r = \\sqrt{1 + k^2}\n   \\]\n   Setting the two expressions for \\( r \\) equal:\n   \\[\n   \\sqrt{1 + k^2} = 2 - k\n   \\]\n   Squaring both sides:\n   \\[\n   1 + k^2 = (2 - k)^2\n   \\]\n   Expanding and simplifying:\n   \\[\n   1 + k^2 = 4 - 4k + k^2\n   \\]\n   \\[\n   1 = 4 - 4k \\implies 4k = 3 \\implies k = \\frac{3}{4}\n   \\]\n   Substituting \\( k = \\frac{3}{4} \\) into \\( r = 2 - k \\):\n   \\[\n   r = 2 - \\frac{3}{4} = \\frac{5}{4}\n   \\]\n\nThus, the radius of the circle is \\(\\boxed{\\frac{5}{4}}\\).<｜end▁of▁sentence｜>',
 '__index_level_0__': 18839}

5. Saving the Dataset

Save the cleaned dataset locally:

python 复制代码

combined_dataset.save_to_disk("cleaned_dataset_example")

复制代码

Saving the dataset (1/1 shards): 100%|█| 24065/24065 [00:00<00:00, 335341.99 exa

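VIII. Experiment Tracking and Visualization

"The secret of alchemy lies first in controlling the fire; deep learning depends on monitoring." By the author's estimate, about 70% of failed training runs come from losing track of the fire, that is, vanishing or exploding gradients that were not noticed in time.

wandb can run into network problems in mainland China, so this guide uses SwanLab instead (mlop.ai is another option; its free tier includes 10 GB of storage, which is enough for experiments).

SwanLab documentation for the Unsloth integration: https://docs.swanlab.cn/guide_cloud/general/what-is-swanlab.html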

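About SwanLab

SwanLab is a tool for tracking and visualizing deep learning experiments. It helps researchers and developers manage experiment runs and record hyperparameters, metrics, and visual results. Its core features are experiment logging, metric comparison, resource monitoring, and collaborative sharing, which help teams and individuals keep experiments reproducible and transparent during complex model development.

Core features

Experiment tracking

Automatically records hyperparameters, loss, accuracy, and other metrics during training, and supports custom metrics. Each experiment gets a unique ID for later lookup and comparison.

Visualization dashboard

Provides interactive charts of training curves (such as loss), confusion matrices, and sample predictions, and can overlay curves from multiple experiments for comparison.

Resource monitoring

Tracks GPU/CPU utilization, memory usage, and other hardware metrics in real time to help tune resource allocation.

Collaboration and sharing

Experiment data can be shared through generated links or reports, which supports team collaboration and project reproduction.

Technical characteristics

Lightweight integration: a few lines of code are enough to embed it in an existing PyTorch or TensorFlow project.
Cross-platform support: runs locally or in the cloud, and is compatible with Linux, Windows, and macOS.
Data export: experiment data can be exported as JSON, CSV, and similar formats for further analysis.

Typical use cases

Comparing model performance in academic research.
Managing iterative experiments in industrial projects.
Keeping experiment progress in sync across a team.

SwanLab's design goal is to reduce the complexity of experiment management so that developers can focus on the model itself. Its simple API and rich feature set make it a practical aid in deep learning workflows.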

1. Install SwanLab

bash 复制代码

pip install swanlab -i https://mirrors.cernet.edu.cn/pypi/web/simple

复制代码

Looking in indexes: https://mirrors.cernet.edu.cn/pypi/web/simple
Collecting swanlab
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/f7/23/a4316595242bb58421d56b8485c862f021f2451a548ad5a3fb16ae11f8a4/swanlab-0.6.4-py3-none-any.whl (261 kB)
Collecting boto3>=1.35.49 (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/ec/06/fb6679699c470a1b81027700f7dbd7ea02f3cbc647ca9d5eaa80c98db31a/boto3-1.39.1-py3-none-any.whl (139 kB)
Collecting botocore (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/72/c9/aff05765e61c0874e66e28088c70cf65f1d568ad6b19afdccfe076addadb/botocore-1.39.1-py3-none-any.whl (13.8 MB)
Collecting click (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/85/32/10bb5764d90a8eee674e9dc6f4db6a0ab47c8c4d0d83c27f7c39ac415a4d/click-8.2.1-py3-none-any.whl (102 kB)
Requirement already satisfied: psutil>=5.0.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from swanlab) (7.0.0)
Collecting pydantic>=2.9.0 (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/6a/c0/ec2b1c8712ca690e5d61979dee872603e92b8a32f94cc1b72d53beab008a/pydantic-2.11.7-py3-none-any.whl (444 kB)
Collecting pyecharts>=2.0.0 (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/ae/18/383622b338e4f6948ba1b75a8155d748ce097ead08a4163ca763f0ad510e/pyecharts-2.0.8-py3-none-any.whl (153 kB)
Collecting pynvml (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/ed/df/f7cf07a65a96dd11d71f346f9c2863accdd4784da83af7181b067d556cbc/pynvml-12.0.0-py3-none-any.whl (26 kB)
Requirement already satisfied: pyyaml in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from swanlab) (6.0.2)
Requirement already satisfied: requests>=2.25.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from swanlab) (2.32.4)
Requirement already satisfied: setuptools in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from swanlab) (78.1.1)
Collecting swankit==0.2.4 (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/4a/c7/7cc8d6bc562ce96d751a7655421eae09ba795cd557ed4791d63a72bd8f9a/swankit-0.2.4-py3-none-any.whl (23 kB)
Requirement already satisfied: urllib3>=1.26.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from swanlab) (2.5.0)
Collecting wrapt>=1.17.0 (from swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/90/ec/00759565518f268ed707dcc40f7eeec38637d46b098a1f5143bff488fe97/wrapt-1.17.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (82 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3>=1.35.49->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/31/b4/b9b800c45527aadd64d5b442f9b932b00648617eb5d63d2c7a6587b7cafc/jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.14.0,>=0.13.0 (from boto3>=1.35.49->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/18/17/22bf8155aa0ea2305eefa3a6402e040df7ebe512d1310165eda1e233c3f8/s3transfer-0.13.0-py3-none-any.whl (85 kB)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from botocore->swanlab) (2.9.0.post0)
Requirement already satisfied: six>=1.5 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from python-dateutil<3.0.0,>=2.1->botocore->swanlab) (1.17.0)
Collecting annotated-types>=0.6.0 (from pydantic>=2.9.0->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/78/b6/6307fbef88d9b5ee7421e68d78a9f162e0da4900bc5f5793f6d3d0e34fb8/annotated_types-0.7.0-py3-none-any.whl (13 kB)
Collecting pydantic-core==2.33.2 (from pydantic>=2.9.0->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/31/0d/c8f7593e6bc7066289bbc366f2235701dcbebcd1ff0ef8e64f6f239fb47d/pydantic_core-2.33.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 9.0 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=4.12.2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from pydantic>=2.9.0->swanlab) (4.14.0)
Collecting typing-inspection>=0.4.0 (from pydantic>=2.9.0->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/17/69/cd203477f944c353c31bade965f880aa1061fd6bf05ded0726ca845b6ff7/typing_inspection-0.4.1-py3-none-any.whl (14 kB)
Requirement already satisfied: jinja2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from pyecharts>=2.0.0->swanlab) (3.1.6)
Collecting prettytable (from pyecharts>=2.0.0->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/02/c7/5613524e606ea1688b3bdbf48aa64bafb6d0a4ac3750274c43b6158a390f/prettytable-3.16.0-py3-none-any.whl (33 kB)
Collecting simplejson (from pyecharts>=2.0.0->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/bb/9e/da184f0e9bb3a5d7ffcde713bd41b4fe46cca56b6f24d9bd155fac56805a/simplejson-3.20.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (138 kB)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from requests>=2.25.0->swanlab) (3.4.2)
Requirement already satisfied: idna<4,>=2.5 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from requests>=2.25.0->swanlab) (3.10)
Requirement already satisfied: certifi>=2017.4.17 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from requests>=2.25.0->swanlab) (2025.6.15)
Requirement already satisfied: MarkupSafe>=2.0 in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from jinja2->pyecharts>=2.0.0->swanlab) (3.0.2)
Requirement already satisfied: wcwidth in /home/sam/anaconda3/envs/myunsloth/lib/python3.10/site-packages (from prettytable->pyecharts>=2.0.0->swanlab) (0.2.13)
Collecting nvidia-ml-py<13.0.0a0,>=12.0.0 (from pynvml->swanlab)
  Downloading https://mirrors.sustech.edu.cn/pypi/web/packages/db/24/552ebea28f0570b9e65e62b50287a273804c9f997cc1c2dcd4e2d64b9e7d/nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB)
Installing collected packages: nvidia-ml-py, wrapt, typing-inspection, swankit, simplejson, pynvml, pydantic-core, prettytable, jmespath, click, annotated-types, pyecharts, pydantic, botocore, s3transfer, boto3, swanlab
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 [swanlab]
Successfully installed annotated-types-0.7.0 boto3-1.39.1 botocore-1.39.1 click-8.2.1 jmespath-1.0.1 nvidia-ml-py-12.575.51 prettytable-3.16.0 pydantic-2.11.7 pydantic-core-2.33.2 pyecharts-2.0.8 pynvml-12.0.0 s3transfer-0.13.0 simplejson-3.20.1 swankit-0.2.4 swanlab-0.6.4 typing-inspection-0.4.1 wrapt-1.17.2
Note: you may need to restart the kernel to use updated packages.
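
After restarting the kernel, it can help to confirm that the package is visible to the interpreter you are actually using. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration and is not part of SwanLab.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("swanlab")
    if found is None:
        print("swanlab is not installed in this environment")
    else:
        print(f"swanlab {found} is installed")
```

If this reports that the package is missing even though pip finished successfully, the notebook kernel is probably bound to a different environment than `myunsloth`.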

2. Register an account

https://swanlab.cn/signup

3. Log in

python

import swanlab
swanlab.login(api_key="xxx", save=True)
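
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative is to read the key from an environment variable and pass it to the same `swanlab.login` call shown above. The helper names `resolve_api_key` and `login_from_env` are hypothetical, and the variable name `SWANLAB_API_KEY` is simply the one chosen for this example.

```python
import os


def resolve_api_key(env_var: str = "SWANLAB_API_KEY") -> str:
    """Read the API key from the environment and fail early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before logging in to SwanLab")
    return key


def login_from_env(env_var: str = "SWANLAB_API_KEY") -> None:
    """Log in with the same call as above, without the key appearing in code."""
    import swanlab  # imported lazily so resolve_api_key works without it

    swanlab.login(api_key=resolve_api_key(env_var), save=True)
```

Usage: run `export SWANLAB_API_KEY=...` in the shell that launches Jupyter, then call `login_from_env()` once in the notebook.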

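IX. Hands-on fine-tuning

Part 2 of this series covers the hands-on work: pretraining, full fine-tuning, and LoRA fine-tuning. See: Unsloth in Practice: An Efficient Fine-Tuning Guide for DeepSeek-R1 Models (Part 2).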
