Modifying the prompt in an lm-evaluation-harness task YAML

This note shows how to change the prompt defined in an lm-evaluation-harness task YAML, using gsm8k_cot_zeroshot.yaml as the example.

✅ Where is gsm8k_cot_zeroshot.yaml?

If you are using LM Evaluation Harness (lm_eval), the file is usually under:

<your_env>/site-packages/lm_eval/tasks/gsm8k/

You can print the actual install path from a terminal:

python -c "import lm_eval, inspect, os; print(os.path.dirname(inspect.getfile(lm_eval)))"

The output looks something like:

/data/home/usr111/.conda/envs/llm_gpu/lib/python3.10/site-packages/lm_eval

In that case the gsm8k config is at:

/data/home/usr111/.conda/envs/llm_gpu/lib/python3.10/site-packages/lm_eval/tasks/gsm8k/gsm8k_cot_zeroshot.yaml
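The same lookup can be scripted. Below is a minimal sketch that uses only the standard library; the helper names `package_dir` and `task_yaml_path` are made up for this illustration, and the `tasks/gsm8k/` layout is taken from the path shown above, so it may differ in other lm_eval versions.

```python
import importlib.util
from pathlib import Path


def package_dir(package_name):
    """Return the directory a package is installed in, or None if it cannot be found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def task_yaml_path(pkg_dir, task_folder, yaml_name):
    """Build <pkg_dir>/tasks/<task_folder>/<yaml_name> (layout assumed from the path above)."""
    return Path(pkg_dir) / "tasks" / task_folder / yaml_name


if __name__ == "__main__":
    root = package_dir("lm_eval")
    if root is None:
        print("lm_eval is not installed in this environment")
    else:
        path = task_yaml_path(root, "gsm8k", "gsm8k_cot_zeroshot.yaml")
        print(path, "(exists)" if path.exists() else "(not found)")
```

If the file is reported as not found, the task folder layout of your installed version probably differs from the one assumed here.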

📌 The setting you are looking for is in this file

Open it:

vim ~/.conda/envs/llm_gpu/lib/python3.10/site-packages/lm_eval/tasks/gsm8k/gsm8k_cot_zeroshot.yaml

You will see something like:

doc_to_text: "Q: {{question}}\nA: Let's think step by step."

Suppose the prompt you want instead is:

doc_to_text: "Solve the following math problem step by step. The last line of your response should be of the form The answer is $ANSWER (without quotes) where $ANSWER is the answer to the problem.\n{{question}}\n Remember to put your answer on its own line after 'The answer is', and you do not need to use a \\boxed command."
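Before running an evaluation it can help to preview what the model will actually receive. The sketch below is a simplified stand-in for the harness's template rendering: it only substitutes `{{field}}` placeholders with plain string replacement. The helper `preview_prompt` and the sample question are made up for illustration. The Python escapes `\n` and `\\` in the string happen to mean the same thing as in the double-quoted YAML value above.

```python
def preview_prompt(template, doc):
    """Fill {{field}} placeholders from doc (simplified; not a real template engine)."""
    rendered = template
    for key, value in doc.items():
        rendered = rendered.replace("{{" + key + "}}", str(value))
    return rendered


# Same text as the doc_to_text value above; "\n" is a newline, "\\boxed" is \boxed.
TEMPLATE = (
    "Solve the following math problem step by step. The last line of your response "
    "should be of the form The answer is $ANSWER (without quotes) where $ANSWER is "
    "the answer to the problem.\n{{question}}\n Remember to put your answer on its "
    "own line after 'The answer is', and you do not need to use a \\boxed command."
)

if __name__ == "__main__":
    # Hypothetical sample document, not taken from the GSM8K dataset.
    sample = {"question": "Tom has 3 apples and buys 2 more. How many apples does he have?"}
    print(preview_prompt(TEMPLATE, sample))
```

The preview shows the question on its own line between the two instruction sentences, and makes the leading space before "Remember" visible.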

⚠️ Note: do not edit the files under site-packages directly!

Upgrading or reinstalling the package will overwrite your changes.

There are two ways to make the change:

✅ Method 1 (recommended): create a custom task locally

Create a directory:

mkdir -p custom_tasks/gsm8k

Create a new file:

custom_tasks/gsm8k/gsm8k_cot_zeroshot.yaml

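Put your version in it. Start from a copy of the original gsm8k_cot_zeroshot.yaml so that the dataset, generation, filter, and metric settings carry over, then give the task a new name and replace the prompt. The relevant keys end up looking like this: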

task: gsm8k_cot_zeroshot_custom

doc_to_text: |
  Solve the following math problem step by step. The last line of your response should be of the form The answer is $ANSWER (without quotes) where $ANSWER is the answer to the problem.
  {{question}}
  Remember to put your answer on its own line after 'The answer is', and you do not need to use a \boxed command.

doc_to_target: "{{answer}}"
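An alternative to copying every key by hand is to let the custom file inherit from the installed one. The sketch below writes such a file, assuming your lm_eval version supports the `include` key in task YAML files and accepts an absolute path there; check this against your installed version. The function name `write_custom_task` is made up for this example.

```python
from pathlib import Path

# The new prompt, one entry per line of the YAML block scalar.
PROMPT_LINES = [
    "Solve the following math problem step by step. The last line of your response "
    "should be of the form The answer is $ANSWER (without quotes) where $ANSWER is "
    "the answer to the problem.",
    "{{question}}",
    "Remember to put your answer on its own line after 'The answer is', and you do "
    "not need to use a \\boxed command.",
]


def write_custom_task(original_yaml, out_dir,
                      task_name="gsm8k_cot_zeroshot_custom",
                      file_name="gsm8k_cot_zeroshot.yaml"):
    """Write a task YAML that includes the original and overrides task and doc_to_text."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    body = [
        f"include: {Path(original_yaml).resolve()}",  # assumption: `include` is supported
        f"task: {task_name}",
        "doc_to_text: |",
        *[f"  {line}" for line in PROMPT_LINES],
    ]
    out_file = out_dir / file_name
    out_file.write_text("\n".join(body) + "\n", encoding="utf-8")
    return out_file
```

Call it with the path of the installed gsm8k_cot_zeroshot.yaml and `custom_tasks/gsm8k` as the output directory. Everything not overridden, such as `doc_to_target`, then comes from the included file.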

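Then run it. lm_eval only discovers tasks outside its own tasks directory when you point it at them with --include_path (add your usual --model and --model_args options as well):

lm_eval --include_path custom_tasks --tasks gsm8k_cot_zeroshot_custom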

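Or, with a launcher script that forwards lm_eval's command-line arguments (add --include_path custom_tasks if the task is not found):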

accelerate launch eval_llada.py --tasks gsm8k_cot_zeroshot_custom
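If you launch evaluations from a script, the command line can be assembled programmatically. This is a small sketch, not part of lm_eval itself: `build_lm_eval_command` is a made-up helper, `hf` is used as the default model type, and `pretrained=<your_model>` is a placeholder you need to replace.

```python
import shlex


def build_lm_eval_command(task, include_path, model="hf", model_args=None, extra_args=()):
    """Assemble an lm_eval argument list that loads a task from a custom directory."""
    cmd = ["lm_eval", "--model", model]
    if model_args:
        cmd += ["--model_args", model_args]
    cmd += ["--include_path", str(include_path), "--tasks", task, *extra_args]
    return cmd


if __name__ == "__main__":
    cmd = build_lm_eval_command(
        "gsm8k_cot_zeroshot_custom",
        "custom_tasks",
        model_args="pretrained=<your_model>",  # placeholder, not a real model name
    )
    # Print a shell-safe version; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the arguments as a list avoids quoting problems when the model arguments contain commas or special characters.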

✅ Method 2: edit the original file in place (simple, but not recommended)

vim ~/.conda/envs/llm_gpu/lib/python3.10/site-packages/lm_eval/tasks/gsm8k/gsm8k_cot_zeroshot.yaml

Then change the doc_to_text prompt to the text you need.