Evaluation Sets - Technical, Learning, and Experience Articles on Evaluation Sets

山顶夕景

7 months ago

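[Agent] Evaluation and Benchmarking of LLM Agents: A Survey

LLM agents are becoming increasingly complex: they plan, use tools, maintain memory, interact over multiple turns, and collaborate. Evaluation methods, however, remain stuck at the LLM level: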