【Langchain大语言模型开发教程】评估

🔗 LangChain for LLM Application Development - DeepLearning.AI

学习目标

1、Example generation

2、Manual evaluation and debug

3、LLM-assisted evaluation

4、LangChain evaluation platform

1、引包、加载环境变量;

python 复制代码
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

2、加载数据;

python 复制代码
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file, encoding='utf-8')
data = loader.load()

3、创建向量数据库(内存警告⚠);

python 复制代码
model_name = "bge-large-en-v1.5"
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
)

db = DocArrayInMemorySearch.from_documents(data, embeddings)
retriever = db.as_retriever()

4、初始化一个LLM并创建一个RetrievalQ链;

python 复制代码
llm = ChatOpenAI(api_key=os.environ.get('ZHIPUAI_API_KEY'),
                         base_url=os.environ.get('ZHIPUAI_API_URL'),
                         model="glm-4",
                         temperature=0.98)

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever,
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

Example generation

python 复制代码
from langchain.evaluation.qa import QAGenerateChain

example_gen_chain = QAGenerateChain.from_llm(llm)

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)

这里我们打印一下这个生成的example,发现是一个列表长下面这个样子;

python 复制代码
[{'qa_pairs': {'query': "What is the unique feature of the innersole in the Women's Campside Oxfords?", 'answer': 'The innersole has a vintage hunt, fish, and camping motif.'}}, {'qa_pairs': {'query': 'What is the name of the dog mat that is ruggedly constructed from recycled plastic materials, helping to keep dirt and water off the floors and plastic out of landfills?', 'answer': 'The name of the dog mat is Recycled Waterhog Dog Mat, Chevron Weave.'}}, {'qa_pairs': {'query': 'What is the name of the product described in the document that is suitable for Infant and Toddler Girls?', 'answer': "The product is called 'Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece'."}}, {'qa_pairs': {'query': 'What is the primary material used in the construction of the Refresh Swimwear V-Neck Tankini, and what percentage of it is recycled?', 'answer': 'The primary material is nylon, with 82% of it being recycled nylon.'}}, {'qa_pairs': {'query': 'What is the material used for the EcoFlex 3L Storm Pants, according to the document?', 'answer': 'The EcoFlex 3L Storm Pants are made of 100% nylon, exclusive of trim.'}}]

所以这里我们需要进行一步提取;

python 复制代码
for example in new_examples:
    examples.append(example["qa_pairs"])

print(examples)

qa.invoke(examples[0]["query"])

Manual Evaluation

python 复制代码
import langchain
langchain.debug = True #开始debug模式,查看chain中的详细步骤

我们再次执行来查看chain中的细节;

LLM-assisted evaluation

那我们是不是可以使用语言模型来评估呢;

python 复制代码
langchain.debug = False #关闭debug模式

from langchain.evaluation.qa import QAEvalChain

让大语言模型来为我们每个example来生成答案;

python 复制代码
predictions = qa.apply(examples)

我们初始化一个评估链;

python 复制代码
eval_chain = QAEvalChain.from_llm(llm)

让大语言模型对实际答案和预测答案进行对比并给出一个评分;

python 复制代码
graded_outputs = eval_chain.evaluate(examples, predictions)

最后,我们可以打印一下看看结果;

python 复制代码
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()
相关推荐
AI机器学习算法32 分钟前
深度学习模型演进:6个里程碑式CNN架构
人工智能·深度学习·cnn·大模型·ai学习路线
Ztopcloud极拓云视角43 分钟前
从 OpenRouter 数据看中美 AI 调用量反转:统计口径、模型路由与多云应对方案
人工智能·阿里云·大模型·token·中美ai
AI医影跨模态组学1 小时前
如何将深度学习MTSR与膀胱癌ITGB8/TGF-β/WNT机制建立关联,并进一步解释其与患者预后及肿瘤侵袭、免疫抑制的生物学联系
人工智能·深度学习·论文·医学影像
搬砖的前端1 小时前
AI编辑器开源主模型搭配本地模型辅助对标GPT5.2/GPT5.4/Claude4.6(前端开发专属)
人工智能·开源·claude·mcp·trae·qwen3.6·ops4.6
Python私教2 小时前
Hermes Agent 安全加固与生态扩展:2026-04-23 更新解析
人工智能
饼干哥哥2 小时前
Kimi K2.6 干成了Claude Design国产版,一句话生成电影级的动态品牌网站
人工智能
肖有米XTKF86462 小时前
带货者精品优选模式系统的平台解析
人工智能·信息可视化·团队开发·csdn开发云
天天进步20152 小时前
打破沙盒限制:OpenWork 如何通过权限模型实现安全的系统级调用?
人工智能·安全
xcbrand2 小时前
政府事业机构品牌策划公司找哪家
大数据·人工智能·python
骥龙2 小时前
第十篇:合规与未来展望——构建AI智能体安全标准
人工智能·安全