This section shows how to use LangChain together with ChatGLM to process a Pandas DataFrame. The steps and code examples follow:
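- Import the required libraries:

```python
import pprint
from typing import Any, Dict

import pandas as pd
from langchain.output_parsers import PandasDataFrameOutputParser
from langchain.prompts import PromptTemplate

# llm_glm4 is the ChatGLM model object defined in the project's own config module.
from config.chatglm_config import llm_glm4
```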
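- Define a helper that formats the parser output:

```python
def format_parser_output(parser_output: Dict[str, Any]) -> None:
    # Convert each pandas object to a plain dict (this modifies parser_output in place).
    for key in parser_output.keys():
        parser_output[key] = parser_output[key].to_dict()
    # width=4 is narrower than any entry, so each entry is printed on its own line.
    pprint.PrettyPrinter(width=4, compact=True).pprint(parser_output)
```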
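Because the helper above overwrites the values of `parser_output` in place, the original pandas objects are no longer available after it runs. The following is a minimal sketch of a non-mutating variant. The name `to_plain_dict` and the sample input are hypothetical, and the input shape (a dict mapping a column name to a `pd.Series`) is assumed from the result shown in this section.

```python
from typing import Any, Dict

import pandas as pd


def to_plain_dict(parser_output: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with pandas values converted; the input is left unchanged."""
    return {
        key: value.to_dict() if hasattr(value, "to_dict") else value
        for key, value in parser_output.items()
    }


# Stand-in for what the parser might return for a column query (assumed shape).
sample_output = {"num_wings": pd.Series([2, 0, 0, 0], name="num_wings")}
plain = to_plain_dict(sample_output)
print(plain)
```

With this variant, `sample_output["num_wings"]` is still a `pd.Series` afterwards, so it can be used for further computation.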
- Define the Pandas DataFrame:

```python
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    }
)
```
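- Set up the parser and the prompt template:

```python
# The parser needs the DataFrame so it can build format instructions and run the query.
parser = PandasDataFrameOutputParser(dataframe=df)

df_query = "Retrieve the num_wings column."

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
```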
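To make the parser's role concrete: the format instructions ask the model to reply with a short operation string, and the parser then applies that operation to the DataFrame. The sketch below is a simplified, hand-written stand-in and not LangChain's implementation. It assumes replies of the form `column:<name>` or `row:<position>`, and the function name `apply_request` is hypothetical.

```python
from typing import Any, Dict

import pandas as pd


def apply_request(df: pd.DataFrame, request: str) -> Dict[str, Any]:
    """Simplified stand-in: supports 'column:<name>' and 'row:<position>' only."""
    operation, _, target = request.strip().partition(":")
    if operation == "column":
        if target not in df.columns:
            raise ValueError(f"unknown column: {target!r}")
        return {target: df[target]}
    if operation == "row":
        return {target: df.iloc[int(target)]}
    raise ValueError(f"unsupported request: {request!r}")


df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    }
)

# Pretend the model answered the query with this string (assumed reply format).
result = apply_request(df, "column:num_wings")
print(result["num_wings"].to_dict())
```

The real parser accepts more operations than this sketch; the point here is only that the model produces a compact request and the data itself never has to pass through the model.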
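- Build and run the chain:

```python
# Pipe the prompt into the model, then the model's reply into the parser.
chain = prompt | llm_glm4 | parser
parser_output = chain.invoke({"query": df_query})
format_parser_output(parser_output)
```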
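Conceptually, `prompt | llm_glm4 | parser` feeds the output of each stage into the next. The toy sketch below illustrates that idea with operator overloading on a hypothetical `Step` class. It is not how LangChain implements its runnables, and the model is faked with a fixed reply so the example runs offline.

```python
from typing import Any, Callable


class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # (a | b) builds a new step that runs a, then passes its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for the three stages; every name here is illustrative.
fake_prompt = Step(lambda inputs: f"Answer the user query.\n{inputs['query']}\n")
fake_llm = Step(lambda prompt_text: "column:num_wings")
fake_parser = Step(lambda text: dict([text.split(":", 1)]))

toy_chain = fake_prompt | fake_llm | fake_parser
toy_result = toy_chain.invoke({"query": "Retrieve the num_wings column."})
print(toy_result)
```

Swapping `fake_llm` for a different function changes the model without touching the other two stages, which is the main appeal of composing the chain this way.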
- Result:

```text
{'num_wings': {0: 2,
               1: 0,
               2: 0,
               3: 0}}
Process finished with exit code 0
```
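The one-entry-per-line layout of this result comes from the `width=4` argument passed to the pretty printer, not from the data. A small standard-library sketch comparing a narrow and a wide setting, using the same dict as the result above:

```python
import pprint

result = {"num_wings": {0: 2, 1: 0, 2: 0, 3: 0}}

# A width smaller than any entry forces a line break after every item.
narrow = pprint.pformat(result, width=4, compact=True)
# A width large enough for the whole dict keeps it on a single line.
wide = pprint.pformat(result, width=80, compact=True)

print(narrow)
print(wide)
```

For larger DataFrames a wider setting is usually easier to read, since `compact=True` only packs several items per line for sequences, not for dict entries.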