AI (Study Notes, Lesson 17): langchain v1.0 (SQL Agent)

Table of Contents

  • AI (Study Notes, Lesson 17): langchain v1.0 (SQL Agent)
    • 1. The `sql agent` in `langchain v1.0`
      • 1.1 Overview of the `sql agent`
      • 1.2 Complete example code for the `sql agent`
      • 1.3 Example `database` for the `sql agent`
    • 2. Code walkthrough
      • 2.1 Configuring the LLM and `langsmith`
      • 2.2 Configuring the `sqlite` `database`
      • 2.3 Creating the `tools` from the database and the LLM
      • 2.4 Providing the `system prompt`
      • 2.5 Creating the `sql agent`
      • 2.6 Preparing the user's question about the `database`
      • 2.7 Asking the `sql agent`
    • 3 Checking the execution results
      • 3.1 `human message` (`human` → `AI`): asking the question
      • 3.2 `AI message` (`AI` → `AI database tool`): getting all tables in the database
      • 3.3 `tool message` (`AI database tool` → `AI`): the `AI tool` returns all tables in the database
      • 3.4 `AI message` (`AI` → `AI database tool`): the `AI` analyzes and requests the `schema` of the relevant tables
      • 3.5 `tool message` (`AI database tool` → `AI`): the `AI tool` returns the `schema` of the relevant tables
      • 3.6 `AI message` (`AI`): the `AI` analyzes further and drafts the `sql`
      • 3.6 `tool message` (`AI database tool`): `sql_db_query_checker` returns its check result
      • 3.7 `AI message` (`AI`): the model calls `sql_db_query` to run the `db query`
      • 3.8 `tool message` (`AI database tool`): the `AI tool` queries the database and returns the result
      • 3.9 `AI message` (`AI`): the `AI` analyzes the result
      • 3.10 `AI tool message` (`AI database tool`): the `AI tool` returns the full query result
      • 3.11 `AI message` (`AI`): the `AI` gives its final, comprehensive answer
    • 4. Next steps


  • The `sql agent` in `langchain v1.0`
  • Configuring the database
  • Setting up `SQLDatabaseToolkit` for the model
  • How the model interacts with the database

1. The `sql agent` in `langchain v1.0`

1.1 Overview of the `sql agent`

langchain v1.0 sql agent
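Through the following process, the `AI` can interact with the `database`, understand the user's natural-language question, and run `sql` queries on its own initiative: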

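  • Query the `database` for its available tables and their `schemas`
  • Decide which `tables` are relevant to the current question
  • Fetch the `schemas` of the relevant tables
  • Generate `sql` for the question. Note that the database's dialect (the SQL variant spoken by that particular database) must be specified here
  • Double-check the generated `sql`
  • Execute the `sql` and return the results
  • If the `sql` fails, correct the error and query again until it succeeds
  • Formulate a response from the results and return it to the user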
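The execute-and-retry steps at the end of this list can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not LangChain's implementation: `run_with_retries` and `fix_query` are hypothetical names, and in the real agent the LLM plays the role of `fix_query` by reading the error message and rewriting the SQL.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_with_retries(conn: sqlite3.Connection, query: str,
                     fix_query: Callable[[str, str], str],
                     max_attempts: int = 3) -> List[Tuple]:
    """Run `query`; on a database error, ask `fix_query(query, error)` for a new query and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return conn.execute(query).fetchall()
        except sqlite3.Error as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Hypothetical stand-in for the LLM: it sees the failing SQL and the error text.
            query = fix_query(query, str(exc))


# Toy data, made up for illustration (not the Chinook database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Milliseconds INTEGER)")
conn.executemany("INSERT INTO tracks VALUES (?, ?)", [(1, 200000), (2, 400000)])


def fix_query(query: str, error: str) -> str:
    """Scripted 'correction': replace the unknown column with the real one."""
    print(f"query failed: {error}")
    return query.replace("Length", "Milliseconds")


rows = run_with_retries(conn, "SELECT AVG(Length) FROM tracks", fix_query)
print(rows)  # [(300000.0,)]
```

The first attempt fails because the column `Length` does not exist; the scripted fix swaps in `Milliseconds` and the second attempt succeeds.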

1.2 Complete example code for the `sql agent`

The complete example code for the `sql agent`.

1.3 Example `database` for the `sql agent`

The example `database` data used by the `sql agent`.

2. Code walkthrough

2.1 Configuring the LLM and `langsmith`

```python
import os
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain.agents import create_agent

# LangSmith tracing: set before the agent runs so that its runs are traced
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "lsv2_xxxxx"

# DeepSeek exposes an OpenAI-compatible API, so ChatOpenAI can talk to it
model = ChatOpenAI(
    api_key="sk-xxxxxxx",  # placeholder; in practice load it from an environment variable
    base_url="https://api.deepseek.com/v1",
    model="deepseek-chat",  # or another DeepSeek model
)
```

2.2 Configuring the `sqlite` `database`

```python
# Build the database path relative to the directory of this file
current_dir = os.path.dirname(os.path.abspath(__file__))
db_path = os.path.join(current_dir, "02_chinook.db")  # adjust to the actual location
db_uri = f"sqlite:///{db_path}"

db = SQLDatabase.from_uri(db_uri)

print(f"Dialect: {db.dialect}")
print(f"Available tables: {db.get_usable_table_names()}")
# SQLite table names are case-insensitive, so "Artists" matches the "artists" table
print(f'Sample output: {db.run("SELECT * FROM Artists LIMIT 5;")}')
```

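A local `sqlite` database is used here for a simple database exercise. The database file is already included in the git repository, so nothing needs to be prepared separately.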

Running the program prints:

```text
Dialect: sqlite
Available tables: ['albums', 'artists', 'customers', 'employees', 'genres', 'invoice_items', 'invoices', 'media_types', 'playlist_track', 'playlists', 'tracks']
Sample output: [(1, 'AC/DC'), (2, 'Accept'), (3, 'Aerosmith'), (4, 'Alanis Morissette'), (5, 'Alice In Chains')]
```
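One pitfall worth guarding against: with a `sqlite:///` URI, SQLite typically creates a new, empty database file when the path does not exist (as long as its directory does), so a wrong `db_path` tends to show up only as an empty table list rather than as an error. A small check before building the URI makes that failure explicit. This is a sketch; `build_sqlite_uri` is a hypothetical helper, not part of LangChain.

```python
import os
import tempfile


def build_sqlite_uri(db_path: str) -> str:
    """Hypothetical helper: return a sqlite:/// URI only if db_path is an existing file."""
    if not os.path.isfile(db_path):
        # Fail loudly instead of letting SQLite silently create a new, empty database.
        raise FileNotFoundError(f"SQLite database not found: {db_path}")
    return f"sqlite:///{db_path}"


# Usage: an existing file is accepted, a missing one is rejected.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "02_chinook.db")
    open(existing, "wb").close()  # empty placeholder file standing in for the real database
    uri = build_sqlite_uri(existing)
    print(uri)

    missing_error = None
    try:
        build_sqlite_uri(os.path.join(tmp, "missing.db"))
    except FileNotFoundError as exc:
        missing_error = exc
    print(missing_error)
```

In section 2.2 this helper would take the place of the line that builds `db_uri`.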

2.3 Creating the `tools` from the database and the LLM

```python
# The toolkit wraps the database in ready-made tools; the LLM is used by the query checker
toolkit = SQLDatabaseToolkit(db=db, llm=model)

tools = toolkit.get_tools()

for tool in tools:
    print(f"{tool.name}: {tool.description}\n")
```

Output:

```text
sql_db_query: Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.

sql_db_schema: Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3

sql_db_list_tables: Input is an empty string, output is a comma-separated list of tables in the database.

sql_db_query_checker: Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!
```

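Four database tools are provided:

  • `sql_db_query`: runs a `sql` query against the database
  • `sql_db_schema`: given table names, returns the tables' `schemas` and sample rows
  • `sql_db_list_tables`: lists all tables in the database
  • `sql_db_query_checker`: checks a `sql` query before it is executed

So, given the database and the model, the toolkit provides four database `tools`. The model itself is used by `sql_db_query_checker`, which asks the LLM to double-check the query.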
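To get a feel for what `sql_db_list_tables` and `sql_db_schema` hand back to the model, here is a rough standard-library approximation on a toy in-memory database. It is a sketch, not LangChain's code: the real tools go through `SQLDatabase` (SQLAlchemy) and their exact output differs in details, and the helper names `list_tables` and `table_info` are hypothetical.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> str:
    """Comma-separated table names, roughly what sql_db_list_tables returns."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return ", ".join(name for (name,) in rows)


def table_info(conn: sqlite3.Connection, table_names: str, sample_rows: int = 3) -> str:
    """CREATE TABLE statement plus a few sample rows per table, roughly like sql_db_schema."""
    parts = []
    for table in (name.strip() for name in table_names.split(",")):
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
        ).fetchone()
        if row is None:  # the tool description likewise insists the tables must exist
            raise ValueError(f"unknown table: {table}")
        # Only names confirmed to exist above reach this f-string.
        cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT {int(sample_rows)}')
        header = "\t".join(col[0] for col in cursor.description)
        body = "\n".join("\t".join(map(str, r)) for r in cursor.fetchall())
        parts.append(f"{row[0]}\n\n/*\n{sample_rows} rows from {table} table:\n{header}\n{body}\n*/")
    return "\n\n".join(parts)


# Toy schema and data, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, GenreId INTEGER, Milliseconds INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Metal'), (4, 'Blues');
""")

print(list_tables(conn))  # genres, tracks
info = table_info(conn, "genres")
print(info)
```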

2.4 Providing the `system prompt`

```python
# {dialect} and {top_k} are placeholders filled in by str.format() below
system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
""".format(
    dialect=db.dialect,
    top_k=5,
)
```
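The prompt asks the model not to issue DML, but a prompt is a request, not an enforcement mechanism. A sturdier safeguard, sketched here with the standard-library `sqlite3` module, is to open the database read-only so that any write fails at the database level. For the agent itself, SQLAlchemy's SQLite documentation describes a URI form along the lines of `sqlite:///file:path/to/db?mode=ro&uri=true` for passing such options through; treat that as something to verify against the SQLAlchemy docs for your version.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = (Path(tmp) / "demo.db").resolve()

    # Create a small database through a normal read-write connection.
    rw = sqlite3.connect(str(path))
    rw.execute("CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
    rw.execute("INSERT INTO artists VALUES (1, 'AC/DC')")
    rw.commit()
    rw.close()

    # Reopen it read-only via an SQLite URI: reads work, writes are rejected.
    ro = sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)
    rows = ro.execute("SELECT Name FROM artists").fetchall()
    write_error = None
    try:
        ro.execute("DELETE FROM artists")
    except sqlite3.OperationalError as exc:
        write_error = str(exc)  # SQLite refuses to write to a read-only database
    ro.close()

print(rows)  # [('AC/DC',)]
print(write_error)
```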

2.5 Creating the `sql agent`

```python
agent = create_agent(
    model,
    tools,
    system_prompt=system_prompt,
)
```

The arguments are:

  • the LLM (`model`)
  • the `tools`
  • the system prompt (`system_prompt`)
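Conceptually, the agent that `create_agent` builds runs a loop: call the model, execute any tool call it requests, append the tool result to the message list, and repeat until the model answers without calling a tool. The sketch below shows only that control flow with a scripted stand-in for the model. Every name in it (`Message`, `run_agent`, `scripted_model`, `fake_tools`) is hypothetical, and LangChain's real implementation (built on LangGraph) handles much more, such as multiple tool calls per step and streaming.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Message:
    role: str                          # "user", "ai" or "tool"
    content: str
    tool_call: Optional[tuple] = None  # (tool_name, argument) requested by the AI


def run_agent(model: Callable[[list], Message], tools: dict,
              question: str, max_steps: int = 10) -> list:
    messages = [Message("user", question)]
    for _ in range(max_steps):
        reply = model(messages)        # the model sees the whole history
        messages.append(reply)
        if reply.tool_call is None:    # no tool requested: this is the final answer
            return messages
        name, arg = reply.tool_call
        messages.append(Message("tool", tools[name](arg)))
    raise RuntimeError("no final answer within max_steps")


def scripted_model(messages: list) -> Message:
    # Stand-in for the LLM: first ask for the table list, then answer.
    if messages[-1].role == "user":
        return Message("ai", "Let me look at the tables.", ("sql_db_list_tables", ""))
    return Message("ai", f"The database has these tables: {messages[-1].content}")


fake_tools = {"sql_db_list_tables": lambda _: "genres, tracks"}
history = run_agent(scripted_model, fake_tools, "Which genre on average has the longest tracks?")
for m in history:
    print(f"{m.role}: {m.content}")
```

The resulting sequence (user, ai, tool, ai) mirrors, in miniature, the Human/AI/Tool message pattern in the real run in section 3.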

2.6 Preparing the user's question about the `database`

```python
question = "Which genre on average has the longest tracks?"
```

The example `database` is a music catalog containing genres (`genre`), tracks (individual songs, `tracks`), and related data.

2.7 Asking the `sql agent`

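```python
# stream_mode="values" yields the full agent state after each step,
# so the last message in the list is the one produced by that step
for step in agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
```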

3 Checking the execution results

Let's go through each step to see how the `sql agent` works.

3.1 `human message` (`human` → `AI`): asking the question

```text
================================ Human Message =================================
Which genre on average has the longest tracks?
```

The `human message` poses the question to the `AI`.

The question is which music genre has the longest tracks on average. The `AI` is, of course, expected to answer it by querying the given music database.

3.2 `AI message` (`AI` → `AI database tool`): getting all tables in the database

```text
================================== Ai Message ==================================
I'll help you find which genre on average has the longest tracks. Let me start by exploring the database structure.
Tool Calls:
  sql_db_list_tables (call_00_q8ks92xNcHhSoGW9tgsJPPNm)
 Call ID: call_00_q8ks92xNcHhSoGW9tgsJPPNm
  Args:
    tool_input:
```

3.3 `tool message` (`AI database tool` → `AI`): the `AI tool` returns all tables in the database

```text
================================= Tool Message =================================
Name: sql_db_list_tables

albums, artists, customers, employees, genres, invoice_items, invoices, media_types, playlist_track, playlists, tracks
```

The `AI database tool` returns the names of all tables in the `database`.

3.4 `AI message` (`AI` → `AI database tool`): the `AI` analyzes and requests the `schema` of the relevant tables

```text
================================== Ai Message ==================================
Now let me look at the schema for the relevant tables - particularly `genres` and `tracks` tables since we need to analyze track lengths by genre.
Tool Calls:
  sql_db_schema (call_00_nlSRcRvpO3yQAP1otxxWkbRw)
 Call ID: call_00_nlSRcRvpO3yQAP1otxxWkbRw
  Args:
    table_names: genres, tracks
```

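After analyzing the table list, the `AI` calls the `AI tool` again, this time to fetch the `schemas` of the relevant tables (`genres` and `tracks`).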

3.5 `tool message` (`AI database tool` → `AI`): the `AI tool` returns the `schema` of the relevant tables

Here the `AI database tool` returns two kinds of information to the `AI`:

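  • the table `schema` (the `CREATE TABLE` statement)
  • sample data from each table (3 rows)
  • ideally there would also be descriptions of the individual columns, but this example has none

With this information, the `AI model` has enough to understand the structure of the `database`.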

```text
================================= Tool Message =================================
Name: sql_db_schema

CREATE TABLE genres (
        "GenreId" INTEGER NOT NULL,
        "Name" NVARCHAR(120),
        PRIMARY KEY ("GenreId")
)

/*
3 rows from genres table:
GenreId Name
1       Rock
2       Jazz
3       Metal
*/

CREATE TABLE tracks (
        "TrackId" INTEGER NOT NULL,
        "Name" NVARCHAR(200) NOT NULL,
        "AlbumId" INTEGER,
        "MediaTypeId" INTEGER NOT NULL,
        "GenreId" INTEGER,
        "Composer" NVARCHAR(220),
        "Milliseconds" INTEGER NOT NULL,
        "Bytes" INTEGER,
        "UnitPrice" NUMERIC(10, 2) NOT NULL,
        PRIMARY KEY ("TrackId"),
        FOREIGN KEY("MediaTypeId") REFERENCES media_types ("MediaTypeId"),
        FOREIGN KEY("GenreId") REFERENCES genres ("GenreId"),
        FOREIGN KEY("AlbumId") REFERENCES albums ("AlbumId")
)

/*
3 rows from tracks table:
TrackId Name    AlbumId MediaTypeId     GenreId Composer        Milliseconds    Bytes   UnitPrice
1       For Those About To Rock (We Salute You) 1       1       1       Angus Young, Malcolm Young, Brian Johnson      343719  11170334        0.99
2       Balls to the Wall       2       2       1       None    342562  5510424 0.99
3       Fast As a Shark 3       2       1       F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman     230619  3990994 0.99
*/
```

3.6 `AI message` (`AI`): the `AI` analyzes further and drafts the `sql`

```text
================================== Ai Message ==================================

Perfect! I can see that:
1. The `tracks` table has a `GenreId` column that references the `genres` table
2. The `tracks` table has a `Milliseconds` column which represents the track length
3. The `genres` table has `GenreId` and `Name` columns

Now I need to write a query that:
4. Joins the `tracks` and `genres` tables
5. Groups by genre
6. Calculates the average track length in milliseconds for each genre
7. Orders by average track length descending to find the genre with the longest average tracks

Let me first check my query before executing it:
Tool Calls:
  sql_db_query_checker (call_00_rJzx7nT7IpvLXemzJhCYi9jG)
 Call ID: call_00_rJzx7nT7IpvLXemzJhCYi9jG
  Args:
    query: SELECT
    g.Name AS Genre,
    AVG(t.Milliseconds) AS AverageTrackLengthMs,
    AVG(t.Milliseconds) / 1000.0 AS AverageTrackLengthSeconds,
    AVG(t.Milliseconds) / 60000.0 AS AverageTrackLengthMinutes
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY AverageTrackLengthMs DESC
LIMIT 5
```

Here the `AI` spells out how it arrives at the `sql`, in four steps:

  1. Joins the tracks and genres tables
  2. Groups by genre
  3. Calculates the average track length in milliseconds for each genre
  4. Orders by average track length descending to find the genre with the longest average tracks

To be on the safe side, the `AI model` then calls `sql_db_query_checker` to check this `sql` before running it.
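To see what the join, group, average, and sort steps actually compute, the agent's query (minus the seconds column) can be run against a tiny made-up dataset with the standard-library `sqlite3` module. The numbers below are toy values, not the Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, GenreId INTEGER, Milliseconds INTEGER);
INSERT INTO genres VALUES (1, 'Rock'), (2, 'Jazz'), (3, 'Metal');
-- Made-up track lengths in milliseconds
INSERT INTO tracks (GenreId, Milliseconds) VALUES
    (1, 200000), (1, 300000),
    (2, 360000),
    (3, 240000), (3, 420000);
""")

# Same shape as the agent's query: join, group per genre, average, sort descending.
query = """
SELECT g.Name AS Genre,
       AVG(t.Milliseconds) AS AverageTrackLengthMs,
       AVG(t.Milliseconds) / 60000.0 AS AverageTrackLengthMinutes
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY AverageTrackLengthMs DESC
LIMIT 5
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Jazz', 360000.0, 6.0)
# ('Metal', 330000.0, 5.5)
# ('Rock', 250000.0, 4.166666666666667)
```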

3.6 `tool message` (`AI database tool`): `sql_db_query_checker` returns its check result

````text
================================= Tool Message =================================
Name: sql_db_query_checker

```sql
SELECT
    g.Name AS Genre,
    AVG(t.Milliseconds) AS AverageTrackLengthMs,
    AVG(t.Milliseconds) / 1000.0 AS AverageTrackLengthSeconds,
    AVG(t.Milliseconds) / 60000.0 AS AverageTrackLengthMinutes
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY AverageTrackLengthMs DESC
LIMIT 5
````
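`sql_db_query_checker` asks the LLM to review the query. As a complementary, purely local check (an addition of this sketch, not something the toolkit does), SQLite can compile a statement without running it when it is prefixed with `EXPLAIN`. Syntax errors and unknown tables or columns then surface as `sqlite3.Error`. `check_query` is a hypothetical helper name.

```python
import sqlite3
from typing import Optional


def check_query(conn: sqlite3.Connection, query: str) -> Optional[str]:
    """Return None if SQLite can compile `query`, otherwise the error message."""
    try:
        conn.execute("EXPLAIN " + query).fetchall()  # compiles the query without running it
        return None
    except sqlite3.Error as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, GenreId INTEGER, Milliseconds INTEGER);
""")

good = ("SELECT g.Name, AVG(t.Milliseconds) FROM tracks t "
        "JOIN genres g ON t.GenreId = g.GenreId GROUP BY g.GenreId, g.Name")
bad = "SELECT g.Name, AVG(t.Length) FROM tracks t JOIN genres g ON t.GenreId = g.GenreId"

print(check_query(conn, good))  # None
print(check_query(conn, bad))   # an error message naming the unknown column
```

Unlike the LLM-based checker, this catches only problems the database engine can see. It will not notice a query that is valid but answers the wrong question.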

3.7 `AI message` (`AI`): the model calls `sql_db_query` to run the `db query`

```text
================================== Ai Message ==================================

The query looks good. Now let me execute it:
Tool Calls:
  sql_db_query (call_00_skJHEgeGsjap4lhoI1X5ohvx)
 Call ID: call_00_skJHEgeGsjap4lhoI1X5ohvx
  Args:
    query: SELECT
    g.Name AS Genre,
    AVG(t.Milliseconds) AS AverageTrackLengthMs,
    AVG(t.Milliseconds) / 1000.0 AS AverageTrackLengthSeconds,
    AVG(t.Milliseconds) / 60000.0 AS AverageTrackLengthMinutes
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY AverageTrackLengthMs DESC
LIMIT 5
```

3.8 `tool message` (`AI database tool`): the `AI tool` queries the database and returns the result

```text
================================= Tool Message =================================
Name: sql_db_query

[('Sci Fi & Fantasy', 2911783.0384615385, 2911.7830384615386, 48.52971730769231),
('Science Fiction', 2625549.076923077, 2625.549076923077, 43.759151282051285),
('Drama', 2575283.78125, 2575.28378125, 42.92139635416667),
('TV Shows', 2145041.0215053763, 2145.041021505376, 35.75068369175627),
('Comedy', 1585263.705882353, 1585.263705882353, 26.421061764705883)]
```

3.9 `AI message` (`AI`): the `AI` analyzes the result

```text
================================== Ai Message ==================================
Great! The results show that **Sci Fi & Fantasy** has the longest average track length at approximately:
- 2,911,783 milliseconds (2,911.8 seconds)
- 48.53 minutes on average

However, I notice these genres seem unusual for music tracks (Sci Fi & Fantasy, Science Fiction, Drama, TV Shows, Comedy). These might be audiobooks or spoken word content rather than traditional music genres. Let me check if there are more traditional music genres with longer average track lengths by looking at all genres:
Tool Calls:
  sql_db_query (call_00_K3SlmvjpbSFh6GR8qESVMjWu)
 Call ID: call_00_K3SlmvjpbSFh6GR8qESVMjWu
  Args:
    query: SELECT
    g.Name AS Genre,
    AVG(t.Milliseconds) AS AverageTrackLengthMs,
    AVG(t.Milliseconds) / 60000.0 AS AverageTrackLengthMinutes,
    COUNT(t.TrackId) AS NumberOfTracks
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY AverageTrackLengthMs DESC
```

Although the `AI` had already reached a conclusion, it was cautious and ran a further analysis:

```sql
SELECT
    g.Name AS Genre,
    AVG(t.Milliseconds) AS AverageTrackLengthMs,
    AVG(t.Milliseconds) / 60000.0 AS AverageTrackLengthMinutes,
    COUNT(t.TrackId) AS NumberOfTracks
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY AverageTrackLengthMs DESC
```

This time the query returns all genres instead of only `LIMIT 5`, and it also counts the tracks in each genre.

3.10 `AI tool message` (`AI database tool`): the `AI tool` returns the full query result

```text
[('Sci Fi & Fantasy', 2911783.0384615385, 48.52971730769231, 26),
('Science Fiction', 2625549.076923077, 43.759151282051285, 13),
('Drama', 2575283.78125, 42.92139635416667, 64),
('TV Shows', 2145041.0215053763, 35.75068369175627, 93),
('Comedy', 1585263.705882353, 26.421061764705883, 17),
('Metal', 309749.4438502674, 5.162490730837789, 374),
('Electronica/Dance', 302985.8, 5.049763333333333, 30),
('Heavy Metal', 297452.9285714286, 4.95754880952381, 28),
('Classical', 293867.5675675676, 4.897792792792793, 74),
('Jazz', 291755.3769230769, 4.862589615384615, 130),
('Rock', 283910.0431765613, 4.7318340529426886, 1297),
('Blues', 270359.77777777775, 4.505996296296296, 81),
('Alternative', 264058.525, 4.400975416666667, 40),
('Reggae', 247177.75862068965, 4.119629310344828, 58),
('Soundtrack', 244370.88372093023, 4.072848062015503, 43),
('Alternative & Punk', 234353.84939759035, 3.9058974899598393, 332),
('Latin', 232859.26252158894, 3.8809877086931492, 579),
('Pop', 229034.10416666666, 3.817235069444444, 48),
('World', 224923.82142857142, 3.748730357142857, 28),
('R&B/Soul', 220066.8524590164, 3.66778087431694, 61),
('Bossa Nova', 219590.0, 3.6598333333333333, 15),
('Easy Listening', 189164.20833333334, 3.1527368055555556, 24),
('Hip Hop/Rap', 178176.2857142857, 2.969604761904762, 35),
('Opera', 174813.0, 2.91355, 1),
('Rock And Roll', 134643.5, 2.2440583333333333, 12)]
```

3.11 `AI message` (`AI`): the `AI` gives its final, comprehensive answer

```text
================================== Ai Message ==================================

Now I can see the complete picture. The top 5 genres with the longest average track lengths are all non-music content (audiobooks/spoken word). Among traditional music genres, the ones with the longest average track lengths are:

1. **Metal** - 309,749 ms (5.16 minutes)
2. **Electronica/Dance** - 302,986 ms (5.05 minutes)
3. **Heavy Metal** - 297,453 ms (4.96 minutes)
4. **Classical** - 293,868 ms (4.90 minutes)
5. **Jazz** - 291,755 ms (4.86 minutes)

So, to answer your question:

**Among traditional music genres, Metal has the longest average track length at approximately 5.16 minutes.**

However, if we include all content types in the database (including audiobooks and spoken word), then **Sci Fi & Fantasy** has by far the longest average track length at approximately 48.53 minutes.
```

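The `sql agent`, built from the `AI` model plus the `AI database tools`, answered the question end to end. It explored the tables, read the relevant schemas, wrote and checked the `sql`, ran it, and then questioned its own first answer before replying.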
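The agent decided that the top five genres were not music by judging their names alone. A more explicit way to ask the "music only" question is to filter on the media type as well. The sketch below runs such a query on a made-up mini dataset. It rests on an assumption that should be verified against the real `02_chinook.db` (for example, by grouping the long genres by `MediaTypeId`): that the long non-music genres correspond to media types whose names contain "video".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genres (GenreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE media_types (MediaTypeId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, GenreId INTEGER,
                     MediaTypeId INTEGER, Milliseconds INTEGER);
INSERT INTO genres VALUES (1, 'Metal'), (2, 'Drama');
INSERT INTO media_types VALUES (1, 'MPEG audio file'), (2, 'MPEG-4 video file');
-- Made-up rows: two audio tracks and one long video item
INSERT INTO tracks (GenreId, MediaTypeId, Milliseconds) VALUES
    (1, 1, 300000), (1, 1, 320000),
    (2, 2, 2600000);
""")

music_only = """
SELECT g.Name AS Genre, AVG(t.Milliseconds) / 60000.0 AS AvgMinutes, COUNT(*) AS Tracks
FROM tracks t
JOIN genres g ON t.GenreId = g.GenreId
JOIN media_types m ON t.MediaTypeId = m.MediaTypeId
WHERE m.Name NOT LIKE '%video%'   -- keep non-video (audio) media types only
GROUP BY g.GenreId, g.Name
ORDER BY AvgMinutes DESC
"""
rows = conn.execute(music_only).fetchall()
print(rows)  # only the Metal row survives; the video item is filtered out
```

Filtering in SQL makes the assumption visible and checkable, instead of leaving it to the model's reading of genre names.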

4. Next steps
