1. Overview
Vanna is a tool that converts natural-language questions into SQL. A simple demo:
!pip install vanna
import vanna
from vanna.remote import VannaDefault
vn = VannaDefault(model='chinook', api_key=vanna.get_api_key('my-email@example.com'))
vn.connect_to_sqlite('https://vanna.ai/Chinook.sqlite')
vn.ask("What are the top 10 albums by sales?")
Run the following code to launch the graphical interface:
from vanna.flask import VannaFlaskApp
VannaFlaskApp(vn).run()
2. Configuration
Any database can be used. For example, PostgreSQL (through psycopg2) is connected as follows:
import pandas as pd
import psycopg2
def run_sql(sql):
    conn = psycopg2.connect(
        host="localhost",
        database="my_database",
        user="my_user",
        password="my_password"
    )
    return pd.read_sql(sql, conn)
vn.run_sql = run_sql
vn.run_sql_is_set = True
Setting up the vector database is slightly more involved, and several backends are supported. Reference code using ChromaDB:
from vanna.chromadb.chromadb_vector import ChromaDB_VectorStore
class MyVanna(ChromaDB_VectorStore):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
vn = MyVanna(config={'path': '/path/to/chromadb'})
3. Training
Training data can be DDL, documentation, SQL, or question-SQL pairs:
vn.train(ddl="CREATE TABLE my_table (id INT, name TEXT)")
vn.train(documentation="Our business defines XYZ as ABC")
vn.train(sql="SELECT col1, col2, col3 FROM my_table")
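You can also set auto_train=True (a parameter of vn.ask) so that successfully answered questions are added to the training data automatically.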
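The examples above cover DDL, documentation, and plain SQL, but not a question-SQL pair. The following is a minimal sketch, assuming that vn.train accepts question and sql together as keyword arguments. The helper name train_from_items and the item format (a list of dicts) are hypothetical conveniences, not part of vanna.

```python
from typing import Any, Dict, Iterable


def train_from_items(vn: Any, items: Iterable[Dict[str, str]]) -> int:
    """Send each training item to vn.train and return how many were sent.

    Each item is a dict whose keys are a subset of "ddl", "documentation",
    "sql", and "question" (hypothetical format chosen for this sketch).
    """
    allowed = {"ddl", "documentation", "sql", "question"}
    count = 0
    for item in items:
        unknown = set(item) - allowed
        if unknown:
            raise ValueError(f"unsupported training keys: {sorted(unknown)}")
        # A question alone carries no answer, so require its SQL alongside it.
        if "question" in item and "sql" not in item:
            raise ValueError("a question must be paired with its sql")
        vn.train(**item)
        count += 1
    return count


# Example items, including a question-SQL pair:
EXAMPLE_ITEMS = [
    {"ddl": "CREATE TABLE my_table (id INT, name TEXT)"},
    {"question": "How many rows are in my_table?",
     "sql": "SELECT COUNT(*) FROM my_table"},
]
```

With a configured vn object, train_from_items(vn, EXAMPLE_ITEMS) would issue one vn.train call per item.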
4. Asking questions
vn.ask("What are the top 10 customers by sales?")
vn.ask chains the following functions:
vn.generate_sql
vn.run_sql
vn.generate_plotly_code
vn.get_plotly_figure
Passing visualize=False to vn.ask skips the two Plotly steps.
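To make the chaining concrete, here is a rough sketch that calls the four functions by hand. The helper name ask_step_by_step is hypothetical, and the keyword arguments passed to generate_plotly_code and get_plotly_figure (df_metadata, plotly_code, df) are assumptions about vanna's signatures that may differ between versions.

```python
from typing import Any, Optional, Tuple


def ask_step_by_step(
    vn: Any, question: str, visualize: bool = True
) -> Tuple[str, Any, Optional[Any]]:
    """Run the steps behind vn.ask one at a time; returns (sql, df, fig)."""
    # Step 1: natural-language question -> SQL text.
    sql = vn.generate_sql(question=question)
    # Step 2: execute the SQL with the configured run_sql function.
    df = vn.run_sql(sql)
    if not visualize:
        # Mirrors visualize=False: stop before any plotting code is generated.
        return sql, df, None
    # Step 3 (assumed signature): describe the result columns to the LLM
    # so it can write plotting code without seeing the data itself.
    plotly_code = vn.generate_plotly_code(
        question=question,
        sql=sql,
        df_metadata=f"Running df.dtypes gives:\n {df.dtypes}",
    )
    # Step 4 (assumed signature): turn the generated code into a figure.
    fig = vn.get_plotly_figure(plotly_code=plotly_code, df=df)
    return sql, df, fig
```

Calling the steps separately like this makes it possible to inspect or reject the generated SQL before it is executed.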
5. Running as a service
Following https://github.com/vanna-ai/vanna-flask, replace the LLM, embedding, and vector store with your own implementations.
First, the LLM. The skeleton to adapt is:
from vanna.base import VannaBase
class MyLLM(VannaBase):
    def __init__(self,config=None):
        VannaBase.__init__(self, config=config)
        ...
    def system_message(self, message: str) -> any:
        return {"role": "system", "content": message}
    def user_message(self, message: str) -> any:
        return {"role": "user", "content": message}
    def assistant_message(self, message: str) -> any:
        return {"role": "assistant", "content": message}
    def submit_prompt(self, prompt, **kwargs) -> str:
        ...
Next, the embedding. It needs to define two methods, encode_documents and encode_queries, for example:
class BgeM3:
    def __init__(self, url):
        self.url = url
    def encode_documents(self, docs):
        ...
    def encode_queries(self, queries):
        ...
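As an illustration of what such an embedding class might look like when the model sits behind an HTTP service, here is a minimal sketch. Everything about the service is hypothetical: the class name HttpEmbedding, the request body {"input": [...]}, and the response body {"embeddings": [[...], ...]} are assumptions made for this example, so adapt them to whatever your embedding server actually exposes. The post argument exists only so the transport can be swapped out.

```python
import json
import urllib.request
from typing import Callable, List, Optional


def _post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class HttpEmbedding:
    """Hypothetical embedding client exposing encode_documents / encode_queries."""

    def __init__(self, url: str,
                 post: Optional[Callable[[str, dict], dict]] = None):
        self.url = url
        # Injectable transport; defaults to a plain HTTP POST.
        self._post = post or _post_json

    def _embed(self, texts: List[str]) -> List[List[float]]:
        # Assumed schema: send {"input": texts}, receive {"embeddings": [...]}.
        body = self._post(self.url, {"input": texts})
        vectors = body["embeddings"]
        if len(vectors) != len(texts):
            raise ValueError("server returned a different number of vectors")
        return vectors

    def encode_documents(self, docs: List[str]) -> List[List[float]]:
        return self._embed(list(docs))

    def encode_queries(self, queries: List[str]) -> List[List[float]]:
        return self._embed(list(queries))
```

Both methods share one code path here because this sketch assumes the service embeds documents and queries the same way; a model that uses different prefixes or modes for the two would need separate handling.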
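Next, the vector store. We use Milvus, which automatically calls the embedding_function given in config, so we simply set it to the BgeM3 class defined above (QwenLLM here is the LLM class written from the skeleton above):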
from pymilvus import MilvusClient
from vanna.milvus import Milvus_VectorStore
class MyVanna(Milvus_VectorStore, QwenLLM):
    def __init__(self, config=None):
        Milvus_VectorStore.__init__(self, config=config)
        QwenLLM.__init__(self, config=config)
vn = MyVanna(config={'milvus_client': MilvusClient(...), 'embedding_function': BgeM3(...)})
Then define the database connection. Any other database can be substituted here:
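import mysql.connector
import pandas as pd
def run_sql(sql: str) -> pd.DataFrame:
    cnx = mysql.connector.connect(...)
    cursor = cnx.cursor()
    cursor.execute(sql)
    result = cursor.fetchall()
    columns = cursor.column_names
    df = pd.DataFrame(result, columns=columns)
    # Release the cursor and connection once the rows are in the DataFrame.
    cursor.close()
    cnx.close()
    return df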
vn.run_sql = run_sql
vn.run_sql_is_set = True
Then run python app.py to start the service, and open localhost:5000 in a browser to view the page.
The API can also be called directly:
import requests
response = requests.get(url+'/api/v0/get_training_data',headers={'Content-Type':'application/json'})
response.json()