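Background:
The requirement is to return query results from each business system's interfaces, based on the data shown in that system's UI.
The current text-to-SQL approach is very unstable, so I am switching to a different design: each business team supplies its own SQL templates, the templates are synced into Milvus, keywords are extracted from the user's question, and the best-matching template is selected by match score and then executed.
This is a common implementation pattern in enterprise products such as:
- AI BI
- AI reporting
- AI data assistants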
## I. Installing Milvus 2.5.5 with Docker Compose
1) The compose file:
```yaml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "19001:9001"
      - "19000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.5.5
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: milvus-attu
    image: zilliz/attu:latest
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - standalone

networks:
  default:
    name: milvus
```
To start the stack, run docker compose down first, then docker compose up -d.
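Once the containers are up, it can be handy to confirm from the application side that Milvus is actually ready. The sketch below polls the same /healthz endpoint on port 9091 that the compose healthcheck uses. The class and method names are made up for illustration, and it assumes Java 11+ for java.net.http.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the project code shown in this post.
class MilvusHealthCheck {

    // Builds the health URL; 9091 is the port published by the compose file.
    static URI healthUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/healthz");
    }

    // Returns true only when the endpoint answers with HTTP 200.
    static boolean isHealthy(String host, int port) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri(host, port))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // connection refused, timeout, and similar failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Milvus healthy: " + isHealthy("localhost", 9091));
    }
}
```

Because the standalone service has a start_period of 90s, a false result shortly after docker compose up -d does not necessarily mean startup failed.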
2) How do you view the data in Milvus?
Use Attu, the official Milvus web UI.
Add the following service to docker-compose.yml (the compose file above already includes it):
```yaml
  attu:
    container_name: milvus-attu
    image: zilliz/attu:latest
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - standalone
```
Then open http://localhost:8000 in a browser.
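3) Syncing business data into Milvus
Table entity:
```java
import com.baomidou.mybatisplus.annotation.IdType;
import com.baomidou.mybatisplus.annotation.TableId;
import java.util.Date;

// Getters and setters are omitted here.
public class SqlTemplate {
    @TableId(type = IdType.AUTO)
    private Long id;
    private String businessCode;
    private String templateName;
    private String templateDesc;
    private String keywords;          // JSON array of keywords
    private String questionExamples;  // JSON array of sample questions
    private String sqlText;           // SQL with named parameters such as :month
    private String parameterSchema;   // JSON array describing each SQL parameter
    private String vectorId;          // ID of the matching document in Milvus
    private Integer enabled;          // 1 = enabled
    private Date createTime;
    private Date updateTime;
}
```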
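Key fields: keywords (the keywords), question_examples (sample questions), sql_text (the SQL statement), parameter_schema (describes the SQL parameters; each entry carries a regex used to extract the argument from the question), and vector_id (the ID of the vector record).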
Sample template payload:
```json
{
  "templates": [
    {
      "businessCode": "sales",
      "templateName": "monthly_sales_summary",
      "templateDesc": "Query monthly sales amount and order count by month",
      "keywords": ["销售", "销售额", "订单数", "月度销售", "月报"],
      "questionExamples": [
        "查询2025年3月销售额",
        "看一下2025年3月订单数和销售额",
        "给我月度销售汇总"
      ],
      "sqlText": "select ifnull(sum(pay_amount), 0) as total_amount, count(1) as order_count from sales_order where date_format(pay_time, '%Y年%c月') = :month",
      "parameterSchema": [
        {
          "name": "month",
          "description": "Month text like 2025年3月",
          "pattern": "(20\\d{2}年\\d{1,2}月)",
          "defaultValue": null,
          "required": true
        }
      ],
      "enabled": true
    },
    {
      "businessCode": "member",
      "templateName": "member_count_by_level",
      "templateDesc": "Count members by membership level",
      "keywords": ["会员", "会员数", "等级", "会员等级"],
      "questionExamples": [
        "黄金会员有多少人",
        "统计白银会员数量",
        "会员等级人数"
      ],
      "sqlText": "select level_name, count(1) as member_count from member_info where level_name = :levelName group by level_name",
      "parameterSchema": [
        {
          "name": "levelName",
          "description": "Member level name",
          "pattern": "(黄金会员|白银会员|普通会员)",
          "defaultValue": null,
          "required": true
        }
      ],
      "enabled": true
    },
    {
      "businessCode": "inventory",
      "templateName": "product_stock_by_id",
      "templateDesc": "Query stock by product id",
      "keywords": ["库存", "商品库存", "库存查询", "商品ID"],
      "questionExamples": [
        "查询商品1001库存",
        "1002还有多少库存",
        "看一下商品库存"
      ],
      "sqlText": "select product_id, product_name, stock_num from product_stock where product_id = :id",
      "parameterSchema": [
        {
          "name": "id",
          "description": "Product id",
          "pattern": "(\\d+)",
          "defaultValue": null,
          "required": true
        }
      ],
      "enabled": true
    }
  ]
}
```
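To make the role of parameterSchema concrete, here is a minimal sketch of how each entry's pattern could be applied to a question, and how missing required parameters could be detected. The post calls extractParams and findMissingParams but does not show them, so the ParamDef record and both method bodies below are assumptions for illustration, not the project's implementation. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamSchemaDemo {

    // Hypothetical in-memory form of one parameterSchema entry.
    record ParamDef(String name, String pattern, String defaultValue, boolean required) {}

    // Applies each pattern to the question; capture group 1 becomes the value.
    // Every pattern is expected to contain at least one capture group.
    static Map<String, Object> extractParams(String question, List<ParamDef> schema) {
        Map<String, Object> params = new LinkedHashMap<>();
        for (ParamDef def : schema) {
            Matcher matcher = Pattern.compile(def.pattern()).matcher(question);
            if (matcher.find()) {
                params.put(def.name(), matcher.group(1));
            } else if (def.defaultValue() != null) {
                params.put(def.name(), def.defaultValue());
            }
        }
        return params;
    }

    // Lists required parameters that ended up without a value.
    static List<String> findMissingParams(List<ParamDef> schema, Map<String, Object> params) {
        List<String> missing = new ArrayList<>();
        for (ParamDef def : schema) {
            if (def.required() && !params.containsKey(def.name())) {
                missing.add(def.name());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<ParamDef> schema =
                List.of(new ParamDef("month", "(20\\d{2}年\\d{1,2}月)", null, true));
        Map<String, Object> params = extractParams("查询2025年3月销售额", schema);
        System.out.println(params);
        System.out.println(findMissingParams(schema, params));
    }
}
```

A question such as 给我月度销售汇总 carries no month, so with this sketch month would be reported as missing and the caller could ask the user for it instead of running the SQL.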
The sync code:
```java
@Transactional(rollbackFor = Exception.class)
public List<SqlTemplate> syncTemplates(List<SqlTemplateSyncItemDTO> templates) {
    if (templates == null || templates.isEmpty()) {
        throw new BusinessException(ErrorCode.PARAMS_ERROR, "SQL template list cannot be empty");
    }
    List<SqlTemplate> savedTemplates = new ArrayList<>();
    for (SqlTemplateSyncItemDTO item : templates) {
        // Persist to the relational table first, then mirror into the vector store.
        SqlTemplate saved = saveOrUpdateTemplate(item);
        syncTemplateToVectorStore(saved);
        savedTemplates.add(saved);
    }
    return savedTemplates;
}

private void syncTemplateToVectorStore(SqlTemplate template) {
    // Remove the previous vector record so a re-sync does not leave duplicates.
    if (template.getVectorId() != null && !template.getVectorId().isBlank()) {
        vectorStore.delete(List.of(template.getVectorId()));
    }
    String vectorText = buildVectorText(template);
    Map<String, Object> metadata = new HashMap<>();
    metadata.put("knowledgeType", KNOWLEDGE_TYPE);
    metadata.put("templateId", template.getId());
    metadata.put("businessCode", template.getBusinessCode());
    metadata.put("templateName", template.getTemplateName());
    Document document = new Document(vectorText, metadata);
    vectorStore.add(List.of(document));
    // Store the new vector ID back on the row so it can be deleted next time.
    template.setVectorId(document.getId());
    sqlTemplateMapper.updateById(template);
}

// Concatenates the searchable fields of a template, one per line, skipping blanks.
private String buildVectorText(SqlTemplate template) {
    List<String> textParts = new ArrayList<>();
    textParts.add(template.getBusinessCode());
    textParts.add(template.getTemplateName());
    textParts.add(Optional.ofNullable(template.getTemplateDesc()).orElse(""));
    textParts.add(String.join(" ", parseJsonArray(template.getKeywords())));
    textParts.add(String.join(" ", parseJsonArray(template.getQuestionExamples())));
    textParts.add(buildParamHintText(template));
    return textParts.stream()
            .filter(text -> text != null && !text.isBlank())
            .collect(Collectors.joining("\n"));
}
```
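To see what actually gets embedded, the following standalone sketch reproduces the joining logic of buildVectorText for the member_count_by_level template. It is a simplified stand-in: it takes plain lists instead of the entity, and it leaves out buildParamHintText because that helper is not shown in this post.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class VectorTextDemo {

    // Joins the non-blank parts with newlines, mirroring buildVectorText.
    static String buildVectorText(String businessCode, String templateName, String templateDesc,
                                  List<String> keywords, List<String> questionExamples) {
        return Stream.of(businessCode, templateName, templateDesc,
                        String.join(" ", keywords), String.join(" ", questionExamples))
                .filter(part -> part != null && !part.isBlank())
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        System.out.println(buildVectorText(
                "member",
                "member_count_by_level",
                "Count members by membership level",
                List.of("会员", "会员数", "等级", "会员等级"),
                List.of("黄金会员有多少人", "统计白银会员数量", "会员等级人数")));
    }
}
```

The embedded text therefore mixes English identifiers with Chinese keywords and sample questions, which is worth keeping in mind when choosing an embedding model.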
## II. Getting an answer for a question
```java
public SqlTemplateMatchVO matchTemplate(String question) {
    if (question == null || question.isBlank()) {
        throw new BusinessException(ErrorCode.PARAMS_ERROR, "Question cannot be empty");
    }
    List<String> extractedKeywords = extractKeywords(question);
    // Recall with the extracted keywords; fall back to the raw question if none were found.
    String recallQuery = String.join(" ", extractedKeywords.isEmpty() ? List.of(question) : extractedKeywords);
    List<Document> documents = vectorStore.similaritySearch(SearchRequest.builder()
            .query(recallQuery)
            .topK(10)
            .similarityThreshold(0.1d)
            .filterExpression("knowledgeType == '" + KNOWLEDGE_TYPE + "'")
            .build());
    if (documents == null || documents.isEmpty()) {
        return null;
    }
    SqlTemplateMatchVO best = null;
    for (Document document : documents) {
        Object templateIdValue = document.getMetadata().get("templateId");
        if (templateIdValue == null) {
            continue;
        }
        // Parsed through a helper rather than a plain Long.parseLong on the string value.
        Long templateId = parseTemplateId(templateIdValue);
        SqlTemplate template = sqlTemplateMapper.selectById(templateId);
        if (template == null || !Objects.equals(template.getEnabled(), 1)) {
            continue;
        }
        List<String> templateKeywords = parseJsonArray(template.getKeywords());
        List<String> matchedKeywords = matchKeywords(extractedKeywords, templateKeywords, question);
        // Rank candidates by match score: 70% keyword hit rate, 30% vector similarity.
        double keywordScore = templateKeywords.isEmpty() ? 0D : matchedKeywords.size() * 1.0D / templateKeywords.size();
        double vectorScore = Optional.ofNullable(document.getScore()).orElse(0D);
        double finalScore = keywordScore * 0.7D + vectorScore * 0.3D;
        Map<String, Object> extractedParams = extractParams(question, template, Collections.emptyMap());
        List<String> missingParams = findMissingParams(template, extractedParams);
        SqlTemplateMatchVO current = SqlTemplateMatchVO.builder()
                .templateId(template.getId())
                .businessCode(template.getBusinessCode())
                .templateName(template.getTemplateName())
                .sqlText(template.getSqlText())
                .vectorScore(vectorScore)
                .keywordScore(keywordScore)
                .finalScore(finalScore)
                .extractedKeywords(extractedKeywords)
                .matchedKeywords(matchedKeywords)
                .extractedParams(extractedParams)
                .missingParams(missingParams)
                .build();
        // Keep the highest-scoring candidate.
        if (best == null || current.getFinalScore() > best.getFinalScore()) {
            best = current;
        }
    }
    return best;
}
```
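The scoring step is easy to check by hand. The sketch below recomputes finalScore from the numbers in the sample response shown later in this post (1 of 5 template keywords matched, vector score 0.7076406478881836). The class and method names are illustrative only.

```java
class ScoreDemo {

    // finalScore = keyword hit rate * 0.7 + vector similarity * 0.3
    static double finalScore(int matchedKeywords, int templateKeywords, double vectorScore) {
        double keywordScore = templateKeywords == 0
                ? 0d
                : (double) matchedKeywords / templateKeywords;
        return keywordScore * 0.7d + vectorScore * 0.3d;
    }

    public static void main(String[] args) {
        // 1/5 = 0.2 keyword score, combined with the vector score from the response
        System.out.printf("%.4f%n", finalScore(1, 5, 0.7076406478881836));
    }
}
```

Note that the keyword score is divided by the number of keywords in the template, so a template that lists many keywords needs more hits to reach the same score as one that lists only a few.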
Keyword matching (hit rate):
```java
private List<String> matchKeywords(List<String> extractedKeywords, List<String> templateKeywords, String question) {
    if (templateKeywords == null || templateKeywords.isEmpty()) {
        return Collections.emptyList();
    }
    String normalizedQuestion = normalize(question);
    // A template keyword counts as a hit if it was extracted from the question
    // or appears anywhere in the normalized question text.
    return templateKeywords.stream()
            .filter(keyword -> {
                String normalizedKeyword = normalize(keyword);
                return extractedKeywords.contains(normalizedKeyword)
                        || normalizedQuestion.contains(normalizedKeyword);
            })
            .distinct()
            .toList();
}
```
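The extractKeywords method itself is not shown in this post. Judging only from the extractedKeywords array in the sample response further down, it appears to emit the whole question, then character bigrams of the question with non-letter characters removed, then any year-month expression. The sketch below is one plausible reconstruction under that assumption, not the actual implementation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordExtractDemo {

    private static final Pattern YEAR_MONTH = Pattern.compile("20\\d{2}年\\d{1,2}月");

    // Assumed behavior: whole question, then bigrams, then year-month phrases.
    static List<String> extractKeywords(String question) {
        Set<String> keywords = new LinkedHashSet<>(); // keeps order, drops duplicates
        keywords.add(question);

        // Drop digits, spaces, and punctuation; only letters (including Han) remain.
        String lettersOnly = question.replaceAll("[^\\p{L}]", "");
        // Sliding window of two characters; assumes text within the BMP.
        for (int i = 0; i + 2 <= lettersOnly.length(); i++) {
            keywords.add(lettersOnly.substring(i, i + 2));
        }

        Matcher matcher = YEAR_MONTH.matcher(question);
        while (matcher.find()) {
            keywords.add(matcher.group());
        }
        return List.copyOf(keywords);
    }

    public static void main(String[] args) {
        System.out.println(extractKeywords("看一下2025年3月订单数和销售额"));
    }
}
```

Bigrams are a cheap substitute for a real Chinese tokenizer. They produce noise such as 下年 and 和销, which is harmless here because only template keywords that actually match contribute to the score.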
Extracting SQL parameters with regular expressions:
```java
private Map<String, String> extractHeuristicValues(String question) {
    Map<String, String> values = new HashMap<>();
    // Full date such as 2025-3-15
    Matcher dateMatcher = Pattern.compile("(20\\d{2}-\\d{1,2}-\\d{1,2})").matcher(question);
    if (dateMatcher.find()) {
        values.put("date", dateMatcher.group(1));
    }
    // Year and month such as 2025年3月
    Matcher yearMonthMatcher = Pattern.compile("(20\\d{2}年\\d{1,2}月)").matcher(question);
    if (yearMonthMatcher.find()) {
        String value = yearMonthMatcher.group(1);
        values.put("month", value);
        values.put("yearMonth", value);
    }
    // Four-digit year, optionally followed by 年
    Matcher yearMatcher = Pattern.compile("(20\\d{2})年?").matcher(question);
    if (yearMatcher.find()) {
        values.put("year", yearMatcher.group(1));
    }
    // First number in the question; note that for "2025年3月" this also yields id=2025.
    Matcher numberMatcher = Pattern.compile("(\\d+)").matcher(question);
    if (numberMatcher.find()) {
        values.put("value", numberMatcher.group(1));
        values.put("id", numberMatcher.group(1));
    }
    return values;
}
```
Q&A examples:
```bash
curl "http://localhost:8989/v1/sql-template/match?question=查询2025年3月销售额"
curl "http://localhost:8989/v1/sql-template/match?question=看一下2025年3月订单数和销售额"
curl "http://localhost:8989/v1/sql-template/match?question=给我月度销售汇总"
```
Result (this response corresponds to the second question):
```json
{"code":0,"data":{"templateId":1,"businessCode":"sales","templateName":"monthly_sales_summary","sqlText":"select ifnull(sum(pay_amount), 0) as total_amount, count(1) as order_count from sales_order where date_format(pay_time, '%Y年%c月') = :month","vectorScore":0.7076406478881836,"keywordScore":0.2,"finalScore":0.3522921943664551,"extractedKeywords":["看一下2025年3月订单数和销售额","看一","一下","下年","年月","月订","订单","单数","数和","和销","销售","售额","2025年3月"],"matchedKeywords":["销售"],"extractedParams":{"month":"2025年3月"},"missingParams":[]},"message":"ok"}
```
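Problems encountered:
Milvus failed to start, and the log contained output like this:
```text
runtime.goexit({}) /go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.0.linux-amd64/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00009cfe8 sp=0xc00009cfe0 pc=0x206ea41
created by go.opencensus.io/stats/view.init.0 in goroutine 1
```
This is only a Go goroutine stack trace, which also shows up during normal operation.
The real problem is that Milvus did not finish starting, usually because it is stuck on a dependency, on resources, or on networking.
In my case the cause was an outdated Docker version.
```bash
wget https://github.com/milvus-io/milvus/releases/download/v2.5.5/milvus-standalone-docker-compose.yml -O docker-compose.yml
```
```text
You are using pip version 8.1.2, however version 26.0.1 is available. You should consider upgrading via the 'pip install --upgrade pip' command.
```
The Python + pip environment on this machine is too old, and the suggested upgrade method is no longer compatible with it.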
Installing Docker Compose (recommended way)
Install the official CLI plugin:
```bash
mkdir -p ~/.docker/cli-plugins/
curl -SL https://github.com/docker/compose/releases/download/v2.27.0/docker-compose-linux-x86_64 \
  -o ~/.docker/cli-plugins/docker-compose
chmod +x ~/.docker/cli-plugins/docker-compose
```
Verify with: docker compose version
docker compose ✅ (correct, the v2 plugin)
docker-compose ❌ (the legacy version)
If the output is "Docker Compose version v2.27.0", Compose has been installed successfully.
Download the docker compose file:
```bash
curl -L -o docker-compose.yml https://github.com/milvus-io/milvus/releases/download/v2.5.5/milvus-standalone-docker-compose.yml
```
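Starting the stack then failed because host port 9000 was already in use:
```text
Error response from daemon: driver failed programming external connectivity on endpoint milvus-minio (40948cd198354631dd03397397876d0f8e354d28db263365597bb2f7cd196fe5): Bind for 0.0.0.0:9000 failed: port is already allocated
```
Fix: change the host ports of the minio service:
```yaml
    ports:
      - "19001:9001"
      - "19000:9000"
```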
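Before picking replacement ports, it can help to check which host ports are free. The sketch below is a small, hypothetical helper that tries to open a listening socket on each port published by the compose file. It only detects listeners visible on the host, so treat the result as a hint rather than a guarantee.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheck {

    // True if a TCP listener can be opened on the port right now.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // typically "Address already in use"
        }
    }

    public static void main(String[] args) {
        // Host ports used in this post: minio (original and remapped), Milvus, Attu.
        int[] ports = {9000, 9001, 19000, 19001, 19530, 9091, 8000};
        for (int port : ports) {
            System.out.println(port + " free: " + isPortFree(port));
        }
    }
}
```

Run it while the stack is stopped; once the containers are up, their published ports will naturally be reported as taken.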