JoyAgent Data Q&A: Fixing a Multi-Table Join Bug

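We first tested JoyAgent's data Q&A on a single table, and the results exceeded our expectations: it handled fairly complex data processing on its own. For example, we imported an employee information table containing fields such as ID card number and date of birth, and asked: "Find employees whose date of birth in their ID card number differs from their registered date of birth." The system first extracted the birth date from the ID card number, converted the date format, and then compared the two values. It returned the correct rows.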
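To make the example concrete, here is a minimal Java sketch of the comparison the system performed. It is not JoyAgent code (JoyAgent generated SQL for this question). It assumes 18-digit mainland China ID card numbers, whose 7th to 14th digits encode the birth date as yyyyMMdd, and assumes the free-text birth date column holds one of a few formats such as 1990-05-12, 1990/5/12 or 19900512. The class name, method names and sample values are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

class BirthDateCheck {
    // Digits 7-14 of an 18-digit ID card number encode the birth date as yyyyMMdd.
    private static final DateTimeFormatter ID_CARD_FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Assumed (hypothetical) formats for the free-text registered birth date.
    private static final List<DateTimeFormatter> REGISTERED_FORMATS = List.of(
            DateTimeFormatter.ofPattern("yyyy-M-d"),
            DateTimeFormatter.ofPattern("yyyy/M/d"),
            DateTimeFormatter.ofPattern("yyyyMMdd"));

    // Step 1: extract the birth date from the ID card number.
    static Optional<LocalDate> birthDateFromIdCard(String idCard) {
        if (idCard == null || idCard.length() != 18) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(idCard.substring(6, 14), ID_CARD_FORMAT));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Step 2: normalize the registered birth date by trying each assumed format in turn.
    static Optional<LocalDate> parseRegistered(String value) {
        if (value == null) {
            return Optional.empty();
        }
        for (DateTimeFormatter format : REGISTERED_FORMATS) {
            try {
                return Optional.of(LocalDate.parse(value.trim(), format));
            } catch (DateTimeParseException ignored) {
                // Not this format; try the next one.
            }
        }
        return Optional.empty();
    }

    // Step 3: report a mismatch only when both dates parse and they differ.
    static boolean isMismatch(String idCard, String registeredBirthDate) {
        Optional<LocalDate> fromIdCard = birthDateFromIdCard(idCard);
        Optional<LocalDate> registered = parseRegistered(registeredBirthDate);
        return fromIdCard.isPresent() && registered.isPresent()
                && !fromIdCard.get().equals(registered.get());
    }

    public static void main(String[] args) {
        // Made-up sample: the ID card encodes 1990-05-12, the registered value is 1990-06-12.
        System.out.println(isMismatch("11010519900512123X", "1990-06-12"));
    }
}
```

Rows whose values cannot be parsed are deliberately not reported as mismatches here; a real check might want to flag them separately.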

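Real analysis, however, rarely stays within a single table; it usually joins several. Neither JoyAgent's source code nor its samples demonstrate or test questions that require multi-table joins. After reviewing the source, we concluded that JoyAgent should be able to handle multi-table joins, but the source offers no specification or example for writing a knowledge base of foreign-key relationships. We therefore designed a multi-table data scenario and ran engineering tests against it.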

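After we designed the scenario and imported the test data, the system generated the join SQL automatically, but it failed at the table-name substitution step. Below we walk through the query scenario, the system bug we hit in it, and the fix that produced the expected result.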

1. Data Scenario

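We imported two tables as the basis for the questions: an employee information table and an employee attendance table. The DDL is as follows: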

CREATE TABLE employee_info (
    employee_id VARCHAR(20) PRIMARY KEY COMMENT 'Employee ID (primary key)',
    full_name VARCHAR(50) NOT NULL COMMENT 'Full name',
    gender VARCHAR(50) COMMENT 'Gender: male or female',
    nationality VARCHAR(30) COMMENT 'Nationality',
    id_card VARCHAR(20) COMMENT 'ID card number',
    birth_date VARCHAR(200) NOT NULL COMMENT 'Date of birth',
    department VARCHAR(50) NOT NULL COMMENT 'Department',
    marital_status VARCHAR(20) COMMENT 'Marital status (single, married, divorced)',
    education VARCHAR(20) COMMENT 'Highest education (high school, junior college, bachelor, master, doctorate)',
    contact_phone VARCHAR(15) COMMENT 'Contact phone',
    emergency_contact VARCHAR(15) COMMENT 'Emergency contact phone',
    address VARCHAR(1000) COMMENT 'Current residential address',
    hire_date VARCHAR(100) NOT NULL COMMENT 'Hire date'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Employee basic information table';

CREATE TABLE employee_attendance (
    attendance_id INT PRIMARY KEY COMMENT 'Row ID',
    employee_id VARCHAR(20) COMMENT 'Employee ID',
    clock_in_time VARCHAR(20) COMMENT 'Clock-in time',
    clock_out_time VARCHAR(20) COMMENT 'Clock-out time',
    clock_in_type VARCHAR(100) COMMENT 'Clock-in method (fingerprint, face recognition, IC card, mobile app, manual entry)'
) COMMENT='Employee clock-in records table';

Data model configuration:

      - name: Employee basic information table
        id: t_employeeinfo
        type: table
        content: employee_info
        remark: Employee basic information table
        business-prompt:
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: employee_id
        sync-value-fields:
        column-alias-map: ''
      - name: Employee attendance detail table
        id: t_employeeattendance
        type: table
        content: employee_attendance
        remark: Contains employees' clock-in records
        business-prompt:
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: attendance_id
        sync-value-fields:
        column-alias-map: ''

Test data is omitted here; you can ask DeepSeek to generate a few rows that follow this schema.

2. Failure Analysis

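When we asked "Get the names of employees who clocked in late", the system generated the intermediate SQL correctly. The table names in that SQL are the knowledge-base IDs from the configuration: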

2025-10-17T03:25:44.730Z INFO 47 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e SSE数据结果:{"code": 200, "data": [{"query": "通过员工信息表和考勤记录表,获取每个员工的迟到情况", "nl2sql": "SELECT t_employeeinfo.EMPLOYEE_ID, t_employeeinfo.FULL_NAME, COUNT(CASE WHEN CAST(t_employeeattendance.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(t_employeeattendance.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM t_employeeinfo JOIN t_employeeattendance ON t_employeeinfo.EMPLOYEE_ID = t_employeeattendance.EMPLOYEE_ID GROUP BY t_employeeinfo.EMPLOYEE_ID, t_employeeinfo.FULL_NAME"}], "request_id": "b44bf49f-e299-4c17-8e72-90b991325a6e", "status": "data", "error_msg": ""}

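After table-name substitution, every source table had been replaced with the same table. Both t_employeeinfo and t_employeeattendance became employee_info, so the query joins employee_info to itself and references CLOCK_IN_TIME, a column that table does not have. Execution failed and the system replied that it could not answer the question: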

2025-10-17T03:25:46.329Z INFO 47 --- [genie-backend] [ exe-pool-1] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e 执行sql:SELECT employee_info.EMPLOYEE_ID, employee_info.FULL_NAME, COUNT(CASE WHEN CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM employee_info JOIN employee_info ON employee_info.EMPLOYEE_ID = employee_info.EMPLOYEE_ID GROUP BY employee_info.EMPLOYEE_ID, employee_info.FULL_NAME

Root cause: in com.jd.genie.service.Nl2SqlService, the tableName variable used for substitution was not updated, so every model ID was replaced with the same table name.

The fix: update tableName on every iteration of the replacement loop.
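Since the original Nl2SqlService source is not reproduced here, the following is only a minimal sketch of the replacement loop. It assumes the models are held as an ordered map from knowledge-base ID (for example t_employeeinfo) to configured content (for example employee_info). The class and method names are hypothetical. replaceBuggy mimics the symptom in the failing log above: tableName is computed once before the loop, so every ID is rewritten to the first model's table. replaceFixed reassigns tableName on each iteration, which is the fix described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModelNameReplacer {

    // Matches a model ID wrapped in backticks, or as a bare identifier.
    // \b keeps a short ID such as t_emp from matching inside t_employeeinfo.
    private static Pattern idPattern(String modelId) {
        String quoted = Pattern.quote(modelId);
        return Pattern.compile("`" + quoted + "`|\\b" + quoted + "\\b");
    }

    // Hypothetical reproduction of the bug: tableName is read once, before the loop,
    // so every model ID is rewritten to the first model's table (assumes a non-empty map).
    static String replaceBuggy(String sql, Map<String, String> models) {
        String tableName = models.values().iterator().next();
        for (String modelId : models.keySet()) {
            sql = idPattern(modelId).matcher(sql).replaceAll(Matcher.quoteReplacement(tableName));
        }
        return sql;
    }

    // Fixed loop: tableName is refreshed from the current model on every iteration.
    static String replaceFixed(String sql, Map<String, String> models) {
        for (Map.Entry<String, String> model : models.entrySet()) {
            String tableName = model.getValue();
            sql = idPattern(model.getKey()).matcher(sql).replaceAll(Matcher.quoteReplacement(tableName));
        }
        return sql;
    }

    public static void main(String[] args) {
        // Same relative order in which these two models are matched in the log.
        Map<String, String> models = new LinkedHashMap<>();
        models.put("t_employeeinfo", "employee_info");
        models.put("t_employeeattendance", "employee_attendance");
        String sql = "SELECT `t_employeeinfo`.`FULL_NAME` FROM `t_employeeinfo` "
                + "INNER JOIN `t_employeeattendance` "
                + "ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID`";
        System.out.println(replaceBuggy(sql, models));
        System.out.println(replaceFixed(sql, models));
    }
}
```

The word-boundary regex and Matcher.quoteReplacement are precautions added in this sketch, not necessarily what Nl2SqlService does. A plain String.replace of a short ID would also rewrite longer IDs that begin with it. Configured content can also be a whole subquery (see the t_uegarulwipfivhutcvyawaoex model in the log below), and $ or \ characters in such text would otherwise be interpreted by replaceAll.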

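Result after the fix: the substitution log now shows each model ID mapped to its own table, and the query executes and returns 5 rows: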

2025-10-17T08:31:30.005Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE数据结果:{"code": 200, "data": [{"query": "获取考勤打卡迟到员工的姓名和部门", "nl2sql": "SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')"}], "request_id": "86f76794-e83c-42b3-9958-53904409e6f0", "status": "data", "error_msg": ""}
2025-10-17T08:31:30.018Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE 连接关闭
2025-10-17T08:31:30.025Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0 sse event count:208
2025-10-17T08:31:30.090Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始替换modelname:共有model[5]个
2025-10-17T08:31:30.091Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.093Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.111Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.112Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.c.JdbcConnectionPools          : 从缓存获取连接池 poolId genie-datasource
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.provider.jdbc.JdbcDataProvider   : jdbc执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.114Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.connection.ConnectionWrapper   : 获取数据库链接成功 poolId:genie-datasource
2025-10-17T08:31:30.373Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 查询sql结果大小:5
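The WHERE clause of the final SQL treats a clock-in as late when it is strictly later than 09:00:00 on the same day, and the query keeps distinct (name, department) pairs. The same logic is shown below as a minimal in-memory Java sketch, useful for checking expected results against test data. It assumes clock_in_time strings in the yyyy-MM-dd HH:mm:ss format that the generated SQL parses. The record and class names are hypothetical, and unparseable values throw DateTimeParseException instead of being skipped.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class LateClockIn {
    record Employee(String employeeId, String fullName, String department) {}
    record Attendance(String employeeId, String clockInTime) {}
    record NameAndDepartment(String fullName, String department) {}

    // Format the generated SQL passes to PARSEDATETIME.
    private static final DateTimeFormatter CLOCK_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    // Start of the working day, matching the ' 09:00:00' literal in the SQL.
    private static final LocalTime START_OF_WORK = LocalTime.of(9, 0);

    // WHERE clause: the clock-in is strictly later than 09:00:00 on the same day.
    static boolean isLate(String clockInTime) {
        return LocalDateTime.parse(clockInTime, CLOCK_FORMAT).toLocalTime().isAfter(START_OF_WORK);
    }

    // INNER JOIN on employee_id, keep late rows, then SELECT DISTINCT full_name, department.
    static Set<NameAndDepartment> lateEmployees(List<Employee> employees, List<Attendance> rows) {
        // employee_id is the primary key, so toMap never sees duplicate keys here.
        Map<String, Employee> byId = employees.stream()
                .collect(Collectors.toMap(Employee::employeeId, Function.identity()));
        Set<NameAndDepartment> result = new LinkedHashSet<>();
        for (Attendance row : rows) {
            Employee employee = byId.get(row.employeeId());
            // Rows without a matching employee are dropped, as in an inner join.
            if (employee != null && isLate(row.clockInTime())) {
                result.add(new NameAndDepartment(employee.fullName(), employee.department()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Employee> employees = List.of(new Employee("E001", "Zhang San", "R&D"));
        List<Attendance> rows = List.of(
                new Attendance("E001", "2025-10-16 09:12:00"),
                new Attendance("E001", "2025-10-17 08:55:00"));
        System.out.println(lateEmployees(employees, rows));
    }
}
```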