Fixing a Multi-Table Join Bug in JoyAgent Natural-Language Data Querying

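We tested JoyAgent's natural-language querying on a single table, and the results exceeded our expectations: it handled fairly complex data-processing scenarios on its own. For example, we imported an employee information table containing fields such as the ID card number and the date of birth, and asked: "Find employees whose date of birth in the ID card number differs from the registered date of birth." The system first extracted the date of birth from the ID card number, converted the date format, and then compared the two values, returning the correct records.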
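To make the single-table example concrete, here is a minimal Java sketch of the same check. It assumes 18-digit resident ID numbers, whose digits 7 to 14 encode the birth date as yyyyMMdd, and assumes the registered birth_date is stored as yyyy-MM-dd (the schema only declares it as VARCHAR). The class and method names are hypothetical; JoyAgent expressed this logic in generated SQL, not Java.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical sketch: extract the birth date from an 18-digit ID card number
 * and compare it with the registered birth date (assumed to be yyyy-MM-dd).
 */
final class BirthDateCheck {

    /** Digits 7-14 (indices 6..13) of an 18-digit ID number are the birth date as yyyyMMdd. */
    static LocalDate birthDateFromIdCard(String idCard) {
        if (idCard == null || idCard.length() != 18) {
            throw new IllegalArgumentException("expected an 18-digit ID card number");
        }
        // BASIC_ISO_DATE parses the compact yyyyMMdd form
        return LocalDate.parse(idCard.substring(6, 14), DateTimeFormatter.BASIC_ISO_DATE);
    }

    /** True when the date encoded in the ID card differs from the registered date. */
    static boolean isMismatch(String idCard, String registeredBirthDate) {
        LocalDate fromId = birthDateFromIdCard(idCard);
        // LocalDate.parse defaults to ISO_LOCAL_DATE, i.e. yyyy-MM-dd
        LocalDate registered = LocalDate.parse(registeredBirthDate);
        return !fromId.equals(registered);
    }
}
```

For a made-up ID such as 110101199003071234, birthDateFromIdCard yields 1990-03-07, so a registered value of 1990-03-07 is a match and 1990-07-03 is flagged as a mismatch.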

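In real analysis, however, questions rarely stay within a single table; they usually span several tables that have to be joined. Neither the JoyAgent source code nor its samples demonstrate or test questions that require multi-table joins. After reviewing the source, we judged that JoyAgent should be able to handle such joins, but it provides no specification or examples for writing a foreign-key relationship knowledge base. We therefore designed a multi-table data scenario and ran engineering tests against it.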

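After we designed the scenario and imported the test data, the system did generate SQL that joins the tables, but it broke at the table-name replacement step. Below we walk through the query scenario, the system bug it exposed, and the fix that produced the expected result.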

1. Data Scenario

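We imported two tables as the basis for querying: an employee information table and an employee attendance table. The DDL is as follows: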

CREATE TABLE employee_info (
    employee_id VARCHAR(20) PRIMARY KEY COMMENT 'Employee ID (primary key)',
    full_name VARCHAR(50) NOT NULL COMMENT 'Full name',
    gender VARCHAR(50) COMMENT 'Gender: male or female',
    nationality VARCHAR(30) COMMENT 'Nationality',
    id_card VARCHAR(20) COMMENT 'ID card number',
    birth_date VARCHAR(200) NOT NULL COMMENT 'Date of birth',
    department VARCHAR(50) NOT NULL COMMENT 'Department',
    marital_status VARCHAR(20) COMMENT 'Marital status (single, married, divorced)',
    education VARCHAR(20) COMMENT 'Highest education (high school, associate, bachelor, master, doctorate)',
    contact_phone VARCHAR(15) COMMENT 'Contact phone',
    emergency_contact VARCHAR(15) COMMENT 'Emergency contact phone',
    address VARCHAR(1000) COMMENT 'Current residential address',
    hire_date VARCHAR(100) NOT NULL COMMENT 'Hire date'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Employee basic information table';

CREATE TABLE employee_attendance (
    attendance_id INT PRIMARY KEY COMMENT 'Row ID',
    employee_id VARCHAR(20) COMMENT 'Employee ID',
    clock_in_time VARCHAR(20) COMMENT 'Clock-in time',
    clock_out_time VARCHAR(20) COMMENT 'Clock-out time',
    clock_in_type VARCHAR(100) COMMENT 'Clock-in method (fingerprint, face recognition, IC card, mobile app, manual entry)'
) COMMENT='Employee clock-in records table';

Data model configuration:

      - name: Employee basic information table
        id: t_employeeinfo
        type: table
        content: employee_info
        remark: Employee basic information table
        business-prompt:
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: employee_id
        sync-value-fields:
        column-alias-map: ''
      - name: Employee attendance detail table
        id: t_employeeattendance
        type: table
        content: employee_attendance
        remark: Contains employees' attendance clock-in records
        business-prompt:
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: attendance_id
        sync-value-fields:
        column-alias-map: ''
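The two fields that matter for the bug below are `id` and `content`: the generated SQL refers to tables by their model `id` (for example t_employeeinfo), and the logs show each id being mapped to its `content` (model[t_employeeinfo=employee_info]) before execution. A minimal Java sketch of that lookup, assuming the entries are loaded into a list first; the record and method names are hypothetical, not JoyAgent's own classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory view of the data-model entries in the configuration above. */
final class ModelConfigs {

    /** One entry: id is the table name the LLM sees, content is the physical table (or subquery). */
    record Model(String name, String id, String type, String content) {}

    /** Builds the id -> content lookup used to rewrite generated SQL, keeping configuration order. */
    static Map<String, String> idToContent(List<Model> models) {
        Map<String, String> lookup = new LinkedHashMap<>();
        for (Model m : models) {
            lookup.put(m.id(), m.content());
        }
        return lookup;
    }
}
```

For the two entries above, the lookup maps t_employeeinfo to employee_info and t_employeeattendance to employee_attendance.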

Test data is omitted here; you can have DeepSeek generate a few rows following this schema.
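If you prefer a deterministic generator to an LLM, a small Java sketch can emit INSERT statements for employee_attendance. It assumes clock times are stored as "yyyy-MM-dd HH:mm:ss" strings (the columns are VARCHAR(20), which fits that format) and spreads clock-in times between 08:30 and 09:29 so that some rows count as late. The class name, the time windows, and the clock-in type values are illustrative assumptions.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical test-data generator for the employee_attendance table. */
final class AttendanceDataGenerator {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final String[] TYPES = {"fingerprint", "face recognition", "IC card", "mobile app", "manual entry"};

    /** One row per employee for the given day; a fixed seed makes the output reproducible. */
    static List<String> inserts(List<String> employeeIds, LocalDate day, long seed) {
        Random rnd = new Random(seed);
        List<String> rows = new ArrayList<>();
        int attendanceId = 1;
        for (String employeeId : employeeIds) {
            // Clock-in between 08:30 and 09:29, clock-out between 18:00 and 18:59
            LocalDateTime in = day.atTime(8, 30).plusMinutes(rnd.nextInt(60));
            LocalDateTime out = day.atTime(18, 0).plusMinutes(rnd.nextInt(60));
            rows.add(String.format(
                    "INSERT INTO employee_attendance VALUES (%d, '%s', '%s', '%s', '%s');",
                    attendanceId++, employeeId, in.format(FMT), out.format(FMT),
                    TYPES[rnd.nextInt(TYPES.length)]));
        }
        return rows;
    }
}
```

The employee IDs are inserted without escaping, which is acceptable only for hand-picked test IDs; the IDs should also exist in employee_info so the join returns rows.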

2. Failure Analysis

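When we asked "Get the names of employees who clocked in late", the system generated the intermediate SQL correctly. The table names in this SQL are the model IDs configured in the knowledge base: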

2025-10-17T03:25:44.730Z INFO 47 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e SSE数据结果:{"code": 200, "data": [{"query": "通过员工信息表和考勤记录表,获取每个员工的迟到情况", "nl2sql": "SELECT t_employeeinfo.EMPLOYEE_ID, t_employeeinfo.FULL_NAME, COUNT(CASE WHEN CAST(t_employeeattendance.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(t_employeeattendance.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM t_employeeinfo JOIN t_employeeattendance ON t_employeeinfo.EMPLOYEE_ID = t_employeeattendance.EMPLOYEE_ID GROUP BY t_employeeinfo.EMPLOYEE_ID, t_employeeinfo.FULL_NAME"}], "request_id": "b44bf49f-e299-4c17-8e72-90b991325a6e", "status": "data", "error_msg": ""}
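The generated SQL counts a clock-in as late when CLOCK_IN_TIME is later than 09:00:00 on the same day; that threshold was chosen by the model, since the schema defines no work start time. The same predicate as a Java sketch, assuming "yyyy-MM-dd HH:mm:ss" strings; the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical Java mirror of the lateness condition in the generated SQL. */
final class LatenessRule {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final LocalTime START_OF_WORK = LocalTime.of(9, 0);

    /** Late means strictly after 09:00:00 on the clock-in day, like the SQL's ">" comparison. */
    static boolean isLate(String clockInTime) {
        LocalDateTime in = LocalDateTime.parse(clockInTime, FMT);
        return in.isAfter(in.toLocalDate().atTime(START_OF_WORK));
    }
}
```

Because the comparison is strict, a clock-in at exactly 09:00:00 is not late, while 09:05:00 is.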

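After the table-name replacement step, however, every source table had been replaced with the same table, so execution failed and the system replied that it could not answer the question: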

2025-10-17T03:25:46.329Z INFO 47 --- [genie-backend] [ exe-pool-1] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e 执行sql:SELECT employee_info.EMPLOYEE_ID, employee_info.FULL_NAME, COUNT(CASE WHEN CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM employee_info JOIN employee_info ON employee_info.EMPLOYEE_ID = employee_info.EMPLOYEE_ID GROUP BY employee_info.EMPLOYEE_ID, employee_info.FULL_NAME

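Root cause: in com.jd.genie.service.Nl2SqlService, the tableName used during replacement was never updated, so every model ID was replaced with the same table (employee_info in the log above). The resulting SQL joins employee_info with itself without aliases and reads CLOCK_IN_TIME from a table that has no such column, so it cannot execute.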

The fix: update tableName on every iteration of the replacement loop.
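We do not reproduce the actual Nl2SqlService code here. The following is a minimal sketch, assuming the service holds an ordered map from model ID to content and rewrites the SQL one model at a time (as the "开始匹配model[...]" log lines suggest), replacing both backticked and bare occurrences of the ID. The class and method names are hypothetical, and the buggy variant is only one plausible shape of the defect: it resolves tableName once before the loop, which reproduces the symptom in the failing log. The fixed variant reads tableName from the current entry on each iteration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the model-ID-to-table replacement step. */
final class ModelNameReplacer {

    /** Buggy shape: tableName is resolved once and reused for every model ID. */
    static String replaceBuggy(String sql, Map<String, String> models) {
        if (models.isEmpty()) {
            return sql;
        }
        // BUG: taken from the first entry and never refreshed inside the loop
        String tableName = models.values().iterator().next();
        for (String modelId : models.keySet()) {
            sql = replaceModelId(sql, modelId, tableName);
        }
        return sql;
    }

    /** Fixed shape: tableName is re-read from the current model on every iteration. */
    static String replaceFixed(String sql, Map<String, String> models) {
        for (Map.Entry<String, String> entry : models.entrySet()) {
            // FIX: each model ID gets its own content
            String tableName = entry.getValue();
            sql = replaceModelId(sql, entry.getKey(), tableName);
        }
        return sql;
    }

    /** Replaces `modelId` (backticked) or modelId (bare, whole word) with tableName. */
    static String replaceModelId(String sql, String modelId, String tableName) {
        String quoted = Pattern.quote(modelId);
        Pattern pattern = Pattern.compile("`" + quoted + "`|\\b" + quoted + "\\b");
        // quoteReplacement keeps '$' or '\' in subquery content from being treated as group references
        return pattern.matcher(sql).replaceAll(Matcher.quoteReplacement(tableName));
    }
}
```

With t_employeeinfo, t_salaryinfo and t_employeeattendance in the map, replaceFixed turns the generated join into employee_info JOIN employee_attendance, while replaceBuggy produces employee_info JOIN employee_info as in the failing log. A model whose ID does not occur in the SQL (t_salaryinfo here) leaves it unchanged.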

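Result after the fix. In the log below, the replacement loop walks through all five configured models: each model ID that appears in the SQL is replaced with its own table (t_employeeinfo with employee_info, t_employeeattendance with employee_attendance), models the query does not reference leave the SQL unchanged, and the final query returns 5 rows. Note that the "完成匹配" lines print the SQL after replacement even though they are labeled "匹配前SQL" (SQL before matching).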

2025-10-17T08:31:30.005Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE数据结果:{"code": 200, "data": [{"query": "获取考勤打卡迟到员工的姓名和部门", "nl2sql": "SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')"}], "request_id": "86f76794-e83c-42b3-9958-53904409e6f0", "status": "data", "error_msg": ""}
2025-10-17T08:31:30.018Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE 连接关闭
2025-10-17T08:31:30.025Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0 sse event count:208
2025-10-17T08:31:30.090Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始替换modelname:共有model[5]个
2025-10-17T08:31:30.091Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.093Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.111Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.112Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.c.JdbcConnectionPools          : 从缓存获取连接池 poolId genie-datasource
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.provider.jdbc.JdbcDataProvider   : jdbc执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.114Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.connection.ConnectionWrapper   : 获取数据库链接成功 poolId:genie-datasource
2025-10-17T08:31:30.373Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 查询sql结果大小:5