Fixing a Multi-Table Join Bug in JoyAgent Natural-Language Data Queries

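We first tested JoyAgent on single-table questions. The results were better than we expected, and it handled fairly complex data processing on its own. For example, we imported an employee information table containing fields such as ID card number and birth date, and asked: "Find employees whose birth date in the ID card number differs from the registered birth date." The system first extracted the birth date from the ID card number, converted the date format, compared the two values, and returned the correct rows.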
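The check described above can be illustrated outside of SQL. The following is a minimal Python sketch, not JoyAgent's own logic: it assumes 18-character resident ID numbers, in which characters 7 to 14 encode the birth date as YYYYMMDD, and it assumes the registered birth date is a string in one of a few common formats. The function names and the sample ID number are made up for illustration, and no checksum validation is performed.

```python
from datetime import date, datetime
from typing import Optional


def birth_date_from_id_card(id_card: str) -> Optional[date]:
    """Return the birth date encoded in an 18-character ID number, or None."""
    if len(id_card) != 18 or not id_card[6:14].isdigit():
        return None
    try:
        # Characters 7-14 (1-based) hold the birth date as YYYYMMDD.
        return datetime.strptime(id_card[6:14], "%Y%m%d").date()
    except ValueError:
        return None


def parse_registered_date(text: str) -> Optional[date]:
    """Parse the registered birth date; the accepted formats are an assumption."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def has_birth_date_mismatch(id_card: str, birth_date: str) -> bool:
    """True only when both dates can be read and they differ."""
    from_id = birth_date_from_id_card(id_card)
    registered = parse_registered_date(birth_date)
    return from_id is not None and registered is not None and from_id != registered
```

Rows where either value cannot be parsed are treated as "no mismatch" here; a real audit would probably want to report those separately.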

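Real analysis, however, rarely stays on a single table; it usually joins data across several tables. Neither the JoyAgent source code nor its samples demonstrate or test questions that require multi-table joins. After reviewing the source code, we judged that JoyAgent should be able to handle multi-table joins, but the source offers no authoring convention or sample for describing foreign-key relationships in the knowledge base. We therefore designed a multi-table data scenario and ran an engineering test against it.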

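Once the scenario was designed and the test data imported, the system generated the join SQL automatically, but the table-name substitution stage went wrong. The rest of this post walks through the scenario, the bug we hit in it, and the fix that restored the expected behaviour.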

1. Data scenario

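We imported two tables as the basis for the questions: an employee information table and an employee attendance table. The CREATE TABLE statements are: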

CREATE TABLE employee_info (
    employee_id VARCHAR(20) PRIMARY KEY COMMENT '员工ID(主键)',
    full_name VARCHAR(50) NOT NULL COMMENT '员工全名',
    gender VARCHAR(50) COMMENT '性别:男或女',
    nationality VARCHAR(30) COMMENT '国籍',
    id_card VARCHAR(20)  COMMENT '身份证号',
    birth_date VARCHAR(200) NOT NULL COMMENT '出生日期',
    department VARCHAR(50) NOT NULL COMMENT '所属部门',
    marital_status VARCHAR(20)  COMMENT '婚姻状况(未婚,已婚,离异)',
    education VARCHAR(20) COMMENT '最高学历(高中,专科,本科,硕士,博士)',
    contact_phone VARCHAR(15) COMMENT '联系电话',
    emergency_contact VARCHAR(15) COMMENT '紧急联系人电话',
    address VARCHAR(1000) COMMENT '现居住地址',
    hire_date VARCHAR(100) NOT NULL COMMENT '入职日期'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='员工基础信息表';

CREATE TABLE employee_attendance (
    attendance_id INT PRIMARY KEY COMMENT '行 ID',
    employee_id VARCHAR(20) COMMENT '员工ID',
    clock_in_time VARCHAR(20)  COMMENT '上班打卡时间',
    clock_out_time VARCHAR(20)  COMMENT '下班打卡时间',
    clock_in_type varchar(100) COMMENT '打卡方式(指纹,人脸识别, IC卡, 手机APP, 手动补录)'
) COMMENT='员工打卡信息表';

Data model configuration:

      - name: 员工基础信息表
        id: t_employeeinfo
        type: table
        content: employee_info
        remark: 员工基础信息表
        business-prompt:  
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: employee_id
        sync-value-fields:
        column-alias-map: ''
      - name: 员工考勤明细表
        id: t_employeeattendance
        type: table
        content: employee_attendance
        remark: 包含员工的考勤打卡记录
        business-prompt:  
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: attendance_id
        sync-value-fields:
        column-alias-map: ''

The test data is omitted here; DeepSeek can be asked to generate a few rows that follow this schema.
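For readers who want to try the scenario without a full deployment, here is a minimal sketch using Python's built-in sqlite3 module. It is not the environment used in this post: the tables are trimmed to the columns the join needs, every row is a made-up placeholder, and the timestamp format `yyyy-MM-dd HH:mm:ss` and the 09:00:00 cutoff are taken from the SQL that appears in the logs further down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_info (
    employee_id TEXT PRIMARY KEY,
    full_name   TEXT NOT NULL,
    department  TEXT NOT NULL
);
CREATE TABLE employee_attendance (
    attendance_id  INTEGER PRIMARY KEY,
    employee_id    TEXT,
    clock_in_time  TEXT,
    clock_out_time TEXT,
    clock_in_type  TEXT
);
""")

# Hypothetical sample rows, invented for this sketch only.
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?)",
    [("E001", "Employee A", "R&D"), ("E002", "Employee B", "Finance")],
)
conn.executemany(
    "INSERT INTO employee_attendance VALUES (?, ?, ?, ?, ?)",
    [
        (1, "E001", "2025-10-16 08:55:12", "2025-10-16 18:02:40", "指纹"),
        (2, "E002", "2025-10-16 09:12:03", "2025-10-16 18:30:00", "手机APP"),
        (3, "E001", "2025-10-17 09:00:00", "2025-10-17 18:01:10", "人脸识别"),
    ],
)

# "Late" means a clock-in strictly after 09:00:00 on the same day.
# substr(..., 12, 8) picks the HH:mm:ss part of 'yyyy-MM-dd HH:mm:ss'.
late_names = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT e.full_name
        FROM employee_info AS e
        JOIN employee_attendance AS a ON e.employee_id = a.employee_id
        WHERE substr(a.clock_in_time, 12, 8) > '09:00:00'
        ORDER BY e.full_name
    """)
]
print(late_names)
```

The string comparison only works because the time part is zero-padded and fixed-width; data in other formats would need real timestamp parsing.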

2. Analysis of the system error

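When we asked "Get the names of employees who clocked in late" (获取考勤打卡迟到员工的姓名), the intermediate SQL was generated correctly. The table names in this SQL are the knowledge-base IDs from the configuration: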

2025-10-17T03:25:44.730Z INFO 47 --- genie-backend 48.237:1601/... com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e SSE数据结果:{"code": 200, "data": {"query": "通过员工信息表和考勤记录表,获取每个员工的迟到情况", "nl2sql": "SELECT `t_employeeinfo`.`EMPLOYEE_ID`, `t_employeeinfo`.`FULL_NAME`, COUNT(CASE WHEN CAST(`t_employeeattendance`.`CLOCK_IN_TIME` AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(`t_employeeattendance`.`CLOCK_IN_TIME` AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS `late_count` FROM `t_employeeinfo` JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` GROUP BY `t_employeeinfo`.`EMPLOYEE_ID`, `t_employeeinfo`.`FULL_NAME`"}, "request_id": "b44bf49f-e299-4c17-8e72-90b991325a6e", "status": "data", "error_msg": ""}

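After the table-name substitution, however, every source table had been replaced with the same table. The query failed to execute and the system replied that it could not answer the question: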

2025-10-17T03:25:46.329Z INFO 47 --- genie-backend exe-pool-1 com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e 执行sql:SELECT employee_info.EMPLOYEE_ID, employee_info.FULL_NAME, COUNT(CASE WHEN CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM employee_info JOIN employee_info ON employee_info.EMPLOYEE_ID = employee_info.EMPLOYEE_ID GROUP BY employee_info.EMPLOYEE_ID, employee_info.FULL_NAME

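Root cause: in com.jd.genie.service.Nl2SqlService, the tableName used during substitution was never updated from one model to the next, so every knowledge-base ID in the SQL was replaced with the same physical table name.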

Fix: update tableName on every iteration of the replacement loop, so that each model ID is replaced with its own table.
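The difference between the two behaviours can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the actual Java code in Nl2SqlService: the function names are invented, the way the stale table name arises is an assumption, and only the model-ID-to-table mapping is taken from the logs in this post.

```python
from typing import Dict


def replace_model_ids_buggy(sql: str, models: Dict[str, str]) -> str:
    """Assumed shape of the bug: the table name is read once and never updated."""
    table_name = next(iter(models.values()))  # stale value reused for every model
    for model_id in models:
        sql = sql.replace(f"`{model_id}`", table_name)
    return sql


def replace_model_ids_fixed(sql: str, models: Dict[str, str]) -> str:
    """Fixed version: the table name is refreshed on every loop iteration."""
    for model_id, table_name in models.items():
        sql = sql.replace(f"`{model_id}`", table_name)
    return sql


# Model ID -> physical table, as they appear in the logs.
MODELS = {
    "t_employeeinfo": "employee_info",
    "t_salaryinfo": "salary_info",
    "t_employeeattendance": "employee_attendance",
}

SQL = (
    "SELECT DISTINCT `t_employeeinfo`.`FULL_NAME` FROM `t_employeeinfo` "
    "INNER JOIN `t_employeeattendance` "
    "ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID`"
)

buggy_sql = replace_model_ids_buggy(SQL, MODELS)
fixed_sql = replace_model_ids_fixed(SQL, MODELS)
print(buggy_sql)
print(fixed_sql)
```

With the stale name, the join collapses into `employee_info INNER JOIN employee_info`, which matches the failing SQL in the log above; with the per-iteration update, each ID maps to its own table. Matching on the backtick-quoted ID also keeps one ID from accidentally matching inside a longer one.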

Result after the fix:

2025-10-17T08:31:30.005Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE数据结果:{"code": 200, "data": [{"query": "获取考勤打卡迟到员工的姓名和部门", "nl2sql": "SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')"}], "request_id": "86f76794-e83c-42b3-9958-53904409e6f0", "status": "data", "error_msg": ""}
2025-10-17T08:31:30.018Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE 连接关闭
2025-10-17T08:31:30.025Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0 sse event count:208
2025-10-17T08:31:30.090Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始替换modelname:共有model[5]个
2025-10-17T08:31:30.091Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.093Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.111Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.112Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.c.JdbcConnectionPools          : 从缓存获取连接池 poolId genie-datasource
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.provider.jdbc.JdbcDataProvider   : jdbc执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.114Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.connection.ConnectionWrapper   : 获取数据库链接成功 poolId:genie-datasource
2025-10-17T08:31:30.373Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 查询sql结果大小:5