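Group by sql_finger_md5 and, within each group, take the row with the largest query_time_ms.
The any() function returns the first value it encounters. So we can sort the data by query_time_ms first, then combine GROUP BY with any() to take the first row of each group, which is the row with the maximum value.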
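```sql
-- For each sql_finger_md5, keep the row with the largest query_time_ms.
-- The inner query sorts rows by query_time_ms DESC within each fingerprint;
-- the outer GROUP BY then uses any() to take the first value it encounters.
-- This relies on the subquery's ORDER BY being seen by any(), which
-- ClickHouse only promises "in some cases" (see the note on any() below).
SELECT
    any(time)              AS time,
    any(query_time_ms)     AS query_time_ms,
    any(sqltext)           AS sqltext,
    any(inst_id)           AS inst_id,
    any(inst_name)         AS inst_name,
    any(dbname)            AS dbname,
    any(host_address)      AS host_address,
    any(lock_times)        AS lock_times,
    any(parse_row_counts)  AS parse_row_counts,
    any(return_row_counts) AS return_row_counts,
    any(sql_finger)        AS sql_finger,
    sql_finger_md5,
    any(sqltext_md5)       AS sqltext_md5
FROM
(
    SELECT
        inst_id,
        inst_name,
        dbname,
        execution_start_time AS time,
        host_address,
        query_time_ms,
        sqltext,
        lock_times,
        parse_row_counts,
        return_row_counts,
        sql_finger,
        sql_finger_md5,
        sqltext_md5
    FROM cmdb.rds_all_slow_sql_record_distributed
    WHERE
        -- Unix timestamps spanning exactly 7 days (604800 s)
        (
            execution_start_time >= toDateTime(1711079437)
            AND execution_start_time <= toDateTime(1711684237)
        )
        AND (dbname = 'leopard_admin')
        -- Exclude rows whose host_address mentions root or bi_user
        AND host_address NOT LIKE '%root%'
        AND host_address NOT LIKE '%bi_user%'
        -- LIKE is case-sensitive: upper-case 'INSERT' is not filtered out here
        AND sqltext NOT LIKE '%insert%'
    -- Within each fingerprint, put the slowest execution first
    ORDER BY sql_finger_md5 DESC, query_time_ms DESC
) AS a
GROUP BY a.sql_finger_md5
ORDER BY query_time_ms DESC
LIMIT 1000
```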
Window functions can perform poorly on large data volumes, and ClickHouse offers other ways to handle this case, such as the any() and anyLast() functions.
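For comparison, here is a minimal sketch of both approaches on a small made-up dataset built with ClickHouse's values() table function (the fingerprints, timings and SQL texts are hypothetical). The first query is the window-function version: number the rows of each fingerprint from slowest to fastest with row_number() and keep rank 1; it assumes a ClickHouse release with window-function support (older releases needed an experimental setting). The second query uses anyLast() over an ascending sort, so the last value encountered in each group belongs to the slowest execution. Like any(), anyLast() depends on the order in which rows reach the aggregate.

```sql
-- Window-function version: rank the rows of each fingerprint from slowest
-- to fastest and keep only the first one.
SELECT sql_finger_md5, query_time_ms, sqltext
FROM
(
    SELECT
        sql_finger_md5,
        query_time_ms,
        sqltext,
        row_number() OVER (PARTITION BY sql_finger_md5 ORDER BY query_time_ms DESC) AS rn
    FROM values('sql_finger_md5 String, query_time_ms UInt32, sqltext String',
                ('a', 120, 'select 1'), ('a', 900, 'select 2'), ('b', 50, 'select 3'))
)
WHERE rn = 1
ORDER BY query_time_ms DESC;

-- anyLast() version: sort ascending so that the slowest row of each
-- fingerprint is the last one encountered (same ordering caveat as any()).
SELECT
    sql_finger_md5,
    anyLast(query_time_ms) AS max_query_time_ms,
    anyLast(sqltext)       AS slowest_sqltext
FROM
(
    SELECT sql_finger_md5, query_time_ms, sqltext
    FROM values('sql_finger_md5 String, query_time_ms UInt32, sqltext String',
                ('a', 120, 'select 1'), ('a', 900, 'select 2'), ('b', 50, 'select 3'))
    ORDER BY sql_finger_md5, query_time_ms ASC
)
GROUP BY sql_finger_md5
ORDER BY max_query_time_ms DESC;
```

The window-function version has a well-defined result regardless of execution order; what it costs is the sort and ranking per partition, which is where the performance concern on large tables comes from.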
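According to the official documentation, any() "selects the first encountered value", which looks like exactly what we need here. However, the documentation also notes that a query can be executed in any order, and even in a different order each time (just as our earlier `select * from user_order` returned its rows in different orders), so the result of this function is indeterminate. To get a deterministic result, use min or max instead. Alternatively, the documentation adds that the execution order can be relied on in some cases, namely when the SELECT reads from a subquery that uses ORDER BY, which is what the query above does.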
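In the spirit of the min/max suggestion, one order-independent alternative is ClickHouse's argMax(arg, val), which returns arg from the row with the largest val. This is a swapped-in technique, not the method used above. The sketch runs on hypothetical rows; in the real query, the FROM and WHERE would be the ones on cmdb.rds_all_slow_sql_record_distributed. Packing the wanted columns into one tuple keeps them from the same row, whereas separate argMax calls could pick different rows when several rows tie on the maximum query_time_ms (ClickHouse does not define which tied row argMax returns).

```sql
-- argMax picks the tuple from the row with the largest query_time_ms in each
-- group, without relying on any sort order (ties excepted).
-- Hypothetical rows stand in for cmdb.rds_all_slow_sql_record_distributed.
SELECT
    sql_finger_md5,
    max_query_time_ms AS query_time_ms,
    top_row.1         AS sqltext,
    top_row.2         AS host_address
FROM
(
    SELECT
        sql_finger_md5,
        max(query_time_ms) AS max_query_time_ms,
        -- One tuple keeps sqltext and host_address from the same row
        argMax(tuple(sqltext, host_address), query_time_ms) AS top_row
    FROM values('sql_finger_md5 String, query_time_ms UInt32, sqltext String, host_address String',
                ('a', 120, 'select 1', 'app1'),
                ('a', 900, 'select 2', 'app2'),
                ('b', 50,  'select 3', 'app3'))
    GROUP BY sql_finger_md5
)
ORDER BY query_time_ms DESC
LIMIT 1000;
-- expected rows: ('a', 900, 'select 2', 'app2'), ('b', 50, 'select 3', 'app3')
```

The aliases inside the subquery (max_query_time_ms, top_row) deliberately differ from the source column names so that ClickHouse does not resolve a column reference to an alias. Because max() and argMax() are ordinary aggregates, the result no longer depends on the subquery's ORDER BY surviving distributed or parallel execution.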
References: