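1. Symptoms
- The application accepts submitted task requests and runs a Job for each one
- Tasks got stuck and stopped executing
- The application's error log showed the following error: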
bash
scheduleThread error:
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
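2. Investigation
2.1 Raise the database lock timeouts (no effect)
sql
-- InnoDB row-lock wait timeout in seconds (default 50)
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
-- Metadata-lock wait timeout in seconds (default 31536000, i.e. one year; not a transaction timeout)
SHOW VARIABLES LIKE 'lock_wait_timeout';
- Temporary adjustment (lost on restart)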
sql
SET GLOBAL innodb_lock_wait_timeout = 120;
SET GLOBAL lock_wait_timeout = 120;
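One detail about the statements above: `SET GLOBAL` changes the default only for connections opened afterwards, so sessions that already exist keep their old value. A minimal sketch for adjusting and checking the current session, plus the MySQL 8.0+ `SET PERSIST` form that survives a restart (120 is simply the example value used above):

```sql
-- Change the value for the current session only
SET SESSION innodb_lock_wait_timeout = 120;

-- Compare the session value with the global default
SELECT @@SESSION.innodb_lock_wait_timeout AS session_value,
       @@GLOBAL.innodb_lock_wait_timeout  AS global_value;

-- MySQL 8.0+ only: keep the global value across restarts
SET PERSIST innodb_lock_wait_timeout = 120;
```

A longer timeout only makes waiters wait longer. It does not release a lock held by a transaction that never commits, which is consistent with this step having no effect.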
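2.2 Check lock waits (2 waits found)
sql
-- List current InnoDB locks and lock waits
-- These two tables exist only up to MySQL 5.7; MySQL 8.0+ uses different tables, see section 4.3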
SELECT * FROM information_schema.INNODB_LOCKS;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;
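On MySQL 8.0 and later, `information_schema.INNODB_LOCKS` and `INNODB_LOCK_WAITS` no longer exist; the equivalent data lives in `performance_schema`. A sketch of the corresponding queries, with a hand-picked subset of columns:

```sql
-- MySQL 8.0+: locks currently held or requested
SELECT ENGINE_TRANSACTION_ID, THREAD_ID, OBJECT_SCHEMA, OBJECT_NAME,
       INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA
FROM performance_schema.data_locks;

-- MySQL 8.0+: which lock request is blocked by which lock
SELECT REQUESTING_ENGINE_TRANSACTION_ID, REQUESTING_THREAD_ID,
       BLOCKING_ENGINE_TRANSACTION_ID, BLOCKING_THREAD_ID
FROM performance_schema.data_lock_waits;
```

Note that `THREAD_ID` in these tables is the Performance Schema thread ID, not the connection ID shown by `SHOW PROCESSLIST`; `performance_schema.threads.PROCESSLIST_ID` maps between the two.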
2.3 List all current threads (many threads, each idle for more than 7,000 seconds)
sql
-- List all connection threads
SHOW PROCESSLIST;
-- Or, with more detail
SELECT * FROM information_schema.PROCESSLIST;
| process_id | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
|---|---|---|---|---|---|---|---|
| 1744 | root | 10.10.20.45:62890 | rouyi-vue-plus | Sleep | 7325 | NULL | |
| 1757 | root | 10.10.20.45:62937 | rouyi-vue-plus | Sleep | 7326 | NULL | |
| 1778 | root | 10.10.20.45:63581 | rouyi-vue-plus | Sleep | 7344 | NULL | |
| 1739 | root | 10.10.20.45:62870 | rouyi-vue-plus | Sleep | 7422 | NULL | |
| 1759 | root | 10.10.20.45:62951 | rouyi-vue-plus | Sleep | 7344 | NULL | |
| 1756 | root | 10.10.20.45:62929 | rouyi-vue-plus | Sleep | 7344 | NULL | |
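2.4 Find threads related to lock waits (returned a large number of threads)
sql
-- Find threads in a lock-related state, plus connections idle (Sleep) for more than 60 seconds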
SELECT
p.ID as process_id,
p.USER,
p.HOST,
p.DB,
p.COMMAND,
p.TIME,
p.STATE,
p.INFO
FROM information_schema.PROCESSLIST p
WHERE p.STATE LIKE '%lock%'
OR (p.COMMAND = 'Sleep' AND p.TIME > 60);
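The query above returns every idle connection, which is why it produces so many rows. What matters for a lock wait is an idle connection that still has an open transaction. A sketch that narrows the list by joining against `information_schema.INNODB_TRX` (the column aliases are illustrative):

```sql
-- Idle connections that still hold an open InnoDB transaction
SELECT
  p.ID AS process_id,
  p.USER,
  p.HOST,
  p.COMMAND,
  p.TIME AS idle_seconds,
  t.trx_id,
  t.trx_state,
  t.trx_started,
  TIMESTAMPDIFF(SECOND, t.trx_started, NOW()) AS trx_age_seconds,
  t.trx_rows_locked
FROM information_schema.INNODB_TRX t
INNER JOIN information_schema.PROCESSLIST p
  ON p.ID = t.trx_mysql_thread_id
WHERE p.COMMAND = 'Sleep'
ORDER BY t.trx_started;
```

The oldest transaction sorts first; a session that has been asleep for a long time with `trx_rows_locked` greater than zero is a likely blocker.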
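2.5 Pinpoint the blocking thread (one blocker found: thread 1744)
sql
-- Show which transaction is waiting on which (MySQL 5.7 tables; see section 4.3 for MySQL 8.0+)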
SELECT
r.trx_id waiting_trx_id,
r.trx_mysql_thread_id waiting_thread_id,
r.trx_query waiting_query,
b.trx_id blocking_trx_id,
b.trx_mysql_thread_id blocking_thread_id,
b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
| waiting_trx_id | waiting_thread_id | waiting_query | blocking_trx_id | blocking_thread_id | blocking_query |
|---|---|---|---|---|---|
| 487168419 | 1998 | select * from xxl_job_lock where lock_name = 'schedule_lock' for update | 486996078 | 1744 | NULL |
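`blocking_query` is NULL here because `trx_query` only shows a statement that is executing at that moment; thread 1744 was idle between statements while its transaction stayed open. As an alternative that works on both MySQL 5.7 and 8.0, the `sys.innodb_lock_waits` view reports the same relationship and also generates the KILL statement. A sketch selecting a few of its columns:

```sql
-- Waiting and blocking connections, with a ready-made KILL statement
SELECT
  wait_age,
  locked_table,
  waiting_pid,
  waiting_query,
  blocking_pid,
  blocking_query,
  blocking_trx_age,
  sql_kill_blocking_connection
FROM sys.innodb_lock_waits;
```

`waiting_pid` and `blocking_pid` are connection IDs, so they can be compared directly with the `SHOW PROCESSLIST` output.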
2.6 Inspect the blocking thread
sql
-- Details of blocking thread 1744
SELECT
ID as process_id,
USER,
HOST,
DB,
COMMAND,
TIME as time_seconds,
STATE,
INFO
FROM information_schema.PROCESSLIST
WHERE ID = 1744;
2.7 Review the blocking thread's statement history
sql
-- Recent statements executed by that thread
SELECT * FROM performance_schema.events_statements_history
WHERE THREAD_ID IN (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID = 1744)
ORDER BY EVENT_ID DESC;
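The history table keeps only a small number of recent statements per thread (controlled by `performance_schema_events_statements_history_size`), and only if the corresponding consumer is enabled. A sketch that checks the consumer and then returns a more compact view of the same history (the column selection is illustrative):

```sql
-- Check that statement history collection is enabled
SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME LIKE 'events_statements%';

-- Compact view of the thread's recent statements
SELECT
  h.EVENT_ID,
  h.SQL_TEXT,
  ROUND(h.TIMER_WAIT / 1e12, 3) AS duration_seconds,  -- TIMER_WAIT is in picoseconds
  h.ROWS_EXAMINED,
  h.ROWS_SENT
FROM performance_schema.events_statements_history h
INNER JOIN performance_schema.threads t
  ON t.THREAD_ID = h.THREAD_ID
WHERE t.PROCESSLIST_ID = 1744
ORDER BY h.EVENT_ID DESC;
```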
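- Statement history of thread 1744 (summary)
sql
# Statement history of thread 1744 (summary)
# Root cause: event 54 switched the session to manual commit (autocommit=0), and event 55 then took the row lock (select * from xxl_job_lock where lock_name = 'schedule_lock' for update), but the transaction was never committed.
Timeline (newest to oldest):
1. EVENT_ID 55: acquired the schedule_lock row lock (still held; this is the blocker)
2. EVENT_ID 54: set autocommit=0 (starts a transaction)
3. EVENT_ID 53: ran a complex query (scanned 394,197 rows, took 644 seconds)
4. EVENT_ID 52: ran another complex query (scanned 53,388 rows, took 243 seconds)
5. EVENT_ID 51: set autocommit=1 (ends the previous transaction)
6. EVENT_ID 50: commit (commits the transaction)
7. EVENT_ID 49: acquired the schedule_lock row lock earlier
8. EVENT_ID 48: set autocommit=0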
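The pattern in events 54 and 55 can be reproduced with two client sessions. This is a sketch intended for a test instance only; it assumes the `xxl_job_lock` table and the `schedule_lock` row from the query above exist there:

```sql
-- Session A (plays the role of thread 1744)
SET autocommit = 0;
SELECT * FROM xxl_job_lock WHERE lock_name = 'schedule_lock' FOR UPDATE;
-- No COMMIT or ROLLBACK follows: the row lock stays held while the session sits idle

-- Session B (plays the role of the scheduler thread)
SELECT * FROM xxl_job_lock WHERE lock_name = 'schedule_lock' FOR UPDATE;
-- Blocks, then fails after innodb_lock_wait_timeout seconds with
-- ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

-- Back in session A: ending the transaction releases the lock
COMMIT;
```

While session A is idle it appears in the process list as `Sleep` with no running query, which matches what was observed for thread 1744.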
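3. Resolution
- Kill connection 1744 (it was not created by the application's normal service calls; it was a terminal session opened by a developer)
sql
-- Kill the blocking connection (1744)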
KILL 1744;
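`KILL` without a modifier is the same as `KILL CONNECTION`. The distinction matters in this case: `KILL QUERY` only interrupts a statement that is currently running, so it would not release a lock held by an idle session. A short sketch contrasting the two, followed by a check that the transaction is gone:

```sql
-- Interrupts only the statement currently running on the connection;
-- an idle session with an open transaction keeps its locks
KILL QUERY 1744;

-- Terminates the connection itself; its open transaction is rolled back
KILL CONNECTION 1744;

-- Should return no rows once the rollback has finished
SELECT trx_id, trx_state, trx_started
FROM information_schema.INNODB_TRX
WHERE trx_mysql_thread_id = 1744;
```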
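- After connection 1744 was killed, the blockage cleared
sql
-- Verify that the blockage is gone (an empty result means no lock waits remain)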
SELECT
r.trx_id waiting_trx_id,
r.trx_mysql_thread_id waiting_thread,
b.trx_mysql_thread_id blocking_thread
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
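4. Supplement
4.1 Find long-running queries
sql
-- Find threads whose current state has lasted more than 60 seconds (idle Sleep connections also match)
SELECT
'Long-running queries' as title;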
SELECT
ID as process_id,
USER,
HOST,
DB,
COMMAND,
TIME,
STATE,
LEFT(INFO, 100) as query_snippet
FROM information_schema.PROCESSLIST
WHERE TIME > 60
ORDER BY TIME DESC;
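Because `TIME` is the number of seconds a thread has spent in its current state, the query above also returns connections that are merely idle. A variant that keeps only threads doing work, offered as a sketch (the list of excluded `COMMAND` values is a reasonable starting point, not an exhaustive one):

```sql
-- Long-running statements only: skip idle, background and replication-dump threads
SELECT
  ID AS process_id,
  USER,
  HOST,
  DB,
  COMMAND,
  TIME,
  STATE,
  LEFT(INFO, 100) AS query_snippet
FROM information_schema.PROCESSLIST
WHERE TIME > 60
  AND COMMAND NOT IN ('Sleep', 'Daemon', 'Binlog Dump', 'Binlog Dump GTID')
ORDER BY TIME DESC;
```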
4.2 List all threads
sql
-- General diagnostic script
SELECT
'Current process state' as title;
SHOW PROCESSLIST;
4.3 List all lock waits
sql
-- Using the sys schema
SELECT * FROM sys.innodb_lock_waits;
-- Detailed view
-- MySQL 5.7 query
SELECT
'Lock waits' as title;
SELECT
r.trx_id waiting_trx_id,
r.trx_mysql_thread_id waiting_thread,
r.trx_query waiting_query,
b.trx_id blocking_trx_id,
b.trx_mysql_thread_id blocking_thread,
b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
-- MySQL 8.0 and later (including 8.4)
SELECT
'Lock waits' as title;
SELECT
r.trx_id waiting_trx_id,
r.trx_mysql_thread_id waiting_thread,
r.trx_query waiting_query,
b.trx_id blocking_trx_id,
b.trx_mysql_thread_id blocking_thread,
b.trx_query blocking_query
FROM performance_schema.data_lock_waits w
INNER JOIN information_schema.innodb_trx b
ON b.trx_id = w.blocking_engine_transaction_id
INNER JOIN information_schema.innodb_trx r
ON r.trx_id = w.requesting_engine_transaction_id;
4.4 View the most recent deadlock
sql
SHOW ENGINE INNODB STATUS;
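What to look for
- TRANSACTIONS section
  - LOCK WAIT: the transaction is waiting for a lock
  - lock_mode X: the lock being requested is exclusive
  - waiting: the lock request has not been granted yet
  - TRX HAS BEEN WAITING 5 SEC: how long the transaction has been waiting
- LATEST DETECTED DEADLOCK section (the most recent deadlock)
- Lock types commonly seen in the output
  - lock_mode X: exclusive lock (write lock)
  - lock_mode S: shared lock (read lock)
  - locks rec but not gap: record lock
  - locks gap before rec: gap lock
  - lock_mode X or lock_mode S with no further qualifier: next-key lock (gap lock + record lock)
  - waiting: the transaction is waiting for this lock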
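By default the TRANSACTIONS section lists few lock details, and LATEST DETECTED DEADLOCK keeps only the most recent deadlock. Two server variables give more to work with. A minimal sketch, assuming you have the privileges to set global variables:

```sql
-- Include per-transaction lock details in SHOW ENGINE INNODB STATUS
SET GLOBAL innodb_status_output_locks = ON;
SHOW ENGINE INNODB STATUS;
-- Turn the extra output off again when done
SET GLOBAL innodb_status_output_locks = OFF;

-- Write every deadlock to the MySQL error log, not just the latest one
SET GLOBAL innodb_print_all_deadlocks = ON;
```

Both settings are dynamic and revert to their configured values after a server restart.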