Troubleshooting a MySQL Deadlock Problem

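1. Symptoms

  1. The application's business logic: it accepts submitted task requests and then runs a Job.
  2. Tasks got stuck and stopped executing.
  3. The application error log showed the following error:
scheduleThread error: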
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction

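2. Investigation

2.1 Try increasing the database lock timeout (no effect)

-- Show the InnoDB row-lock wait timeout (default: 50 seconds)
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
-- Show the metadata-lock wait timeout (default: 31536000 seconds, i.e. one year)
SHOW VARIABLES LIKE 'lock_wait_timeout';
  • Temporary adjustment (lost after a server restart)
SET GLOBAL innodb_lock_wait_timeout = 120;
SET GLOBAL lock_wait_timeout = 120;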
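
A longer timeout only makes waiters wait longer; it does not release the lock they are waiting for. In addition, SET GLOBAL changes the default for connections opened afterwards, so connections already open (for example in a connection pool) keep their old session value. Whether that played a role in this incident is an assumption, but the sketch below shows how the two scopes can be compared:

```sql
-- Compare the server-wide default with the value the current session actually uses.
SELECT @@GLOBAL.innodb_lock_wait_timeout  AS global_timeout_seconds,
       @@SESSION.innodb_lock_wait_timeout AS session_timeout_seconds;

-- Change the value for the current session only.
SET SESSION innodb_lock_wait_timeout = 120;
```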

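2.2 Query the lock waits (2 waits were found)

-- Show the locks involved and the waits between them
-- These two tables exist in MySQL 5.7 only; MySQL 8.0 removed them, see section 4 ("Supplement") of this article
SELECT * FROM information_schema.INNODB_LOCKS;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;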
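
On MySQL 8.0 and later, the same information lives in performance_schema. The following is a minimal sketch of the equivalent lookups; the column selection is only a suggestion. Note that THREAD_ID in these tables is the performance_schema thread id, not the connection id shown by SHOW PROCESSLIST.

```sql
-- MySQL 8.0+: locks currently held or requested by InnoDB transactions.
SELECT ENGINE_TRANSACTION_ID, THREAD_ID, OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME,
       LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA
FROM performance_schema.data_locks;

-- MySQL 8.0+: which transaction is waiting on which other transaction.
SELECT REQUESTING_ENGINE_TRANSACTION_ID, BLOCKING_ENGINE_TRANSACTION_ID
FROM performance_schema.data_lock_waits;
```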

2.3 List all current threads (many threads, each idle for more than 7000 seconds)

-- Show all connection threads
SHOW PROCESSLIST;
-- Or, for more detail
SELECT * FROM information_schema.PROCESSLIST;
process_id  USER  HOST               DB              COMMAND  TIME  STATE  INFO
1744        root  10.10.20.45:62890  rouyi-vue-plus  Sleep    7325         NULL
1757        root  10.10.20.45:62937  rouyi-vue-plus  Sleep    7326         NULL
1778        root  10.10.20.45:63581  rouyi-vue-plus  Sleep    7344         NULL
1739        root  10.10.20.45:62870  rouyi-vue-plus  Sleep    7422         NULL
1759        root  10.10.20.45:62951  rouyi-vue-plus  Sleep    7344         NULL
1756        root  10.10.20.45:62929  rouyi-vue-plus  Sleep    7344         NULL

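2.4 Find the threads related to lock waits (returned a pile of threads)

-- Find threads whose state mentions a lock, plus connections idle for more than 60 seconds
SELECT 
    p.ID as process_id,
    p.USER,
    p.HOST, 
    p.DB,
    p.COMMAND,
    p.TIME,
    p.STATE,
    p.INFO
FROM information_schema.PROCESSLIST p
WHERE p.STATE LIKE '%lock%' 
   OR (p.COMMAND = 'Sleep' AND p.TIME > 60);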
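
The query above returns every idle connection, which is why the result is so noisy. One way to narrow it, sketched here as a suggestion rather than as the step actually taken in this investigation, is to keep only the idle connections that still own an open InnoDB transaction, since only those can be holding row locks:

```sql
-- Idle connections that still own an open InnoDB transaction, oldest first.
SELECT
    t.trx_id,
    t.trx_state,
    t.trx_started,
    TIMESTAMPDIFF(SECOND, t.trx_started, NOW()) AS trx_age_seconds,
    t.trx_rows_locked,
    p.ID   AS process_id,
    p.USER,
    p.HOST,
    p.TIME AS idle_seconds
FROM information_schema.INNODB_TRX t
INNER JOIN information_schema.PROCESSLIST p
    ON p.ID = t.trx_mysql_thread_id
WHERE p.COMMAND = 'Sleep'
ORDER BY t.trx_started;
```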

2.5 Pinpoint the blocking thread (one blocking thread was found: 1744)

-- Show who is waiting on whom (MySQL 5.7 syntax)
SELECT 
    r.trx_id waiting_trx_id,
    r.trx_mysql_thread_id waiting_thread_id,
    r.trx_query waiting_query,
    b.trx_id blocking_trx_id,
    b.trx_mysql_thread_id blocking_thread_id,
    b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
waiting_trx_id:     487168419
waiting_thread_id:  1998
waiting_query:      select * from xxl_job_lock where lock_name = 'schedule_lock' for update
blocking_trx_id:    486996078
blocking_thread_id: 1744
blocking_query:     NULL
A blocking_query of NULL means the blocking connection is not executing any statement right now: it is idle inside a transaction that is still open.

2.6 Look at the blocking thread in detail

-- Show the details of blocking thread 1744
SELECT 
    ID as process_id,
    USER,
    HOST,
    DB,
    COMMAND,
    TIME as time_seconds,
    STATE,
    INFO
FROM information_schema.PROCESSLIST 
WHERE ID = 1744;

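2.7 Look at the statement history of the blocking thread

-- Show the statement history of that thread (performance_schema keeps only the most recent statements per thread)
SELECT * FROM performance_schema.events_statements_history 
WHERE THREAD_ID IN (SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID = 1744)
ORDER BY EVENT_ID DESC;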
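
SELECT * on this table returns many columns. A narrower variant, sketched below, keeps only the fields needed to build a timeline: the statement text, the rows examined, and the duration. It assumes the events_statements_history consumer is enabled; TIMER_WAIT is reported in picoseconds, hence the division.

```sql
-- Statement timeline for connection 1744, newest first.
SELECT
    h.EVENT_ID,
    h.SQL_TEXT,
    h.ROWS_EXAMINED,
    h.TIMER_WAIT / 1000000000000 AS duration_seconds  -- picoseconds to seconds
FROM performance_schema.events_statements_history h
INNER JOIN performance_schema.threads t
    ON t.THREAD_ID = h.THREAD_ID
WHERE t.PROCESSLIST_ID = 1744
ORDER BY h.EVENT_ID DESC;
```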
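  • Statement history of thread 1744 (summary)
# Root cause of the blockage: event 54 switched the session to manual commit (autocommit=0), event 55 then took the row lock (select * from xxl_job_lock where lock_name = 'schedule_lock' for update), and the transaction was never committed.
Timeline (newest first):
1. EVENT_ID 55: acquires the schedule_lock lock (the current blocking state)
2. EVENT_ID 54: sets autocommit=0 (starts a transaction)
3. EVENT_ID 53: runs a complex query (scanned 394,197 rows, took 644 seconds!)
4. EVENT_ID 52: runs another complex query (scanned 53,388 rows, took 243 seconds)
5. EVENT_ID 51: sets autocommit=1 (commits the previous transaction)
6. EVENT_ID 50: commit (commits the transaction)
7. EVENT_ID 49: an earlier acquisition of the schedule_lock lock
8. EVENT_ID 48: sets autocommit=0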
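
The pattern in events 54 and 55 can be reproduced with two client sessions. This is a minimal sketch for a test environment, assuming the xxl_job_lock table contains a row with lock_name = 'schedule_lock'; the "Session A" and "Session B" labels are illustrative and each part has to be run in its own connection.

```sql
-- Session A (plays the role of thread 1744)
SET autocommit = 0;
SELECT * FROM xxl_job_lock WHERE lock_name = 'schedule_lock' FOR UPDATE;
-- No COMMIT or ROLLBACK follows: the row lock stays held while the
-- connection shows up as Sleep in the process list.

-- Session B (plays the role of the scheduler thread)
SET autocommit = 0;
SELECT * FROM xxl_job_lock WHERE lock_name = 'schedule_lock' FOR UPDATE;
-- Blocks here; after innodb_lock_wait_timeout seconds it fails with
-- ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

-- Session A: ending the transaction releases the lock,
-- so a retry in session B succeeds.
COMMIT;
```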

3. Resolution

  1. Kill connection 1744 (it was not created by normal service calls; it was a terminal session opened by a developer)
-- KILL the blocking connection (1744)
KILL 1744;
  2. After connection 1744 was killed, the blockage cleared
-- Verify that the blockage is gone (MySQL 5.7 syntax; an empty result means no lock waits remain)
SELECT 
    r.trx_id waiting_trx_id,
    r.trx_mysql_thread_id waiting_thread,
    b.trx_mysql_thread_id blocking_thread
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
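
As an additional check, sketched here as a suggestion, the transaction list can be queried for the killed connection. KILL rolls back the open transaction, so the row may briefly show a ROLLING BACK state before it disappears.

```sql
-- Any transaction still owned by connection 1744?
-- An empty result means the transaction is gone and its locks are released.
SELECT trx_id, trx_state, trx_started, trx_rows_locked
FROM information_schema.INNODB_TRX
WHERE trx_mysql_thread_id = 1744;
```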

4. Supplement

4.1 Find long-running queries

-- Find threads that have been in their current state for more than 60 seconds
SELECT 
    'Long-running queries' as title;
SELECT 
    ID as process_id,
    USER,
    HOST,
    DB,
    COMMAND,
    TIME,
    STATE,
    LEFT(INFO, 100) as query_snippet
FROM information_schema.PROCESSLIST
WHERE TIME > 60
ORDER BY TIME DESC;

4.2 List all threads

-- General diagnostic script
SELECT 
    'Current process status' as title;
SHOW PROCESSLIST;

4.3 Show all lock waits

-- Query through the sys schema
SELECT * FROM sys.innodb_lock_waits;

-- Detailed view
-- MySQL 5.7 query
SELECT 
    'Lock waits' as title;
SELECT 
    r.trx_id waiting_trx_id,
    r.trx_mysql_thread_id waiting_thread,
    r.trx_query waiting_query,
    b.trx_id blocking_trx_id,
    b.trx_mysql_thread_id blocking_thread,
    b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

-- MySQL 8.0 and later (for example 8.4) query
SELECT 
    'Lock waits' as title;
SELECT
  r.trx_id waiting_trx_id,
  r.trx_mysql_thread_id waiting_thread,
  r.trx_query waiting_query,
  b.trx_id blocking_trx_id,
  b.trx_mysql_thread_id blocking_thread,
  b.trx_query blocking_query
FROM performance_schema.data_lock_waits w
INNER JOIN information_schema.innodb_trx b
  ON b.trx_id = w.blocking_engine_transaction_id
INNER JOIN information_schema.innodb_trx r
  ON r.trx_id = w.requesting_engine_transaction_id;
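
The sys.innodb_lock_waits view works on both MySQL 5.7 and 8.0+ and already joins the waiting and blocking sides. A narrower selection, sketched below, is usually enough for a first look; the choice of columns is only a suggestion. The sql_kill_blocking_connection column contains a ready-made KILL statement for the blocking connection.

```sql
-- One row per lock wait, longest-waiting first.
SELECT
    wait_started,
    wait_age,
    locked_table,
    waiting_pid,
    waiting_query,
    blocking_pid,
    blocking_query,
    sql_kill_blocking_connection
FROM sys.innodb_lock_waits
ORDER BY wait_started;
```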

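4.4 Show the most recent deadlock information

SHOW ENGINE INNODB STATUS;

Pay attention to the following parts of the output

  1. TRANSACTIONS (transaction section)
  • LOCK WAIT: the transaction is waiting for a lock
  • lock_mode X: the lock involved is an exclusive lock
  • waiting: the lock request has not been granted yet
  • TRX HAS BEEN WAITING 5 SEC: how long the transaction has been waiting
  2. LATEST DETECTED DEADLOCK (the most recent deadlock)

  3. Lock types

    Lock types that commonly appear in the output

  • lock_mode X: exclusive lock (write lock)
  • lock_mode S: shared lock (read lock)
  • locks rec but not gap: record lock
  • locks gap before rec: gap lock
  • lock_mode X or lock_mode S with neither of the two qualifiers above: next-key lock (gap lock + record lock)
  • waiting: the lock is being waited for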
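
The LATEST DETECTED DEADLOCK section only keeps the most recent deadlock, and a lock wait timeout like the one in this article is not a deadlock in InnoDB's sense, so it does not appear there at all. If real deadlocks need to be tracked over time, one option, sketched here as a suggestion, is to have InnoDB write every deadlock to the MySQL error log:

```sql
-- Check whether every deadlock is already written to the error log.
SHOW VARIABLES LIKE 'innodb_print_all_deadlocks';

-- Enable it at runtime (lost after a server restart unless also set in the config file).
SET GLOBAL innodb_print_all_deadlocks = ON;
```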