问题发生
某个客户的一套数据库,主库为 MySQL 5.7.44,从库为 GreatSQL 8.0.32-27,从库的 SQL 线程中断,报错如下:
Plain
2026-05-07T21:53:42.151856+08:00 145 [ERROR] [MY-010584] [Repl] Slave SQL for channel '': Worker 1 failed executing transaction '18c68804-62e7-11f1-846e-00163ed55da5:3425315180' at master log binlog.022823, end_log_pos 127908803; Could not execute Write_rows event on table ort.bm_cdr; Writing one row to the row-based binary log failed, Error_code: 1534; handler error HA_ERR_RBR_LOGGING_FAILED; the event's master log FIRST, end_log_pos 627908803, Error_code: MY-001534
2026-05-07T21:53:42.245982+08:00 144 [Warning] [MY-010584] [Repl] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
报错信息表示:主库在回放事务 18c68804-62e7-11f1-846e-00163ed55da5:3425315180 时失败了,失败位置发生在执行 Write_rows 事件写入从库本地 row-based binlog 的过程中,最终导致 SQL 线程终止。
问题排查
看到 Could not execute Write_rows event on table,容易误以为是表数据写入失败或数据冲突。
Writing one row to the row-based binary log failedError_code: 1534handler error HA_ERR_RBR_LOGGING_FAILED
这说明故障与从库写本地 binlog 有关。该从库开启了本地 binlog,并记录复制回放产生的更新,即开启了 log_slave_updates / log_replica_updates。因此,SQL 线程不仅要把主库事务回放到表中,还要把该事务再次写入从库自己的 binlog;如果从库 binlog 事务缓存上限过小,就可能在回放大事务时失败。
对比主库和从库参数后,发现关键差异如下:
Plain
主库 max_binlog_cache_size = 4G
从库 max_binlog_cache_size = 10M
主库允许生成较大的事务 binlog,而从库写本地 binlog 时只允许最多使用 10M 的事务缓存。将从库 max_binlog_cache_size 调整为与主库一致后,再启动 SQL 线程,复制恢复正常。
随后使用 mysqlbinlog 解析故障 GTID 对应的 binlog,并与业务侧确认,该 GTID 确实是一个大事务,事务产生的 binlog 体量远超从库 10M 的 max_binlog_cache_size 限制。
我们知道,通常当max_binlog_cache_size设置过小时,会有 Error_code: MY-001197 的报错,那什么时候会出现仅有 Error_code: MY-001534 的报错呢?本文整理复现过程如下。
实验验证
测试环境参数配置
搭建主从异步复制环境,主库端参数配置如下:
Plain
binlog_cache_size =32768
max_binlog_cache_size =4294967296 #4G
max_binlog_size =1073741824
binlog_row_image =FULL
binlog_rows_query_log_events =ON
从库端参数配置如下:
Plain
binlog_cache_size =32768
max_binlog_cache_size =10485760 #10M
max_binlog_size =1073741824
binlog_row_image =FULL
binlog_rows_query_log_events =ON
log_replica_updates =ON
复现该问题的关键前提是:从库开启本地 binlog,并记录复制回放的更新。如果从库不写本地 binlog,通常不会在 SQL 线程回放时触发本地
max_binlog_cache_size限制。
创建测试表
在主库创建测试表并插入数据
SQL
mysql> CREATE TABLE test.t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, c2 VARCHAR(255), c3 VARCHAR(255), c4 VARCHAR(255), c5 VARCHAR(255));
mysql> INSERT INTO test.t1 VALUES (NULL, REPEAT('a', 250), REPEAT('b', 250), REPEAT('c',250), REPEAT('d', 250));
-- 持续执行如下语句14次
mysql> INSERT INTO test.t1 (c2,c3,c4,c5) SELECT c2,c3,c4,c5 FROM t1;
mysql> SELECT COUNT(*) FROM test.t1;
+----------+
| count(*) |
+----------+
| 16384 |
+----------+
1 row in set (0.04 sec)
实验一:单语句大事务触发错误 1197 + 1534
主库执行如下INSERT语句,插入10319条数据,这时产生的 binlog 未超过从库max_binlog_cache_size=10M的限制,查看主从复制状态没有异常,且主从数据一致。
SQL
mysql> INSERT INTO test.t1 (c2,c3,c4,c5) SELECT c2,c3,c4,c5 FROM t1 LIMIT 10319;
Query OK, 10319 rows affected (0.62 sec)
Records: 10319 Duplicates: 0 Warnings: 0
主库执行如下INSERT语句,插入10320条数据,这时产生的binlog已经超过从库max_binlog_cache_size=10M的限制。
SQL
mysql> INSERT INTO test.t1 (c2,c3,c4,c5) SELECT c2,c3,c4,c5 FROM t1 LIMIT 10320;
Query OK, 10320 rows affected (0.62 sec)
Records: 10320 Duplicates: 0 Warnings: 0
再次查看主从复制状态,SQL 线程已经异常终止,error.log 中打印错误信息如下:
Plain
2026-06-10T11:19:02.366583+08:00 212 [ERROR] [MY-010584] [Repl] Slave SQL for channel '': Worker 1 failed executing transaction '18c68804-62e7-11f1-846e-00163ed55da5:85' at master log binlog.000001, end_log_pos 755113733; Could not execute Write_rows event on table test.t1; Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again, Error_code: 1197; Writing one row to the row-based binary log failed, Error_code: 1534; handler error HA_ERR_RBR_LOGGING_FAILED; the event's master log binlog.000001, end_log_pos 755113733, Error_code: MY-001197
2026-06-10T11:19:02.548790+08:00 211 [Warning] [MY-010584] [Repl] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
可以看到错误日志中明确提示max_binlog_cache_size过小,错误码为1197+1534。此时调大max_binlog_cache_size再 START SLAVE 问题即可解决。
实验二:多语句事务只暴露错误 1534
在主库开启事务,依次执行以下语句:
SQL
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO test.t1 (c2,c3,c4,c5) SELECT c2,c3,c4,c5 FROM t1 LIMIT 10319;
Query OK, 10319 rows affected (0.25 sec)
Records: 10319 Duplicates: 0 Warnings: 0
mysql> INSERT INTO test.t1 VALUES (NULL, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
Query OK, 1 row affected (0.00 sec)
mysql> COMMIT;
Query OK, 0 rows affected (0.06 sec)
此时观察从库主从复制状态,显示 SQL 线程状态异常,error.log 中信息如下,可以看到,只有1534报错码,没有提示max_binlog_cache_size过小的信息。
Plain
2026-06-10T11:34:24.333342+08:00 232 [ERROR] [MY-010584] [Repl] Slave SQL for channel '': Worker 1 failed executing transaction '18c68804-62e7-11f1-846e-00163ed55da5:86' at master log binlog.000001, end_log_pos 771775512; Could not execute Write_rows event on table test.t1; Writing one row to the row-based binary log failed, Error_code: 1534; handler error HA_ERR_RBR_LOGGING_FAILED; the event's master log FIRST, end_log_pos 771775512, Error_code: MY-001534
2026-06-10T11:34:24.537353+08:00 231 [Warning] [MY-010584] [Repl] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
实验结论
以上两种情况,都是由于从库端max_binlog_cache_size设置过小导致SQL线程异常终止。前者是批量插入,binlog Event 总量超过 max_binlog_cache_size 限制时,数据库会抛出明确的 1197 错误(缓存超限),并伴随 1534 错误(单行写入失败)。
后者是最后一个单行插入语句生成的 Row Event 过大,超出了当前 binlog Cache 剩余可用空间,导致这一行无法写入,从而仅触发 1534 错误。虽然 error.log 记录的信息不同,但都属于同一故障。
总结
- 从库也会受 binlog cache 限制。 只要从库开启本地 binlog 并记录复制更新,SQL 线程回放事务时就要写本地 binlog,因此会受从库
max_binlog_cache_size限制。 1534** 不是数据冲突。** 当日志中出现Writing one row to the row-based binary log failed和HA_ERR_RBR_LOGGING_FAILED时,应优先检查 row-based binlog 写入链路,包括max_binlog_cache_size、磁盘空间、临时目录、本地 binlog 配置等。- 主库和从库关键参数应保持基线一致。 尤其是
max_binlog_cache_size、binlog_format、binlog_row_image、max_allowed_packet、lower_case_table_names等会影响复制兼容性或回放行为的参数,建议在变更前后做自动化比对。 - 大事务是根因风险。 调大
max_binlog_cache_size可以恢复复制,但不能消除大事务带来的 binlog 膨胀、复制延迟、回滚成本、备份恢复耗时等问题。批量 DML 应尽量拆分成小事务分批提交。 - 不要混淆
max_binlog_cache_size和max_binlog_size。 前者限制单事务写 binlog 可使用的缓存上限,后者只是 binlog 文件滚动的期望大小。一个大事务不会被拆到多个 binlog 文件中。 - 阈值不要照搬。 本文中 10319/10320 行只是当前表结构、字符集、row image、binlog 配置和版本组合下的临界点。换成其他表结构或参数后,临界点会变化。