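I. Problem scenario
Consider the following scenario on MySQL 8.0.28 with these parameters:
- slave_parallel_type: LOGICAL_CLOCK
- slave_parallel_workers: 4
In the process list below, the worker thread with Id 29 is applying a large transaction. Suppose we find it too slow and issue kill 29. Two questions follow:
- Is the transaction rolled back?
- How do the SQL thread and the worker threads behave?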
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
| 7 | root | localhost | NULL | Query | 0 | init | show processlist |
| 11 | system user | connecting host | NULL | Connect | 776 | Waiting for source to send event | NULL |
| 17 | root | localhost | information_schema | Sleep | 82 | | NULL |
| 28 | system user | | NULL | Query | 32 | Replica has read all relay log; waiting for more updates | NULL |
| 29 | system user | | NULL | Killed | 191 | Applying batch of row changes (delete) | delete from mytest |
| 30 | system user | | NULL | Connect | 32 | Waiting for an event from Coordinator | NULL |
| 31 | system user | | NULL | Connect | 32 | Waiting for an event from Coordinator | NULL |
| 32 | system user | | NULL | Connect | 32 | Waiting for an event from Coordinator | NULL |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
8 rows in set (0.00 sec)
From here on, the killed worker is called worker1, and the remaining idle worker threads are called the other workers.
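II. Brief analysis
After kill is issued against worker1, the thread running the kill sets the killed flag on worker1 and also wakes up whatever condition variable worker1 is currently waiting on. For example, a worker with nothing to do waits in mysql_cond_wait(&worker->jobs_cond, &worker->jobs_lock). Such a worker has to be woken up, because setting the killed flag alone would not let it proceed. In this scenario worker1 is in the middle of applying an event, so it notices the killed flag itself and sets the return value of the event execution to an error.
The next question is whether the transaction should be retried. Before any retry is considered, the changes already made by the transaction must be rolled back. The retry decision is made in Slave_worker::check_and_report_end_of_retries, and the transaction is not retried in either of these two cases:
- A: the error is not a temporary error. Temporary errors are mostly caused by lock conflicts and deadlocks; see the function Slave_reporting_capability::has_temporary_error.
- B: the maximum number of retries has been reached, which is controlled by the parameter slave_transaction_retries.
Here the error comes from killing worker1, so the transaction is not retried; once the rollback completes, the flow continues.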
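The two conditions above can be sketched as a small decision function. This is only an illustration of the logic described here, not the server's implementation: the function name should_retry and the choice of which error codes count as temporary are assumptions made for the sketch, although 1205, 1213 and 1317 are real MySQL error codes (1317 is the one that shows up in the error log when a worker is killed).

```python
# Real MySQL error codes. Treating exactly these two as "temporary"
# is a simplification assumed for this sketch.
ER_LOCK_WAIT_TIMEOUT = 1205
ER_LOCK_DEADLOCK = 1213
ER_QUERY_INTERRUPTED = 1317  # reported when the worker thread is killed

TEMPORARY_ERRORS = {ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK}


def should_retry(error_code: int, tries_done: int, max_retries: int) -> bool:
    """Return True if the rolled-back transaction would be applied again.

    max_retries plays the role of slave_transaction_retries.
    """
    if error_code not in TEMPORARY_ERRORS:  # case A: not a temporary error
        return False
    if tries_done >= max_retries:           # case B: retries exhausted
        return False
    return True


if __name__ == "__main__":
    print(should_retry(ER_LOCK_DEADLOCK, 0, 10))      # deadlock, first attempt
    print(should_retry(ER_QUERY_INTERRUPTED, 0, 10))  # killed worker
    print(should_retry(ER_LOCK_WAIT_TIMEOUT, 10, 10)) # retries used up
```

With these inputs only the first call returns True; the killed worker falls under case A regardless of how many retries remain.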
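Next, worker1 marks the SQL thread (the coordinator) as killed, using the same function that the kill command relies on: c_rli->info_thd->awake(THD::KILL_QUERY). Assume the coordinator has already dispatched all events and is waiting in the state Replica has read all relay log; waiting for more updates. In this state the MTS coordinator does not wait on its condition variable indefinitely. The wait has a timeout, so the coordinator wakes up periodically and checks the kill flag. The interval is tied to the parameter slave_checkpoint_period and defaults to 300 milliseconds. After waking up, the coordinator checks whether the kill flag has been set; see the function sql_slave_killed.
Once the kill flag is set on the coordinator, it stops reading and dispatching events and leaves the main loop of handle_slave_sql:
while (!main_loop_error && !sql_slave_killed(thd, rli))
It then calls slave_stop_workers to wake up the other worker threads so that they exit as well, by setting their status to Slave_worker::STOP. In testing, this status does not roll back the transactions of the other workers. Instead, the SQL thread waits for each of them to finish its current transaction, which differs from the rollback on worker1. While this is happening, the SQL thread may show the state Waiting for workers to exit, as below: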
mysql> show processlist;
+----+-------------+-----------------+--------------------+---------+-------+----------------------------------------+-----------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+--------------------+---------+-------+----------------------------------------+-----------------------+
| 7 | root | localhost | information_schema | Query | 0 | init | show processlist |
| 14 | root | localhost | NULL | Sleep | 1045 | | NULL |
| 15 | system user | connecting host | NULL | Connect | 7086 | Waiting for source to send event | NULL |
| 26 | system user | | NULL | Query | 2048 | Waiting for workers to exit | NULL |
| 27 | system user | | test1010 | Query | 92433 | Applying batch of row changes (delete) | delete from mytestbig |
+----+-------------+-----------------+--------------------+---------+-------+----------------------------------------+-----------------------+
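The coordinator's timed wait described above can be illustrated with a toy model. The sketch below is an assumption-laden analogy in Python rather than the server code: the class Coordinator and the function kill_flag_only are hypothetical names, and a short timeout stands in for the 300 millisecond default. It shows why a waiter with a timeout eventually sees a kill flag even when nobody signals its condition variable.

```python
import threading
import time


class Coordinator:
    """Toy stand-in for the SQL coordinator waiting for more events."""

    def __init__(self, wait_timeout: float):
        self.killed = False
        self.cond = threading.Condition()
        self.wait_timeout = wait_timeout
        self.exited = threading.Event()

    def run(self) -> None:
        with self.cond:
            # Wake up every wait_timeout seconds and re-check the flag,
            # similar in spirit to the periodic sql_slave_killed check.
            while not self.killed:
                self.cond.wait(timeout=self.wait_timeout)
        self.exited.set()  # left the "main loop"


def kill_flag_only(wait_timeout: float = 0.05) -> bool:
    """Set the kill flag without signalling; report whether the loop exited."""
    coord = Coordinator(wait_timeout)
    t = threading.Thread(target=coord.run)
    t.start()
    time.sleep(wait_timeout)   # let the coordinator start waiting
    coord.killed = True        # flag only, deliberately no notify
    t.join(timeout=5.0)
    return coord.exited.is_set()


if __name__ == "__main__":
    print(kill_flag_only())
```

Because the wait is bounded, the loop exits within roughly one timeout period after the flag is set, which matches the behaviour described for the coordinator.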
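So when one worker thread is killed, the kill request is propagated to all worker threads and to the SQL coordinator thread, which triggers their exit. This flow can be simplified into an interaction between several threads.
Of course this only sketches one possible interaction, but the outcome is always the same: every worker thread and the SQL thread (the coordinator) becomes aware of the request to kill worker1. The questions from the beginning can therefore be answered roughly as follows:
- The large transaction on the killed worker1 is rolled back.
- Killing worker1 also affects every MTS thread, including the other workers and the SQL thread (the coordinator).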
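The propagation summarized above can be written down as a small, thread-free simulation. It is only a sketch of the outcome described in this article; the function kill_worker and the outcome strings are hypothetical and do not correspond to server structures.

```python
def kill_worker(workers: dict, victim: str) -> dict:
    """Simulate the outcome of killing one MTS worker.

    workers maps a worker name to True if it is applying a transaction
    and False if it is idle. Returns what happens to each thread.
    """
    outcome = {}

    # 1. The killed worker notices the flag, rolls back any open
    #    transaction, and reports the error.
    outcome[victim] = "rolled back" if workers[victim] else "exited idle"

    # 2. The killed worker then marks the coordinator as killed, and the
    #    coordinator leaves its main loop.
    outcome["coordinator"] = "left main loop"

    # 3. The coordinator sets the remaining workers to STOP. Busy workers
    #    finish their current transaction; idle ones simply exit.
    for name, busy in workers.items():
        if name == victim:
            continue
        outcome[name] = "finished current transaction" if busy else "exited idle"
    return outcome


if __name__ == "__main__":
    state = {"worker1": True, "worker2": True, "worker3": False}
    for thread, result in kill_worker(state, "worker1").items():
        print(thread, "->", result)
```

The asymmetry is the point of the sketch: only the killed worker rolls back, while the other workers are allowed to complete what they were doing before they exit.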
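Large-transaction kill test
If a transaction, for example a large one, is being applied when the worker is killed, the kill triggers a rollback. The test is as follows: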
mysql> show processlist;
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
| 7 | root | localhost | NULL | Query | 0 | init | show processlist |
| 11 | system user | connecting host | NULL | Connect | 757 | Waiting for source to send event | NULL |
| 17 | root | localhost | information_schema | Sleep | 63 | | NULL |
| 28 | system user | | NULL | Query | 13 | Replica has read all relay log; waiting for more updates | NULL |
| 29 | system user | | mytest123 | Query | 172 | Applying batch of row changes (delete) | delete from mytest |
| 30 | system user | | NULL | Connect | 13 | Waiting for an event from Coordinator | NULL |
| 31 | system user | | NULL | Connect | 13 | Waiting for an event from Coordinator | NULL |
| 32 | system user | | NULL | Connect | 13 | Waiting for an event from Coordinator | NULL |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
8 rows in set (0.00 sec)
mysql> kill 29;select * from information_schema.innodb_trx ;show processlist;
Query OK, 0 rows affected (0.00 sec)
*************************** 1. row ***************************
trx_id: 193802
trx_state: ROLLING BACK
trx_started: 2025-10-09 10:40:26
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 46919
trx_mysql_thread_id: 29
trx_query: delete from mytest
trx_operation_state: rollback of SQL statement
trx_tables_in_use: 1
trx_tables_locked: 1
trx_lock_structs: 112
trx_lock_memory_bytes: 25400
trx_rows_locked: 46809
trx_rows_modified: 46807
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 0
trx_is_read_only: 0
trx_autocommit_non_locking: 0
trx_schedule_weight: NULL
1 row in set (0.00 sec)
ERROR:
No query specified
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
| 7 | root | localhost | NULL | Query | 0 | init | show processlist |
| 11 | system user | connecting host | NULL | Connect | 776 | Waiting for source to send event | NULL |
| 17 | root | localhost | information_schema | Sleep | 82 | | NULL |
| 28 | system user | | NULL | Query | 32 | Replica has read all relay log; waiting for more updates | NULL |
| 29 | system user | | NULL | Killed | 191 | Applying batch of row changes (delete) | delete from mytest |
| 30 | system user | | NULL | Connect | 32 | Waiting for an event from Coordinator | NULL |
| 31 | system user | | NULL | Connect | 32 | Waiting for an event from Coordinator | NULL |
| 32 | system user | | NULL | Connect | 32 | Waiting for an event from Coordinator | NULL |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------------------------------+--------------------+
8 rows in set (0.00 sec)
mysql> show processlist;
+----+-------------+-----------------+--------------------+---------+------+----------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------+------------------+
| 7 | root | localhost | NULL | Query | 0 | init | show processlist |
| 11 | system user | connecting host | NULL | Connect | 877 | Waiting for source to send event | NULL |
| 17 | root | localhost | information_schema | Sleep | 183 | | NULL |
+----+-------------+-----------------+--------------------+---------+------+----------------------------------+------------------+
3 rows in set (0.00 sec)
Error log:
2025-10-09T10:40:58.696188+08:00 29 [ERROR] [MY-010584] [Repl] Slave SQL for channel '': Worker 1 failed executing transaction '0da7e8d8-b6a3-11ef-a048-000c29d3a738:25' at master log mysql-bin.000001, end_log_pos 3440856; Could not execute Delete_rows event on table mytest123.mytest; Query execution was interrupted, Error_code: 1317; Got error 168 - 'Unknown (generic) error from engine' from storage engine, Error_code: 1030; handler error HA_ERR_GENERIC; the event's master log FIRST, end_log_pos 3440856, Error_code: MY-001317
2025-10-09T10:41:01.482653+08:00 28 [Warning] [MY-010584] [Repl] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
2025-10-09T10:41:01.482702+08:00 28 [Note] [MY-010596] [Repl] Error reading relay log event for channel '': slave SQL thread was killed
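To read the ordering out of these log lines, a short parser can help. The sketch below is a convenience script written for this article, assuming the 8.0 error log layout shown above (timestamp, thread id, level, error code, subsystem, message); the names LINE_RE, parse_line and seconds_between are hypothetical, and the sample messages are abbreviated copies of the lines above.

```python
import re
from datetime import datetime

# timestamp, thread id, [level], [MY-code], [subsystem], message
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<thread>\d+) \[(?P<level>\w+)\] "
    r"\[(?P<code>MY-\d+)\] \[(?P<subsys>\w+)\] (?P<msg>.*)$"
)


def parse_line(line: str) -> dict:
    """Split one 8.0-style error log line into its fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("unrecognized log line: " + line[:40])
    rec = m.groupdict()
    rec["ts"] = datetime.fromisoformat(rec["ts"])
    rec["thread"] = int(rec["thread"])
    return rec


def seconds_between(first: str, second: str) -> float:
    """Time elapsed between two log lines, in seconds."""
    delta = parse_line(second)["ts"] - parse_line(first)["ts"]
    return delta.total_seconds()


# Abbreviated copies of the worker and coordinator messages above.
SAMPLE = [
    "2025-10-09T10:40:58.696188+08:00 29 [ERROR] [MY-010584] [Repl] "
    "Slave SQL for channel '': Worker 1 failed executing transaction",
    "2025-10-09T10:41:01.482653+08:00 28 [Warning] [MY-010584] [Repl] "
    "Slave SQL for channel '': The slave coordinator and worker threads are stopped",
]

if __name__ == "__main__":
    for line in SAMPLE:
        rec = parse_line(line)
        print(rec["thread"], rec["level"], rec["code"])
    print(seconds_between(SAMPLE[0], SAMPLE[1]))
```

In this log, thread 29 (worker1) reports the error first, and thread 28 (the coordinator) reports the stop a few seconds later. The 5.7 log lines shown later in this article have no error code or subsystem field, so this pattern would not match them.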
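Other notes
When a worker thread is killed under MTS, the error is shown in performance_schema.replication_applier_status_by_worker.
1. If a transaction, for example a large one, is being applied, the kill triggers a rollback.
5.7.22: the SQL thread and all workers exit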
#0 lock_rec_lock (impl=false, mode=1027, block=0x7fffdd0f4dd0, heap_no=2, index=0x7fff50022040, thr=0x7fff50025068) at /opt/mysql-5.7.40/storage/innobase/lock/lock0lock.cc:2048
#1 0x000000000194b3f3 in lock_clust_rec_read_check_and_lock (flags=0, block=0x7fffdd0f4dd0, rec=0x7fffdd804081 "\200", index=0x7fff50022040, offsets=0x7fff9916f600, mode=LOCK_X, gap_mode=1024, thr=0x7fff50025068) at /opt/mysql-5.7.40/storage/innobase/lock/lock0lock.cc:6422
#2 0x0000000001a4709b in sel_set_rec_lock (pcur=0x7fff50024668, rec=0x7fffdd804081 "\200", index=0x7fff50022040, offsets=0x7fff9916f600, mode=3, type=1024, thr=0x7fff50025068, mtr=0x7fff9916f920) at /opt/mysql-5.7.40/storage/innobase/row/row0sel.cc:1262
#3 0x0000000001a50003 in row_search_mvcc (buf=0x7fff5001c580 "\370\001", mode=PAGE_CUR_GE, prebuilt=0x7fff50024450, match_mode=1, direction=0) at /opt/mysql-5.7.40/storage/innobase/row/row0sel.cc:5607
#4 0x00000000018be5da in ha_innobase::index_read (this=0x7fff5001c990, buf=0x7fff5001c580 "\370\001", key_ptr=0x7fff5001c910 "\001", key_len=4, find_flag=HA_READ_KEY_EXACT) at /opt/mysql-5.7.40/storage/innobase/handler/ha_innodb.cc:8809
#5 0x00000000018bf97d in ha_innobase::rnd_pos (this=0x7fff5001c990, buf=0x7fff5001c580 "\370\001", pos=0x7fff5001c910 "\001") at /opt/mysql-5.7.40/storage/innobase/handler/ha_innodb.cc:9361
#6 0x0000000000f17619 in handler::ha_rnd_pos (this=0x7fff5001c990, buf=0x7fff5001c580 "\370\001", pos=0x7fff5001c910 "\001") at /opt/mysql-5.7.40/sql/handler.cc:2997
#7 0x0000000000f26202 in handler::rnd_pos_by_record (this=0x7fff5001c990, record=0x7fff5001c580 "\370\001") at /opt/mysql-5.7.40/sql/handler.h:2912
#8 0x00000000017a832f in Rows_log_event::do_index_scan_and_update (this=0x7fff44163b80, rli=0x7fff44024da0) at /opt/mysql-5.7.40/sql/log_event.cc:10443
#9 0x00000000017aa75c in Rows_log_event::do_apply_event (this=0x7fff44163b80, rli=0x7fff44024da0) at /opt/mysql-5.7.40/sql/log_event.cc:11331
#10 0x00000000017b9940 in Log_event::do_apply_event_worker (this=0x7fff44163b80, w=0x7fff44024da0) at /opt/mysql-5.7.40/sql/log_event.cc:792
#11 0x000000000183337e in Slave_worker::slave_worker_exec_event (this=0x7fff44024da0, ev=0x7fff44163b80) at /opt/mysql-5.7.40/sql/rpl_rli_pdb.cc:1866
#12 0x00000000018356ba in slave_worker_exec_job_group (worker=0x7fff44024da0, rli=0x7078fa0) at /opt/mysql-5.7.40/sql/rpl_rli_pdb.cc:2705
#13 0x000000000180d932 in handle_slave_worker (arg=0x7fff44024da0) at /opt/mysql-5.7.40/sql/rpl_slave.cc:6281
#14 0x0000000001ce8ce0 in pfs_spawn_thread (arg=0x6885e40) at /opt/mysql-5.7.40/storage/perfschema/pfs.cc:2197
#15 0x00007ffff7bbb764 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007ffff60ade2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
5.7.22: the SQL thread and all workers exit
2025-10-09T09:40:09.508628+08:00 2 [ERROR] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2025-10-09T09:40:09.508655+08:00 2 [Note] Error reading relay log event for channel '': slave SQL thread was killed
8.0.23: the SQL thread and all workers exit. The session output and error log are the same as in the large-transaction kill test above.
2. If the worker is blocked by FTWRL, in the state Waiting for global read lock, no rollback is needed.
5.7.22: the SQL thread and all workers exit
2025-10-09T09:47:10.553043+08:00 13 [ERROR] Slave SQL for channel '': Worker 1 failed executing transaction 'eafffa47-bde1-11ef-98ed-000c2922ff1a:26' at master log mysql-bin.000017, end_log_pos 1222; Error executing row event: 'Query execution was interrupted', Error_code: 1317
2025-10-09T09:47:10.553464+08:00 12 [Warning] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2025-10-09T09:47:10.553486+08:00 12 [Note] Error reading relay log event for channel '': slave SQL thread was killed
If the FTWRL session is killed instead, replication behaves normally.
8.0.23: the SQL thread and all workers exit
2025-10-09T10:58:19.176795+08:00 46 [ERROR] [MY-010584] [Repl] Slave SQL for channel '': Worker 1 failed executing transaction '0da7e8d8-b6a3-11ef-a048-000c29d3a738:25' at master log mysql-bin.000001, end_log_pos 2923752; Error executing row event: 'Query execution was interrupted', Error_code: MY-001317
2025-10-09T10:58:19.177417+08:00 45 [Warning] [MY-010584] [Repl] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
2025-10-09T10:58:19.177516+08:00 45 [Note] [MY-010596] [Repl] Error reading relay log event for channel '': slave SQL thread was killed
3. If the killed worker is not the one running the slow transaction
The coordinator thread enters the state Waiting for workers to exit.
| 51 | system user | | NULL | Query | 30 | Waiting for workers to exit | NULL |
| 52 | system user | | NULL | Killed | 52 | Applying batch of row changes (delete) | delete from mytest |
+----+-------------+-----------------+--------------------+---------+-------+----------------------------------------+--------------------+
#0 innobase_rollback_trx (trx=0x7fffe601dbc8) at /opt/source8028/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:5811
#1 0x000000000492cd8d in innobase_close_connection (hton=0xa6b1d50, thd=0x7fff64006070) at /opt/source8028/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:6030
#2 0x000000000362ad97 in closecon_handlerton (thd=0x7fff64006070, plugin=0x7fff5c1ec178) at /opt/source8028/mysql-8.0.28/sql/handler.cc:920
#3 0x00000000032e1d5d in plugin_foreach_with_mask (thd=0x7fff64006070, funcs=0x7fff5c1ec210, type=1, state_mask=4294967287, arg=0x0) at /opt/source8028/mysql-8.0.28/sql/sql_plugin.cc:2691
#4 0x00000000032e1e1d in plugin_foreach_with_mask (thd=0x7fff64006070, func=0x362ad1c <closecon_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x0)
at /opt/source8028/mysql-8.0.28/sql/sql_plugin.cc:2704
#5 0x000000000362ade5 in ha_close_connection (thd=0x7fff64006070) at /opt/source8028/mysql-8.0.28/sql/handler.cc:932
#6 0x00000000031d9a5b in THD::release_resources (this=0x7fff64006070) at /opt/source8028/mysql-8.0.28/sql/sql_class.cc:1347
#7 0x00000000044ec76a in handle_slave_sql (arg=0xa6f0bf0) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:7208
#8 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff580130e0) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#9 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff63788dd in clone () from /lib64/libc.so.6
#0 innobase_rollback_trx (trx=0x7fffe601dfb8) at /opt/source8028/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:5811
#1 0x000000000492cd8d in innobase_close_connection (hton=0xa6b1d50, thd=0x7fff60000d90) at /opt/source8028/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:6030
#2 0x000000000362ad97 in closecon_handlerton (thd=0x7fff60000d90, plugin=0x7fff5c2ee9e8) at /opt/source8028/mysql-8.0.28/sql/handler.cc:920
#3 0x00000000032e1d5d in plugin_foreach_with_mask (thd=0x7fff60000d90, funcs=0x7fff5c2eea80, type=1, state_mask=4294967287, arg=0x0) at /opt/source8028/mysql-8.0.28/sql/sql_plugin.cc:2691
#4 0x00000000032e1e1d in plugin_foreach_with_mask (thd=0x7fff60000d90, func=0x362ad1c <closecon_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x0)
at /opt/source8028/mysql-8.0.28/sql/sql_plugin.cc:2704
#5 0x000000000362ade5 in ha_close_connection (thd=0x7fff60000d90) at /opt/source8028/mysql-8.0.28/sql/handler.cc:932
#6 0x00000000031d9a5b in THD::release_resources (this=0x7fff60000d90) at /opt/source8028/mysql-8.0.28/sql/sql_class.cc:1347
#7 0x00000000044e7d79 in handle_slave_worker (arg=0x7fff64021320) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:5970
#8 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff6409b760) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#9 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff63788dd in clone () from /lib64/libc.so.6
slave_worker_exec_job_group
->while (true)
read one event per iteration
->if (unlikely(thd->killed || worker->running_status == Slave_worker::STOP_ACCEPTED))
the worker has been marked as killed, or STOP SLAVE has been triggered
error = -1;
goto err;
set the error to -1 and jump to err
->
When the current worker is idle
handle_slave_worker 5894
->slave_worker_exec_job_group 2502
->job_item = pop_jobs_item(worker, job_item) 2405
read one dispatched event
->job_item->data = nullptr
-> while (!job_item->data && !thd->killed &&
(worker->running_status == Slave_worker::RUNNING ||
worker->running_status == Slave_worker::STOP))
A: job_item->data is null, no event has been dispatched  B: !thd->killed, the worker has not been killed
C: the worker status is RUNNING (or STOP)
->if (set_max_updated_index_on_stop(worker, job_item))
??
->if (job_item->data == nullptr)
->worker->wq_empty_waits++;
increment the wait counter
->enter stage_replica_waiting_event_from_coordinator
->mysql_cond_wait(&worker->jobs_cond, &worker->jobs_lock);
wait here; on kill the worker has to be woken up, which is done by the kill thread
->return job_item
return an empty job_item
->while (true)
read one event per iteration
-> if (unlikely(thd->killed ||worker->running_status == Slave_worker::STOP_ACCEPTED))
A: the worker has been marked as killed
B: STOP SLAVE has been triggered
->error = -1;
->goto err
-> Slave_worker::retry_transaction
roll back the transaction first
->Slave_worker::check_and_report_end_of_retries
decide whether to retry; the parameter slave_transaction_retries takes effect here,
for a killed worker thread this returns true directly
...
err:
-> if (error)
->report_error_to_coordinator(worker)
??
->worker->slave_worker_ends_group(ev, error);
-> if (!error)
no error
...
->else
an error occurred
->running_status = ERROR_LEAVING;
set the error status
->Commit_order_manager::wait_and_finish(info_thd, true)
wait for this worker's turn in the commit order, set the global rollback flag, wake up the other workers
-> c_rli->info_thd->awake(THD::KILL_QUERY);
kill the coordinator thread
->if (current_mts_submode->get_type() == MTS_PARALLEL_TYPE_DB_NAME)
database-based parallelism, not considered here
...
-> else
not the DB-type scheduler: LOGICAL_CLOCK parallelism
...
-> if (unlikely(error))
an error occurred
->mysql_cond_signal(&c_rli->logical_clock_cond);
wake up the coordinator thread ?? awake has already been called above
-> curr_group_seen_gtid = false
->return error
return the error
Waiting for preceding transaction to commit
woken up by the kill thread
(gdb) bt
#0 0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000047fda7d in native_cond_wait (cond=0x7fff60099928, mutex=0x7fff60022738) at /opt/source8028/mysql-8.0.28/include/thr_cond.h:108
#2 0x00000000047fdbe8 in safe_cond_wait (cond=0x7fff60099928, mp=0x7fff60022710, file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", line=2405)
at /opt/source8028/mysql-8.0.28/mysys/thr_cond.cc:71
#3 0x00000000044c616f in my_cond_wait (cond=0x7fff60099928, mp=0x7fff600998f8, file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", line=2405)
at /opt/source8028/mysql-8.0.28/include/thr_cond.h:159
#4 0x00000000044c642c in inline_mysql_cond_wait (that=0x7fff60099928, mutex=0x7fff600998f8, src_file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", src_line=2405)
at /opt/source8028/mysql-8.0.28/include/mysql/psi/mysql_cond.h:180
#5 0x00000000044cde05 in pop_jobs_item (worker=0x7fff60096cb0, job_item=0x7fff5c3caa80) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:2405
#6 0x00000000044ce0e5 in slave_worker_exec_job_group (worker=0x7fff60096cb0, rli=0xa964540) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:2502
#7 0x00000000044e7a09 in handle_slave_worker (arg=0x7fff60096cb0) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:5894
#8 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff6009a310) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#9 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff63788dd in clone () from /lib64/libc.so.6
Waiting
(gdb) thread 81
[Switching to thread 81 (Thread 0x7fff5c3cb700 (LWP 8163))]
#0 0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff7bcaa35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000047fda7d in native_cond_wait (cond=0x7fff60099928, mutex=0x7fff60022738) at /opt/source8028/mysql-8.0.28/include/thr_cond.h:108
#2 0x00000000047fdbe8 in safe_cond_wait (cond=0x7fff60099928, mp=0x7fff60022710, file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", line=2405)
at /opt/source8028/mysql-8.0.28/mysys/thr_cond.cc:71
#3 0x00000000044c616f in my_cond_wait (cond=0x7fff60099928, mp=0x7fff600998f8, file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", line=2405)
at /opt/source8028/mysql-8.0.28/include/thr_cond.h:159
#4 0x00000000044c642c in inline_mysql_cond_wait (that=0x7fff60099928, mutex=0x7fff600998f8, src_file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", src_line=2405)
at /opt/source8028/mysql-8.0.28/include/mysql/psi/mysql_cond.h:180
#5 0x00000000044cde05 in pop_jobs_item (worker=0x7fff60096cb0, job_item=0x7fff5c3caa80) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:2405
#6 0x00000000044ce0e5 in slave_worker_exec_job_group (worker=0x7fff60096cb0, rli=0xa964540) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:2502
#7 0x00000000044e7a09 in handle_slave_worker (arg=0x7fff60096cb0) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:5894
#8 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff6009a310) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#9 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff63788dd in clone () from /lib64/libc.so.6
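Both backtraces show the same condition variable, cond=0x7fff60099928, so the wait is woken up by the kill thread. How the kill thread locates this cond still needs further investigation.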
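The need for an explicit wake-up can be illustrated with a Python analogy of the idle worker's wait. This is a sketch under assumptions, not the server code: IdleWorker, pop_job and kill are hypothetical names that loosely mirror pop_jobs_item and the kill thread, and threading.Condition stands in for jobs_cond and jobs_lock.

```python
import threading


class IdleWorker:
    """Toy worker that waits for jobs on a condition variable."""

    def __init__(self):
        self.killed = False
        self.jobs = []
        self.jobs_cond = threading.Condition()

    def pop_job(self):
        """Block until a job arrives or the worker is killed."""
        with self.jobs_cond:
            while not self.jobs and not self.killed:
                # No timeout here, so only a notify can end the wait.
                self.jobs_cond.wait()
            return self.jobs.pop(0) if self.jobs else None

    def kill(self):
        """What the kill thread does: set the flag AND wake the waiter."""
        with self.jobs_cond:
            self.killed = True
            self.jobs_cond.notify_all()


def kill_idle_worker() -> list:
    """Kill a worker that has no jobs and collect what pop_job returned."""
    worker = IdleWorker()
    result = []
    t = threading.Thread(target=lambda: result.append(worker.pop_job()))
    t.start()
    worker.kill()
    t.join(timeout=5.0)
    return result


if __name__ == "__main__":
    print(kill_idle_worker())
```

The killed worker comes back from pop_job with no job, which corresponds to the empty job_item returned in the call flow above. Without the notify_all call in kill, the untimed wait would never return.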
The SQL thread stops the workers
#0 slave_stop_workers (rli=0xa964540, mts_inited=0x7fff5c6d0677) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:6558
#1 0x00000000044ec1a8 in handle_slave_sql (arg=0xa7830e0) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:7121
#2 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff5803c460) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#3 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007ffff63788dd in clone () from /lib64/libc.so.6
Rollback
#0 unlikely (expr=false) at /opt/source8028/mysql-8.0.28/include/my_compiler.h:55
#1 0x0000000004af68c7 in rec_offs_n_fields (offsets=0x7fff5c3ec920) at /opt/source8028/mysql-8.0.28/storage/innobase/rem/rec.h:445
#2 0x0000000004af6e34 in rec_offs_nth_extern (offsets=0x7fff5c3ec920, n=0) at /opt/source8028/mysql-8.0.28/storage/innobase/include/rem0rec.ic:654
#3 0x0000000004af86e6 in cmp_dtuple_rec_with_match_bytes (dtuple=0x7fff88038078, rec=0x7fff9e73ac1e "", index=0x7fff6ca2dd98, offsets=0x7fff5c3ec920, matched_fields=0x7fff5c3ecc50,
matched_bytes=0x7fff5c3ecc48) at /opt/source8028/mysql-8.0.28/storage/innobase/rem/rem0cmp.cc:772
#4 0x0000000004ac8c64 in page_cur_search_with_match_bytes (block=0x7fff9d969c78, index=0x7fff6ca2dd98, tuple=0x7fff88038078, mode=PAGE_CUR_LE, iup_matched_fields=0x7fff5c3edae0,
iup_matched_bytes=0x7fff5c3edad8, ilow_matched_fields=0x7fff5c3edad0, ilow_matched_bytes=0x7fff5c3edac8, cursor=0x7fff88037a80)
at /opt/source8028/mysql-8.0.28/storage/innobase/page/page0cur.cc:710
#5 0x0000000004ccdf1e in btr_cur_search_to_nth_level (index=0x7fff6ca2dd98, level=0, tuple=0x7fff88038078, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x7fff88037a78, has_search_latch=0,
file=0x6a63f10 "/opt/source8028/mysql-8.0.28/storage/innobase/row/row0row.cc", line=899, mtr=0x7fff5c3ee200) at /opt/source8028/mysql-8.0.28/storage/innobase/btr/btr0cur.cc:1246
#6 0x00000000049f70bb in btr_pcur_t::open (this=0x7fff88037a78, index=0x7fff6ca2dd98, level=0, tuple=0x7fff88038078, mode=PAGE_CUR_LE, latch_mode=2, mtr=0x7fff5c3ee200,
file=0x6a63f10 "/opt/source8028/mysql-8.0.28/storage/innobase/row/row0row.cc", line=899) at /opt/source8028/mysql-8.0.28/storage/innobase/include/btr0pcur.h:646
#7 0x0000000004b792c8 in row_search_on_row_ref (pcur=0x7fff88037a78, mode=2, table=0x7fff6c0bbda8, ref=0x7fff88038078, mtr=0x7fff5c3ee200)
at /opt/source8028/mysql-8.0.28/storage/innobase/row/row0row.cc:899
#8 0x0000000004b94d64 in row_undo_search_clust_to_pcur (node=0x7fff88037a08) at /opt/source8028/mysql-8.0.28/storage/innobase/row/row0undo.cc:178
#9 0x0000000004f2cdd6 in row_undo_mod_parse_undo_rec (node=0x7fff88037a08, thd=0x7fff8801d0f0, mdl=0x7fff5c3ee7f8) at /opt/source8028/mysql-8.0.28/storage/innobase/row/row0umod.cc:1246
#10 0x0000000004f2d03b in row_undo_mod (node=0x7fff88037a08, thr=0x7fff88030208) at /opt/source8028/mysql-8.0.28/storage/innobase/row/row0umod.cc:1278
#11 0x0000000004b952e1 in row_undo (node=0x7fff88037a08, thr=0x7fff88030208) at /opt/source8028/mysql-8.0.28/storage/innobase/row/row0undo.cc:300
#12 0x0000000004b95528 in row_undo_step (thr=0x7fff88030208) at /opt/source8028/mysql-8.0.28/storage/innobase/row/row0undo.cc:362
#13 0x0000000004af1631 in que_thr_step (thr=0x7fff88030208) at /opt/source8028/mysql-8.0.28/storage/innobase/que/que0que.cc:909
#14 0x0000000004af1805 in que_run_threads_low (thr=0x7fff88030208) at /opt/source8028/mysql-8.0.28/storage/innobase/que/que0que.cc:962
#15 0x0000000004af1a52 in que_run_threads (thr=0x7fff88030208) at /opt/source8028/mysql-8.0.28/storage/innobase/que/que0que.cc:997
#16 0x0000000004c2973f in trx_rollback_to_savepoint_low (trx=0x7fffe601d7d8, savept=0x7fffe601dac8) at /opt/source8028/mysql-8.0.28/storage/innobase/trx/trx0roll.cc:114
#17 0x0000000004c298b5 in trx_rollback_to_savepoint (trx=0x7fffe601d7d8, savept=0x7fffe601dac8) at /opt/source8028/mysql-8.0.28/storage/innobase/trx/trx0roll.cc:151
#18 0x0000000004c2a31c in trx_rollback_last_sql_stat_for_mysql (trx=0x7fffe601d7d8) at /opt/source8028/mysql-8.0.28/storage/innobase/trx/trx0roll.cc:312
#19 0x000000000492c292 in innobase_rollback (hton=0xa6b1010, thd=0x7fff8801d0f0, rollback_trx=false) at /opt/source8028/mysql-8.0.28/storage/innobase/handler/ha_innodb.cc:5801
#20 0x000000000362cba2 in ha_rollback_low (thd=0x7fff8801d0f0, all=false) at /opt/source8028/mysql-8.0.28/sql/handler.cc:2020
#21 0x00000000043ad300 in MYSQL_BIN_LOG::rollback (this=0x7fd4460 <mysql_bin_log>, thd=0x7fff8801d0f0, all=false) at /opt/source8028/mysql-8.0.28/sql/binlog.cc:2488
#22 0x000000000362ce6b in ha_rollback_trans (thd=0x7fff8801d0f0, all=false) at /opt/source8028/mysql-8.0.28/sql/handler.cc:2100
#23 0x0000000003444df0 in trans_rollback_stmt (thd=0x7fff8801d0f0) at /opt/source8028/mysql-8.0.28/sql/transaction.cc:578
#24 0x00000000044bbb03 in Relay_log_info::cleanup_context (this=0x7fff7c17d930, thd=0x7fff8801d0f0, error=true) at /opt/source8028/mysql-8.0.28/sql/rpl_rli.cc:1292
#25 0x00000000044cbcdb in operator() (__closure=0x7fff88028330) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:1890
#26 0x00000000044cef20 in std::__invoke_impl<void, Slave_worker::retry_transaction(uint, my_off_t, uint, my_off_t)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:60
#27 0x00000000044cee1e in std::__invoke_r<void, Slave_worker::retry_transaction(uint, my_off_t, uint, my_off_t)::<lambda()>&>(struct {...} &) (__fn=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:110
#28 0x00000000044ced02 in std::_Function_handler<void(), Slave_worker::retry_transaction(uint, my_off_t, uint, my_off_t)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/std_function.h:291
#29 0x000000000446e014 in std::function<void ()>::operator()() const (this=0x7fff5c3f08c8) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/std_function.h:622
#30 0x000000000446d6cc in raii::Sentry<std::function<void ()> >::~Sentry() (this=0x7fff5c3f08c0, __in_chrg=<optimized out>) at /opt/source8028/mysql-8.0.28/sql/raii/sentry.h:63
#31 0x00000000044cc2aa in Slave_worker::retry_transaction (this=0x7fff7c17d930, start_relay_number=23, start_relay_pos=11774459, end_relay_number=23, end_relay_pos=12111264)
at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:1897
#32 0x00000000044ce45d in slave_worker_exec_job_group (worker=0x7fff7c17d930, rli=0xa963720) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:2556
#33 0x00000000044e7a09 in handle_slave_worker (arg=0x7fff7c17d930) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:5894
#34 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff7c183a40) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#35 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#36 0x00007ffff63788dd in clone () from /lib64/libc.so.6
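kill thread                   worker1                                 SQL thread (coordinator)
----                          ----                                    ----
  |                                     |                                       |
awake() marks worker1                   |                                       |
as killed and may also                  |                                       |
need to wake it up  ---->               |                                       |
                                        |                                       |
                                        |                                       |
                              event execution fails                             |
                                        |                                       |
                                        |                                       |
                              rolls back the transaction                        |
                              before any retry                                  |
                                        |                                       |
                                        |                                       |
                              no retry, because the error                       |
                              was caused by the kill                            |
                                        |                                       |
                                        |                                       |
                              awake() marks the coordinator  ------>  wakes up every slave_checkpoint_period
                              as killed                               and checks whether the kill flag
                                        |                             is set; if it is set:
                                      exits                                     |
                                                                                |
                                                                      wakes the other workers and marks  ------>  the other workers are told to exit
                                                                      them Slave_worker::STOP so that
                                                                      they exit as well
                                                                                |
                                                                                |
                                                                      waits for all workers to finish
                                                                      executing or rolling back;
                                                                      its state is
                                                                      Waiting for workers to exit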
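What happens if an MTS worker thread is killed
We look at 2 scenarios here:
1. A large transaction is executing and its worker thread is killed. This is probably the most common scenario.
2. No large transaction is executing and the worker is idle.
Of course, this does not cover every possible scenario.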
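After the kill worker1 command is issued, the kill thread sets the killed flag on the worker1 thread and, if necessary, wakes worker1 from the condition variable it is currently waiting on. For example, an idle worker1 waits on
mysql_cond_wait(&worker->jobs_cond, &worker->jobs_lock) and has to be woken; setting the killed flag without waking the thread would leave it unable to proceed.
If worker1 is executing an event at that moment, it notices the killed flag and sets the execution status to error. The next question is whether to retry the transaction, but the work the transaction has already done
must be rolled back before a retry is even considered. There are 2 cases in which the transaction is not retried; the check is in Slave_worker::check_and_report_end_of_retries:
A: The error is not a temporary error. Temporary errors are mostly caused by lock conflicts and deadlocks; see the function Slave_reporting_capability::has_temporary_error.
B: The maximum number of retries has been reached; this limit is set by a parameter.
Because the error here comes from killing worker1, the transaction is not retried; once the rollback completes, the flow continues as described below.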
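The retry decision described above can be sketched as a small predicate. This is only an illustrative model: `ApplyError`, `should_retry`, and the `max_tries` argument are hypothetical names invented for this sketch, not MySQL's actual types or signatures, and the rollback of the partial work is assumed to have already happened before the function is called.

```cpp
#include <cassert>

// Hypothetical, simplified description of why an event failed to apply.
struct ApplyError {
  bool temporary;  // e.g. a lock wait timeout or a deadlock
  bool killed;     // the worker's session was marked as killed
};

// Returns true when the worker should re-run the transaction.
// Assumption: the caller has already rolled back the partial transaction.
bool should_retry(const ApplyError &err, unsigned tries_done,
                  unsigned max_tries) {
  assert(tries_done <= max_tries);
  if (err.killed) return false;               // a kill is not a temporary error
  if (!err.temporary) return false;           // case A: not a temporary error
  if (tries_done >= max_tries) return false;  // case B: retry limit reached
  return true;
}
```

With this model, an error caused by a kill never leads to a retry regardless of how many attempts remain, which matches the behaviour observed for worker1.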
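Next, worker1 marks the SQL thread (coordinator) as killed, again through the main function behind the kill command: c_rli->info_thd->awake(THD::KILL_QUERY). Assume the coordinator has dispatched all events and is in the
Replica has read all relay log; waiting for more updates state. In this state the MTS SQL thread (coordinator) does not wait on its condition variable indefinitely. The wait has a timeout, so the thread wakes up every 300 milliseconds by default
to check the kill flag; the interval is related to the parameter slave_checkpoint_period. After waking, the SQL thread (coordinator) checks whether the kill flag is set; see the sql_slave_killed function. Once the flag is set,
the SQL thread (coordinator) stops reading and dispatching events and leaves the main loop of handle_slave_sql, shown here: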
while (!main_loop_error && !sql_slave_killed(thd, rli))
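It then calls slave_stop_workers to wake the other worker threads so that they exit as well. This is done by marking the other workers as Slave_worker::STOP. In testing, this state does not cause a rollback; the other workers
finish their current transaction, unlike worker1, which rolls back. In this situation the SQL thread may be in the Waiting for workers to exit state, for example: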
| 51 | system user | | NULL | Query | 30 | Waiting for workers to exit | NULL |
| 52 | system user | | NULL | Killed | 52 | Applying batch of row changes (delete) | delete from mytest |
+----+-------------+-----------------+--------------------+---------+-------+----------------------------------------+--------------------+
So when we kill one worker thread, the kill propagates to the SQL coordinator thread and, through it, to all the other worker threads, which triggers their exit.
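To make the propagation concrete, the following single-threaded sketch models the sequence as plain state changes. Everything in it (`Replica`, `Worker`, `kill_worker`, `coordinator_wakeup`, and the enum values) is a hypothetical simplification written for this illustration and is not MySQL code; in the real server these steps run on separate threads and are driven by condition variables and the coordinator's periodic wake-up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical states for this sketch only.
enum class WorkerState { RUNNING, KILLED, STOP };
enum class TxnOutcome { NONE, ROLLED_BACK, COMPLETED };

struct Worker {
  WorkerState state = WorkerState::RUNNING;
  bool in_transaction = false;
  TxnOutcome outcome = TxnOutcome::NONE;
};

struct Replica {
  bool coordinator_killed = false;
  std::vector<Worker> workers;
};

// The killed worker rolls back its transaction, then marks the coordinator.
void kill_worker(Replica &r, std::size_t victim) {
  assert(victim < r.workers.size());
  Worker &w = r.workers[victim];
  w.state = WorkerState::KILLED;
  if (w.in_transaction) w.outcome = TxnOutcome::ROLLED_BACK;
  r.coordinator_killed = true;
}

// On its next wake-up the coordinator sees the flag and stops the others,
// which finish their current transaction instead of rolling it back.
void coordinator_wakeup(Replica &r) {
  if (!r.coordinator_killed) return;
  for (Worker &w : r.workers) {
    if (w.state != WorkerState::RUNNING) continue;
    w.state = WorkerState::STOP;
    if (w.in_transaction) w.outcome = TxnOutcome::COMPLETED;
  }
}
```

The point of the model is the asymmetry: the killed worker ends with a rolled-back transaction, while a worker that is merely stopped ends with a completed one.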
#0 0x00000000031037c0 in pthread_cond_signal@plt ()
#1 0x00000000044c60fd in native_cond_signal (cond=0x7fff60187648) at /opt/source8028/mysql-8.0.28/include/thr_cond.h:117
#2 0x00000000044c6594 in inline_mysql_cond_signal (that=0x7fff60187648, src_file=0x633ecc8 "/opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc", src_line=143)
at /opt/source8028/mysql-8.0.28/include/mysql/psi/mysql_cond.h:264
#3 0x00000000044c6793 in handle_slave_worker_stop (worker=0x7fff601849d0, job_item=0x7fff5c7f81b0) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:143
#4 0x00000000044c68b1 in set_max_updated_index_on_stop (worker=0x7fff601849d0, job_item=0x7fff5c7f81b0) at /opt/source8028/mysql-8.0.28/sql/rpl_rli_pdb.cc:172
#5 0x00000000044e9fec in slave_stop_workers (rli=0xa963720, mts_inited=0x7fff5c7f8677) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:6587
#6 0x00000000044ec1a8 in handle_slave_sql (arg=0xa6efe50) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:7121
#7 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff58035c50) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#8 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007ffff63788dd in clone () from /lib64/libc.so.6
#0 Rpl_applier_reader::wait_for_new_event (this=0x7fff5c7f82d0) at /opt/source8028/mysql-8.0.28/sql/rpl_applier_reader.cc:288
#1 0x000000000450a2e3 in Rpl_applier_reader::read_next_event (this=0x7fff5c7f82d0) at /opt/source8028/mysql-8.0.28/sql/rpl_applier_reader.cc:225
#2 0x00000000044ebfc1 in handle_slave_sql (arg=0xa6efe50) at /opt/source8028/mysql-8.0.28/sql/rpl_replica.cc:7073
#3 0x000000000504dc17 in pfs_spawn_thread (arg=0x7fff58035c50) at /opt/source8028/mysql-8.0.28/storage/perfschema/pfs.cc:2947
#4 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007ffff63788dd in clone () from /lib64/libc.so.6
slave_stop_workers
->