加速无索引表引起的主从延迟数据回放

加速无索引表引起的主从延迟数据回放

一、场景

由于某些原因,客户现场存在一张 8千万 的大表,而且该表上无任何索引(也无主键),平时该表上 UPDATEDELETE 只操作几条数据。忽然有一天业务进行了某种操作,DELETE 2万 条数据,悲剧发生了,当在主库上执行了之后,传到从库上之后一直回放,当时评估了下可能会回放10天,后来在经过业务同意之后,对表进行操作,用于加速回放日志,处理该问题。

二、处理思路

停掉复制线程,关闭 Binlog 日志记录,添加索引,重启复制线程。

三、复现步骤

1、准备数据:

创建库:

SQL 复制代码
greatsql> CREATE DATABASE qj;
Query OK, 1 row affected (0.01 sec)

准备数据,准备一张8千万数据的表:

SQL 复制代码
$ sysbench --db-driver=mysql --mysql-host=192.168.139.230 --mysql-port=3307 --mysql-user=greatsql --mysql-password=GreatSQL@2024 --mysql-db=qj --table_size=80000000 --tables=1 oltp_write_only  prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)

Creating table 'sbtest1'...
Inserting 80000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...

主节点修改表结构,将表的索引去掉

SQL 复制代码
greatsql> USE qj
Database changed
greatsqll> ALTER TABLE sbtest1 MODIFY id int not null;

Query OK, 80000000 rows affected (36 min 29.13 sec)
Records: 80000000  Duplicates: 0  Warnings: 0

greatsql> ALTER TABLE sbtest1 DROP key `k_1`;
Query OK, 0 rows affected (0.96 sec)
Records: 0  Duplicates: 0  Warnings: 0

greatsql> ALTER TABLE sbtest1 DROP PRIMARY KEY;

Query OK, 80000000 rows affected (14 min 29.31 sec)
Records: 80000000  Duplicates: 0  Warnings: 0

2、模拟延迟:

01、主节点:

模拟删除数据

SQL 复制代码
greatsql> DELETE FROM qj.sbtest1 WHERE  k<20000;
Query OK, 19894 rows affected (4 min 20.40 sec)

02、从节点:

等待时间超过DELETE语句的时间,观察延迟情况,确认从节点可读可写

SQL 复制代码
greatsql> SELECT sleep(300);
+------------+
| sleep(300) |
+------------+
|          0 |
+------------+
1 row in set (5 min 0.01 sec)

greatsql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.17.139.230
                  Master_User: greatsql
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: binlog.000078
          Read_Master_Log_Pos: 953750186
               Relay_Log_File: relaylog.000231
                Relay_Log_Pos: 949954084
        Relay_Master_Log_File: binlog.000078
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 949953874
              Relay_Log_Space: 953750683
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 1345
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1000403307
                  Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
            Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163234
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

greatsql> SHOW GLOBAL  VARIABLES LIKE 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only     | OFF   |
+---------------+-------+
1 row in set (0.01 sec)

03、通过操作从节点加速日志回放

找到回放线程,KILL掉回放线程,关闭当前会话binlog日志记录,添加索引,重新启动复制回放线程

SQL 复制代码
greatsql> SHOW PROCESSLIST;
+--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+
| Id     | User            | Host                 | db   | Command | Time    | State                                                    | Info             | Time_ms    | Rows_sent | Rows_examined |
+--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+
|      5 | event_scheduler | localhost            | NULL | Daemon  | 2369454 | Waiting on empty queue                                   | NULL             | 2369454039 |         0 |             0 |
|     17 | system user     | connecting host      | NULL | Connect | 2369447 | Waiting for source to send event                         | NULL             | 2369446555 |         0 |             0 |
| 543448 | greatsql         | 172.17.136.93:48298  | NULL | Sleep   |     110 |                                                          | NULL             |     109832 |         0 |             0 |
| 543500 | greatsql         | 172.17.139.230:35198 | NULL | Sleep   |     586 |                                                          | NULL             |     585885 |         0 |             0 |
| 543588 | greatsql         | 172.17.136.93:57948  | NULL | Sleep   |      50 |                                                          | NULL             |      49878 |         0 |             0 |
| 543663 | greatsql         | 172.17.139.230:35726 | NULL | Query   |       0 | init                                                     | show processlist |          0 |         0 |             0 |
| 543666 | greatsql         | 172.17.136.93:58908  | NULL | Sleep   |     290 |                                                          | NULL             |     289861 |         0 |             0 |
| 543708 | system user     |                      | NULL | Query   |     477 | Replica has read all relay log; waiting for more updates | NULL             |     476520 |         0 |             0 |
| 543709 | system user     |                      | qj   | Query   |    1383 | Applying batch of row changes (delete)                   | NULL             |      26102 |         0 |             0 |
| 543710 | system user     |                      | NULL | Connect |     477 | Waiting for an event from Coordinator                    | NULL             |     476570 |         0 |             0 |
| 543711 | system user     |                      | NULL | Connect |     477 | Waiting for an event from Coordinator                    | NULL             |     476569 |         0 |             0 |
| 543712 | system user     |                      | NULL | Connect |     477 | Waiting for an event from Coordinator                    | NULL             |     476568 |         0 |             0 |
| 543714 | greatsql         | 172.17.136.93:33582  | NULL | Sleep   |     470 |                                                          | NULL             |     469798 |         0 |             0 |
+--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+
13 rows in set (0.00 sec)

greatsql> KILL 543709;
Query OK, 0 rows affected (0.00 sec)

greatsql> SHOW PROCESSLIST;
+--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+
| Id     | User            | Host                 | db   | Command | Time    | State                            | Info             | Time_ms    | Rows_sent | Rows_examined |
+--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+
|      5 | event_scheduler | localhost            | NULL | Daemon  | 2369473 | Waiting on empty queue           | NULL             | 2369473241 |         0 |             0 |
|     17 | system user     | connecting host      | NULL | Connect | 2369466 | Waiting for source to send event | NULL             | 2369465757 |         0 |             0 |
| 543448 | greatsql         | 172.17.136.93:48298  | NULL | Sleep   |     129 |                                  | NULL             |     129034 |         0 |             0 |
| 543500 | greatsql         | 172.17.139.230:35198 | NULL | Sleep   |     605 |                                  | NULL             |     605087 |         0 |             0 |
| 543588 | greatsql         | 172.17.136.93:57948  | NULL | Sleep   |       9 |                                  | NULL             |       9132 |         0 |             0 |
| 543663 | greatsql         | 172.17.139.230:35726 | NULL | Query   |       0 | init                             | show processlist |          0 |         0 |             0 |
| 543666 | greatsql         | 172.17.136.93:58908  | NULL | Sleep   |     309 |                                  | NULL             |     309064 |         0 |             0 |
| 543714 | greatsql         | 172.17.136.93:33582  | NULL | Sleep   |     489 |                                  | NULL             |     489000 |         0 |             0 |
+--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+
8 rows in set (0.00 sec)

greatsql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.17.139.230
                  Master_User: greatsql
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: binlog.000078
          Read_Master_Log_Pos: 953750186
               Relay_Log_File: relaylog.000231
                Relay_Log_Pos: 949954084
        Relay_Master_Log_File: binlog.000078
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1317
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d0963e9-85d9-11ef-80e1-00163e28e06a:163235' at master log binlog.000078, end_log_pos 950241265. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 949953874
              Relay_Log_Space: 953750683
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1317
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d0963e9-85d9-11ef-80e1-00163e28e06a:163235' at master log binlog.000078, end_log_pos 950241265. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1000403307
                  Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 241105 19:06:59
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
            Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163234
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

greatsql> SET sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

greatsql> ALTER TABLE qj.sbtest1 ADD index tmp_k(k);
Query OK, 0 rows affected (4 min 51.92 sec)
Records: 0  Duplicates: 0  Warnings: 0

greatsql> START SLAVE;
Query OK, 0 rows affected, 1 warning (0.02 sec)
greatsql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.17.139.230
                  Master_User: greatsql
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: binlog.000078
          Read_Master_Log_Pos: 953750186
               Relay_Log_File: relaylog.000231
                Relay_Log_Pos: 953750396
        Relay_Master_Log_File: binlog.000078
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 953750186
              Relay_Log_Space: 953750683
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1000403307
                  Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
            Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)

四、思考

01、如何保持主从一致性

02、正在回放数据时,STOP SLAVE 是否会有问题

03、如果主从结构作为数据节点,上层还有计算节点该如何保持元数据的一致性即计算节点记录的表的结构和数据节点的表结构要一致

关于以上三个问题我的处理方案如下:

1、如何保持主从一致性

SQL 复制代码
#为保持主从表结构一致,主节点添加索引
greatsql> SET sql_log_bin=0;
greatsql> ALTER TABLE qj.sbtest1 ADD index tmp_k(k);

2、正在回放数据时,STOP SLAVE是否会有问题

SQL 复制代码
greatsql> STOP SLAVE --当该GTID回放很长时间时,stop可能会很长时间,我们可以采用 kill 线程方式结束回放
greatsql> SHOW PROCESSLIST; --找到会话ID
greatsql> KILL 会话号
greatsql> SHOW SLAVE STATUS  \G 
-- 添加索引即可

3、如果主从结构作为数据节点,上层还有计算节点该如何保持元数据的一致性即计算节点记录的表的结构和数据节点的表结构要一致

SQL 复制代码
-- 当从节点添加上索引之后,在计算节点即代理层给表添加索引,当然该索引名不要与临时添加索引名重复
-- 添加上索引之后,在将从节点上临时添加的索引删除即可
greatsql> STOP SLAVE;
greatsql> SHOW SLAVE STATUS \G 

greatsql> SET sql_log_bin=0;
greatsql> ALTER table qj.sbtest1 DROP index tmp_k;

greatsql> START SLAVE;
greatsql> SHOW SLAVE STATUS \G 

五、总结

加速无索引表的回放主要是针对该表上进行 DELETEUPDATE 操作时有效,而且我们操作的时候要注意是否能记录binlog日志,保证主从数据一致性,当从节点作为数据节点即存储节点的角色时,要关注计算节点即代理层元数据与存储节点元数据的一致性或者是符合计算节点关于该表的定义规则。

Enjoy GreatSQL :)

关于 GreatSQL

GreatSQL是适用于金融级应用的国内自主开源数据库,具备高性能、高可靠、高易用性、高安全等多个核心特性,可以作为MySQL或Percona Server的可选替换,用于线上生产环境,且完全免费并兼容MySQL或Percona Server。

相关链接: GreatSQL社区 Gitee GitHub Bilibili

GreatSQL社区:

社区有奖建议反馈: greatsql.cn/thread-54-1...

社区博客有奖征稿详情: greatsql.cn/thread-100-...

(对文章有疑问或者有独到见解都可以去社区官网提出或分享哦~)

技术交流群:

微信&QQ群:

QQ群:533341697

微信群:添加GreatSQL社区助手(微信号:wanlidbc )好友,待社区助手拉您进群。

相关推荐
潇湘秦41 分钟前
Oracle CDB自动处理表空间不足脚本
数据库·oracle
梓沂1 小时前
Oracle中与 NLS(National Language Support,国家语言支持) 相关的参数
数据库·oracle
angen20181 小时前
mysql 存储过程和自定义函数 详解
数据库·mysql
m0_748249542 小时前
DRGDIP 2.0时代下基于PostgreSQL的成本管理实践与探索(上)
数据库·postgresql·区块链
q567315233 小时前
无法在Django 1.6中导入自定义应用
android·开发语言·数据库·django·sqlite
茂桑3 小时前
Redis的数据过期策略和数据淘汰策略
java·数据库·redis
dxt_snow6 小时前
Centos7系统安装redis
数据库·redis·缓存
0wioiw07 小时前
Python基础(SQLAlchemy)
java·开发语言·数据库
CodeJourney.8 小时前
DeepSeek 关联 Word 使用教程:解锁办公新效率
数据库·人工智能·算法
ChinaRainbowSea8 小时前
十四. Redis 新功能
java·数据库·redis·缓存·bootstrap