Using Triggers to Sync Updates, Deletes, and Inserts (UDI) to a Large Table Online

Applies to:

MySQL Server - Version 8.0 and later

Information in this document applies to any platform.

Goal

Modify the datatype of a column in a large table without extended app downtime.

Solution

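Modifying the datatype of a column in a MySQL table cannot be done as an online operation. The change is only supported with ALGORITHM=COPY, so the table has to be completely rebuilt and concurrent writes are blocked while the copy runs. For large tables, this can result in extended application downtime.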

Refer to the documentation on Online DDL.
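As a quick check of this limitation, one can request an in-place, non-locking change explicitly. The statement below is only a sketch against the employees table used later in this document. MySQL is expected to reject it with an error instead of silently falling back to a table copy, so it should be safe to try on a test copy of the table.

```sql
-- Expected to fail: a datatype change cannot be done with ALGORITHM=INPLACE.
-- Requesting INPLACE/LOCK=NONE explicitly makes MySQL refuse the statement
-- rather than quietly rebuilding the table with ALGORITHM=COPY.
ALTER TABLE employees
  MODIFY COLUMN emp_no BIGINT NOT NULL,
  ALGORITHM=INPLACE, LOCK=NONE;
```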

As a workaround, it's possible to create a table that will exist in parallel with the existing table, modify the column datatype, load it with the existing table's data, and keep it up to date with the existing table via triggers. Then, a brief app downtime can be taken to rename the tables so that the new table replaces the old table.

Here is a working example from the standard MySQL "employees" test database:

Given the following table definition:

Create Table: CREATE TABLE `employees` (
  `emp_no` int NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

It's easy to create a new, empty table just like it, but with emp_no as BIGINT instead of INT, like this:

mysql> CREATE TABLE IF NOT EXISTS employees_new LIKE employees;

Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> ALTER TABLE employees_new MODIFY COLUMN emp_no BIGINT NOT NULL;

Query OK, 0 rows affected (0.01 sec)

Records: 0 Duplicates: 0 Warnings: 0

Now we have a new, empty table, employees_new, with the same definition as employees except that emp_no is a BIGINT.

Note that any foreign key constraints in the original table definition will have to be manually added to the new table, because foreign key constraints are not automatically created when the new table is created with CREATE TABLE ... LIKE.
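To find which constraints need to be carried over, the foreign keys defined on the original table can be listed from information_schema. This is a minimal sketch that assumes the current default schema is the one holding employees. In the sample table shown above it should return no rows, since that definition has no foreign keys of its own.

```sql
-- List foreign keys defined ON employees (i.e. employees is the child table).
-- Each row describes one column of one constraint that CREATE TABLE ... LIKE
-- did not copy and that must be re-created on employees_new by hand.
SELECT CONSTRAINT_NAME,
       COLUMN_NAME,
       REFERENCED_TABLE_NAME,
       REFERENCED_COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'employees'
  AND REFERENCED_TABLE_NAME IS NOT NULL
ORDER BY CONSTRAINT_NAME, ORDINAL_POSITION;
```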

Triggers will be needed on INSERT, UPDATE, and DELETE to keep the new table in sync with the original table. For example, an AFTER INSERT trigger would look something like this:

DROP TRIGGER IF EXISTS empnew_insert;

DELIMITER //

CREATE TRIGGER empnew_insert
AFTER INSERT ON employees
FOR EACH ROW BEGIN
  -- Copy the row that was just inserted into the parallel table.
  INSERT INTO employees_new
  SELECT * FROM employees WHERE emp_no = NEW.emp_no;
END //

DELIMITER ;

Similar triggers would be created for UPDATE and DELETE operations:

DROP TRIGGER IF EXISTS empnew_delete;

DELIMITER //

CREATE TRIGGER empnew_delete
AFTER DELETE ON employees
FOR EACH ROW BEGIN
  -- Remove the matching row from the parallel table.
  DELETE FROM employees_new
  WHERE emp_no = OLD.emp_no;
END //

DELIMITER ;

DROP TRIGGER IF EXISTS empnew_update;

DELIMITER //

CREATE TRIGGER empnew_update
AFTER UPDATE ON employees
FOR EACH ROW BEGIN
  -- Replace the old version of the row with the updated one.
  -- Using OLD for the delete and NEW for the insert also covers
  -- updates that change emp_no itself.
  DELETE FROM employees_new
  WHERE emp_no = OLD.emp_no;
  INSERT INTO employees_new
  SELECT * FROM employees WHERE emp_no = NEW.emp_no;
END //

DELIMITER ;
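The insert trigger above re-reads the row from employees and relies on SELECT * matching the column order of employees_new. As a possible variation (not part of the original procedure), the trigger can write the NEW values directly with an explicit column list, and use REPLACE so that it does not fail if the row has already been copied by the initial load. Note that REPLACE deletes and re-inserts the row when the key already exists.

```sql
DROP TRIGGER IF EXISTS empnew_insert;

DELIMITER //

CREATE TRIGGER empnew_insert
AFTER INSERT ON employees
FOR EACH ROW BEGIN
  -- Write the new row straight from the NEW pseudo-record.
  -- REPLACE overwrites any row with the same emp_no instead of
  -- raising a duplicate-key error.
  REPLACE INTO employees_new
    (emp_no, birth_date, first_name, last_name, gender, hire_date)
  VALUES
    (NEW.emp_no, NEW.birth_date, NEW.first_name,
     NEW.last_name, NEW.gender, NEW.hire_date);
END //

DELIMITER ;
```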

The initial load of the existing data must not be done as one huge transaction; this is very important. Instead, a stored procedure is created that copies rows from employees into employees_new in groups, each time picking rows that exist in employees but are not yet loaded into employees_new, like this:

mysql> DROP PROCEDURE IF EXISTS copy_emp;

mysql> DELIMITER //

mysql> CREATE PROCEDURE copy_emp()
    -> BEGIN
    ->   REPEAT
    ->     -- Reviewer note: could this join be a performance problem on a
    ->     -- large table? With a nested-loop join the number of lookups is
    ->     -- large; with a hash join every batch needs a full table scan.
    ->     INSERT INTO employees_new SELECT employees.*
    ->     FROM employees LEFT JOIN employees_new
    ->     USING (emp_no)
    ->     WHERE employees_new.emp_no IS NULL LIMIT 10000;
    ->   UNTIL ROW_COUNT() = 0
    ->   END REPEAT;
    -> END //

Query OK, 0 rows affected (0.02 sec)

mysql> DELIMITER ;

In this example stored procedure, the table is loaded 10,000 rows at a time, repeating until there are no more rows to add. The number of rows loaded on each iteration can be changed, but should be kept small enough that huge transactions are not created.
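On a very large table, the anti-join in copy_emp has to skip over all previously copied rows on every iteration. One possible alternative, sketched below, walks the primary key in ranges so that each batch only touches the rows it copies. The procedure name copy_emp_keyset and the batch_size parameter are hypothetical, and the sketch assumes autocommit is enabled so that each INSERT commits as its own transaction. INSERT IGNORE is used so rows already copied by the triggers are skipped; be aware that IGNORE also downgrades some other errors to warnings.

```sql
DROP PROCEDURE IF EXISTS copy_emp_keyset;

DELIMITER //

CREATE PROCEDURE copy_emp_keyset(IN batch_size INT)
BEGIN
  DECLARE last_id BIGINT DEFAULT NULL;  -- highest key copied so far
  DECLARE next_id BIGINT DEFAULT NULL;  -- upper bound of the current batch

  -- Start just below the smallest key; NULL if the table is empty.
  SELECT MIN(emp_no) - 1 INTO last_id FROM employees;

  WHILE last_id IS NOT NULL DO
    -- Find the key that ends the next batch of up to batch_size rows.
    SELECT MAX(emp_no) INTO next_id
    FROM (SELECT emp_no
          FROM employees
          WHERE emp_no > last_id
          ORDER BY emp_no
          LIMIT batch_size) AS batch;

    IF next_id IS NULL THEN
      SET last_id = NULL;               -- no rows left: leave the loop
    ELSE
      -- Copy one primary-key range; skip rows the triggers already added.
      INSERT IGNORE INTO employees_new
      SELECT * FROM employees
      WHERE emp_no > last_id AND emp_no <= next_id;
      SET last_id = next_id;
    END IF;
  END WHILE;
END //

DELIMITER ;
```

It would be called as, for example, CALL copy_emp_keyset(10000); and the batch size should be tested on a non-production copy first.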

Here's an example of calling the stored procedure:

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from employees_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> call copy_emp();
Query OK, 0 rows affected (14.30 sec)

mysql> select count(*) from employees_new;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
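Matching row counts do not prove that the contents match. As an additional check, the two tables can be compared row by row. The queries below are a sketch using the column list from the table definition above; both are expected to return 0 when the tables are in sync, and they scan both tables, so they add load on a busy server.

```sql
-- Rows in employees that are missing from employees_new or differ in any column.
-- <=> is the NULL-safe equality operator.
SELECT COUNT(*) AS missing_or_different
FROM employees e
LEFT JOIN employees_new n ON n.emp_no = e.emp_no
WHERE n.emp_no IS NULL
   OR NOT (e.birth_date <=> n.birth_date
       AND e.first_name <=> n.first_name
       AND e.last_name  <=> n.last_name
       AND e.gender     <=> n.gender
       AND e.hire_date  <=> n.hire_date);

-- Rows in employees_new that no longer exist in employees.
SELECT COUNT(*) AS extra_rows
FROM employees_new n
LEFT JOIN employees e ON e.emp_no = n.emp_no
WHERE e.emp_no IS NULL;
```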

All of the work up to this point can be done without app downtime, though it creates extra workload from copying the data, writing the binary logs, and so on.

Once the new table is in sync with the original table, the app would be taken down, and the tables would be renamed:

mysql> RENAME TABLE employees TO employees_old;

mysql> RENAME TABLE employees_new TO employees;
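As a variation on the two statements above, both renames can be given in a single RENAME TABLE statement, which MySQL performs as one atomic operation, so there is no moment at which no table named employees exists. The sync triggers should also be removed at cutover, because they stay attached to the renamed original table and still refer to employees_new. A sketch, using the trigger names created earlier:

```sql
-- Run once the application is stopped and the tables are in sync.

-- The triggers would otherwise follow the original table to employees_old
-- and keep pointing at a table name that no longer exists.
DROP TRIGGER IF EXISTS empnew_insert;
DROP TRIGGER IF EXISTS empnew_update;
DROP TRIGGER IF EXISTS empnew_delete;

-- Swap the tables in one atomic statement.
RENAME TABLE employees     TO employees_old,
             employees_new TO employees;
```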

Note that any foreign key constraints in other tables that referenced the original table must be dropped and recreated so that they reference the original table name (employees) again: when a table is renamed, foreign keys that reference it are automatically updated to follow it to its new name (employees_old). Also, if any FK constraints reference the column being modified, the referencing columns in the tables containing those constraints must also be converted to BIGINT. To do this, drop the constraint, modify the column definition, and add the constraint back, referencing the original table name.
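The drop, modify, and re-add sequence might look like the following for one referencing table. The table name salaries, the constraint name salaries_ibfk_1, and the ON DELETE CASCADE clause are assumptions based on the standard employees sample database; check SHOW CREATE TABLE on each referencing table for the real definitions. Keep in mind that modifying the referencing column is itself a table rebuild, so it needs to be planned for large child tables as well.

```sql
-- Assumed names: verify with SHOW CREATE TABLE salaries before running.

-- 1. Drop the constraint, which now points at employees_old.
ALTER TABLE salaries DROP FOREIGN KEY salaries_ibfk_1;

-- 2. Widen the referencing column to match the new parent column type.
ALTER TABLE salaries MODIFY COLUMN emp_no BIGINT NOT NULL;

-- 3. Re-create the constraint against the table now named employees.
ALTER TABLE salaries
  ADD CONSTRAINT salaries_ibfk_1
  FOREIGN KEY (emp_no) REFERENCES employees (emp_no)
  ON DELETE CASCADE;
```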

Once the new table is ready and any foreign key constraints have been added or modified as needed, the app can be brought back online.

As with any operation, be sure to test the process in a non-production environment prior to implementing in production.