在线上进行DDL操作时,相对于其可能带来的系统负载,其实,我们最担心的还是MDL其可能导致的阻塞问题。
一旦DDL操作因获取不到MDL被阻塞,后续其它针对该表的其它操作都会被阻塞。典型如下,如阻塞稍久的话,我们会看到Threads_running飙升,CPU告警。
MySQL> show processlist;
+----+-----------------+-----------+-----------+---------+------+---------------------------------+------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------+-----------+---------+------+---------------------------------+------------------------------------+
| 4 | event_scheduler | localhost | NULL | Daemon | 122 | Waiting on empty queue | NULL |
| 9 | root | localhost | NULL | Sleep | 57 | | NULL |
| 12 | root | localhost | employees | Query | 40 | Waiting for table metadata lock | alter table slowtech.t1 add c1 int |
| 13 | root | localhost | employees | Query | 35 | Waiting for table metadata lock | select * from slowtech.t1 |
| 14 | root | localhost | employees | Query | 30 | Waiting for table metadata lock | select * from slowtech.t1 |
| 15 | root | localhost | employees | Query | 19 | Waiting for table metadata lock | select * from slowtech.t1 |
| 16 | root | localhost | employees | Query | 10 | Waiting for table metadata lock | select * from slowtech.t1 |
| 17 | root | localhost | employees | Query | 0 | starting | show processlist |
+----+-----------------+-----------+-----------+---------+------+---------------------------------+------------------------------------+
8 rows in set (0.00 sec)
如果发生在线上,无疑会影响到业务。所以,一般建议将DDL操作放到业务低峰期做,其实有两方面的考虑,1. 避免对系统负载产生较大影响。2. 减少DDL被阻塞的概率。
MDL引入的背景
MDL是MySQL 5.5.3引入的,主要用于解决两个问题,
RR事务隔离级别下不可重复读的问题
如下所示,演示环境,MySQL 5.5.0。
session1> begin;
Query OK, 0 rows affected (0.00 sec)
session1> select * from t1;
+------+------+
| id | name |
+------+------+
| 1 | a |
| 2 | b |
+------+------+
2 rows in set (0.00 sec)
session2> alter table t1 add c1 int;
Query OK, 2 rows affected (0.02 sec)
Records: 2 Duplicates: 0 Warnings: 0
session1> select * from t1;
Empty set (0.00 sec)
session1> commit;
Query OK, 0 rows affected (0.00 sec)
session1> select * from t1;
+------+------+------+
| id | name | c1 |
+------+------+------+
| 1 | a | NULL |
| 2 | b | NULL |
+------+------+------+
2 rows in set (0.00 sec)
可以看到,虽然是RR隔离级别,但在开启事务的情况下,第二次查询却没有结果。
主从复制问题
包括主从数据不一致,主从复制中断等。
如下面的主从数据不一致。
session1> create table t1(id int,name varchar(10)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)
session1> begin;
Query OK, 0 rows affected (0.00 sec)
session1> insert into t1 values(1,'a');
Query OK, 1 row affected (0.00 sec)
session2> truncate table t1;
Query OK, 0 rows affected (0.46 sec)
session1> commit;
Query OK, 0 rows affected (0.35 sec)
session1> select * from t1;
Empty set (0.00 sec)
再来看看从库的结果