这里用的的物理备份工具是 Percona-XtraBackup(PXB),有的同学可能不清楚其备份流程,所以这里先简单说下,PXB的备份步骤是这样的:
1. 拷贝 InnoDB redo log,这是一个单独的线程在拷,直到备份结束;
2. 拷贝所有InnoDB ibd文件;
3. 加全局读锁,执行 FLUSH TABLES WITH READ LOCK(FTWRL);
4. 拷贝 frm、MYD、MYI 等文件;
5. 获取位点信息,执行 show slave status 和 show master status;
6. 解锁,UNLOCK TABLES;
7. 做一些收尾处理,备份结束。
如果 MyISAM 表很多话,全局读锁的持有时间会比较长,所以一般都在备库做备份。
另外 FLUSH TABLE WITH READ LOCK 这条命令会获取2个MDL锁,全局读锁(MDL_key::GLOBAL)和全局COMMIT(MDL_key::COMMIT)锁,MDL锁详情可以参考之前的月报MDL 实现分析。
死锁分析
CASE 1
我们先看一下死锁时的现场是怎样的:
MySQL> show processlist;
+----+-------------+-----------------+------+---------+------+----------------------------------------+-------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+------+----------------------------------------+-------------------+
| 1 | root | 127.0.0.1:53309 | NULL | Query | 278 | init | show slave status |
| 2 | system user | | NULL | Connect | 381 | Queueing master event to the relay log | NULL |
| 3 | system user | | NULL | Connect | 311 | Waiting for commit lock | NULL |
| 4 | root | 127.0.0.1:53312 | NULL | Query | 0 | init | show processlist |
+----+-------------+-----------------+------+---------+------+----------------------------------------+-------------------+
可以看到 show slave status 被堵了很久,另外 SQL 线程在 Waiting for commit lock,说明在等待 COMMIT 锁。
这时候如果我们再连接进去执行 show slave status 也会被堵,并且即使 Ctrl-C kill 掉线程,线程依然还在。
mysql> show processlist;
+----+-------------+-----------------+------+---------+------+----------------------------------------+-------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------------+------+---------+------+----------------------------------------+-------------------+
| 1 | root | 127.0.0.1:53309 | NULL | Query | 753 | init | show slave status |
| 2 | system user | | NULL | Connect | 856 | Queueing master event to the relay log | NULL |
| 3 | system user | | NULL | Connect | 786 | Waiting for commit lock | NULL |
| 4 | root | 127.0.0.1:53312 | NULL | Killed | 188 | init | show slave status |
| 5 | root | 127.0.0.1:53314 | NULL | Query | 0 | init | show processlist |
| 8 | root | 127.0.0.1:53318 | NULL | Killed | 125 | init | show slave status |
| 11 | root | 127.0.0.1:53321 | NULL | Killed | 123 | init | show slave status |
| 14 | root | 127.0.0.1:53324 | NULL | Query | 120 | init | show slave status |
+----+-------------+-----------------+------+---------+------+----------------------------------------+-------------------+