刘博:携程技术保障中心数据库高级经理,主要关注Sql server和Mysql的运维和故障处理。
【环境】版本号:Server version: 5.6.21
隔离级别:REPEATABLE READ
接到监控报警,有一个线上的应用DeadLock报错,每15分钟会准时出现,报错统计如下图:
登录Mysql服务器查看日志:
*** (1) TRANSACTION:
TRANSACTION 102973, ACTIVE 11 sec starting index read
mysql tables in use 3, locked 3
LOCK WAIT 4 lock struct(s), heap size 1136, 3 row lock(s)
MySQL thread id 6, OS thread handle 140024996574976, query id 83 localhost us updating
UPDATE TestTable
SET column1 = 1,
Column2 = sysdate(),
Column3= '026’
Column4 = 0
AND column5 = 485
AND column6 = 'SEK'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 417 page no 1493 n bits 1000 index idx_column6 of table test.TestTable trx id 102973 lock_mode X waiting
Record lock, heap no 859 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 53454b; asc SEK;;
1: len 8; hex 80000000007e1452; asc ~ R;;
*** (2) TRANSACTION:
TRANSACTION 102972, ACTIVE 26 sec starting index read
mysql tables in use 3, locked 3
219 lock struct(s), heap size 24784, 2906 row lock(s), undo log entries 7
MySQL thread id 5, OS thread handle 140024996841216, query id 84 localhost us updating
UPDATE TestTable
SET Column1 = 1,
Column2 = sysdate(),
Column3 = '026'
Column4 = 0
AND Column5 = 485
AND Column6 = 'SEK'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 417 page no 1493 n bits 1000 index idx_Column6 of table test.TestTable trx id 102972 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 859 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 53454b; asc SEK;;
1: len 8; hex 80000000007e1452; asc ~ R;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 601 page no 89642 n bits 1000 index idx_column6 of table test.TestTable trx id 32231892482 lock_mode X locks rec but not gap waiting
Record lock, heap no 38 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 3; hex 53454b; asc SEK;;
1: len 8; hex 80000000007eea14; asc ~ ;;
大致一看,更新同一索引的同一行,应该是一个Block,报TimeOut的错才对,怎么会报DeadLock?
【初步分析】先分析下(2) TRANSACTION,TRANSACTION 32231892482。
等待的锁信息为:
1: len 8; hex 80000000007eea14; asc
持有的锁信息为:
0: len 3; hex 53454b; asc SEK;;
1: len 8; hex 80000000007eeac4; asc
再先分析下(1) TRANSACTION,TRANSACTION 32231892617。
等待的锁信息为:
0: len 3; hex 53454b; asc SEK;;
1: len 8; hex 80000000007eeac4; asc
于是可以画出的死锁表,两个资源相互依赖,造成死锁:
TRANSACTION | Hold | Wait
------------|----------------------|-----------------------|
32231892617| 53454b\80000000007eea14 |53454b\80000000007eeac4|
32231892482| 53454b\80000000007eeac4 |53454b\80000000007eea14|
让我们再看一下explain结果:
*************************** 1. row ***************************
id: 1
select_type: UPDATE
table: TestTable
partitions: NULL
type: index_merge
possible_keys: column5_index,idx_column5_column6_Column1,idxColumn6
key: column5_index,idxColumn6
key_len: 8,9
ref: NULL
rows: 7
filtered: 100.00
Extra: Using intersect(column5_index,idxColumn6); Using where
可以看到 EXTRA 列:
从5.1开始,引入了 index merge 优化技术,对同一个表可以使用多个索引分别进行条件扫描。
相关文档:
根据以上初步分析,猜测应该就是intersect造成的,于是在测试环境模拟验证,开启2个session模拟死锁:
时间序列| Session1 |Session2
-------|-----------|----------|
1 |Begin; |
2 |UPDATE TestTable SET Column2 = sysdate() Column4 = 0 AND Column5 = 47 AND Column6 = 'SEK
执行成功,影响7行|
3 | |Begin;
4 | | UPDATE TestTable SET Column2 = sysdate(),Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK';
被Blocking
5 |UPDATE TestTable SET Column2 = sysdate(),Column4 = 0 AND Column5 = 485 AND Column6 = 'SEK';
执行成功 |ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction