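As is well known, MySQL Cluster is also called NDB Cluster, where NDB stands for Network DataBase: the data nodes rely on the network to communicate with each other and to keep the data fragments consistent. Today a switch in our machine room failed, and the data node processes on all 4 data nodes of the cluster (replica count of 2) shut down. After the network was restored, restarting the nodes ran into the following problem: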
"2016-11-03 16:37:40 [ndbd] INFO -- Unable to start missing node group! starting: 0000000000000002 (missing fs for: 0000000000000000)
2016-11-03 16:37:40 [ndbd] INFO -- QMGR (Line: 1872) 0x00000002
2016-11-03 16:37:40 [ndbd] INFO -- Error handler shutting down system
2016-11-03 16:37:40 [ndbd] INFO -- Error handler shutdown completed - exiting
2016-11-03 16:37:41 [ndbd] ALERT -- Node 1: Forced node shutdown completed. Occured during startphase 1. Caused by error 2353: 'Insufficent nodes for system restart(Restart error). Temporary error, restart node'."
A web search suggested that the problem may be related to this reported bug: https://bugs.mysql.com/bug.php?id=22316.
system restart fails as you dont start all 4 nodes fast enough...
With default setting you have 30s for allowing nodes to get in contact with each other.
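The actual cause was that the data nodes were started too far apart in time. When the startup wait expired, too few data nodes had joined, so the cluster could not start.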
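The 30-second window quoted above appears to correspond to the StartPartialTimeout data node parameter, whose documented default is 30000 ms. As a rough sketch only (this is not what was done in the fix described here), the window could be widened in the management node's config.ini. The value 120000 is an arbitrary illustration, and the change would only take effect after the management server reloads its configuration and the data nodes are restarted:

```ini
[ndbd default]
# Assumption: 120000 ms is an illustrative value, not a recommendation.
# Time the cluster waits for all data nodes to come up before it
# attempts a partial start (default: 30000 ms).
StartPartialTimeout=120000
```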
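Finally, I used Xshell's "To all sessions" feature to send the ndbmtd command to all four nodes at once. The nodes started simultaneously and the cluster returned to normal operation.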
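Without Xshell, the same effect could be approximated with a small script that launches ndbmtd on every data node in parallel over ssh. The following is a minimal sketch under several assumptions: the host names ndb1 to ndb4 are hypothetical, key-based ssh login is already set up, and a bare ndbmtd command on each host finds its connect string from local configuration, as it did in the session above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; replace with the four data node hosts.
DATA_NODES = ["ndb1", "ndb2", "ndb3", "ndb4"]


def build_command(host, remote_cmd="ndbmtd"):
    """Return the ssh argument list that runs remote_cmd on host."""
    return ["ssh", host, remote_cmd]


def run_ssh(cmd):
    """Run one ssh command and return its exit status."""
    return subprocess.run(cmd).returncode


def start_all(hosts, runner=run_ssh):
    """Launch the data node process on every host in parallel.

    Returns a dict mapping each host to the status reported by runner.
    """
    commands = [build_command(h) for h in hosts]
    # One worker per host, so no start waits behind another.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        statuses = list(pool.map(runner, commands))
    return dict(zip(hosts, statuses))


if __name__ == "__main__":
    for host, status in start_all(DATA_NODES).items():
        print(f"{host}: ssh exit status {status}")
```

The runner argument exists only so that the launch logic can be exercised without real ssh connections. In normal use, start_all(DATA_NODES) is enough.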