深入理解Redis高可用方案(2)

28234:X 08 Oct 16:04:04.288 # +sdown master mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:04.378 # +new-epoch 1
28234:X 08 Oct 16:04:04.385 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28234:X 08 Oct 16:04:04.385 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
28234:X 08 Oct 16:04:04.385 # Next failover delay: I will not start a failover before Mon Oct  8 16:10:04 2018
28234:X 08 Oct 16:04:05.630 # +config-update-from sentinel 18311edfbfb7bf89fe4b67d08ef432053db62fff 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
28234:X 08 Oct 16:04:05.630 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:05.630 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
28234:X 08 Oct 16:04:35.709 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

slave2

28244:S 08 Oct 16:04:04.762 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:04.762 # Error condition on socket for SYNC: Connection refused
28244:S 08 Oct 16:04:05.630 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:43880 fd=12 name= age=148 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=224 qbuf-free=
32544 obl=81 oll=0 omem=0 events=r cmd=slaveof')28244:S 08 Oct 16:04:05.636 # CONFIG REWRITE executed with success.
28244:S 08 Oct 16:04:05.770 * Connecting to MASTER 127.0.0.1:6381
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync started
28244:S 08 Oct 16:04:05.770 * Non blocking connect for SYNC fired the event.
28244:S 08 Oct 16:04:05.770 * Master replied to PING, replication can continue...
28244:S 08 Oct 16:04:05.770 * Trying a partial resynchronization (request b95802ca8afd97c578b355a5838d219681d0af27:24302).
28244:S 08 Oct 16:04:05.770 * Successful partial resynchronization with master.
28244:S 08 Oct 16:04:05.770 # Master replication ID changed to a4022bb5c361353a4773fd460cec5cdcc5c02031
28244:S 08 Oct 16:04:05.770 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

slave3

28253:S 08 Oct 16:04:03.655 * MASTER <-> SLAVE sync started
28253:S 08 Oct 16:04:03.655 # Error condition on socket for SYNC: Connection refused
28253:M 08 Oct 16:04:04.586 # Setting secondary replication ID to b95802ca8afd97c578b355a5838d219681d0af27, valid up to offset: 24302. New replication ID is a4022bb5c361353a4773fd460cec5cdc
c5c0203128253:M 08 Oct 16:04:04.586 * Discarding previously cached master state.
28253:M 08 Oct 16:04:04.586 * MASTER MODE enabled (user request from 'id=9 addr=127.0.0.1:49316 fd=8 name=sentinel-18311edf-cmd age=137 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-
free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')28253:M 08 Oct 16:04:04.593 # CONFIG REWRITE executed with success.
28253:M 08 Oct 16:04:05.770 * Slave 127.0.0.1:6380 asks for synchronization
28253:M 08 Oct 16:04:05.770 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 156 bytes of backlog starting from offset 24302.

结合上面的日志,可以看到,

各个Sentinel节点都判断127.0.0.1 6379为主观下线(Subjectively Down,缩写为sdown)。

28163:X 08 Oct 16:04:04.289 # +sdown master mymaster 127.0.0.1 6379

达到quorum的设置,Sentinel-2判断其为客观下线(Objectively Down,缩写为odown)。结合其它两个Sentinel节点的日志,可以看到,Sentinel-2最先判定其客观下线。接下来,会进行Sentinel的领导者选举。一般来说,谁先完成客观下线的判定,谁就是领导者,只有Sentinel领导者才能进行failover。

28163:X 08 Oct 16:04:04.366 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
28163:X 08 Oct 16:04:04.366 # +new-epoch 1
28163:X 08 Oct 16:04:04.366 # +try-failover master mymaster 127.0.0.1 6379
28163:X 08 Oct 16:04:04.373 # +vote-for-leader 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # 3e9eb1aa9378d89cfe04fe21bf4a05a901747fa8 voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.385 # d4424b8684977767be4f5abd1e364153fbb0adbd voted for 18311edfbfb7bf89fe4b67d08ef432053db62fff 1
28163:X 08 Oct 16:04:04.450 # +elected-leader master mymaster 127.0.0.1 6379

寻找合适的slave作为master

28163:X 08 Oct 16:04:04.450 # +failover-state-select-slave master mymaster 127.0.0.1 6379

+failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable slave for promotion.

将127.0.0.1 6381设置为新主

28163:X 08 Oct 16:04:04.528 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+selected-slave <instance details> -- We found the specified good slave to promote.

命令6381节点执行slaveof no one,使其成为主节点

28163:X 08 Oct 16:04:04.528 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

+failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.

等待6381节点升级为主节点

28163:X 08 Oct 16:04:04.586 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

确认6381节点已经升级为主节点

28163:X 08 Oct 16:04:05.543 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

再来看看16:04:04.528到16:04:05.543这个时间段slave3的日志输出。可以看到,其开启了MASTER模式,且重写了配置文件。

内容版权声明:除非注明,否则皆为本站原创文章。

转载注明出处:https://www.heiqu.com/ddf48dc6fa2f09a2b39dd660c960db06.html