127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.1.11:6379,slaves=2,sentinels=3
127.0.0.1:26379> sentinel masters
1) 1) "name"
2) "mymaster"
3) "ip"
4) "192.168.1.11"
5) "port"
6) "6379"
7) "runid"
8) "e0ae553828b47db69e0d75ef8c20b30f1ed96c3c"
9) "flags"
10) "master"
.....................................................
127.0.0.1:26379> sentinel slaves mymaster
1) 1) "name"
2) "192.168.1.12:6379"
3) "ip"
4) "192.168.1.12"
5) "port"
6) "6379"
7) "runid"
8) "486ebcb9ad89bf9c6889fd98b0d669c0addb9d10"
9) "flags"
10) "slave"
....................................................
31) "master-link-status"
32) "ok"
33) "master-host"
34) "192.168.1.11"
35) "master-port"
36) "6379"
37) "slave-priority"
38) "95"
39) "slave-repl-offset"
40) "75763"
2) 1) "name"
2) "192.168.1.13:6379"
3) "ip"
4) "192.168.1.13"
5) "port"
6) "6379"
7) "runid"
8) "30fdcff948a6e249a87da41ef42f41897eaf4104"
9) "flags"
10) "slave"
.................................................
31) "master-link-status"
32) "ok"
33) "master-host"
34) "192.168.1.11"
35) "master-port"
36) "6379"
37) "slave-priority"
38) "97"
39) "slave-repl-offset"
40) "75763"
4. Failover (disaster recovery) testing:
4.1 Simulate a slave failure in the Redis HA cluster
Stop one of the slaves and watch how the cluster reacts. Here we stop the redis-server on slave2:
[root@station13 ~]# killall redis-server
First check the logs on all three sentinels; each of them logs a new line like the following:
[root@station11 ~]# tail -f /var/log/sentinel_master.log
6640:X 23 Jan 00:33:05.032 # +sdown slave 192.168.1.13:6379 192.168.1.13 6379 @ mymaster 192.168.1.11 6379
You can see that 192.168.1.13 has been marked as down (+sdown, subjectively down).
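Instead of tailing the log file on every node, the same events can be watched live: Sentinel publishes each state change on a pub/sub channel named after the event. A minimal sketch against any one of the sentinels:

# print subjective/objective down and master-switch events as they happen
redis-cli -p 26379 subscribe +sdown -sdown +odown -odown +switch-master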
Then check the replication info on the master:
192.168.1.11:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.12,port=6379,state=online,offset=131310,lag=0
master_repl_offset:131310
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:131309
You can see that 192.168.1.13 has been dropped from the master's list of connected slaves!
Now start the redis-server on slave2 again and keep watching:
[root@station13 ~]# nohup redis-server /etc/redis/redis.conf &
The three sentinels log the following:
[root@station11 ~]# tail -f /var/log/sentinel_master.log
6640:X 23 Jan 00:36:10.845 * +reboot slave 192.168.1.13:6379 192.168.1.13 6379 @ mymaster 192.168.1.11 6379
6640:X 23 Jan 00:36:10.936 # -sdown slave 192.168.1.13:6379 192.168.1.13 6379 @ mymaster 192.168.1.11 6379
You can see that 192.168.1.13 has come back up!
Check the replication info on the master again; host 192.168.1.13 has rejoined the cluster:
192.168.1.11:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.12,port=6379,state=online,offset=162401,lag=1
slave1:ip=192.168.1.13,port=6379,state=online,offset=162401,lag=1
master_repl_offset:162401
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:162400
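To confirm that the rejoined slave has really caught up, compare its offset with master_repl_offset; the values converge once the resynchronization finishes. A quick check from any host that can reach the master (assuming the instances are reachable without a password, as they appear to be here):

# slave offsets and master_repl_offset should match once replication is in sync
redis-cli -h 192.168.1.11 -p 6379 info replication | grep -E 'slave[0-9]|master_repl_offset'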
4.2 Simulate a master failure in the Redis HA cluster
Note: stop the Redis instance on the master's port 6379, as if the master had gone down because of some external problem (kill the redis-server process directly):
[root@station11 ~]# killall redis-server
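As a side note, the same scenario can also be rehearsed without killing anything: any single sentinel will start a failover on request, without needing agreement from the other sentinels. A minimal sketch for such a drill:

# ask this sentinel to fail over the mymaster group on demand
redis-cli -p 26379 sentinel failover mymaster

Here, though, we kill the process to simulate a real outage.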
Watch the sentinel logs on the three Redis hosts:
[root@station11 ~]# tail -f /var/log/sentinel_master.log