VIII. Installing and configuring heartbeat
1. heartbeat high availability depends on several packages, which must be installed first:
#yum install libnet
#yum install heartbeat-devel
#yum install heartbeat-ldirectord
#yum install heartbeat
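After installation, the user hacluster and the group haclient are created automatically. Make sure the UID and GID of hacluster are identical on both nodes by running the following on each node:
# cat /etc/passwd | grep hacluster | awk -F ":" '{print $3}'
# cat /etc/passwd | grep hacluster | awk -F ":" '{print $4}'
# cat /etc/group | grep haclient | awk -F ":" '{print $3}'
It is enough that each result is the same on both nodes.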
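The three lookups above can be folded into one helper, so that each node prints a single line that is easy to compare. This is only a sketch: the function name ha_ids and its optional file arguments are illustrative and not part of heartbeat.

```shell
# Print the UID and GID of user hacluster and the GID of group haclient
# on one line. The passwd/group paths are parameters only so that the
# function can be tried against sample files (a hypothetical convenience).
ha_ids() {
    passwd_file="${1:-/etc/passwd}"
    group_file="${2:-/etc/group}"
    uid=$(awk -F: '$1 == "hacluster" {print $3}' "$passwd_file")
    gid=$(awk -F: '$1 == "hacluster" {print $4}' "$passwd_file")
    grp=$(awk -F: '$1 == "haclient" {print $3}' "$group_file")
    echo "hacluster uid=$uid gid=$gid haclient gid=$grp"
}
```

Running ha_ids with no arguments on both nodes should print the same line on each.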
2. Check the installed packages
———————————————–
# rpm -qa | grep heartbeat
heartbeat-stonith-2.1.3-3.el5.centos
heartbeat-devel-2.1.3-3.el5.centos
heartbeat-ldirectord-2.1.3-3.el5.centos
heartbeat-pils-2.1.3-3.el5.centos
heartbeat-2.1.3-3.el5.centos
# rpm -qa | grep libnet
libnet-1.1.2.1-2.rf
# rpm -qa | grep ipvsadm
ipvsadm-1.24-13.el5
———————————————–
Do the same on the slave node.
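Since the same check has to be repeated on both nodes, it may help to let a small function report which packages are missing. The sketch below reads a package list in the format printed by rpm -qa; the function name missing_packages is made up for this illustration.

```shell
# Report which required packages are missing from a list of installed
# packages read on stdin (one name-version-release per line, as printed
# by `rpm -qa`). Package names to look for are given as arguments.
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # Match "name-<digit>" so that "heartbeat" does not match
        # "heartbeat-stonith-2.1.3-..."
        if ! printf '%s\n' "$installed" | grep -q "^$pkg-[0-9]"; then
            echo "$pkg"
        fi
    done
}

# Usage on a real node:
#   rpm -qa | missing_packages heartbeat heartbeat-pils heartbeat-stonith libnet ipvsadm
```

No output means every listed package was found.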
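3. Configure heartbeat on the master node
Heartbeat has three main configuration files, ha.cf, haresources and authkeys, all located in /etc/ha.d. A yum installation does not create them by default. They can be found in the unpacked source directory, but here they are created and edited by hand.
1) Main configuration file: ha.cf
ha.cf configures heartbeat's detection mechanism. In this example it contains the following: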
———————————————–
# cat /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
warntime 10
deadtime 30
initdead 120
hopfudge 1
udpport 694
bcast eth0
ucast eth0 192.168.137.133
auto_failback on
node TM-Master
node TM-Slave
ping 192.168.137.254
respawn root /usr/lib/heartbeat/ipfail
apiauth ipfail gid=root uid=root
———————————————–
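Notes:
debugfile /var/log/ha-debug # file for heartbeat's debug messages
logfile /var/log/ha-log # file for heartbeat's log messages
logfacility local0 # syslog facility used for logging
keepalive 2 # heartbeat (monitoring) interval; the default unit is seconds
warntime 10 # time after which a late heartbeat triggers a warning; usually about half of deadtime
deadtime 30 # if no heartbeat arrives from the peer node within 30 seconds, the peer is considered dead
initdead 120 # time allowed for the network to come up at startup; at least twice deadtime
hopfudge 1 # optional, for ring topologies: number of hops allowed in addition to the number of nodes in the cluster
udpport 694 # use UDP port 694 for heartbeat traffic
bcast eth0 # send heartbeats by broadcast on eth0
ucast eth0 192.168.137.133 # send heartbeats by unicast; the IP is that of the peer host
auto_failback on # on: once the original owner of the resources recovers, the resources migrate back to it
node TM-Master # a node in the cluster; the name must match the output of uname -n
node TM-Slave # node 2
ping 192.168.137.254 # ping a host outside the cluster (here the gateway) to check network connectivity
respawn root /usr/lib/heartbeat/ipfail # run ipfail as root and restart it if it exits
apiauth ipfail gid=root uid=root # permissions for the process started above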
———————————————–
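Note: the two heartbeat hosts are the master node and the slave node. Under normal conditions the master node holds the resources and runs all services. When it fails, it hands the resources over to the slave node, which then runs the services.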
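The notes above state two relationships between the timers: warntime should be below deadtime, and initdead should be at least twice deadtime. A rough way to check an ha.cf against them is sketched here. The function name check_ha_timing is hypothetical, and the sketch assumes all values are plain numbers of seconds, as in this example.

```shell
# Check the timer relationships described in the notes on an ha.cf file:
#   warntime < deadtime   and   initdead >= 2 * deadtime
# Assumes the values are plain seconds without a unit suffix.
check_ha_timing() {
    awk '
        $1 == "warntime" { warntime = $2 }
        $1 == "deadtime" { deadtime = $2 }
        $1 == "initdead" { initdead = $2 }
        END {
            ok = 1
            if (warntime + 0 >= deadtime + 0) {
                print "warntime should be smaller than deadtime"; ok = 0
            }
            if (initdead + 0 < 2 * deadtime) {
                print "initdead should be at least twice deadtime"; ok = 0
            }
            if (ok) print "timing OK"
            exit !ok
        }
    ' "$1"
}

# Usage:
#   check_ha_timing /etc/ha.d/ha.cf
```

With the values used in this example (10, 30 and 120) both conditions hold.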
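2) Resource file: haresources
ha.cf defines how heartbeat detects a failure, not what it does about it. haresources defines what heartbeat does when the master server has a problem, that is, how to switch over once the master goes down. A switchover usually moves the IP address, the services and the shared storage, so that the slave server ends up with the same IP, services and shared storage as the master and clients do not notice the change. The file must be identical on both HA nodes. In this example it is set up as follows: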
———————————————–
Configure the resource file by appending the entry directly:
# echo "TM-Master IPaddr::192.168.137.13/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/home/drbd::ext3 killnfsd" >> /etc/ha.d/haresources
Explanation: 192.168.137.13 is the virtual IP address and /home/drbd is the shared directory. A script for NFS also has to be created so that NFS is under heartbeat's control:
# cd /etc/ha.d/resource.d/
# touch killnfsd
# chmod 755 /etc/ha.d/resource.d/killnfsd (change its permissions)
# echo "killall -9 nfsd; /etc/init.d/nfs restart; exit 0" >> /etc/ha.d/resource.d/killnfsd (append the commands)
———————————————–
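A haresources entry packs a lot into one line: the preferred node comes first, followed by the resources, with "::" separating a resource script from its arguments. The sketch below only unpacks such a line for reading. The function name explain_haresources is invented here, and the sample line is the resource group that appears in the takeover log later in this document.

```shell
# Split one haresources line into the preferred node and its resources.
# "::" separates a resource script name from its arguments.
explain_haresources() {
    printf '%s\n' "$1" | awk '{
        print "node: " $1
        for (i = 2; i <= NF; i++) {
            n = split($i, part, "::")
            line = "resource: " part[1]
            for (j = 2; j <= n; j++) line = line " arg" (j - 1) "=" part[j]
            print line
        }
    }'
}

explain_haresources "TM-Master IPaddr::192.168.137.13/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/home/drbd::ext3 killnfsd"
```

heartbeat starts the resources from left to right and stops them from right to left, which is why the virtual IP and the DRBD device come before the file system and the NFS script.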
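3) Authentication file: authkeys
authkeys configures how heartbeat traffic is authenticated between the two nodes. The algorithm and key must be the same on every node in the cluster. Three algorithms are currently available: md5, sha1 and crc. crc provides no authentication and can only detect corrupted packets, whereas sha1 and md5 require a key. In this example it is set up as follows:
# dd if=/dev/random bs=512 count=1 | openssl md5 (generate a random value and hash it with md5)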
0+1 records in
0+1 records out
128 bytes (128 B) copied, 0.000572 seconds, 224 kB/s
48112364a106901dbd8afc6b0305ad72
———————————————
# vim /etc/ha.d/authkeys (edit the authentication file)
auth 3
3 md5 9bf2f23aae3a63c16ea681295ffd7666
———————————————–
Note: the file's permissions must be set to 600, otherwise heartbeat will fail to start.
# chmod 600 /etc/ha.d/authkeys
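Key generation, file creation and the permission change can also be done in one step. The following is a sketch under a few assumptions: it uses sha1 rather than the md5 entry shown above, it reads the key from /dev/urandom with od instead of hashing dd output, and the function name make_authkeys is made up for this example.

```shell
# Write an authkeys file with a random sha1 key and mode 600.
# The target path is a parameter so the function can be tried on a
# scratch file before touching /etc/ha.d/authkeys.
make_authkeys() {
    target="$1"
    # 20 random bytes printed as 40 hex characters
    key=$(od -An -N20 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file inside a restrictive umask so it is never world-readable
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    chmod 600 "$target"
}

# Usage:
#   make_authkeys /etc/ha.d/authkeys
```

The resulting file still has to be copied to the other node, because both nodes need the same key.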
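4) Configure heartbeat on the slave node
Copy the heartbeat configuration files from the master node to the slave node, and make sure the file permissions are the same on both nodes: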
———————————————–
# scp /etc/ha.d/ha.cf root@192.168.137.133:/etc/ha.d/
# scp /etc/ha.d/haresources root@192.168.137.133:/etc/ha.d/
# scp /etc/ha.d/authkeys root@192.168.137.133:/etc/ha.d/
———————————————–
In ha.cf on the slave node, change the ucast line so that it points to the master node:
ucast eth0 192.168.137.132 # the peer's IP
The other files need no changes.
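The ucast change on the slave node can be scripted instead of edited by hand. This is a minimal sketch: the function name retarget_ucast is hypothetical, and any trailing comment on the ucast line is dropped by the rewrite.

```shell
# Rewrite the ucast line of an ha.cf read from stdin so that it points
# at the given peer address. The interface name is kept as it is and
# all other lines pass through unchanged.
retarget_ucast() {
    awk -v peer="$1" '
        $1 == "ucast" { print $1, $2, peer; next }
        { print }
    '
}

# Usage on the slave node (address from this example):
#   retarget_ucast 192.168.137.132 < /etc/ha.d/ha.cf > /tmp/ha.cf.new
```

Writing to a temporary file first leaves the copied ha.cf untouched until the result has been checked.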
IX. Testing heartbeat with the NFS service
On each host, create a test file index.html in /home/drbd, containing "TM-Master" and "TM-Slave" respectively.
Start the NFS service and the heartbeat service on both machines:
#service nfs start
#service heartbeat start
Confirm once more that DRBD is working, and promote the master NFS node with drbdadm primary r0:
#drbdadm primary r0
#/etc/init.d/drbd status
#ifconfig
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:88:EC:85
inet addr:192.168.137.13 Bcast:192.168.137.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
The master NFS node now shows an additional virtual interface (eth0:0), which means heartbeat has acquired the resources.
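Instead of reading the ifconfig output by eye, the presence of the virtual IP can be tested in a script. The sketch assumes the "inet addr:" output format shown in this document. The function name has_addr is illustrative.

```shell
# Succeed if the ifconfig output on stdin contains the given IPv4
# address. The trailing space in the pattern keeps 192.168.137.13
# from matching 192.168.137.133.
has_addr() {
    grep -qF "inet addr:$1 "
}

# Usage:
#   ifconfig | has_addr 192.168.137.13 && echo "virtual IP is up on this node"
```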
Simulating a failure
1. Shut down the master NFS server
[root@TM-Master ~]# shutdown -h now
2. Watch the slave NFS server
[root@TM-Slave ~]# tail -f /var/log/ha-log
heartbeat[10419]: 2013/07/12_17:36:14 WARN: node TM-Master: is dead
heartbeat[10419]: 2013/07/12_17:36:14 info: Comm_now_up(): updating status to active
heartbeat[10419]: 2013/07/12_17:36:14 info: Local status now set to: 'active'
heartbeat[10419]: 2013/07/12_17:36:14 info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,496)
heartbeat[10419]: 2013/07/12_17:36:14 WARN: No STONITH device configured.
heartbeat[10419]: 2013/07/12_17:36:14 WARN: Shared disks are not protected.
heartbeat[10419]: 2013/07/12_17:36:14 info: Resources being acquired from TM-Master.
heartbeat[10500]: 2013/07/12_17:36:14 info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498 gid 496 (pid 10500)
heartbeat[10502]: 2013/07/12_17:36:14 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys TM-Slave] to acquire.
heartbeat[10419]: 2013/07/12_17:36:14 info: Initial resource acquisition complete (T_RESOURCES(us))
harc[10501]: 2013/07/12_17:36:14 info: Running /etc/ha.d/rc.d/status status
mach_down[10530]: 2013/07/12_17:36:14 info: Taking over resource group IPaddr::192.168.137.13/24/eth0
ResourceManager[10556]: 2013/07/12_17:36:14 info: Acquiring resource group: TM-Master IPaddr::192.168.137.13/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/home/drbd::ext3 killnfsd
IPaddr[10583]: 2013/07/12_17:36:14 INFO: Resource is stopped
ResourceManager[10556]: 2013/07/12_17:36:14 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.13/24/eth0 start
IPaddr[10681]: 2013/07/12_17:36:14 INFO: Using calculated netmask for 192.168.137.13: 255.255.255.0
IPaddr[10681]: 2013/07/12_17:36:14 INFO: eval ifconfig eth0:0 192.168.137.13 netmask 255.255.255.0 broadcast 192.168.137.255
IPaddr[10652]: 2013/07/12_17:36:14 INFO: Success
ResourceManager[10556]: 2013/07/12_17:36:15 info: Running /etc/ha.d/resource.d/drbddisk r0 start
Filesystem[10829]: 2013/07/12_17:36:15 INFO: Resource is stopped
ResourceManager[10556]: 2013/07/12_17:36:15 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home/drbd ext3 start
Filesystem[10910]: 2013/07/12_17:36:15 INFO: Running start for /dev/drbd0 on /home/drbd
Filesystem[10899]: 2013/07/12_17:36:15 INFO: Success
mach_down[10530]: 2013/07/12_17:36:15 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[10530]: 2013/07/12_17:36:15 info: mach_down takeover complete for node TM-Master.
heartbeat[10419]: 2013/07/12_17:36:15 info: mach_down takeover complete.
heartbeat[10419]: 2013/07/12_17:36:24 info: Local Resource acquisition completed. (none)
heartbeat[10419]: 2013/07/12_17:36:24 info: local resource transition completed.
[root@TM-Slave ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:5B:3A:C1
inet addr:192.168.137.133 Bcast:192.168.137.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7671943 errors:0 dropped:0 overruns:0 frame:0
TX packets:2248139 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:11511319129 (10.7 GiB) TX bytes:174620385 (166.5 MiB)
eth0:0 Link encap:Ethernet HWaddr 00:0C:29:5B:3A:C1
inet addr:192.168.137.13 Bcast:192.168.137.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
[root@TM-Slave ~]# mount | grep drbd0
/dev/drbd0 on /home/drbd type ext3 (rw)
The takeover succeeded and the NFS share is mounted again!
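The log above is long, and only two events matter for judging the failover: the moment the peer is declared dead and the moment the takeover completes. The sketch below pulls out just those two lines. It assumes the ha-log line format shown in this document, and the function name failover_summary is made up here.

```shell
# Print the timestamps of the two key failover events in an ha-log file:
# the peer being declared dead and the takeover completing.
failover_summary() {
    awk '
        / is dead$/ {
            node = $5
            sub(/:$/, "", node)          # "TM-Master:" -> "TM-Master"
            print $2, "peer declared dead:", node
        }
        /mach_down takeover complete for node/ {
            print $2, "takeover complete"
        }
    ' "$1"
}

# Usage:
#   failover_summary /var/log/ha-log
```

For the log shown above, the two timestamps are one second apart (17:36:14 and 17:36:15).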
Restart the master NFS server:
[root@TM-Master ~]# tail -f /var/log/ha-log
heartbeat[2134]: 2013/07/12_17:56:02 WARN: node TM-Slave: is dead
heartbeat[2134]: 2013/07/12_17:56:02 info: Comm_now_up(): updating status to active
heartbeat[2134]: 2013/07/12_17:56:02 info: Local status now set to: 'active'
heartbeat[2134]: 2013/07/12_17:56:02 info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,496)
heartbeat[2134]: 2013/07/12_17:56:02 WARN: No STONITH device configured.
heartbeat[2134]: 2013/07/12_17:56:02 WARN: Shared disks are not protected.
heartbeat[2134]: 2013/07/12_17:56:02 info: Resources being acquired from TM-Slave.
heartbeat[2183]: 2013/07/12_17:56:02 info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498 gid 496 (pid 2183)
harc[2184]: 2013/07/12_17:56:02 info: Running /etc/ha.d/rc.d/status status
mach_down[2230]: 2013/07/12_17:56:02 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[2230]: 2013/07/12_17:56:02 info: mach_down takeover complete for node TM-Slave.
heartbeat[2134]: 2013/07/12_17:56:02 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[2134]: 2013/07/12_17:56:02 info: mach_down takeover complete.
IPaddr[2257]: 2013/07/12_17:56:02 INFO: Resource is stopped
heartbeat[2185]: 2013/07/12_17:56:02 info: Local Resource acquisition completed.
harc[2321]: 2013/07/12_17:56:03 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[2321]: 2013/07/12_17:56:03 received ip-request-resp IPaddr::192.168.137.13/24/eth0 OK yes
ResourceManager[2342]: 2013/07/12_17:56:03 info: Acquiring resource group: TM-Master IPaddr::192.168.137.13/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/home/drbd::ext3 killnfsd
IPaddr[2369]: 2013/07/12_17:56:03 INFO: Resource is stopped
ResourceManager[2342]: 2013/07/12_17:56:03 info: Running /etc/ha.d/resource.d/IPaddr 192.168.137.13/24/eth0 start
IPaddr[2467]: 2013/07/12_17:56:03 INFO: Using calculated netmask for 192.168.137.13: 255.255.255.0
IPaddr[2467]: 2013/07/12_17:56:03 INFO: eval ifconfig eth0:0 192.168.137.13 netmask 255.255.255.0 broadcast 192.168.137.255
IPaddr[2438]: 2013/07/12_17:56:03 INFO: Success
ResourceManager[2342]: 2013/07/12_17:56:03 info: Running /etc/ha.d/resource.d/drbddisk r0 start
Filesystem[2615]: 2013/07/12_17:56:03 INFO: Resource is stopped
ResourceManager[2342]: 2013/07/12_17:56:03 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /home/drbd ext3 start
Filesystem[2696]: 2013/07/12_17:56:03 INFO: Running start for /dev/drbd0 on /home/drbd
Filesystem[2685]: 2013/07/12_17:56:04 INFO: Success
heartbeat[2134]: 2013/07/12_17:56:13 info: Local Resource acquisition completed. (none)
heartbeat[2134]: 2013/07/12_17:56:13 info: local resource transition completed.
The master node has taken the resources back!