Installing and Configuring Heartbeat on Linux

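Heartbeat is a widely used open-source high-availability cluster system for Linux. It consists mainly of two high-availability components: the heartbeat service and resource takeover. This article briefly describes how to install heartbeat 2.1.4 on Linux and how to configure heartbeat's three main configuration files.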

For the concepts behind the heartbeat cluster components, see: Heartbeat Cluster Components Overview

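I. Installing heartbeat
### Prepare the installation files
### Heartbeat V2 is no longer updated; the final V2 release is 2.1.4.
### For installation on Linux 6, the packages can be downloaded from the following link:
### For the Linux 5 series, they can be downloaded from: https://dl.Fedoraproject.org/pub/epel/5/x86_64/repoview/letter_h.group.html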
# rpm -Uvh PyXML-0.8.4-19.el6.x86_64.rpm
# rpm -Uvh perl-MailTools-2.04-4.el6.noarch.rpm
# rpm -Uvh perl-TimeDate-1.16-11.1.el6.noarch.rpm
# rpm -Uvh libnet-1.1.6-7.el6.x86_64.rpm
# rpm -Uvh ipvsadm-1.26-2.el6.x86_64.rpm
# rpm -Uvh lm_sensors-libs-3.1.1-17.el6.x86_64.rpm
# rpm -Uvh net-snmp-libs.x86_64.rpm

# rpm -Uvh heartbeat-pils-2.1.4-12.el6.x86_64.rpm
# rpm -Uvh heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
# rpm -Uvh heartbeat-2.1.4-12.el6.x86_64.rpm

### The following two RPM packages are optional: one is the Heartbeat development package, the other is for LVS
# rpm -Uvh heartbeat-devel-2.1.4-12.el6.x86_64.rpm     
# rpm -Uvh heartbeat-ldirectord-2.1.4-12.el6.x86_64.rpm 

### Verify the installed packages
# rpm -qa |grep -i heartbeat
heartbeat-2.1.4-12.el6.x86_64
heartbeat-pils-2.1.4-12.el6.x86_64
heartbeat-stonith-2.1.4-12.el6.x86_64
heartbeat-ldirectord-2.1.4-12.el6.x86_64
heartbeat-devel-2.1.4-12.el6.x86_64
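
The verification step above can also be scripted. The following is a minimal sketch, not part of the heartbeat packages: the function name missing_pkgs is hypothetical, and it assumes the package list arrives on standard input in the same `name-version-release.arch` form that `rpm -qa` prints.

```shell
# Print every required package that is absent from an `rpm -qa` style
# list read from standard input. Package names are given as arguments.
# missing_pkgs is a hypothetical helper name used for illustration.
missing_pkgs() {
    installed=$(cat)
    for pkg in "$@"; do
        # Match "<name>-<digit>" so that "heartbeat" does not match
        # "heartbeat-pils-2.1.4-...".
        if ! printf '%s\n' "$installed" | grep -q "^${pkg}-[0-9]"; then
            printf '%s\n' "$pkg"
        fi
    done
}
```

A typical call would be `rpm -qa | missing_pkgs heartbeat heartbeat-pils heartbeat-stonith`; it prints nothing when all three core packages are present.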

# Copy the sample configuration files to /etc/ha.d and edit them as needed
# cp /usr/share/doc/heartbeat-2.1.4/ha.cf /etc/ha.d/
# cp /usr/share/doc/heartbeat-2.1.4/haresources /etc/ha.d/
# cp /usr/share/doc/heartbeat-2.1.4/authkeys /etc/ha.d/
#
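
Of the three copied files, authkeys needs attention to its permissions: heartbeat expects it to be readable by its owner only (mode 600). The sketch below writes a single-key file in the usual `auth <n>` / `<n> <method> <key>` layout. The function name write_authkeys and the way the key is derived from /dev/urandom are assumptions made for illustration, not something prescribed by this article.

```shell
# Write a one-key authkeys file and restrict it to its owner.
# $1 = target path (on a real node this would be /etc/ha.d/authkeys).
# write_authkeys is a hypothetical helper name.
write_authkeys() {
    target=$1
    # Assumption: derive a random key by hashing 512 random bytes.
    key=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)
    # Create the file inside a subshell so the stricter umask does not leak.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$target" )
    # Enforce mode 600 even if the file already existed with wider permissions.
    chmod 600 "$target"
}
```

The same file must then be copied unchanged to every node in the cluster, since all nodes authenticate heartbeats with the same key.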

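II. Configuring heartbeat
The heartbeat configuration consists mainly of three files: ha.cf, authkeys, and haresources. Each is described below.

1. ha.cf
This is heartbeat's main configuration file. It covers roughly the following:
    log level and log file location;
    heartbeat interval, warning time, dead time (split-brain threshold), initial dead time, and so on;
    heartbeat communication method, IP address, port number, serial device, baud rate, and so on;
    node names, fencing method, and so on.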
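
The items listed above map onto a handful of directives. As a rough sketch, the function below prints a minimal two-node ha.cf that uses broadcast heartbeats; the timing values and port are the ones shown in the sample file, while the function name minimal_hacf and its arguments are hypothetical. Note that udpport is emitted before the bcast line, because the directive order matters.

```shell
# Print a minimal two-node ha.cf on standard output.
# $1, $2 = node names (each must match `uname -n` on that node)
# $3     = network interface that carries the heartbeats
# minimal_hacf is a hypothetical helper name.
minimal_hacf() {
    cat <<EOF
logfile /var/log/ha-log
keepalive 2
warntime 10
deadtime 30
initdead 120
udpport 694
bcast $3
auto_failback on
node $1
node $2
EOF
}
```

For example, `minimal_hacf orasrv1 orasrv2 eth1 > /etc/ha.d/ha.cf` would write the file, where orasrv2 and eth1 are placeholder names for the second node and the heartbeat interface.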

Annotated sample file
[root@orasrv1 ha.d]# more ha.cf
#
#      There are lots of options in this file.  All you have to have is a set
#      of nodes listed {"node ...} one of {serial, bcast, mcast, or ucast},
#      and a value for "auto_failback".
#      Required: a node list {node ...}, one of {serial, bcast, mcast, ucast}, and a value for auto_failback.
#
#      ATTENTION: As the configuration file is read line by line,
#                  THE ORDER OF DIRECTIVE MATTERS!
#      The file is read line by line, so the order of the directives affects the result.
#
#      In particular, make sure that the udpport, serial baud rate
#      etc. are set before the heartbeat media are defined!
#      debug and log file directives go into effect when they
#      are encountered.
#
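#      Make sure udpport and the serial baud rate are set before the heartbeat media are defined,
#      that is, set the port number and baud rate before declaring the NICs or serial ports that carry heartbeats.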
#
#      All will be fine if you keep them ordered as in this example.
#      If you keep the order used in this sample, the configuration will work.
#
#      Note on logging:
#      If all of debugfile, logfile and logfacility are not defined,
#      logging is the same as use_logd yes. In other case, they are
#      respectively effective. if detering the logging to syslog,
#      logfacility must be "none".
#      Notes on logging:
#      If none of debugfile, logfile, and logfacility is defined, logging behaves as with use_logd yes.
#      Otherwise each takes effect on its own. To keep messages out of syslog, logfacility must be set to "none".
#
#      File to write debug messages to 
# File that receives debug messages
#debugfile /var/log/ha-debug
#
#
#      File to write other messages to 
#     
# Dedicated log file
logfile        /var/log/ha-log
#
#
#      Facility to use for syslog()/logger
# Facility for syslog()/logger; enabling it together with logfile is generally not recommended
#logfacility    local0
#
#
#      A note on specifying "how long" times below...
#
#      The default time unit is seconds
#              10 means ten seconds
#
#      You can also specify them in milliseconds
#              1500ms means 1.5 seconds
#
#
#      keepalive: how long between heartbeats?
# Heartbeat interval
#keepalive 2
#
#      deadtime: how long-to-declare-host-dead?
#
#              If you set this too low you will get the problematic
#              split-brain (or cluster partition) problem.
#              See the FAQ for how to use warntime to tune deadtime.
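#              If this value is set too low, it leads to split-brain or cluster-partition problems.
# Time without heartbeats before the node is declared dead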
#deadtime 30
#
#      warntime: how long before issuing "late heartbeat" warning?
#      See the FAQ for how to use warntime to tune deadtime.
#     
#     
# Time without heartbeats before a "late heartbeat" warning is issued
#warntime 10
#
#
#      Very first dead time (initdead)
#
#      On some machines/OSes, etc. the network takes a while to come up
#      and start working right after you've been rebooted.  As a result
#      we have a separate dead time for when things first come up.
#      It should be at least twice the normal dead time.
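#      On some machines or operating systems, the network needs some time after a boot or reboot before it works properly.
#      A separate dead time is therefore used for initial startup; it should be at least twice the normal dead time.
#
# Initial dead time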
#initdead 120
#
#
#      What UDP port to use for bcast/ucast communication?
#
# UDP port setting
#udpport        694
#
#      Baud rate for serial ports...                     
#
# Baud rate setting
#baud  19200
#
#      serial  serialportname ...     
# Serial port name
#serial /dev/ttyS0      # Linux
#serial /dev/cuaa0      # FreeBSD
#serial /dev/cuad0      # FreeBSD 6.x
#serial /dev/cua/a      # Solaris
#
#
#      What interfaces to broadcast heartbeats over?           
#
# Network interfaces used for broadcast
#bcast  eth0            # Linux
#bcast  eth1 eth2      # Linux
#bcast  le0            # Solaris
#bcast  le1 le2        # Solaris
#
#      Set up a multicast heartbeat medium               
#      mcast [dev] [mcast group] [port] [ttl] [loop]
#
#      [dev]          device to send/rcv heartbeats on
#      [mcast group]  multicast group to join (class D multicast address
#                      224.0.0.0 - 239.255.255.255)
#      [port]          udp port to sendto/rcvfrom (set this value to the
#                      same value as "udpport" above)
#      [ttl]          the ttl value for outbound heartbeats.  this effects
#                      how far the multicast packet will propagate.  (0-255)
#                      Must be greater than zero.
#      [loop]          toggles loopback for outbound multicast heartbeats.
#                      if enabled, an outbound packet will be looped back and
#                      received by the interface it was sent on. (0 or 1)
#                      Set this value to zero.
#
# Multicast setting
#mcast eth0 225.0.0.1 694 1 0
#
#      Set up a unicast / udp heartbeat medium           
#      ucast [dev] [peer-ip-addr]
#
#      [dev]          device to send/rcv heartbeats on
#      [peer-ip-addr]  IP address of peer to send packets to
#
#
#ucast eth0 192.168.1.2 
#
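# Broadcast, unicast, and multicast each have advantages and drawbacks.
# Unicast is mostly used with two nodes, but the two nodes then cannot share an identical configuration file, because the peer IP address differs on each.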
#
#
#      About boolean values...
#     
#      Any of the following case-insensitive values will work for true:
#              true, on, yes, y, 1
#      Any of the following case-insensitive values will work for false:
#              false, off, no, n, 0
#     
#
#
#
#      auto_failback:  determines whether a resource will
#      automatically fail back to its "primary" node, or remain
#      on whatever node is serving it until that node fails, or
#      an administrator intervenes.
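#      Determines whether a resource automatically returns to its original primary node,
#      or keeps running on the node it moved to until that node fails or an administrator intervenes.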
#
#      The possible values for auto_failback are:
#              on      - enable automatic failbacks
#              off    - disable automatic failbacks
#              legacy  - enable automatic failbacks in systems
#                      where all nodes do not yet support
#                      the auto_failback option.
#
#      auto_failback "on" and "off" are backwards compatible with the old
#              "nice_failback on" setting.
#
#      See the FAQ for information on how to convert
#              from "legacy" to "on" without a flash cut.
#              (i.e., using a "rolling upgrade" process)
#
#      The default value for auto_failback is "legacy", which
#      will issue a warning at startup.  So, make sure you put
#      an auto_failback directive in your ha.cf file.
#      (note: auto_failback can be any boolean or "legacy")
#
# Automatic failback setting
auto_failback on
#
#
#      Basic STONITH support
#      Using this directive assumes that there is one stonith
#      device in the cluster.  Parameters to this device are
#      read from a configuration file. The format of this line is:
#
#        stonith <stonith_type> <configfile>
#
#      NOTE: it is up to you to maintain this file on each node in the
#      cluster!
#
# Basic STONITH support
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
#      STONITH support
#      You can configure multiple stonith devices using this directive.
#      The format of the line is:
#        stonith_host <hostfrom> <stonith_type> <params...>
#        <hostfrom> is the machine the stonith device is attached
#              to or * to mean it is accessible from any host.
#        <stonith_type> is the type of stonith device (a list of
#              supported drives is in /usr/lib/stonith.)
#        <params...> are driver specific parameters.  To see the
#              format for a particular device, run:
#          stonith -l -t <stonith_type>
#
#
#      Note that if you put your stonith device access information in
#      here, and you make this file publically readable, you're asking
#      for a denial of service attack ;-)
#
#      To get a list of supported stonith devices, run
#              stonith -L
#      For detailed information on which stonith devices are supported
#      and their detailed configuration options, run this command:
#              stonith -h
#
#stonith_host *    baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3  rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
#      Watchdog is the watchdog timer.  If our own heart doesn't beat for
#      a minute, then our machine will reboot.
#      NOTE: If you are using the software watchdog, you very likely
#      wish to load the module with the parameter "nowayout=0" or
#      compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
#      an orderly shutdown of heartbeat will trigger a reboot, which is
#      very likely NOT what you want.
#
# Watchdog timer setting
#watchdog /dev/watchdog
#     
#      Tell what machines are in the cluster
#      node    nodename ...    -- must match uname -n
#
# Node names. Important: each name must match the output of uname -n on that node
#node  ken3
#node  kathy
#
#      Less common options...
#
#      Treats 10.10.10.254 as a psuedo-cluster-member
#      Used together with ipfail below...
#      note: don't use a cluster node as ping node
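#      Treats 10.10.10.254 as a pseudo cluster member; used together with ipfail below.
#      Note: do not use a cluster node as the ping node; pinging the gateway is the usual choice.
#      The ping node counts as a vote when arbitrating a cluster reconfiguration.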
#
#ping 10.10.10.254
#
#      Treats 10.10.10.254 and 10.10.10.253 as a psuedo-cluster-member
#      called group1. If either 10.10.10.254 or 10.10.10.253 are up
#      then group1 is up
#      Used together with ipfail below...
#      As above; group1 counts as up if either of the two IP addresses answers a ping.
#
#ping_group group1 10.10.10.254 10.10.10.253
#
#      HBA ping derective for Fiber Channel
#      Treats fc-card-name as psudo-cluster-member
#      used with ipfail below ...
#
#      You can obtain HBAAPI from   You need
#      to get the library specific to your HBA directly from the vender
#      To install HBAAPI stuff, all You need to do is to compile the common
#      part you obtained from the sourceforge. This will produce libHBAAPI.so
#      which you need to copy to /usr/lib. You need also copy hbaapi.h to
#      /usr/include.
#
#      The fc-card-name is the name obtained from the hbaapitest program
#      that is part of the hbaapi package. Running hbaapitest will produce
#      a verbose output. One of the first line is similar to:
#              Apapter number 0 is named: qlogic-qla2200-0
#      Here fc-card-name is qlogic-qla2200-0.
#
#hbaping fc-card-name
#
#
#      Processes started and stopped with heartbeat.  Restarted unless
#              they exit with rc=100
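#      respawn names a process that heartbeat starts, monitors, and restarts whenever it exits (unless rc=100).
#      With ipfail enabled, it is the ipfail process that is restarted on failure, not the node.
#      ipfail detects and handles network failures; it relies on the ping nodes set with the ping directive to test connectivity.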
#
#respawn userid /path/name/to/run
#respawn hacluster /usr/lib/heartbeat/ipfail
#
#      Access control for client api
#              default is no access
#
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster

Copyright notice: unless otherwise stated, all articles are original to this site.

When reposting, please cite the source: https://www.heiqu.com/72eed8338f177d4cfcf58d3b151046a9.html