Common RHCS Troubleshooting Commands (2)

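For each type of fence device, Red Hat provides a matching agent, such as fence_drac or fence_ilo, which can be run directly from the command line with the fence device's parameters to test it. The -o option specifies the action to perform, such as reboot, off, on, or status; see man fence_drac for details.

For example: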

[root@db2 ~]# fence_drac -a 192.168.114.106 -l admin -p wlhmbst@2008 -o status

status: on
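
For routine checks, the status test can be wrapped in a small script. The following is only a sketch: the function name fence_status_from_output is hypothetical, and it assumes the agent prints a line of the form "status: on" as in the sample run above. Other agents or versions may format their output differently.

```shell
#!/bin/sh
# Hypothetical helper: extract the power state from a fence agent's output.
# Assumes a line of the form "status: on", as in the sample run above.
fence_status_from_output() {
    sed -n 's/^[Ss]tatus:[[:space:]]*\([A-Za-z]*\).*$/\1/p' | tr 'A-Z' 'a-z' | head -n 1
}

# Usage on a cluster node (DRAC_IP, DRAC_USER and DRAC_PASS are placeholders):
#   state=$(fence_drac -a "$DRAC_IP" -l "$DRAC_USER" -p "$DRAC_PASS" -o status \
#           | fence_status_from_output)
#   [ "$state" = "on" ] || echo "unexpected fence status: '$state'" >&2
```

Parsing the output rather than relying on the exit code keeps the sketch independent of how a particular agent reports errors.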

1.7. Manual cluster switchover with clusvcadm

The clusvcadm command allows you to enable, disable, relocate, and restart high-availability services in a cluster. For more information about this tool, refer to the clusvcadm(8) man page.

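There are many ways to test an RHCS failover, such as unplugging a network cable or simulating a crash. When a switchover is needed during routine maintenance, however, we want to perform it with the least disruption to the system. The clusvcadm command serves that purpose.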

[root@db2 /]# clusvcadm -r wbdb_service -m db2.fjnet114.com
Trying to relocate service:wbdb_service to db2.fjnet114.com...Success
service:wbdb_service is now running on db2.fjnet114.com
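
The other actions mentioned above (enable, disable, restart) follow the same pattern. Below is a minimal sketch of a wrapper with a dry-run mode, so the exact command can be reviewed before it touches a production service. The function name svc_action and the DRY_RUN variable are hypothetical; the clusvcadm flags used are -r (relocate), -m (target member), -e (enable), -d (disable) and -R (restart).

```shell
#!/bin/sh
# Hypothetical wrapper around clusvcadm.
# With DRY_RUN=1 it only prints the command it would run.
svc_action() {
    action=$1; service=$2; member=$3
    case $action in
        relocate)
            if [ -n "$member" ]; then
                set -- clusvcadm -r "$service" -m "$member"
            else
                set -- clusvcadm -r "$service"
            fi ;;
        enable)  set -- clusvcadm -e "$service" ;;
        disable) set -- clusvcadm -d "$service" ;;
        restart) set -- clusvcadm -R "$service" ;;
        *) echo "unknown action: $action" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # show the command only
    else
        "$@"           # run it on the cluster node
    fi
}

# Example: review the relocation used above without executing it.
#   DRY_RUN=1; svc_action relocate wbdb_service db2.fjnet114.com
```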

 

2.      IP port usage

Port Number            Protocol   Component
5404, 5405             UDP        cman (Cluster Manager)
11111                  TCP        ricci (part of Conga remote agent)
14567                  TCP        gnbd (Global Network Block Device)
16851                  TCP        modclusterd (part of Conga remote agent)
21064                  TCP        dlm (Distributed Lock Manager)
50006, 50008, 50009    TCP        ccsd (Cluster Configuration System daemon)
50007                  UDP        ccsd (Cluster Configuration System daemon)
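
If a firewall runs on the cluster nodes, these ports have to be open between them. The sketch below prints iptables rules for the ports in the table above instead of applying them. The INPUT chain and the unrestricted source are assumptions; a real setup would more likely limit the rules to the cluster subnet and use whatever chain the distribution's firewall defines.

```shell
#!/bin/sh
# Print (do not apply) one iptables rule per row of the port table.
# Assumption: rules go into the INPUT chain with no source restriction.
print_cluster_rules() {
    while read -r proto ports component; do
        [ -n "$proto" ] || continue
        echo "iptables -A INPUT -p $proto -m multiport --dports $ports -j ACCEPT  # $component"
    done <<'EOF'
udp 5404,5405 cman
tcp 11111 ricci
tcp 14567 gnbd
tcp 16851 modclusterd
tcp 21064 dlm
tcp 50006,50008,50009 ccsd
udp 50007 ccsd
EOF
}

print_cluster_rules
```

Printing the rules first makes it easy to review them, or to redirect them into a file, before running anything as root.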

 

3.      Common failure analysis

If a node in your cluster is repeatedly getting fenced (and therefore rebooted), it means that another node in the cluster is not seeing enough "heartbeat" network messages from the node that is getting fenced. Most of the time, this is a result of flaky or faulty hardware, such as bad cables or bad ports on the network hub or switch. Test your communication paths thoroughly without the cluster software running to make sure your hardware is working correctly.
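
One simple way to exercise the communication path with the cluster software stopped is a longer ping run between the nodes, watching for packet loss. The helper below is a sketch under assumptions: the name packet_loss is hypothetical, and it expects the summary line printed by the common Linux ping ("... 5% packet loss ..."). Ping only checks basic unicast reachability, so a clean result does not prove that cluster heartbeat traffic gets through.

```shell
#!/bin/sh
# Hypothetical helper: pull the packet-loss percentage out of ping's summary,
# e.g. "20 packets transmitted, 19 received, 5% packet loss, time 19027ms".
packet_loss() {
    sed -n 's/.*[[:space:]]\([0-9.][0-9.]*\)% packet loss.*/\1/p' | head -n 1
}

# Usage (node1 and node2 are placeholder host names):
#   for node in node1 node2; do
#       loss=$(ping -c 20 -q "$node" | packet_loss)
#       echo "$node: ${loss:-unknown}% packet loss"
#   done
```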

If a node in your cluster is repeatedly getting fenced right at startup, it may be due to system activities that occur when a node joins a cluster. If your network is busy while the node is joining, the cluster may decide it is not getting enough heartbeat packets and fence the node. To address this, you may have to increase the post_join_delay setting in cluster.conf, for example from 3 to 60.
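
The change itself is a one-attribute edit. The sketch below rewrites post_join_delay in a copy of the configuration and prints the result. The function name set_post_join_delay is hypothetical, and it assumes the attribute already appears in the file in the form post_join_delay="3". It does not touch the live file.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with post_join_delay set to SECONDS.
# Usage: set_post_join_delay FILE SECONDS
# Assumes the attribute is already present, e.g. post_join_delay="3".
set_post_join_delay() {
    sed "s/post_join_delay=\"[0-9][0-9]*\"/post_join_delay=\"$2\"/" "$1"
}

# Example (work on a copy, then review the diff before using it):
#   cp /etc/cluster/cluster.conf /tmp/cluster.conf.new
#   set_post_join_delay /etc/cluster/cluster.conf 60 > /tmp/cluster.conf.new
#   diff /etc/cluster/cluster.conf /tmp/cluster.conf.new
# Remember to increment config_version and to propagate the new file to all
# nodes with whatever method your cluster normally uses.
```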
