After the master node was restarted again, the following problem appeared:
-- The HMaster process is missing on the master node
[grid@gc bin]$ ./start-hbase.sh
rac1: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac1.localdomain.out
rac2: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac2.localdomain.out
gc: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-gc.localdomain.out
starting master, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-gc.localdomain.out
rac2: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac2.localdomain.out
rac1: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac1.localdomain.out
[grid@gc bin]$ jps
3871 NameNode
4075 JobTracker
8853 Jps
4011 SecondaryNameNode
8673 HQuorumPeer
-- Processes on the two slave nodes, rac1 and rac2, are normal
[grid@rac1 bin]$ jps
10353 HQuorumPeer
10576 Jps
6457 DataNode
6579 TaskTracker
10448 HRegionServer
[grid@rac2 ~]$ jps
10311 HQuorumPeer
10534 Jps
6426 DataNode
6546 TaskTracker
10391 HRegionServer
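The comparison made by eye above (which daemons each node should show in jps) can be sketched in a few lines of Python. This is only an illustrative helper: the function name missing_daemons and the EXPECTED sets are assumptions derived from the jps listings in this post, not part of HBase or Hadoop.

```python
from typing import List

# Assumed expected daemons per role, taken from the jps listings in this post.
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "JobTracker", "HQuorumPeer", "HMaster"},
    "slave": {"DataNode", "TaskTracker", "HQuorumPeer", "HRegionServer"},
}


def missing_daemons(jps_output: str, role: str) -> List[str]:
    """Return the expected daemon names that do not appear in the jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class>"; skip blank or pid-only lines.
        if len(parts) >= 2 and parts[0].isdigit():
            running.add(parts[1])
    return sorted(EXPECTED[role] - running)


# jps output from the master node gc, copied from above.
master_jps = """3871 NameNode
4075 JobTracker
8853 Jps
4011 SecondaryNameNode
8673 HQuorumPeer"""

print(missing_daemons(master_jps, "master"))  # → ['HMaster']
```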
Some of the logs follow.
-- Logs on the master node gc
[grid@gc logs]$ tail -100f hbase-grid-master-gc.localdomain.log
2012-12-25 15:23:45,842 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
2012-12-25 15:23:45,853 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to gc/192.168.2.100:2181, initiating session
2012-12-25 15:23:45,861 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2012-12-25 15:23:46,930 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
2012-12-25 15:23:47,167 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2012-12-25 15:23:48,251 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac2/192.168.2.102:2181
2012-12-25 15:23:48,362 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
2012-12-25 15:23:48,362 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-12-25 15:23:48,367 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1065)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1079)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:931)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:134)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:219)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1060)
[grid@gc logs]$ tail -100f hbase-grid-zookeeper-gc.localdomain.log
2012-12-25 15:23:57,380 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 2 at election address rac2/192.168.2.102:3888
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
at java.lang.Thread.run(Thread.java:619)
.......
2012-12-25 15:23:57,670 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.home=/home/grid
2012-12-25 15:23:57,671 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.dir=/home/grid/hbase-0.90.5
2012-12-25 15:23:57,679 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 180000 datadir /home/grid/hbase-0.90.5/zookeeper/version-2 snapdir /home/grid/hbase-0.90.5/zookeeper/version-2
2012-12-25 15:23:58,118 WARN org.apache.zookeeper.server.quorum.Learner: Unexpected exception, tries=0, connecting to rac1/192.168.2.101:2888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:212)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
2012-12-25 15:24:00,886 INFO org.apache.zookeeper.server.quorum.Learner: Getting a snapshot from leader
2012-12-25 15:24:00,897 INFO org.apache.zookeeper.server.quorum.Learner: Setting leader epoch 9
2012-12-25 15:24:01,051 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 900000000
2012-12-25 15:24:03,218 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /192.168.2.101:12397
2012-12-25 15:24:03,377 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /192.168.2.101:12397
2012-12-25 15:24:03,396 WARN org.apache.zookeeper.server.quorum.Learner: Got zxid 0x900000001 expected 0x1
2012-12-25 15:24:03,400 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.900000001
2012-12-25 15:24:03,470 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x3bd0f2560e0000 with negotiated timeout 180000 for client /192.168.2.101:12397
2012-12-25 15:24:07,057 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /192.168.2.102:52300
2012-12-25 15:24:07,690 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /192.168.2.102:52300
2012-12-25 15:24:07,712 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x3bd0f2560e0001 with negotiated timeout 180000 for client /192.168.2.102:52300
2012-12-25 15:24:10,016 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
2012-12-25 15:24:30,422 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
2012-12-25 15:24:30,423 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
-- Log on the slave node rac2
[grid@rac2 logs]$ tail -100f hbase-grid-regionserver-rac2.localdomain.log
2012-12-25 15:23:46,939 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
2012-12-25 15:23:47,154 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to rac1/192.168.2.101:2181, initiating session
2012-12-25 15:23:47,453 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2012-12-25 15:23:47,977 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
2012-12-25 15:23:48,354 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to gc/192.168.2.100:2181, initiating session
2012-12-25 15:23:49,583 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server gc/192.168.2.100:2181, sessionid = 0x3bd0f2560e0001, negotiated timeout = 180000
2012-12-25 15:23:52,052 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
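To compare the ZooKeeper client attempts in the master and regionserver logs, a small Python sketch can reduce each log to "which quorum member was tried, and did a session get established". The helper name zk_attempts and the labels "no-session" and "established" are hypothetical; the regular expressions only match the two ClientCnxn message formats that appear in the logs above, and the sample lines are copied from those logs.

```python
import re

# Message formats as they appear in the ClientCnxn log lines above.
OPEN_RE = re.compile(r"Opening socket connection to server (\S+?)/([\d.]+):(\d+)")
OK_RE = re.compile(r"Session establishment complete on server (\S+?)/")


def zk_attempts(log_text):
    """Map each ZooKeeper host tried to 'established' or 'no-session'."""
    result = {}
    for line in log_text.splitlines():
        opened = OPEN_RE.search(line)
        if opened:
            # Pessimistic until a later line reports a completed session.
            result[opened.group(1)] = "no-session"
            continue
        done = OK_RE.search(line)
        if done:
            result[done.group(1)] = "established"
    return result


master_log = """2012-12-25 15:23:45,842 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
2012-12-25 15:23:46,930 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
2012-12-25 15:23:48,251 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac2/192.168.2.102:2181"""

regionserver_log = """2012-12-25 15:23:46,939 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
2012-12-25 15:23:47,977 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
2012-12-25 15:23:49,583 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server gc/192.168.2.100:2181, sessionid = 0x3bd0f2560e0001, negotiated timeout = 180000"""

print(zk_attempts(master_log))
print(zk_attempts(regionserver_log))
```

On these excerpts the master never gets a session from any of the three quorum members, while the regionserver on rac2 fails on rac1 and then succeeds on gc, which matches HMaster exiting while HRegionServer stays up.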
Solution
Disable IPv6: remove the ::1 localhost line from /etc/hosts (it is commented out in the example below), then restart.
[grid@rac1 ~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
# ::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
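A quick way to confirm that no active ::1 entry is left in a hosts file is sketched below in Python. The function name ipv6_loopback_entries is hypothetical, and the check only looks for uncommented lines whose address field is exactly ::1; it does not verify whether IPv6 is disabled elsewhere on the system.

```python
def ipv6_loopback_entries(hosts_text):
    """Return the uncommented hosts-file lines whose address field is ::1."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.split()[0] == "::1":
            hits.append(raw.strip())
    return hits


# Hosts file from rac1 after the fix, as shown above.
fixed_hosts = """127.0.0.1 localhost.localdomain localhost
# ::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc"""

print(ipv6_loopback_entries(fixed_hosts))  # → []
```

On a real node the text could be read with open("/etc/hosts").read() and the check repeated on gc, rac1 and rac2.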
HBase troubleshooting: