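This post only covers the errors caused by a faulty secondary namenode deployment and how to resolve them.
Environment: SUSE 10.1
The namenode is deployed on its own on cloud1.
The secondary namenode is deployed on its own on cloud3.
After the cluster was deployed, jps showed that all the expected processes were running, and HDFS could upload and download files.
However, the log on the secondary namenode showed that every doCheckpoint attempt failed: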
2011-11-11 00:02:58,154 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2011-11-11 00:02:58,155 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.(HttpClient.java:395)
    at sun.net.(HttpClient.java:530)
    at sun.net.<init>(HttpClient.java:234)
    at sun.net.(HttpClient.java:307)
    at sun.net.(HttpClient.java:324)
The secondary namenode could not even create the image directory, let alone the fsimage, edits, and other files that belong inside it.
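"Connection refused" means the TCP connection attempt was actively rejected, typically because nothing is listening on the target address and port. A quick way to narrow this down is to test, from the secondary namenode host, whether the namenode's HTTP port accepts connections. The following is a minimal sketch; `can_connect` is a hypothetical helper written for this post, not part of Hadoop.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests reachability and
    says nothing about what service answers on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

On cloud3, calling `can_connect("cloud1", 50070)` would show whether the namenode's HTTP port (50070 in this setup) is reachable at all. If it is reachable but doCheckpoint still reports "Connection refused", the secondary namenode is probably connecting to a different address than intended.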
The log on the namenode showed:
2011-11-11 23:13:03,628 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.0.1.162
2011-11-11 23:18:03,642 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.0.1.162
There are no errors, but there is also no message indicating that the image was fetched successfully.
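Shut down the cluster and set the following in hdfs-site.xml:
<property>
  <name>dfs.http.address</name>
  <value>{your_namenode_ip}:50070</value>
</property>
Distribute the file to the other nodes and restart. The image directory now contained all of its files, but the log on the secondary namenode showed that doCheckpoint still failed.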
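Before distributing an edited hdfs-site.xml, it can help to confirm that the property is really present and that the placeholder has been replaced with a concrete host. The sketch below does that with Python's standard XML parser. `read_property` and `is_usable_address` are hypothetical helpers, and the sample value `cloud1:50070` is only an example based on this setup. Rejecting `0.0.0.0` rests on my assumption that a wildcard address gives a remote node no way to locate the namenode.

```python
import xml.etree.ElementTree as ET

def read_property(xml_text, name):
    """Return the <value> of the named Hadoop property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None

def is_usable_address(value):
    """Reject missing values, unfilled placeholders and the wildcard host."""
    if not value or "{" in value:
        return False
    host, _, port = value.rpartition(":")
    return bool(host) and host != "0.0.0.0" and port.isdigit()

# Example configuration; cloud1 is the namenode host in this setup.
SAMPLE = """<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>cloud1:50070</value>
  </property>
</configuration>"""

address = read_property(SAMPLE, "dfs.http.address")
print(address, is_usable_address(address))
```

To check a real file, read its contents and pass them to `read_property` in place of `SAMPLE`.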
The log on the namenode showed:
2011-11-17 13:31:57,434 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.(HttpClient.java:395)
    at sun.net.(HttpClient.java:530)
    at sun.net.<init>(HttpClient.java:234)
    at sun.net.(HttpClient.java:307)
    at sun.net.(HttpClient.java:324)
    at sun.net.(HttpURLConnection.java:970)
    at sun.net.(HttpURLConnection.java:911)
    at sun.net.(HttpURLConnection.java:836)
    at sun.net.(HttpURLConnection.java:1172)
This shows that the secondary namenode failed to upload the image, so the getimage request on the namenode failed as well.
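Since the same "Connection refused" symptom shows up in the logs on both nodes, it can be handy to pull out only the relevant entries when comparing them. The sketch below assumes the log line layout seen in this post (timestamp, level, logger, message); the regular expression and the `refused_entries` helper are illustrative and may need adjusting for other log formats.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger: message"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<message>.*)$"
)

def refused_entries(lines):
    """Yield (time, level, logger) for WARN/ERROR entries that mention
    'Connection refused' on the entry line itself."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # stack-trace frames and other continuation lines
        if match.group("level") in ("WARN", "ERROR") and \
                "Connection refused" in match.group("message"):
            yield match.group("time"), match.group("level"), match.group("logger")

# Entry lines taken from the logs quoted in this post.
SAMPLE_LINES = [
    "2011-11-11 23:13:03,628 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.0.1.162",
    "2011-11-17 13:31:57,434 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused",
    "    at java.net.PlainSocketImpl.socketConnect(Native Method)",
]

for entry in refused_entries(SAMPLE_LINES):
    print(entry)
```

Running the same filter over the logs from both nodes makes it easier to line up the failures by timestamp.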