Solutions to Several Hadoop Exceptions

Exception 1

Hadoop@Ubuntu:~$ hadoop/bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input01 output01
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-0.20.2-examples.jar
    at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:131)
    at java.util.jar.JarFile.<init>(JarFile.java:150)
    at java.util.jar.JarFile.<init>(JarFile.java:87)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

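After hitting this exception I searched through many posts without finding an answer, even though plenty of people have run into the same thing. In most cases the jar itself is fine, and the cause is almost laughably simple: the path to hadoop-0.20.2-examples.jar in the command line above is incomplete. Pay attention to the shell's current working directory. In my case, changing it to hadoop/hadoop-0.20.2-examples.jar fixed the problem.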
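A quick way to rule out this cause is to confirm that the jar path resolves from the current directory before calling hadoop jar. This is a minimal sketch: check_jar is a hypothetical helper, not part of Hadoop, and it only catches a missing or unreadable file. A file that exists but is not a valid jar could still produce the same ZipException.

```shell
#!/bin/sh
# check_jar (hypothetical helper): fail early with a readable message when the
# jar path does not resolve from the current directory, instead of letting
# RunJar report "error in opening zip file".
check_jar() {
  if [ -f "$1" ] && [ -r "$1" ]; then
    return 0
  fi
  echo "jar not found or unreadable: $1 (current directory: $(pwd))" >&2
  return 1
}

# Example usage, assuming Hadoop is unpacked in ~/hadoop as in this post:
# cd ~ && check_jar hadoop/hadoop-0.20.2-examples.jar && \
#   hadoop/bin/hadoop jar hadoop/hadoop-0.20.2-examples.jar wordcount input01 output01
```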

Exception 2

hadoop@ubuntu:~$ hadoop/bin/hadoop jar hadoop/hadoop-0.20.2-examples.jar wordcount input01 output02

java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

11/03/15 12:54:09 WARN mapred.JobClient: Error reading task output:50060/tasklog?plaintext=true&taskid=attempt_201103151252_0001_m_000004_1&filter=stdout
......

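This problem cost me an entire evening. Chinese-language blogs turned up almost nothing useful; many English-language blogs mention it, but most don't explain it clearly. Some of them did offer useful hints, for example: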

Just an FYI, found the solution to this problem.

Apparently, it's an OS limit on the number of sub-directories that can be created in another directory.  In this case, we had 31998 sub-directories under hadoop/userlogs/, so any new tasks would fail in Job Setup.

From the unix command line, mkdir fails as well:
  $ mkdir hadoop/userlogs/testdir
  mkdir: cannot create directory `hadoop/userlogs/testdir': Too many links

Difficult to track down because the Hadoop error message gives no hint whatsoever.  And normally, you'd look in the userlog itself for more info, but in this case the userlog couldn't be created.
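The explanation quoted above can be checked directly by counting the immediate subdirectories of userlogs. The sketch below assumes an ext3 file system, where a directory's link count is capped at 32000, so it can hold at most 31998 subdirectories (the number in the quote); other file systems have different limits or none. count_subdirs and near_subdir_limit are hypothetical helper names, and -mindepth/-maxdepth are GNU/BSD find extensions rather than POSIX.

```shell
#!/bin/sh
# count_subdirs (hypothetical helper): print the number of immediate
# subdirectories of a directory; regular files are not counted.
count_subdirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# near_subdir_limit (hypothetical helper): exit status 0 if the directory holds
# at least LIMIT subdirectories. The default 31998 assumes ext3 (32000 links
# per directory, minus the directory's own two links).
near_subdir_limit() {
  limit="${2:-31998}"
  [ "$(count_subdirs "$1")" -ge "$limit" ]
}

# Example usage (path assumed from this post's layout):
# count_subdirs /home/hadoop/hadoop/logs/userlogs
```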

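The catch is that I could pass this little test: I was able to create arbitrary directories and files under userlogs. Of course, for some people this may really be the cause, and they won't be able to create anything there.
My fix was simply to get rid of the userlogs directory, or give it a different name: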

hadoop@ubuntu:~$ mv /home/hadoop/hadoop/logs/userlogs/ /home/hadoop/hadoop/logs/userlogsOLD/

That is, rename the original directory to userlogsOLD (which effectively removes it while keeping a copy), then run the job again:

hadoop@ubuntu:~$ hadoop/bin/hadoop jar hadoop/hadoop-0.20.2-examples.jar wordcount input01 output03

11/03/15 14:21:23 INFO input.FileInputFormat: Total input paths to process : 3
11/03/15 14:21:23 INFO mapred.JobClient: Running job: job_201103151252_0004
11/03/15 14:21:24 INFO mapred.JobClient:  map 0% reduce 0%
11/03/15 14:21:32 INFO mapred.JobClient:  map 66% reduce 0%
11/03/15 14:21:35 INFO mapred.JobClient:  map 100% reduce 0%
11/03/15 14:21:44 INFO mapred.JobClient:  map 100% reduce 100%
11/03/15 14:21:46 INFO mapred.JobClient: Job complete: job_201103151252_0004
......

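Problem solved! I still don't understand exactly what causes it, but it is certainly related to the amount of logs being stored. I'm only just starting out (I've also got a new MapReduce project running under Eclipse), so I expect I'll come to understand it gradually. I'm leaving this here as a note.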
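If the userlogs rename has to be repeated, it can be scripted. This is a minimal sketch under stated assumptions: rotate_userlogs is a hypothetical helper (not part of Hadoop), the log directory layout is assumed from the paths in this post, a timestamp suffix replaces the fixed name userlogsOLD so repeated runs don't collide, and recreating an empty userlogs directory is an extra step the original fix did not include. It is probably safest to run it while no jobs are running.

```shell
#!/bin/sh
# rotate_userlogs (hypothetical helper): move the existing userlogs directory
# aside under a timestamped name, keeping the old logs as the userlogsOLD
# rename does, then recreate an empty userlogs directory.
rotate_userlogs() {
  logdir="$1"                          # e.g. /home/hadoop/hadoop/logs (assumed layout)
  stamp=$(date +%Y%m%d%H%M%S)
  if [ -d "$logdir/userlogs" ]; then
    mv "$logdir/userlogs" "$logdir/userlogs.$stamp" || return 1
  fi
  mkdir -p "$logdir/userlogs"
}

# Example usage:
# rotate_userlogs /home/hadoop/hadoop/logs
```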

Copyright notice: unless otherwise stated, all articles are original to this site.

If reposting, please credit the source: http://www.heiqu.com/pxppj.html