Hadoop Fully Distributed Environment Setup (3)

Hadoop provides the MapReduce programming framework, but its parallel processing power is only realized by writing Map and Reduce programs. To make testing easier, Hadoop ships with a set of sample applications, including a word-count program, packaged in a file named something like hadoop-examples-*.jar under the Hadoop installation directory. Besides word count, this jar contains implementations of other distributed jobs such as grep; the full list can be displayed with the following command.
Note: with an RPM installation, the examples jar is located at /usr/share/hadoop/hadoop-examples-1.2.1.jar.
[hadoop@master ~]$ hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that count the pageview counts from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using monte-carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.

First create an in directory in HDFS and put two files into it, then run the test:
[hadoop@master ~]$ hadoop fs -mkdir in
[hadoop@master ~]$ hadoop fs -put /etc/fstab /etc/profile in
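
To verify that both files made it into HDFS, you can list the directory before running the job (a simple sanity check):
[hadoop@master ~]$ hadoop fs -ls in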

Run the test:
[hadoop@master ~]$ hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount in out
14/03/06 11:26:42 INFO input.FileInputFormat: Total input paths to process : 2
14/03/06 11:26:42 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/03/06 11:26:42 WARN snappy.LoadSnappy: Snappy native library not loaded
14/03/06 11:26:43 INFO mapred.JobClient: Running job: job_201403061123_0001
14/03/06 11:26:44 INFO mapred.JobClient:  map 0% reduce 0%
14/03/06 11:26:50 INFO mapred.JobClient:  map 100% reduce 0%
14/03/06 11:26:57 INFO mapred.JobClient:  map 100% reduce 33%
14/03/06 11:26:58 INFO mapred.JobClient:  map 100% reduce 100%
14/03/06 11:26:59 INFO mapred.JobClient: Job complete: job_201403061123_0001
14/03/06 11:26:59 INFO mapred.JobClient: Counters: 29
14/03/06 11:26:59 INFO mapred.JobClient:  Job Counters
14/03/06 11:26:59 INFO mapred.JobClient:    Launched reduce tasks=1
14/03/06 11:26:59 INFO mapred.JobClient:    SLOTS_MILLIS_MAPS=7329
14/03/06 11:26:59 INFO mapred.JobClient:    Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/06 11:26:59 INFO mapred.JobClient:    Total time spent by all maps waiting after reserving slots (ms)=0
14/03/06 11:26:59 INFO mapred.JobClient:    Launched map tasks=2
14/03/06 11:26:59 INFO mapred.JobClient:    Data-local map tasks=2
14/03/06 11:26:59 INFO mapred.JobClient:    SLOTS_MILLIS_REDUCES=8587
14/03/06 11:26:59 INFO mapred.JobClient:  File Output Format Counters
14/03/06 11:26:59 INFO mapred.JobClient:    Bytes Written=2076
14/03/06 11:26:59 INFO mapred.JobClient:  FileSystemCounters
14/03/06 11:26:59 INFO mapred.JobClient:    FILE_BYTES_READ=2948
14/03/06 11:26:59 INFO mapred.JobClient:    HDFS_BYTES_READ=3139
14/03/06 11:26:59 INFO mapred.JobClient:    FILE_BYTES_WRITTEN=167810
14/03/06 11:26:59 INFO mapred.JobClient:    HDFS_BYTES_WRITTEN=2076
14/03/06 11:26:59 INFO mapred.JobClient:  File Input Format Counters
14/03/06 11:26:59 INFO mapred.JobClient:    Bytes Read=2901
14/03/06 11:26:59 INFO mapred.JobClient:  Map-Reduce Framework
14/03/06 11:26:59 INFO mapred.JobClient:    Map output materialized bytes=2954
14/03/06 11:26:59 INFO mapred.JobClient:    Map input records=97
14/03/06 11:26:59 INFO mapred.JobClient:    Reduce shuffle bytes=2954
14/03/06 11:26:59 INFO mapred.JobClient:    Spilled Records=426
14/03/06 11:26:59 INFO mapred.JobClient:    Map output bytes=3717
14/03/06 11:26:59 INFO mapred.JobClient:    Total committed heap usage (bytes)=336994304
14/03/06 11:26:59 INFO mapred.JobClient:    CPU time spent (ms)=2090
14/03/06 11:26:59 INFO mapred.JobClient:    Combine input records=360
14/03/06 11:26:59 INFO mapred.JobClient:    SPLIT_RAW_BYTES=238
14/03/06 11:26:59 INFO mapred.JobClient:    Reduce input records=213
14/03/06 11:26:59 INFO mapred.JobClient:    Reduce input groups=210
14/03/06 11:26:59 INFO mapred.JobClient:    Combine output records=213
14/03/06 11:26:59 INFO mapred.JobClient:    Physical memory (bytes) snapshot=331116544
14/03/06 11:26:59 INFO mapred.JobClient:    Reduce output records=210
14/03/06 11:26:59 INFO mapred.JobClient:    Virtual memory (bytes) snapshot=3730141184
14/03/06 11:26:59 INFO mapred.JobClient:    Map output records=360
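
After the job completes, the word counts can be read back from the out directory. With a single reducer, the results typically land in a file named part-r-00000 (exact file names may vary slightly across Hadoop versions):
[hadoop@master ~]$ hadoop fs -ls out
[hadoop@master ~]$ hadoop fs -cat out/part-r-00000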

Note: the out directory must not exist before the job runs; wordcount creates it itself and refuses to run if it is already present. To rerun the job, delete the old output first, as shown below.
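In Hadoop 1.x, a directory can be removed recursively with -rmr:
[hadoop@master ~]$ hadoop fs -rmr out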

VII. Summary of Common Errors
1. map 100%, reduce 0%
This is usually caused by hostnames that do not resolve to the correct IP addresses. Carefully check the /etc/hosts file on all three nodes; a hypothetical example follows.
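For illustration only (the addresses below are placeholders, not this cluster's real IPs), each node's /etc/hosts should contain identical entries mapping every node's hostname to its actual address:
192.168.1.10    master
192.168.1.11    slave1
192.168.1.12    slave2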

2. Error: Java heap space
The heap allocated to task JVMs is too small. In mapred-site.xml, raise the value of mapred.child.java.opts; try 1024 MB, as in the sketch below, then resubmit the job.
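A minimal sketch of the property (the -Xmx1024m flag sets the per-task JVM heap to 1024 MB; tune the number to your nodes' memory):
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>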

3. The NameNode fails to start
Change the default temporary directory, as described earlier in this series; a reference snippet is given below.
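For reference, this directory is controlled by hadoop.tmp.dir in core-site.xml. The path below is a placeholder; use the persistent directory you created earlier, owned by the hadoop user, and note that the NameNode may need to be reformatted after the change (which erases existing HDFS data):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/temp</value>
</property>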

4. Name node is in safe mode, or JobTracker is in safe mode
The NameNode does not persistently store the mapping between data blocks and the nodes that hold them; this information is rebuilt when the HDFS cluster starts, from the block reports that each DataNode sends in. While this rebuild is in progress, HDFS is in safe mode.
In this case, simply waiting a while is enough; the commands below can check the status or force an exit.
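Both commands are standard in Hadoop 1.x; leaving safe mode manually should only be done if you are sure the cluster is healthy:
[hadoop@master ~]$ hadoop dfsadmin -safemode get
[hadoop@master ~]$ hadoop dfsadmin -safemode leave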
