Hadoop Single-Node Cluster Installation (4)

1. Prepare a file to run wordcount on
vi /tmp/test.txt
(Once the editor opens, type in some arbitrary content, e.g. "mu ha ha ni da ye da ye da", then save and exit.)
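
If you'd rather skip the interactive editor, the same file can be created straight from the shell; a minimal sketch using the sample sentence above:

echo "mu ha ha ni da ye da ye da" > /tmp/test.txt
cat /tmp/test.txt    # verify the file contents before uploading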

2. Upload the test file to the firstTest directory in the dfs file system
hadoop dfs -copyFromLocal /tmp/test.txt firstTest
(Note: if dfs does not already contain a firstTest directory, one is created automatically; to list what the dfs file system already holds, use "hadoop dfs -ls".)
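
To confirm the upload actually landed, you can list it back out of dfs; a quick check, assuming the paths used above (-ls works whether firstTest ended up as a directory or as a single file):

hadoop dfs -ls              # firstTest should now appear
hadoop dfs -ls firstTest    # shows the uploaded test data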

3. Run wordcount
hadoop jar hadoop-examples-1.1.0.jar wordcount firstTest result
(Note: this command means "run wordcount over every file under firstTest and write the counts to the result directory"; if result does not exist, it is created automatically.)

The examples jar sits in the root of the Hadoop installation; its name varies with the release (hadoop-examples-1.1.0.jar for the 1.1.0 install used below, hadoop-mapred-examples-0.21.0.jar for 0.21.0).
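
If you are unsure which examples jar your release ships, you can check from the installation root; a quick look, assuming Hadoop is unpacked under /hadoop-1.1.0 as in the session below:

ls /hadoop-1.1.0/hadoop-examples-*.jar    # locate the examples jar
/hadoop-1.1.0/bin/hadoop jar /hadoop-1.1.0/hadoop-examples-1.1.0.jar    # run with no program name to print the list of bundled examples (wordcount among them)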

root@Ubuntu:/hadoop-1.1.0/bin# ./hadoop jar ../hadoop-examples-1.1.0.jar wordcount firstTest result
12/10/29 19:24:32 INFO input.FileInputFormat: Total input paths to process : 1
12/10/29 19:24:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/29 19:24:32 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/29 19:24:33 INFO mapred.JobClient: Running job: job_201210291856_0001
12/10/29 19:24:34 INFO mapred.JobClient:  map 0% reduce 0%
12/10/29 19:24:52 INFO mapred.JobClient:  map 100% reduce 0%
12/10/29 19:25:03 INFO mapred.JobClient:  map 100% reduce 100%
12/10/29 19:25:04 INFO mapred.JobClient: Job complete: job_201210291856_0001
12/10/29 19:25:04 INFO mapred.JobClient: Counters: 29
12/10/29 19:25:04 INFO mapred.JobClient:  Job Counters
12/10/29 19:25:04 INFO mapred.JobClient:    Launched reduce tasks=1
12/10/29 19:25:04 INFO mapred.JobClient:    SLOTS_MILLIS_MAPS=16020
12/10/29 19:25:04 INFO mapred.JobClient:    Total time spent by all reduces waiting after reserving slots (ms)=0
12/10/29 19:25:04 INFO mapred.JobClient:    Total time spent by all maps waiting after reserving slots (ms)=0
12/10/29 19:25:04 INFO mapred.JobClient:    Launched map tasks=1
12/10/29 19:25:04 INFO mapred.JobClient:    Data-local map tasks=1
12/10/29 19:25:04 INFO mapred.JobClient:    SLOTS_MILLIS_REDUCES=11306
12/10/29 19:25:04 INFO mapred.JobClient:  File Output Format Counters
12/10/29 19:25:04 INFO mapred.JobClient:    Bytes Written=26
12/10/29 19:25:04 INFO mapred.JobClient:  FileSystemCounters
12/10/29 19:25:04 INFO mapred.JobClient:    FILE_BYTES_READ=52
12/10/29 19:25:04 INFO mapred.JobClient:    HDFS_BYTES_READ=134
12/10/29 19:25:04 INFO mapred.JobClient:    FILE_BYTES_WRITTEN=47699
12/10/29 19:25:04 INFO mapred.JobClient:    HDFS_BYTES_WRITTEN=26
12/10/29 19:25:04 INFO mapred.JobClient:  File Input Format Counters
12/10/29 19:25:04 INFO mapred.JobClient:    Bytes Read=28
12/10/29 19:25:04 INFO mapred.JobClient:  Map-Reduce Framework
12/10/29 19:25:04 INFO mapred.JobClient:    Map output materialized bytes=52
12/10/29 19:25:04 INFO mapred.JobClient:    Map input records=1
12/10/29 19:25:04 INFO mapred.JobClient:    Reduce shuffle bytes=52
12/10/29 19:25:04 INFO mapred.JobClient:    Spilled Records=10
12/10/29 19:25:04 INFO mapred.JobClient:    Map output bytes=64
12/10/29 19:25:04 INFO mapred.JobClient:    CPU time spent (ms)=6830
12/10/29 19:25:04 INFO mapred.JobClient:    Total committed heap usage (bytes)=210698240
12/10/29 19:25:04 INFO mapred.JobClient:    Combine input records=9
12/10/29 19:25:04 INFO mapred.JobClient:    SPLIT_RAW_BYTES=106
12/10/29 19:25:04 INFO mapred.JobClient:    Reduce input records=5
12/10/29 19:25:04 INFO mapred.JobClient:    Reduce input groups=5
12/10/29 19:25:04 INFO mapred.JobClient:    Combine output records=5
12/10/29 19:25:04 INFO mapred.JobClient:    Physical memory (bytes) snapshot=181235712
12/10/29 19:25:04 INFO mapred.JobClient:    Reduce output records=5
12/10/29 19:25:04 INFO mapred.JobClient:    Virtual memory (bytes) snapshot=751153152
12/10/29 19:25:04 INFO mapred.JobClient:    Map output records=9
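
While the job is running (or afterwards, as long as the JobTracker is up), it can also be queried from the command line; a small sketch, using the job id from the log above:

hadoop job -list                            # jobs currently running
hadoop job -status job_201210291856_0001    # completion state and counters for this job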

4. View the results
hadoop dfs -cat result/part-r-00000
(Note: by default the results are written to a file named "part-r-*****"; use "hadoop dfs -ls result" to see which files the result directory contains.)
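
For the sample sentence used here (9 words, 5 of them distinct, which matches the "Map output records=9" and "Reduce output records=5" counters in the log), the output should look roughly like this, one word per line with a tab-separated count:

hadoop dfs -cat result/part-r-00000
da      3
ha      2
mu      1
ni      1
ye      2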
