Hadoop Test Example: WordCount


[Date: 2013-01-23]   Source: Linux社区   Author: luxh

1. Create a test directory

[root@localhost hadoop-1.1.1]# bin/hadoop dfs -mkdir /hadoop/input

2. Create a test file

[root@localhost test]# vi test.txt

hello hadoop
hello World
Hello Java
Hey man
i am a programmer

3. Put the test file into the test directory

[root@localhost hadoop-1.1.1]# bin/hadoop dfs -put ./test/test.txt /hadoop/input

4. Run the wordcount program

[root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output

The /hadoop/output directory must not already exist; otherwise the job fails with:

org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /hadoop/output already exists

This is because a Hadoop job is a resource-intensive computation, and by default its results can never be overwritten; you must remove the old output directory before re-running the job.
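The pre-flight check behind that exception can be sketched as follows. This is a local-filesystem analogy of the check Hadoop's output format performs before a job starts, not the actual HDFS API; the helper name `check_output_spec` is made up for illustration:

```python
import os
import tempfile

def check_output_spec(path):
    """Refuse to run if the output directory already exists, so an
    expensive job never silently clobbers earlier results.
    (Local-filesystem analogy of Hadoop's output check, not HDFS.)"""
    if os.path.exists(path):
        raise FileExistsError(f"Output directory {path} already exists")

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "output")
    check_output_spec(out)       # fresh path: check passes silently
    os.makedirs(out)             # simulate a previous job's output
    try:
        check_output_spec(out)   # existing path: rejected up front
    except FileExistsError as e:
        print("rejected:", e)
```

The point of failing before any map task runs is that the mistake costs nothing; discovering it after the reduce phase would waste the whole job.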

On success, output like the following is displayed:

[root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output
/01/17 00:36:06 INFO input.FileInputFormat: Total input paths to process : 1
/01/17 00:36:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
/01/17 00:36:06 WARN snappy.LoadSnappy: Snappy native library not loaded
/01/17 00:36:07 INFO mapred.JobClient: Running job: job_201301162205_0006
/01/17 00:36:08 INFO mapred.JobClient:  map 0% reduce 0%
/01/17 00:36:14 INFO mapred.JobClient:  map 100% reduce 0%
/01/17 00:36:22 INFO mapred.JobClient:  map 100% reduce 33%
/01/17 00:36:24 INFO mapred.JobClient:  map 100% reduce 100%
/01/17 00:36:25 INFO mapred.JobClient: Job complete: job_201301162205_0006
/01/17 00:36:25 INFO mapred.JobClient: Counters: 29
/01/17 00:36:25 INFO mapred.JobClient:  Job Counters
/01/17 00:36:25 INFO mapred.JobClient:    Launched reduce tasks=1
/01/17 00:36:25 INFO mapred.JobClient:    SLOTS_MILLIS_MAPS=6863
/01/17 00:36:25 INFO mapred.JobClient:    Total time spent by all reduces waiting after reserving slots (ms)=0
/01/17 00:36:25 INFO mapred.JobClient:    Total time spent by all maps waiting after reserving slots (ms)=0
/01/17 00:36:25 INFO mapred.JobClient:    Launched map tasks=1
/01/17 00:36:25 INFO mapred.JobClient:    Data-local map tasks=1
/01/17 00:36:25 INFO mapred.JobClient:    SLOTS_MILLIS_REDUCES=9207
/01/17 00:36:25 INFO mapred.JobClient:  File Output Format Counters
/01/17 00:36:25 INFO mapred.JobClient:    Bytes Written=78
/01/17 00:36:25 INFO mapred.JobClient:  FileSystemCounters
/01/17 00:36:25 INFO mapred.JobClient:    FILE_BYTES_READ=128
/01/17 00:36:25 INFO mapred.JobClient:    HDFS_BYTES_READ=170
/01/17 00:36:25 INFO mapred.JobClient:    FILE_BYTES_WRITTEN=48059
/01/17 00:36:25 INFO mapred.JobClient:    HDFS_BYTES_WRITTEN=78
/01/17 00:36:25 INFO mapred.JobClient:  File Input Format Counters
/01/17 00:36:25 INFO mapred.JobClient:    Bytes Read=62
/01/17 00:36:25 INFO mapred.JobClient:  Map-Reduce Framework
/01/17 00:36:25 INFO mapred.JobClient:    Map output materialized bytes=128
/01/17 00:36:25 INFO mapred.JobClient:    Map input records=5
/01/17 00:36:25 INFO mapred.JobClient:    Reduce shuffle bytes=128
/01/17 00:36:25 INFO mapred.JobClient:    Spilled Records=22
/01/17 00:36:25 INFO mapred.JobClient:    Map output bytes=110
/01/17 00:36:25 INFO mapred.JobClient:    CPU time spent (ms)=1650
/01/17 00:36:25 INFO mapred.JobClient:    Total committed heap usage (bytes)=176492544
/01/17 00:36:25 INFO mapred.JobClient:    Combine input records=12
/01/17 00:36:25 INFO mapred.JobClient:    SPLIT_RAW_BYTES=108
/01/17 00:36:25 INFO mapred.JobClient:    Reduce input records=11
/01/17 00:36:25 INFO mapred.JobClient:    Reduce input groups=11
/01/17 00:36:25 INFO mapred.JobClient:    Combine output records=11
/01/17 00:36:25 INFO mapred.JobClient:    Physical memory (bytes) snapshot=180088832
/01/17 00:36:25 INFO mapred.JobClient:    Reduce output records=11
/01/17 00:36:25 INFO mapred.JobClient:    Virtual memory (bytes) snapshot=756244480
/01/17 00:36:25 INFO mapred.JobClient:    Map output records=12
[root@localhost hadoop-1.1.1]#
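The framework counters in that log can be reproduced by hand. A minimal Python sketch of WordCount's map and reduce phases over the five lines of test.txt (the mapper splits on whitespace and is case-sensitive, so "hello" and "Hello" are distinct keys):

```python
from collections import Counter

# The five lines written to test.txt above.
lines = [
    "hello hadoop",
    "hello World",
    "Hello Java",
    "Hey man",
    "i am a programmer",
]

# Map phase: emit one (word, 1) pair per whitespace-separated token.
map_output = [(word, 1) for line in lines for word in line.split()]

# Combine/reduce phase: sum the counts for each distinct word.
counts = Counter(word for word, _ in map_output)

print(len(map_output))   # 12 -> matches "Map output records=12"
print(len(counts))       # 11 -> matches "Reduce output records=11"
print(counts["hello"])   # 2  -> the only word appearing twice
```

Twelve tokens collapse to eleven distinct keys only because "hello" occurs twice, which is exactly what the Combine input records=12 / Combine output records=11 pair in the log reflects.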

 

