Hadoop Three-Node Cluster Installation and Configuration: A Detailed Example (5)

==========================
MapReduce Test
==========================
[@Hadoop48 ~]$ vi test.txt
a b c d
a b c d
aa bb cc dd
ee ff gg hh
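
The same file can also be created without an editor, for example with a here-document; a minimal equivalent sketch:

cat > test.txt << 'EOF'
a b c d
a b c d
aa bb cc dd
ee ff gg hh
EOF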

As set up in .bashrc earlier in this series, fs is an alias for hadoop dfs,
and hls is an alias for hadoop dfs -ls.
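
A minimal sketch of what those alias definitions in ~/.bashrc might look like (the exact lines are in the earlier part of this series, so treat these as assumptions):

alias fs='hadoop dfs'
alias hls='hadoop dfs -ls'
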
[@Hadoop48 hadoop-1.0.3]$ fs -put test.txt test.txt
[@Hadoop48 hadoop-1.0.3]$ hls
Found 1 items
-rw-r--r-- 3 zhouhh supergroup 40 2012-05-23 19:39 /user/zhouhh/test.txt

Run the wordcount example to test MapReduce:

[@Hadoop48 hadoop-1.0.3]$ ./bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/zhouhh/test.txt output
12/05/23 19:40:52 INFO input.FileInputFormat: Total input paths to process : 1
12/05/23 19:40:52 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/23 19:40:52 WARN snappy.LoadSnappy: Snappy native library not loaded
12/05/23 19:40:52 INFO mapred.JobClient: Running job: job_201205231824_0001
12/05/23 19:40:53 INFO mapred.JobClient: map 0% reduce 0%
12/05/23 19:41:07 INFO mapred.JobClient: map 100% reduce 0%
12/05/23 19:41:19 INFO mapred.JobClient: map 100% reduce 100%
12/05/23 19:41:24 INFO mapred.JobClient: Job complete: job_201205231824_0001
12/05/23 19:41:24 INFO mapred.JobClient: Counters: 29
12/05/23 19:41:24 INFO mapred.JobClient: Job Counters
12/05/23 19:41:24 INFO mapred.JobClient: Launched reduce tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11561
12/05/23 19:41:24 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/23 19:41:24 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/05/23 19:41:24 INFO mapred.JobClient: Launched map tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: Data-local map tasks=1
12/05/23 19:41:24 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9934
12/05/23 19:41:24 INFO mapred.JobClient: File Output Format Counters
12/05/23 19:41:24 INFO mapred.JobClient: Bytes Written=56
12/05/23 19:41:24 INFO mapred.JobClient: FileSystemCounters
12/05/23 19:41:24 INFO mapred.JobClient: FILE_BYTES_READ=110
12/05/23 19:41:24 INFO mapred.JobClient: HDFS_BYTES_READ=147
12/05/23 19:41:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43581
12/05/23 19:41:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=56
12/05/23 19:41:24 INFO mapred.JobClient: File Input Format Counters
12/05/23 19:41:24 INFO mapred.JobClient: Bytes Read=40
12/05/23 19:41:24 INFO mapred.JobClient: Map-Reduce Framework
12/05/23 19:41:24 INFO mapred.JobClient: Map output materialized bytes=110
12/05/23 19:41:24 INFO mapred.JobClient: Map input records=4
12/05/23 19:41:24 INFO mapred.JobClient: Reduce shuffle bytes=110
12/05/23 19:41:24 INFO mapred.JobClient: Spilled Records=24
12/05/23 19:41:24 INFO mapred.JobClient: Map output bytes=104
12/05/23 19:41:24 INFO mapred.JobClient: CPU time spent (ms)=1490
12/05/23 19:41:24 INFO mapred.JobClient: Total committed heap usage (bytes)=194969600
12/05/23 19:41:24 INFO mapred.JobClient: Combine input records=16
12/05/23 19:41:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=107
12/05/23 19:41:24 INFO mapred.JobClient: Reduce input records=12
12/05/23 19:41:24 INFO mapred.JobClient: Reduce input groups=12
12/05/23 19:41:24 INFO mapred.JobClient: Combine output records=12
12/05/23 19:41:24 INFO mapred.JobClient: Physical memory (bytes) snapshot=271958016
12/05/23 19:41:24 INFO mapred.JobClient: Reduce output records=12
12/05/23 19:41:24 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1126625280
12/05/23 19:41:24 INFO mapred.JobClient: Map output records=16

As you can see, it is not very efficient, but it succeeded.
[@Hadoop48 ~]$ hls
Found 2 items
drwxr-xr-x - zhouhh supergroup 0 2012-05-23 19:41 /user/zhouhh/output
-rw-r--r-- 3 zhouhh supergroup 40 2012-05-23 19:39 /user/zhouhh/test.txt
The entries listed by hls actually live in the distributed file system (HDFS).
[@Hadoop48 ~]$ hadoop dfs -get /user/zhouhh/output .
[@Hadoop48 ~]$ cat output/*
cat: output/_logs: Is a directory
a 2
aa 1
b 2
bb 1
c 2
cc 1
d 2
dd 1
ee 1
ff 1
gg 1
hh 1
Or view it directly on HDFS:
[@Hadoop48 ~]$ hadoop dfs -cat output/*
cat: File does not exist: /user/zhouhh/output/_logs
a 2
aa 1
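
The "cat: File does not exist" line only means the wildcard also matched the job's _logs directory; the word counts themselves are correct. To print only the result files, match just the part files, for example:

hadoop dfs -cat output/part-*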

This shows that the distributed Hadoop setup has been configured successfully.
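
Note that MapReduce refuses to start a job whose output directory already exists, so before re-running the example the old output must be removed (or a different output path used), for example:

fs -rmr output    # i.e. hadoop dfs -rmr /user/zhouhh/output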

