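Once the Hadoop cluster is installed, you can test Hadoop's basic functionality. Hadoop ships with an examples jar (hadoop-examples-0.20.205.0.jar; the version suffix differs between releases) that contains a wordcount program, which counts how many times each word occurs in its input. Let's try it out first.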
[hadoop@master ~]$ mkdir input    # first create an input directory
[hadoop@master ~]$ cd input/
[hadoop@master input]$ echo "hello world">text1.txt    # put the files to be processed in this directory
[hadoop@master input]$ echo "hello hadoop">text2.txt
[hadoop@master input]$ ls
text1.txt  text2.txt
[hadoop@master input]$ cat text1.txt
hello world
[hadoop@master input]$ cat text2.txt
hello hadoop
[hadoop@master input]$ cd ..
[hadoop@master ~]$ ls
input  log  公共的  模板  视频  图片  文档  下载  新文件~  音乐  桌面
[hadoop@master ~]$ /usr/bin/hadoop dfs -put ./input in    # copy the two files in the input directory into HDFS
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls ./in/*    # list the two files in HDFS
-rw-r--r-- 2 hadoop supergroup 12 2012-09-13 16:16 /user/hadoop/in/text1.txt
-rw-r--r-- 2 hadoop supergroup 13 2012-09-13 16:16 /user/hadoop/in/text2.txt
# Run the wordcount program from the examples jar that ships with Hadoop; it counts the occurrences of each word.
# The input is the two files in the in directory, and the result is written to the out directory.
[hadoop@master ~]$ /usr/bin/hadoop jar /usr/hadoop-examples-0.20.205.0.jar wordcount in out
12/09/13 16:20:32 INFO input.FileInputFormat: Total input paths to process : 2
12/09/13 16:20:36 INFO mapred.JobClient: Running job: job_201209131425_0001
12/09/13 16:20:37 INFO mapred.JobClient: map 0% reduce 0%
12/09/13 16:23:38 INFO mapred.JobClient: map 50% reduce 0%
12/09/13 16:24:31 INFO mapred.JobClient: map 100% reduce 16%
12/09/13 16:24:40 INFO mapred.JobClient: map 100% reduce 100%
12/09/13 16:24:45 INFO mapred.JobClient: Job complete: job_201209131425_0001
12/09/13 16:24:45 INFO mapred.JobClient: Counters: 29
12/09/13 16:24:45 INFO mapred.JobClient: Job Counters
12/09/13 16:24:45 INFO mapred.JobClient: Launched reduce tasks=1
12/09/13 16:24:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=230205
12/09/13 16:24:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/13 16:24:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/09/13 16:24:45 INFO mapred.JobClient: Launched map tasks=3
12/09/13 16:24:45 INFO mapred.JobClient: Data-local map tasks=3
12/09/13 16:24:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=58667
12/09/13 16:24:45 INFO mapred.JobClient: File Output Format Counters
12/09/13 16:24:45 INFO mapred.JobClient: Bytes Written=25
12/09/13 16:24:45 INFO mapred.JobClient: FileSystemCounters
12/09/13 16:24:45 INFO mapred.JobClient: FILE_BYTES_READ=55
12/09/13 16:24:45 INFO mapred.JobClient: HDFS_BYTES_READ=241
12/09/13 16:24:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=64354
12/09/13 16:24:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
12/09/13 16:24:45 INFO mapred.JobClient: File Input Format Counters
12/09/13 16:24:45 INFO mapred.JobClient: Bytes Read=25
12/09/13 16:24:45 INFO mapred.JobClient: Map-Reduce Framework
12/09/13 16:24:45 INFO mapred.JobClient: Map output materialized bytes=61
12/09/13 16:24:45 INFO mapred.JobClient: Map input records=2
12/09/13 16:24:45 INFO mapred.JobClient: Reduce shuffle bytes=61
12/09/13 16:24:45 INFO mapred.JobClient: Spilled Records=8
12/09/13 16:24:45 INFO mapred.JobClient: Map output bytes=41
12/09/13 16:24:45 INFO mapred.JobClient: CPU time spent (ms)=13840
12/09/13 16:24:45 INFO mapred.JobClient: Total committed heap usage (bytes)=319361024
12/09/13 16:24:45 INFO mapred.JobClient: Combine input records=4
12/09/13 16:24:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=216
12/09/13 16:24:45 INFO mapred.JobClient: Reduce input records=4
12/09/13 16:24:45 INFO mapred.JobClient: Reduce input groups=3
12/09/13 16:24:45 INFO mapred.JobClient: Combine output records=4
12/09/13 16:24:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=329932800
12/09/13 16:24:45 INFO mapred.JobClient: Reduce output records=3
12/09/13 16:24:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1133260800
12/09/13 16:24:45 INFO mapred.JobClient: Map output records=4
# When the job finishes, a new out directory appears. Note that HDFS has no notion of a current directory, so there is no cd command; relative paths such as in and out resolve under /user/hadoop, as the listings show.
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-09-13 16:16 /user/hadoop/in
drwxr-xr-x - hadoop supergroup 0 2012-09-13 16:24 /user/hadoop/out
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls ./out    # list the contents of the out directory
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2012-09-13 16:24 /user/hadoop/out/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2012-09-13 16:20 /user/hadoop/out/_logs
-rw-r--r-- 2 hadoop supergroup 25 2012-09-13 16:24 /user/hadoop/out/part-r-00000
[hadoop@master ~]$ /usr/bin/hadoop dfs -cat ./out/part-r-00000    # view the result
hadoop 1
hello 2
world 1
[hadoop@master ~]$
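To make the result and a few of the counters in the job log easier to read, here is a minimal local sketch of the map, combine, and reduce steps in Python. It is not the code inside the examples jar. It assumes whitespace tokenization, one map task per input file (the real job reported 3 launched map tasks), and tab-separated output lines. The function names `map_phase`, `combine`, and `reduce_phase` are made up for illustration.

```python
from collections import Counter

# The two input files created in the session (echo appends a newline).
inputs = {
    "text1.txt": "hello world\n",
    "text2.txt": "hello hadoop\n",
}


def map_phase(text):
    """Emit one (word, 1) pair per whitespace-separated token."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Sum counts per word within a single map task's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_phase(pairs):
    """Sum counts per word across all map tasks, sorted by word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


# Assumption: each file is handled by its own map task.
map_outputs = {name: map_phase(text) for name, text in inputs.items()}
combined = {name: combine(pairs) for name, pairs in map_outputs.items()}

# The "shuffle" here is just concatenating every combiner's output.
shuffled = [pair for pairs in combined.values() for pair in pairs]
result = reduce_phase(shuffled)

map_output_records = sum(len(pairs) for pairs in map_outputs.values())
combine_output_records = sum(len(pairs) for pairs in combined.values())

# Assumption: one "word<TAB>count" line per word in the output file.
output_text = "".join(f"{word}\t{count}\n" for word, count in result)

print(output_text, end="")
print("map output records:", map_output_records)
print("combine output records:", combine_output_records)
print("reduce output records:", len(result))
print("output bytes:", len(output_text.encode("utf-8")))
```

Under these assumptions the sketch yields the same three words and counts as part-r-00000, and its record counts line up with Map output records=4, Combine output records=4, and Reduce output records=3 in the log. The combiner does not shrink anything here because each file contains "hello" only once; the two "hello" pairs are merged only in the reduce step.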