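Running Hadoop on a Single Linux Machine (2)

3. Running the wordcount example

The wordcount example ships with the Hadoop distribution. Running it is a good way to see, and start to understand, how Hadoop executes a MapReduce job. Following the official "Hadoop Quick Start" tutorial is mostly enough to get it working; below is a brief account of my own walkthrough.

Change to the Hadoop directory; mine is /root/hadoop-0.19.0.

(1) Format HDFS

Run the command to format HDFS: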

[root@localhost hadoop-0.19.0]# bin/hadoop namenode -format

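The output of the format command is shown below. Note that in this run the name directory already existed and the prompt was answered with a lowercase y, so the format was aborted and the existing file system was kept; Hadoop 0.19 only accepts an uppercase Y at this prompt.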
10/08/01 19:04:02 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
************************************************************/
Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y
Format aborted in /tmp/hadoop-root/dfs/name
10/08/01 19:04:05 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/
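Because the re-format prompt only appears when the name directory already exists, it can help to check for that directory before running the format command. The following is a minimal sketch, not part of Hadoop itself: the function name name_dir_state is hypothetical, and the path in the usage comment is simply the one printed in the log above.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop command): report whether a NameNode
# storage directory is already present on the local disk.
# Prints "exists" if the directory is there, otherwise "absent".
name_dir_state() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "exists"
    else
        echo "absent"
    fi
}

# Usage, with the directory taken from the log above:
#   name_dir_state /tmp/hadoop-root/dfs/name
# If it prints "exists", the format command will ask for confirmation,
# and only an uppercase Y lets the format proceed.
```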

(2) Start the Hadoop daemons

Run:

[root@localhost hadoop-0.19.0]# bin/start-all.sh

The startup output is shown below:
starting namenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-namenode-localhost.out
localhost: starting datanode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-datanode-localhost.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-secondarynamenode-localhost.out
starting jobtracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-jobtracker-localhost.out
localhost: starting tasktracker, logging to /root/hadoop-0.19.0/bin/../logs/hadoop-root-tasktracker-localhost.out
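The startup messages only say that each daemon was launched, not that it stayed up. One way to check is to look at the output of the JDK's jps tool, which lists running Java processes as "<pid> <main class>". The sketch below assumes jps is on the PATH and that the five daemons appear under the class names NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker; the function name missing_daemons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print the name of
# every expected Hadoop daemon that is NOT in the listing.
# Prints nothing when all five daemons are present.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the second field exactly, so "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit (found ? 0 : 1) }'; then
            echo "$daemon"
        fi
    done
}

# Usage (assumes jps from the JDK is on the PATH):
#   jps | missing_daemons
```

If a daemon is reported as missing, the matching log file under the logs directory named in the startup output is the first place to look.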

(3) Prepare the input data for the wordcount job

First, create a local data directory named input and copy a few files into it:

[root@localhost hadoop-0.19.0]# mkdir input
[root@localhost hadoop-0.19.0]# cp CHANGES.txt LICENSE.txt NOTICE.txt README.txt input/

Then upload the local input directory to HDFS:

[root@localhost hadoop-0.19.0]# bin/hadoop fs -put input/ input

(4) Start the wordcount job

Run:

[root@localhost hadoop-0.19.0]# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount input output

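The input data directory is input and the output directory is output, both in HDFS. The output directory must not already exist, otherwise the job fails.

The job output is shown below: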
10/08/01 19:06:15 INFO mapred.FileInputFormat: Total input paths to process : 4
10/08/01 19:06:15 INFO mapred.JobClient: Running job: job_201008011904_0002
10/08/01 19:06:16 INFO mapred.JobClient:  map 0% reduce 0%
10/08/01 19:06:22 INFO mapred.JobClient:  map 20% reduce 0%
10/08/01 19:06:24 INFO mapred.JobClient:  map 40% reduce 0%
10/08/01 19:06:25 INFO mapred.JobClient:  map 60% reduce 0%
10/08/01 19:06:27 INFO mapred.JobClient:  map 80% reduce 0%
10/08/01 19:06:28 INFO mapred.JobClient:  map 100% reduce 0%
10/08/01 19:06:38 INFO mapred.JobClient:  map 100% reduce 26%
10/08/01 19:06:40 INFO mapred.JobClient:  map 100% reduce 100%
10/08/01 19:06:41 INFO mapred.JobClient: Job complete: job_201008011904_0002
10/08/01 19:06:41 INFO mapred.JobClient: Counters: 16
10/08/01 19:06:41 INFO mapred.JobClient:   File Systems
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes read=301489
10/08/01 19:06:41 INFO mapred.JobClient:     HDFS bytes written=113098
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes read=174004
10/08/01 19:06:41 INFO mapred.JobClient:     Local bytes written=348172
10/08/01 19:06:41 INFO mapred.JobClient:   Job Counters
10/08/01 19:06:41 INFO mapred.JobClient:     Launched reduce tasks=1
10/08/01 19:06:41 INFO mapred.JobClient:     Launched map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:     Data-local map tasks=5
10/08/01 19:06:41 INFO mapred.JobClient:   Map-Reduce Framework
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input groups=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Combine output records=10860
10/08/01 19:06:41 INFO mapred.JobClient:     Map input records=7363
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce output records=8997
10/08/01 19:06:41 INFO mapred.JobClient:     Map output bytes=434077
10/08/01 19:06:41 INFO mapred.JobClient:     Map input bytes=299871
10/08/01 19:06:41 INFO mapred.JobClient:     Combine input records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Map output records=39193
10/08/01 19:06:41 INFO mapred.JobClient:     Reduce input records=10860
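To make the counters above easier to read, it helps to see what wordcount computes on a small scale. The sketch below is a local imitation using standard Unix tools, not the Hadoop implementation: it assumes the example splits each line on whitespace and counts every token as-is, punctuation and case included. The function name local_wordcount is hypothetical.

```shell
#!/bin/sh
# Local sketch of the wordcount computation: read text on stdin and
# print one "word<TAB>count" line per distinct token.
#   tr      plays the role of "map": one whitespace-separated token per line
#   grep    drops an empty line that leading whitespace can leave behind
#   sort    plays the role of "shuffle": identical tokens become adjacent
#   uniq -c plays the role of "reduce": one count per group of tokens
#   awk     reorders the columns to word<TAB>count
local_wordcount() {
    tr -s ' \t\r\f' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage on the local copies of the input files:
#   cat input/* | local_wordcount | wc -l
```

Under these assumptions, the number of lines this prints corresponds to "Reduce output records" (8997 in the run above), while the number of tokens before grouping corresponds to "Map output records" (39193). The combiner explains the figures in between: it pre-aggregates the 39193 map outputs into 10860 records before they reach the reducer.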

(5) View the job results

The results can be viewed with the following command:

bin/hadoop fs -cat output/*

An excerpt of the output is shown below:

vijayarenu      20
violations.     1
virtual 3
vis-a-vis       1
visible 1
visit   1
volume  1
volume, 1
volumes 2
volumes.        1
w.r.t   2
wait    9
waiting 6
waiting.        1
waits   3
want    1
warning 7
warning,        1
warnings        12
warnings.       3
warranties      1
warranty        1
warranty,       1
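The listing above is ordered by word, which makes it hard to spot the most frequent ones. A small filter can reorder it by count. This is a minimal sketch that assumes each output line has the form word, TAB, count; the function name top_words is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the
# N most frequent words (default 10), highest count first.
top_words() {
    n="${1:-10}"
    tab=$(printf '\t')
    # -t sets TAB as the field separator, -k2,2nr sorts the count column
    # numerically in descending order, and -k1,1 breaks ties by word.
    LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 | head -n "$n"
}

# Usage against the job output in HDFS:
#   bin/hadoop fs -cat output/* | top_words 5
```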

(6) Stop the Hadoop daemons

Run:

[root@localhost hadoop-0.19.0]# bin/stop-all.sh

The output is shown below:
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

This stops the five daemons listed above: jobtracker, tasktracker, namenode, datanode, and secondarynamenode.
