Hadoop Hands-On Tutorial: Running WordCount Step by Step

WordCount is the classic introductory example for learning Hadoop. The steps below walk through compiling, packaging, and running the WordCount program.

1. The WordCount.java source file is located at the following path inside the extracted Hadoop 1.0.4 directory:

src/examples/org/apache/hadoop/examples/WordCount.java

2. Create a dev folder and copy WordCount.java into dev/wordcount:

ubuntu@ubuntu:~/dev/wordcount$ pwd
/home/ubuntu/dev/wordcount
ubuntu@ubuntu:~/dev/wordcount$ ls
bin  compile.txt  WordCount.java
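If you prefer to script step 2, the sketch below creates the folder and copies the source file. It assumes Hadoop 1.0.4 was extracted to /home/ubuntu/hadoop-1.0.4, the path used by the javac command in step 3; HADOOP_DIR is a hypothetical variable name introduced here only so that the location can be overridden.

```shell
#!/bin/sh
# Assumption: Hadoop 1.0.4 lives in /home/ubuntu/hadoop-1.0.4 (as in step 3).
# HADOOP_DIR is a hypothetical override variable, not something Hadoop reads.
HADOOP_DIR="${HADOOP_DIR:-/home/ubuntu/hadoop-1.0.4}"
SRC="$HADOOP_DIR/src/examples/org/apache/hadoop/examples/WordCount.java"

# Create ~/dev/wordcount (and ~/dev) if they do not exist yet.
mkdir -p "$HOME/dev/wordcount"

# Copy the example source only if it is where we expect it to be.
if [ -f "$SRC" ]; then
  cp "$SRC" "$HOME/dev/wordcount/"
else
  echo "WordCount.java not found at $SRC; set HADOOP_DIR to your Hadoop directory" >&2
fi
```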

3. Create a bin folder under dev/wordcount, then compile WordCount.java so that the generated class files are written into bin:

mkdir -p bin
javac -classpath /home/ubuntu/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/ubuntu/hadoop-1.0.4/lib/commons-cli-1.2.jar -d bin WordCount.java

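4. Package the generated class files into a jar. The stock WordCount.java declares package org.apache.hadoop.examples, so javac places the classes under bin/org/apache/hadoop/examples/, and a plain *.class glob inside bin matches nothing. Package the org directory from inside bin instead (jar -cvf WordCount.jar *.class only works if the package declaration has been removed):

cd bin
jar -cvf WordCount.jar org/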

5. Create an input folder under bin and generate two input files:

ubuntu@ubuntu:~/dev/wordcount/bin/input$ ls
words-1.txt  words-2.txt
ubuntu@ubuntu:~/dev/wordcount/bin/input$ cat words-1.txt
i am a student!
how are you?
my name is lily.
ubuntu@ubuntu:~/dev/wordcount/bin/input$ cat words-2.txt
i am a student!
how are you?
she is lily
he is my brother
ubuntu@ubuntu:~/dev/wordcount/bin/input$
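The two input files can also be created directly from the shell. This sketch assumes it is run from ~/dev/wordcount/bin, as in the prompts above, and writes exactly the lines printed by cat:

```shell
#!/bin/sh
# Run from ~/dev/wordcount/bin; creates bin/input if it is missing.
mkdir -p input

# printf '%s\n' writes each argument on its own line, ending with a newline.
printf '%s\n' 'i am a student!' 'how are you?' 'my name is lily.' > input/words-1.txt
printf '%s\n' 'i am a student!' 'how are you?' 'she is lily' 'he is my brother' > input/words-2.txt

# Show the result, as in the transcript above.
ls input
```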

6. Create input and output folders on HDFS, and upload both input files into the input folder:

ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop fs -mkdir /tmp/input
ubuntu@ubuntu:~/dev/wordcount/bin$ hadoop fs -mkdir /tmp/output
ubuntu@ubuntu:~/dev/wordcount/bin/input$ hadoop fs -put words-1.txt /tmp/input
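ubuntu@ubuntu:~/dev/wordcount/bin/input$ hadoop fs -put words-2.txt /tmp/input

Note: Hadoop refuses to start a job whose output directory already exists, so if /tmp/output itself is used as the job's output path, remove it first with hadoop fs -rmr /tmp/output.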
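Before submitting the job, it helps to know what output to expect. The sketch below is a local approximation, not Hadoop itself: awk splits each line on whitespace much as the StringTokenizer in WordCount's mapper does, so punctuation stays attached to words (student!, lily. and lily are distinct keys), then adds up one count per word as the reducer does, and sorts in byte order. preview_wordcount is a hypothetical helper name, and the text of the two step 5 files is inlined so the script runs on its own.

```shell
#!/bin/sh
# preview_wordcount: hypothetical helper that mimics WordCount locally.
# 1) print the same lines as input/words-1.txt and input/words-2.txt
# 2) awk: count every whitespace-separated token (punctuation kept)
# 3) sort in byte order (LC_ALL=C), like sorted Text keys from one reducer
preview_wordcount() {
  {
    printf '%s\n' 'i am a student!' 'how are you?' 'my name is lily.'
    printf '%s\n' 'i am a student!' 'how are you?' 'she is lily' 'he is my brother'
  } |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ } END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
    LC_ALL=C sort
}

preview_wordcount
```

With these inputs it prints 15 tab-separated word/count lines, starting with a 2, am 2 and are 2, including is 3, lily 1 and lily. 1, and ending with you? 2. Assuming the job runs with a single reducer (the Hadoop 1.x default), its result file should contain the same pairs.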

Copyright notice: unless otherwise noted, all articles on this site are original.

When reposting, please credit the source: http://www.heiqu.com/102b27a14b631d743760a9c73c9dbcce.html