Map/Reduce: Understanding the New Version of Hadoop Through Changes to the WordCount Example

Date: 2020-08-28  Category: 程序人生 (Programmer's Life)

-----------------------

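Today I happened to notice that the Hadoop home page has updated its documentation, which now covers release r0.21.0 and is worth consulting.

The Hadoop documentation is thorough and deserves a careful read. I am leaving this article here so that readers can see the differences between the old and new versions.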

-----------------------

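Example programs are usually the best way to learn a new API. WordCount is the classic example used to show how to write your own cloud-computing program with Hadoop Map/Reduce. As Hadoop has evolved, however, many of its interface APIs have changed. This article compares the old and new WordCount implementations to show the approach that the latest Hadoop release recommends for building cloud-computing programs.

First, what does WordCount do? WordCount is a simple application that counts how many times each word occurs in a given data set.

Next, let's look at the implementation used by most of the examples you can currently find online, including the WordCount in the documentation on the official Hadoop site: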
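To make the expected result concrete before any Hadoop code appears, here is a minimal sketch in plain Java that counts words in memory. The class name `LocalWordCount` and its `count` method are hypothetical helpers made up for this illustration. They are not part of Hadoop, and the sketch only shows what the job computes, not how Hadoop distributes the work.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper for illustration only; it uses no Hadoop classes.
final class LocalWordCount {

    // Maps each whitespace-separated token to the number of times it occurs.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give a stable print order
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {Bye=1, Hello=2, World=1}
        System.out.println(count("Hello World Bye Hello"));
    }
}
```

A Hadoop job produces the same kind of word-to-count pairs, only computed across many input splits and written to output files instead of printed.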

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in an input line.
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one);
                }
            }
        }

        // Reducer: sums the counts collected for each word.
        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        // Driver: args[0] is the input path, args[1] is the output path.
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }
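For readers who want to trace the data flow of the listing above without a cluster, the following sketch imitates its three steps in plain Java: a map step that emits (word, 1) pairs, a grouping step that collects values by key, and a reduce step that sums them. Everything here (`MapReduceFlowSketch`, `map`, `reduce`, `run`) is a hypothetical stand-in written for this illustration. It uses no Hadoop classes, ignores the combiner, and says nothing about how Hadoop actually schedules or shuffles data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of map -> group -> reduce; not Hadoop API.
final class MapReduceFlowSketch {

    // "Map" step: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return emitted;
    }

    // "Reduce" step: sum all values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: map every line, group the emitted values by word, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        // Prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(run(List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")));
    }
}
```

The `map` and `reduce` methods correspond loosely to the `Map` and `Reduce` classes in the listing, and `run` plays the part that the framework handles between them.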

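I will not explain the program in detail for now. Instead, let's look at how WordCount is implemented in the latest version and use the comparison to understand what has changed.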

Please credit the source when reposting: http://www.heiqu.com/094a23dc3457b1083f4212b3d0a36335.html
