Map/Reduce: Learning It Through the org.apache.hadoop.mapreduce.Job Class (2)

Date: 2020-08-28  Category: Programmer's Life

Now that we have a Job instance, the next step is to set the properties it needs in order to run correctly. We use the calls from the WordCount example to walk through them:

1. job.setJarByClass(WordCount.class)

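The Javadoc for this method reads: "Set the Jar by finding where a given class came from." In other words, the job's jar is determined by locating the jar file that contains the given class.

The Job class also provides a method that sets the job's jar file directly: setJar(String jar), which takes the full path of the jar file.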
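
To make "finding where a given class came from" concrete, here is a minimal plain-Java sketch of the general idea: ask the class loader for the URL of the class's .class file and, if that URL uses the jar protocol, cut the jar path out of it. This is not Hadoop's implementation. JarLocator and its method names are hypothetical, and the sketch does not URL-decode the path.

```java
import java.net.URL;

public class JarLocator {
    /** Returns the URL of the .class resource for the given class, or null if it cannot be found. */
    static URL findClassResource(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            // Classes loaded by the bootstrap loader report a null class loader.
            return ClassLoader.getSystemResource(resource);
        }
        return loader.getResource(resource);
    }

    /** Returns the jar path if the class was loaded from a jar file, otherwise null. */
    static String findContainingJar(Class<?> clazz) {
        URL url = findClassResource(clazz);
        if (url == null || !"jar".equals(url.getProtocol())) {
            return null; // e.g. loaded from a plain directory of .class files
        }
        // For a jar URL the path looks like "file:/path/to/x.jar!/pkg/Cls.class".
        String path = url.getPath();
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!');
        return bang >= 0 ? path.substring(0, bang) : path;
    }

    public static void main(String[] args) {
        System.out.println("resource: " + findClassResource(JarLocator.class));
        System.out.println("jar:      " + findContainingJar(JarLocator.class));
    }
}
```

Run from a directory of compiled classes, the second line prints null because no jar is involved. That is the situation in which passing an explicit path to setJar(String jar) becomes the more direct option.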

2. job.setMapperClass(TokenizerMapper.class);

This method sets the Job's Mapper. The argument should be the Class object (the .class literal) of a subclass of Mapper.

3. job.setCombinerClass(IntSumReducer.class);

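Set the combiner class for the job. WordCount reuses IntSumReducer as the combiner because summing counts is associative and commutative, so partial sums computed on the map side do not change the final result.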

4. job.setReducerClass(IntSumReducer.class);

Set the Reducer for the job.
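
The roles of the mapper, combiner, and reducer in WordCount can be sketched without Hadoop at all. The following is a plain-Java simulation, not the Hadoop API: WordCountSketch and its methods are hypothetical names, each inner list stands in for one input split, and the same sum step plays both the combiner and the reducer, mirroring how IntSumReducer is registered twice above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    /** Map step: split one line into tokens and emit a (word, 1) pair for each token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    /** Sum step: add up the counts per word. Used as both combiner and reducer. */
    static Map<String, Integer> sum(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    /** Runs map and combine per split, then one final reduce over all combined pairs. */
    static Map<String, Integer> run(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : split) {
                mapped.addAll(map(line));
            }
            shuffled.addAll(sum(mapped).entrySet()); // combiner: local pre-aggregation
        }
        return sum(shuffled); // reducer: final aggregation
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("hello world", "hello hadoop"),
                Arrays.asList("hello again"));
        System.out.println(run(splits));
    }
}
```

Because the combiner only pre-aggregates within a split, removing that line from run changes how many pairs reach the final step but not the totals it produces.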

5. job.setOutputKeyClass(Text.class);

Set the key class for the job output data.

6. job.setOutputValueClass(IntWritable.class);

Set the value class for job outputs.

7. FileInputFormat.addInputPath(job, new Path(otherArgs[0]));

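Note that the FileInputFormat used here is org.apache.hadoop.mapreduce.lib.input.FileInputFormat. An older FileInputFormat also exists in the mapred package (org.apache.hadoop.mapred); that class was marked deprecated in the Hadoop version discussed here, and the newer API uses org.apache.hadoop.mapreduce.lib.input.FileInputFormat in its place. The addInputPath method adds the given path to the job's INPUT_DIR setting. FileInputFormat provides several similar methods for this purpose: setInputPaths(Job, String), addInputPaths(Job, String), setInputPaths(Job, Path...), and addInputPath(Job, Path).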
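
The difference between the "set" and "add" variants is easiest to see on a toy configuration. The sketch below is plain Java and only imitates the behavior described above, under the assumption that the input directories are kept as one comma-separated setting. InputPathsSketch, the Map used as a configuration, and the key "sketch.input.dir" are all hypothetical stand-ins, not Hadoop names.

```java
import java.util.HashMap;
import java.util.Map;

public class InputPathsSketch {
    // Hypothetical property key standing in for the job's INPUT_DIR setting.
    static final String INPUT_DIR = "sketch.input.dir";

    /** Replaces any previously configured input paths with the given ones. */
    static void setInputPaths(Map<String, String> conf, String... paths) {
        conf.put(INPUT_DIR, String.join(",", paths));
    }

    /** Appends one path to whatever is already configured. */
    static void addInputPath(Map<String, String> conf, String path) {
        conf.merge(INPUT_DIR, path, (existing, added) -> existing + "," + added);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        addInputPath(conf, "/in/a");
        addInputPath(conf, "/in/b");
        System.out.println(conf.get(INPUT_DIR)); // both paths are kept
        setInputPaths(conf, "/in/c");
        System.out.println(conf.get(INPUT_DIR)); // earlier paths are replaced
    }
}
```

In this sketch, calling an "add" method repeatedly accumulates inputs, while a "set" method discards whatever was configured before.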

8. FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

The setOutputPath method of org.apache.hadoop.mapreduce.lib.output.FileOutputFormat sets the output directory for the Job.

At this point the Job instance has all the properties it needs, and we can start it to do the work we intended.

The Job class provides two ways to start a Job:

1. submit()

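submit() submits the Job to the corresponding Cluster and returns immediately, without waiting for the Job to finish. It also sets the Job instance's state to JobState.RUNNING to indicate that the Job is in progress. While the Job is running, you can call getJobState() to query its status.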

2. waitForCompletion(boolean)

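waitForCompletion() submits the Job to the corresponding Cluster and waits for it to finish. The boolean argument controls whether progress information is printed while the Job runs. The boolean return value indicates the outcome: true if the Job succeeded, false otherwise.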
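
The contrast between the two start methods can be imitated in plain Java. The sketch below is a simulation, not the Hadoop Job class: SimulatedJob, its State enum, and getState() are hypothetical, and a CompletableFuture stands in for the cluster. It shows submit() returning at once while the work runs in the background, and waitForCompletion(boolean) submitting if necessary, blocking, and reporting success as a boolean.

```java
import java.util.concurrent.CompletableFuture;

public class SubmitVsWait {
    enum State { DEFINE, RUNNING, SUCCEEDED, FAILED }

    /** Hypothetical stand-in for a job; it is not the Hadoop Job class. */
    static class SimulatedJob {
        private final Runnable work;
        private volatile State state = State.DEFINE;
        private CompletableFuture<Void> done;

        SimulatedJob(Runnable work) {
            this.work = work;
        }

        /** Starts the work in the background and returns without waiting for it. */
        synchronized void submit() {
            if (state != State.DEFINE) {
                throw new IllegalStateException("job was already submitted");
            }
            state = State.RUNNING;
            done = CompletableFuture.runAsync(work).handle((ok, err) -> {
                state = (err == null) ? State.SUCCEEDED : State.FAILED;
                return null;
            });
        }

        /** Submits the job if needed, blocks until it ends, and returns true on success. */
        boolean waitForCompletion(boolean verbose) {
            if (state == State.DEFINE) {
                submit();
            }
            done.join(); // never throws here: handle() absorbs the failure
            if (verbose) {
                System.out.println("job finished with state " + state);
            }
            return state == State.SUCCEEDED;
        }

        State getState() {
            return state;
        }
    }

    public static void main(String[] args) {
        SimulatedJob job = new SimulatedJob(() -> { /* pretend to do work */ });
        job.submit();
        System.out.println("right after submit: " + job.getState());
        System.out.println("succeeded: " + job.waitForCompletion(true));
    }
}
```

A driver that only needs the final result would call waitForCompletion directly, while one that wants to do other work in the meantime would call submit and check the state later.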

Please credit the source when reposting: http://www.heiqu.com/305a51b8567cf6da46afac48e758c116.html
