用户在使用Mapreduce时默认以part-*命名,MultipleOutputs可以将不同的键值对输出到用户自定义的不同的文件中。
实现过程是在调用output.write(key, new IntWritable(total), key.toString());
方法时候第三个参数是 public void write(KEYOUT key, VALUEOUT value, String baseOutputPath) 指定了输出文件的命名前缀,那么我们可以通过对不同的key使用不同的baseOutputPath来使不同key对应的value输出到不同的文件中,比如将同一天的数据输出到以该日期命名的文件中
Spark 颠覆 MapReduce 保持的排序记录
Hadoop的HDFS和MapReduce
Hadoop技术内幕:深入解析MapReduce架构设计与实现原理 PDF高清扫描版
测试数据:ip-to-hosts.txt
18.217.167.70 United States
206.96.54.107 United States
196.109.151.139 Mauritius
174.52.58.113 United States
142.111.216.8 Canada
162.100.49.185 United States
146.38.26.54 United States
36.35.107.36 China
95.214.95.13 Spain
2.96.191.111 United Kingdom
62.177.119.177 Czech Republic
21.165.189.3 United States
46.190.32.115 Greece
113.173.113.29 Vietnam
42.65.172.142 Taiwan
197.91.198.199 South Africa
68.165.71.27 United States
110.119.165.104 China
171.50.76.89 India
171.207.52.113 Singapore
40.174.30.170 United States
191.170.95.175 United States
17.81.129.101 United States
91.212.157.202 France
173.83.82.99 United States
129.75.56.220 United States
149.25.104.198 United States
103.110.22.19 Indonesia
204.188.117.122 United States
138.23.10.72 United States
172.50.15.32 United States
85.88.38.58 Belgium
49.15.14.6 India
19.84.175.5 United States
50.158.140.215 United States
161.114.120.34 United States
118.211.174.52 Australia
220.98.113.71 Japan
182.101.16.171 China
25.45.75.194 United Kingdom
168.16.162.99 United States
155.60.219.154 Australia
26.216.17.198 United States
68.34.157.157 United States
89.176.196.28 Czech Republic
173.11.51.134 United States
116.207.191.159 China
164.210.124.152 United States
168.17.158.38 United States
174.24.173.11 United States
143.64.173.176 United States
160.164.158.125 Italy
15.111.128.4 United States
22.71.176.163 United States
105.57.100.182 Morocco
111.147.83.42 China
137.157.65.89 Australia
该文件中每行数据有两个字段 分别是ip地址和该ip地址对应的国家,以\t分隔
上代码
public static class IPCountryReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
private MultipleOutputs output;
@Override
protected void setup(Context context
) throws IOException, InterruptedException {
output = new MultipleOutputs(context);
}
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context
) throws IOException, InterruptedException {
int total = 0;
for(IntWritable value: values) {
total += value.get();
}
<span> output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
output.write(key, new IntWritable(total), key.toString());</span>
}
@Override
protected void cleanup(Context context
) throws IOException, InterruptedException {
output.close();
}
}
在reduce的setup方法中
output = new MultipleOutputs(context);
然后在reduce中通过该output将内容输出到不同的文件中
private Configuration conf;
public static final String NAME = "named_output";