A Combiner is a local aggregation function applied to repeated keys produced by the same map task. It suits associative and commutative operations such as sum, count, and max. Because the combiner only reduces the size of the intermediate data sent to the reducers, it is purely an optimization: Hadoop does not guarantee how many times it will call the combiner for a particular map output record, and it may not call it at all.
Set the combiner in the driver class by calling the "job.setCombinerClass(Combiner.class)" method.
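To illustrate why the number of combiner calls does not matter for such operations, here is a minimal plain-Java sketch with no Hadoop dependency. The class `CombinerPassesDemo` and its methods are hypothetical names used only for illustration. It sums the values of a single key after zero, one, or two local combine passes and arrives at the same total each time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Hadoop API: simulates combine passes for one key.
public class CombinerPassesDemo {

    // The aggregation shared by the "combiner" and the "reducer": a plain sum.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // One combine pass: split the values into chunks of chunkSize (must be > 0)
    // and replace each chunk with its partial sum.
    static List<Long> combinePass(List<Long> values, int chunkSize) {
        List<Long> partials = new ArrayList<>();
        for (int i = 0; i < values.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, values.size());
            partials.add(sum(values.subList(i, end)));
        }
        return partials;
    }

    // Final "reduce" after applying the combiner the given number of times.
    static long reduceAfterPasses(List<Long> values, int passes, int chunkSize) {
        List<Long> current = values;
        for (int p = 0; p < passes; p++) {
            current = combinePass(current, chunkSize);
        }
        return sum(current);
    }

    public static void main(String[] args) {
        // Seven occurrences of the same key, each counted once.
        List<Long> counts = Arrays.asList(1L, 1L, 1L, 1L, 1L, 1L, 1L);
        for (int passes = 0; passes <= 2; passes++) {
            System.out.println(passes + " combine pass(es): "
                    + reduceAfterPasses(counts, passes, 3));
        }
    }
}
```

An operation such as a plain average would not pass this check, because the average of partial averages is generally not the overall average.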
We have climate data in the following format.
The data is tab-separated: each line holds a year and a temperature recorded in that year. We have to find the maximum global temperature recorded in each year.
Here the reducer and the combiner are identical, so we can use the reduce function as the combiner to find the maximum temperature per year within each map's output.
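The idea of reusing the reduce logic as the combiner can be sketched in plain Java, without the Hadoop API. Everything in this block is an assumption made for illustration: the class name `MaxTemperatureSketch`, the helper methods, the integer temperatures, and the sample records are all made up and are not the post's original code or data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation (not Hadoop API) of map -> combine -> reduce.
public class MaxTemperatureSketch {

    // Shared aggregation: the same function serves as combiner and reducer.
    static int maxOf(List<Integer> temps) {
        int max = Integer.MIN_VALUE;
        for (int t : temps) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Groups "year<TAB>temperature" lines by year.
    static Map<String, List<Integer>> groupByYear(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(parts[1].trim()));
        }
        return grouped;
    }

    // Combiner step: for one map's output, keep only the local maximum per year.
    static List<String> combine(List<String> mapOutput) {
        List<String> combined = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : groupByYear(mapOutput).entrySet()) {
            combined.add(e.getKey() + "\t" + maxOf(e.getValue()));
        }
        return combined;
    }

    // Reducer step: merges the output of every map and takes the maximum per year.
    static Map<String, Integer> reduce(List<List<String>> outputs) {
        List<String> all = new ArrayList<>();
        for (List<String> out : outputs) {
            all.addAll(out);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groupByYear(all).entrySet()) {
            result.put(e.getKey(), maxOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample records from two simulated map tasks.
        List<String> map1 = Arrays.asList("1950\t0", "1950\t20", "1951\t10");
        List<String> map2 = Arrays.asList("1950\t15", "1951\t25", "1951\t5");

        // With the combiner, each map forwards one record per year instead of all of them.
        System.out.println(reduce(Arrays.asList(combine(map1), combine(map2))));
        // Without the combiner, the reducer sees every record but produces the same result.
        System.out.println(reduce(Arrays.asList(map1, map2)));
    }
}
```

Both calls produce the same per-year maxima; the combiner only shrinks what each simulated map passes on.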
Let’s first analyze the Driver class.
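Here we are using the "KeyValueTextInputFormat" input format, which splits each line into a key and a value at the first tab character, so the year becomes the key and the temperature becomes the value.
Run the job using the following command: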
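As a rough sketch of what this input format does with each line, the plain-Java snippet below splits a line at the first tab. It is not Hadoop code: `KeyValueSplitSketch` and `splitAtFirstTab` are hypothetical names, and treating a line without a tab as a key with an empty value reflects my understanding of the default behaviour rather than anything shown in this post.

```java
// Hypothetical illustration, not Hadoop API: splits a line into key and value.
public class KeyValueSplitSketch {

    // Returns {key, value}, split at the first tab character.
    // A line with no tab is treated as a key with an empty value (assumption).
    static String[] splitAtFirstTab(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Made-up record: year as the key, temperature as the value.
        String[] kv = splitAtFirstTab("1950\t22");
        System.out.println("key=" + kv[0] + ", value=" + kv[1]);
    }
}
```

Because the year and temperature already arrive as key and value, the mapper in such a job only needs to convert the value to a number before emitting it.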
This post was written by Shashank Rai, a freelance writer who loves to explore the latest features in Java technology.