Implementing Counters with MapReduce on Hadoop
Counter
Counters are a useful channel for gathering statistics about a job. For example, you can count the number of records processed and the number of invalid records discovered in the input dataset. Hadoop also maintains some built-in counters for every job, and these report various metrics.
Counters provide a way for Mappers or Reducers to pass aggregate values back to the driver after the job has completed. Their values are also visible in the JobTracker’s Web UI and are reported on the console when the job ends.
A counter can be set and incremented from within a Mapper or Reducer.
Its value can be retrieved in the driver code after the job is complete.
Let's take a MapReduce example with job counters.
Input - The program below creates a map-only MapReduce job that uses a web server’s access log to count the number of times GIFs, JPEGs, and other resources have been retrieved.
Output - The job reports three figures: the number of GIF requests, the number of JPEG requests, and the number of other requests.
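The counting logic described above can be sketched in plain Java. This is a minimal illustration, not the original program: the class name `ImageCounterSketch`, the counter group `ImageCounter`, the counter names, and the assumed log layout (the request line in double quotes, as in the common Apache access-log format) are all assumptions. In a real Mapper using the `org.apache.hadoop.mapreduce` API, the increment would typically be `context.getCounter(group, name).increment(1)`, and the driver would read the total after the job with `job.getCounters().findCounter(group, name).getValue()`. Here a `BiConsumer` stands in for the Hadoop context so the sketch runs without a cluster.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

public class ImageCounterSketch {
    // Hypothetical counter group name, chosen for illustration only.
    static final String GROUP = "ImageCounter";

    /** Classifies one access-log line as "gif", "jpeg", or "other". */
    static String classify(String logLine) {
        // Assumption: the request is the first quoted field, e.g. "GET /a.gif HTTP/1.1".
        int start = logLine.indexOf('"');
        int end = logLine.indexOf('"', start + 1);
        if (start < 0 || end < 0) {
            return "other"; // malformed line: no quoted request found
        }
        String[] request = logLine.substring(start + 1, end).split(" ");
        if (request.length < 2) {
            return "other";
        }
        String path = request[1].toLowerCase(Locale.ROOT);
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query); // ignore any query string
        }
        if (path.endsWith(".gif")) {
            return "gif";
        }
        if (path.endsWith(".jpg") || path.endsWith(".jpeg")) {
            return "jpeg";
        }
        return "other";
    }

    /**
     * Body of a map-only task: no key/value output, only a counter increment.
     * In Hadoop the callback would be
     * (group, name) -> context.getCounter(group, name).increment(1).
     */
    static void map(String logLine, BiConsumer<String, String> incrementCounter) {
        incrementCounter.accept(GROUP, classify(logLine));
    }

    /** Local stand-in for the job: runs map() over all lines and collects the counters. */
    static Map<String, Long> countAll(String... lines) {
        Map<String, Long> counters = new TreeMap<>();
        for (String line : lines) {
            map(line, (group, name) -> counters.merge(name, 1L, Long::sum));
        }
        return counters;
    }

    public static void main(String[] args) {
        Map<String, Long> counters = countAll(
            "10.0.0.1 - - [01/Jan/2014:00:00:01 +0000] \"GET /img/logo.gif HTTP/1.1\" 200 512",
            "10.0.0.2 - - [01/Jan/2014:00:00:02 +0000] \"GET /img/photo.jpg HTTP/1.1\" 200 2048",
            "10.0.0.3 - - [01/Jan/2014:00:00:03 +0000] \"GET /index.html HTTP/1.1\" 200 1024");
        counters.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Keeping the classification separate from the Hadoop context makes the logic easy to unit test; the map-only job then only needs to forward each input line to `map`.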
Edit the following command to specify your own input and output directories, then run it.
Do not rely on a counter’s value from the Web UI while a job is running:
1) Due to possible speculative execution, a counter’s value could appear larger than the actual final value.
2) Modifications to counters made by tasks that are subsequently killed or fail will be removed from the final count.
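The two points above can be illustrated with a toy model. This is not Hadoop's aggregation code: the class `CounterViewSketch`, the `Attempt` type, and the numbers are hypothetical, and the model simply assumes that a live view sums every attempt that has not been discarded, while the final count keeps only succeeded attempts.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterViewSketch {
    enum State { RUNNING, SUCCEEDED, KILLED, FAILED }

    /** One attempt of a task and the counter value it has reported so far. */
    static final class Attempt {
        final String taskId;
        final long records;
        State state;

        Attempt(String taskId, long records, State state) {
            this.taskId = taskId;
            this.records = records;
            this.state = state;
        }
    }

    /** Toy "while running" view: every attempt that is still alive or succeeded contributes. */
    static long liveTotal(List<Attempt> attempts) {
        long total = 0;
        for (Attempt a : attempts) {
            if (a.state == State.RUNNING || a.state == State.SUCCEEDED) {
                total += a.records;
            }
        }
        return total;
    }

    /** Toy "job finished" view: only succeeded attempts contribute. */
    static long finalTotal(List<Attempt> attempts) {
        long total = 0;
        for (Attempt a : attempts) {
            if (a.state == State.SUCCEEDED) {
                total += a.records;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = new ArrayList<>();
        // Two attempts of the same task run at once under speculative execution.
        Attempt original = new Attempt("task_0", 100, State.RUNNING);
        Attempt speculative = new Attempt("task_0", 60, State.RUNNING);
        attempts.add(original);
        attempts.add(speculative);
        System.out.println("while running: " + liveTotal(attempts));

        // The original attempt finishes first; the speculative one is killed.
        original.state = State.SUCCEEDED;
        speculative.state = State.KILLED;
        System.out.println("final: " + finalTotal(attempts));
    }
}
```

In this model the value seen mid-job double counts the records handled by both attempts, which is why only the final value should be trusted.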
This post is written by
Shashank Rai - Linkedin, Google+
He is a freelance writer who loves to explore the latest features in Java technology.