MapReduce to implement Counter on Hadoop - Java @ Desk

Saturday, January 17, 2015



Counter
Counters are a useful channel for gathering statistics about a job. For example, you can count the number of records processed or the number of invalid records found in the input dataset. Hadoop also maintains some built-in counters for every job, and these report various metrics.

Counters provide a way for Mappers or Reducers to pass aggregate values back to the driver after the job has completed. Their values are also visible in the JobTracker’s Web UI and are reported on the console when the job ends.

A counter can be looked up and incremented inside a Mapper or Reducer via the method
context.getCounter(groupName, counterName).increment(long incr);
Example - context.getCounter("ImageCounter", "jpg").increment(1);

Retrieve the counter in the driver code after the job is complete via the method
job.getCounters().findCounter("ImageCounter","jpg").getValue();
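To make the group/name addressing concrete, here is a toy model in plain Java. It is only a sketch and not Hadoop's implementation: the class ToyCounters and its methods are hypothetical names chosen for illustration. It shows the idea that each task increments its own counters and the per-task values are summed into job totals that the driver reads.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT Hadoop's Counters class, just an illustration
// of addressing counters by (group, name) and summing per-task values.
class ToyCounters {
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Similar in spirit to context.getCounter(group, name).increment(incr).
    void increment(String group, String name, long incr) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(name, incr, Long::sum);
    }

    // Similar in spirit to findCounter(group, name).getValue().
    // In this toy model a counter that was never incremented reads as 0.
    long getValue(String group, String name) {
        Map<String, Long> empty = Collections.emptyMap();
        return groups.getOrDefault(group, empty).getOrDefault(name, 0L);
    }

    // Fold another task's counters into this set of totals.
    void mergeFrom(ToyCounters other) {
        other.groups.forEach((group, counters) ->
            counters.forEach((name, value) -> increment(group, name, value)));
    }

    public static void main(String[] args) {
        ToyCounters mapper1 = new ToyCounters();
        mapper1.increment("ImageCounter", "jpg", 1);
        mapper1.increment("ImageCounter", "jpg", 1);
        mapper1.increment("ImageCounter", "gif", 1);

        ToyCounters mapper2 = new ToyCounters();
        mapper2.increment("ImageCounter", "jpg", 1);
        mapper2.increment("ImageCounter", "other", 1);

        // The "driver" view: totals across both mappers.
        ToyCounters jobTotals = new ToyCounters();
        jobTotals.mergeFrom(mapper1);
        jobTotals.mergeFrom(mapper2);

        System.out.println("jpg==" + jobTotals.getValue("ImageCounter", "jpg"));
        System.out.println("gif==" + jobTotals.getValue("ImageCounter", "gif"));
        System.out.println("other==" + jobTotals.getValue("ImageCounter", "other"));
    }
}
```

Running main prints jpg==3, gif==1 and other==1. In a real job the summing is done by the framework, so the driver only needs the findCounter call shown above.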


Let us take an MR example with job counters
Input - The program below creates a map-only MapReduce job that uses a web server’s access log to count the number of times gif, jpg and other resources have been retrieved.
Input log:-
96.7.4.14 - - [24/Apr/2011:04:20:11 -0400] "GET /cat.jpg HTTP/1.1" 200 12433
96.7.4.14 - - [24/Apr/2011:04:20:11 -0400] "GET /cat.gif HTTP/1.1" 200 12433
96.7.4.10 - - [24/Apr/2011:04:20:11 -0400] "GET /cat.jpg HTTP/1.1" 200 12433
and so on.

Output - The job reports three figures: the number of jpg requests, the number of gif requests, and the number of other requests. The expected result looks like
Jpg==25
gif==23
other==75


Program

ImageCounterDriver.java
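import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ImageCounterDriver extends Configured implements Tool {

 @Override
 public int run(String[] args) throws Exception {
  if (args.length != 2) {
   System.out.printf("Usage: ImageCounter <input dir> <output dir>\n");
   return -1;
  }
  // new Job(conf) is deprecated on Hadoop 2 and later, where
  // Job.getInstance(getConf()) is the preferred factory method.
  Job job = new Job(getConf());
  job.setJarByClass(ImageCounterDriver.class);
  job.setJobName("Image Counter");
  FileInputFormat.setInputPaths(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  job.setNumReduceTasks(0); // map-only job: no reducers
  job.setMapperClass(ImageCounterMapper.class);

  boolean success = job.waitForCompletion(true);
  if (!success) {
   return 1;
  }
  // Print out the counters that the mappers have been incrementing.
  Counters counters = job.getCounters();
  long jpg = counters.findCounter("ImageCounter", "jpg").getValue();
  long gif = counters.findCounter("ImageCounter", "gif").getValue();
  long other = counters.findCounter("ImageCounter", "other").getValue();
  System.out.println("Jpg==" + jpg);
  System.out.println("gif==" + gif);
  System.out.println("other==" + other);
  return 0;
 }

 public static void main(String[] args) throws Exception {
  int exitCode = ToolRunner.run(new Configuration(), new ImageCounterDriver(), args);
  System.exit(exitCode);
 }
}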




ImageCounterMapper.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ImageCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
 @Override
 public void map(LongWritable key, Text value, Context context)
   throws IOException, InterruptedException {
  // Splitting on the double quote isolates the request, e.g. GET /cat.jpg HTTP/1.1
  String[] record = value.toString().split("\"");
  if (record.length > 2) {
   // line[0] is the HTTP method, line[1] is the requested resource
   String[] line = record[1].split(" ");
   if (line.length > 1) {
    if (line[1].endsWith(".jpg")) { // resource name ends with .jpg
     context.getCounter("ImageCounter", "jpg").increment(1);
    } else if (line[1].endsWith(".gif")) { // resource name ends with .gif
     context.getCounter("ImageCounter", "gif").increment(1);
    } else {
     context.getCounter("ImageCounter", "other").increment(1);
    }
   }
  }
 }
}
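The parsing logic used by the mapper can be tried out without a cluster. The sketch below is plain Java with no Hadoop dependency; the class AccessLogTally and its methods are hypothetical helpers, not part of the job above. It applies the same split-on-quote approach to the three sample log lines and tallies the results in an ordinary map instead of Hadoop counters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone helper for checking the log parsing locally.
class AccessLogTally {

    // Returns "jpg", "gif" or "other" for one access-log line, or null
    // when the line has no quoted request section to inspect.
    static String classify(String logLine) {
        String[] record = logLine.split("\"");
        if (record.length <= 2) {
            return null;
        }
        String[] request = record[1].split(" "); // e.g. GET /cat.jpg HTTP/1.1
        if (request.length <= 1) {
            return null;
        }
        String resource = request[1];
        if (resource.endsWith(".jpg")) {
            return "jpg";
        }
        if (resource.endsWith(".gif")) {
            return "gif";
        }
        return "other";
    }

    // Counts how many lines fall into each category, skipping unparsable lines.
    static Map<String, Long> tally(String[] lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("jpg", 0L);
        counts.put("gif", 0L);
        counts.put("other", 0L);
        for (String line : lines) {
            String kind = classify(line);
            if (kind != null) {
                counts.merge(kind, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {
            "96.7.4.14 - - [24/Apr/2011:04:20:11 -0400] \"GET /cat.jpg HTTP/1.1\" 200 12433",
            "96.7.4.14 - - [24/Apr/2011:04:20:11 -0400] \"GET /cat.gif HTTP/1.1\" 200 12433",
            "96.7.4.10 - - [24/Apr/2011:04:20:11 -0400] \"GET /cat.jpg HTTP/1.1\" 200 12433"
        };
        tally(sample).forEach((name, count) -> System.out.println(name + "==" + count));
    }
}
```

For the three sample lines this prints jpg==2, gif==1 and other==0. Note that lines without a quoted request are skipped entirely rather than counted as "other", which matches the behaviour of the mapper.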


Edit the following command to specify your own input and output directories, then run it.
$ hadoop jar logcounter.jar ImageCounterDriver testlog Output/logCounter 


CAUTION
Do not rely on a counter’s value from the Web UI while a job is running -
1) Due to possible speculative execution, a counter’s value could appear larger than the actual final value.
2) Modifications to counters made by tasks that are subsequently killed or fail are removed from the final count.
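The two points above can be illustrated with a simplified model in plain Java. This is a sketch under assumptions, not a description of how Hadoop tracks attempts internally: the class AttemptCounterDemo, the attempt ids and the counter values are all made up for illustration. It compares a naive running sum over every task attempt with a final sum that keeps only the successful attempts.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of task attempts and one counter value.
class AttemptCounterDemo {
    enum State { SUCCEEDED, FAILED, KILLED }

    static final class Attempt {
        final String id;      // made-up attempt id, for readability only
        final State state;
        final long jpgCount;  // what this attempt added to the "jpg" counter

        Attempt(String id, State state, long jpgCount) {
            this.id = id;
            this.state = state;
            this.jpgCount = jpgCount;
        }
    }

    // A running view that adds up every attempt, including duplicates.
    static long sumAllAttempts(List<Attempt> attempts) {
        long total = 0;
        for (Attempt a : attempts) {
            total += a.jpgCount;
        }
        return total;
    }

    // The final view: contributions from failed or killed attempts are dropped.
    static long sumSucceeded(List<Attempt> attempts) {
        long total = 0;
        for (Attempt a : attempts) {
            if (a.state == State.SUCCEEDED) {
                total += a.jpgCount;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = Arrays.asList(
            new Attempt("task0-attempt0", State.SUCCEEDED, 10),
            new Attempt("task0-attempt1", State.KILLED, 7),   // speculative duplicate
            new Attempt("task1-attempt0", State.FAILED, 4),   // failed part-way
            new Attempt("task1-attempt1", State.SUCCEEDED, 15));

        System.out.println("running view==" + sumAllAttempts(attempts));
        System.out.println("final value==" + sumSucceeded(attempts));
    }
}
```

With these made-up numbers the running view is 36 while the final value is 25, which is why the figure to trust is the one read in the driver after the job has completed.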


This post is written by
Shashank Rai - Linkedin, Google+
He is a freelance writer who loves to explore the latest features in Java technology.







