Compiling Hadoop example MaxTemperature.java

I’m working through some of the examples in this Hadoop book. I’m a little rusty on compiling Java programs and had some trouble with this one, so I’m documenting it here for anyone else who might be having issues.

First, I tried compiling the example like this:

javac MaxTemperature.java

That wasn’t too successful:

MaxTemperature.java:3: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.Path;
                           ^
MaxTemperature.java:4: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.IntWritable;
                           ^
MaxTemperature.java:5: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
                           ^
MaxTemperature.java:6: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Job;
                                  ^
MaxTemperature.java:7: error: package org.apache.hadoop.mapreduce.lib.input does not exist
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
                                            ^
MaxTemperature.java:8: error: package org.apache.hadoop.mapreduce.lib.output does not exist
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
                                             ^
MaxTemperature.java:18: error: cannot find symbol
    Job job = new Job();
    ^
  symbol: class Job
  location: class MaxTemperature
MaxTemperature.java:18: error: cannot find symbol
    Job job = new Job();
                  ^
  symbol: class Job
  location: class MaxTemperature
MaxTemperature.java:22: error: cannot find symbol
    FileInputFormat.addInputPath(job, new Path(args[0]));
                                          ^
  symbol: class Path
  location: class MaxTemperature
MaxTemperature.java:22: error: cannot find symbol
    FileInputFormat.addInputPath(job, new Path(args[0]));
    ^
  symbol: variable FileInputFormat
  location: class MaxTemperature
MaxTemperature.java:23: error: cannot find symbol
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
                                            ^
  symbol: class Path
  location: class MaxTemperature
MaxTemperature.java:23: error: cannot find symbol
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    ^
  symbol: variable FileOutputFormat
  location: class MaxTemperature
MaxTemperature.java:28: error: cannot find symbol
    job.setOutputKeyClass(Text.class);
                          ^
  symbol: class Text
  location: class MaxTemperature
MaxTemperature.java:29: error: cannot find symbol
    job.setOutputValueClass(IntWritable.class);
                            ^
  symbol: class IntWritable
  location: class MaxTemperature
14 errors
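
All 14 errors have the same root cause: javac was run without the Hadoop jar on its classpath, so none of the org.apache.hadoop packages resolve, and every class imported from them then fails with "cannot find symbol". A quick way to confirm that a jar really provides a missing class is to list its contents. This is only a rough sketch, assuming the jar lives at the path used in the commands later in this post; the variable names are made up for illustration.

```shell
# Location of the Hadoop jar (an assumption; adjust for your own install).
HADOOP_JAR=/home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar

# One of the classes javac complained about, as a fully qualified name.
CLASS_NAME=org.apache.hadoop.fs.Path

# Inside a jar, packages are directories, so swap the dots for slashes.
CLASS_ENTRY="$(echo "$CLASS_NAME" | tr . /).class"

if [ -f "$HADOOP_JAR" ] && command -v jar >/dev/null 2>&1; then
  # 'jar tf' lists every entry in the archive; grep keeps the matching one.
  jar tf "$HADOOP_JAR" | grep -F "$CLASS_ENTRY"
else
  echo "Jar or JDK not found; would look for $CLASS_ENTRY in $HADOOP_JAR"
fi
```

If the class is listed, putting that jar on the classpath should clear the matching "package does not exist" error.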

After a little messing about I found the correct procedure. When executing these commands you must be in the MaxTemperature project directory (the one containing the .java files). First compile the MaxTemperatureMapper.java file. The classpath must contain the path to the hadoop-core-1.0.4.jar file, which provides the org.apache.hadoop packages that javac could not find above.

javac -verbose -classpath /home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar MaxTemperatureMapper.java

Next we can compile the MaxTemperature.java file. This time the classpath contains the path to the hadoop-core-1.0.4.jar file as well as the MaxTemperature project directory where we compiled MaxTemperatureMapper.java.

javac -classpath /home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/rhys/Downloads/hadoop-book-master/ch02/src/main/java MaxTemperature.java
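
The two steps can also be rolled into one small script. This is only a sketch, assuming the same jar and source locations as in the commands above; HADOOP_JAR, SRC_DIR and CP are names made up for illustration. Classpath entries are separated by ':' on Linux (Windows uses ';').

```shell
# Paths copied from the commands above; adjust for your own machine.
HADOOP_JAR=/home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar
SRC_DIR=/home/rhys/Downloads/hadoop-book-master/ch02/src/main/java

# The Hadoop jar plus the project directory, joined with ':'.
CP="$HADOOP_JAR:$SRC_DIR"

if [ -f "$HADOOP_JAR" ] && command -v javac >/dev/null 2>&1; then
  # Compile both files in one go from inside the project directory.
  (cd "$SRC_DIR" && javac -classpath "$CP" MaxTemperatureMapper.java MaxTemperature.java)
else
  echo "Hadoop jar or javac not found; classpath would be: $CP"
fi
```

Passing both source files to a single javac call lets the compiler sort out the dependency between them itself, so the order of the files no longer matters.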

That should compile. If it does, we can run the job with the provided sample data:

hadoop MaxTemperature ../../../../input/ncdc/sample.txt output

You should see output similar to the following:

13/01/27 15:08:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/27 15:08:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/27 15:08:16 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/01/27 15:08:16 INFO input.FileInputFormat: Total input paths to process : 1
13/01/27 15:08:16 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/27 15:08:17 INFO mapred.JobClient: Running job: job_local_0001
13/01/27 15:08:18 INFO util.ProcessTree: setsid exited with exit code 0
13/01/27 15:08:18 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@71780051
13/01/27 15:08:18 INFO mapred.MapTask: io.sort.mb = 100
13/01/27 15:08:19 INFO mapred.JobClient:  map 0% reduce 0%
13/01/27 15:08:20 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/27 15:08:20 INFO mapred.MapTask: record buffer = 262144/327680
13/01/27 15:08:20 INFO mapred.MapTask: Starting flush of map output
13/01/27 15:08:20 INFO mapred.MapTask: Finished spill 0
13/01/27 15:08:20 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/01/27 15:08:21 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@114f6322
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Merger: Merging 1 sorted segments
13/01/27 15:08:21 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 57 bytes
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/27 15:08:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
13/01/27 15:08:22 INFO mapred.JobClient:  map 100% reduce 0%
13/01/27 15:08:24 INFO mapred.LocalJobRunner: reduce > reduce
13/01/27 15:08:24 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/01/27 15:08:25 INFO mapred.JobClient:  map 100% reduce 100%
13/01/27 15:08:25 INFO mapred.JobClient: Job complete: job_local_0001
13/01/27 15:08:25 INFO mapred.JobClient: Counters: 20
13/01/27 15:08:25 INFO mapred.JobClient:   File Output Format Counters
13/01/27 15:08:25 INFO mapred.JobClient:     Bytes Written=29
13/01/27 15:08:25 INFO mapred.JobClient:   FileSystemCounters
13/01/27 15:08:25 INFO mapred.JobClient:     FILE_BYTES_READ=1493
13/01/27 15:08:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=63627
13/01/27 15:08:25 INFO mapred.JobClient:   File Input Format Counters
13/01/27 15:08:25 INFO mapred.JobClient:     Bytes Read=529
13/01/27 15:08:25 INFO mapred.JobClient:   Map-Reduce Framework
13/01/27 15:08:25 INFO mapred.JobClient:     Reduce input groups=2
13/01/27 15:08:25 INFO mapred.JobClient:     Map output materialized bytes=61
13/01/27 15:08:25 INFO mapred.JobClient:     Combine output records=0
13/01/27 15:08:25 INFO mapred.JobClient:     Map input records=5
13/01/27 15:08:25 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/01/27 15:08:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/01/27 15:08:25 INFO mapred.JobClient:     Reduce output records=2
13/01/27 15:08:25 INFO mapred.JobClient:     Spilled Records=10
13/01/27 15:08:25 INFO mapred.JobClient:     Map output bytes=45
13/01/27 15:08:25 INFO mapred.JobClient:     CPU time spent (ms)=0
13/01/27 15:08:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=230694912
13/01/27 15:08:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/01/27 15:08:25 INFO mapred.JobClient:     Combine input records=0
13/01/27 15:08:25 INFO mapred.JobClient:     Map output records=5
13/01/27 15:08:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=131
13/01/27 15:08:25 INFO mapred.JobClient:     Reduce input records=5
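
The log says the reducer's output was saved to the output directory, so the results can be read straight from there. This is a small sketch, not something from the book: the part-r-00000 file name is what I would expect from a job with a single reducer using the new API, so treat it as an assumption and check with ls if it is not there. OUT_DIR and RESULT_FILE are illustrative names.

```shell
# Directory passed as the second argument to the job.
OUT_DIR=output

# Expected name of the single reducer's output file (an assumption).
RESULT_FILE="$OUT_DIR/part-r-00000"

if [ -f "$RESULT_FILE" ]; then
  # Print the results written by the reducer.
  cat "$RESULT_FILE"
else
  echo "No results at $RESULT_FILE; list $OUT_DIR to see what the job wrote"
fi

# Hadoop refuses to start a job whose output directory already exists,
# so remove it before running the job again:
#   rm -r "$OUT_DIR"
```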