Compiling Hadoop example MaxTemperature.java
I’m working through some of the examples in this Hadoop book. I’m a little rusty on compiling Java programs and had some trouble with this one, so I’m documenting it here for anyone else who might be having issues.
First, I tried compiling the example like this:
javac MaxTemperature.java
That wasn’t too successful:
MaxTemperature.java:3: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.Path;
^
MaxTemperature.java:4: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.IntWritable;
^
MaxTemperature.java:5: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
^
MaxTemperature.java:6: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Job;
^
MaxTemperature.java:7: error: package org.apache.hadoop.mapreduce.lib.input does not exist
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
^
MaxTemperature.java:8: error: package org.apache.hadoop.mapreduce.lib.output does not exist
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
^
MaxTemperature.java:18: error: cannot find symbol
Job job = new Job();
^
symbol: class Job
location: class MaxTemperature
MaxTemperature.java:18: error: cannot find symbol
Job job = new Job();
^
symbol: class Job
location: class MaxTemperature
MaxTemperature.java:22: error: cannot find symbol
FileInputFormat.addInputPath(job, new Path(args[0]));
^
symbol: class Path
location: class MaxTemperature
MaxTemperature.java:22: error: cannot find symbol
FileInputFormat.addInputPath(job, new Path(args[0]));
^
symbol: variable FileInputFormat
location: class MaxTemperature
MaxTemperature.java:23: error: cannot find symbol
FileOutputFormat.setOutputPath(job, new Path(args[1]));
^
symbol: class Path
location: class MaxTemperature
MaxTemperature.java:23: error: cannot find symbol
FileOutputFormat.setOutputPath(job, new Path(args[1]));
^
symbol: variable FileOutputFormat
location: class MaxTemperature
MaxTemperature.java:28: error: cannot find symbol
job.setOutputKeyClass(Text.class);
^
symbol: class Text
location: class MaxTemperature
MaxTemperature.java:29: error: cannot find symbol
job.setOutputValueClass(IntWritable.class);
^
symbol: class IntWritable
location: class MaxTemperature
14 errors
All of these errors have the same cause: javac cannot find the Hadoop classes because the Hadoop jar is not on the classpath. After a little messing about I found a procedure that works. When executing these commands you must be in the MaxTemperature project directory. First compile the MaxTemperatureMapper.java file. The classpath should contain the path to the hadoop-core-1.0.4.jar file:
javac -verbose -classpath /home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar MaxTemperatureMapper.java
Next we can compile the MaxTemperature.java file. This time the classpath contains the path to the hadoop-core-1.0.4.jar file as well as the MaxTemperature project directory where we compiled MaxTemperatureMapper.java, with the two entries separated by a colon:
javac -classpath /home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar:/home/rhys/Downloads/hadoop-book-master/ch02/src/main/java MaxTemperature.java
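The two javac calls above can probably be folded into one. Here is a minimal sketch, assuming the same jar and source locations as on my machine. The variable names and the build_classpath helper are my own inventions for illustration, not part of Hadoop or the book. It only prints the command, so you can check the classpath before running it for real.

```shell
# Assumed locations, copied from the commands above; adjust to your own setup.
HADOOP_JAR="/home/rhys/hadoop-1.0.4/hadoop-core-1.0.4.jar"
SRC_DIR="/home/rhys/Downloads/hadoop-book-master/ch02/src/main/java"

# Hypothetical helper: join two classpath entries with ':'
# (the classpath separator on Linux and macOS).
build_classpath() {
  printf '%s:%s' "$1" "$2"
}

CP="$(build_classpath "$HADOOP_JAR" "$SRC_DIR")"

# Dry run: print the javac command instead of executing it.
echo javac -classpath "$CP" MaxTemperatureMapper.java MaxTemperature.java
```

Remove the echo to actually compile. Naming both source files in one command lets javac resolve the classes in a single pass.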
That should compile. If so, we can then run the job with the provided sample data:
hadoop MaxTemperature ../../../../input/ncdc/sample.txt output
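If the hadoop command complains that it cannot find the MaxTemperature class, one likely cause is that the directory holding the compiled .class files is not on Hadoop's classpath. A sketch of a possible fix, assuming the classes were compiled into the current directory and that the hadoop launcher script honours the HADOOP_CLASSPATH environment variable, as the Hadoop 1.x script does:

```shell
# Assumption: the compiled .class files are in the current directory.
export HADOOP_CLASSPATH="$(pwd)"

# Run the job only if the hadoop launcher is on PATH; otherwise say so.
if command -v hadoop >/dev/null 2>&1; then
  hadoop MaxTemperature ../../../../input/ncdc/sample.txt output
else
  echo "hadoop not found on PATH; add the Hadoop bin directory to PATH first" >&2
fi
```

Note that the job will refuse to start if the output directory already exists, so remove it before re-running.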
You should see output similar to the following:
13/01/27 15:08:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/27 15:08:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/27 15:08:16 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/01/27 15:08:16 INFO input.FileInputFormat: Total input paths to process : 1
13/01/27 15:08:16 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/27 15:08:17 INFO mapred.JobClient: Running job: job_local_0001
13/01/27 15:08:18 INFO util.ProcessTree: setsid exited with exit code 0
13/01/27 15:08:18 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@71780051
13/01/27 15:08:18 INFO mapred.MapTask: io.sort.mb = 100
13/01/27 15:08:19 INFO mapred.JobClient: map 0% reduce 0%
13/01/27 15:08:20 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/27 15:08:20 INFO mapred.MapTask: record buffer = 262144/327680
13/01/27 15:08:20 INFO mapred.MapTask: Starting flush of map output
13/01/27 15:08:20 INFO mapred.MapTask: Finished spill 0
13/01/27 15:08:20 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/01/27 15:08:21 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@114f6322
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Merger: Merging 1 sorted segments
13/01/27 15:08:21 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 57 bytes
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/27 15:08:21 INFO mapred.LocalJobRunner:
13/01/27 15:08:21 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/27 15:08:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
13/01/27 15:08:22 INFO mapred.JobClient: map 100% reduce 0%
13/01/27 15:08:24 INFO mapred.LocalJobRunner: reduce > reduce
13/01/27 15:08:24 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/01/27 15:08:25 INFO mapred.JobClient: map 100% reduce 100%
13/01/27 15:08:25 INFO mapred.JobClient: Job complete: job_local_0001
13/01/27 15:08:25 INFO mapred.JobClient: Counters: 20
13/01/27 15:08:25 INFO mapred.JobClient: File Output Format Counters
13/01/27 15:08:25 INFO mapred.JobClient: Bytes Written=29
13/01/27 15:08:25 INFO mapred.JobClient: FileSystemCounters
13/01/27 15:08:25 INFO mapred.JobClient: FILE_BYTES_READ=1493
13/01/27 15:08:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=63627
13/01/27 15:08:25 INFO mapred.JobClient: File Input Format Counters
13/01/27 15:08:25 INFO mapred.JobClient: Bytes Read=529
13/01/27 15:08:25 INFO mapred.JobClient: Map-Reduce Framework
13/01/27 15:08:25 INFO mapred.JobClient: Reduce input groups=2
13/01/27 15:08:25 INFO mapred.JobClient: Map output materialized bytes=61
13/01/27 15:08:25 INFO mapred.JobClient: Combine output records=0
13/01/27 15:08:25 INFO mapred.JobClient: Map input records=5
13/01/27 15:08:25 INFO mapred.JobClient: Reduce shuffle bytes=0
13/01/27 15:08:25 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/01/27 15:08:25 INFO mapred.JobClient: Reduce output records=2
13/01/27 15:08:25 INFO mapred.JobClient: Spilled Records=10
13/01/27 15:08:25 INFO mapred.JobClient: Map output bytes=45
13/01/27 15:08:25 INFO mapred.JobClient: CPU time spent (ms)=0
13/01/27 15:08:25 INFO mapred.JobClient: Total committed heap usage (bytes)=230694912
13/01/27 15:08:25 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/01/27 15:08:25 INFO mapred.JobClient: Combine input records=0
13/01/27 15:08:25 INFO mapred.JobClient: Map output records=5
13/01/27 15:08:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=131
13/01/27 15:08:25 INFO mapred.JobClient: Reduce input records=5