Benchmarks for Hadoop 2.0

TestDFSIO: IO

  • Write
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-101-tests.jar \
    TestDFSIO -write -nrFiles 10 -fileSize 1000 -resFile write.txt

=> Writes 10 files of 1000 MB each = 10 GB in total. The summary is appended to the local file write.txt.

  • Read
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-101-tests.jar \
    TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile read.txt

=> Reads back the 10 files produced by the write test, so run Write first.

  • Clean: remove the test data from HDFS
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-2.2.0.2.0.6.0-101-tests.jar \
    TestDFSIO -clean
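
The summaries collected with -resFile (write.txt and read.txt above) are plain text, so the headline throughput can be pulled out with a few lines of shell. The sketch below is only an illustration: dfsio_throughput is a hypothetical helper name, and it assumes the result file uses the "----- TestDFSIO ----- : <type>" and "Throughput mb/sec: <value>" labels that TestDFSIO writes in Hadoop 2.x. Check the labels in your own result file before relying on it.

```shell
# Hypothetical helper: print "<test type> <throughput in MB/s>" for every
# run recorded in a TestDFSIO result file.
# Assumption: each run starts with a "----- TestDFSIO ----- : <type>" line
# and contains a "Throughput mb/sec: <value>" line.
dfsio_throughput() {
  awk -F': ' '
    /TestDFSIO -----/    { type = $2 }        # remember the current test type
    /Throughput mb\/sec/ { print type, $2 }   # emit type and throughput value
  ' "$1"
}
```

Usage: dfsio_throughput write.txt prints one line per recorded run, which makes it easy to compare repeated runs that were appended to the same file.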

TeraSort: IO + MR

  • TeraGen
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar \
    teragen 1000000000 /user/hadoop/terasort-input

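=> Writes 1,000,000,000 rows of 100 bytes each = 100 GB of data in total (before HDFS replication), spread across the datanodes.
Options (placed right after "teragen", before the row count):
-D mapreduce.job.maps=30 => number of map tasks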

  • TeraSort
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar \
    terasort /user/hadoop/terasort-input /user/hadoop/terasort-output

Options (placed right after "terasort", before the input path):
-D mapreduce.job.reduces=15 => number of reduce tasks

  • TeraValidate
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar \
    teravalidate /user/hadoop/terasort-output /user/hadoop/terasort-validate

=> Validates that the TeraSort output is globally sorted.
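
The three TeraSort steps can be chained in a small script. The following is a minimal sketch, not a reference implementation: the helper names (teragen_rows_for_gb, run, terasort_pipeline) and the DRY_RUN switch are made up for illustration, and the jar path and HDFS directories are simply the ones used in the commands above. The row count relies on the fact that each TeraGen row is 100 bytes.

```shell
# Assumption: same examples jar and HDFS directories as in the commands above.
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar
BASE=/user/hadoop

# TeraGen rows are 100 bytes each, so rows = bytes / 100.
# Takes a size in decimal GB (10^9 bytes) and prints the row count.
teragen_rows_for_gb() {
  echo $(( $1 * 1000000000 / 100 ))
}

# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: generate, sort, and validate a data set of $1 GB.
# Each step runs only if the previous one succeeded.
terasort_pipeline() {
  rows=$(teragen_rows_for_gb "$1")
  run hadoop jar "$EXAMPLES_JAR" teragen "$rows" "$BASE/terasort-input" &&
  run hadoop jar "$EXAMPLES_JAR" terasort "$BASE/terasort-input" "$BASE/terasort-output" &&
  run hadoop jar "$EXAMPLES_JAR" teravalidate "$BASE/terasort-output" "$BASE/terasort-validate"
}
```

Usage: DRY_RUN=1 terasort_pipeline 100 prints the three commands for a 100 GB run without executing them; terasort_pipeline 100 runs them. TeraGen and TeraSort refuse to write into an existing output directory, so remove the directories from a previous run first (for example with hadoop fs -rm -r).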