## Monday, July 15, 2013

The importance of an effective testing strategy is clear from the fact that developers normally spend more time testing and debugging than writing code. The same holds for Hadoop, but Hadoop, like any distributed system such as R or MPI, or even a multi-threaded application, comes with its own problems, because we cannot capture the state of a distributed environment in real time. This forces us to develop even more powerful testing approaches. Here we will talk only about Hadoop.

ClusterMapReduceTestCase: The Hadoop community is putting a lot of effort into this very problem by developing MiniMRCluster and MiniDFSCluster for testing purposes. These allow developers to test all aspects of Hadoop applications, such as MapReduce, YARN applications, HBase, etc., just as in a real environment. The one big disadvantage is that it takes about 10 seconds to set up the test cluster, which may discourage many developers from testing all functionality.

MRUnit: To address this, Cloudera developed the MRUnit framework, which is now an Apache TLP (Top Level Project) at version 1.0.0. MRUnit bridges the gap between MapReduce and JUnit testing by providing interfaces for testing MapReduce jobs. We can use the functionality provided by MRUnit inside JUnit tests to test the Mapper, Reducer, Driver and Combiner. With PipelineMapReduceDriver we can test a workflow made up of a series of MapReduce jobs. Currently it does not allow testing the partitioner. It is also not an efficient way to test the whole infrastructure of a MapReduce job, since a job may contain custom input/output formats, custom record readers/writers, data serialization, etc.

LocalJobRunner: Hadoop comes with LocalJobRunner, which runs in a single JVM and allows us to test the complete Hadoop stack.

Mockito: All of the frameworks above test the different components by creating a test environment, which makes testing very slow. Most of the resulting tests are functional or end-to-end tests. For efficiency we can use a mocking framework such as Mockito to mock out the functional components of a MapReduce job.
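
The idea of mocking out the surroundings of a job can be sketched in plain Java, with no Hadoop or Mockito on the classpath. Everything in this sketch is hypothetical and chosen only for illustration: the `Emitter` interface stands in for the small part of Hadoop's `Mapper.Context` that the map logic needs, `map` is a toy word-count function, and `RecordingEmitter` is a hand-written test double. With Mockito, the hand-written double would typically be replaced by a mock created with `Mockito.mock(...)` and checked with `Mockito.verify(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MapLogicSketch {

    /** Hypothetical stand-in for the output side of Hadoop's Mapper.Context. */
    interface Emitter {
        void write(String key, int value);
    }

    /** Hypothetical word-count map logic, kept free of Hadoop types (an assumption of this sketch). */
    static void map(String line, Emitter out) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.write(token.toLowerCase(Locale.ROOT), 1);
            }
        }
    }

    /** Hand-written test double that records every emitted key/value pair. */
    static class RecordingEmitter implements Emitter {
        final List<String> calls = new ArrayList<>();

        @Override
        public void write(String key, int value) {
            calls.add(key + "=" + value);
        }
    }

    /** Runs the map logic on one line and returns what it emitted, in order. */
    static List<String> run(String line) {
        RecordingEmitter emitter = new RecordingEmitter();
        map(line, emitter);
        return emitter.calls;
    }

    public static void main(String[] args) {
        // Prints the recorded pairs for a single input line.
        System.out.println(run("Hadoop testing hadoop"));
    }
}
```

Because nothing here starts a cluster or a job runner, a test like this runs in milliseconds; the price is that it exercises only the map logic and none of the input/output formats or serialization around it.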