Programmers often build utility libraries that are shared across all relevant projects to reduce development time. In the Hadoop ecosystem, however, computation is carried out on a cluster of N hosts, so we have the following options for making those libraries available to our application:
   
- We can copy the utility libraries to a specific location on every host. This is impractical for running MapReduce tasks, because keeping the copies in sync on all hosts makes maintenance too cumbersome. Or
- We can embed the utility libraries in the application itself. This increases the size of the application and complicates library versioning. Or
- We can copy the utility libraries into the DistributedCache and use them from there. This approach overcomes the issues of the two methods above.
Now, how can we accomplish this with the third method?
STEP 1: Copy the library to HDFS
                  hdfs dfs -copyFromLocal YourLib.jar /lib/YourLib.jar
STEP 2: Now add the cached library to the application classpath by
                  DistributedCache.addFileToClassPath(new Path("/lib/YourLib.jar"), job);
That's it.
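
To confirm that the cached jar really reached the task classpath, one option is to try loading one of its classes by name, for example from a mapper's setup code. The sketch below is a minimal illustration using only the JDK. The helper class ClasspathCheck and the class name com.example.utils.StringUtils are hypothetical placeholders and are not part of Hadoop or of YourLib.jar.

```java
// Minimal sketch: check whether a class from the cached jar is visible to this JVM.
// ClasspathCheck is a hypothetical helper, not a Hadoop API.
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or present but not linkable (e.g. a dependency is absent).
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name; replace it with a class that lives in YourLib.jar.
        String name = args.length > 0 ? args[0] : "com.example.utils.StringUtils";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Calling ClasspathCheck.isLoadable(...) early in a task and failing fast when it returns false tends to give a clearer error than a ClassNotFoundException raised later in the middle of processing.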
Sample application to demonstrate: MatrixTranspose
 