Wednesday, June 19, 2013

Loading Native Libraries in Hadoop

How do you load native third-party libraries in Hadoop?

Sometimes we need native libraries for a performance gain in Java. The problem with native libraries is that they are platform dependent. In Java we can load them with either System.load() or System.loadLibrary(). But Hadoop runs on heterogeneous clusters, which means we might be dealing with many platforms (Linux, Mac OS X, etc.). So the basic question is: how do we overcome this?
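
For context, here is a minimal sketch of the difference between the two calls; the paths and library names are hypothetical placeholders, not anything from a real project:

// Hypothetical paths and names, for illustration only.
public class LoadExample {
    public static void main(String[] args) {
        // load() takes an absolute path to one specific file; no name mapping is applied.
        System.load("/usr/local/lib/libmylib.so");

        // loadLibrary() takes a bare name; the JVM maps it to the platform form
        // (libmylib.so on Linux, mylib.dll on Windows) and searches java.library.path.
        System.loadLibrary("mylib");
    }
}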

The answer:
You can load any native shared library using DistributedCache for distributing and symlinking the library files.
This example shows you how to distribute a shared library, mylib.so, and load it from a MapReduce task.
  1. First copy the library to HDFS: bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1
  2. The job-launching program should contain the following (see the sketch below): DistributedCache.createSymlink(conf); DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
  3. The MapReduce task can contain: System.loadLibrary("mylib.so");
Note: If you downloaded or built the native Hadoop library itself (libhadoop), you don't need to use DistributedCache to make that library available to your MapReduce tasks.
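
Putting the steps together, here is a minimal, self-contained sketch of a map-only job. It assumes the classic org.apache.hadoop.filecache.DistributedCache API; the class names and job name are made up, mylib.so is a hypothetical library wrapped by your own JNI code, and hdfs://host:port stands in for your NameNode address. One deliberate difference from step 3: because System.loadLibrary() applies the platform's library-name mapping (it would look for libmylib.so.so on Linux), the sketch loads the symlinked file by its absolute path with System.load() instead.

import java.io.File;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NativeLibExample {

    // Map-only task that loads the symlinked native library before processing.
    public static class NativeMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        static {
            // Step 3: the "mylib.so" symlink sits in the task's working directory.
            // Loading it by absolute path avoids loadLibrary()'s name mapping.
            System.load(new File("mylib.so").getAbsolutePath());
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... call into the native code through your JNI wrapper here ...
            context.write(value, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Step 2: symlink cached files into each task's working directory and
        // distribute the shared library; the "#mylib.so" fragment names the symlink.
        DistributedCache.createSymlink(conf);
        DistributedCache.addCacheFile(
                new URI("hdfs://host:port/libraries/mylib.so.1#mylib.so"), conf);

        Job job = new Job(conf, "native-lib-example");
        job.setJarByClass(NativeLibExample.class);
        job.setMapperClass(NativeMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

After copying mylib.so.1 to HDFS as in step 1, package this into a jar and launch it with bin/hadoop jar; each task then finds the mylib.so symlink in its own working directory and can load it before touching any native code.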
