Wednesday, 14 December 2016

Install Apache Oozie on a Hadoop single-node cluster


System requirements for Oozie 4.3.0 installation:
  • Unix box
  • Java JDK 1.6+
  • Maven 3.3.9
  • Hadoop 2.7.3
  • Pig 0.16.0

Assuming Java and Hadoop are already installed, let us install Maven and Pig.

For Maven
I have Hadoop under /srv, so I will download everything to that path.

cd /srv


tar xvfz apache-maven-3.3.9-bin.tar.gz

rm apache-maven-3.3.9-bin.tar.gz

For Pig



tar xvfz pig-0.16.0.tar.gz

rm pig-0.16.0.tar.gz

For Oozie


tar xvfz oozie-4.3.0.tar.gz

rm oozie-4.3.0.tar.gz 
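
The same extract-then-delete pattern is used for Maven, Pig and Oozie above. As a rough sketch, it could be wrapped in a small helper that deletes the archive only when extraction succeeded. The function name unpack_and_remove is made up for this illustration and is not part of any of these tools.

```shell
# Hypothetical helper (not part of Maven, Pig or Oozie):
# extract a .tar.gz into a target directory and delete the archive
# only if the extraction succeeded.
unpack_and_remove() {
  archive=$1
  target_dir=${2:-.}          # default: extract into the current directory
  if [ ! -f "$archive" ]; then
    echo "missing archive: $archive" >&2
    return 1
  fi
  tar xzf "$archive" -C "$target_dir" && rm "$archive"
}
```

With the layout in this post, a call would look like: unpack_and_remove /srv/oozie-4.3.0.tar.gz /srv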

Now set the environment variables

vi ~/.bashrc (or put these in ~/environment.sh and load it with . ~/environment.sh)

export MAVEN_HOME=/srv/apache-maven-3.3.9
export PATH=$PATH:/srv/apache-maven-3.3.9/bin
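
One thing to watch with the PATH line is that every time ~/.bashrc is sourced again, the Maven bin directory gets appended once more. A minimal sketch of a guard against that is shown here; path_append is a hypothetical helper name, and the Maven location is the one used in this post.

```shell
# Hypothetical helper: append a directory to PATH only if it is not
# already there, so re-sourcing ~/.bashrc does not keep growing PATH.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

export MAVEN_HOME=/srv/apache-maven-3.3.9
path_append "$MAVEN_HOME/bin"
export PATH
```

After opening a new shell, running mvn -version is a quick way to confirm that Maven is found.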




Before running the Oozie build command, correct the paths in the action-conf/hive.xml file under the oozie/conf directory so that they match your Hadoop installation path.



<property>
      <name>hadoop.bin.path</name>
      <value>/srv/hadoop-2.7.3/bin</value>
   </property>

   <property>
      <name>hadoop.config.dir</name>
      <value>/srv/hadoop-2.7.3/etc/hadoop</value>
   </property>
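
Since a wrong path here only shows up later, it may help to check that both directories exist before building. This is a small sketch; check_dirs is a hypothetical helper, and the two paths are the values from the properties above.

```shell
# Hypothetical helper: report every argument that is not an existing
# directory and return non-zero if any is missing.
check_dirs() {
  missing=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "not a directory: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Values of hadoop.bin.path and hadoop.config.dir from this post.
check_dirs /srv/hadoop-2.7.3/bin /srv/hadoop-2.7.3/etc/hadoop \
  || echo "adjust hadoop.bin.path / hadoop.config.dir to your install" >&2
```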

Go to the Oozie directory and run the command below to build the distribution

cd /srv/oozie-4.3.0

$ bin/mkdistro.sh -DskipTests

If you don’t pass -DskipTests, the build runs tests for about 2 hours; do not interrupt it until everything has finished. If you do run the tests, you can find more details about them in the folder ~/oozie-4.3.0/core/target/test-data/oozietests, and the test reports in the folder ~/oozie-4.3.0/core/target/surefire-reports.

If the build fails, run mvn clean and then rerun the same command. If it still fails, try running bin/mkdistro.sh -DskipTests -X (-X turns on Maven debug output); strangely, the build finished successfully for me when run this way.
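
The "build, and on failure clean and build again" routine can be sketched as a tiny wrapper. build_with_retry is a hypothetical name, and the function simply runs whatever two command strings it is given, so it is only an illustration of the retry idea.

```shell
# Hypothetical wrapper: run a build command; if it fails, run a clean
# command and try the build exactly one more time.
build_with_retry() {
  build_cmd=$1
  clean_cmd=$2
  if sh -c "$build_cmd"; then
    return 0
  fi
  echo "build failed; cleaning and retrying once" >&2
  sh -c "$clean_cmd" || return 1
  sh -c "$build_cmd"
}
```

From /srv/oozie-4.3.0 it would be called as: build_with_retry "bin/mkdistro.sh -DskipTests" "mvn clean"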

 

Oozie server setup

System Requirements

  • Unix (tested in Linux and Mac OS X)
  • Java 1.6+
  • Hadoop
  • ExtJS library (optional, to enable Oozie webconsole)
The Java 1.6+ bin directory should be in the command path.

Add the properties below to the Hadoop core-site.xml

Standard properties

<!-- OOZIE -->
  <property>
    <name>hadoop.proxyuser.[OOZIE_SERVER_USER].hosts</name>
    <value>[OOZIE_SERVER_HOSTNAME]</value>
  </property>
  <property>
    <name>hadoop.proxyuser.[OOZIE_SERVER_USER].groups</name>
    <value>[USER_GROUPS_THAT_ALLOW_IMPERSONATION]</value>
  </property>

In my case these properties are:

<!-- OOZIE -->
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>hadoop</value>
  </property>
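
To make the substitution explicit, here is a sketch that fills the three placeholders from the template and prints the resulting block. proxyuser_props is a hypothetical helper; it only prints text and does not touch core-site.xml.

```shell
# Hypothetical helper: print the two proxyuser properties for a given
# Oozie server user, server hostname and allowed group list.
proxyuser_props() {
  oozie_user=$1
  oozie_host=$2
  allowed_groups=$3
  cat <<EOF
<!-- OOZIE -->
  <property>
    <name>hadoop.proxyuser.${oozie_user}.hosts</name>
    <value>${oozie_host}</value>
  </property>
  <property>
    <name>hadoop.proxyuser.${oozie_user}.groups</name>
    <value>${allowed_groups}</value>
  </property>
EOF
}

# Values used in this post: user hadoop, host localhost, group hadoop.
proxyuser_props hadoop localhost hadoop
```

The user in the property name is the account that runs the Oozie server, so it has to be adjusted if Oozie runs under a different login.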

Only for lower versions: now let us look at ExtJS. This is optional; do it only if you want the Oozie web console to monitor jobs. You need to download ExtJS (the framework used by the Oozie web console). Only version 2.2 or 2.3 works for now, and it can only be downloaded from the URL http://archive.cloudera.com/gplextras/misc/ext-2.2.zip


Create a libext folder under the oozie directory

$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0

$mkdir libext

If you have already downloaded the ExtJS zip file, move it to this directory. If not, change into libext and download it there from the URL given above

$cd libext


Now copy the Hadoop libraries and HCatalog libraries into Oozie's libext folder.
Copy all the jars from these paths:
/srv/hadoop-2.7.3/share/hadoop/common/*.jar
/srv/hadoop-2.7.3/share/hadoop/common/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/hdfs/*.jar
/srv/hadoop-2.7.3/share/hadoop/hdfs/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/mapreduce/*.jar
/srv/hadoop-2.7.3/share/hadoop/mapreduce/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/yarn/*.jar
/srv/hadoop-2.7.3/share/hadoop/yarn/lib/*.jar
 

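You can use the Unix command below to copy the jar files from the subdirectories. Run it from the directory that contains the folders listed above (/srv/hadoop-2.7.3/share/hadoop). In bash, ** only descends into nested subdirectories when globstar is enabled (shopt -s globstar), and it then picks up jars from every subdirectory, not only the ones listed.

$ cp **/*.jar /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/libext/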
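
If you prefer not to depend on shell glob settings, the copy can also be written as an explicit loop over the eight directories listed above. This is a sketch only; copy_hadoop_jars is a hypothetical helper, and -maxdepth is assumed to be available (it is in GNU and BSD find).

```shell
# Hypothetical helper: copy the jars that sit directly in the eight
# Hadoop library directories into a destination folder.
copy_hadoop_jars() {
  hadoop_home=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for sub in common common/lib hdfs hdfs/lib \
             mapreduce mapreduce/lib yarn yarn/lib; do
    dir="$hadoop_home/share/hadoop/$sub"
    [ -d "$dir" ] || continue
    # -maxdepth 1: only jars directly in this directory; the lib
    # subfolders are handled by their own loop iteration.
    find "$dir" -maxdepth 1 -type f -name '*.jar' -exec cp {} "$dest/" \;
  done
}
```

For the paths in this post the call would be: copy_hadoop_jars /srv/hadoop-2.7.3 /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/libext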

Now let us set up the war file and the sharelib with the commands below.

$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
 
$ bin/oozie-setup.sh prepare-war 
 
Before creating the sharelib, copy all the properties in /srv/hadoop-2.7.3/etc/hadoop/core-site.xml to
the file /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf/core-site.xml
 
Also copy the yarn and mapred site.xml files to the Oozie conf directory
 
cp /srv/hadoop-2.7.3/etc/hadoop/yarn-site.xml /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
cp /srv/hadoop-2.7.3/etc/hadoop/mapred-site.xml /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf

And add the property below to /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/oozie-site.xml

 <property>
        <name>oozie.service.WorkflowAppService.system.libpath</name>
        <value>hdfs:///user/hadoop/share/lib/</value>
        <description>
            System library path to use for workflow applications.
            This path is added to workflow application if their job properties sets
            the property 'oozie.use.system.libpath' to true.
        </description>
    </property>
 
where hadoop is your login user for Hadoop. This way we avoid the "oozie could not locate sharelib" error.

Now run the below command
 
$bin/oozie-setup.sh  sharelib create -fs hdfs://localhost:54310 -locallib oozie-sharelib-4.3.0.tar.gz
 
Create the database with the command below

$bin/ooziedb.sh create -sqlfile oozie.sql -run

And run this command to start oozie daemon

$bin/oozied.sh start  

or, to run it as a foreground process,

$bin/oozied.sh run

If you want to set up an Oozie client node

$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0

$tar xvfz oozie-client-4.3.0.tar.gz

$cd oozie-client-4.3.0/bin

$oozie

To access the Oozie web console, open the Oozie URL in a browser (the same URL that is passed to -oozie in the commands below),

or, to check the status of the Oozie process, run the command below

$bin/oozie admin -oozie http://localhost:11000/oozie -status

to check available sharelibs

$bin/oozie admin -oozie http://localhost:11000/oozie -shareliblist

to make the running daemon reload the sharelib (the output shows which sharelib directory is in use),


$ bin/oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
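
Two small conveniences for the admin commands above, offered as a sketch. The Oozie client reads the OOZIE_URL environment variable when -oozie is not given, so exporting it saves repeating the URL. The wait_for_oozie function is a hypothetical helper that polls a status command until its output contains "System mode: NORMAL", which is what -status is expected to print on a healthy server; OOZIE_POLL_SECONDS is likewise a made-up variable name.

```shell
# Lets the client commands be run without repeating -oozie <url>.
export OOZIE_URL=http://localhost:11000/oozie

# Hypothetical helper: wait_for_oozie TRIES STATUS_COMMAND...
# Runs the status command up to TRIES times and succeeds as soon as
# its output reports normal mode.
wait_for_oozie() {
  tries=$1
  shift
  while [ "$tries" -gt 0 ]; do
    if "$@" 2>/dev/null | grep -q 'System mode: NORMAL'; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep "${OOZIE_POLL_SECONDS:-5}"   # pause between attempts
    fi
  done
  return 1
}
```

After starting the daemon, a call such as wait_for_oozie 12 bin/oozie admin -status would wait up to about a minute for the server to come up.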

Installing Oozie 4.3.0 differs from installing lower versions: prior versions generate the required Hadoop libraries when the build completes, but in this version we have to copy the Hadoop library jar files manually from the Hadoop distribution.

Also, the standard Oozie distribution comes with a Derby database, which does not support multiple connections at the same time. Configuring Oozie with MySQL is detailed in the external references below (by Apache) if required.


Run the example Oozie jobs to test the installation.


First, navigate to the path /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/ and untar the examples file



$tar xvfz oozie-examples.tar.gz


$cd examples/apps

You will see different example folders such as pig, hive, map-reduce, etc. I am using map-reduce to explain the steps below. In the job.properties file of each folder, change the name node port (nameNode) and the job tracker port (jobTracker), and add the parameter oozie.use.system.libpath=true. The result should look like the listing below, which is taken from the hive folder; the last line ends with the name of the folder you are editing.

From Hadoop 2.x onward, the jobTracker property must be given the ResourceManager address and port.

nameNode=hdfs://localhost:54310
jobTracker=localhost:8032
queueName=default
examplesRoot=examples

oozie.use.system.libpath=true

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive
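
Editing every job.properties by hand is tedious, so here is a sketch that applies the same three changes to all example folders at once. patch_job_properties is a hypothetical helper; it assumes the values contain no | or & characters, since they are passed through sed.

```shell
# Hypothetical helper: patch_job_properties APPS_DIR NAMENODE JOBTRACKER
# Rewrites nameNode and jobTracker in every APPS_DIR/*/job.properties
# and appends oozie.use.system.libpath=true where it is missing.
patch_job_properties() {
  apps_dir=$1
  name_node=$2
  job_tracker=$3
  for f in "$apps_dir"/*/job.properties; do
    [ -f "$f" ] || continue
    patched="$f.patched"
    sed -e "s|^nameNode=.*|nameNode=$name_node|" \
        -e "s|^jobTracker=.*|jobTracker=$job_tracker|" \
        "$f" > "$patched" || return 1
    grep -q '^oozie.use.system.libpath=' "$patched" \
      || echo 'oozie.use.system.libpath=true' >> "$patched"
    mv "$patched" "$f" || return 1
  done
}
```

With the values from this post, it would be run from the Oozie directory as: patch_job_properties examples/apps hdfs://localhost:54310 localhost:8032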

Then move the examples folder to hdfs

$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/

$ hadoop fs -put examples /user/hadoop/examples

Submit the job; here I’m submitting the Oozie map-reduce example

to run the oozie example
$bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run

to check status of the job
$bin/oozie job -oozie http://localhost:11000/oozie -info 0000000-161215085944975-oozie-hado-W

to check the log of the job
$bin/oozie job -oozie http://localhost:11000/oozie -log 0000000-161215085944975-oozie-hado-W

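The job id (0000000-161215085944975-oozie-hado-W in the commands above) is the part you need to replace with the id printed when you submit your own job. You can also monitor the jobs in the web console.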
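
To avoid copying the job id by hand, it can be captured from the output of the -run command, which is expected to print a line of the form "job: <id>". The sketch below only does the text extraction; extract_job_id is a hypothetical helper name.

```shell
# Hypothetical helper: read the output of "oozie job ... -run" on stdin
# and print just the job id from the "job: <id>" line.
extract_job_id() {
  sed -n 's/^job: *//p'
}

# Example with the id shown in this post:
echo 'job: 0000000-161215085944975-oozie-hado-W' | extract_job_id
```

In practice the -run command would be piped into extract_job_id and the result stored in a variable, which can then be passed to -info and -log.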
 

Please leave a comment if you have any questions or need clarification.

External references for further information

https://oozie.apache.org/docs/4.2.0/ENG_Building.html

Possible errors while building oozie

If you see any warnings about maven-clover2-plugin, check the latest version at https://repo.maven.apache.org/maven2/com/atlassian/maven/plugins/maven-clover2-plugin/
and add a version tag for maven-clover2-plugin in the pom.xml in the oozie directory.

7 comments:

  1. Hi Nith,

    Nice post on Oozie installation. I followed your post to install Oozie, but I didn't find a conf directory in the oozie directory.

    You said:
    "Before running oozie build command, correct the paths in the action-conf/hive.xml file under oozie/conf directory, as per your hadoop implementation path"

    Please suggest me how to get this conf directory.

  2. Hi, I am installing Oozie on my Ubuntu system and I am facing this issue. Could you guide me?

    When I run the command below, I get this error.

    ./oozie-setup.sh sharelib create -fs hdfs://master:9000 -locallib /usr/local/oozie/oozie-4.3.0/sharelib/target/oozie-sharelib-4.3.0.tar.gz


    Error

    java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "master":9000; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost

  3. Hello,

    Please re check the hostname of your machine, you passed "master" as hostname in the above command, it is throwing "invalid hostname" error.

    Run this command in shell to verify if you are passing right hostname.

    $hostname

  4. Hello Nith,

    Thanks for reply.


    I am using below host name in /etc/hosts file.

    192.168.1.149 master (Master server)
    127.0.0.1 developer4
    192.168.1.161 slave (Slave server)

    I am able to access the HDFS repository using hdfs://master:9000

    If I am wrong then please guide me which name I need to use.

    Thanks in advance.

    1. Hello Nith,

      I resolved above issue thanks for your reply.

  5. Hello Nith,

    I have installed all the Hadoop tools: Hadoop (2.7.3), Pig, Hive, HBase (1.2.4), Sqoop1, Hue (3.11.0) and Oozie (4.3.0). I am facing one issue when running Oozie jobs through the command line or the Hue web interface.

    Error
    Exception in check(). Message[org.apache.hadoop.security.authorize.AuthorizationException: User: developer4 is not allowed to impersonate developer4]
    java.io.IOException: org.apache.hadoop.security.authorize.AuthorizationException: User: developer4 is not allowed to impersonate developer4

    I had set proxyuser in etc/hadoop/core-site.xml:

    <property>
      <name>hadoop.proxyuser.developer4.hosts</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.developer4.groups</name>
      <value>*</value>
    </property>


    Let me know what I am doing wrong. I have been facing this issue for the last two days. Your help will be appreciated.

    Thanks in advance.

    Let me know if you need more information.

  6. Hello,

    I am getting a 500 internal server error when I run the map-reduce job, and on the web console I get a 500 null pointer exception.
    hadoop version: 2.7.1
    oozie version: 4.3.0
    OS: Ubuntu

    Please help.
