System requirements for Oozie 4.3.0 installation:
· Unix box
· Java JDK 1.6+
· Maven 3.3.9
· Hadoop 2.7.3
· Pig 0.16.0
Assuming Java and Hadoop are already installed, let us install Maven and Pig.
For Maven
I have Hadoop under /srv, so I will download everything to this path. Download the apache-maven-3.3.9-bin.tar.gz binary archive here, then extract it:
cd /srv
tar xvfz apache-maven-3.3.9-bin.tar.gz
rm apache-maven-3.3.9-bin.tar.gz
For Pig
tar xvfz pig-0.16.0.tar.gz
rm pig-0.16.0.tar.gz
For Oozie
tar xvfz oozie-4.3.0.tar.gz
rm oozie-4.3.0.tar.gz
Now set the environment variables
vi ~/.bashrc (or you can set these up in a separate file such as ~/environment.sh and source it)
export MAVEN_HOME=/srv/apache-maven-3.3.9
export PATH=$PATH:/srv/apache-maven-3.3.9/bin
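Only Maven is shown above; if Pig was extracted under /srv as assumed in this post, its bin directory can be added the same way. A sketch (adjust the paths to your layout):

```shell
# Sketch assuming the /srv extraction paths used in this post.
export MAVEN_HOME=/srv/apache-maven-3.3.9
export PIG_HOME=/srv/pig-0.16.0
export PATH=$PATH:$MAVEN_HOME/bin:$PIG_HOME/bin
# verify afterwards with: mvn -version ; pig -version
```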
Before running the Oozie build command, correct the paths in the action-conf/hive.xml file under the oozie/conf directory to match your Hadoop installation path:
<property>
<name>hadoop.bin.path</name>
<value>/srv/hadoop-2.7.3/bin</value>
</property>
<property>
<name>hadoop.config.dir</name>
<value>/srv/hadoop-2.7.3/etc/hadoop</value>
</property>
Go to the Oozie directory and run the command below to build the distribution:
cd /srv/oozie-4.3.0
$bin/mkdistro.sh -DskipTests
If you don't pass -DskipTests, the build runs its test suite, which can take around two hours. If you let the tests run, you can see details about them in ~/oozie-4.3.0/core/target/test-data/oozietests, and the test reports in ~/oozie-4.3.0/core/target/surefire-reports. Do not interrupt the build until it finishes everything.
If an error pops up, run $mvn clean and re-run the same command. If it still errors out, try running bin/mkdistro.sh -DskipTests -X (-X enables Maven's debug output); strangely, in my case the build then finished successfully.
Oozie server setup
System Requirements
- Unix (tested in Linux and Mac OS X)
- Java 1.6+
- Hadoop
- Apache Hadoop (tested on 2.7.6)
- ExtJS library (optional, to enable Oozie webconsole)
The Java 1.6+ bin directory should be in the command path.
Add the properties below to the Hadoop core-site.xml.
Standard properties
<!-- OOZIE -->
<property>
<name>hadoop.proxyuser.[OOZIE_SERVER_USER].hosts</name>
<value>[OOZIE_SERVER_HOSTNAME]</value>
</property>
<property>
<name>hadoop.proxyuser.[OOZIE_SERVER_USER].groups</name>
<value>[USER_GROUPS_THAT_ALLOW_IMPERSONATION]</value>
</property>
In my case these properties will be
<!-- OOZIE -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>localhost</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hadoop</value>
</property>
Note: after editing core-site.xml, restart the Hadoop daemons (or refresh the proxy-user configuration) so the change takes effect.
Only for the lower versions: now let us look at ExtJS. This is optional; do it only if you want the Oozie web console to monitor jobs. You need to download ExtJS (the JavaScript framework used by the Oozie web console); only version 2.2 or 2.3 works for now, and it can only be downloaded from the URL http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
Create a libext folder under the oozie directory
$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
$mkdir libext
If you have already downloaded the ExtJS zip file, move it to this directory. If not, download it with the command below:
$cd libext
$wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
Now copy the Hadoop libraries and HCatalog libraries into the libext folder of Oozie.
Copy all the jars from these paths:
/srv/hadoop-2.7.3/share/hadoop/common/*.jar
/srv/hadoop-2.7.3/share/hadoop/common/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/hdfs/*.jar
/srv/hadoop-2.7.3/share/hadoop/hdfs/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/mapreduce/*.jar
/srv/hadoop-2.7.3/share/hadoop/mapreduce/lib/*.jar
/srv/hadoop-2.7.3/share/hadoop/yarn/*.jar
/srv/hadoop-2.7.3/share/hadoop/yarn/lib/*.jar
You can use the Unix command below to copy the jar files from the subdirectories (run it from /srv/hadoop-2.7.3/share/hadoop; the ** glob requires bash with shopt -s globstar enabled):
$cp **/*.jar /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/libext/
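If the ** glob is not available in your shell, a find-based sketch achieves the same copy. The SRC and DEST paths below are the ones assumed throughout this post; cp -n skips files already present so duplicate jar names don't clobber each other:

```shell
# Copy the common/hdfs/mapreduce/yarn jars into libext using find.
# SRC and DEST are the paths assumed in this post; adjust as needed.
SRC=/srv/hadoop-2.7.3/share/hadoop
DEST=/srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/libext
if [ -d "$SRC" ] && [ -d "$DEST" ]; then
  for sub in common hdfs mapreduce yarn; do
    find "$SRC/$sub" -name '*.jar' -exec cp -n {} "$DEST/" \;
  done
fi
```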
Now let us set up the war file and the sharelib with the commands below.
$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
$ bin/oozie-setup.sh prepare-war
Before creating sharelib, we have to copy all properties in /srv/hadoop-2.7.3/etc/hadoop/core-site.xml to
the file /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf/core-site.xml
Also copy the yarn-site.xml and mapred-site.xml files to the Oozie conf directory:
cp /srv/hadoop-2.7.3/etc/hadoop/yarn-site.xml /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
cp /srv/hadoop-2.7.3/etc/hadoop/mapred-site.xml /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/hadoop-conf
And add the property below to /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/conf/oozie-site.xml
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>hdfs:///user/hadoop/share/lib/</value>
<description>
System library path to use for workflow applications.
This path is added to workflow application if their job properties sets
the property 'oozie.use.system.libpath' to true.
</description>
</property>
where hadoop is your Hadoop login user. This way we don't hit the "Oozie could not locate sharelib" error.
Now run the below command
$bin/oozie-setup.sh sharelib create -fs hdfs://localhost:54310 -locallib oozie-sharelib-4.3.0.tar.gz
Create database with below command
$bin/ooziedb.sh create -sqlfile oozie.sql -run
And run this command to start oozie daemon
$bin/oozied.sh start
or, to run it as a foreground process,
$bin/oozied.sh run
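As a quick sanity check once the daemon is up, Oozie's REST endpoint v1/admin/status can be queried directly. The port 11000 is the default assumed in this post; a healthy server reports systemMode NORMAL:

```shell
# Hedged sketch: query the admin/status REST endpoint and pull out systemMode.
STATUS_JSON=$(curl -s http://localhost:11000/oozie/v1/admin/status || true)
echo "$STATUS_JSON" | grep -o '"systemMode":"[A-Z_]*"' || true
```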
If you want to set up an Oozie client node:
$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0
$tar xvfz oozie-client-4.3.0.tar.gz
$cd oozie-client-4.3.0/bin
$oozie
You can access the Oozie web console at the URL http://localhost:11000/oozie,
or to check the status of oozie process, run the below command
$bin/oozie admin -oozie http://localhost:11000/oozie -status
to check available sharelibs
$bin/oozie admin -oozie http://localhost:11000/oozie -shareliblist
to refresh the sharelib used by Oozie while the daemon is running (for example, after uploading a new one),
$bin/oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
Installation of Oozie 4.3.0 is different from its lower versions: prior versions generated the required Hadoop libraries as part of the build, but in this version we have to copy the Hadoop library jar files from the Hadoop distribution manually.
Also, the standard Oozie distribution ships with the Derby database, which does not support multiple concurrent connections. Configuring Oozie with MySQL is detailed in the external references (by Apache) below, if required.
From Hadoop 2.x, you must give the ResourceManager port in the jobTracker property.
Run the example oozie jobs and test the installation.
Firstly, navigate to the path /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/
and untar the examples file
$tar xvfz oozie-examples.tar.gz
$cd examples/apps
You will see different example folders such as pig, hive, map-reduce, etc. I am using map-reduce to explain the steps below. In the job.properties file of each folder, change the NameNode port and the JobTracker port, and add the oozie.use.system.libpath parameter, so it looks like:
nameNode=hdfs://localhost:54310
jobTracker=localhost:8032
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
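Editing each job.properties by hand is tedious; the loop below is a sketch that applies the same changes to every example app at once. It assumes the ports used elsewhere in this post (54310 for HDFS, 8032 for YARN) and GNU sed's -i flag:

```shell
# Run from the expanded Oozie directory; adjusts every example's job.properties.
if cd examples/apps 2>/dev/null; then
  for f in */job.properties; do
    sed -i -e 's|^nameNode=.*|nameNode=hdfs://localhost:54310|' \
           -e 's|^jobTracker=.*|jobTracker=localhost:8032|' "$f"
    grep -q '^oozie.use.system.libpath=' "$f" || \
      echo 'oozie.use.system.libpath=true' >> "$f"
  done
fi
```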
Then move the examples folder to hdfs
$cd /srv/oozie-4.3.0/distro/target/oozie-4.3.0-distro/oozie-4.3.0/
$hadoop fs -put examples /user/hadoop/examples
Submit the job; here I am submitting the map-reduce example job:
$bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
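The submit command prints a line like "job: 0000000-161215085944975-oozie-hado-W". A small sketch to capture that id into a variable for the follow-up commands (the sed expression simply strips the "job: " prefix):

```shell
# Hypothetical capture of the job id printed by `oozie job ... -run`.
JOBID=$(bin/oozie job -oozie http://localhost:11000/oozie \
        -config examples/apps/map-reduce/job.properties -run 2>/dev/null \
        | sed -n 's/^job: //p')
echo "$JOBID"
```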
to check the status of the job
$bin/oozie job -oozie http://localhost:11000/oozie -info 0000000-161215085944975-oozie-hado-W
to check the log of the job
$bin/oozie job -oozie http://localhost:11000/oozie -log 0000000-161215085944975-oozie-hado-W
Replace 0000000-161215085944975-oozie-hado-W with your own job id, which is printed when you submit the job. You can monitor the jobs in the web console.
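As a convenience, the repeated -oozie flag can be dropped: the Oozie client reads the OOZIE_URL environment variable as its default server address. A sketch, assuming the default port used in this post:

```shell
# With OOZIE_URL exported, the -oozie flag becomes unnecessary.
export OOZIE_URL=http://localhost:11000/oozie
# e.g.  bin/oozie admin -status
#       bin/oozie job -info <job-id>
```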
Please leave a comment for any questions or clarifications.
External references for further information
https://oozie.apache.org/docs/4.2.0/ENG_Building.html
Possible errors while building oozie
If you face any warnings with maven-clover2-plugin, check the latest version at https://repo.maven.apache.org/maven2/com/atlassian/maven/plugins/maven-clover2-plugin/
and add a version tag for the maven-clover2-plugin property in the pom.xml in the Oozie directory.
Hi Nith,
Nice post on Oozie installation. I followed your post to install Oozie, but I didn't find a conf directory in the oozie directory.
You wrote:
"Before running oozie build command, correct the paths in the action-conf/hive.xml file under oozie/conf directory, as per your hadoop implementation path"
Please suggest me how to get this conf directory.
Hi, I am installing Oozie on my Ubuntu system and I am facing this issue. Could you guide me?
When I fire the below command, I am getting this error.
./oozie-setup.sh sharelib create -fs hdfs://master:9000 -locallib /usr/local/oozie/oozie-4.3.0/sharelib/target/oozie-sharelib-4.3.0.tar.gz
Error
java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "master":9000; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
Hello,
Please re-check the hostname of your machine. You passed "master" as the hostname in the above command, and it is throwing an "invalid hostname" error.
Run this command in shell to verify if you are passing right hostname.
$hostname
Hello Nith,
Thanks for the reply.
I am using below host name in /etc/hosts file.
192.168.1.149 master (Master server)
127.0.0.1 developer4
192.168.1.161 slave (Slave server)
I am able to check the HDFS repository using hdfs://master:9000.
If I am wrong, please guide me on which name I need to use.
Thanks in advance.
Hello Nith,
I resolved the above issue, thanks for your reply.
Hello Nith,
I have installed all the Hadoop tools: Hadoop (2.7.3), Pig, Hive, HBase (1.2.4), Sqoop 1, Hue (3.11.0), and Oozie (4.3.0). I am facing one issue running Oozie jobs through the command line or the Hue web interface.
Error
Exception in check(). Message[org.apache.hadoop.security.authorize.AuthorizationException: User: developer4 is not allowed to impersonate developer4]
java.io.IOException: org.apache.hadoop.security.authorize.AuthorizationException: User: developer4 is not allowed to impersonate developer4
I had set the proxyuser in etc/hadoop/core-site.xml:
<property>
<name>hadoop.proxyuser.developer4.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.developer4.groups</name>
<value>*</value>
</property>
Let me know what I am doing wrong. I have been facing this issue for the last two days; your help will be appreciated.
Thanks in advance.
Let me know if you need more information.
Hello,
I am getting a 500 internal server error when I run the map-reduce job, and on the web console I get a 500 null pointer exception.
hadoop version: 2.7.1
oozie version: 4.3.0
OS: Ubuntu
Please help.