Setting Up Apache Oozie Using the Command Line
Configuring Oozie after a New Installation
When you install Oozie from an RPM or Debian package, the installation places all configuration, documentation, and runtime files in the standard Linux directories, as follows.
Type of File | Where Installed |
---|---|
binaries | /usr/lib/oozie/ |
configuration | /etc/oozie/conf/ |
documentation | |
examples TAR.GZ | |
sharelib | /usr/lib/oozie/ |
data | /var/lib/oozie/ |
logs | /var/log/oozie/ |
temp | /var/tmp/oozie/ |
PID file | /var/run/oozie/ |
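As a quick sanity check after installing the package, the directories listed above can be probed from a shell. The following is a minimal sketch and not part of the Oozie distribution. The function name check_oozie_dirs and its optional root-prefix argument are assumptions introduced here for illustration.

```shell
# Report which of the standard Oozie directories exist.
# Optional argument: a root prefix (hypothetical; useful for chroots or tests).
check_oozie_dirs() {
  root="${1:-}"
  for dir in /usr/lib/oozie /etc/oozie/conf /var/lib/oozie \
             /var/log/oozie /var/tmp/oozie /var/run/oozie; do
    if [ -d "${root}${dir}" ]; then
      echo "present ${dir}"
    else
      echo "missing ${dir}"
    fi
  done
}
```

Calling check_oozie_dirs with no argument prints one line per directory, marked present or missing.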
Deciding Which Database to Use
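Oozie uses an embedded Apache Derby database unless you configure another one. The procedures below replace it with a server-based database (PostgreSQL, MariaDB, MySQL, or Oracle) for the following reasons:
- Derby runs in embedded mode, and its health cannot be monitored.
- Cloudera currently has no live backup strategy for the embedded Derby database, though one might be possible.
- Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not occur with server-based databases.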
Configuring Oozie to Use PostgreSQL
Use the procedure that follows to configure Oozie to use PostgreSQL instead of Apache Derby.
Create the Oozie User and Oozie Database
For example, using the PostgreSQL psql command-line tool:
$ psql -U postgres
Password for user postgres: *****
postgres=# CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'oozie'
 NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE ROLE
postgres=# CREATE DATABASE "oozie" WITH OWNER = oozie
 ENCODING = 'UTF8'
 TABLESPACE = pg_default
 LC_COLLATE = 'en_US.UTF-8'
 LC_CTYPE = 'en_US.UTF-8'
 CONNECTION LIMIT = -1;
CREATE DATABASE
postgres=# \q
Configure PostgreSQL to Accept Network Connections for the Oozie User
- Edit the postgresql.conf file and set the listen_addresses property to *, to make sure that the PostgreSQL server starts listening on all your network interfaces. Also make sure that the standard_conforming_strings property is set to off.
- Edit the PostgreSQL data/pg_hba.conf file as follows:
host oozie oozie 0.0.0.0/0 md5
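After PostgreSQL has been restarted so that the listen_addresses change takes effect, it can be useful to confirm that the oozie role can log in over the network. The following is a sketch only. The function name check_oozie_pg and its default host and port are assumptions. Passing -h makes psql connect over TCP, which exercises the host rule added to pg_hba.conf.

```shell
# Try to log in as the oozie role over TCP and run a trivial query.
# Arguments: host (default localhost), port (default 5432).
# psql prompts for the password unless PGPASSWORD or ~/.pgpass supplies it.
check_oozie_pg() {
  host="${1:-localhost}"
  port="${2:-5432}"
  if psql -h "$host" -p "$port" -U oozie -d oozie -tA -c 'SELECT 1' >/dev/null; then
    echo "PostgreSQL login as oozie succeeded on ${host}:${port}"
  else
    echo "PostgreSQL login as oozie failed on ${host}:${port}" >&2
    return 1
  fi
}
```

For example, check_oozie_pg db.example.com 5432 would test a remote server, where db.example.com is a placeholder host name.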
Configure Oozie to Use PostgreSQL
Edit the oozie-site.xml file as follows:
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:postgresql://localhost:5432/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Configuring Oozie to Use MariaDB
Use the procedure that follows to configure Oozie to use MariaDB instead of Apache Derby.
Create the Oozie Database and Oozie MariaDB User
For example, using the MariaDB mysql command-line tool:
$ mysql -u root -p
Enter password:
MariaDB [(none)]> create database oozie default character set utf8;
Query OK, 1 row affected (0.00 sec)
MariaDB [(none)]> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> exit
Bye
Configure Oozie to Use MariaDB
Edit properties in the oozie-site.xml file as follows:
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Configuring Oozie to Use MySQL
Use the procedure that follows to configure Oozie to use MySQL instead of Apache Derby.
Create the Oozie Database and Oozie MySQL User
For example, using the MySQL mysql command-line tool:
$ mysql -u root -p
Enter password:
mysql> create database oozie default character set utf8;
Query OK, 1 row affected (0.00 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.00 sec)
mysql> exit
Bye
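Before editing oozie-site.xml, it can help to confirm that the new account can log in and holds the expected privileges. The following is a hedged sketch. The function name show_oozie_grants and its default host and port are assumptions, and the same client invocation should apply to MariaDB.

```shell
# Log in as the oozie account and list the privileges it holds.
# Arguments: host (default localhost), port (default 3306).
# -p with no attached value makes the client prompt for the password.
show_oozie_grants() {
  host="${1:-localhost}"
  port="${2:-3306}"
  mysql -h "$host" -P "$port" -u oozie -p -D oozie \
        -e 'SHOW GRANTS FOR CURRENT_USER'
}
```

If the grants above were applied, the output should include a line granting all privileges on the oozie database to the account that logged in.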
Configure Oozie to Use MySQL
Edit properties in the oozie-site.xml file as follows:
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
Configuring Oozie to Use Oracle
Use the procedure that follows to configure Oozie to use Oracle 11g instead of Apache Derby.
Create the Oozie Oracle User and Grant Privileges
The following example uses the Oracle sqlplus command-line tool, and shows the privileges Cloudera recommends. Oozie needs CREATE SESSION to start and manage workflows. The additional privileges are needed for creating and upgrading the Oozie database.
$ sqlplus system@localhost
Enter password: ******
SQL> create user oozie identified by oozie default tablespace users temporary tablespace temp;
User created.
SQL> grant alter index to oozie;
grant alter table to oozie;
grant create index to oozie;
grant create sequence to oozie;
grant create session to oozie;
grant create table to oozie;
grant drop sequence to oozie;
grant select dictionary to oozie;
grant drop table to oozie;
alter user oozie quota unlimited on users;
alter user oozie quota unlimited on system;
SQL> exit
$
Configure Oozie to Use Oracle
Edit the oozie-site.xml file as follows.
...
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>oracle.jdbc.OracleDriver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:oracle:thin:@//myhost:1521/oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
...
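Across the four databases, only the driver class and the JDBC URL differ. The username and password properties keep the same names. The sketch below collects the example values from this section in one place. The function names oozie_jdbc_settings and oozie_jdbc_properties are hypothetical helpers, and the host names and ports are the example values used above, which you would replace with your own.

```shell
# Print the JDBC driver class and example URL used in this section
# for a given database type.
oozie_jdbc_settings() {
  case "$1" in
    postgresql)    echo "org.postgresql.Driver jdbc:postgresql://localhost:5432/oozie" ;;
    mysql|mariadb) echo "com.mysql.jdbc.Driver jdbc:mysql://localhost:3306/oozie" ;;
    oracle)        echo "oracle.jdbc.OracleDriver jdbc:oracle:thin:@//myhost:1521/oozie" ;;
    *)             echo "unknown database type: $1" >&2; return 1 ;;
  esac
}

# Emit the two matching <property> elements for oozie-site.xml.
oozie_jdbc_properties() {
  settings=$(oozie_jdbc_settings "$1") || return 1
  driver=${settings%% *}   # text before the first space
  url=${settings#* }       # text after the first space
  cat <<EOF
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>${driver}</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>${url}</value>
</property>
EOF
}
```

For example, oozie_jdbc_properties postgresql prints the driver and URL properties shown in the PostgreSQL procedure.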
Creating the Oozie Database Schema
The Oozie database tool works in two modes: it can create the database, or it can produce an SQL script that a database administrator can run to create the database manually. If you use the tool to create the database schema, you must have the permissions needed to execute DDL operations.
As the oozie Unix user, run the Oozie database tool against the database:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run
You should see output such as the following (the output of the script may differ slightly depending on your database vendor):
Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE
Oozie DB has been created for Oozie version '4.0.0-cdh5.0.0'
The SQL commands have been written to: /tmp/ooziedb-5737263881793872034.sql
To produce the SQL script instead, run the tool as the oozie Unix user with the -sqlfile option, where SCRIPT is the name of the file to write:
/usr/lib/oozie/bin/ooziedb.sh create -sqlfile SCRIPT
For example:
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -sqlfile oozie-create.sql
You should see output such as the following (the output of the script may differ slightly depending on your database vendor):
Validate DB Connection.
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
DONE
Create OOZIE_SYS table
DONE
Oozie DB has been created for Oozie version '4.0.0-cdh5.0.0'
The SQL commands have been written to: oozie-create.sql
WARN: The SQL commands have NOT been executed, you must use the '-run' option
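The two modes of the database tool differ only in the final option. The sketch below makes that choice explicit by printing the command for each mode without running it. The function name ooziedb_create_cmd and its mode keywords run and script are assumptions made for illustration. Only the ooziedb.sh options come from the text above.

```shell
# Print the ooziedb.sh command line for the chosen mode.
#   run          -> create the schema directly in the database
#   script FILE  -> only write the SQL statements to FILE
ooziedb_create_cmd() {
  tool=/usr/lib/oozie/bin/ooziedb.sh
  case "$1" in
    run)
      echo "sudo -u oozie $tool create -run" ;;
    script)
      [ -n "${2:-}" ] || { echo "script mode needs a file name" >&2; return 2; }
      echo "sudo -u oozie $tool create -sqlfile $2" ;;
    *)
      echo "usage: ooziedb_create_cmd run | script FILE" >&2; return 2 ;;
  esac
}
```

Printing the command first gives a database administrator the chance to review it, or to request the script form, before anything touches the database.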
Enabling the Oozie Web Console
To enable the Oozie web console, download and add the ExtJS library to the Oozie server.
Step 1: Download the Library
Download the ExtJS version 2.2 library from https://archive.cloudera.com/gplextras/misc/ext-2.2.zip and place it in a convenient location.
Step 2: Install the Library
Copy the ext-2.2.zip file into /usr/lib/oozie/embedded-oozie-server/webapp.
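Steps 1 and 2 can be combined in a short script. The following is a sketch under stated assumptions: curl is available, the URL above is reachable from the Oozie server, and the script runs as a user with write access to the destination directory (for example through sudo). The function name install_extjs is hypothetical.

```shell
# Download ext-2.2.zip to a scratch directory and copy it into the webapp directory.
# Argument: destination directory (defaults to the path given in Step 2).
install_extjs() {
  url="https://archive.cloudera.com/gplextras/misc/ext-2.2.zip"
  dest="${1:-/usr/lib/oozie/embedded-oozie-server/webapp}"
  workdir=$(mktemp -d) || return 1
  status=0
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to the named file
  if curl -fL -o "$workdir/ext-2.2.zip" "$url"; then
    cp "$workdir/ext-2.2.zip" "$dest/" || status=1
  else
    status=1
  fi
  rm -rf "$workdir"
  return "$status"
}
```

Downloading to a scratch directory first means a failed or partial download never lands in the webapp directory.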
Step 3: Configure SPNEGO authentication (in Kerberos clusters only)
The web console shares a port with the Oozie REST API, and the API allows users to submit, kill, and inspect Oozie jobs. SPNEGO authentication works only if the Kerberos realm trusts the client browser's credentials and the client web browser is configured to pass those credentials. If this configuration is not possible, use the Hue Oozie Dashboard instead of the Oozie Web Console.
See How to Configure Browsers for Kerberos Authentication and Configuring a Dedicated MIT KDC for Cross-Realm Trust.
Configuring Oozie with Kerberos Security
To configure Oozie with Kerberos security, see Configuring Oozie Authentication.
Installing the Oozie Shared Library in Hadoop HDFS
The Oozie installation includes the shared library for YARN (oozie-sharelib-yarn), which contains all of the JARs required to enable workflow jobs to run streaming, DistCp, Pig, Hive, and Sqoop actions.
To install the Oozie shared library in Hadoop HDFS in the oozie user home directory:
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs <FS_URI> -locallib /usr/lib/oozie/oozie-sharelib-yarn
where FS_URI is the HDFS URI of the filesystem that the shared library should be installed on. For example: hdfs://<HOST>:<PORT>.
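If you are unsure what to pass for FS_URI, the Hadoop client configuration on a cluster node usually names the default filesystem already. The following is a hedged sketch. It assumes the hdfs client is on the PATH and that the default filesystem is the one the shared library should be installed on. The function names default_fs_uri and sharelib_create_cmd are hypothetical.

```shell
# Look up the default filesystem URI from the local Hadoop client configuration.
default_fs_uri() {
  hdfs getconf -confKey fs.defaultFS
}

# Print (without running) the sharelib installation command for a filesystem URI.
sharelib_create_cmd() {
  echo "sudo oozie-setup sharelib create -fs $1 -locallib /usr/lib/oozie/oozie-sharelib-yarn"
}
```

Running sharelib_create_cmd "$(default_fs_uri)" prints the full command with the URI filled in, so you can inspect it before executing it.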
Configuring Oozie Support for MapReduce Uber JARs
An uber JAR is a JAR that contains other JARs with dependencies in a lib/ folder inside the JAR.
You can configure the cluster to handle uber JARs properly for the MapReduce action (as long as it does not include any streaming) by setting the following property in the oozie-site.xml file:
...
<property>
  <name>oozie.action.mapreduce.uber.jar.enable</name>
  <value>true</value>
</property>
...
When this property is set, users can use the oozie.mapreduce.uber.jar configuration property in their MapReduce workflows to notify Oozie that the specified JAR file is an uber JAR.
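On the workflow side, the oozie.mapreduce.uber.jar property goes in the configuration of a MapReduce action. The sketch below writes an example action to a file so that it can be pasted into a workflow definition. The function name write_uber_jar_action, the action name mr-node, the transition targets, and the JAR path are all placeholders chosen for illustration.

```shell
# Write an example map-reduce action that points Oozie at an uber JAR.
# The quoted 'EOF' keeps the shell from expanding the ${...} EL expressions.
# Argument: the file to write.
write_uber_jar_action() {
  cat > "$1" <<'EOF'
<action name="mr-node">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <name>oozie.mapreduce.uber.jar</name>
        <value>${nameNode}/user/${wf:user()}/apps/my-uber.jar</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="end"/>
  <error to="fail"/>
</action>
EOF
}
```

The action contains no streaming element, in keeping with the restriction noted above.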