Installation Path C - Manual Installation Using Cloudera Manager Tarballs
Before proceeding with this path for a new installation, review Cloudera Manager Deployment. If you are upgrading an existing Cloudera Manager installation, see Cloudera Upgrade Overview.
In this procedure, you install the Oracle JDK, Cloudera Manager Server, and Cloudera Manager Agent software as tarballs and use Cloudera Manager to automate installation of CDH and managed service software as parcels. For a full discussion of deployment options, see Installing Cloudera Manager and CDH.
To avoid using system packages, and to use tarballs and parcels instead, follow the instructions in this section.
- Before You Begin
- Install the Cloudera Manager Server and Agents
- Create Parcel Directories
- Start the Cloudera Manager Server
- Start the Cloudera Manager Agents
- Install Package Dependencies
- Start and Log into the Cloudera Manager Admin Console
- Choose Cloudera Manager Edition
- Choose Cloudera Manager Hosts
- Install CDH and Managed Service Software
- Add Services
- Configure Database Settings
- Review Configuration Changes and Start Services
- (Optional) Change the Cloudera Manager User
- Change the Default Administrator Password
- Configure Oozie Data Purge Settings
- Test the Installation
Before You Begin
Install and Configure External Databases
Read Cloudera Manager and Managed Service Datastores. Install and configure an external database for services or Cloudera Management Service roles using the instructions in External Databases for Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Hue Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
Cloudera Manager also requires a database. Prepare the Cloudera Manager Server database as described in Preparing a Cloudera Manager Server External Database.
On CentOS 5 and RHEL 5, Install Python 2.6/2.7 and psycopg2 for Hue
Hue in CDH 5 works with the operating system's native version of Python only when that version is 2.6 or higher.
CentOS/RHEL 5 ships with Python 2.4 so you must install Python 2.6 (or Python 2.7) and the Python-PostgreSQL Database Adapter, psycopg2 (not psycopg).
## Navigate to Hue within your specific CDH parcel version
cd /opt/cloudera/parcels/`ls -l /opt/cloudera/parcels | grep CDH | tail -1 | awk '{print $9}'`/lib/hue/build/env/bin
./python2.6
>>> import psycopg2
or …
cd /opt/cloudera/parcels/`ls -l /opt/cloudera/parcels | grep CDH | tail -1 | awk '{print $9}'`/lib/hue/build/env/lib/python2.6/site-packages/
ln -s /usr/lib64/python2.6/site-packages/psycopg2 psycopg2
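The backtick pipeline in the commands above takes the last CDH entry from ls -l output, which sorts names lexically. The following is a minimal sketch of the same lookup that avoids parsing ls output. The function name latest_cdh_parcel is hypothetical, and the sketch assumes that sort -V (version sort from GNU coreutils) is available on the host.

```shell
# Hypothetical helper: print the name of the highest-versioned CDH parcel
# directory under the given parcels directory (default /opt/cloudera/parcels).
latest_cdh_parcel() {
  parcels_dir="${1:-/opt/cloudera/parcels}"
  for entry in "$parcels_dir"/CDH*; do
    # Skip the unexpanded glob when nothing matches.
    [ -e "$entry" ] && basename "$entry"
  done | sort -V | tail -n 1
}

# Example use: change into Hue's bundled Python environment for that parcel.
# cd "/opt/cloudera/parcels/$(latest_cdh_parcel)/lib/hue/build/env/bin"
```

Version sort is used so that a release such as 5.10 is ordered after 5.4, which a plain lexical sort would not guarantee.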
Install the Cloudera Manager Server and Agents
To install the Cloudera Manager Server and Agents, you download and extract tarballs, create users, and configure the server and agents.
Download and Extract Tarballs
- Download tarballs from the locations listed in Cloudera Manager Version and Download Information.
- Copy the tarballs and unpack them on all hosts on which you intend to install Cloudera Manager Server and Cloudera Manager Agents, in a directory you choose. You can create a new directory to accommodate the files you extract from the tarball. For example, if /opt/cloudera-manager does not exist, create it using a command similar to:
$ sudo mkdir /opt/cloudera-manager
- Extract the contents of the tarball to the selected directory. For example, after copying the tar file to your home directory, extract its contents to the /opt/cloudera-manager directory with a command similar to the following:
$ sudo tar xzf cloudera-manager*.tar.gz -C /opt/cloudera-manager
The files are extracted to a subdirectory named according to the Cloudera Manager version being extracted. For example, files could be extracted to /opt/cloudera-manager/cm-5.0/. This full path is required later and is referred to as the $CMF_DEFAULTS directory.
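Because later commands reference $CMF_DEFAULTS, it can help to set it as a shell variable once the version subdirectory is known. The following is a minimal sketch that reads the top-level directory name from the tarball without extracting it. The function name tarball_top_dir is hypothetical, and the sketch assumes the tarball stores all files under a single top-level directory whose name is the first path component of the first entry.

```shell
# Hypothetical helper: print the top-level directory stored in a gzipped
# tarball (for example cm-5.6.0) without extracting it.
tarball_top_dir() {
  tar tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Example use (the tarball name and install directory are illustrative):
# CMF_DEFAULTS="/opt/cloudera-manager/$(tarball_top_dir cloudera-manager.tar.gz)"
# export CMF_DEFAULTS
```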
Perform Configuration Required by Single User Mode
If you are creating a Cloudera Manager deployment that employs single user mode, perform the configuration steps described in Configuring Single User Mode.
Create Users
The Cloudera Manager Server and managed services require a user account. When installing Cloudera Manager from tarballs, you must create this user account on all hosts manually. Because Cloudera Manager Server and managed services are configured to use cloudera-scm by default, creating a user with this name is the simplest approach. This user is used automatically after installation is complete.
$ sudo useradd --system --home=/opt/cloudera-manager/cm-5.6.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Ensure that the --home argument path matches your environment. This argument varies according to where you place the tarball, and the version number varies among releases. For example, the --home location could be /opt/cm-5.6.0/run/cloudera-scm-server.
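Since the --home path depends on where the tarball was extracted, one way to avoid a mismatch is to build the command from that location and review it before running it. The following is a minimal sketch; the function name scm_useradd_command is hypothetical, and it only prints the command shown above with the tarball root substituted.

```shell
# Hypothetical helper: print the useradd command for a given tarball root
# so the --home path can be reviewed before the command is run with sudo.
scm_useradd_command() {
  tarball_root="$1"
  printf '%s\n' "useradd --system --home=${tarball_root}/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment \"Cloudera SCM User\" cloudera-scm"
}

# Example use:
# scm_useradd_command /opt/cloudera-manager/cm-5.6.0
```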
Create the Cloudera Manager Server Local Data Storage Directory
- Create the following directory: /var/lib/cloudera-scm-server.
- Change the owner of the directory so that the cloudera-scm user and group have ownership of the directory. For example:
$ sudo mkdir /var/lib/cloudera-scm-server
$ sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-server
Configure Cloudera Manager Agents
Property | Description |
---|---|
server_host | Name of the host where Cloudera Manager Server is running. |
server_port | Port on the host where Cloudera Manager Server is running. |
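The two properties in the table can be set with a text editor or scripted. The following is a minimal sketch of a scripted update. The function name set_agent_property is hypothetical. The location of the Agent configuration file ($CMF_DEFAULTS/etc/cloudera-scm-agent/config.ini), the key=value line format, the example host name, and the port value 7182 are assumptions, so check them against your extracted tarball before use.

```shell
# Hypothetical helper: set key=value in an INI-style file by replacing an
# existing "key=..." line. Assumes the key is already present in the file.
set_agent_property() {
  config_file="$1"; key="$2"; value="$3"
  tmp_file="$(mktemp)" || return 1
  # Write to a temporary file first, then copy the content back so the
  # original file keeps its permissions and ownership.
  sed "s|^${key}=.*|${key}=${value}|" "$config_file" > "$tmp_file" &&
    cat "$tmp_file" > "$config_file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Example use (path, host name, and port are assumptions):
# set_agent_property "$CMF_DEFAULTS/etc/cloudera-scm-agent/config.ini" server_host cm-server.example.com
# set_agent_property "$CMF_DEFAULTS/etc/cloudera-scm-agent/config.ini" server_port 7182
```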
Configuring for a Custom Cloudera Manager User and Custom Directories
- /var/log/cloudera-scm-headlamp
- /var/log/cloudera-scm-firehose
- /var/log/cloudera-scm-alertpublisher
- /var/log/cloudera-scm-eventserver
- /var/lib/cloudera-scm-headlamp
- /var/lib/cloudera-scm-firehose
- /var/lib/cloudera-scm-alertpublisher
- /var/lib/cloudera-scm-eventserver
- /var/lib/cloudera-scm-server
- Change ownership of existing directories:
Use the chown command to change ownership of all existing directories to the Cloudera Manager user. If the Cloudera Manager username and group are cloudera-scm, to change the ownership of the headlamp log directory, issue a command similar to the following:
$ sudo chown -R cloudera-scm:cloudera-scm /var/log/cloudera-scm-headlamp
- Use alternate directories:
- If the directories you plan to use do not exist, create them. For example, to create /var/cm_logs/cloudera-scm-headlamp for use by the cloudera-scm user, run the following commands:
$ sudo mkdir -p /var/cm_logs/cloudera-scm-headlamp
$ sudo chown cloudera-scm /var/cm_logs/cloudera-scm-headlamp
- Connect to the Cloudera Manager Admin Console.
- Select
- Select .
- Click the Configuration tab.
- Enter a term in the Search field to find the settings to be changed. For example, you can enter /var or directory.
- Update each value with the new locations for Cloudera Manager to use.
- Click Save Changes to commit the changes.
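Creating and assigning ownership of each directory listed in this section can be scripted. The following is a minimal sketch, not an official procedure. The function name create_cm_dirs and its optional root prefix argument are hypothetical; the prefix exists only so the sketch can be tried in a scratch location before running it against /var with sudo.

```shell
# Hypothetical helper: create the Cloudera Manager log and data directories
# named in this section under an optional root prefix, owned by the given user.
create_cm_dirs() {
  owner="$1"; root="${2:-}"
  for name in headlamp firehose alertpublisher eventserver; do
    for base in log lib; do
      dir="$root/var/$base/cloudera-scm-$name"
      mkdir -p "$dir" && chown "$owner" "$dir" || return 1
    done
  done
  # The server data directory exists only under /var/lib.
  dir="$root/var/lib/cloudera-scm-server"
  mkdir -p "$dir" && chown "$owner" "$dir"
}

# Example use, as root, for the default user:
# create_cm_dirs cloudera-scm
```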
Create Parcel Directories
- On the Cloudera Manager Server host, create a parcel repository directory:
$ sudo mkdir -p /opt/cloudera/parcel-repo
- Change the directory ownership to be the username you are using to run Cloudera Manager:
$ sudo chown username:groupname /opt/cloudera/parcel-repo
where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would run the command:
$ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
- On each cluster host, create a parcels directory:
$ sudo mkdir -p /opt/cloudera/parcels
- Change the directory ownership to be the username you are using to run Cloudera Manager:
$ sudo chown username:groupname /opt/cloudera/parcels
where username and groupname are the user and group names (respectively) you are using to run Cloudera Manager. For example, if you use the default username cloudera-scm, you would run the command:
$ sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
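After changing ownership, the result can be checked before continuing. The following is a minimal sketch of such a check; the function name paths_not_owned_by is hypothetical. It relies on the standard find test -user, which accepts either a username or a numeric user ID.

```shell
# Hypothetical helper: list paths under a directory that are NOT owned by
# the given user. Empty output means ownership is consistent.
paths_not_owned_by() {
  owner="$1"; dir="$2"
  find "$dir" ! -user "$owner" -print
}

# Example use:
# paths_not_owned_by cloudera-scm /opt/cloudera/parcel-repo
# paths_not_owned_by cloudera-scm /opt/cloudera/parcels
```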
Start the Cloudera Manager Server
- As root:
$ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-server start
- As another user. If you run as another user, ensure that the user you created for Cloudera Manager owns the location to which you extracted the tarball including the newly created database files. If you followed the earlier examples and created the directory /opt/cloudera-manager and the user cloudera-scm, you could use the following command to change ownership of the directory:
$ sudo chown -R cloudera-scm:cloudera-scm /opt/cloudera-manager
Once you have established ownership of directory locations, you can start Cloudera Manager Server using the user account you chose. For example, you might run the Cloudera Manager Server as cloudera-service. In this case, you have the following options:
- Run the following command:
$ sudo -u cloudera-service $CMF_DEFAULTS/etc/init.d/cloudera-scm-server start
- Edit the configuration files so the script internally changes the user, and then run the script as root:
- Remove the following line from $CMF_DEFAULTS/etc/default/cloudera-scm-server:
export CMF_SUDO_CMD=" "
- Change the user and group in $CMF_DEFAULTS/etc/init.d/cloudera-scm-server to the user you want the server to run as. For example, to run as cloudera-service, change the user and group as follows:
USER=cloudera-service
GROUP=cloudera-service
- Run the server script as root:
$ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-server start
- To start the Cloudera Manager Server automatically after a reboot:
- Run the following commands on the Cloudera Manager Server host:
- RHEL-compatible and SLES
$ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
$ chkconfig cloudera-scm-server on
- Debian/Ubuntu
$ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
$ update-rc.d cloudera-scm-server defaults
- On the Cloudera Manager Server host, open the /etc/init.d/cloudera-scm-server file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to $CMF_DEFAULTS/etc/default.
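The CMF_DEFAULTS change in the copied init script can be made with an editor or scripted. The following is a minimal sketch of a scripted edit. The function name patch_init_defaults is hypothetical, and the sketch assumes the script contains the literal text ${CMF_DEFAULTS:-/etc/default} exactly as quoted in the step above.

```shell
# Hypothetical helper: rewrite the CMF_DEFAULTS fallback in a copied init
# script so that it points at the tarball's etc/default directory.
patch_init_defaults() {
  init_script="$1"; tarball_root="$2"
  tmp_file="$(mktemp)" || return 1
  # The pattern is single-quoted so the shell does not expand it; \$ makes
  # sed match a literal dollar sign.
  sed 's|\${CMF_DEFAULTS:-/etc/default}|'"$tarball_root"'/etc/default|' \
    "$init_script" > "$tmp_file" && cat "$tmp_file" > "$init_script"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Example use, as root (the tarball root is illustrative):
# patch_init_defaults /etc/init.d/cloudera-scm-server /opt/cloudera-manager/cm-5.6.0
```

Keeping a backup copy of the init script before editing it is a sensible precaution.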
If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
Start the Cloudera Manager Agents
- To start the Cloudera Manager Agent, run this command on each Agent host:
$ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent start
When the Agent starts, it contacts the Cloudera Manager Server.
- If you are running single user mode, start Cloudera Manager Agent using the user account you chose. For example, to run the Cloudera Manager Agent as cloudera-scm, you have the following options:
- Run the following command:
$ sudo -u cloudera-scm $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent start
- Edit the configuration files so the script internally changes the user, and then run the script as root:
- Remove the following line from $CMF_DEFAULTS/etc/default/cloudera-scm-agent:
export CMF_SUDO_CMD=" "
- Change the user and group in $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent to the user you want the Agent to run as. For example, to run as cloudera-scm, change the user and group as follows:
USER=cloudera-scm
GROUP=cloudera-scm
- Run the Agent script as root:
$ sudo $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent start
- To start the Cloudera Manager Agents automatically after a reboot:
- Run the following commands on each Agent host:
- RHEL-compatible and SLES
$ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
$ chkconfig cloudera-scm-agent on
- Debian/Ubuntu
$ cp $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
$ update-rc.d cloudera-scm-agent defaults
- On each Agent, open the $CMF_DEFAULTS/etc/init.d/cloudera-scm-agent file and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to $CMF_DEFAULTS/etc/default.
Install Package Dependencies
When you install with tarballs and parcels, some services may require additional dependencies that are not provided by Cloudera. On each host, install the required packages:
RHEL-compatible
- bind-utils
- chkconfig
- cyrus-sasl-gssapi
- cyrus-sasl-plain
- fuse
- fuse-libs
- gcc
- httpd
- init-functions
- libxslt
- mod_ssl
- MySQL-python
- openssl
- openssl-devel
- perl
- portmap
- postgresql-server >= 8.4
- psmisc
- python >= 2.4.3-43
- python-devel >= 2.4.3-43
- python-psycopg2
- python-setuptools
- sed
- service
- sqlite
- swig
- useradd
- zlib
SLES
- apache2
- bind-utils
- chkconfig
- cyrus-sasl-gssapi
- cyrus-sasl-plain
- fuse
- gcc
- libfuse2
- libxslt
- openssl
- openssl-devel
- perl
- portmap
- postgresql-server >= 8.4
- psmisc
- python >= 2.4.3-43
- python-devel >= 2.4.3-43
- python-mysql
- python-setuptools
- python-xml
- sed
- service
- sqlite
- swig
- useradd
- zlib
Debian/Ubuntu
- ant
- apache2
- bash
- chkconfig
- debhelper (>= 7)
- fuse-utils | fuse
- gcc
- libfuse2
- libsasl2-modules
- libsasl2-modules-gssapi-mit
- libsqlite3-0
- libssl-dev
- libxslt1.1
- lsb-base
- make
- openssl
- perl
- postgresql-client
- postgresql
- psmisc
- python-dev (>=2.4)
- python-mysqldb
- python-psycopg2
- python-setuptools
- rpcbind
- sed
- swig
- useradd
- zlib1g
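Checking a long package list by hand on every host is error-prone. The following is a minimal sketch that reports which packages a query command does not find. The function name missing_packages is hypothetical. The sketch assumes rpm -q (RHEL-compatible and SLES) or dpkg -s (Debian/Ubuntu) as the query command; both exit with a nonzero status for a package that is not installed, although dpkg -s can also succeed for a package that was removed but still has configuration files.

```shell
# Hypothetical helper: print each named package that the given query command
# does not report as installed. Pass "rpm -q" on RHEL-compatible and SLES
# hosts, or "dpkg -s" on Debian/Ubuntu hosts.
missing_packages() {
  query="$1"; shift
  for pkg in "$@"; do
    # $query is intentionally unquoted so "rpm -q" splits into command and flag.
    $query "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
  done
}

# Example use:
# missing_packages "rpm -q" bind-utils cyrus-sasl-gssapi fuse libxslt psmisc
```

Version constraints such as postgresql-server >= 8.4 are not checked by this sketch.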
Start and Log into the Cloudera Manager Admin Console
- Wait several minutes for the Cloudera Manager Server to start. To observe the startup process, run tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log on the Cloudera Manager Server host. If the Cloudera Manager Server does not start, see Troubleshooting Installation and Upgrade Problems.
- In a web browser, enter http://Server host:7180, where Server host is the FQDN or IP address of the host where the Cloudera Manager Server is running.
The login screen for Cloudera Manager Admin Console displays.
- Log into Cloudera Manager Admin Console. The default credentials are: Username: admin Password: admin. Cloudera Manager does not support changing the admin username for the installed account. You can change the password using Cloudera Manager after you run the installation wizard. Although you cannot change the admin username, you can add a new user, assign administrative privileges to the new user, and then delete the default admin account.
- After you log in, the Cloudera Manager End User License Terms and Conditions page displays. Read the terms and conditions and then select Yes to accept them.
- Click Continue.
The Welcome to Cloudera Manager page displays.
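Instead of watching the server log with tail -f, a script can poll the log until a startup message appears. The following is a minimal sketch. The function name server_started is hypothetical, and the default marker text "Started Jetty server" is an assumption about what the log contains when the Admin Console is ready; confirm the exact message in your own log and pass it as the second argument if it differs.

```shell
# Hypothetical helper: succeed once the server log contains the given
# startup marker. The default marker text is an assumption.
server_started() {
  log_file="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
  marker="${2:-Started Jetty server}"
  grep -qF -- "$marker" "$log_file" 2>/dev/null
}

# Example use: poll every 5 seconds, for up to 5 minutes.
# tries=0
# until server_started || [ "$tries" -ge 60 ]; do
#   sleep 5
#   tries=$((tries + 1))
# done
```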
Choose Cloudera Manager Edition
From the Welcome to Cloudera Manager page, you can select the edition of Cloudera Manager to install and, optionally, install a license:
- Choose which edition to install:
- Cloudera Express, which does not require a license, but provides a limited set of features.
- Cloudera Enterprise Data Hub Edition Trial, which does not require a license, but expires after 60 days and cannot be renewed.
- Cloudera Enterprise with one of the following license types:
- Basic Edition
- Flex Edition
- Enterprise Data Hub Edition
- If you elect Cloudera Enterprise, install a license:
- Click Upload License.
- Click the document icon to the left of the Select a License File text field.
- Go to the location of your license file, click the file, and click Open.
- Click Upload.
- Information is displayed indicating what the CDH installation includes. At this point, you can click the Support drop-down menu to access online Help or the Support Portal.
- Click Continue to proceed with the installation.
Choose Cloudera Manager Hosts
- Click the Currently Managed Hosts tab.
- Choose the hosts to add to the cluster.
- Click Continue.
The Cluster Installation Select Repository screen displays.
Install CDH and Managed Service Software
- Install CDH and managed services using parcels:
- Use Parcels
- Choose the parcels to install. The choices depend on the repositories you have chosen; a repository can contain multiple parcels. Only the parcels for the latest supported service versions are configured by default.
You can add parcels for earlier versions by specifying custom repositories. For example, you can find the locations of the earlier CDH 4 parcels at https://username:password@archive.cloudera.com/p/cdh4/parcels/. Or, if you are installing CDH 4.3 and want to use policy-file authorization, you can add the Sentry parcel using this mechanism.
- To specify the parcel directory, specify the local parcel repository, add a parcel repository, or specify the properties of a proxy server through which parcels are downloaded, click the More Options button and do one or more of the following:
- Parcel Directory and Local Parcel Repository Path - Specify the location of parcels on cluster hosts and the Cloudera Manager Server host. If you change the default value for Parcel Directory and have already installed and started Cloudera Manager Agents, restart the Agents:
sudo service cloudera-scm-agent restart
- Parcel Repository - In the Remote Parcel Repository URLs field, click the button and enter the URL of the repository. The URL you specify is added to the list of repositories listed in the Configuring Cloudera Manager Server Parcel Settings page and a parcel is added to the list of parcels on the Select Repository page. If you have multiple repositories configured, you see all the unique parcels contained in all your repositories.
- Proxy Server - Specify the properties of a proxy server.
- Click OK.
- If you are using Cloudera Manager to install software, select the release of Cloudera Manager Agent. You can choose either the version that matches the Cloudera Manager Server you are currently using or specify a version in a custom repository. If you opted to use custom repositories for installation files, you can provide a GPG key URL that applies for all repositories.
- Click Continue. Cloudera Manager installs the CDH and managed service parcels. During parcel installation, progress is indicated for the phases of the parcel installation process in separate progress bars. If you are installing multiple parcels, you see progress bars for each parcel. When the Continue button at the bottom of the screen turns blue, the installation process is completed. Click Continue.
- Click Continue.
The Host Inspector runs to validate the installation and provides a summary of the results, including all the versions of the installed components. If the validation is successful, click Finish.
Add Services
- On the first page of the Add Services wizard, choose the combination of services to install and whether to install Cloudera Navigator:
- Select the combination of services to install:
CDH 4
- Core Hadoop - HDFS, MapReduce, ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- All Services - HDFS, MapReduce, ZooKeeper, HBase, Impala, Oozie, Hive, Hue, and Sqoop
- Custom Services - Any combination of services.
CDH 5
- Core Hadoop - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, and Hue
- Core with HBase
- Core with Impala
- Core with Search
- Core with Spark
- All Services - HDFS, YARN (includes MapReduce 2), ZooKeeper, Oozie, Hive, Hue, HBase, Impala, Solr, Spark, and Key-Value Store Indexer
- Custom Services - Any combination of services.
- Some services depend on other services; for example, HBase requires HDFS and ZooKeeper. Cloudera Manager tracks dependencies and installs the correct combination of services.
- In a Cloudera Manager deployment of a CDH 4 cluster, the MapReduce service is the default MapReduce computation framework. Choose Custom Services to install YARN, or use the Add Service functionality to add YARN after installation completes.
- In a Cloudera Manager deployment of a CDH 5 cluster, the YARN service is the default MapReduce computation framework. Choose Custom Services to install MapReduce, or use the Add Service functionality to add MapReduce after installation completes.
- The Flume service can be added only after your cluster has been set up.
- If you have chosen Enterprise Data Hub Edition Trial or Cloudera Enterprise, optionally select the Include Cloudera Navigator checkbox to enable Cloudera Navigator. See Cloudera Navigator Data Management Overview.
- Click Continue.
- Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to which the HDFS DataNode role is assigned. You can reassign role instances.
Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the hosts dialog box.
The following shortcuts for specifying hostname patterns are supported:
- Range of hostnames (without the domain portion)
Range Definition | Matching Hosts |
---|---|
10.1.1.[1-4] | 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4 |
host[1-3].company.com | host1.company.com, host2.company.com, host3.company.com |
host[07-10].company.com | host07.company.com, host08.company.com, host09.company.com, host10.company.com |
- IP addresses
- Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
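To preview which hosts a range definition covers before entering it in the wizard, the bracket notation can be expanded in the shell. The following is a minimal sketch that reproduces the three examples given for hostname ranges. The function name expand_host_range is hypothetical, and the sketch illustrates the notation only; it is not the wizard's own matching logic. It handles a single [start-end] range per pattern and keeps zero padding when the start value is padded.

```shell
# Hypothetical helper: expand one [start-end] range in a host pattern into
# one host per line. Patterns without a range are printed unchanged.
expand_host_range() {
  pattern="$1"
  case "$pattern" in
    *\[*-*\]*) ;;
    *) printf '%s\n' "$pattern"; return 0 ;;
  esac
  prefix="${pattern%%\[*}"   # text before the opening bracket
  rest="${pattern#*\[}"      # text after the opening bracket
  range="${rest%%\]*}"       # "start-end"
  suffix="${rest#*\]}"       # text after the closing bracket
  start="${range%%-*}"
  end="${range#*-}"
  width="${#start}"          # preserve zero padding such as 07
  # Strip leading zeros so the values are not treated as octal.
  i="$(printf '%s' "$start" | sed 's/^0*//')"; i="${i:-0}"
  last="$(printf '%s' "$end" | sed 's/^0*//')"; last="${last:-0}"
  while [ "$i" -le "$last" ]; do
    printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
    i=$((i + 1))
  done
}

# Example use:
# expand_host_range 'host[07-10].company.com'
```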
- When you are finished with the assignments, click Continue.
Configure Database Settings
- Enter the database host, database type, database name, username, and password for the database that you created when you set up the database.
- Click Test Connection to confirm that Cloudera Manager can communicate with the database using the information you have supplied. If the test succeeds in all cases, click Continue; otherwise, check and correct the information you have provided for the database and then try the test again. (For some servers, if you are using the embedded database, you will see a message saying the database will be created at a later step in the installation process.)
The Review Changes screen displays.
Review Configuration Changes and Start Services
- Review the configuration changes to be applied. Confirm the settings entered for file system paths. The file paths required vary based on the services to be installed. If you chose to add the Sqoop service, indicate whether to use the default Derby database or the embedded PostgreSQL database. If the latter, type the database name, host, and user credentials that you specified when you created the database.
- Click Continue.
The wizard starts the services.
- When all of the services are started, click Continue. You see a success message indicating that your cluster has been successfully started.
- Click Finish to proceed to the Cloudera Manager Admin Console Home Page.
(Optional) Change the Cloudera Manager User
- Connect to the Cloudera Manager Admin Console.
- Do one of the following:
- Select .
- In the Cloudera Management Service table, click the Cloudera Management Service link.
- Click the Configuration tab.
- Use the search box to find the property to change. For example, you might enter "system" to find the System User and System Group properties.
- Make any changes required to the System User and System Group to ensure Cloudera Manager uses the proper user accounts.
- Click Save Changes.
- Start the Cloudera Management Service roles.
Change the Default Administrator Password
- Click the logged-in username at the far right of the top navigation bar and select Change Password.
- Enter the current password and a new password twice, and then click OK.
Configure Oozie Data Purge Settings
If you added an Oozie service, you can change your Oozie configuration to control when data is purged to improve performance, cut down on database disk usage, or to keep the history for a longer period of time. Limiting the size of the Oozie database can also improve performance during upgrades. See Configuring Oozie Data Purge Settings Using Cloudera Manager.
Test the Installation
You can test the installation following the instructions in Testing the Installation.