Upgrading from CDH 5.4.0 or Higher to the Latest Release
To upgrade from CDH 5.4.0 or higher to the latest version of CDH 5, proceed as follows.
- Step 1: Prepare the cluster for the upgrade
- Step 2: If necessary, download the CDH 5 "1-click" package on each host in your cluster
- Step 3: Upgrade the Packages on the Appropriate Hosts
- Step 4: In an HA Deployment, Upgrade and Start the JournalNodes
- Step 5: Start HDFS
- Step 6: Start MapReduce (MRv1) or YARN
- Step 7: Set the Sticky Bit
- Step 8: Upgrade Components
- Step 9: Apply Configuration File Changes if Necessary
Step 1: Prepare the cluster for the upgrade
- Put the NameNode into safe mode and save the fsimage:
- Put the NameNode (or active NameNode in an HA configuration) into safe mode:
$ sudo -u hdfs hdfs dfsadmin -safemode enter
- Perform a saveNamespace operation:
$ sudo -u hdfs hdfs dfsadmin -saveNamespace
This will result in a new fsimage being written out with no edit log entries.
- With the NameNode still in safe mode, shut down all services as instructed below.
- Shut down Hadoop services across your entire cluster by running the following command on every host in your cluster:
$ for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
- As root, check each host to make sure that no processes are running as the hdfs, yarn, mapred, or httpfs users:
# ps -aef | grep java
- Back up the HDFS metadata on the NameNode machine, as follows.
- Find the location of your dfs.name.dir (or dfs.namenode.name.dir); for example:
$ grep -C1 dfs.name.dir /etc/hadoop/conf/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/mnt/hadoop/hdfs/name</value>
</property>
- Back up the directory. The path inside the <value> XML element is the path to your HDFS metadata. If you see a comma-separated list of paths, you do not need to back up all of them, because they store the same data. Back up the first directory, for example, by using the following commands:
$ cd /mnt/hadoop/hdfs/name
# tar -cvf /root/nn_backup_data.tar .
./
./current/
./current/fsimage
./current/fstime
./current/VERSION
./current/edits
./image/
./image/fsimage
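The backup step above can be sketched as two small shell functions. This is only an illustration: the function names `first_name_dir` and `backup_name_dir` are hypothetical, the second path in the example is made up, and the sketch assumes the configured value contains no whitespace around the commas.

```shell
# Print the first entry of a comma-separated dfs.name.dir value.
first_name_dir() {
    # ${1%%,*} strips everything from the first comma onward.
    printf '%s\n' "${1%%,*}"
}

# Archive one metadata directory into a tar file (run as root on the NameNode host).
# Arguments: <metadata directory> <archive path>; both are chosen by the caller.
backup_name_dir() {
    tar -cvf "$2" -C "$1" .
}

# Example (the second path in the list is hypothetical):
#   dir=$(first_name_dir "/mnt/hadoop/hdfs/name,/mnt/disk2/hdfs/name")
#   backup_name_dir "$dir" /root/nn_backup_data.tar
```

Using `tar -C` instead of `cd` keeps the caller's working directory unchanged, but the resulting archive has the same relative layout as the one shown above.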
Step 2: If necessary, download the CDH 5 "1-click" package on each host in your cluster
Before you begin: Check whether you have the CDH 5 "1-click" repository installed.
- On Red Hat/CentOS-compatible and SLES systems:
rpm -q cdh5-repository
If you are upgrading from CDH 5 Beta 1 or higher, you should see:
cdh5-repository-1-0
In this case, skip to Step 3. If instead you see:
package cdh5-repository is not installed
proceed with this step.
- On Ubuntu and Debian systems:
dpkg -l | grep cdh5-repository
If the repository is installed, skip to Step 3; otherwise proceed with this step.
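If you need to run this check on many hosts, a function along the following lines may help. It is a sketch rather than a supported tool: the name `cdh5_repo_installed` is hypothetical, and it assumes that a host with `rpm` on its PATH is an RPM-based system, which may not hold if both `rpm` and `dpkg` are installed.

```shell
# Return 0 if the cdh5-repository package is installed, a non-zero status if
# it is not, and 2 if neither rpm nor dpkg-query is available on this host.
cdh5_repo_installed() {
    if command -v rpm >/dev/null 2>&1; then
        # rpm -q exits non-zero when the package is not installed.
        rpm -q cdh5-repository >/dev/null 2>&1
    elif command -v dpkg-query >/dev/null 2>&1; then
        # A fully installed package reports the status "install ok installed".
        dpkg-query -W -f='${Status}' cdh5-repository 2>/dev/null |
            grep -q 'install ok installed'
    else
        return 2
    fi
}

# Example:
#   cdh5_repo_installed && echo "skip to Step 3" || echo "proceed with Step 2"
```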
If the CDH 5 "1-click" repository is not already installed on each host in the cluster, follow the instructions below for that host's operating system:
Instructions for Red Hat-compatible systems
Instructions for Ubuntu and Debian systems
On Red Hat-compatible systems:
- Download the CDH 5 "1-click Install" package (or RPM).
Click the appropriate RPM and Save File to a directory with write access (for example, your home directory).
OS Version | Link to CDH 5 RPM
---|---
RHEL/CentOS/Oracle 5 | RHEL/CentOS/Oracle 5 link
RHEL/CentOS/Oracle 6 | RHEL/CentOS/Oracle 6 link
RHEL/CentOS/Oracle 7 | RHEL/CentOS/Oracle 7 link
- Install the RPM for all RHEL versions:
$ sudo yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
On SLES systems:
- Download the CDH 5 "1-click Install" package.
Download the RPM file, choose Save File, and save it to a directory to which you have write access (for example, your home directory).
- Install the RPM:
$ sudo rpm -i cloudera-cdh-5-0.x86_64.rpm
- Update your system package index by running the following:
$ sudo zypper refresh
On Ubuntu and Debian systems:
- Download the CDH 5 "1-click Install" package:
OS Version | Package Link
---|---
Jessie | Jessie package
Wheezy | Wheezy package
Precise | Precise package
Trusty | Trusty package
- Install the package by doing one of the following:
- Choose Open with in the download window to use the package manager.
- Choose Save File, save the package to a directory to which you have write access (for example, your home directory), and install it from the command line.
For example:
sudo dpkg -i cdh5-repository_1.0_all.deb
Step 3: Upgrade the Packages on the Appropriate Hosts
Upgrade MRv1, YARN, or both, depending on what you intend to use.
Before installing MRv1 or YARN: (Optionally) add a repository key on each system in the cluster, if you have not already done so. Add the Cloudera Public GPG Key to your repository by executing one of the following commands:
- For Red Hat/CentOS/Oracle 5 systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
- For Red Hat/CentOS/Oracle 6 systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
- For all SLES systems:
$ sudo rpm --import https://archive.cloudera.com/cdh5/sles/11/x86_64/cdh/RPM-GPG-KEY-cloudera
- For Ubuntu Precise systems:
$ curl -s https://archive.cloudera.com/cdh5/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -
- For Debian Wheezy systems:
$ curl -s https://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key | sudo apt-key add -
Step 3a: If you are using MRv1, upgrade the MRv1 packages on the appropriate hosts.
Skip this step if you are using YARN exclusively. Otherwise upgrade each type of daemon package on the appropriate hosts as follows:
- Install and deploy ZooKeeper:
Follow instructions under ZooKeeper Installation.
- Install each type of daemon package on the appropriate system(s), as follows.
Where to install
Install commands
JobTracker host running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-jobtracker
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-jobtracker
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-jobtracker
NameNode host running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-hdfs-namenode
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-hdfs-namenode
Secondary NameNode host (if used) running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-hdfs-secondarynamenode
All cluster hosts except the JobTracker, NameNode, and Secondary (or Standby) NameNode hosts, running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
All client hosts, running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-client
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-client
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-client
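Every row of the table above follows the same pattern: refresh the package metadata, then install the packages for that host's role. The sketch below captures that pattern; the function name `install_cmd` is hypothetical, and it prints the command line rather than running it so that you can review it first.

```shell
# Print the refresh-and-install command line for a given package manager.
# Usage: install_cmd <yum|zypper|apt-get> <package>...
install_cmd() {
    mgr=$1
    shift
    case "$mgr" in
        yum)     echo "sudo yum clean all; sudo yum install $*" ;;
        zypper)  echo "sudo zypper clean --all; sudo zypper install $*" ;;
        apt-get) echo "sudo apt-get update; sudo apt-get install $*" ;;
        *)       echo "unsupported package manager: $mgr" >&2; return 1 ;;
    esac
}

# Example, for a host that runs a TaskTracker and a DataNode on SLES:
#   install_cmd zypper hadoop-0.20-mapreduce-tasktracker hadoop-hdfs-datanode
```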
Step 3b: If you are using YARN, upgrade the YARN packages on the appropriate hosts.
Skip this step if you are using MRv1 exclusively. Otherwise upgrade each type of daemon package on the appropriate hosts as follows:
- Install and deploy ZooKeeper:
Follow instructions under ZooKeeper Installation.
- Install each type of daemon package on the appropriate system(s), as follows.
Where to install
Install commands
Resource Manager host (analogous to MRv1 JobTracker) running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-yarn-resourcemanager
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-yarn-resourcemanager
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-yarn-resourcemanager
NameNode host running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-hdfs-namenode
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-namenode
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-hdfs-namenode
Secondary NameNode host (if used) running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-hdfs-secondarynamenode
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-hdfs-secondarynamenode
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-hdfs-secondarynamenode
All cluster hosts except the Resource Manager (analogous to MRv1 TaskTrackers) running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
One host in the cluster running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
All client hosts, running:
Red Hat/CentOS compatible
$ sudo yum clean all; sudo yum install hadoop-client
SLES
$ sudo zypper clean --all; sudo zypper install hadoop-client
Ubuntu or Debian
$ sudo apt-get update; sudo apt-get install hadoop-client
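When scripting the YARN upgrade, it can help to keep the role-to-package mapping from the table above in one place. The following is a minimal sketch; the function name `yarn_role_packages` and the role labels (`worker`, `history`, and so on) are hypothetical shorthand, not names used by CDH.

```shell
# Print the packages to install for a YARN deployment role.
# The role labels are illustrative shorthand for the rows of the table.
yarn_role_packages() {
    case "$1" in
        resourcemanager)   echo "hadoop-yarn-resourcemanager" ;;
        namenode)          echo "hadoop-hdfs-namenode" ;;
        secondarynamenode) echo "hadoop-hdfs-secondarynamenode" ;;
        worker)            echo "hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce" ;;
        history)           echo "hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
        client)            echo "hadoop-client" ;;
        *)                 echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Example, on a Red Hat-compatible worker host (word splitting is intended):
#   sudo yum clean all; sudo yum install $(yarn_role_packages worker)
```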
Step 4: In an HA Deployment, Upgrade and Start the JournalNodes
- Install the JournalNode daemons on each of the machines where they will run.
To install JournalNode on RHEL-compatible systems:
$ sudo yum install hadoop-hdfs-journalnode
To install JournalNode on Ubuntu and Debian systems:
$ sudo apt-get install hadoop-hdfs-journalnode
To install JournalNode on SLES systems:
$ sudo zypper install hadoop-hdfs-journalnode
- Start the JournalNode daemons on each of the machines where they will run:
$ sudo service hadoop-hdfs-journalnode start
Wait for the daemons to start before proceeding to the next step.
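One way to wait for a daemon is to poll until a status check succeeds. The sketch below is generic: the function name `wait_for` is hypothetical, and the example assumes that the JournalNode init script supports a `status` action that exits with 0 once the daemon is running, which you should confirm on your own hosts.

```shell
# Retry a command about once per second until it succeeds or the attempts run out.
# Usage: wait_for <max attempts> <command> [args...]
wait_for() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Example: give the JournalNode roughly 30 seconds to come up.
#   wait_for 30 sudo service hadoop-hdfs-journalnode status || echo "JournalNode did not start"
```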
Step 5: Start HDFS
$ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
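Because Step 1 left the NameNode in safe mode, one useful check after HDFS starts is whether safe mode has been left again. The sketch below inspects the text printed by `hdfs dfsadmin -safemode get`; the function name `safemode_is_off` is hypothetical, and the matched strings are an assumption about the report format that you should compare against your own output.

```shell
# Return 0 only if the safe mode report says OFF and never says ON.
# In an HA deployment the report may contain one line per NameNode.
safemode_is_off() {
    case "$1" in
        *"Safe mode is ON"*)  return 1 ;;
        *"Safe mode is OFF"*) return 0 ;;
        *)                    return 1 ;;
    esac
}

# Example:
#   report=$(sudo -u hdfs hdfs dfsadmin -safemode get)
#   safemode_is_off "$report" && echo "HDFS has left safe mode"
```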
Step 6: Start MapReduce (MRv1) or YARN
You are now ready to start and test MRv1 or YARN.
Step 6a: Start MapReduce (MRv1)
After you have verified HDFS is operating correctly, you are ready to start MapReduce. On each TaskTracker system:
$ sudo service hadoop-0.20-mapreduce-tasktracker start
On the JobTracker system:
$ sudo service hadoop-0.20-mapreduce-jobtracker start
Verify that the JobTracker and TaskTracker started properly.
$ sudo jps | grep Tracker
If the permissions of directories are not configured correctly, the JobTracker and TaskTracker processes start and immediately fail. If this happens, check the JobTracker and TaskTracker logs and set the permissions correctly.
Verify basic cluster operation for MRv1
At this point your cluster is upgraded and ready to run jobs. Before running your production jobs, verify basic cluster operation by running an example from the Apache Hadoop web site.
- Create a home directory on HDFS for the user who will be running the job (for example, joe):
$ sudo -u hdfs hadoop fs -mkdir -p /user/joe
$ sudo -u hdfs hadoop fs -chown joe /user/joe
Do the following steps as the user joe.
- Make a directory in HDFS called input and copy some XML files into it by running the following commands:
$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml
-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml
- Run an example Hadoop job to grep with a regular expression in your input data.
$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+'
- After the job completes, you can find the output in the HDFS directory named output because you specified that output directory to Hadoop.
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input
drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output
You can see that there is a new directory called output.
- List the output files.
$ hadoop fs -ls output
Found 3 items
drwxr-xr-x - joe supergroup 0 2009-02-25 10:33 /user/joe/output/_logs
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output/part-00000
-rw-r--r-- 1 joe supergroup 0 2009-02-25 10:33 /user/joe/output/_SUCCESS
- Read the results in the output file; for example:
$ hadoop fs -cat output/part-00000 | head
1 dfs.datanode.data.dir
1 dfs.namenode.checkpoint.dir
1 dfs.namenode.name.dir
1 dfs.replication
1 dfs.safemode.extension
1 dfs.safemode.min.datanodes
You have now confirmed your cluster is successfully running CDH 5.
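If you script this verification, you can test for the _SUCCESS marker shown in the output listing instead of reading the listing by eye. This is a minimal sketch; the function name `job_succeeded` is hypothetical, and it relies on `hadoop fs -test -e`, which exits with 0 when the path exists.

```shell
# Return 0 if the given HDFS output directory contains a _SUCCESS marker.
job_succeeded() {
    hadoop fs -test -e "$1/_SUCCESS"
}

# Example, run as the user who submitted the job:
#   job_succeeded output && hadoop fs -cat output/part-00000 | head
```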
Step 6b: Start MapReduce with YARN
After you have verified HDFS is operating correctly, you are ready to start YARN. First, if you have not already done so, create directories and set the correct permissions.
$ sudo -u hdfs hadoop fs -mkdir -p /user/history
$ sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
$ sudo -u hdfs hadoop fs -chown yarn /user/history
$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Verify the directory structure, ownership, and permissions:
$ sudo -u hdfs hadoop fs -ls -R /
drwxrwxrwt - hdfs supergroup 0 2012-04-19 14:31 /tmp
drwxr-xr-x - hdfs supergroup 0 2012-05-31 10:26 /user
drwxrwxrwt - yarn supergroup 0 2012-04-19 14:31 /user/history
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var
drwxr-xr-x - hdfs supergroup 0 2012-05-31 15:31 /var/log
drwxr-xr-x - yarn mapred 0 2012-05-31 15:31 /var/log/hadoop-yarn
To start YARN, start the ResourceManager and NodeManager services:
On the ResourceManager system:
$ sudo service hadoop-yarn-resourcemanager start
On each NodeManager system (typically the same ones where DataNode service runs):
$ sudo service hadoop-yarn-nodemanager start
To start the MapReduce JobHistory Server
On the MapReduce JobHistory Server system:
$ sudo service hadoop-mapreduce-historyserver start
For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop 1 in a YARN installation, set the HADOOP_MAPRED_HOME environment variable as follows:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Verify basic cluster operation for YARN
At this point your cluster is upgraded and ready to run jobs. Before running your production jobs, verify basic cluster operation by running an example from the Apache Hadoop web site.
- Create a home directory on HDFS for the user who will be running the job (for example, joe):
$ sudo -u hdfs hadoop fs -mkdir -p /user/joe
$ sudo -u hdfs hadoop fs -chown joe /user/joe
Do the following steps as the user joe.
- Make a directory in HDFS called input and copy some XML files into it by running the following commands:
$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1913 2012-02-13 12:21 input/hdfs-site.xml
-rw-r--r-- 1 joe supergroup 1001 2012-02-13 12:21 input/mapred-site.xml
- Set HADOOP_MAPRED_HOME for user joe:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
- Run an example Hadoop job to grep with a regular expression in your input data.
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
- After the job completes, you can find the output in the HDFS directory named output23 because you specified that output directory to Hadoop.
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input
drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output23
You can see that there is a new directory called output23.
- List the output files:
$ hadoop fs -ls output23
Found 2 items
drwxr-xr-x - joe supergroup 0 2009-02-25 10:33 /user/joe/output23/_SUCCESS
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output23/part-r-00000
- Read the results in the output file:
$ hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.permissions.enabled
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir
You have now confirmed your cluster is successfully running CDH 5.
Step 7: Set the Sticky Bit
For security reasons Cloudera strongly recommends you set the sticky bit on directories if you have not already done so.
The sticky bit prevents anyone except the superuser, directory owner, or file owner from deleting or moving the files within a directory. (Setting the sticky bit for a file has no effect.) Do this for directories such as /tmp. (For instructions on creating /tmp and setting its permissions, see these instructions).
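To see whether a directory already has the sticky bit, look at the last character of its permission string: `t` (or `T`) means the bit is set, as in the `drwxrwxrwt` entries shown for /tmp and /user/history earlier. The following sketch automates that check; the function name `has_sticky_bit` is hypothetical, and the example assumes that mode 1777 is appropriate for /tmp on your cluster.

```shell
# Return 0 if a directory permission string such as "drwxrwxrwt" has the
# sticky bit set. A trailing marker (for example "+" for ACLs) is tolerated.
has_sticky_bit() {
    case "$1" in
        d????????[tT]*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Example: set the sticky bit on /tmp only if it is missing.
#   perms=$(sudo -u hdfs hadoop fs -ls -d /tmp | awk '{print $1}')
#   has_sticky_bit "$perms" || sudo -u hdfs hadoop fs -chmod 1777 /tmp
```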
Step 8: Upgrade Components
CDH 5 Components
- Crunch Installation
- Setting Up Apache Flume Using the Command Line
- Setting Up Apache HBase Using the Command Line
- HCatalog
- Setting Up Apache Hive Using the Command Line
- Setting Up HttpFS Using the Command Line
- Setting Up Hue Using the Command Line
- Setting Up Apache Impala Using the Command Line
- Setting Up KMS Using the Command Line
- Setting Up Apache Mahout Using the Command Line
- Setting Up Apache Oozie Using the Command Line
- Setting Up Apache Pig Using the Command Line
- Setting Up Cloudera Search Using the Command Line
- Setting Up Apache Sentry Using the Command Line
- Setting Up Apache Spark Using the Command Line
- Setting Up Apache Sqoop Using the Command Line
- Setting Up Apache Sqoop 2 Using the Command Line
- Setting Up Apache Whirr Using the Command Line
- ZooKeeper Installation
Step 9: Apply Configuration File Changes if Necessary
For example, if you have modified your zoo.cfg configuration file (/etc/zookeeper/zoo.cfg), the upgrade renames and preserves a copy of your modified zoo.cfg as /etc/zookeeper/zoo.cfg.rpmsave. If you have not already done so, you should now compare this to the new /etc/zookeeper/conf/zoo.cfg, resolve differences, and make any changes that should be carried forward (typically where you have changed property value defaults). Do this for each component you upgrade.
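On RPM-based systems, a quick way to find every preserved copy is to search the configuration tree for files ending in .rpmsave. The sketch below assumes the copies live somewhere under the directory you pass in; the function name `list_rpmsave` is hypothetical.

```shell
# List preserved configuration copies (*.rpmsave) under a directory.
list_rpmsave() {
    find "$1" -type f -name '*.rpmsave' 2>/dev/null
}

# Example: list the copies under /etc, then compare one against its replacement
# (paths taken from the ZooKeeper example above).
#   list_rpmsave /etc
#   diff -u /etc/zookeeper/zoo.cfg.rpmsave /etc/zookeeper/conf/zoo.cfg
```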