Configuring Apache Hive Metastore High Availability in CDH

You can enable Hive metastore high availability (HA) so that your cluster is resilient to failures if a metastore becomes unavailable. The HA mode is recommended to address fail-over situations. No load balancing is done.

Recommendations

Cloudera recommends that each instance of the metastore runs on a separate cluster host, to maximize high availability.

Limitations

Sentry HDFS synchronization does not support Hive metastore HA.

Enabling Hive Metastore High Availability Using Cloudera Manager

Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)

  1. Go to the Hive service.
  2. If you have a secure cluster, change the Hive Delegation Token Store implementation. Non-secure clusters can skip this step.

    To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.

    1. Click the Configuration tab.
    2. Select Scope > Hive Metastore Server.
    3. Select Category > Advanced.
    4. Locate the Hive Metastore Delegation Token Store property or search for it by typing its name In the Search box.
    5. Select org.apache.hadoop.hive.thrift.DBTokenStore.
    6. Click Save Changes to commit the changes.
  3. Click the Instances tab.
  4. Click Add Role Instances.
  5. Click the text field under Hive Metastore Server.
  6. Check the box by the host on which to run the additional metastore and click OK.
  7. Click Continue and click Finish.
  8. Check the box by the new Hive Metastore Server role.
  9. Select Actions for Selected > Start, and click Start to confirm.
  10. Click Close and click to display the stale configurations page.
  11. Click Restart Stale Services and click Restart Now.
  12. Click Finish after the cluster finishes restarting.

Enabling Hive Metastore High Availability Using the Command Line

To configure the Hive metastore for high availability, configure each metastore to store its state in a replicated database, then provide the metastore clients with a list of URIs where metastores are available. The client starts with the first URI in the list. If it does not get a response, it randomly picks another URI in the list and attempts to connect. This continues until the client receives a response.

  1. Configure Hive on each of the cluster hosts where you want to run a metastore, following the instructions at Configuring the Hive Metastore for CDH.
  2. On the server where the master metastore instance runs, edit the /etc/hive/conf.server/hive-site.xml file, setting the hive.metastore.uris property's value to a list of URIs where a Hive metastore is available for failover.
    <property>
     <name>hive.metastore.uris</name>
     <value>thrift://metastore1.example.com,thrift://metastore2.example.com,thrift://metastore3.example.com</value>
     <description> URI for client to contact metastore server </description>
    </property>
  3. If you use a secure cluster, enable the Hive token store by configuring the value of the hive.cluster.delegation.token.store.class property to org.apache.hadoop.hive.thrift.DBTokenStore. Non-secure clusters can skip this step.
    <property>
     <name>hive.cluster.delegation.token.store.class</name>
     <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
    </property>
  4. Save your changes and restart each Hive instance.
  5. Connect to each metastore and update it to use a nameservice instead of a NameNode, as a requirement for high availability.
    1. From the command-line, as the Hive user, retrieve the list of URIs representing the filesystem roots:
      hive --service metatool -listFSRoot
    2. Run the following command with the --dry-run option, to be sure that the nameservice is available and configured correctly. This will not change your configuration.
      hive --service metatool -updateLocation nameservice-uri namenode-uri -dryRun
    3. Run the same command again without the --dry-run option to direct the metastore to use the nameservice instead of a NameNode.
      hive --service metatool -updateLocation nameservice-uri namenode-uri 
  6. Test your configuration by stopping your main metastore instance, and then attempting to connect to one of the other metastores from a client. The following is an example of doing this on a RHEL or Fedora system. The example first stops the local metastore, then connects to the metastore on the host metastore2.example.com and runs the SHOW TABLES command.
    $ sudo service hive-metastore stop
    $ /usr/lib/hive/bin/beeline
    beeline> !connect jdbc:hive2://metastore2.example.com:10000 username password org.apache.hive.jdbc.HiveDriver
    0: jdbc:hive2://localhost:10000> SHOW TABLES;
    show tables;
    +-----------+
    | tab_name  |
    +-----------+
    +-----------+
    No rows selected (0.238 seconds)
    0: jdbc:hive2://localhost:10000> 
  7. Restart the local metastore when you have finished testing.
    $ sudo service hive-metastore start