Configure Hue for High Availability
Configuring Hue for High Availability (HA) means configuring Hue, Hive, and Impala.
Configure Hue for High Availability
Prerequisites
- SSH network access to host machines with a Hue Server/Kerberos Ticket Renewer role.
- External database configured for each Hue Server. See Hue Databases.
Add Hue Roles
Hue HA requires at least two Hue server roles and one Load Balancer role. If the cluster is authenticating with Kerberos, you need one Kerberos Ticket Renewer on each host with a Hue Server.
- Log on to Cloudera Manager and go to the Hue service.
- Go to the Hue service and select .
- Click Hue Server, assign to one or more hosts, and click .
- Click Kerberos Ticket Renewer, assign to each host with a Hue Server, and click .
- Click Load Balancer, assign to one or more hosts, and click .
- Check each role and select Start.
Enable TLS for Hue Load Balancer
- Go to the Hue service configuration and search on TLS/SSL.
- Check Enable TLS/SSL for Hue for the Hue Server Default Group.
- Set other TLS/SSL properties appropriate for your setup. Some to consider are:
- Hue Load Balancer Port - Apache Load Balancer listens on this port (default is 8889).
- Path to TLS/SSL Certificate File - Must be multi-domain with CN = Load Balancer in PEM format.
- Path to TLS/SSL Private Key File - Must be in PEM format.
- Click Save Changes and Restart Hue.
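Before saving, it can help to confirm that the certificate and key files are PEM-encoded and that the certificate names the load balancer host. The following is a minimal sketch and not part of the Cloudera Manager workflow. The helper names `is_pem` and `show_names` and the file paths are hypothetical, and it assumes the `openssl` command-line tool is installed.

```shell
# is_pem FILE: succeed if FILE contains a PEM "-----BEGIN ...-----" header.
# (Hypothetical helper; it checks the encoding only, not validity.)
is_pem() {
    grep -q -e '-----BEGIN [A-Z0-9 ]*-----' "$1"
}

# show_names FILE: print the certificate subject (which holds the CN) and
# any Subject Alternative Name entries, using the openssl CLI.
show_names() {
    openssl x509 -in "$1" -noout -subject
    openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}

# Example usage with placeholder paths (replace with your own files):
#   is_pem /path/to/cert.pem && show_names /path/to/cert.pem
#   is_pem /path/to/key.pem || echo "private key is not PEM-encoded"
```

A file that fails `is_pem` is probably DER or PKCS#12 encoded and would need converting before Hue can use it.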
Configure Hive and Impala for High Availability
Prerequisites & Requirements
- SSH network access to host machines with a HiveServer2 or Impala Daemon role.
- External database configured for each HiveServer2 and Impala Daemon.
- Hive/Impala Load Balancer configured with Source IP Persistence.
Source IP Persistence
Without IP Persistence, you may encounter the error, “Results have expired, rerun the query if needed.”
Hue supports High Availability through a load balancer to HiveServer2 and Impala. Because the underlying Hue thrift libraries reuse TCP connections in a pool, a single user session may not always use the same TCP connection. If a TCP connection is balanced away from a HiveServer2 or Impalad instance, the user session and its queries (running or returned) can be lost and trigger the “Results have expired” error.
To prevent sessions from being lost, configure the Hive/Impala Load Balancer with Source IP Persistence so that each Hue instance sends all traffic to a single HiveServer2/Impala instance. Of course, this is not true load balancing, but a configuration for failover High Availability.
To prevent sessions from timing out while in use, add more Hue Server instances so that each can be pinned to a different HiveServer2/Impala instance. For both HiveServer2 and Impala, set the affinity timeout (that is, the timeout to close persisted sessions) to be longer than the Impala query and session timeouts.
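The timeout rule above can be expressed as a small sanity check. This is only a sketch: `affinity_covers` is a hypothetical helper, all values are in seconds, and the numbers in the usage comment are made-up examples rather than recommended settings.

```shell
# affinity_covers AFFINITY QUERY SESSION
# Succeed only if the load balancer's affinity timeout is longer than
# both the query timeout and the session timeout (all in seconds).
affinity_covers() {
    [ "$1" -gt "$2" ] && [ "$1" -gt "$3" ]
}

# Example usage with illustrative values:
#   affinity_covers 7200 3600 3600 && echo "affinity timeout is long enough"
```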
For the best load distribution, create multiple profiles in your load balancer, per port, for both non-Hue clients and Hue clients. Have non-Hue clients distribute loads in a round robin and configure Hue clients with source IP Persistence on dedicated ports, for example, 21000 for impala-shell, 21050 for impala-jdbc, and 21051 for Hue.
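To illustrate why source IP persistence keeps a Hue instance on one backend while round robin does not, here is a rough sketch of hash-based pinning. It is an illustration of the idea only: `pick_backend` is a hypothetical helper, the IP addresses are invented, and a real load balancer uses its own hash function rather than `cksum`.

```shell
# pick_backend IP COUNT: print a backend index in the range [0, COUNT).
# The index depends only on the client IP, so a given client is always
# sent to the same backend for as long as COUNT does not change.
pick_backend() {
    local ip="$1" count="$2" crc
    crc=$(printf '%s' "$ip" | cksum | cut -d' ' -f1)
    echo $(( crc % count ))
}

# Repeated requests from the same source IP map to the same backend.
for ip in 10.0.0.11 10.0.0.12 10.0.0.11; do
    echo "$ip -> backend $(pick_backend "$ip" 2)"
done
```

Because the mapping follows the client address, two Hue servers may still hash to the same backend, which is why this setup provides failover rather than even load distribution.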
Add Hive and Impala Roles
- Configure the cluster with at least two roles for HiveServer2:
- Go to the Hive service and select .
- Click HiveServer2, assign one or more hosts, and click .
- Check each role and select Start.
- Configure the cluster with at least two roles for Impala Daemon:
- Go to the Impala service and select .
- Click Impala Daemon, assign one or more hosts, and click .
- Check each role and select Start.
Install Proxy Service
This is an example of how to add a proxy server for each HiveServer2 and Impala Daemon with multiple profiles.
- Install haproxy (for either RHEL / Ubuntu / SLES):
# RHEL
yum install haproxy
# Ubuntu
apt-get install haproxy
# SLES
zypper addrepo http://download.opensuse.org/repositories/server:http/SLE_12/server:http.repo
zypper refresh
zypper install haproxy
- Configure haproxy for each role, for example:
vi /etc/haproxy/haproxy.cfg
listen impala-shell
    bind :21001
    mode tcp
    option tcplog
    balance roundrobin
    stick-table type ip size 20k expire 5m
    server impala_0 host shortname-2.domain:21000 check
    server impala_1 host shortname-3.domain:21000 check

listen impala-jdbc
    bind :21051
    mode tcp
    option tcplog
    balance roundrobin
    stick-table type ip size 20k expire 5m
    server impala_0 host shortname-2.domain:21050 check
    server impala_1 host shortname-3.domain:21050 check

listen impala-hue
    bind :21052
    mode tcp
    option tcplog
    balance source
    server impala_0 host shortname-2.domain:21050 check
    server impala_1 host shortname-3.domain:21050 check

listen hiveserver2-jdbc
    bind :10001
    mode tcp
    option tcplog
    balance roundrobin
    stick-table type ip size 20k expire 5m
    server hiveserver2_0 host shortname-1.domain:10000 check
    server hiveserver2_1 host shortname-2.domain:10000 check

listen hiveserver2-hue
    bind :10002
    mode tcp
    option tcplog
    balance source
    server hiveserver2_0 host shortname-1.domain:10000 check
    server hiveserver2_1 host shortname-2.domain:10000 check
Replace shortname-#.domain with those in your environment:
sed -i "s/host shortname/your host shortname/g" /etc/haproxy/haproxy.cfg
sed -i "s/domain/your domain/g" /etc/haproxy/haproxy.cfg
- Restart haproxy:
service haproxy restart
- Run netstat to ensure your proxies are running:
netstat -an | grep LISTEN
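The netstat check above can be narrowed to just the ports that haproxy is configured to bind. The following is a minimal sketch under a few assumptions: it requires bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command, it probes 127.0.0.1 only, and the helper names `list_bind_ports`, `port_is_open`, and `check_proxies` are hypothetical.

```shell
# list_bind_ports CFG: print the port from every "bind" line in an
# haproxy configuration file, one per line.
list_bind_ports() {
    awk '$1 == "bind" { n = split($2, parts, ":"); print parts[n] }' "$1"
}

# port_is_open HOST PORT: succeed if a TCP connection can be opened.
# Relies on bash's /dev/tcp feature, so it is run through bash explicitly.
port_is_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_proxies [CFG]: report whether each configured proxy port accepts
# connections on the local host. CFG defaults to the path used above.
check_proxies() {
    local cfg="${1:-/etc/haproxy/haproxy.cfg}" port
    for port in $(list_bind_ports "$cfg"); do
        if port_is_open 127.0.0.1 "$port"; then
            echo "port $port: listening"
        else
            echo "port $port: NOT listening"
        fi
    done
}

# Example usage:
#   check_proxies /etc/haproxy/haproxy.cfg
```

If a port is reported as not listening, running `haproxy -c -f /etc/haproxy/haproxy.cfg` checks the configuration file for syntax errors without starting the proxy.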
For information about using Hue with the configured load balancer for either Impala or Hive, see one of the following references: