High Availability
This guide is for Apache Hadoop system administrators who want to enable continuous availability by configuring clusters without single points of failure.
Not all Hadoop components currently support high availability configurations. However, some components that are currently single points of failure (SPOF) can be configured to restart automatically after a failure (listed under Auto-Restart Configurable in the table below). Some components support high availability implicitly because they comprise distributed processes (marked with an asterisk (*) in the table). In addition, some components depend on external databases, which must also be configured for high availability.
High Availability | Auto-Restart Configurable | Components with External Databases |
---|---|---|
Alert Publisher | Hive Metastore (not possible with Sentry enabled) | Activity Monitor |
Cloudera Manager Agent* | Impala catalog service | Cloudera Navigator Audit Server |
Cloudera Manager Server | Impala statestore | Cloudera Navigator Metadata Server |
DataNode* | Sentry Service | Hive Metastore Server |
Event Server | Spark Job History Server | Oozie Server |
Flume* | YARN Job History Server | Reports Manager |
HBase Master | Sentry Server | |
Host Monitor | Sqoop Server | |
Hue (add multiple services, use load balancer) | | |
Impalad* (add multiple services, use load balancer) | | |
NameNode | | |
Navigator Key Trustee | | |
NodeManager* | | |
Oozie Server | | |
RegionServer* | | |
Reports Manager | | |
ResourceManager | | |
Service Monitor | | |
Solr Server* | | |
ZooKeeper Server* | | |
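
To make the auto-restart idea concrete, the following Python sketch shows the general pattern of re-running a failed process a limited number of times. It is only an illustration of the concept, not how Cloudera Manager implements or configures auto-restart. The names `run_command` and `supervise`, the restart limit, and the always-failing child process are all assumptions made for this example.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def supervise(run_once: Callable[[], int], max_restarts: int = 3,
              backoff_seconds: float = 0.0) -> int:
    """Re-run a role until it exits cleanly or the restart budget is spent.

    Returns the number of restarts that were performed.
    (Hypothetical helper; not part of any Cloudera or Hadoop API.)
    """
    restarts = 0
    while True:
        exit_code = run_once()
        if exit_code == 0:
            return restarts  # clean exit: nothing to recover from
        if restarts >= max_restarts:
            raise RuntimeError(f"role still failing after {restarts} restarts")
        restarts += 1
        time.sleep(backoff_seconds)  # pause before the next attempt


if __name__ == "__main__":
    # Hypothetical role: a child Python process that always exits with status 1.
    failing_role = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        supervise(lambda: run_command(failing_role), max_restarts=2)
    except RuntimeError as err:
        print(err)
```

A restart limit matters because automatic restart only shortens an outage. A component that keeps failing remains a single point of failure, which is why the components in the first column rely on redundancy instead.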
Continue reading:
- HDFS High Availability
- MapReduce (MRv1) and YARN (MRv2) High Availability
- Cloudera Navigator Key Trustee Server High Availability
- Enabling Key Trustee KMS High Availability
- Enabling Navigator HSM KMS High Availability
- High Availability for Other CDH Components
- Configuring Cloudera Manager for High Availability With a Load Balancer