Recommended Cluster Hosts and Role Distribution
When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread the roles among cluster hosts (except for roles assigned to gateway hosts) based on the resources available on the hosts. You can change these assignments on the Customize Role Assignments page of the wizard. You can also change and add roles later using Cloudera Manager; see Role Instances.
If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS.
For information about where to locate various databases that are required for Cloudera Manager and other services, see Step 4: Install and Configure Databases.
CDH Cluster Hosts and Role Assignments
- Master hosts run Hadoop master processes, such as the HDFS NameNode and YARN ResourceManager.
- Utility hosts run other cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.
- Gateway hosts are client access points for launching jobs in the cluster. The number of gateway hosts required varies depending on the type and size of the workloads.
- Worker hosts primarily run DataNodes and other distributed processes, such as the Impala daemon (impalad).
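As a rough illustration of the four host types above, the following Python sketch records the example roles named in the list and looks up the host type for a given role. The names `EXAMPLE_ROLES` and `host_type_for` are hypothetical and are not part of any Cloudera Manager API; the role lists contain only the examples mentioned above and are not complete.

```python
from typing import Optional

# Example roles per host type, taken only from the list above.
# Hypothetical names for illustration; not a Cloudera Manager API.
EXAMPLE_ROLES = {
    "master": ["HDFS NameNode", "YARN ResourceManager"],
    "utility": ["Cloudera Manager", "Hive Metastore"],
    "gateway": [],  # client access points; no specific roles are named above
    "worker": ["DataNode", "Impalad"],
}


def host_type_for(role: str) -> Optional[str]:
    """Return the host type whose example roles include `role`, or None."""
    for host_type, roles in EXAMPLE_ROLES.items():
        if role in roles:
            return host_type
    return None
```

For example, `host_type_for("Hive Metastore")` returns `"utility"`, and a role that is not listed returns `None`.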
The following tables describe the recommended role allocations for different cluster sizes:
3 - 10 Worker Hosts without High Availability
Master Hosts | Utility Hosts | Gateway Hosts | Worker Hosts |
---|---|---|---|
Master Host 1: | One host for all Utility and Gateway roles: | (shared with the Utility host) | 3 - 10 Worker Hosts: |
3 - 20 Worker Hosts with High Availability
Master Hosts | Utility Hosts | Gateway Hosts | Worker Hosts |
---|---|---|---|
Master Host 1:<br>Master Host 2:<br>Master Host 3: | Utility Host 1: | One or more Gateway Hosts: | 3 - 20 Worker Hosts: |
20 - 80 Worker Hosts with High Availability
Master Hosts | Utility Hosts | Gateway Hosts | Worker Hosts |
---|---|---|---|
Master Host 1:<br>Master Host 2:<br>Master Host 3: | Utility Host 1:<br>Utility Host 2: | One or more Gateway Hosts: | 20 - 80 Worker Hosts: |
80 - 200 Worker Hosts with High Availability
Master Hosts | Utility Hosts | Gateway Hosts | Worker Hosts |
---|---|---|---|
Master Host 1:<br>Master Host 2:<br>Master Host 3: | Utility Host 1:<br>Utility Host 2:<br>Utility Host 3:<br>Utility Host 4:<br>Utility Host 5:<br>Utility Host 6:<br>Utility Host 7:<br>Utility Host 8: | One or more Gateway Hosts: | 80 - 200 Worker Hosts: |
200 - 500 Worker Hosts with High Availability
Master Hosts | Utility Hosts | Gateway Hosts | Worker Hosts |
---|---|---|---|
Master Host 1:<br>Master Host 2:<br>Master Host 3:<br>Master Host 4:<br>Master Host 5:<br>We recommend no more than three Kudu masters. | Utility Host 1:<br>Utility Host 2:<br>Utility Host 3:<br>Utility Host 4:<br>Utility Host 5:<br>Utility Host 6:<br>Utility Host 7:<br>Utility Host 8: | One or more Gateway Hosts: | 200 - 500 Worker Hosts: |
500 - 1000 Worker Hosts with High Availability
Master Hosts | Utility Hosts | Gateway Hosts | Worker Hosts |
---|---|---|---|
Master Host 1:<br>Master Host 2:<br>Master Host 3:<br>Master Host 4:<br>Master Host 5:<br>We recommend no more than three Kudu masters. | Utility Host 1:<br>Utility Host 2:<br>Utility Host 3:<br>Utility Host 4:<br>Utility Host 5:<br>Utility Host 6:<br>Utility Host 7:<br>Utility Host 8: | One or more Gateway Hosts: | 500 - 1000 Worker Hosts: |
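The host counts in the tables above can be summarized in a small lookup. The following Python sketch is illustrative only: `HostPlan` and `recommend_hosts` are hypothetical names, not part of Cloudera Manager. The tables share boundary values (for example, 20 workers appears in both the 3 - 20 and 20 - 80 tiers); this sketch assumes a boundary value belongs to the smaller tier, which the tables do not specify. Gateway hosts are not counted, because the tables only say "one or more".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostPlan:
    tier: str
    master_hosts: int
    utility_hosts: int
    # True when a single host carries all utility and gateway roles.
    shared_utility_gateway: bool


# (min_workers, max_workers, plan) for the high-availability tables.
# Order matters: a boundary value matches the smaller tier first (an assumption).
_HA_TIERS = [
    (3, 20, HostPlan("3 - 20 with HA", 3, 1, False)),
    (20, 80, HostPlan("20 - 80 with HA", 3, 2, False)),
    (80, 200, HostPlan("80 - 200 with HA", 3, 8, False)),
    (200, 500, HostPlan("200 - 500 with HA", 5, 8, False)),
    (500, 1000, HostPlan("500 - 1000 with HA", 5, 8, False)),
]


def recommend_hosts(workers: int, high_availability: bool = True) -> HostPlan:
    """Return the master and utility host counts listed for a cluster size."""
    if not high_availability:
        if 3 <= workers <= 10:
            return HostPlan("3 - 10 without HA", 1, 1, True)
        raise ValueError("the tables cover 3 - 10 workers without HA")
    for low, high, plan in _HA_TIERS:
        if low <= workers <= high:
            return plan
    raise ValueError("the tables cover 3 - 1000 workers with HA")
```

For example, `recommend_hosts(50)` returns the plan for the 20 - 80 tier, with three master hosts and two utility hosts. Sizes outside the ranges in the tables raise `ValueError` rather than extrapolating.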
Allocating Hosts for Key Trustee Server and Key Trustee KMS
If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate the Key Trustee Server from other enterprise data hub (EDH) services by deploying the Key Trustee Server on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture helps users avoid having to restart the Key Trustee Server when restarting a cluster.
For more information about encrypting data at rest in an EDH, see Encrypting Data at Rest.
For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.
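The placement recommendations in this section can be expressed as a simple check over a planned layout. The sketch below is a minimal illustration, assuming a layout described as a plain dictionary of cluster name to host name to role names; the function name, the dictionary shape, and the message wording are all hypothetical, and nothing here calls Cloudera Manager. It checks only two of the recommendations: that Key Trustee Server does not share a cluster with other roles, and that each Key Trustee KMS host is dedicated.

```python
from typing import Dict, List, Set

KTS = "Key Trustee Server"
KMS = "Key Trustee KMS"


def check_key_trustee_layout(clusters: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """Return a list of problems found in a planned layout.

    `clusters` maps cluster name -> host name -> set of role names.
    """
    problems = []
    for cluster, hosts in clusters.items():
        # All roles present anywhere in this cluster.
        cluster_roles = set().union(*hosts.values())
        # Key Trustee Server belongs in a separate cluster of dedicated hosts.
        if KTS in cluster_roles and cluster_roles != {KTS}:
            problems.append(f"{cluster}: Key Trustee Server shares a cluster with other roles")
        # Key Trustee KMS belongs on dedicated hosts.
        for host, roles in hosts.items():
            if KMS in roles and roles != {KMS}:
                problems.append(f"{cluster}/{host}: Key Trustee KMS host is not dedicated")
    return problems
```

A layout with Key Trustee Server in its own cluster and Key Trustee KMS on its own hosts inside the EDH cluster yields an empty list. The check does not verify high availability or that Key Trustee KMS sits in the same cluster as the services that need it.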