Using the Cloudera Manager API for Cluster Automation
One of the complexities of Apache Hadoop is the need to deploy clusters of servers, potentially on a regular basis. If you maintain hundreds of test and development clusters in different configurations, this process can be complex and cumbersome if not automated.
Cluster Automation Use Cases
Cluster automation is useful in various situations. For example, you might work on many versions of CDH, which works on a wide variety of OS distributions (RHEL 5 and RHEL 6, Ubuntu Precise and Lucid, Debian Wheezy, and SLES 11). You might have complex configuration combinations—highly available HDFS or simple HDFS, Kerberized or non-secure, YARN or MRv1, and so on. With these requirements, you need an easy way to create a new cluster that has the required setup. This cluster can also be used for integration, testing, customer support, demonstrations, and other purposes.
You can install and configure Hadoop according to precise specifications using the Cloudera Manager REST API. Using the API, you can add hosts, install CDH, and define the cluster and its services. You can also tune heap sizes, set up HDFS HA, turn on Kerberos security and generate keytabs, and customize service directories and ports. Every configuration available in Cloudera Manager is exposed in the API.
- Obtaining logs and monitoring the system
- Starting and stopping services
- Polling cluster events
- Creating a disaster recovery replication schedule
Use scenarios for the Cloudera Manager API for cluster automation might include:
- OEM and hardware partners that deliver Hadoop-in-a-box appliances using the API to set up CDH and Cloudera Manager on bare metal in the factory.
- Automated deployment of new clusters, using a combination of Puppet and the Cloudera Manager API. Puppet does the OS-level provisioning and installs the software. The Cloudera Manager API sets up the Hadoop services and configures the cluster.
- Integrating the API with reporting and alerting infrastructure. An external script can poll the API for health and metrics information, as well as the stream of events and alerts, to feed into a custom dashboard.
Java API Example
This example covers the Java API client.
To use the Java client, add this dependency to your project's pom.xml:
<project> <repositories> <repository> <id>cdh.repo</id> <url>https://repository.cloudera.com/artifactory/cloudera-repos</url> <name>Cloudera Repository</name> </repository> … </repositories> <dependencies> <dependency> <groupId>com.cloudera.api</groupId> <artifactId>cloudera-manager-api</artifactId> <version>4.6.2</version> <!-- Set to the version of Cloudera Manager you use --> </dependency> … </dependencies> ... </project>
The Java client works like a proxy. It hides from the caller any details about REST, HTTP, and JSON. The entry point is a handle to the root of the API:
RootResourcev17 apiRoot = new ClouderaManagerClientBuilder().withHost("cm.cloudera.com") .withUsernamePassword("admin", "admin").build().getRootv17();
- RootResourcev17
- ClustersResourcev17 - host membership, start cluster
- ServicesResourcev17 - configuration, get metrics, HA, service commands
- RolesResource - add roles, get metrics, logs
- RoleConfigGroupsResource - configuration
- ParcelsResource - parcel management
- ServicesResourcev17 - configuration, get metrics, HA, service commands
- ClustersResourcev17 - host membership, start cluster
- HostsResource - host management, get metrics
- UsersResource - user management
For more information, see the Javadoc.
The following example lists and starts a cluster:
// List of clusters ApiClusterList clusters = apiRoot.getClustersResource().readClusters(DataView.SUMMARY); for (ApiCluster cluster : clusters) { LOG.info("{}: {}", cluster.getName(), cluster.getVersion()); } // Start the first cluster ApiCommand cmd = apiRoot.getClustersResource().startCommand(clusters.get(0).getName()); while (cmd.isActive()) { Thread.sleep(100); cmd = apiRoot.getCommandsResource().readCommand(cmd.getId()); } LOG.info("Cluster start {}", cmd.getSuccess() ? "succeeded" : "failed " + cmd.getResultMessage());
Python Example
You can see an example of automation with Python at the following link: Python example. The example contains information on the requirements and steps to automate a cluster deployment.