Deploying HBase on a Cluster
After you have HBase running in pseudo-distributed mode, you can extend the same configuration to run on a distributed cluster.
Choosing Where to Deploy the Processes
For small clusters, Cloudera recommends designating one node in your cluster as the HBase Master node. On this node, you will typically run the HBase Master and a ZooKeeper quorum peer. These master processes may be collocated with the Hadoop NameNode and JobTracker for small clusters.
Designate the remaining nodes as RegionServer nodes. On each node, Cloudera recommends running a RegionServer, which may be collocated with a Hadoop TaskTracker (MRv1) and a DataNode. When collocating with TaskTrackers, make sure that the resources of the machine are not oversubscribed; it is safest to start with a small number of MapReduce slots and increase the number slowly.
The HBase Thrift service is lightweight and can be run on any node in the cluster.
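The layout described above can be sketched as a simple role plan. This is only an illustration for a small cluster: the function name `plan_roles` and the host names are hypothetical, and the role lists simply restate the recommendations in the text.

```python
from typing import Dict, List

# Role names restate the text above; they are labels, not service names.
MASTER_ROLES = ["HBase Master", "ZooKeeper quorum peer", "NameNode", "JobTracker"]
WORKER_ROLES = ["RegionServer", "DataNode", "TaskTracker"]


def plan_roles(hosts: List[str]) -> Dict[str, List[str]]:
    """Give the first host the master roles and every other host the worker roles."""
    if len(hosts) < 2:
        raise ValueError("need one master node and at least one RegionServer node")
    plan = {hosts[0]: list(MASTER_ROLES)}
    for host in hosts[1:]:
        plan[host] = list(WORKER_ROLES)
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for host, roles in plan_roles(["mymasternode", "worker1", "worker2"]).items():
        print(f"{host}: {', '.join(roles)}")
```

The Thrift service is left out of the plan because, as noted, it can run on any node.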
Configuring for Distributed Operation
After you have decided which machines will run each process, you can edit the configuration so that the nodes can locate each other. In order to do so, you should make sure that the configuration files are synchronized across the cluster. Cloudera strongly recommends the use of a configuration management system to synchronize the configuration files, though you can use a simpler solution such as rsync to get started quickly.
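As a quick-start alternative to a configuration management system, the rsync approach might look like the sketch below. It assumes the configuration lives in `/etc/hbase/conf/` on every node and that passwordless SSH is set up; the helper names and host names are hypothetical. By default it only prints the commands it would run.

```python
import subprocess
from typing import List

# Assumed location of hbase-site.xml; adjust for your installation.
CONF_DIR = "/etc/hbase/conf/"


def rsync_commands(hosts: List[str], conf_dir: str = CONF_DIR) -> List[List[str]]:
    """Build one rsync command per host, copying conf_dir to the same path remotely."""
    # A trailing slash tells rsync to copy the directory's contents.
    src = conf_dir.rstrip("/") + "/"
    return [["rsync", "-az", "--delete", src, f"{host}:{src}"] for host in hosts]


def push_config(hosts: List[str], dry_run: bool = True) -> None:
    """Print the rsync commands, or run them when dry_run is False."""
    for cmd in rsync_commands(hosts):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if rsync exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    push_config(["worker1", "worker2"])  # hypothetical hosts; prints commands only
```

Note that `--delete` removes files on the target that are absent from the source, so review the dry-run output before running it for real.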
The only configuration change necessary to move from pseudo-distributed operation to fully distributed operation is the addition of the ZooKeeper quorum address in hbase-site.xml. Insert the following XML property to configure the nodes with the address of the node where the ZooKeeper quorum peer is running:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>mymasternode</value>
</property>
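If the quorum has more than one peer, list the hosts separated by commas. Each host can carry its own port, as in the following example:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:20000,zk3.example.com:31111</value>
</property>
For more information, see this chapter of the Apache HBase Reference Guide.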
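To make the format of the quorum value concrete, the following sketch renders the property and splits a value back into host and port pairs. The helper names are hypothetical, and falling back to port 2181 (ZooKeeper's default client port) when an entry has no port is an assumption of this sketch, not a statement of how HBase resolves the port.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed fallback: ZooKeeper's default client port.
DEFAULT_CLIENT_PORT = 2181


def quorum_property(peers: List[str]) -> str:
    """Render an hbase.zookeeper.quorum <property> element for the given peers."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "hbase.zookeeper.quorum"
    ET.SubElement(prop, "value").text = ",".join(peers)
    return ET.tostring(prop, encoding="unicode")


def parse_quorum(value: str) -> List[Tuple[str, int]]:
    """Split a quorum value into (host, port) pairs, defaulting a missing port."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else DEFAULT_CLIENT_PORT))
    return pairs
```

For example, `parse_quorum("mymasternode")` yields a single pair with the default port, while the three-host value above yields one pair per host with the ports as written.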
To start the cluster, start the services in the following order:
- The ZooKeeper Quorum Peer
- The HBase Master
- Each of the HBase RegionServers
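The start order above can be sketched as an ordered plan of host and service pairs. The service names `zookeeper-server`, `hbase-master`, and `hbase-regionserver` are assumed to match the init scripts installed by your packages, and the host names are hypothetical; the sketch only prints the commands and does not run them.

```python
from typing import List, Tuple

# Assumed init-script names; verify them against your installation.
ZK_SERVICE = "zookeeper-server"
MASTER_SERVICE = "hbase-master"
REGIONSERVER_SERVICE = "hbase-regionserver"


def startup_plan(
    zk_hosts: List[str], master_host: str, regionserver_hosts: List[str]
) -> List[Tuple[str, str]]:
    """Return (host, service) pairs in the order they should be started."""
    plan = [(host, ZK_SERVICE) for host in zk_hosts]          # 1. quorum peers
    plan.append((master_host, MASTER_SERVICE))                # 2. HBase Master
    plan += [(host, REGIONSERVER_SERVICE) for host in regionserver_hosts]  # 3. RegionServers
    return plan


if __name__ == "__main__":
    for host, service in startup_plan(["mymasternode"], "mymasternode", ["worker1", "worker2"]):
        print(f"{host}: sudo service {service} start")
```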
After the cluster is fully started, you can view the HBase Master web interface on port 60010 and verify that each of the RegionServer nodes has registered properly with the master.
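Before opening a browser, you could check that the Master web interface is reachable with a sketch like the one below. The function names are hypothetical, and the check only confirms that something accepts TCP connections on the port; it does not confirm that the RegionServers have registered, which you still verify on the page itself.

```python
import socket

# Port named in the text above; pass a different value if your setup differs.
MASTER_UI_PORT = 60010


def master_ui_url(host: str, port: int = MASTER_UI_PORT) -> str:
    """Build the URL of the HBase Master web interface."""
    return f"http://{host}:{port}/"


def is_port_open(host: str, port: int = MASTER_UI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A typical use would be to call `is_port_open("mymasternode")` and, if it returns `True`, open the address returned by `master_ui_url("mymasternode")`.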
See also Accessing HBase by using the HBase Shell, Using MapReduce with HBase, and Troubleshooting HBase. For instructions on improving the performance of local reads, see Optimizing Performance in CDH.