Tuning and Troubleshooting Host Decommissioning
Decommissioning a host decommissions and stops all roles on the host without requiring you to individually decommission the roles on each service. The decommissioning process can take a long time and uses a great deal of cluster resources, including network bandwidth. You can tune the decommissioning process to improve performance and mitigate the performance impact on the cluster.
You can use the Decommission and Recommission features to perform minor maintenance on cluster hosts using Cloudera Manager to manage the process. See Performing Maintenance on a Cluster Host.
Tuning HDFS Prior to Decommissioning DataNodes
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
When a DataNode is decommissioned, the NameNode ensures that every block from the DataNode will still be available across the cluster as dictated by the replication factor. This procedure involves copying blocks from the DataNode in small batches. If a DataNode has thousands of blocks, decommissioning can take several hours. Before decommissioning hosts with DataNodes, you should first tune HDFS:
- Run the following command to identify any problems in the HDFS file system:
hdfs fsck / -list-corruptfileblocks -openforwrite -files -blocks -locations 2>&1 > /tmp/hdfs-fsck.txt
- Fix any issues reported by the fsck command. If the command output lists corrupted files, use the fsck command to move
them to the lost+found directory or delete them:
hdfs fsck file_name -move
orhdfs fsck file_name -delete
- Raise the heap size of the DataNodes. DataNodes should be configured with at least 4 GB heap size to allow for the increase in iterations and max streams.
- Go to the HDFS service page.
- Click the Configuration tab.
- Select .
- Select .
- Set the Java Heap Size of DataNode in Bytes property as recommended.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Increase the replication work multiplier per iteration to a larger number (the default is 2, however 10 is recommended):
- Select .
- Expand the category.
- Configure the Replication Work Multiplier Per Iteration property to a value such as 10.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Increase the replication maximum threads and maximum replication thread hard limits:
- Select .
- Expand the category.
- Configure the Maximum number of replication threads on a DataNode and Hard limit on the number of replication threads on a
DataNode properties to 50 and 100 respectively. You can decrease the number of threads (or use the default values) to minimize the impact of decommissioning on the cluster, but the trade off
is that decommissioning will take longer.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart the HDFS service.
For additional tuning recommendations, see Performance Considerations.
Tuning HBase Prior to Decommissioning DataNodes
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
To increase the speed of a rolling restart of the HBase service, set the Region Mover Threads property to a higher value. This increases the number of regions that can be moved in parallel, but places additional strain on the HMaster. In most cases, Region Mover Threads should be set to 5 or lower.
Performance Considerations
Decommissioning a DataNode does not happen instantly because the process requires replication of a potentially large number of blocks. During decommissioning, the performance of your cluster may be impacted. This section describes the decommissioning process and suggests solutions for several common performance issues.
- The Commission State of the DataNode is marked as Decommissioning and the data is replicated from this node to other available nodes. Until all blocks are replicated, the node remains in a Decommissioning state. You can view this state from the NameNode Web UI. (Go to the HDFS service and select .)
- When all data blocks are replicated to other nodes, the node is marked as Decommissioned.
Decommissioning can impact performance in the following ways:
- There must be enough disk space on the other active DataNodes for the data to be replicated. After decommissioning, the remaining active DataNodes have more blocks and therefore decommissioning these DataNodes in the future may take more time.
- There will be increased network traffic and disk I/O while the data blocks are replicated.
- Data balance and data locality can be affected, which can lead to a decrease in performance of any running or submitted jobs.
- Decommissioning a large numbers of DataNodes at the same time can decrease performance.
- If you are decommissioning a minority of the DataNodes, the speed of data reads from these nodes limits the performance of decommissioning because decommissioning maxes out network bandwidth when reading data blocks from the DataNode and spreads the bandwidth used to replicate the blocks among other DataNodes in the cluster. To avoid performance impacts in the cluster, Cloudera recommends that you only decommission a minority of the DataNodes at the same time.
- You can decrease the number of replication threads to decrease the performance impact of the replications, but this will cause the decommissioning process to take longer to complete. See Tuning HDFS Prior to Decommissioning DataNodes.
Cloudera recommends that you add DataNodes and decommission DataNodes in parallel, in smaller groups. For example, if the replication factor is 3, then you should add two DataNodes and decommission two DataNodes at the same time.
Troubleshooting Performance of Decommissioning
- Open Files
- Write operations on the DataNode do not involve the NameNode. If there are blocks associated with open files located on a DataNode, they are not relocated until the file is closed.
This commonly occurs with:
- Clusters using HBase
- Open Flume files
- Long running tasks
Run the following command to list all the open files blocking decommissioning:hdfs dfsadmin -listOpenFiles -blockingDecommission
The command returns a list similar to the following example:Client Host Client Name Open File Path 172.26.12.77 DFSClient_NONMAPREDUCE_-698274460_1 /hbase/oldWALs/dn3.cloudera.com%2C22101%2C1540973344249.dn3.cloudera.com%2C22101%2C1540973344249.regiongroup-0.1540998570989
Using the list of open file(s), perform the appropriate action to restart process to close the file. For example, major compaction closes all files in a region for HBase.
- A block cannot be relocated because there are not enough DataNodes to satisfy the block placement policy.
- For example, for a 10 node cluster, if the mapred.submit.replication is set to the default of 10 while attempting to decommission one
DataNode, there will be difficulties relocating blocks that are associated with map/reduce jobs. This condition will lead to errors in the NameNode logs similar to the
following:
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault: Not able to place enough replicas, still in need of 3 to reach 3
Use the following steps to find the number of files where the block replication policy is equal to or above your current cluster size:- Provide a listing of open files, their blocks, the locations of those blocks by running the following command:
hadoop fsck / -files -blocks -locations -openforwrite 2>&1 > openfiles.out
- Run the following command to return a list of how many files have a given replication factor:
grep repl= openfiles.out | awk '{print $NF}' | sort | uniq -c
For example, when the replication factor is 10 , and decommissioning one:egrep -B4 "repl=10" openfiles.out | grep -v '<dir>' | awk '/^\//{print $1}'
- Examine the paths, and decide whether to reduce the replication factor of the files, or remove them from the cluster.
- Provide a listing of open files, their blocks, the locations of those blocks by running the following command: