Decommissioning DataNodes Using the Command Line
Decommissioning a DataNode excludes it from the cluster after its data has been replicated to active nodes. To decommission a DataNode:
- Create a file named dfs.exclude in the HADOOP_CONF_DIR (default is /etc/hadoop/conf).
- Add the name of each DataNode host to be decommissioned, one hostname per line.
- Stop the TaskTracker on the DataNode to be decommissioned.
- Add the following property to hdfs-site.xml on the NameNode host (a combined command-line sketch of these steps follows this list):
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
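A combined sketch of these steps, assuming the default HADOOP_CONF_DIR and placeholder hostnames, might look like the following. The TaskTracker service name and the exact refresh command vary by release, so treat them as illustrative rather than definitive.
# On the NameNode host: list each DataNode to decommission, one hostname per line.
echo "datanode1.example.com" >> /etc/hadoop/conf/dfs.exclude
echo "datanode2.example.com" >> /etc/hadoop/conf/dfs.exclude
# On each DataNode being decommissioned: stop the TaskTracker
# (service name is release-dependent; shown here as it appears in CDH MRv1 packages).
sudo service hadoop-0.20-mapreduce-tasktracker stop
# Back on the NameNode host, after adding dfs.hosts.exclude to hdfs-site.xml:
# make the NameNode re-read its include/exclude lists so decommissioning can begin.
sudo -u hdfs hdfs dfsadmin -refreshNodes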
When a DataNode is marked for decommissioning, all of the blocks on that DataNode are marked as under-replicated. In the NameNode UI, under Decommissioning DataNodes, you can see the total number of under-replicated blocks; this number decreases over time, indicating decommissioning progress.
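Progress can also be checked from the command line: hdfs dfsadmin -report prints a per-DataNode Decommission Status field. The grep filter below is just one way to narrow the output to the relevant lines.
# Show the decommission status of each DataNode (run as the HDFS superuser).
sudo -u hdfs hdfs dfsadmin -report | grep -B 2 "Decommission Status"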
Cloudera recommends that you decommission no more than two DataNodes at one time.