Configuring Storage Balancing for DataNodes
You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.
By default, a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica. You can configure:
- how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and
- what percentage of new block allocations will be sent to volumes with more available disk space than others.
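The interaction between the two settings can be illustrated with a small simulation. The sketch below is a simplified model written for this page, not Hadoop's implementation: the class name `VolumeChooser`, its methods, and the rule for splitting volumes into "more space" and "less space" groups (anything more than the threshold above the emptiest volume counts as "more space") are assumptions made for illustration.

```python
import random

DEFAULT_THRESHOLD_BYTES = 10 * 1024**3   # 10737418240 bytes
DEFAULT_PREFERENCE_FRACTION = 0.75


class VolumeChooser:
    """Toy model of a space-aware volume chooser (hypothetical, not Hadoop code)."""

    def __init__(self, threshold=DEFAULT_THRESHOLD_BYTES,
                 preference=DEFAULT_PREFERENCE_FRACTION, rng=None):
        self.threshold = threshold
        self.preference = preference
        self.rng = rng or random.Random()
        self._next = 0  # round-robin cursor shared by all candidate lists

    def _round_robin(self, candidates):
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice

    def choose(self, free_bytes):
        """free_bytes maps volume name -> free bytes; returns the chosen name."""
        names = sorted(free_bytes)
        least = min(free_bytes.values())
        most = max(free_bytes.values())

        # Balanced: every volume is within the threshold of the others,
        # so fall back to plain round-robin.
        if most - least <= self.threshold:
            return self._round_robin(names)

        # Imbalanced: split volumes by how far they sit above the emptiest one.
        high = [n for n in names if free_bytes[n] > least + self.threshold]
        low = [n for n in names if free_bytes[n] <= least + self.threshold]

        # Send roughly `preference` of the allocations to the roomier group.
        group = high if self.rng.random() < self.preference else low
        return self._round_robin(group)
```

With the defaults, two volumes that differ by 5 GB of free space are treated as balanced and alternate strictly, while two volumes that differ by 90 GB send about three quarters of new replicas to the emptier one.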
Configuring Storage Balancing for DataNodes Using Cloudera Manager
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
- Go to the HDFS service.
- Click the Configuration tab.
- Select .
- Select .
- Configure the following properties (you can use the Search box to locate the properties):
Property: dfs.datanode.fsdataset.volume.choosing.policy
Value: org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy
Description: Enables storage balancing among the DataNode's volumes.
Property: dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
Value: 10737418240 (default)
Description: The amount by which volumes are allowed to differ from each other in terms of bytes of free disk space before they are considered imbalanced. The default is 10737418240 (10 GB). If the free space on each volume is within this range of the other volumes, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis.
Property: dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction
Value: 0.75 (default)
Description: What proportion of new block allocations will be sent to volumes with more available disk space than others. The allowable range is 0.0-1.0, but set it in the range 0.5-1.0 (that is, 50-100%), since there should be no reason to prefer that volumes with less available disk space receive more block allocations.
To apply this configuration property to other role groups as needed, edit the value for the appropriate role group. See Modifying Configuration Properties Using Cloudera Manager.
- Click Save Changes to commit the changes.
- Restart the role.
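Because the threshold is entered as a raw byte count, it can help to compute and sanity-check the values before typing them into the property fields. The helper below is a minimal sketch; the function names `gib_to_bytes` and `check_settings` are hypothetical, and treating a negative threshold as an error is an assumption rather than documented behavior.

```python
def gib_to_bytes(gib):
    """Convert a size in units of 1024**3 bytes to a plain byte count."""
    return int(gib * 1024**3)


def check_settings(threshold_bytes, preference_fraction):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if threshold_bytes < 0:
        # Assumption: a negative free-space difference is never meaningful.
        problems.append("threshold must not be negative")
    if not 0.0 <= preference_fraction <= 1.0:
        problems.append("preference fraction must be within 0.0-1.0")
    elif preference_fraction < 0.5:
        # Allowed, but it would favor the fuller volumes.
        problems.append("preference fraction below 0.5 favors fuller volumes")
    return problems


if __name__ == "__main__":
    print(gib_to_bytes(10))                       # the documented default
    print(check_settings(gib_to_bytes(10), 0.75))
```

A fraction below 0.5 is reported as a problem here only because the property description recommends the 0.5-1.0 range; the value itself is still within the allowable range.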
Configuring Storage Balancing for DataNodes Using the Command Line
This section applies to unmanaged deployments without Cloudera Manager. See Configuring Storage Balancing for DataNodes.
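In an unmanaged deployment, HDFS properties such as these are normally set in hdfs-site.xml on each DataNode. As a rough sketch, the script below renders the three properties described above as `<property>` elements that could be merged into that file. The function name `render_properties` and the `SETTINGS` dictionary are hypothetical, and the values shown are simply the defaults listed on this page; adjust them for your cluster.

```python
import xml.etree.ElementTree as ET

# Property names and default values as listed in the table above.
SETTINGS = {
    "dfs.datanode.fsdataset.volume.choosing.policy":
        "org.apache.hadoop.hdfs.server.datanode.fsdataset."
        "AvailableSpaceVolumeChoosingPolicy",
    "dfs.datanode.available-space-volume-choosing-policy."
    "balanced-space-threshold": 10737418240,
    "dfs.datanode.available-space-volume-choosing-policy."
    "balanced-space-preference-fraction": 0.75,
}


def render_properties(settings):
    """Build a <configuration> fragment with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(SETTINGS))
```

The output is only a fragment: copy the `<property>` elements into the existing `<configuration>` element of hdfs-site.xml rather than replacing the file.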