Deploying HDFS on a Cluster

Proceed as follows to deploy HDFS on a cluster. Do this for all clusters, whether you are deploying MRv1 or YARN:

Copying the Hadoop Configuration and Setting Alternatives

To customize the Hadoop configuration:

  1. Copy the default configuration to your custom directory:
    $ sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
    You can call this configuration anything you like; in this example, it's called my_cluster.
  2. CDH uses the alternatives setting to determine which Hadoop configuration to use. Set alternatives to point to your custom directory, as follows.

    To manually set the configuration on RHEL-compatible systems:

    $ sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    $ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

    To manually set the configuration on Ubuntu and SLES systems:

    $ sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    $ sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
    This tells CDH to use the configuration in /etc/hadoop/conf.my_cluster.

You can display the current alternatives setting as follows.

To display the current setting on RHEL-compatible systems:
sudo alternatives --display hadoop-conf
To display the current setting on Ubuntu, Debian, and SLES systems:
sudo update-alternatives --display hadoop-conf
You should see output such as the following:
hadoop-conf - status is auto. 
link currently points to /etc/hadoop/conf.my_cluster
/etc/hadoop/conf.my_cluster - priority 50 
/etc/hadoop/conf.empty - priority 10 
Current `best' version is /etc/hadoop/conf.my_cluster. 

Because the configuration in /etc/hadoop/conf.my_cluster has the highest priority (50), that is the one CDH will use. For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page On Red Hat-compatible systems.

Customizing Configuration Files

The following tables show the most important properties that you must configure for your cluster.

Property

Configuration File

Description

fs.defaultFS

core-site.xml

Note: fs.default.name is deprecated. Specifies the NameNode and the default file system, in the form hdfs://<namenode host>:<namenode port>/. The default value is file///. The default file system is used to resolve relative paths; for example, if fs.default.name or fs.defaultFS is set to hdfs://mynamenode/, the relative URI /mydir/myfile resolves to hdfs://mynamenode/mydir/myfile. Note: for the cluster to function correctly, the <namenode> part of the string must be the hostname (for example mynamenode), or the HA-enabled logical URI, not the IP address.

dfs.permissions.superusergroup

hdfs-site.xml

Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of 'hadoop' or pick your own group depending on the security policies at your site.

Sample Configuration

core-site.xml:

<property>
 <name>fs.defaultFS</name>
 <value>hdfs://namenode-host.company.com:8020</value>
</property>

hdfs-site.xml:

<property>
 <name>dfs.permissions.superusergroup</name>
 <value>hadoop</value>
</property>

Configuring Local Storage Directories

You need to specify, create, and assign the correct permissions to the local directories where you want the HDFS daemons to store data. You specify the directories by configuring the following two properties in the hdfs-site.xml file.

Property

Configuration File Location

Description

dfs.name.dir or dfs.namenode.name.dir

hdfs-site.xml on the NameNode

This property specifies the URIs of the directories where the NameNode stores its metadata and edit logs. Cloudera recommends that you specify at least two directories. One of these should be located on an NFS mount point, unless you will be using a HDFS HA configuration.

dfs.data.dir or dfs.datanode.data.dir

hdfs-site.xml on each DataNode

This property specifies the URIs of the directories where the DataNode stores blocks. Cloudera recommends that you configure the disks on the DataNode in a JBOD configuration, mounted at /data/1/ through /data/N, and configure dfs.data.dir or dfs.datanode.data.dir to specify file:///data/1/dfs/dn through file:///data/N/dfs/dn/.

Sample configuration:

hdfs-site.xml on the NameNode:

<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value>
</property>

hdfs-site.xml on each DataNode:

<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
</property>

After specifying these directories as shown above, you must create the directories and assign the correct file permissions to them on each node in your cluster.

In the following instructions, local path examples are used to represent Hadoop parameters. Change the path examples to match your configuration.

Local directories:

  • The dfs.name.dir or dfs.namenode.name.dir parameter is represented by the /data/1/dfs/nn and /nfsmount/dfs/nn path examples.
  • The dfs.data.dir or dfs.datanode.data.dir parameter is represented by the /data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn, and /data/4/dfs/dn examples.

To configure local storage directories for use by HDFS:

  1. On a NameNode host: create the dfs.name.dir or dfs.namenode.name.dir local directories:
    $ sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
  2. On all DataNode hosts: create the dfs.data.dir or dfs.datanode.data.dir local directories:
    $ sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
  3. Configure the owner of the dfs.name.dir or dfs.namenode.name.dir directory, and of the dfs.data.dir or dfs.datanode.data.dir directory, to be the hdfs user:
    $ sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
    Here is a summary of the correct owner and permissions of the local directories:

    Directory

    Owner

    Permissions (see Footnote 1)

    dfs.name.dir or dfs.namenode.name.dir

    hdfs:hdfs

    drwx------

    dfs.data.dir or dfs.datanode.data.dir

    hdfs:hdfs

    drwx------

    Footnote: 1 The Hadoop daemons automatically set the correct permissions for you on dfs.data.dir or dfs.datanode.data.dir. But in the case of dfs.name.dir or dfs.namenode.name.dir, permissions are currently incorrectly set to the file-system default, usually drwxr-xr-x (755). Use the chmod command to reset permissions for these dfs.name.dir or dfs.namenode.name.dir directories to drwx------ (700); for example:
    $ sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
    or
    $ sudo chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn

Configuring DataNodes to Tolerate Local Storage Directory Failure

By default, the failure of a single dfs.data.dir or dfs.datanode.data.dir will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block that is present on the DataNode. This causes needless replications of blocks that reside on disks that have not failed.

To prevent this, you can configure DataNodes to tolerate the failure of dfs.data.dir or dfs.datanode.data.dir directories; use the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml. For example, if the value for this parameter is 3, the DataNode will only shut down after four or more data directories have failed. This value is respected on DataNode startup; in this example the DataNode will start up as long as no more than three directories have failed.

Formatting the NameNode

Before starting the NameNode for the first time you need to format the file system.

$ sudo -u hdfs hdfs namenode -format

You'll get a confirmation prompt; for example:

Re-format filesystem in /data/namedir ? (Y or N)

Configuring a Remote NameNode Storage Directory

You should configure the NameNode to write to multiple storage directories, including one remote NFS mount. To keep NameNode processes from hanging when the NFS server is unavailable, configure the NFS mount as a soft mount (so that I/O requests that time out fail rather than hang), and set other options as follows:

tcp,soft,intr,timeo=10,retrans=10

These options configure a soft mount over TCP; transactions will be retried ten times (retrans=10) at 1-second intervals (timeo=10) before being deemed to have failed.

Example:

mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10, <server>:<export> <mount_point>

where <server> is the remote host, <export> is the exported file system, and <mount_point> is the local mount point.

Example for HA:

mount -t nfs -o tcp,soft,intr,timeo=50,retrans=12, <server>:<export> <mount_point>

Note that in the HA case timeo should be set to 50 (five seconds), rather than 10 (1 second), and retrans should be set to 12, giving an overall timeout of 60 seconds.

For more information, see the man pages for mount and nfs.

Configuring Remote Directory Recovery

You can enable the dfs.namenode.name.dir.restore option so that the NameNode will attempt to recover a previously failed NameNode storage directory on the next checkpoint. This is useful for restoring a remote storage directory mount that has failed because of a network outage or intermittent NFS failure.

Configuring the Secondary NameNode

In non-HA deployments, configure a Secondary NameNode that will periodically merge the EditLog with the FSImage, creating a new FSImage which incorporates the changes which were in the EditLog. This reduces the amount of disk space consumed by the EditLog on the NameNode, and also reduces the restart time for the Primary NameNode.

A standard Hadoop cluster (not a Hadoop Federation or HA configuration), can have only one Primary NameNode plus one Secondary NameNode. On production systems, the Secondary NameNode should run on a different machine from the Primary NameNode to improve scalability (because the Secondary NameNode does not compete with the NameNode for memory and other resources to create the system snapshot) and durability (because the copy of the metadata is on a separate machine that is available if the NameNode hardware fails).

Configuring the Secondary NameNode on a Separate Machine

To configure the Secondary NameNode on a separate machine from the NameNode, proceed as follows.

  1. Add the name of the machine that will run the Secondary NameNode to the masters file.
  2. Add the following property to the hdfs-site.xml file:
    <property>
      <name>dfs.namenode.http-address</name>
      <value><namenode.host.address>:50070</value>
      <description>
        The address and the base port on which the dfs NameNode Web UI will listen.
      </description>
    </property>

For more information, see Multi-host SecondaryNameNode Configuration.

More about the Secondary NameNode

  • The NameNode stores the HDFS metadata information in RAM to speed up interactive lookups and modifications of the metadata.
  • For reliability, this information is flushed to disk periodically. To ensure that these writes are not a speed bottleneck, only the list of modifications is written to disk, not a full snapshot of the current filesystem. The list of modifications is appended to a file called edits.
  • Over time, the edits log file can grow quite large and consume large amounts of disk space.
  • When the NameNode is restarted, it takes the HDFS system state from the fsimage file, then applies the contents of the edits log to construct an accurate system state that can be loaded into the NameNode's RAM. If you restart a large cluster that has run for a long period with no Secondary NameNode, the edits log may be quite large, and so it can take some time to reconstruct the system state to be loaded into RAM.

When the Secondary NameNode is configured, it periodically constructs a checkpoint by compacting the information in the edits log and merging it with the most recent fsimage file; it then clears the edits log. So, when the NameNode restarts, it can use the latest checkpoint and apply the contents of the smaller edits log. The interval between checkpoints is determined by the checkpoint period (dfs.namenode.checkpoint.period) or the number of edit transactions (dfs.namenode.checkpoint.txns). The default checkpoint period is one hour, and the default number of edit transactions before a checkpoint is 1,000,000. The SecondaryNameNode will checkpoint in an hour if there have not been 1,000,000 edit transactions within the hour; it will checkpoint after 1,000,000 transactions have been committed if they were committed in under one hour.

Secondary NameNode Parameters

The behavior of the Secondary NameNode is controlled by the following parameters in hdfs-site.xml.

  • dfs.namenode.checkpoint.check.period
  • dfs.namenode.checkpoint.txns
  • dfs.namenode.checkpoint.dir
  • dfs.namenode.checkpoint.edits.dir
  • dfs.namenode.num.checkpoints.retained

See https://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml for details.

Enabling Trash

The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. If trash checkpointing is enabled, the Current directory is periodically renamed using a timestamp. Files in .Trash are permanently removed after a user-configurable time delay. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.

Trash is configured with the following properties in the core-site.xml file:

CDH Parameter

Value

Description

fs.trash.interval

minutes or 0

The number of minutes after which a trash checkpoint directory is deleted. This option can be configured both on the server and the client.

  • If trash is enabled on the server configuration, then the value configured on the server is used and the client configuration is ignored.
  • If trash is disabled in the server configuration, then the client side configuration is checked.
  • If the value of this property is zero (the default), then the trash feature is disabled.

fs.trash.checkpoint.interval

minutes or 0

The number of minutes between trash checkpoints. Every time the checkpointer runs on the NameNode, it creates a new checkpoint of the "Current" directory and removes checkpoints older than fs.trash.interval minutes. This value should be smaller than or equal to fs.trash.interval. This option is configured on the server. If configured to zero (the default), then the value is set to the value of fs.trash.interval.

For example, to enable trash so that files deleted using the Hadoop shell are not deleted for 24 hours, set the value of the fs.trash.interval property in the server's core-site.xml file to a value of 1440.

Configuring Storage Balancing for DataNodes

You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.

By default a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.

You can configure
  • how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and
  • what percentage of new block allocations will be sent to volumes with more available disk space than others.
To configure storage balancing, set the following properties in hdfs-site.xml.

Property

Value

Description

dfs.datanode.
fsdataset.
volume.choosing.
policy
org.apache.hadoop.
hdfs.server.datanode.
fsdataset.
AvailableSpaceVolumeChoosingPolicy

Enables storage balancing among the DataNode's volumes.

dfs.datanode.
available-space-
volume-choosing-
policy.balanced-
space-threshold 
10737418240 (default)

The amount by which volumes are allowed to differ from each other in terms of bytes of free disk space before they are considered imbalanced. The default is 10737418240 (10 GB).

If the free space on each volume is within this range of the other volumes, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis.

dfs.datanode.
available-space-
volume-choosing-
policy.balanced-
space-preference-
fraction
0.75 (default) What proportion of new block allocations will be sent to volumes with more available disk space than others. The allowable range is 0.0-1.0, but set it in the range 0.5 - 1.0 (that is, 50-100%), since there should be no reason to prefer that volumes with less available disk space receive more block allocations.

Enabling WebHDFS

If you want to use WebHDFS, you must first enable it.

To enable WebHDFS:

Set the following property in hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

To enable numeric usernames in WebHDFS:

By default, WebHDFS supports the following username pattern:
^[A-Za-z_][A-Za-z0-9._-]*[$]?$
You can override the default username pattern by setting the dfs.webhdfs.user.provider.user.pattern property in hdfs-site.xml. For example, to allow numerical usernames, the property can be set as follows:
<property>
  <name>dfs.webhdfs.user.provider.user.pattern</name>
  <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
</property>

Configuring LZO

If you have installed LZO, configure it as follows.

To configure LZO:

Set the following property in core-site.xml.
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

For more information about LZO, see Using LZO Compression.

Start HDFS

To deploy HDFS now, proceed as follows.
  1. Deploy the configuration.
  2. Start HDFS.
  3. Create the /tmp directory.

Deploy the configuration

To deploy your configuration to your entire cluster:

  1. Push your custom directory (for example /etc/hadoop/conf.my_cluster) to each node in your cluster; for example:
    $ scp -r /etc/hadoop/conf.my_cluster myuser@myCDHnode-<n>.mycompany.com:/etc/hadoop/conf.my_cluster
  2. Manually set alternatives on each node to point to that directory, as follows.

    To manually set the configuration on RHEL-compatible systems:

    $ sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    $ sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

    To manually set the configuration on Ubuntu and SLES systems:

    $ sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    $ sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster

    For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page On RHEL-compatible systems.

Start HDFS

Start HDFS on each node in the cluster, as follows:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done

Create the /tmp Directory

Create the /tmp directory after HDFS is up and running, and set its permissions to 1777 (drwxrwxrwt), as follows:

$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Deploy YARN or MRv1

To to deploy MRv1 or YARN, and start HDFS services if you have not already done so, see