Set Up HDFS Using the Command Line
Proceed as follows to deploy HDFS on a cluster:
- Copying the Hadoop Configuration and Setting Alternatives
- Customizing Configuration Files
- Configuring Local Storage Directories
- Configuring DataNodes to Tolerate Local Storage Directory Failure
- Formatting the NameNode
- Configuring a Remote NameNode Storage Directory
- Configuring the Secondary NameNode
- Enabling Trash
- Configuring Storage Balancing for DataNodes
- Enabling WebHDFS
- Configuring LZO
- Start HDFS
- Deploy YARN
Copying the Hadoop Configuration and Setting Alternatives
To customize the Hadoop configuration:
- Copy the default configuration to your custom directory:
sudo cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
You can call this configuration anything you like; in this example, it's called my_cluster. - CDH uses the alternatives setting to determine which Hadoop configuration to use. Set alternatives to point to your custom
directory, as follows.
To manually set the configuration on RHEL-compatible systems:
sudo alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50 sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
To manually set the configuration on Ubuntu and SLES systems:
sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50 sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
This tells CDH to use the configuration in /etc/hadoop/conf.my_cluster.
You can display the current alternatives setting as follows.
sudo alternatives --display hadoop-conf
sudo update-alternatives --display hadoop-confYou should see output such as the following:
hadoop-conf - status is auto. link currently points to /etc/hadoop/conf.my_cluster /etc/hadoop/conf.my_cluster - priority 50 /etc/hadoop/conf.empty - priority 10 Current `best' version is /etc/hadoop/conf.my_cluster.
Because the configuration in /etc/hadoop/conf.my_cluster has the highest priority (50), that is the one CDH will use. For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page On Red Hat-compatible systems.
Customizing Configuration Files
The following tables show the most important properties that you must configure for your cluster.
Property |
Configuration File |
Description |
---|---|---|
fs.defaultFS |
core-site.xml |
Note: fs.default.name is deprecated. Specifies the NameNode and the default file system, in the form hdfs://<namenode host>:<namenode port>/. The default value is file///. The default file system is used to resolve relative paths; for example, if fs.default.name or fs.defaultFS is set to hdfs://mynamenode/, the relative URI /mydir/myfile resolves to hdfs://mynamenode/mydir/myfile. Note: for the cluster to function correctly, the <namenode> part of the string must be the hostname (for example mynamenode), or the HA-enabled logical URI, not the IP address. |
dfs.permissions.superusergroup |
hdfs-site.xml |
Specifies the UNIX group containing users that will be treated as superusers by HDFS. You can stick with the value of 'hadoop' or pick your own group depending on the security policies at your site. |
Configuring Local Storage Directories
You need to specify, create, and assign the correct permissions to the local directories where you want the HDFS daemons to store data. You specify the directories by configuring the following two properties in the hdfs-site.xml file.
Property |
Configuration File Location |
Description |
---|---|---|
dfs.name.dir or dfs.namenode.name.dir |
hdfs-site.xml on the NameNode |
This property specifies the URIs of the directories where the NameNode stores its metadata and edit logs. Cloudera recommends that you specify at least two directories. One of these should be located on an NFS mount point, unless you will be using a HDFS HA configuration. |
dfs.data.dir or dfs.datanode.data.dir |
hdfs-site.xml on each DataNode |
This property specifies the URIs of the directories where the DataNode stores blocks. Cloudera recommends that you configure the disks on the DataNode in a JBOD configuration, mounted at /data/1/ through /data/N, and configure dfs.data.dir or dfs.datanode.data.dir to specify file:///data/1/dfs/dn through file:///data/N/dfs/dn/. |
Sample configuration:
hdfs-site.xml on the NameNode:
<property> <name>dfs.namenode.name.dir</name> <value>file:///data/1/dfs/nn,file:///nfsmount/dfs/nn</value> </property>
hdfs-site.xml on each DataNode:
<property> <name>dfs.datanode.data.dir</name> <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value> </property>
After specifying these directories as shown above, you must create the directories and assign the correct file permissions to them on each node in your cluster.
In the following instructions, local path examples are used to represent Hadoop parameters. Change the path examples to match your configuration.
Local directories:
- The dfs.name.dir or dfs.namenode.name.dir parameter is represented by the /data/1/dfs/nn and /nfsmount/dfs/nn path examples.
- The dfs.data.dir or dfs.datanode.data.dir parameter is represented by the /data/1/dfs/dn, /data/2/dfs/dn, /data/3/dfs/dn, and /data/4/dfs/dn examples.
To configure local storage directories for use by HDFS:
- On a NameNode host: create the dfs.name.dir or dfs.namenode.name.dir local directories:
sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn
- On all DataNode hosts: create the dfs.data.dir or dfs.datanode.data.dir local directories:
sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
- Configure the owner of the dfs.name.dir or dfs.namenode.name.dir directory, and of the dfs.data.dir or dfs.datanode.data.dir directory, to be the hdfs user:
sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
Here is a summary of the correct owner and permissions of the local directories:Directory
Owner
Permissions (see Footnote 1)
dfs.name.dir or dfs.namenode.name.dir
hdfs:hdfs
drwx------
dfs.data.dir or dfs.datanode.data.dir
hdfs:hdfs
drwx------
sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
orsudo chmod go-rx /data/1/dfs/nn /nfsmount/dfs/nn
Configuring DataNodes to Tolerate Local Storage Directory Failure
By default, the failure of a single dfs.data.dir or dfs.datanode.data.dir will cause the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block that is present on the DataNode. This causes needless replications of blocks that reside on disks that have not failed.
To prevent this, you can configure DataNodes to tolerate the failure of dfs.data.dir or dfs.datanode.data.dir directories; use the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml. For example, if the value for this parameter is 3, the DataNode will only shut down after four or more data directories have failed. This value is respected on DataNode startup; in this example the DataNode will start up as long as no more than three directories have failed.
Formatting the NameNode
Before starting the NameNode for the first time you need to format the file system.
sudo -u hdfs hdfs namenode -format
You'll get a confirmation prompt; for example:
Re-format filesystem in /data/namedir ? (Y or N)
Configuring a Remote NameNode Storage Directory
You should configure the NameNode to write to multiple storage directories, including one remote NFS mount. To keep NameNode processes from hanging when the NFS server is unavailable, configure the NFS mount as a soft mount (so that I/O requests that time out fail rather than hang), and set other options as follows:
tcp,soft,intr,timeo=10,retrans=10
These options configure a soft mount over TCP; transactions will be retried ten times (retrans=10) at 1-second intervals (timeo=10) before being deemed to have failed.
Example:
mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10, <server>:<export> <mount_point>
where <server> is the remote host, <export> is the exported file system, and <mount_point> is the local mount point.
Example for HA:
mount -t nfs -o tcp,soft,intr,timeo=50,retrans=12, <server>:<export> <mount_point>
Note that in the HA case timeo should be set to 50 (five seconds), rather than 10 (1 second), and retrans should be set to 12, giving an overall timeout of 60 seconds.
For more information, see the man pages for mount and nfs.
Configuring Remote Directory Recovery
You can enable the dfs.namenode.name.dir.restore option so that the NameNode will attempt to recover a previously failed NameNode storage directory on the next checkpoint. This is useful for restoring a remote storage directory mount that has failed because of a network outage or intermittent NFS failure.
Configuring the Secondary NameNode
In non-HA deployments, configure a Secondary NameNode that will periodically merge the EditLog with the FSImage, creating a new FSImage which incorporates the changes which were in the EditLog. This reduces the amount of disk space consumed by the EditLog on the NameNode, and also reduces the restart time for the Primary NameNode.
A standard Hadoop cluster (not a Hadoop Federation or HA configuration), can have only one Primary NameNode plus one Secondary NameNode. On production systems, the Secondary NameNode should run on a different machine from the Primary NameNode to improve scalability (because the Secondary NameNode does not compete with the NameNode for memory and other resources to create the system snapshot) and durability (because the copy of the metadata is on a separate machine that is available if the NameNode hardware fails).
Configuring the Secondary NameNode on a Separate Machine
To configure the Secondary NameNode on a separate machine from the NameNode, proceed as follows.
- Add the name of the machine that will run the Secondary NameNode to the masters file.
- Add the following property to the hdfs-site.xml file:
<property> <name>dfs.namenode.http-address</name> <value><namenode.host.address>:50070</value> <description> The address and the base port on which the dfs NameNode Web UI will listen. </description> </property>
For more information, see Multi-host SecondaryNameNode Configuration.
More about the Secondary NameNode
- The NameNode stores the HDFS metadata information in RAM to speed up interactive lookups and modifications of the metadata.
- For reliability, this information is flushed to disk periodically. To ensure that these writes are not a speed bottleneck, only the list of modifications is written to disk, not a full snapshot of the current filesystem. The list of modifications is appended to a file called edits.
- Over time, the edits log file can grow quite large and consume large amounts of disk space.
- When the NameNode is restarted, it takes the HDFS system state from the fsimage file, then applies the contents of the edits log to construct an accurate system state that can be loaded into the NameNode's RAM. If you restart a large cluster that has run for a long period with no Secondary NameNode, the edits log may be quite large, and so it can take some time to reconstruct the system state to be loaded into RAM.
When the Secondary NameNode is configured, it periodically constructs a checkpoint by compacting the information in the edits log and merging it with the most recent fsimage file; it then clears the edits log. So, when the NameNode restarts, it can use the latest checkpoint and apply the contents of the smaller edits log. The interval between checkpoints is determined by the checkpoint period (dfs.namenode.checkpoint.period) or the number of edit transactions (dfs.namenode.checkpoint.txns). The default checkpoint period is one hour, and the default number of edit transactions before a checkpoint is 1,000,000. The SecondaryNameNode will checkpoint in an hour if there have not been 1,000,000 edit transactions within the hour; it will checkpoint after 1,000,000 transactions have been committed if they were committed in under one hour.
Secondary NameNode Parameters
The behavior of the Secondary NameNode is controlled by the following parameters in hdfs-site.xml.
- dfs.namenode.checkpoint.check.period
- dfs.namenode.checkpoint.txns
- dfs.namenode.checkpoint.dir
- dfs.namenode.checkpoint.edits.dir
- dfs.namenode.num.checkpoints.retained
See https://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml for details.
Enabling Trash
The Hadoop trash feature helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory in the user's home directory instead of being deleted. Deleted files are initially moved to the Current sub-directory of the .Trash directory, and their original path is preserved. If trash checkpointing is enabled, the Current directory is periodically renamed using a timestamp. Files in .Trash are permanently removed after a user-configurable time delay. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory.
Trash is configured with the following properties in the core-site.xml file:
CDH Parameter |
Value |
Description |
---|---|---|
fs.trash.interval |
minutes or 0 |
The number of minutes after which a trash checkpoint directory is deleted. This option can be configured both on the server and the client.
|
fs.trash.checkpoint.interval |
minutes or 0 |
The number of minutes between trash checkpoints. Every time the checkpointer runs on the NameNode, it creates a new checkpoint of the "Current" directory and removes checkpoints older than fs.trash.interval minutes. This value should be smaller than or equal to fs.trash.interval. This option is configured on the server. If configured to zero (the default), then the value is set to the value of fs.trash.interval. |
Configuring Storage Balancing for DataNodes
You can configure HDFS to distribute writes on each DataNode in a manner that balances out available storage among that DataNode's disk volumes.
By default a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.
- how much DataNode volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced, and
- what percentage of new block allocations will be sent to volumes with more available disk space than others.
Property |
Value |
Description |
---|---|---|
dfs.datanode. fsdataset. volume.choosing. policy |
org.apache.hadoop. hdfs.server.datanode. fsdataset. AvailableSpaceVolumeChoosingPolicy |
Enables storage balancing among the DataNode's volumes. |
dfs.datanode. available-space- volume-choosing- policy.balanced- space-threshold |
10737418240 (default) |
The amount by which volumes are allowed to differ from each other in terms of bytes of free disk space before they are considered imbalanced. The default is 10737418240 (10 GB). If the free space on each volume is within this range of the other volumes, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis. |
dfs.datanode. available-space- volume-choosing- policy.balanced- space-preference- fraction |
0.75 (default) | What proportion of new block allocations will be sent to volumes with more available disk space than others. The allowable range is 0.0-1.0, but set it in the range 0.5 - 1.0 (that is, 50-100%), since there should be no reason to prefer that volumes with less available disk space receive more block allocations. |
Enabling WebHDFS
If you want to use WebHDFS, you must first enable it.
To enable WebHDFS:
Set the following property in hdfs-site.xml:
<property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property>
To enable numeric usernames in WebHDFS:
^[A-Za-z_][A-Za-z0-9._-]*[$]?$You can override the default username pattern by setting the dfs.webhdfs.user.provider.user.pattern property in hdfs-site.xml. For example, to allow numerical usernames, the property can be set as follows:
<property> <name>dfs.webhdfs.user.provider.user.pattern</name> <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value> </property>
Configuring LZO
If you have installed LZO, configure it as follows.
To configure LZO:
<property> <name>io.compression.codecs</name> <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value> </property>
For more information about LZO, see Using LZO Compression.
Start HDFS
Deploy the configuration
To deploy your configuration to your entire cluster:
- Push your custom directory (for example /etc/hadoop/conf.my_cluster) to each node in your cluster; for example:
scp -r /etc/hadoop/conf.my_cluster myuser@myCDHnode-<n>.mycompany.com:/etc/hadoop/conf.my_cluster
- Manually set alternatives on each node to point to that directory, as follows.
To manually set the configuration on RHEL-compatible systems:
sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50 sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
To manually set the configuration on Ubuntu and SLES systems:
sudo update-alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50 sudo update-alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
For more information on alternatives, see the update-alternatives(8) man page on Ubuntu and SLES systems or the alternatives(8) man page On RHEL-compatible systems.
Start HDFS
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
Create the /tmp Directory
Create the /tmp directory after HDFS is up and running, and set its permissions to 1777 (drwxrwxrwt), as follows:
sudo -u hdfs hadoop fs -mkdir /tmp sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Deploy YARN
To deploy YARN, see Setting Up MapReduce v2 with YARN Using the Command Line